AI Bug Hunting Delivers: 271 Firefox Flaws and the Open-Source Reckoning
Mozilla used Anthropic's Mythos Preview to find and fix 271 security vulnerabilities in Firefox 150 before release — a 12x improvement over the previous model's 22 bugs. Firefox CTO Bobby Holley declared that defenders have "rounded the curve." But the first production proof of AI-driven vulnerability discovery raises a harder question: what happens to the open-source projects that cannot afford to put their software through the same bootcamp?
The 12x Jump
On April 15, 2026, Anthropic announced Claude Mythos Preview as part of Project Glasswing — a coordinated effort with Amazon, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, Microsoft, NVIDIA, and the Linux Foundation to secure critical software before the model's capabilities proliferate. The company described Mythos as a general-purpose model with "striking" cybersecurity capabilities, restricting access to a limited group of partners while committing up to $100 million in usage credits.
Six days later, Mozilla provided the most concrete evidence yet that the capabilities are real. Firefox CTO Bobby Holley wrote in a blog post that Mythos Preview identified 271 security vulnerabilities in Firefox 150 by analyzing the unreleased source code. By comparison, Anthropic's Opus 4.6 model found only 22 security-sensitive bugs when analyzing Firefox 148 the previous month. That is not an incremental improvement. It is a 12x capability increase in a single model generation.
What makes the number significant is not just the raw count. It is what the vulnerabilities represent. Holley wrote that each of the bugs Mythos found could have been discovered either through automated fuzzing or by an elite security researcher reasoning through the browser's complex source code. But where fuzzing requires months of continuous automated testing and an elite researcher requires months of focused human effort per bug, Mythos found 271 in a single pass through the codebase.
"Computers were completely incapable of doing this a few months ago, and now they excel at it," Holley wrote. "We have many years of experience picking apart the work of the world's best security researchers, and Mythos Preview is every bit as capable."
What the Vulnerabilities Actually Looked Like
Neither Mozilla nor Anthropic has published the full list of 271 Firefox vulnerabilities, and for good reason: the coordinated disclosure process requires giving vendors time to patch before details go public. But Anthropic's own assessment of Mythos Preview, published alongside the Glasswing announcement, provides categories that map directly to what was found in Firefox and other codebases.
Mythos Preview wrote a web browser exploit that chained together four separate vulnerabilities, constructing a complex JIT heap spray that escaped both the renderer and OS sandboxes. It autonomously obtained local privilege escalation on Linux and other operating systems by exploiting subtle race conditions and KASLR-bypass techniques. It wrote a remote code execution exploit on FreeBSD's NFS server that granted full root access to unauthenticated users by splitting a 20-gadget ROP chain across multiple packets. On the Firefox JavaScript engine specifically — the same engine Mozilla tested — Opus 4.6 built a working exploit only twice across several hundred attempts. Mythos succeeded 181 times.
The Firefox 150 fixes were shipped before the vulnerabilities became publicly known. That is the ideal outcome. It is also an outcome that required Mozilla to have early access to Mythos, dedicated security engineers to triage and fix 271 findings, and the engineering capacity to coordinate a release around them — resources that most software projects do not have.
The Bootcamp Every Piece of Software Must Now Run
Holley's most consequential statement was not about Firefox. It was about the transition every piece of software is about to go through.
"Every piece of software is going to have to make this transition," he said in an interview with Wired, "because every piece of software has a lot of bugs buried underneath the surface that are now discoverable. This is a transitory moment that is difficult and requires coordinated focus and a lot of grit to get through, but I think that it is a finite moment."
The word "finite" is doing important work. Holley is arguing that the flood of discoverable vulnerabilities is a one-time event — a bootcamp that exhausts the latent bug population in a codebase. Once the bugs that have survived years of human review and automated fuzzing are found and fixed, subsequent models will find progressively fewer new issues. The curve rounds.
That framing is plausible for a well-resourced project like Firefox. Mozilla had direct access to Mythos through collaboration with Anthropic. They could triage 271 findings with dedicated security engineers. They could ship patches in a coordinated Firefox release. Most software projects cannot do any of those things.
| Resource | Mozilla | Typical Open-Source Project |
|---|---|---|
| Mythos access | Direct collaboration with Anthropic | Not available (Glasswing is invite-only) |
| Security engineers | Dedicated full-time team | Maintainer working evenings |
| Triage throughput | 271 findings in weeks | Triage bottleneck blocks fixes |
| Release process | Coordinated browser release cycle | Ad hoc maintainer releases |
| Funding | $600M+ annual revenue | Maintainer working for free |
The gap is not theoretical. Mozilla Foundation CTO Raffi Krikorian made this explicit in a New York Times essay published the same week: "The programmer who gave 20 years of his life to maintain code that runs inside products used by billions of people? He doesn't have access to Mythos yet. He should."
The Open-Source Maintainer Problem
The structural problem is that open-source software has a dual vulnerability. Its codebases are public, which means AI systems can scan them without restriction. Its maintenance is often underfunded or volunteer-driven, which means the projects lack the engineering capacity to triage and fix what the AI finds. The worst-case scenario is not that attackers discover vulnerabilities before defenders. It is that the vulnerabilities are discoverable by both sides simultaneously, and only one side has the capacity to act.
Linux kernel maintainer Greg Kroah-Hartman described the shift in blunt terms at a recent security conference. "Months ago, we were getting what we called 'AI slop' — AI-generated security reports that were obviously wrong or low quality. It was kind of funny. It didn't really worry us. Something happened a month ago, and the world switched. Now we have real reports. All open source projects have real reports that are made with AI, but they're good, and they're real."
Daniel Stenberg, creator and maintainer of curl (used by billions of devices worldwide), reported that the challenge had already shifted from AI-generated noise to a flood of valid reports. "Less slop but lots of reports. Many of them really good. I'm spending hours per day on this now. It's intense."
This is the asymmetry that the Firefox 271 bugs reveal. The discovery side has been compressed from months to hours. The remediation side remains at human speed. And the projects most exposed — the open-source libraries that run the internet's infrastructure — are the ones least equipped to absorb the volume.
Holley acknowledged this directly: "I've talked to engineering leaders at very large companies who are saying that they're going to be pulling thousands of engineers off of everything to be working on this for the next six months. So it is going to be a big challenge for industry, and the concern is for smaller projects and open source. It's difficult for these maintainers to not only have the wherewithal and the access to be able to use these tools, but also to actually do anything with them."
The Reproduction Question
One week before Mozilla's announcement, Vidoc Security Labs published a replication study that tested whether public models could achieve results comparable to Mythos. Using GPT-5.4 and Claude Opus 4.6 in opencode, an open-source coding agent, with a standardized security-review workflow, Vidoc reproduced Mythos's findings on FreeBSD and Botan in three out of three runs with both models. Claude Opus 4.6 also reproduced the OpenBSD case in three out of three runs (GPT-5.4 went zero for three). On FFmpeg and wolfSSL, both models achieved only partial results.
Vidoc's conclusion is nuanced. The capabilities Anthropic gates are already available in public models, so defenders should prepare for that reality. But Vidoc tested reproduction of known findings — the model was told where to look. Discovery of unknown vulnerabilities in unknown locations is a different problem, and that is where Mythos appears to hold its lead. Public models pointed at patched code regularly hallucinate vulnerabilities that do not exist.
The practical implication is that two distinct workflows are now viable. Organizations with Mythos-level access can run discovery: pointing the model at a codebase with no prior knowledge and receiving a prioritized list of real vulnerabilities. Organizations without that access can run triage validation: using public models to verify and classify known vulnerability patterns at $30 or less per file. The gap between discovery and validation is where the access gate currently sits.
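The triage-validation workflow can be made concrete with a small sketch. The idea is to constrain the model to verifying one known candidate finding per call, rather than asking for open-ended discovery, since that is exactly the regime where public models stop hallucinating. Everything here is illustrative: `query_model` is a placeholder for whatever API client you use, and the prompt and verdict format are assumptions of this sketch, not Vidoc's actual protocol.

```python
from dataclasses import dataclass

VERDICTS = {"CONFIRMED", "REFUTED", "INCONCLUSIVE"}

@dataclass
class Finding:
    path: str
    vuln_class: str   # e.g. "use-after-free", "integer overflow"
    description: str

def build_validation_prompt(source: str, finding: Finding) -> str:
    """Ask the model to verify one *known* candidate finding.

    Constraining the task this way is what makes public models usable:
    they validate known patterns reliably but hallucinate when asked
    for open-ended discovery in unfamiliar code.
    """
    return (
        f"You are reviewing {finding.path} for exactly one candidate issue.\n"
        f"Claimed class: {finding.vuln_class}\n"
        f"Claim: {finding.description}\n\n"
        "Reply with one line 'VERDICT: CONFIRMED', 'VERDICT: REFUTED', or "
        "'VERDICT: INCONCLUSIVE', followed by a short justification.\n\n"
        f"--- SOURCE ---\n{source}"
    )

def parse_verdict(reply: str) -> str:
    """Extract the verdict line; anything malformed goes to a human."""
    for line in reply.splitlines():
        if line.startswith("VERDICT:"):
            verdict = line.split(":", 1)[1].strip().upper()
            if verdict in VERDICTS:
                return verdict
    return "INCONCLUSIVE"  # malformed output -> human triage, never a silent pass

def query_model(prompt: str) -> str:
    """Placeholder: swap in your actual model client here."""
    raise NotImplementedError
```

The design choice that matters is the fallback in `parse_verdict`: an answer the pipeline cannot parse is treated as inconclusive and escalated, so model formatting failures never get interpreted as a clean bill of health.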
Honest Assessment
The Firefox 271 bugs are a milestone, but they raise three concerns that most coverage of the announcement has missed.
First, the bootcamp assumption depends on software being static. Firefox — like any actively developed codebase — is not static. Every new feature, every refactor, every dependency update introduces new surface area. The "latent bug population" is not a finite reservoir that can be drained once and sealed. It is replenished with every release cycle. The bootcamp is not a one-time event; it is a continuous obligation, and the cost of sustaining it falls disproportionately on the maintainers who can least afford it.
Second, the access gap between Mythos and public models is closing in weeks, not years. Vidoc reproduced three of Mythos's five public benchmark cases with models anyone with an API key can use today, two of them with GPT-5.4 alone. The Vidoc study was published on April 14. Mozilla's results were announced on April 21. That is a seven-day gap between "public models can already do part of this" and "the most resourced defender on earth just proved it works end-to-end." The next generation of public models, whatever ships after GPT-5.4 and Opus 4.6, will close the remaining gap.
Third, the Firefox case study is the best-case scenario. Mozilla has a dedicated security team, coordinated release cycles, direct Anthropic access, and hundreds of millions in annual revenue. The median open-source project has one maintainer, no security team, no AI access, and no release discipline. The 271 bugs that Mozilla fixed before release are the same 271 bugs that will be found in other projects after release — by whoever gets there first.
Actionable Takeaways
- Run public-model vulnerability scanning against your codebases today. Vidoc's replication study demonstrated that GPT-5.4 and Opus 4.6 can find real vulnerabilities at under $30 per file when pointed at the right code segments. Use opencode or the Clearwing community project as scaffolding. You do not need Mythos to start; you need a repeatable workflow that batches your codebase into reviewable segments and routes the output through a human triage process.
- Establish a triage SLA for AI-reported findings. The Mozilla model — receive findings, triage, and ship patches in weeks — works because Mozilla had pre-allocated security engineering capacity. Define a triage window for your project: how quickly will you evaluate a model-reported vulnerability? Who is responsible? What is the escalation path? Without a defined process, the volume of valid findings will overwhelm the available response capacity.
- Demand structured access for open-source maintainers. Project Glasswing committed $4 million to open-source security organizations but does not include most individual maintainers. If you maintain widely used open-source software, petition for access through organizations like the Open Source Security Foundation or the Linux Foundation. If you work at a Glasswing partner company, advocate for maintainer access programs within your organization.
- Distinguish validation from discovery. Public models can validate known vulnerability patterns today. They cannot reliably discover unknown vulnerabilities in unknown locations. Invest in the discovery workflow first — either by securing access to frontier models through structured programs or by building internal agentic scanning pipelines that compensate for individual model limitations through breadth and repetition.
- Treat the bootcamp as continuous, not finite. Every release cycle introduces new surface area. Build automated vulnerability scanning into your CI pipeline rather than treating it as a one-time audit. The organizations that round the curve are not the ones that run a single scan; they are the ones that scan continuously, triage systematically, and treat every new commit as a potential vulnerability introduction.
- Allocate the same urgency to AI-reported findings as to researcher-reported findings. The instinct to deprioritize automated reports is understandable given the history of low-quality AI-generated submissions. That instinct is now counterproductive. Kroah-Hartman and Stenberg have both confirmed that the quality of AI vulnerability reports has crossed a meaningful threshold. Treat each valid finding with the same severity classification and patch timeline you would apply to a human researcher report.
Mozilla fixed 271 vulnerabilities in Firefox before any attacker could exploit them. That is the success story the AI security industry needs, and it is real. But the success depended on resources, access, and engineering discipline that most software projects cannot currently marshal. The Firefox case proves that AI bug hunting works at production scale. The question is whether it works for the projects that need it most — and the answer, right now, is that it does not. The tools exist. The access does not. The organizations that change this equation fastest — through structured maintainer programs, public-model scanning workflows, and systematic triage investment — will be the ones that actually round the curve.