Spec-Driven Development: How AGENTS.md Makes AI Coding Reliable
More than 60,000 repositories now ship AGENTS.md files. The pattern has a name — spec-driven development — and it has measurable results: 35 to 55 percent fewer AI-generated bugs at GitHub, a 100 percent pass rate at Vercel, and a cross-tool standard supported by Codex CLI, Claude Code, Gemini CLI, and Cursor. Here is the six-element framework, the evidence behind it, and how to implement it this week.
The Problem Is Not the Code — It Is the Context
Eighty-four percent of developers now use AI coding tools. Thirty percent or more of the code shipped at Google and Microsoft is AI-generated. The adoption curve has outpaced the methodology. Teams paste prompts into agents, accept the output, and discover later that 45 percent of AI-generated code contains security vulnerabilities — 72 percent in Java. High-risk vulnerabilities in AI-generated code are up 36 percent year over year. Forty-three to forty-five percent of AI-generated changes break in production.
The numbers tell a specific story: the AI is generating code faster than teams can specify what they actually need. The code is not broken because the model is insufficiently capable. It is broken because the model was given insufficient context. Vibe coding — describing what you want in conversational prose and accepting whatever comes back — produces code that works for the happy path and collapses at the edges.
Spec-driven development is the methodology that closes this gap. It replaces unstructured prompting with a structured context file — AGENTS.md — that gives AI coding agents the same information a senior engineer would provide to a new team member: the role, the stack, the commands, the boundaries, the environment, and the conventions. The pattern is now supported across four major toolchains, governed by a Linux Foundation working group, and backed by production case studies. Here is how it works.
The Six-Element Framework
The AGENTS.md specification converges on six sections. Not all are required for every project, but the most effective AGENTS.md files include all six. The structure comes from production implementations at GitHub, Vercel, and Augment Code, and it matches the canonical guidance published by the agents.md open-source project.
| Element | Purpose | Example |
|---|---|---|
| 1. Agent Role | Define the persona and principles the agent should follow | "You are a senior backend engineer who prioritizes readability over cleverness" |
| 2. Tech Stack | Specify exact frameworks, languages, and versions | Python 3.12, FastAPI 0.115, PostgreSQL 16, SQLAlchemy 2.0 |
| 3. Key Commands | Provide exact shell commands with flags for common tasks | "Run tests: pytest -xvs --tb=short" |
| 4. Boundaries | Define Always / Ask First / Never rules | "Never commit .env files. Always run the full test suite before marking complete. Ask before modifying migration files." |
| 5. Environment Facts | Document file structure, config locations, and runtime context | "Config lives in .env.local for dev, env vars in production. Tests use Docker Postgres on port 5433." |
| 6. Coding Conventions | State style preferences, naming patterns, and architectural decisions | "Use dependency injection, snake_case for files, PascalCase for classes. Prefer composition over inheritance." |
The critical design principle: write only what the agent cannot infer from the codebase itself. A 200-line AGENTS.md that restates what is already in README.md, package.json, and tsconfig.json adds noise that degrades performance. An ETH Zurich study published in February 2026 found that poorly curated AGENTS.md files — auto-generated or exhaustive ones that duplicate existing documentation — inflated inference costs by up to 159 percent and reduced task success rates. The best practice is 150 lines or fewer, plain Markdown only, containing exclusively non-obvious information.
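Put together, a lean file is just a handful of Markdown sections. The outline below is a sketch of one possible shape; the specification does not mandate these exact headings, and the inline notes mark what belongs under each.

```markdown
# AGENTS.md  (keep under 150 lines; only what the agent cannot infer)

## Agent Role          <!-- persona and principles -->
## Tech Stack          <!-- exact frameworks, languages, versions -->
## Key Commands        <!-- exact shell commands, with flags -->
## Boundaries          <!-- Always / Ask First / Never rules -->
## Environment Facts   <!-- config locations, ports, runtime context -->
## Coding Conventions  <!-- style, naming, architectural decisions -->
```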
The Evidence
Vercel: 100 Percent Pass Rate with Passive Context
Vercel embedded a compressed 8-kilobyte documentation index for Next.js 16 APIs directly into their AGENTS.md file. Rather than using a retrieval-augmented generation system that dynamically fetched relevant documentation, Vercel made the key framework knowledge part of the agent's always-available context window. The result: a 100 percent pass rate on their agent evaluation benchmark, compared to a 79 percent maximum for skills-based retrieval systems.
The architectural insight is counterintuitive. For framework-specific knowledge, passive context — information that is always in the agent's window — outperforms active retrieval — fetching relevant docs on demand. The agent does not need to guess what to look for because the specification already told it what matters.
GitHub Internal: 35 to 55 Percent Fewer Bugs
GitHub's internal testing found that well-crafted AGENTS.md files reduced AI-generated bugs by 35 to 55 percent, with the reduction varying by codebase. The range is wide because AGENTS.md quality varies just as widely: a minimal file with agent role and boundaries removes the most common failure modes, while a comprehensive file with all six elements removes significantly more.
The Linux Foundation Standardization
In December 2025, the Linux Foundation established the Agentic AI Foundation (AAIF) with 170-plus member organizations including Anthropic, OpenAI, Google, Microsoft, and AWS. The foundation consolidated three converging standards: the MCP protocol with 10,000-plus servers, the AGENTS.md specification with 60,000-plus repositories, and the Goose framework. The goal is to prevent vendor lock-in and fragmentation — the same forces that fragmented .cursorrules, CLAUDE.md, and GEMINI.md into incompatible silos before convergence.
How Each Toolchain Implements the Pattern
The good news: the pattern is cross-tool. The caveat: the implementations differ enough that the differences are worth understanding.
| Tool | File Name | Discovery | Notable Feature |
|---|---|---|---|
| OpenAI Codex CLI | AGENTS.md | Hierarchical: ~/.codex/ → repo root → CWD | Named agents: --agents review loads AGENTS.review.md |
| Anthropic Claude Code | CLAUDE.md | Recursive upward from CWD | Auto-memory; merges context from all parent directories |
| Google Gemini CLI | GEMINI.md | Global → workspace → just-in-time | Content concatenation with scope-aware overrides |
| Cursor | .cursor/rules/*.mdc | YAML frontmatter + glob scoping | Legacy .cursorrules still supported but deprecated |
The convergence is happening. A proposed .agents/rules/ directory format with YAML frontmatter and glob-based scope control is under discussion at the agents.md repository (issue 179). It would unify the current fragmentation into a single canonical location. Until it lands, writing an AGENTS.md file that follows the six-element framework and keeping tool-specific files as thin aliases is the pragmatic approach.
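Until that lands, a tool-specific file can stay a thin alias. The sketch below is one possible approach rather than anything the tools require: the tool-specific file simply defers to the canonical AGENTS.md.

```markdown
<!-- CLAUDE.md (or GEMINI.md): thin alias, not a second spec -->
The canonical agent instructions live in ./AGENTS.md.
Read that file and follow it as the single source of truth for role,
tech stack, key commands, boundaries, environment facts, and conventions.
```

Some teams symlink the tool-specific names to AGENTS.md instead; either way, the content is written and maintained once.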
Implementation: Building an AGENTS.md This Week
Step 1: Start with Agent Role and Boundaries
These two sections deliver the highest return on investment. The agent role eliminates ambiguous prompts. The boundaries section — specifically the "Never" category — prevents the most expensive failures: committing secrets, modifying critical infrastructure files, or pushing directly to production. Define your Always / Ask First / Never rules based on the mistakes your team has actually made with AI-generated code, not hypothetical ones.
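A minimal sketch of those two sections, assembled from the examples used throughout this piece; the specific rules are illustrative and should be replaced with ones drawn from your own incidents.

```markdown
## Agent Role
You are a senior backend engineer who prioritizes readability over cleverness.

## Boundaries
Always:
- Run the full test suite before marking a task complete.
Ask first:
- Before modifying migration files or other critical infrastructure.
Never:
- Commit .env files or any other secrets.
- Push directly to production.
```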
Step 2: Add Tech Stack and Key Commands
Specify exact versions. "Python" is not a tech stack. "Python 3.12, FastAPI 0.115, PostgreSQL 16" is. Provide the exact commands you run in development — with flags. `pytest` is not a key command. `pytest -xvs --tb=short` is. The precision eliminates the most common class of AI-generated errors: using outdated APIs, wrong import paths, or incorrect tool configurations for the project's version.
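In the file itself, those two sections might look like this; the lint and type-check lines are hypothetical placeholders that show the pattern, not commands your project necessarily uses.

```markdown
## Tech Stack
Python 3.12, FastAPI 0.115, PostgreSQL 16, SQLAlchemy 2.0

## Key Commands
- Run tests: `pytest -xvs --tb=short`
- Lint: `ruff check . --fix`      <!-- placeholder: substitute your linter -->
- Type check: `mypy app/`         <!-- placeholder: substitute your checker -->
```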
Step 3: Document Environment Facts
Write down what a new team member would ask in their first week: where the config lives, which port the dev database runs on, what environment variables are required, how to set up the local environment from scratch. This is the information the agent cannot infer from reading the codebase because it lives in local setup guides, Slack threads, and tribal knowledge.
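A sketch of that section, reusing the environment facts from the framework table; the setup command and variable names are hypothetical placeholders.

```markdown
## Environment Facts
- Config lives in .env.local for dev; environment variables in production.
- Tests use Docker Postgres on port 5433, not the dev database.
- Fresh setup: `docker compose up -d db && pip install -e ".[dev]"`  <!-- placeholder -->
- Required env vars: DATABASE_URL, REDIS_URL  <!-- placeholders: list yours -->
```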
Step 4: Add Coding Conventions
State the decisions your team has already made — not aspirational style guides, but the patterns the codebase actually uses. If the project uses dependency injection, say so. If classes are PascalCase and files are snake_case, say so. If every API endpoint has a corresponding test file, say so. The agent will follow the majority pattern in the existing code anyway, but explicit conventions break ties in mixed-style codebases and prevent the model from inventing its own patterns.
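Stated as short declarative rules, the section might read like this; the test-file path is an illustrative detail, not a prescribed layout.

```markdown
## Coding Conventions
- Use dependency injection; snake_case for files, PascalCase for classes.
- Prefer composition over inheritance.
- Every API endpoint has a corresponding test file under tests/api/.  <!-- path is illustrative -->
```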
Step 5: Keep It Under 150 Lines
The ETH Zurich study is the cautionary tale: verbose AGENTS.md files increased inference costs and reduced task success. Write only what the agent cannot infer. Do not copy your README, your CONTRIBUTING.md, or your full style guide into the file. Link to them instead. The AGENTS.md should contain non-obvious, project-specific, action-oriented instructions — not documentation that already exists elsewhere.
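Linking instead of inlining can be a short pointer section; the file names below are illustrative.

```markdown
## Further Reading (do not duplicate here)
- Full style guide: CONTRIBUTING.md
- Project overview and setup: README.md
- Architecture decisions: docs/adr/   <!-- illustrative path -->
```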
The Honest Assessment
Spec-driven development does not make AI-generated code perfect. It makes it predictable. The bug reduction range — 35 to 55 percent — leaves a substantial residual. The 100 percent Vercel pass rate was on a framework-specific evaluation, not an open-ended codebase. Qred Bank's compliance workflow success required ongoing maintenance of their AGENTS.md files as regulatory requirements changed.
The approach also introduces a maintenance burden. Outdated AGENTS.md files that reference deprecated frameworks or obsolete commands are worse than no file at all, because the agent will trust the spec even when the codebase has moved on. The spec must be a living document, updated when the tech stack changes, when new boundaries are discovered through failures, and when conventions evolve.
The cross-tool convergence is real but incomplete. Writing a single AGENTS.md that works across Codex CLI, Claude Code, Gemini CLI, and Cursor is possible today for the six core elements. But the advanced features — named agents in Codex, auto-memory in Claude, glob-scoped rules in Cursor — require tool-specific configuration that fragments the standard. The proposed .agents/rules/ unification format would resolve this, but it is still under discussion.
Finally, the ETH Zurich caveat deserves emphasis: garbage in, garbage out. An AGENTS.md that restates the obvious, duplicates existing documentation, or includes contradictory instructions will degrade agent performance. The power of the pattern is in the curation — writing what the agent needs to know and nothing else.
Actionable Takeaways
- Start with Agent Role and Boundaries this week. These two sections require no research and prevent the most common failure modes. Write five to ten "Never" rules based on incidents your team has actually experienced. This alone will reduce AI-generated bugs by a meaningful margin.
- Specify your tech stack with exact versions. "Node" is not a tech stack. "Node 22.3, Express 5.0, PostgreSQL 16, Redis 7.2" is. The precision eliminates an entire category of wrong-API errors that compound through generated code.
- Write your key commands with full flags. AI agents default to bare commands without the flags your team uses. If you run `pytest -xvs --tb=short`, put that exact command in the AGENTS.md, not just `pytest`.
- Cap your AGENTS.md at 150 lines. If it is longer, you are restating information the agent can infer or find elsewhere. Link to additional documentation rather than inlining it. The ETH Zurich study quantified the cost of verbosity: up to 159 percent higher inference costs with worse outcomes.
- Keep it current. Treat AGENTS.md as a living document. When you upgrade a framework version, update the tech stack section. When an AI coding incident reveals a new boundary, add it to the Never or Ask First list. Stale specs produce worse results than no spec.
- Adopt the six-element framework as a cross-tool standard. Agent role, tech stack, key commands, boundaries, environment facts, and coding conventions are tool-agnostic. Write them once in AGENTS.md and let the tool-specific files alias or link to it. When the proposed `.agents/rules/` format lands, you will be ready to migrate with minimal effort.
Spec-driven development is not a theoretical framework. It is a production-validated pattern that 60,000 repositories have already adopted, that the Linux Foundation is standardizing, and that produces measurable improvements in AI-generated code quality. The gap between what AI can generate and what teams can review is widening. The AGENTS.md file is the bridge. It does not require new tooling, new pipelines, or new processes — it requires fifteen minutes of writing the context a senior engineer would give a new team member on day one. That is the entire methodology: the six elements, the evidence, and the maintenance discipline to keep it accurate. Start this week.