Your AI Agent Writes 70% of Your Code. Why Is Nothing Shipping Faster?
Some teams in 2026 measure "agent contribution percentage" — what fraction of committed code was written by autonomous AI agents. Some are at 65–80%. The number is real. It's also irrelevant.
The majority of teams that adopted AI coding tools saw little to no increase in overall throughput. Code volume increased. Validated, working, maintainable features in production: roughly flat.
The metric is measuring the wrong stage of the pipeline, and the confusion it creates costs more than the time it was supposed to save.
What Throughput Actually Is
Software throughput is not code committed. It's working functionality in production that solves the problem it was built to solve.
The chain from "code committed" to "working in production" involves review, testing, integration, QA, deployment, verification, and user validation. AI has not materially improved any of these stages. It accelerated one upstream stage — code writing — without accelerating the downstream stages that determine actual throughput.
This is not surprising. Bottleneck theory is fifty years old. Accelerating a non-bottleneck stage increases inventory at the next bottleneck; it doesn't improve overall throughput. You just get a larger queue.
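A toy simulation makes the point concrete. The rates below are illustrative, not measured: generation feeds a review stage with fixed capacity, and tripling generation speed grows the queue without moving shipped output.

```python
# Two-stage pipeline sketch: generation feeds a fixed-capacity review stage.
# All rates are invented for illustration.

def simulate(gen_rate, review_rate, weeks=52):
    """Return (items shipped, items stuck in review queue) after `weeks`."""
    queue = 0
    shipped = 0
    for _ in range(weeks):
        queue += gen_rate               # new code enters the review queue
        done = min(queue, review_rate)  # review capacity is the fixed bottleneck
        queue -= done
        shipped += done
    return shipped, queue

# Baseline: hand-written code, generation roughly matches review capacity.
print(simulate(gen_rate=10, review_rate=10))  # (520, 0)

# With AI: generation triples, review capacity unchanged.
print(simulate(gen_rate=30, review_rate=10))  # (520, 1040)
```

Shipped output is identical in both runs; the only thing that tripled generation buys is a review backlog of 1,040 items.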
The bottleneck in software development is not code writing. It never was. The bottleneck is validation — the human process of determining whether what was built is correct, complete, and consistent with the system it's entering.
The Queue That Built Up
When AI code generation accelerates, the review queue grows. Developers now spend 11.4 hours per week reviewing AI-generated code versus 9.8 hours writing code — a reversal from 2024.
The senior engineers who can do meaningful review — the ones who understand the system architecture, the security model, the implicit contracts between modules — are the same engineers who were already the bottleneck before AI entered the picture.
AI gave them more to review. It didn't give them more capacity to review it.
The team at 80% agent contribution is producing significantly more code per engineer than a team writing everything by hand. It's also creating significantly more review work for the same senior engineer capacity. The queue grows. The throughput stays flat.
The Real Cost of AI-Generated Code
There's a specific tax on reviewing AI-generated code that doesn't apply to human-written code: the reviewer can't assume a coherent intent sits behind it.
AI-generated code contains 2.74 times more vulnerabilities than human-written code. When a human engineer writes code, the reviewer can make assumptions about what the author was trying to accomplish and check against that. When AI generates code, the reviewer has to verify not just correctness but intent — does this code actually do what the task requires, or does it do something plausible-looking with edge-case failures?
That verification work doesn't scale with code volume. It scales with reviewer judgment. And reviewer judgment is the most constrained resource on the team.
Why the Metric Persists
"Agent contribution percentage" is easy to measure and makes the productivity case visible to stakeholders.
"We're using AI tools" is hard to defend to a budget committee. "80% of our code is agent-generated, up from 20% last year" sounds like progress. It has a number. The number is going in the right direction.
The metrics that actually matter — review latency, defect escape rate, time-to-production for a new feature from initial work start — are harder to attribute to any single intervention and harder to make look obviously good.
So the proxy metric wins. This is not unusual. Organizations optimize for what's measurable, and what's measurable is often not what matters.
The Measurement That Would Work
The metric worth tracking is throughput per engineer per unit of time, inclusive of all pipeline stages: from initial task to confirmed working in production.
Not code generated. Not PRs merged. Features validated and live, measured from when work started to when it was confirmed working.
Teams that track this tend to find that AI coding tools improve the metric less than expected, because improvements in generation time are absorbed by increases in review time, correction time, and integration time.
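A minimal sketch of what tracking this could look like, assuming a simple task log. The record fields (`started`, `verified_in_prod`) and the numbers are invented for illustration, not any particular tool's schema:

```python
from datetime import datetime

# Hypothetical task records; unshipped work has no production timestamp yet.
tasks = [
    {"id": "T-1", "started": datetime(2026, 1, 5), "verified_in_prod": datetime(2026, 1, 12)},
    {"id": "T-2", "started": datetime(2026, 1, 6), "verified_in_prod": datetime(2026, 1, 20)},
    {"id": "T-3", "started": datetime(2026, 1, 8), "verified_in_prod": None},  # still in review
]

def end_to_end_throughput(tasks, engineers, weeks):
    """Validated, live features per engineer per week; unshipped work counts zero."""
    shipped = [t for t in tasks if t["verified_in_prod"] is not None]
    return len(shipped) / (engineers * weeks)

def median_cycle_days(tasks):
    """Days from work start to confirmed working in production, shipped tasks only."""
    durations = sorted(
        (t["verified_in_prod"] - t["started"]).days
        for t in tasks if t["verified_in_prod"] is not None
    )
    return durations[len(durations) // 2]

print(end_to_end_throughput(tasks, engineers=4, weeks=2))  # 0.25
print(median_cycle_days(tasks))                            # 14
```

The deliberate design choice is that generated-but-unshipped code contributes nothing to the numerator, so faster generation only moves the metric if validation keeps pace.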
Some tasks see real improvement: well-defined, bounded implementation work against clear interfaces. The improvement is real and worth capturing.
Other tasks see no improvement or regression: complex cross-system changes, security-sensitive code, anything requiring architectural judgment. The AI contribution adds review cost without reducing design cost.
The useful question is not "what's our agent contribution percentage?" It's "for which categories of work is agent generation net positive end-to-end?" That answer is more useful and significantly smaller than 80%.
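One way to answer that question is to split the same end-to-end cycle-time measurement by task category. The categories and day counts below are hypothetical, purely to show the shape of the comparison:

```python
# Hypothetical cycle times in days (work start to verified in production),
# comparing agent-assisted vs. hand-written tasks within each category.
cycle_days = {
    "bounded-impl": {"agent": [2, 3, 2], "hand": [4, 5, 4]},
    "cross-system": {"agent": [9, 11],   "hand": [8, 9]},
}

def mean(xs):
    return sum(xs) / len(xs)

for category, groups in cycle_days.items():
    # Positive delta: agent assistance saved end-to-end time in this category.
    delta = mean(groups["hand"]) - mean(groups["agent"])
    verdict = "net positive" if delta > 0 else "net negative"
    print(f"{category}: {delta:+.1f} days saved -> {verdict}")
```

With these invented numbers, bounded implementation work comes out ahead while cross-system work regresses, which is the kind of split the "80% agent contribution" headline number hides.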
The Right Frame
AI coding agents are good at writing code within understood patterns. They're not good at determining what code should be written, validating whether it's correct in system context, or maintaining structural coherence over time.
Teams treating agent contribution as a throughput metric are measuring the thing AI is good at as a proxy for the thing that actually matters. The proxy and the real metric have drifted apart as AI adoption increased.
Throughput is still determined by the validation bottleneck. It was before agents. It is after agents.
The number that tells you something about your team's actual productivity is the one nobody is reporting.
