Agent Contribution Percentage Is the Wrong Metric

Some engineering teams are now tracking what share of their codebase comes from AI agents versus human engineers. SiliconANGLE reported on enterprise teams measuring this alongside traditional productivity signals, with some already at 65–80% agent-generated code.

This is a mistake. Not because AI-generated code is bad. Because the metric measures the wrong thing.

What it actually measures

Agent contribution percentage measures production volume attributed to a non-human source. What it doesn't measure: whether the human layer understands what was produced, can maintain it, or can reason about failure modes that weren't in the prompt.

My practice when building Ordia: I write the interfaces, the structural decisions, the data flow. AI fills in the interior of patterns I've already modeled. Agent contribution to interior implementations might be high. Agent contribution to structural decisions: zero, by design.
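A minimal sketch of that division of labor, with hypothetical names (not Ordia's actual code): the interface is a human structural decision, and the implementation is the kind of interior an agent can fill in against it.

```python
from typing import Protocol


# Human-authored: the interface is a structural decision.
# It fixes the contract that everything else is built against.
class RateLimiter(Protocol):
    def allow(self, key: str) -> bool:
        """Return True if the request identified by key may proceed."""
        ...


# Agent-authored: the interior of a pattern a human already modeled.
# Replaceable without touching any caller that depends on the Protocol.
class FixedWindowLimiter:
    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.counts: dict[str, int] = {}

    def allow(self, key: str) -> bool:
        count = self.counts.get(key, 0)
        if count >= self.limit:
            return False
        self.counts[key] = count + 1
        return True
```

If the implementation turns out to be wrong, it gets rewritten; the `Protocol` is the part whose mistakes accumulate cost.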

If I reversed this — let the agent define structure and interfaces — I'd have a codebase with high agent contribution percentage and no mental model. The percentage would look good. The system would be unmaintainable.

Production metrics and understanding metrics aren't correlated. The dashboard doesn't know that.

How the metric shapes behavior

Metrics create incentives. A team measured on agent contribution percentage will increase that number. They'll accept outputs they half-understand. They'll skip structural review because the agent is reliable and the pressure is real. The percentage goes up.

This is the same failure mode as measuring lines of code shipped. The industry spent decades moving away from that metric because it incentivizes producing code, not solving problems. Agent contribution percentage is the same mistake with different syntax.

The enterprise backlash described in recent reporting follows directly from this. Teams optimized for the metric. Code volume went up. Quality and maintainability degraded. Someone had to own the gap between what was shipped and what anyone understood.

What the metric should capture instead

The useful question isn't what percentage of code the agent wrote. It's whether engineers can explain what the code does, why key decisions were made, and where it will fail under unusual conditions.

None of those are capturable in a percentage. They require human engagement with the system — not reviewing AI output for syntax errors, but building the structural model that makes the code a thing someone can reason about.

A proxy that would be more useful: the ratio of human-authored interface definitions to agent-authored implementations. Interfaces are load-bearing. Implementations are replaceable. A high ratio of human-designed interfaces means the structural decisions — the ones that accumulate cost when they're wrong — are being made by humans. A low ratio means the agents are making those decisions too, and nobody is tracking whether anyone understands the results.
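One possible formalization of that proxy, assuming a hypothetical authorship record per definition (nothing here is a standard tool; it is a sketch of the bookkeeping):

```python
from dataclasses import dataclass


@dataclass
class Definition:
    kind: str    # "interface" or "implementation"
    author: str  # "human" or "agent"


def interface_ratio(defs: list[Definition]) -> float:
    """Ratio of human-authored interfaces to agent-authored
    implementations. High means structural decisions stay with
    humans; low means agents are making them too."""
    human_interfaces = sum(
        1 for d in defs if d.kind == "interface" and d.author == "human"
    )
    agent_impls = sum(
        1 for d in defs if d.kind == "implementation" and d.author == "agent"
    )
    if agent_impls == 0:
        return float("inf")  # no agent implementations yet
    return human_interfaces / agent_impls
```

A codebase with three human-designed interfaces and two agent-filled implementations scores 1.5; a codebase where agents also wrote the interfaces trends toward zero regardless of how much code they produced.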

What this means for engineering management

The marginal cost of producing code has collapsed. This is real. A working implementation of a well-specified function now costs roughly nothing in terms of human time. This should change how teams allocate engineering attention — not toward producing more code, but toward the judgment work that remains expensive.

Judgment work: system design, interface definition, tradeoff reasoning across time horizons, failure mode analysis. These don't get cheaper with AI. They get more important as the volume of AI-generated code to reason about increases.

A team optimizing for agent contribution percentage is allocating human attention in exactly the wrong direction. It's turning engineers into output raters rather than system architects.

The broader calibration problem

There's a pattern here: adopting AI tooling produces early wins on easy tasks, generating enthusiasm that gets applied uniformly to harder tasks where the early wins don't transfer.

Agent contribution percentage is a metric calibrated on the early-win behavior. It measures that the tools are being used. It says nothing about whether they're being used in the contexts where they help versus the contexts where they create problems.

METR's study of experienced developers found they were 19% slower with AI assistance on complex, novel tasks, despite expecting to be 24% faster. That expectation gap is a calibration failure, and the same calibration error produces metrics that reward the wrong behavior.

The metric that would actually matter

The push for enterprise order and control that follows the backlash is the market correcting for this. When the percentage went up and quality went sideways, the bottleneck became the engineers who actually understood the code.

That's the metric that should have been tracked from the beginning: how many engineers can fully explain what a given subsystem does and why. That metric doesn't scale upward under AI adoption pressure. That's exactly why it's the right one.

Agent contribution percentage will keep going up. The teams that don't confuse it for a success metric will be better positioned when the debt comes due.