Most engineering metrics are vanity metrics. They feel productive to track. They look good in board presentations. And they tell you almost nothing about whether your engineering organization is actually healthy or effective.
I've seen teams obsess over lines of code, story points completed, and pull request counts while missing that their deployment pipeline was broken for weeks. I've watched organizations celebrate velocity improvements while customer satisfaction cratered because they were shipping the wrong things faster.
The good news: a decade of research has identified which metrics actually predict team health and business outcomes. The challenge is implementing them without falling into the traps that make measurement counterproductive.
The DORA Framework: Start Here
If you're going to measure anything, start with DORA metrics. Ten years of research across thousands of organizations has validated that these metrics correlate with both team performance and business outcomes.
The original four metrics are:
Deployment frequency: How often you ship code to production. Higher is generally better because it indicates your ability to deliver value incrementally and respond to feedback quickly.
Lead time for changes: The time from code commit to running in production. Shorter lead times mean faster feedback loops and less work-in-progress inventory.
Change failure rate: The percentage of deployments that cause failures requiring rollback or remediation. Lower is obviously better, but zero is suspicious because it might mean you're not shipping enough.
Mean time to recover (MTTR): How quickly you restore service after an incident. Fast recovery matters more than preventing all failures, which is impossible anyway.
DORA added a fifth metric in 2024: reliability, measuring how well your service meets its availability targets.
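To make these definitions concrete, here's a minimal sketch of how the four original metrics could be computed from basic deployment and incident records. The record shapes and field names (committed_at, deployed_at, caused_failure) are illustrative assumptions, not a standard schema; in practice you'd pull this data from your CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Deployment:
    committed_at: datetime    # first commit in the change
    deployed_at: datetime     # when it reached production
    caused_failure: bool      # required rollback or remediation


@dataclass
class Incident:
    started_at: datetime
    restored_at: datetime


def dora_metrics(deployments: list[Deployment], incidents: list[Incident], window_days: int) -> dict:
    """Compute the four original DORA metrics for one reporting window (assumes non-empty lists)."""
    deploys_per_week = len(deployments) / (window_days / 7)
    lead_time_hours = mean(
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deployments
    )
    change_failure_rate = sum(d.caused_failure for d in deployments) / len(deployments)
    mttr_hours = mean(
        (i.restored_at - i.started_at).total_seconds() / 3600 for i in incidents
    )
    return {
        "deploys_per_week": round(deploys_per_week, 1),
        "lead_time_hours": round(lead_time_hours, 1),
        "change_failure_rate": round(change_failure_rate, 2),
        "mttr_hours": round(mttr_hours, 1),
    }
```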
The key insight from DORA's research: speed and stability are not trade-offs. Top performers do well across all five metrics, and low performers struggle across all five. Source: DORA 2024 State of DevOps Report
Why These Metrics Work
DORA metrics work because they measure outcomes, not activities. They don't care how many story points you completed or how many lines of code you wrote. They care whether you're delivering working software to users effectively.
They're also resistant to gaming. Pad deployment frequency by rushing half-baked changes out the door and your change failure rate climbs. Slow everything down and batch work up to avoid failures and your lead time and deployment frequency suffer. The metrics balance each other.
Organizations that get metrics right see real results: 3-12% efficiency gains, 14% increases in R&D focus, and 15% improvements in developer engagement across 360+ organizations. Source: McKinsey Developer Productivity Research
The Vanity Metrics Trap
Now let's talk about what not to measure, or at least not to optimize for directly.
Lines of code
The classic bad metric. More code isn't better. Often less code is better. A developer who deletes a thousand lines of unnecessary complexity has probably created more value than one who wrote a thousand new lines.
Story points completed
Story points were designed for relative estimation within a team, not cross-team comparison or productivity measurement. Once you start treating velocity as a metric, teams inflate estimates to look more productive. The number goes up; actual output doesn't.
Pull request count
More PRs isn't necessarily better. A developer who ships one thoughtful PR might deliver more value than one who splits the same work into ten trivial PRs to pad the count.
Hours worked
This should be obvious, but apparently isn't. Time spent is not value delivered. Teams that work unsustainable hours burn out and make more mistakes. They don't ship better software.
Meeting attendance
Being in meetings isn't work. Often the opposite: excessive meetings prevent work. If you're measuring meeting participation, you're measuring the wrong thing.
Metrics That Complement DORA
DORA metrics are the foundation, but they don't tell the complete story. Here are additional metrics worth considering:
Rework rate
Change failure rate captures the big failures that require rollbacks. Rework rate captures the smaller failures: bugs found in review, defects caught in QA, issues reported post-launch that require fixes. If both metrics hold steady while you ship more often, your process is working.
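One rough way to track this, assuming you can tag each shipped change with whether it needed a follow-up fix at any stage (the field names below are hypothetical):

```python
from dataclasses import dataclass


@dataclass
class Change:
    id: str
    needed_fix_in_review: bool = False    # reviewer sent it back for defects
    needed_fix_in_qa: bool = False        # QA caught a defect before release
    needed_fix_post_launch: bool = False  # shipped, then required a follow-up fix


def rework_rate(changes: list[Change]) -> float:
    """Fraction of changes that needed rework at any stage."""
    reworked = sum(
        c.needed_fix_in_review or c.needed_fix_in_qa or c.needed_fix_post_launch
        for c in changes
    )
    return reworked / len(changes) if changes else 0.0
```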
Cycle time breakdown
Lead time is useful, but knowing where time goes within the delivery process is more actionable. How long does code sit in review? How long does QA take? Where are the bottlenecks? This breakdown tells you what to fix.
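Here's a sketch of what that breakdown might look like, assuming you can pull four timestamps per change from your version control and deployment tooling. The phase boundaries are one possible choice, not a standard:

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class ChangeTimeline:
    opened_at: datetime        # PR opened
    first_review_at: datetime  # first review activity
    approved_at: datetime      # final approval
    deployed_at: datetime      # running in production


def cycle_time_breakdown(timelines: list[ChangeTimeline]) -> dict[str, float]:
    """Median hours spent in each phase, so the bottleneck is visible."""
    def hours(pairs):
        return round(median((end - start).total_seconds() / 3600 for start, end in pairs), 1)

    return {
        "waiting_for_review": hours([(t.opened_at, t.first_review_at) for t in timelines]),
        "in_review":          hours([(t.first_review_at, t.approved_at) for t in timelines]),
        "approval_to_deploy": hours([(t.approved_at, t.deployed_at) for t in timelines]),
    }
```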
Developer experience metrics
Survey your engineers regularly about friction points, tool satisfaction, and perceived productivity. These subjective measures often surface problems before they show up in delivery metrics.
90% of organizations now have platform engineering capabilities, with a direct correlation between platform quality and organizational performance. Source: DORA 2024 State of DevOps Report
If your developers are fighting their tools, that will show up somewhere.
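A minimal sketch of how survey responses could be rolled up into something reviewable, assuming a simple 1-5 score per topic. The topic names and the 3.5 flag threshold are arbitrary examples:

```python
from collections import defaultdict
from statistics import mean


def devex_summary(responses: list[dict[str, int]], flag_below: float = 3.5) -> dict:
    """Average 1-5 scores per survey topic and flag likely friction points.

    responses: one dict per engineer, e.g. {"CI speed": 2, "local tooling": 4}
    flag_below: arbitrary threshold for "worth investigating"; tune to taste.
    """
    by_topic: dict[str, list[int]] = defaultdict(list)
    for response in responses:
        for topic, score in response.items():
            by_topic[topic].append(score)
    averages = {topic: round(mean(scores), 2) for topic, scores in by_topic.items()}
    return {
        "averages": averages,
        "friction_points": [topic for topic, avg in averages.items() if avg < flag_below],
    }
```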
Business outcome metrics
Engineering metrics should connect to business outcomes. Are the features you're shipping moving the metrics that matter to the business? High deployment frequency means nothing if you're deploying features nobody uses.
How to Implement Measurement Without Destroying Your Culture
Metrics go wrong when they become targets rather than indicators. Goodhart's Law states it clearly: when a measure becomes a target, it ceases to be a good measure. Here's how to avoid that trap.
Measure teams, not individuals
DORA metrics reflect team capabilities, not individual productivity. Comparing individuals using these metrics creates perverse incentives and destroys collaboration. Focus on team-level results, and compare them to the team's own historical trend rather than to other teams.
Never set metrics as goals directly
Telling teams "you must deploy multiple times per day by year end" is the fastest way to game behavior. Instead, use metrics to identify problems and track whether interventions work. The goal is better outcomes, not better numbers.
Look for patterns, not snapshots
A single week's deployment frequency tells you almost nothing. Trends over months tell you whether things are improving or degrading. Resist the urge to react to short-term fluctuations.
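For example, a simple way to read a trend rather than a snapshot is to compare rolling averages across two windows. The four-week window and the 10% noise band below are arbitrary choices, not recommendations:

```python
from statistics import mean


def deployment_trend(weekly_deploy_counts: list[int], window: int = 4) -> str:
    """Compare the last `window` weeks to the previous `window` weeks of deployment counts."""
    if len(weekly_deploy_counts) < 2 * window:
        return "not enough history yet - keep collecting"
    recent = mean(weekly_deploy_counts[-window:])
    previous = mean(weekly_deploy_counts[-2 * window:-window])
    change = (recent - previous) / previous if previous else 0.0
    if abs(change) < 0.10:  # within +/-10%: treat as noise, not a trend
        return "roughly flat"
    return "improving" if change > 0 else "degrading"
```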
Combine quantitative and qualitative data
Numbers without context are dangerous. If deployment frequency dropped, is that because the team was doing necessary maintenance, or because the pipeline broke? Metrics tell you something changed; they don't tell you why or whether it matters.
Be transparent about what you're measuring and why
Secret measurement breeds paranoia. If you're tracking metrics, tell your team what they are and what you're trying to learn. Involve them in interpreting the data. They often have context you don't.
The AI Complication
AI adoption improves throughput but often increases delivery instability. 90% of developers now use AI tools at work, yet organizational delivery metrics often stay flat or get worse. Source: DORA 2025 State of AI-assisted Software Development
Why? AI tools increase PR size significantly. More code means more review burden, more potential bugs, and longer time to merge. Teams using AI need to pay extra attention to change failure rate and rework rate to ensure that more output doesn't mean more problems.
The research also shows that working in small batches amplifies AI's positive effects on performance. If your team is using AI to generate larger chunks of code, consider whether smaller, more frequent changes would work better.
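One lightweight way to watch for this is to track PR size alongside your stability metrics. A sketch, with an arbitrary "large PR" threshold you would tune for your own codebase:

```python
from statistics import median


def review_batch_health(pr_sizes: list[int], large_threshold: int = 400) -> dict:
    """Summarize merged-PR size so growth in batch size is visible early.

    pr_sizes: lines changed per merged PR over the reporting period (assumes at least one).
    large_threshold: arbitrary cut-off for "large"; adjust per codebase.
    """
    large = [size for size in pr_sizes if size > large_threshold]
    return {
        "median_pr_size": median(pr_sizes),
        "share_of_large_prs": round(len(large) / len(pr_sizes), 2),
    }
```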
Getting Started
If you're not measuring anything today, start simple:
- Pick one DORA metric to track. Deployment frequency is usually the easiest to measure and has the most obvious improvement paths.
- Establish a baseline over 4-6 weeks before trying to improve anything. You need to know where you are before you can know if you're getting better (a minimal baseline-tracking sketch follows this list).
- Identify one bottleneck that's hurting the metric. Long code reviews? Slow CI pipeline? Manual deployment process?
- Run one experiment to address that bottleneck. Measure whether it helped.
- Add another metric once the first one is stable. Build your measurement practice gradually.
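As a minimal sketch of the baseline step, assuming all you have to start from is a list of production deployment timestamps:

```python
from collections import Counter
from datetime import datetime


def weekly_baseline(deploy_times: list[datetime]) -> dict[str, int]:
    """Count deployments per ISO week to establish a baseline before changing anything.

    Returns e.g. {"2025-W14": 6, "2025-W15": 9, ...}; four to six of these
    entries is enough to see where you are today.
    """
    counts = Counter(
        f"{d.isocalendar()[0]}-W{d.isocalendar()[1]:02d}" for d in deploy_times
    )
    return dict(sorted(counts.items()))
```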
Don't try to implement a comprehensive metrics dashboard on day one. That path leads to measurement theater: impressive-looking dashboards that nobody uses to make decisions.
The goal isn't to measure more. It's to measure enough to make better decisions and no more than that. Every metric you add has a cost: time to collect, time to review, risk of misinterpretation. Keep your metrics practice lean and focused on what actually helps you improve.
Key Takeaways
- Start with DORA metrics: deployment frequency, lead time, change failure rate, MTTR, and reliability. They're validated and balanced.
- Avoid vanity metrics like lines of code, story points, and PR counts. They don't predict outcomes and are easily gamed.
- Measure teams, not individuals. Compare against historical trends, not other teams. Never set metrics as direct goals.
- Start simple with one metric, establish a baseline, and add complexity only when you need it.