Metrics that Resist Gaming

The other day I described CRAP, which is a metric that flags code that is risky to change. I suggested that if you need to change that code, you start with tests and refactoring.

Tests and refactoring are good for the codebase and improve the readability of the PR they are in, which is why I like CRAP as a metric: the easiest way to improve the score is to do work that is worth doing anyway, so it’s hard to game.
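To make that concrete, here is a minimal sketch of the score, assuming the commonly cited CRAP formula (cyclomatic complexity squared, times the uncovered fraction cubed, plus complexity). The function name and the example numbers are hypothetical, not taken from any particular tool.

```python
def crap_score(complexity: int, coverage_pct: float) -> float:
    """Commonly cited CRAP formula: comp^2 * (1 - cov/100)^3 + comp.

    complexity   -- cyclomatic complexity of the method
    coverage_pct -- test coverage of the method, 0 to 100
    """
    uncovered = 1.0 - coverage_pct / 100.0
    return complexity ** 2 * uncovered ** 3 + complexity


# Hypothetical method: fairly complex and completely untested.
untested = crap_score(complexity=12, coverage_pct=0)

# Adding tests lowers the score without touching the code.
tested = crap_score(complexity=12, coverage_pct=80)

# Refactoring (lower complexity) lowers it further.
refactored = crap_score(complexity=5, coverage_pct=80)

print(untested, round(tested, 3), round(refactored, 3))
```

The only inputs are complexity and coverage, so the only ways to move the number are to write tests or to simplify the code.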

In contrast, story points (and velocity, which is based on them) are the exact opposite.

Story points are made-up numbers with little accountability. Velocity is just points over time, so it’s also made up. If a team must improve velocity (for their manager), the easiest thing to do is to overestimate the points per task, and like magic, velocity can meet any target. I don’t think engineers would do this consciously, but it is a known phenomenon with metrics (see Goodhart’s Law).
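The arithmetic is easy to sketch. The following is an illustration only, with hypothetical function names and made-up estimates: the same five tasks get done in the same two sprints, and only the estimates change.

```python
def velocity(points_per_task, sprints):
    """Velocity is just total points divided by elapsed sprints."""
    return sum(points_per_task) / sprints


def inflate_to_target(points_per_task, sprints, target_velocity):
    """Scale every estimate by whatever factor hits the target.

    No extra work is done; only the numbers change.
    """
    factor = target_velocity / velocity(points_per_task, sprints)
    return [p * factor for p in points_per_task]


# Hypothetical estimates for five tasks finished over two sprints.
honest = [3, 5, 2, 8, 2]
inflated = inflate_to_target(honest, sprints=2, target_velocity=15)

print(velocity(honest, 2))    # velocity with the original estimates
print(velocity(inflated, 2))  # same tasks, same time, "better" velocity
```

Any target is reachable this way, which is the problem: the metric moves without the underlying work changing at all.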

This is one of the reasons I don’t use story points. But to be honest, almost any estimation technique is ripe for gaming (à la Scotty from Star Trek).

When I wrote about DevEx, a new developer productivity methodology, I wrote that I thought its metrics “do help engineers deliver software better and faster”, but that they are most useful to the team itself (not stakeholders).

Looking over that article, I realized that the thing I like about these metrics is that they are hard to game. If I get several 3-hour blocks of uninterrupted coding time per week, then I am sure I can write more and better code than if I didn’t. Counting lines of code (and judging their quality) is fraught, but hours of uninterrupted time are easy to count, and more is better.
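As a rough sketch of how easy the counting is, assuming interruptions are available as (start, end) pairs from a calendar: the function name, the 3-hour threshold, and the sample day below are all hypothetical, and a long gap is counted as a single block rather than several.

```python
from datetime import datetime, timedelta


def count_focus_blocks(day_start, day_end, interruptions,
                       min_block=timedelta(hours=3)):
    """Count gaps between interruptions that last at least min_block.

    interruptions is a list of (start, end) datetime pairs; overlapping
    entries are handled by tracking the latest end time seen so far.
    """
    blocks = 0
    cursor = day_start  # end of the most recent interruption
    for start, end in sorted(interruptions):
        gap_end = min(start, day_end)  # ignore time past the end of day
        if gap_end - cursor >= min_block:
            blocks += 1
        cursor = max(cursor, end)
    # Time left after the last interruption.
    if day_end - cursor >= min_block:
        blocks += 1
    return blocks


# Hypothetical day: a short standup and an afternoon review.
day = datetime(2024, 1, 8)
work_start = day.replace(hour=9)
work_end = day.replace(hour=17)
meetings = [
    (day.replace(hour=9, minute=30), day.replace(hour=9, minute=45)),
    (day.replace(hour=14), day.replace(hour=15)),
]

print(count_focus_blocks(work_start, work_end, meetings))
```

Summing this over a week gives the number of 3-hour blocks, and the only ways to raise it are to move or remove interruptions.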

If you are worried that coders will shirk other duties (like code reviews or attending meetings), there is another metric that measures feedback loops, which is in tension with the flow metric.

My main critique of DevEx still stands: it’s not something to report outside of the team. But the more I think about it, the more I like it, and I will try implementing it on my (1-person) team.