Code Coverage by itself is a hard metric to use because it is easy to game, which makes it especially susceptible to Goodhart’s Law, often summarized as “When a measure becomes a target, it ceases to be a good measure.” Goodhart’s Law observes that if you put pressure on people to hit a target, they will hit it, but maybe not in the way you wanted.
That is exactly what happens with code coverage: we can always increase coverage with useless tests, tests of trivial functions, or tests of less valuable code.
To make coverage harder to game, I use it in combination with these metrics:
- Code Complexity: The simplest way to measure this is to count the branches in a function (there’s a sketch of what I mean after this list). I use extensions in my code editor to bring complex code to my attention. If coverage of a complex function is also low, I know that I can make the code less risky to change by testing it (or refactoring it).
- Usage analytics: If you tag your user analytics with the folder of the code that generates each event (sketched below), you can later build reports that tie back to your coverage reports. See Use Heatmaps for iOS Beta Test Coverage. In that post, I used the heatmaps to direct manual testing, but the same idea works for code coverage.
- Recency of the code: To make sure that my PRs have high coverage, I use diff_cover (example invocation below). This makes it more likely that my tests find bugs in code that is about to be QA’d and has already been deemed valuable enough to write. Very old code is more likely to be working fine, so adding tests to it might not be worth it. If you do find a bug in old code that’s worth fixing, the fix will come in as a PR (and that code becomes recent again).
- Mutations: Mutation testing checks the quality of your assertions, not just your coverage, by making small deliberate breaks in the code and seeing whether any test fails. I am still trying to find a good tool for this, so for now I do it manually (see the last example below).
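Here’s roughly what I mean by counting branches. The types and function are invented just to make the example compile; the point is that every branch is a path that can go wrong untested.

```swift
import Foundation

// Hypothetical types, just enough to make the example compile.
struct Item { let price: Decimal; let isOnSale: Bool }
struct Order { let total: Decimal; let items: [Item] }

// Counting branches: start at 1 for the function itself, then add 1 for
// each guard, loop, and if. This function scores 4, so if its coverage
// is also low, it goes to the top of my list to test or refactor.
func discount(for order: Order) -> Decimal {
    guard order.total > 0 else { return 0 }   // +1
    var discount: Decimal = 0
    for item in order.items {                 // +1
        if item.isOnSale {                    // +1
            discount += item.price / 10
        }
    }
    return min(discount, order.total)
}
```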
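Here’s a minimal sketch of tagging analytics with the code that generated them, assuming a made-up `Analytics` wrapper around whatever SDK you actually use:

```swift
import Foundation

// `Analytics` and `track` are invented names; the print is a stand-in
// for your real analytics SDK call. #fileID expands at the call site to
// "ModuleName/FileName.swift", so every event carries its module and
// file for free, and you can roll events up by folder and compare the
// totals against per-folder coverage.
enum Analytics {
    static func track(_ event: String,
                      properties: [String: String] = [:],
                      fileID: String = #fileID) {
        var props = properties
        props["source_file"] = fileID
        print("\(event): \(props)")
    }
}

// Call sites don't change at all:
// Analytics.track("checkout_started") tags itself with the caller's file.
```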
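For diff_cover, a typical run looks something like this (the report file, branch name, and threshold are placeholders for whatever your project uses):

```sh
# Produce a coverage XML report with your test runner first, then check
# coverage of only the lines changed relative to the main branch.
diff-cover coverage.xml --compare-branch=origin/main --fail-under=80
```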
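And here’s manual mutation testing in its smallest form, with an invented function and test: make a small deliberate break and confirm that at least one test fails.

```swift
import Foundation
import XCTest

// The function under test. A mutation is a small deliberate break made
// only to see whether any test notices. Here, try changing >= to >.
func isEligibleForFreeShipping(total: Decimal) -> Bool {
    return total >= 50
}

final class ShippingTests: XCTestCase {
    // This test pins the boundary, so the >= to > mutation makes it fail.
    // If it kept passing, the function would be covered but not really
    // asserted on, which is exactly what mutation testing exposes.
    func testFreeShippingBoundary() {
        XCTAssertTrue(isEligibleForFreeShipping(total: 50))
        XCTAssertFalse(isEligibleForFreeShipping(total: 49))
    }
}
```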
Generally, the way to make a metric harder to game is to combine it with another metric that would get worse if the first one were gamed in ways you can predict (or have already seen).