Category Archives: Code Reviews

Articles about code reviews

Question on r/ExperiencedDevs: Getting Code Reviewed Faster

I saw this question on the ExperiencedDevs subreddit today: My colleague’s code gets reviewed no questions asked. Getting mine reviewed takes a couple of nudges. How can I improve the situation? The answers on this subreddit are usually helpful, and I saw one that resonated with me:

Personally quick PRs, sub 15 minutes total review time, I consider to be a mini-break from whatever feature I’m working on over the span of multiple days. If your PRs are 1000s of lines long and require an entirely different headspace it may cause delays that way.

This is what I was getting at in PR Authors Have a lot of Control on PR Idle Time, where I told a story about the analysis a colleague at Atlassian did on our PR data. He found that short PRs had less idle time.

On my team, where short PRs were highly encouraged, most reviews were done in a couple of hours, with 24 hours being the absolute max. Any longer than that would often be resolved by a live 1:1 review because it meant that the code was too complex to be reviewed asynchronously.

The thread on r/ExperiencedDevs has some more advice on resolving the social aspects, which is helpful, but I do think PRs that are easier to review will sit for less time. If your code is inherently complex, there is still a lot you can do to make the review easy (see A Good Pull Request Convinces You That it is Correct).

A Good Pull Request Convinces You That it is Correct

My onboarding peer-mentor at Trello described a good pull request as telling a story. In practice this meant that you would edit and order the commits after you were done so that the reviewer could go commit-by-commit and understand the change you made in steps.

So, for example, let’s say you are working on a feature. In your second commit, you introduce a bug, but then in your fifth commit, you find and fix that bug. Instead of just PR’ing that, you would edit the commits so that the bug was never introduced.

This is analogous to sending a document with superfluous text deleted, not crossed out. If you don’t edit the commits, you will waste the reviewer’s time because they might see the error in the second commit, make a comment, and then have to go back and amend their comment in the fifth. If you did this a lot, they might not even finish reviews before rejecting them (which is what I suggest you do to PRs with obvious problems).

I like the story frame, but I have started to think of a PR as more of an argument for its own correctness. I am trying to teach the reviewer the details of the problem and convince them through evidence that the new code is correct.

In a PR, I might start by introducing a unit test into the area I intend to change. Then, to make things clearer, I might commit a small refactoring (that isolates the change). It’s now possible to add more tests and possibly a failing one that shows what my intended fix will address. My small code clean-up commits are in service of that argument. Without them, it’s hard to tell whether my fix will break something else. With them, the fix feels like a natural and inevitable conclusion.
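
To make that concrete, here is a minimal sketch of the kind of tests that carry the argument. The parse_duration() helper, its bug, and all of the names are invented for illustration; the point is the commit-by-commit structure, not the specifics:

    # A stand-in for the (hypothetical) code under change, included only so this
    # sketch is self-contained; in a real PR it lives in the production code.
    def parse_duration(text: str) -> int:
        value, unit = int(text[:-1]), text[-1]
        return value * 60 if unit == "m" else value

    # Commit 1: a characterization test that pins down the existing, correct
    # behavior of the area I am about to change.
    def test_parse_duration_minutes():
        assert parse_duration("5m") == 300

    # Commit 3 (after a small refactoring commit): a failing test that states
    # what my intended fix will address: hours are not handled yet.
    def test_parse_duration_hours():
        assert parse_duration("2h") == 7200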

Like a philosophical argument, I will anticipate and address the cases the reviewer might think of. But it’s not enough to handle a case in the code; your whole PR needs to make it clear that you anticipated and addressed it (with tests, comments, screenshots, or any other evidence you can think of).

But the most important reviewer to convince is myself, of course, and doing the work to write the argument gives me confidence that my code is correct.

How to Find Bugs in a Code Review

I have written that the primary reason to review code is knowledge sharing, not finding bugs, and that the way to use code reviews to prevent bugs is to reject PRs that aren’t obviously correct. But what should you do if you really want to try to find bugs during review?

Run the code—preferably in a debugger from a test.

The best tip I got from Steve Maguire’s Writing Solid Code [affiliate link] was to step through my code in a debugger as I was writing it. The original version of this book (1993) came out before xUnit-based unit testing took off, so I don’t remember him mentioning unit tests. But I think a unit test is a better way to do this.

If you don’t have a test that puts you at the line of code you want to inspect, then driving the code manually while stepping through it in a debugger is a good-enough option. Of course, not having appropriate tests is probably reason enough to send the PR back, but if that’s not your team’s culture, and you want to find the problems manually during review, at least use a debugger.
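
In a Python codebase, for example, this can be as simple as dropping breakpoint() into a quick test and stepping from there. In this sketch, apply_discount() is a hypothetical stand-in for the code under review; in a real review you would import the changed code from the PR branch instead:

    # Hypothetical stand-in, defined inline so the sketch runs on its own.
    def apply_discount(price: float, percent: float) -> float:
        return price * (1 - percent / 100)

    def test_step_through_apply_discount():
        breakpoint()  # drops into pdb; step with `n`, inspect with `p price`
        price = apply_discount(100.0, 25)
        assert price == 75.0

Run it with pytest -s to land in the debugger; pytest --pdb (stop on failures) and pytest --trace (stop at the start of each test) are alternatives if you’d rather not edit the test.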

It’s more likely you’ll see problems if you are actively engaging with running code. You will be anticipating what is going to happen as you step through, and if the code behaves differently, you’ll notice. You’ll either find a problem or learn something.

How to use Code Reviews to Prevent Bugs from Getting Into Production

When code has a bug in it, that is way more of an author problem than a reviewer problem. I don’t think it’s right to expect reviewers to find bugs (or to rely on it). It’s way harder to find bugs unless you actually run the code.

So, I would suggest that to fix the problem of buggy code making it into production, you do more things at the author level, since the author should be running the code a lot through tests and in the debugger. If you have a code review checklist, give it to the author, who should be reviewing their own code before PR’ing it.

Then, the reviewer should be verifying that there is evidence that the author actually did thorough work. The review checklist should verify that the author provided evidence of doing the pre-PR checklist. Check the tests, check their coverage, check UI screenshots, check that the code is doing “something” with edge cases and that those cases are tested. Instead of looking for bugs, the reviewer should be learning the code and looking for higher-level issues.

The author should be providing code that is correct and easy to see that it’s correct. If the reviewer doesn’t think it’s obviously correct, they should reject the PR. The presence of such a review will make the author do more thorough work.

If you want a code review to go fast, do it yourself

I am the only developer in my codebase right now, so I do all of my own code reviews. I post a PR, and then go review it. It goes quickly—I rarely have to wait.

But this is not what I’m talking about.

If you post a PR, and the reviewer finds a lot of problems, that’s not because they are a good reviewer. If QA finds obvious bugs, we know it’s because the work is bad. This is the same thing. Code reviewers and QA should not be finding obvious problems because there shouldn’t be any.

One reason this happens is that we feel pressured to get work done quickly, and so we post PRs that are not finished. The code can pass a simple review, but perhaps it doesn’t deal with edge cases, does things in a sloppy way with no unit tests, or is convoluted in a way that works but is hard to read. Maybe the code is AI-generated and PR’d after a cursory check.

The problem in each of these cases is not that AI generated the code, that the code was convoluted, or that tests weren’t there. It’s that the author didn’t review their own code and fix these issues before opening the PR.

So, the author got their task done quickly, but now the code review eats up all of that saved time. It also obscures the problems that could have been found if the reviewer hadn’t gotten tired of making obvious comments. The author ends up spending more time, the reviewer is wasting all of their time, and all of this causes a big gap between calendar time and implementation time, which is a driver of incorrect estimates and low developer productivity.

The low productivity problem is insidious. It feels like we’re coding as fast as we can, but the unnecessary wait time builds up and makes projects late. Those wait times are rarely taken into account in an estimate—nor should they be. Instead of accounting for them, you should be eliminating them.

So, don’t post PRs of unfinished work that is going to cause the PR to be sent back (or worse, cause the ticket to be reopened). The best way to do this is to do your own code review first, fix all of the problems you find, fix up the commits, and then post the PR for a real review. By that time, the code should be correct and easy to see that it’s correct. That means:

  1. You ran all of the automated checkers/formatters already and they all pass (maybe even ones that are not in your CI). A sketch of a local pre-PR script follows this list.
  2. You tested your changes (see diff-cover to filter code coverage reports to your branch’s changes).
  3. You tested edge cases.
  4. If you got to a solution with convoluted code (which is totally fine), you did a refactoring pass to make it easier to understand.
  5. If you generated code with AI, you did your own editing pass on it to fix obvious problems (like distracting comments).
  6. You did an AI-assisted code review of your own code and thought about each issue critically. Collect examples of your past reviews that got comments that should have been avoided and use them to prompt your AI reviews (to avoid repeating mistakes).
  7. The PR commits are created carefully to be easy to review one-at-a-time. So, whatever problems you found are not fixed in added-on commits, but squashed to make it look like the code was written correctly to begin with. Don’t make a reviewer read code in one commit that is fixed or removed in a subsequent one.
  8. If you know that there are things that are unfinished in this PR, you explained that in your PR description, along with when they will be done.
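
Here is a rough sketch of a local pre-PR script for the first few items. It assumes a Python project that uses ruff, pytest with the pytest-cov plugin, and diff-cover, and that PRs branch off origin/main; substitute your own toolchain and thresholds:

    #!/usr/bin/env python3
    """Run local pre-PR checks before opening the pull request."""
    import subprocess
    import sys

    CHECKS = [
        ["ruff", "check", "."],                   # item 1: linters/checkers
        ["pytest", "--cov", "--cov-report=xml"],  # item 2: run the tests with coverage
        ["diff-cover", "coverage.xml",            # item 2: coverage of the changed lines only
         "--compare-branch=origin/main", "--fail-under=90"],
    ]

    def main() -> int:
        for cmd in CHECKS:
            print("running:", " ".join(cmd))
            if subprocess.run(cmd).returncode != 0:
                print("pre-PR check failed:", cmd[0])
                return 1
        print("all pre-PR checks passed")
        return 0

    if __name__ == "__main__":
        sys.exit(main())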

Do this for your own enjoyment at work. When you find your own problems, it makes you feel good about yourself and boosts your self-esteem. When someone else points out your flaws, it doesn’t.

Use AI to Code Review Yourself, Not Others

In What to do about excessive (good) PR comments, I wrote that a PR with a lot of good, actionable comments might be a sign that the author didn’t do enough beforehand to post an acceptable PR. It depends on many factors. For example, a new hire or junior developer might still need training, and so you would expect their PRs to need some work.

But, there are some things that should never make it to review. We have linters and automatic code formatters, so we should never be correcting indentation in a review. If you use Rust, you are never finding a dangling pointer bug. There are automated tools for finding accessibility issues and security issues—which aren’t perfect, but they are probably better (and definitely faster) than most humans. All of these tools are somewhat mechanical, though, built on parser technology. We can do “better” with AI.

AI writes code good enough to be a starting point, and it can also review code. Since I think that the point of code reviews is information sharing, we wouldn’t want to replace them with AI. But AI is the perfect thing to use on yourself to make sure you don’t open PRs with obvious problems that slow down the review.

Why even do code reviews?

For most of my career, code reviews were rare. They weren’t directly supported by the most common source control systems I used (RCS, SourceSafe, CVS, SVN, and early TFS), and that didn’t change until git. Branching was hard to impossible, so we all just committed to the main branch. As a lead, I would read a lot of diffs after the fact, but not all of them, and I rarely sent comments.

I did, however, adopt unit tests and CI relatively early—in 2000, right after I read eXtreme Programming Explained [affiliate link]. So every change was unit tested soon after it was committed. We didn’t do code reviews, but we did a reasonable amount of pair programming, which had similar effects. Frankly, even after adopting code reviews for every PR a decade later (and pairing less), I can’t say that the software I worked on had noticeably fewer bugs. Seems about the same.

What code reviews did do was make sure that more people than the author knew about the changes that were happening. Code reviews were far more efficient than pairing, and could be done asynchronously. I did code reviews nearly every day I worked on Trello, and I never worked on a team codebase I felt more generally comfortable with.

I think the defect-finding part of code reviews can be done by AI or other automated tools better than most humans—certainly faster. It’s not easy to see bugs just by reading code unless they are egregious, and the automated tools are great at finding the egregious errors. Type systems and unit tests are also better at this than most programmers.

But, the information sharing part—that’s the real reason I think code reviews are important. I recently wrote that legacy code is code we didn’t write or review, so we shouldn’t let automation take over this part yet.

How to make it harder to game PR metrics

This is part of a series on shortening PR code review time. In the first post, I recommended gathering metrics (see If code reviews take too long, do this first). If you base a goal on these metrics, it’s possible that the metric will be gamed. This is not because of malice—it’s a well-known phenomenon (Goodhart’s Law), which recognizes that it’s human behavior to make a metric improve if leadership says that it has to.

So, if you set a goal to lower the total time from PR creation to merge, then that time will probably shorten. You are hoping that this happens the right way, by reducing bad comments and pre-checking PRs. But it’s inevitable that some improvement in this metric will be caused by behavior that you don’t want. For example, some bugs that could have been caught in review will make it to QA and production. This is the tradeoff if you want faster reviews—slow reviews are expensive too, and they are only worth it for expensive defects. Even if we agree that it’s cheaper to find problems in review than in QA or production, there is still a cost below which we’re fine with a problem being found later (or never). But, if expensive issues are being found in production, it’s worth checking whether that’s because reviews are not thorough enough.

One way to make metrics less likely to be gamed is to pair them with a compensating metric that measures the effect of the gaming. So, in a call center, you might measure the time spent on a call, which you would want to decrease. To make sure that customer problems are still being solved, you might pair that with customer satisfaction scores or the number of times a customer has to call back. You want the pair of metrics you choose to both improve, not for one to improve at the expense of the other.

This is why I recommend pairing test coverage with some other metric that makes sure you get coverage of the most important or most risky code. For code review time, you might pair it with the DORA metric of “Change Fail Percentage” (the percent of deploys with a bug requiring a hotfix or rollback). The pairing of metrics should imply a set of behaviors that can improve them both.

You could also go a step further and track the behaviors (or lead indicators) and not just the lagging indicators. A good lead indicator is a behavior, not an outcome, so it’s harder to game. For example, to make it more likely that a PR has a low idle time and is easier to review, it should be small. We don’t want to get into line counts, so a high-level indicator of “excellent”, “acceptable”, or “unacceptable” PR size, based on a fuzzy combination of the number of commits, changeset size, and maximum commit size, could be enough. Just giving this indicator back to the author before they PR (with a local script) would encourage them to break down their PRs. If leads see undesirable behavior sneaking through, they could augment the script or override its assessment and reject the PR anyway.
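
Here is a rough sketch of what that local script could look like. The origin/main base branch and the thresholds are assumptions for illustration; tune them to your team’s norms:

    #!/usr/bin/env python3
    """Give the author a rough PR-size rating before they open the PR."""
    import subprocess

    BASE = "origin/main"  # assumption: PRs branch off origin/main

    def git(*args: str) -> str:
        return subprocess.run(
            ["git", *args], capture_output=True, text=True, check=True
        ).stdout

    def changed_lines(rev_range: str) -> int:
        # --numstat prints "added<TAB>deleted<TAB>path" per file; sum both counts.
        total = 0
        for line in git("diff", "--numstat", rev_range).splitlines():
            added, deleted, _path = line.split("\t", 2)
            if added != "-":  # binary files show "-" for the counts
                total += int(added) + int(deleted)
        return total

    def rate_pr() -> str:
        # Per-commit sizes assume no merge or root commits on the branch.
        commits = git("rev-list", f"{BASE}..HEAD").split()
        total = changed_lines(f"{BASE}...HEAD")
        largest = max((changed_lines(f"{sha}~1..{sha}") for sha in commits), default=0)
        # Hypothetical thresholds for the fuzzy excellent/acceptable/unacceptable rating.
        if total <= 200 and largest <= 100 and len(commits) <= 8:
            return "excellent"
        if total <= 600 and largest <= 300:
            return "acceptable"
        return "unacceptable"

    if __name__ == "__main__":
        print("PR size:", rate_pr())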

Paired, complementary metrics are harder to game because the combination limits behaviors to those that can improve both metrics. A leading indicator is harder to game because it’s happening in real-time and you can react to it. So use these techniques to make sure your metrics are showing improvement in the underlying thing you care about.

How long PR code reviews lower developer productivity

I’ve been writing about code reviews for the past week because I think that shortening review cycles is one of the easiest ways to increase developer productivity. But I’ve taken that as a given so far because I’ve seen it on my own teams. A few years ago, I saw that the relationship between productivity and things like code review time has been supported by research, and I continue to see it in my consulting practice.

At Trello, I worked on the iOS app. We were a team of ten engineers who worked closely with the ten engineers building the Android app. We shared managers, PMs, designers, roadmaps, and thought of ourselves as the mobile team. We had shared processes, but since we were somewhat insulated from the other teams, with less intermingling, our processes were different from the rest of Trello.

At some point, a (non-mobile) engineer decided to gather metrics across all teams to see what he could learn about code reviews (see If code reviews take too long, do this first). He plotted various statistics and presented them for discussion. In slide after slide, the two mobile teams were outliers on nearly every statistic. The four metrics that stood out were: the size of the PRs, the number of PRs, PR idle time, and the total time to approval. Both mobile teams had the smallest and the most numerous PRs (normalized for team size). We also had the shortest times, which I think is a consequence of the first two metrics.

I can tell you from being on the mobile team at that time that this was intentional and built into the mobile team culture. I was trained on our PR culture on my first day, when the entire mobile team was about five engineers, and we kept passing that on to new team members (see Construct PRs to Make Reviewing Easy). The short feedback loops were enabled by our careful PR construction and pre-checks, which I documented in PR Authors Have a lot of Control on PR Idle Time. When we dug deeper into the code review data, we found that the most senior iOS engineer and the most senior Android engineer were modeling that behavior (short and frequent PRs), which set a tone for the teams.

The shorter loops also contributed to my personal happiness at work and feeling of productivity. It was only after I left the team that I saw DevEx’s identification of feedback loops as a key driver of productivity. In the three years since I left, I’ve been seeing code review feedback loops as the easiest loop to shorten. It also has a large and immediate impact because it affects every engineer nearly every day.

According to DevEx, short feedback loops help by reducing context switching and enabling rapid course correction. I have seen that they also align estimated time with calendar time, which is the main source of the discrepancy in the feeling of productivity between engineering and everyone else (see Confession: I Estimate Using Time).

Some of the effect of shortening loops improves productivity without requiring us to “work faster”, but the effect on context switching is different. Reducing context switching does make us work faster. If we have to wait days for a PR to be approved, then it takes longer to get started on the next step. We might have had to work on something unrelated in the meantime. The switch back to this PR (which might require responding to review comments) causes a delay whose length is determined by how long we waited. Having shorter code review loops means that we probably still have enough of the context in our heads and can respond faster.

So, reducing the wait time for a code review to start (PR idle time) and the total time it takes to approve a PR reduces both calendar time and context switches, which in turn reduces the time it takes to respond to a code review and to move on when the PR is part of a larger project.

If you still have nitpick PR comments

This is part of a series on reducing PR code review time, which is important because long feedback loops are one of the drivers of low developer productivity. Here are the other articles in the series, each with a TL;DR to remind you of the key takeaway; the details of how to do it are in the articles.

  1. If code reviews take too long, do this first (TL;DR: gather data)
  2. What to do about nitpick comments in PRs (TL;DR: eliminate them)
  3. What to do about excessive (good) PR comments (TL;DR: make better PRs)
  4. Not all PR nitpick comments are bad (TL;DR: sometimes you need or want them)

If you are still getting nitpick comments after you have done what you can to eliminate them, try adding a checklist for comments. Here’s an example checklist to start from:

  1. Is it a bug?
  2. Does the code violate the specification?
  3. Does the code violate our design system?
  4. Is the code reimplementing something where it should reuse our library code?
  5. Does the code not meet our style guide?
  6. Does the code not meet our testing code coverage requirement?
  7. Is the code not consistent with the vast majority of our codebase (especially recently approved code)?
  8. Does the suggestion have an obvious, objective way to resolve it?

If a suggestion doesn’t meet your criteria for a good comment, then try one of these techniques to improve it:

  1. Reframe the nitpick suggestion as an (objective) defect with a clear way to resolve it. So, for example, if you “don’t like” a variable name, give a reason why you don’t like it. Some examples that are actually defects: (a) the variable is two letters and you have a standard that requires four or more, (b) the variable doesn’t match your naming convention (for example, you always use the word “count” instead of “length”), (c) you don’t understand the variable name, in which case suggest better names and explain how the name is confusing you, which is more objective than “not liking” it.
  2. Reframe the suggestion as the beginning of a process to update coding standards. When new non-defect, but possibly good, comments are warranted, they should start as suggestions that might change the coding standard. In this case, you are going to move the discussion to whatever process you use to change standards (e.g., a retro).
  3. Make the comment if you are in a situation that warrants it anyway: see Not all PR nitpick comments are bad.

If the suggestion is still a nitpick that you can’t improve, and it isn’t warranted anyway, don’t make it.