Category Archives: Software Development

Mastodon Debugging

Last week I wrote a post on Mastodon about a problem I was having and then while I was describing it, I came up with a way to move forward:

Trying to figure out the right way to deploy a web and mobile web client on the same server. They are both SPA’s so mostly just need their packages to get served. Stymied by not having an exact local deployment.

Should solve that first actually. Thanks mastodon teddy bear.

The last sentence is a reference to “Teddy Bear Debugging’. The idea is that before you ask for help, you go over to an anthropomorphized inanimate object and explain the problem to it. If your description is detailed, you often just solve it yourself. This apparently also works with Mastodon toots. I solved my problem later that day by building a faithful local deployment.

Then, I realized that a mastodon is an animal that there might be a plush version of, and when I searched, I found this: https://shop.joinmastodon.org/products/mastodon-plushie — Mastodon (the organization) sells a plush mastodon! At the time, it was sold out and for Europe only, but someone tooted at me when they went on sale in the US a few days later. I felt compelled to order one to support Mastodon, but also because it’s a great debugging tool.

Why even do code reviews?

For most of my career, code reviews were rare. They weren’t directly supported by the most common source controls systems I used (RCS, SourceSafe, CVS, SVN, and early TFS) until git. Branching was hard to impossible, so we just all committed to the main branch. As a lead, I would read a lot of diffs after the fact, but not all of them, and I rarely sent comments.

I did, however, adopt unit tests and CI relatively early—in 2000, right after I read eXtreme Programming Explained [affiliate link]. So every change was unit tested soon after it was committed. We didn’t do code reviews, but we did a reasonable amount of pair programming, which had similar effects. Frankly, even after adopting code reviews for every PR a decade later (and pairing less), I can’t say that software I worked on had noticeably fewer bugs. Seems about the same.

What code reviews did do was make sure that more people than the author knew about the changes that were happening. Code reviews were far more efficient than pairing, and could be done asynchronously. I did code reviews nearly every day I worked on Trello, and I never worked on a team codebase I felt more generally comfortable with.

I think the defect-finding part of code reviews can be done by AI or other automated tools better than most humans—certainly faster. It’s not easy to see bugs just by reading code unless it’s egregious, and the automated tools are great at finding the egregious errors. Type systems and unit tests are also better than most programmers.

But, the information sharing part—that’s the real reason I think code reviews are important. I recently wrote that legacy code is code we didn’t write or review, so we shouldn’t let automation take over this part yet.

How we did CI in 2000

Right after I read eXtreme Programming: Explained [affiliate link], we built a CI system at Droplets (where I worked). CruiseControl didn’t exist yet, so anyone trying to do this just figured it out for themselves. Our system was based on a physical build token.

All of our desktop machines were named after movies—the CI Build Machine was named Goodfellas. I printed out an 5×7 poster of the movie and put the manual instructions for doing CI on the back. Then, I laminated it. You needed to have this card on your desk if you wanted to commit code.

Here is what it said on the back of the card:

You must have this card on your desk to proceed
Commit your code
Go over to Goodfellas (don’t take the card)
Get your changes
Run the build and test scripts
If it’s broken, fix it
If you know someone was waiting for this card, give it to them
Otherwise, put the card back where you found it

The build and test were automated, but you did need to manually grab the card.

Doing this solved our problem of broken builds caused by “it worked on my machine” problems. Seems antiquated now, but before easy branching and merging, this one-at-a-time verification caught a lot of problems early.

Great Works of Software (Poetry Edition)

Three years ago, I identified five pieces of software that I thought were “great works” that were written by one or a very few number of people. In rough chronological order, they are C/Unix, The ARM instruction set, TeX, VisiCalc, and the Web Browser/Server. I didn’t pick them based on what the software did, but they are all basically programming languages. They also are somewhat old (> 30 years), yet still exist or have a lineage of derivative projects (i.e. they were not lost).

Since then, I thought of two others. They are also programming languages of a sort and both are enabled by the existence of the web. Both are in heavy use, albeit mostly through derivatives. They are also both relatively tiny programs—more like poetry than prose.

The first is Markdown by John Gruber—you can download the original Perl at that link. Markdown is ubiquitous and finished, in the sense that it hasn’t been updated, and it doesn’t need updates.

Even the derivatives don’t need updates. In 2016 I wrote the Swift implementation of Trello Flavored Markdown by porting our web client’s JavaScript version (so it would match exactly). It was a little longer than Gruber’s original, but not by much. Once it was done, it only had one update that I know of.

The second software poem is the original Wiki software by Ward Cunningham. I don’t know if the original source is available, but I used this Perl implementation he published when I wanted a locally hosted version—it was my original second brain. I use Obsidian for this now, which (of course) is based on Markdown.

It’s funny that both Gruber and Cunningham chose Perl to write their poems. Perl syntax lends itself to writing actual poems—I have a vague memory of seeing some in Larry Wall’s Perl book. I would also note that both poems have minimal dependencies. I don’t think Markdown uses anything that isn’t in Perl. WikiWikiWeb uses Berkeley DB by default, but I trivially replaced that with simple files in my version (which I can’t find, but I do have the files, and they are just text).

They were also built in a time when it was common to build web server software in Perl (Markdown is 2004, WikiWikiWeb is 1995). I learned Perl in 1993 at my first job, which we used to replace all of our shell and awk scripts. I built production websites with Perl until 1999, but Perl was still my go-to language for scripting until 2013 when I switched to Python for web backends, scripting, and anything where Pandas would help. I’m sure Perl has equivalents for everything I do, but I never learned modern Perl.

My attempt at a poem of this nature is Page-o-Mat, a YAML-based language for making journals. It’s small, but I need YAML and PDF dependencies to make that happen. That fact about modern programming makes me think we won’t see another great software poem again.

How to Reduce Legacy code

There are two books that I read that try to define legacy code: Working Effectively with Legacy Code [affiliate link] by Michael Feathers and Software Design X-Rays [affiliate link] by Adam Tornhill. Briefly, Feathers says it’s code without tests and Tornhill adds that it’s code we didn’t write. In both cases, there’s an assumption that the code is also low quality.

To deal with the first problem, you should of course, add tests, and Feathers gives plenty of tips for how to do that. I would add that you should use a tool like diff-cover to make that happen more often. Diff-cover takes your unit test code coverage reports and filters them down so that you can see the code changed in the current branch that isn’t covered. Often, that code is inside of other functions, so covering your code will reduce the legacy code you touched.

This makes sense because once you have changed code, you really can’t say that you didn’t write it. Tornhill’s book describes software he sells that identifies legacy code using the repository history. It highlights code that no current engineer has touched. Your edit would remove that code from the list.

But, the author of the PR isn’t the only person who should be considered to know this code—the code reviewer should be counted too. If they have never seen this part of the codebase before, they should learn it enough to review the code. This is why I don’t like the idea of replacing human reviewers with AI. It has a role in reviews, but you can’t increase the number of developers that know a portion of a codebase by letting AI read it for you.

Every part of the codebase that is edited or reviewed by a new person reduces its “legacyness”. So does every test. If your gut tells you an area of the code is legacy, check the repo history and coverage. If you have a budget for dealing with engineering-led initiatives, assign a dev who hasn’t made edits to that area to write some tests.

How to make it harder to game PR metrics

This is part of a series on shortening PR code review time. In the first post, I recommended gathering metrics (see If code reviews take too long, do this first). If you base a goal on these metrics, it’s possible that that metric will be gamed. This is not because of malice—it’s a well-known phenomenon (Goodhart’s Law), which just recognizes that it’s human behavior to make a metric improve if leadership says that it has to.

So, if you set a goal to lower the total time from PR creation to merge, then that time will probably shorten. You are hoping that we do this the right way by reducing bad comments and pre-checking PRs. But it’s inevitable that some improvement in this metric will be caused by behavior that you don’t want. For example, some bugs that could have been caught in review will make it to QA and production. This is the tradeoff if you want faster reviews—slow reviews are expensive too, and they are only worth it for expensive defects. Even if we agree that it’s cheaper to find problems in review than QA or production, there is still a cost where we’re fine with it being found in later (or never). But, if expensive issues are being found in production, it’s worth checking if that‘s because reviews are not thorough enough.

One way to make metrics less likely to be gamed is to pair them with a compensating metric that measures the effect of the gaming. So, in a call center, you might measure the time spent on a call, which you would want to decrease. To make sure that customer problems are still being solved, you might pair that with customer satisfaction scores or the number of times a customer has to call back. You want the pair of metrics you choose to both improve, not for one to improve at the expense of the other.

This is why I recommend pairing testing code coverage with some other metric that makes sure you get coverage of the most important or most risky code. For code review time, you might pair that with the DORA metric of “Change Fail Percentage” (percent of deploys with a bug requiring a hotfix or rollback). The pairing of metrics should imply a set of behaviors that can improve them both.

You could also go a step further and track the behaviors (or lead indicators) and not just the lagging indicators. A good lead indicator is a behavior, not an outcome, so it’s harder to game. For example, to make it more likely that a PR has a low idle time and is easier to review, it should be small. We don’t want to get into line counts, so just a high-level indicator of “excellent”, “acceptable”, and “unacceptable” PR size that is based on a fuzzy combination of the number of commits, changeset size, and maximum commit size could be enough. Just giving this indicator back to the author before they PR (with a local script) would encourage them break down their PR’s. If leads see undesirable behavior sneaking through, they could augment the script or override its assessment and reject the PR anyway.

Paired, complementary metrics are harder to game because the combination limits behaviors to those that can improve both metrics. A leading indicator is harder to game because it’s happening in real-time and you can react to it. So use these techniques to make sure your metrics are showing improvement in the underlying thing you care about.

How long PR code reviews lower developer productivity

I’ve been writing about code reviews for the past week because I think that shortening review cycles is one of the easiest ways to increase developer productivity. But I’ve taken that as a given so far because I’ve seen it on my own teams. A few years ago, I saw that the relationship between productivity and things like code review time has been supported by research, and I continue to see it in my consulting practice.

At Trello, I worked on the iOS app. We were a team of ten engineers who worked closely with the ten engineers building the Android app. We shared managers, PMs, designers, roadmaps, and thought of ourselves as the mobile team. We had shared processes, but since we were somewhat insulated from the other teams, with less intermingling, our processes were different from the rest of Trello.

At some point, a (non-mobile) engineer decided to gather metrics across all teams to see what he could learn about code reviews (see If code reviews take too long, do this first). He plotted various statistics and presented them for discussion. In slide after slide, the two mobile teams were outliers on nearly every statistic. The four metrics that stood out were: the size of the PRs, the number of PRs, PR idle time, and the total time to approval. Both mobile teams had the smallest and most PRs (normalized for team size). We also had the shortest times, which I think is a consequence of the first two metrics.

I can tell you from being on the mobile team in that time, that this was intentional and built into the mobile team culture. I was trained on our PR culture on my first day, when the entire mobile team was about five engineers, and we kept passing that on to new team members (see Construct PRs to Make Reviewing Easy). The short feedback loops were enabled by our careful PR construction and pre-checks, which I documented in PR Authors Have a lot of Control on PR Idle Time. When we dug deeper into the code review data, we found that the most senior iOS engineer and the most senior Android engineer were modeling that behavior (short and frequent PRs), which set a tone for the teams.

The shorter loops also contributed to my personal happiness at work and feeling of productivity. It was only after I left the team that I saw DevEx’s identification of feedback loops as a key driver of productivity. In the three years since I left, I’ve been seeing code review feedback loops as the easiest loop to shorten. It also has a large and immediate impact because it affects every engineer nearly every day.

According to DevEx, short feedback loops help by reducing context switching and enabling rapid course correction. I have seen that they also align estimated time with calendar time, which is the main source of the discrepancy in the feeling of productivity between engineering and everyone else (see Confession: I Estimate Using Time).

Some of the effect of shortening loops improves productivity without requiring us to “work faster”, but its effect on context switching is different. Reducing context switching does make us work faster. If we have to wait for days for a PR to be approved, then it takes longer to get started on the next step. We might have had to work on something unrelated in the meantime. That switch back to this PR (which might require review responses) causes a delay whose length is determined by the length of the waiting time. Having shorter code review loops means that we probably still have enough of the context in our heads and can respond faster.

So, reducing the wait time for a code review to start (PR idle time) and the total time it takes to approve a PR both reduces calendar time and context switches, which in turn reduces the time it takes to respond to a code review and to move on when the PR is part of a larger project.

If you still have nitpick PR comments

This is part of a series on reducing PR code review time, which is important because long feedback loops are one of the drivers of low developer productivity. Here are the other articles of the series with a TL;DR to remind you of the key takeaway, but there are details in the article of how to do it.

If code reviews take too long, do this first (TL;DR: gather data)
What to do about nitpick comments in PRs (TL;DR: eliminate them)
What to do about excessive (good) PR comments (TL;DR: make better PRs)
Not all PR nitpick comments are bad (TL;DR sometimes you need or want them)

If you are still getting nitpick comments after you have done what you can to eliminate them, try adding a checklist for comments. Here’s an example checklist to start from:

Is it a bug?
Does the code violate the specification?
Does the code violate our design system?
Is the code reimplementing something where it should reuse our library code?
Does the code not meet our style guide?
Does the code not meet our testing code coverage requirement?
Is the code not consistent with the vast majority of our codebase (especially recently approved code)?
Does the suggestion have an obvious, objective way to resolve it?

If a suggestion doesn’t meet your criteria for a good comment, then try one of these techniques to improve the comment

Reframe nitpick suggestions as an (objective) defect with a clear way to resolve it. So, for example, if you “don’t like” a variable name, give a reason why you don’t like it. Some examples that are actually defects: (a) the variable is two letters and you have a standard that requires four or more (b) the variable doesn’t match your naming convention (for example, you always use the word “count” instead of “length”) (c) you don’t understand the variable name — suggest better names — express how the name is confusing you, which is more objective than “not liking a name”.
Reframe the suggestion as the beginning of a process to update coding standards. When new non-defect, but possibly good comments are warranted, they should start as suggestions that might change the coding standard. In this case, you are going to move the discussion to what ever process you use to change standards (e.g. a retro)
Make the comment if we are in a situation that warrants it anyway: See Not all PR nitpick comments are bad

If the suggestion is still nitpick that you can’t improve, and it isn’t warranted anyway, don’t make it.

Not all PR nitpick comments are bad

This article is part of a series on reducing code review time. It helps if you have read these two first.

If code reviews take too long, do this first (gather data)
What to do about nitpick comments in PRs (eliminate them)

In the second article above, I define a nitpick comment as one that is just an opinion and not a defect, where a defect is something that doesn’t meet some objective criteria. A bug is an obvious example but so is a style-guide violation. A nitpick is a comment where the author and reviewer might disagree, but the issue isn’t objective, and there isn’t an obvious way to resolve the problem.

I recommend that you have almost none of these kinds of comments in a review, but there are a few cases where it’s ok.

During onboarding: You should set the expectation to your new hires that the first month or so of PRs might have excessive commenting as a way of training on culture of the team. To make this less onerous, early PRs should be very small. The more junior the new hire, the longer this might last.
If requested: Sometimes we are really do want to ask for opinions on code that isn’t ready for merge, but you want more eyes on it. In that case, authors should tag their PR as something that they would like more help on, and then any suggestions are fine. Authors that are doing new things in the codebase (and possibly setting new standards) should proactively request (nitpick) opinions.
As a precursor to setting a standard: if you have a strong opinion that you think should become a team standard, then starting the conversation in a PR comment might be warranted. I would recommend that you quickly move it to whatever process that your team uses to change their standards (e.g. a retro). Use the PR as an example.
If rare: If you think it won’t derail the PR and that the author would welcome it, then go ahead. A good example is pointing to an interesting technique that the author might not be aware of, but you think applies. This is more of an FYI than a request to change something.

What to do about excessive (good) PR comments

This post is part of a series on reducing the time it takes for a PR to go through code review. Here are the first two articles:

In step 1, we gathered data, and in step 2, we took steps to eliminate nitpick comments. But, we could still have slow code reviews because there is a lot of discussion about the pull request that needs to happen before the code is merged.

Some of this is fine. There will always be outlier pull requests that merit careful attention. However, if this is the norm, then it’s worth taking steps to reduce the code review feedback loop because (according to DevEx) long feedback loops are one of the drivers of low developer productivity.

When you have a lot of good commenting on a PR it might be a sign that the author isn’t checking their own code enough before they create the pull request. Spending an hour or so making sure that a pull request doesn’t have obvious problems saves the time of the reviewer but also reduces the feedback loop, which could go over several days.

On my team at Trello, almost all PRs were approved within a day because we had a culture of making sure that PRs were easy to review. I have documented some of our practices here:

To construct a PR that is easy to review, the code must be correct and also easy to know that it’s correct. Here are some ways to reduce the time it takes for reviewers to approve PRs:

Use automatic linters and code formatters that make it impossible to PR code that doesn’t meet your coding standards. The reviewer doesn’t need to check for style problems as they are impossible.
Go beyond simple automations. Automated review tools exist for accessibility, security, code coverage, static analysis, etc. These should establish a baseline to assist the reviewer and not replace them. Reviewers should not look at PRs until they pass the automated checking.
If you can’t automate something, create a short checklist for the author to go through before posting the PR.
If it’s clear that a PR has lots of problems, a lead should have a 1:1 with the author to train them on self-checking and pair program a better PR with them. This avoids back-and-forth that might take days. The goal should be to get the PR into a state that the lead would approve.

When PRs are slowed down because of excessive back-and-forth between the author and reviewer, but all of that discussion was necessary, that is an indication that not enough work is being done by the author to make the code obviously correct. Don’t discourage the commenting or short-cut the review in this case.