Category Archives: Software Development

Make (Tiny) Tech Debt Payments Visible

The second discipline in The Four Disciplines of Execution (4DX) is to act on the lead indicators to accomplish a big and important long-term goal. To practice this discipline, pick some short-term thing that you could do at any time. You have to design it so that continuously achieving the lead indicator would eventually build up and achieve the long term goal. For example, I want to publish a book by June 30, but what I track is the number of days I work on it each week—my goal is five per week. Every day, I have a choice to do this. It’s a metric I can move at any time. I think it’s very likely that if I accomplish this goal every week, I will have a book by June 30.

The third discipline helps you stay on track by asking you to build a scoreboard that lets you know if you are winning. I track my lead indicators in my journal and with some software that I am working on. I also decided to use the Chronicling App on my iPhone to make it extra visible as a widget on my home screen. The scoreboards need to be visible enough so that you are constantly reminded of it. The bulk of my day is spent NOT working on the book, but I want to keep it top of mind so that I can put in at least an hour each day.

I would love to apply this to tiny tech debt payments. This is something I try to do almost every day, guided by the Tech Debt Detectors in my Editor. The work ends up in commits that go into the PR I’m working on because they are targeted in the area I am trying to change.

The commit is the leading indicator. Over the course of months, they will result in a cleaner codebase that is easier to work on. Each one contains a refactoring or a new test, but not every refactor or test counts. When I am doing TDD or just testing as I go, that’s not a debt payment. Neither is cleaning up a PR for review or removing a TODO in the grace period. So, I can’t just look for commits that add tests or commit messages with the word “refactor”.

To track these commits, I will use the string “[payment]” in the commit messages. My scoreboard is just:

git log --oneline | head -10 | grep -i '\[payment\]' | wc -l

which would show me the number of commits in the last 10 that were payments. My goal for now is that it not be 0. I run yarn lint and yarn test constantly, so I could just make them report the number.

If you are on a team working on a messy codebase, then adopting something like this gives you agency every day to make some kind of difference. If you do it, then invest the time to make a central scoreboard to make that work visible.

Instant Coverage Check with Code Perturbance

Let’s say you don’t use coverage tools, and you’re looking at a function that you want to change and you wonder if it’s tested.

No problem—just change something random in the function.

Change a less-than to a greater-than, an && to an ||, or slightly alter an arithmetic expression. Change something small that keeps the code syntactically correct, but slightly wrong otherwise. Now, run your tests.

If tests fail, undo that change and try something different.

If the tests pass, you know this line is untested. Add a failing tests to force you to undo your change.

Keep perturbing lines until you feel comfortable that you’ve covered enough of the function to make it less risky to change.

Patterns for Comparing Multidimensional Things

Dates and Integers have a natural ordering. We all agree that January 1st is before January 2nd and ten comes after nine. But, there is no natural ordering for things like vectors, complex numbers, and matrices because they are multi-dimensional. Unfortunately, most things in real life are multi-dimensional.

A common way to deal with ordering a list of things in software is to put their attributes in different columns. You see this in email clients, spreadsheets, and lots of other software. Then, when you click on a column, the list is sorted by that attribute. You can explore various orderings for different contexts. Some might be more useful—for example, in my WordPress backend, I sort my posts by reverse chronological date. But, it’s valid to sort them by title if you need to. Good versions use the previous sort choice as a sub-sort that breaks ties in the main sort.

A grid listing blog posts sorted by date.

Another way is to come up with some kind of function of the attributes that results in a single-dimensional attribute that is easy to compare. One that I’ve seen on flight search websites is an “Agony” score that takes into account the number of stops, the price, and the departure time. You could sort by ascending agony and hopefully see the best choice that considered all of the variables, rather than just sorting on price.

I do something like this in my iOS app, Habits. For each habit, I look at your entire history with the habit. I weight recent adherence more than the past and try to come up with a score normalized between 0-100. My intent is that you can use that to compare how well you are doing on different habits.

A third way is to map attributes to elements of a chart. One attribute could be the x-axis and another could be the y-axis. You could map one attribute to the size of a dot and another to color along a gradient. If your x and y are categories rather than a continuous value, you might end up with a heatmap. This heatmap compares the amount of testing done on different iPhones and iOS versions.

A heat map the test status of iOS devices across different features in an app

For continuous axes, you might end up with a chart like this one you can generate with chartjs:

In that last chart, it matters which attribute you map to which chart element. It’s often the case that we filter for just the upper-right quadrant, so the x and y would override color and size. You might want to rotate through different choices of the mapping.

Lastly, you could generate radar charts for each thing. Putting the attributes along a multi-dimensional graph like this one:

This works well when you want to combine things to form a balanced whole. By overlaying two radar graphs, you can see if the combination is complementary.

Blue radar chart overlayed on top of the yellow one to show the combination

But, you could also get a sense of an ordering. You could calculate the covered area, which is function of the attributes. You could size the spokes and normalize the data on them to express a priority and to dampen the effect of outliers.

I’m thinking a lot about multidimensional comparisons as I consider ways to prioritize projects. I’ll be writing more about this soon.

A good reason to use TODO

On my first job, there was a vestigial TODO that always bothered me. It said

/* TODO: PSJB, Is this right? -AW */

I eventually figured out that “PSJB” were the initials of our CEO (who wrote a lot code for the early versions). I knew who AW was, but he left before I started, so I couldn’t ask him what this meant. I wanted to just delete it, but I could never figure out if the code it referred to was right. I left the company before figuring it out—the comment might still be there.

This was a bad way to use TODO.

To avoid this problem for others, for my last PR at Atlassian, I searched for every TODO that I left in the code, which was possible because we were forced to sign them. I resolved each one in some way and then removed it. Saying “toodle-oo” by removing “TODO: Lou” from our code made me smile.

None of these TODOs were there for good reason.

Since I’m 100% in charge of my code style guide these days, I don’t allow TODOs to be merged (it won’t pass linting). But, I do use TODO in the code while I am building a PR if it’s convenient—I’ll be forced to remove it before merge.

The only other TODO I’m ok with is ones that are going to be resolved very soon (within the grace period)—hopefully in a stacked PR.

If you have (tech) debt, inflation is good

I rent my apartment. I moved here in 2018, and over the last 5 years (because of many factors, but mostly inflation), my rent has gone up about 30%. Inflation over that time is about 22%, so it’s even gone up in constant dollars.

If you had a fixed-rate mortgage, your payments would have gone up 0%. When you have a mortgage, inflation makes housing costs relatively smaller in your total budget because everything else goes up. If your interest rate is lower than inflation, then you get to pay the debt back in lower-value dollars.

The same is true for tech debt where inflation is the increasing size of the codebase and the team. Writing more code shrinks the relative cost of the debt you have. Having more team members makes paying tech debt a smaller proportion of your work.

If you had a metric of tech-debt, new good code would tend to lower it. This is true as long as interest on the tech debt is not too high. For tech debt, interest payments are only due if you want to change the code.

If your roadmap requires you to mostly change tech-debt-laden code, then inflation is low (no new code) and so the interest payments are high. This is a good time to prioritize paying tech debt down.

Conversely, code that has debt, but basically works and is not going to be changed, is like having a 0% loan. You have the loan. It may one day come due, but at least you don’t have to service it if you don’t want to. If your team and codebase doubles in size, that debt will feel smaller.

How to Lower Tech Debt with One Easy Trick

Yesterday, I wrote about the Tech Debt Detectors that I use in Visual Studio Code.

Here’s what it looks like for one of my CRAP-y functions. The red bars in the left-gutter show that I don’t test this function at all, and the red square at the end of line 4 shows that it has a lot of branches.

I wrote this function to make it easier to call GQL Mutate functions with boiler-plate error handling. This function reduces the complexity of each calling function. I am ok with this being complex, so to reduce CRAP, I should be testing each branch. I was surprised that I didn’t already do this, because I test GQL calls with a mock server. I did a full text search for for the function name, and I see that … I NEVER USE IT?!

Ah yes, now I remember. I didn’t like that this code wasn’t type-safe, so I generated type-safe variants from my queries (see Why I am Using Code Generation Again, Part I and Extending GraphQL Code Generation (Part II).

I never removed this function after migrating all code to the new version, so I did it now. Deleting code is a great way to lower tech debt. No code—no debt.

Tech Debt Detectors

When I wrote Use Your First Commit to Fix CRAP I said that “there are extensions for many IDEs to get you the [CRAP] metric directly”, but I hadn’t installed any. I thought that the two components of CRAP were easy enough to notice without them, but that’s only half-way true. Today, I use two extensions for Visual Studio Code to help make CRAP-y more evident to me.

Note: The CRAP metric indicates that a function is risky to change because it’s complex and undertested. To fix the function, you either need to break it into smaller functions or add tests—both actions are generally good, so it’s a metric that’s hard to game.

The first component of a CRAP-y function is its complexity, which you can estimate by counting its branches. So, count each if/else-if/else, case in a switch, loop/break/continue, and each or/and in your boolean expressions. You are trying to get an idea of how many paths there are. Since, you want to keep function complexity very low, you really don’t need to count every branch—you can stop at some low (single-digit) number. It isn’t hard to estimate a YES/NO answer to the question of complexity for any particular function, but the problem is remembering to ask.

To get complexity in Visual Studio Code, I am using CodeMetrics by Kiss Tamás. For each function, the extension shows a green, yellow, or red indicator and a short message above the function.

The second component of CRAP is test coverage. To show that in my editor, I use Coverage Gutters. This extension shows red and green markers to the left of the code to indicate if a line was run during tests. It needs you to generate standard code coverage files, which jest can do for me. It should support any language that has standard coverage support (i.e. in lcov format).

I’ll show some examples of what this looks like and how I fixed problem areas in upcoming posts.

Projects that fail never pay off tech debt

I just shut down a project I started in October 2021. It was code for a startup, but it turned out the idea didn’t have traction, and my partner and I decided that it wasn’t worth pursuing. The tech debt in this project will never be paid. If I had been paying it all along, it would have been a waste of time.

This was not a full-time project for me, and I am the only developer on it, so there’s not a ton of code. But, even a three-month project could have a little debt, so even though it’s not that old, it had some debt.

Like most projects, it had dependencies. I just checked my yarn.lock files and I see that the last time I did an update was about a year ago. I consider all third-party dependencies to be tech debt, especially as they get out of date, so that’s one that’s always building on most projects. The only way to avoid dependency debt is to not have dependencies. Which, in a way, is true now.

The biggest codebase issue that I was wrangling with was authorization. The permission model was getting a little out of control, and the code wasn’t helping make sense of it. I had been planning something more attribute based in the code, but well, now I don’t have to worry about it.

If there’s a lesson to learn here, it’s this: Don’t rush to pay off debt in projects that have a good chance of dying. The goal should be to get customers. To the extent that it’s not externally perceivable to customers, code health is usually not much of a factor in early traction.

Tech Debt in a 3 Month Old Project

I’ve been working on a new web app since October. The backend is a GraphQL API on top of a database schema. Each entry point unpacks the request and then calls a function that queries or mutates the database. There are some complex mutations—ones where there might be updates, inserts, and deletions that all need to succeed or fail together.

I know that I’m supposed to do this with transactions, but I was also using an ORM that I’m not very familiar with, and wanted to make progress quickly. I decided to pay lip-service to transactions, wrap some functions in them to indicate to myself that I needed to think about them, but mostly just ignore them. I would never not do transactions in a production app, but for this, it made sense.

After MVP, a month or so ago, I’ve just been adding minor things while my partner onboards some trial users. Eventually, I tried to implement a feature where the API call could result in many inserts and deletions, and I could see in tests that I could cause this to fail in ways that corrupted the data.

So, I had to stop, learn how to do transactions in this ORM, and then implement them in my data code. It took a few hours, and now the debt is paid off.

I do this kind of short-term borrowing/paying of debt all of the time. I am borrowing to keep myself in flow. I try to keep the debt top-of-mind by making it very evident in the code. It’s like using a credit card where you pay it off inside the grace period.

Be Skeptical of Points-based Productivity Claims

I do not personally use Story Points in my estimates because I know that everyone outside of development will translate them to time and I want to do that for them.

But, consider this claim: Adopting GitHub Copilot will increase productivity by 20%. I actually believe that to be true, but can you show it with Points? No, you cannot.

Here’s how it should go

You do 10 sprints, you see that you have a steady state velocity of 100 Points
You introduce GitHub Copilot
Maybe for 2 sprints, you see velocity go up because developers over-estimate their stories. Let’s say it’s 120 now. Better report that quick, because …
Then, devs start to adjust their estimates and velocity goes back to where it was.

Productivity went up, but velocity should stay constant because points are just time. You don’t get more time because of productivity gains.

If you see sustained velocity improvements (without changing the number of team members), then I would suspect gaming or a misunderstanding of how points are supposed to work.

Points and Velocity are best for front-line managers to understand what is going on with their teams and to size sprints. They should not be reported outside of the team, because they will be misunderstood and misapplied.