
Invest 10% of a team (not of each dev) to pay back tech debt

If you budget 10% of your team’s time to paying down technical debt, there are a few ways you could do it.

  1. Make sure 10% of the story points of each sprint are technical debt related
  2. Assign every other Friday (10% of 2 weeks) to everyone paying down technical debt (see this article for a story about Tech Debt Friday)
  3. Assign 10% of the team to spend 100% of their time paying down technical debt (rotating who this is every quarter or so, or at project boundaries)

I’ve done some variation of all of these, and in my experience, #3 works best. When I was at Trello, my team allocated more like 30%, but tech debt was lumped together with anything “engineering driven”, which was more than just debt payoff (e.g. tooling).

The main reason #3 works better is how companies typically review and reward developers. Something that is 10% of your work is never going to show up on your review. Over time this is generally a disincentive to do it. But, if you are supposed to spend 100% of your time on something, then it has to show up on your review.

Making this someone’s full-time job for a quarter means that they can plan bigger projects with more impact. If you only have one day every two weeks, it’s hard to get a PR done, so it takes about a month to get anything deployed at all. When you work on it full-time, you can deploy much more frequently. When I was on a tech-debt project, I would use the first week to deploy some extra monitoring that could measure impact or catch problems.

It allows devs to get into the zone, which is really helpful in giant refactoring or restructuring/rewrite slogs. If you only get one day every two weeks, you have to reacquaint yourself with anything big, which eats into that one day quickly.

It will also make it more likely that this debt paydown is localized. This makes it easier to test that it hasn’t caused regressions.

Finally (for managers), it’s easier to verify that you are actually spending your budget as intended because you don’t have to monitor individual stories over time. You just need to track how developers were allocated over time.


Knowing Assembly Language Helps a Little

I can’t say I recommend learning assembly, and I never really had to write much professionally, but knowing it has been helpful in giving me a mental model of what is happening inside a computer.

I started with assembly soon after I started programming in BASIC. In the eighties, all of the computer magazines listed assembly programs because that was the only way to do some things. Jim Butterfield’s Machine Language for the C64 [amazon affiliate link] was a classic.

In college, I used assembly in a few classes. In Computer Architecture we had to write a sort algorithm in VAX assembly, and in my Compilers course, we had to generate assembly from C (and then we were allowed to use an assembler to make the executable).

This was the last time I wrote any significant amount of assembly, but in all of the time I worked in C, C++, Java, C#, and Objective-C, I found myself needing to read the generated assembly or bytecode on many occasions. There were some bugs that I probably could only have figured out this way. Knowing how different calling conventions work in C on Windows was part of my interview at Atalasoft (and it was actually important to know that on the job).

So, if you have any interest in it, I would try it out. The main issue is that modern instruction sets are not optimized for humans to write. But, I learned 6502 assembler on a C64, and if you learn that, you can get into the wonderful world of C64 Demos.

Moore’s Law of Baseball

For almost my entire life (and before that all the way back to the dawn of baseball), the stats on the back of a baseball card were unchanged. If you got the box scores for your favorite player, you could calculate their stats yourself with a pencil. That’s not necessarily good. These stats were simple and misleading.

For example, it was clear in the 90’s that on-base percentage was more important than batting average. This got expanded on in the Moneyball era. Computers were brought in to analyze players, and so analyzing players was now subject to Moore’s Law, which can be simplified to say that we double computer power every 18 months. We’ve had about 20 doublings since then.
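To put those numbers together: 20 doublings at 18 months each spans about 30 years, and the cumulative growth is 2^20, roughly a million-fold. A quick back-of-the-envelope check in Python (this just restates the numbers above, not a precise timeline):

    # Back-of-the-envelope math for the doubling claim above
    doublings = 20
    months_per_doubling = 18

    years_elapsed = doublings * months_per_doubling / 12  # 30 years
    growth_factor = 2 ** doublings                        # 1,048,576

    print(f"{doublings} doublings take about {years_elapsed:.0f} years")
    print(f"Compute growth is about {growth_factor:,}x")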

What’s the Moore’s Law of baseball? The number of stats is doubling every 18 months, all enabled by modern compute power.

There’s a stat called WAR, or Wins Above Replacement, which tries to tell you how many wins a player adds to their team relative to a replacement-level player at their position (who has a WAR of 0). To calculate WAR for a single player, you need every outcome from every player. It’s so complex that we can’t agree on the right way to do it, so we have a dozen variants of it.

Stats like Exit Velocity, Launch Angle, Spin Rates, Pitch Tunneling, and Framing are only possible to know because of high-speed cameras and advanced vision processing enabled by Moore’s law. We’re not limited to describing what has happened already—some broadcasts put pitch-by-pitch outcome predictions on the screen.

Even with all this advancement, it sometimes feels like we’re still at the dawn of this era. As a fan, these don’t feel like the right stats either. No one will be put in the Hall of Fame because they hit the ball hard a lot of times.

Just need a few more doublings, I guess.

LUI LUI

I go by Lou, but my entire family calls me Louie, so I smiled when I found out that there is such a thing as a Language User Interface, which uses natural language to drive an application, and that it’s called a LUI.

In a LUI, you use natural language, so it is not the same as a keyword search or a terminal-style UI that uses simple commands, like the SABRE airline booking system.

In this video, it output responses on a printer. But the display terminal version was not that different. I worked on software that interfaced with it in 1992, and this 1960’s version is very recognizable to me.

But, this is not a LUI. A LUI does not make you remember a list of accepted commands and their parameters. You give it requests just the way you would give them to a person, in regular language.

In SABRE, a command might look like this:

    113JUNORDLGA5P

But, in a SABRE LUI, you’d say “What flights are leaving Chicago O’Hare for LaGuardia at 5pm today?”, which may be more learnable, but a trained airline representative would be a lot faster with the arcane commands.
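To make the comparison concrete, here’s a minimal sketch (in Python) of the translation a LUI has to do under the hood. The field layout is my assumption based on the example entry above (availability display, date, origin, destination, time); the real SABRE entry format and the natural-language parsing are both far more involved:

    from dataclasses import dataclass

    # Hypothetical structured intent a LUI might extract from
    # "What flights are leaving Chicago O'Hare for LaGuardia at 5pm today?"
    @dataclass
    class AvailabilityRequest:
        day: int      # day of the month
        month: str    # e.g. "JUN"
        origin: str   # airport code, e.g. "ORD"
        dest: str     # airport code, e.g. "LGA"
        time: str     # e.g. "5P"

    def to_sabre_entry(req: AvailabilityRequest) -> str:
        # Assumed layout based on the example above:
        # "1" (availability display) + date + city pair + time
        return f"1{req.day}{req.month}{req.origin}{req.dest}{req.time}"

    req = AvailabilityRequest(day=13, month="JUN", origin="ORD", dest="LGA", time="5P")
    print(to_sabre_entry(req))  # 113JUNORDLGA5P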

With a more advanced version that understood “Rebook Lou Franco from his flight from here to New Orleans to NYC instead”, which requires many underlying queries and commands (and an understanding of context), the LUI would also be a lot faster.

This would have seemed far-fetched, but with ChatGPT and other LLM systems, it feels very much within reach today.

My Typing Teacher was a Genius

When I was in middle school, typing was a required subject. I don’t really know why.

In the early eighties, it was not common for people to type at work. There were still specialists for that. Even in the late eighties, when I worked in an accounting office, there were secretaries who took dictation and typed up memos. Computer spreadsheets existed, but the accountants there still used pencil and paper, and the secretaries typed them up if they needed to look more formal.

This was the world my typing teacher, Mrs. Cohen, grew up in and probably worked in before becoming a teacher. I think that, deep down, she knew that we wouldn’t find typing relevant, and honestly, the class didn’t take it that seriously.

But one day, she read us an article from the local paper that said that kids needed to learn how to type because computers were going to be a big thing and soon everyone would need to know how to type. It had a huge impact on me—I still remember it very clearly.

I had already been exposed to programming and even had a computer at home. But, coding was just for fun. I didn’t think it would be a job, or that I would be typing every day at work. Mrs. Cohen was the first person who made me think that computers would be more than a toy.

The System Boundary is Defined by the External Pieces

In a C4 System Context diagram, you start by drawing a blue box in the center. That’s your system. And you draw some blue stick figures with arrows pointing at that box. Those are your users.

[Diagram: An empty blue box next to a blue human shape. An arrow points from the human to the box.]

Every system in the world pretty much looks the same if you stop there. Put some words on the parts to make it more specific.

[Diagram: A blue box next to a blue human shape, with an arrow pointing from the human to the box. The blue box says “Sprint-o-Mat: a watchOS app to guide programmed runs” and under the human is the caption “Runner”. The arrow from the human to the box says “sets up and runs with”. This is the system in a context diagram.]

But this diagram is called a Context diagram for a reason. The most important part is not the system box (the three other types of C4 diagrams will elaborate on it), but all of the gray boxes and stick figures you put around it.

[Diagram: The same blue box and Runner as above, now surrounded by gray boxes that say HealthKit, RunGap, and Running Social Networks. These are external context systems. There is also a gray group of people labeled “other runners”. The diagram shows the relationships between them.]

These are the external pieces that are not your system and are not your users. They are out of scope, but they do a lot of work in the diagram to help describe the system.

Metrics that Resist Gaming

The other day I described CRAP, which is a metric that flags code that is risky to change. I suggested that if you need to change that code, you start with tests and refactoring.

Tests and refactoring are positive for the codebase and improve the readability of the PR they are in, which is why I like CRAP as a metric—it’s hard to game.

In contrast, story points (and the velocity based on them) are the exact opposite.

Story points are completely made-up numbers with low accountability. Velocity is just points over time, so it’s also made up. If a team must improve velocity (for their manager), the easiest thing to do is to over-estimate the points per task, and, like magic, velocity can meet any target. I don’t think engineers would do this consciously, but this is just a known phenomenon of metrics (see Goodhart’s Law).
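As a toy illustration (the sprints and point values are made up), if estimates quietly inflate, velocity “improves” even though the same work is getting done:

    # Same six tasks delivered each sprint; only the estimates change
    sprints = {
        "before velocity target": [2, 3, 5, 2, 3, 5],  # honest-ish estimates
        "after velocity target":  [3, 5, 8, 3, 5, 8],  # inflated estimates, same tasks
    }

    for name, points in sprints.items():
        velocity = sum(points)  # points per sprint
        print(f"{name}: {len(points)} tasks, velocity {velocity}")
    # before velocity target: 6 tasks, velocity 20
    # after velocity target: 6 tasks, velocity 32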

This is one of the reasons I don’t use Story Points. But to be honest, almost any estimation technique is ripe for gaming (à la Scotty from Star Trek).

When I wrote about DevEx, a new developer productivity methodology, I wrote that I thought that they “do help engineers deliver software better and faster”, but that they are most useful to the team itself (not stakeholders).

Looking over that article, I realized that the thing I like about these metrics is that they are hard to game. If I get several 3-hour blocks of uninterrupted coding time per week, then I am sure I can write more and better code than if I didn’t. Counting lines of code (and judging their quality) is fraught, but hours of uninterrupted time are easy to count, and more is better.
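As a sketch of what “easy to count” could look like, here is one way to count qualifying focus blocks from a day’s meetings. The calendar representation and the 3-hour threshold are just assumptions for illustration:

    from datetime import datetime, timedelta

    def focus_blocks(day_start, day_end, meetings, min_hours=3):
        """Count gaps between meetings that are at least min_hours long."""
        blocks = 0
        cursor = day_start
        for start, end in sorted(meetings):
            if start - cursor >= timedelta(hours=min_hours):
                blocks += 1
            cursor = max(cursor, end)
        if day_end - cursor >= timedelta(hours=min_hours):
            blocks += 1
        return blocks

    # Example: two meetings leave one 3-hour block (2pm-5pm) in a 9-to-5 day
    day = datetime(2023, 6, 13)  # hypothetical date
    meetings = [(day.replace(hour=10), day.replace(hour=11)),
                (day.replace(hour=13), day.replace(hour=14))]
    print(focus_blocks(day.replace(hour=9), day.replace(hour=17), meetings))  # 1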

If you are worried that coders will shirk other duties (like code reviews or attending meetings), there is another metric that measures feedback loops, which is in tension with the flow metric.

My main critique of DevEx still stands—it’s not something to report outside of the team. But the more I think of it, the more I like it and will try implementing it on my (1-person) team.

Use Your First Commit to Fix CRAP

The CRAP metric combines cyclomatic complexity and test coverage to flag functions that are both complex and under-tested, so that you can see which functions are risky to change.

There are extensions for many IDEs that give you the metric directly or show its parts (test coverage and complexity). But you don’t really need them, because you know CRAP-y code when you see it—run the unit tests to see if the function is under test, and eyeball the complexity by counting up the branches and logical sub-expressions—you can stop counting at about four, because more than that is probably CRAP.
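If you do want the actual number, the commonly cited CRAP formula is easy to compute yourself once you have the two parts. A minimal sketch (with coverage expressed as a fraction rather than a percentage):

    def crap(complexity: int, coverage: float) -> float:
        """CRAP score: complexity^2 * (1 - coverage)^3 + complexity,
        where coverage is the fraction of the function covered by tests (0.0-1.0)."""
        return complexity ** 2 * (1.0 - coverage) ** 3 + complexity

    # A complex, untested function scores terribly...
    print(crap(complexity=15, coverage=0.0))  # 240.0
    # ...while full coverage reduces the score to just its complexity.
    print(crap(complexity=15, coverage=1.0))  # 15.0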

So, if you have to change a CRAP-y function, you could start the PR by trying to lower the score.

The first step to reduce CRAP scores is to add tests. Complex functions are often hard to test, but I would add any tests you can to start, because they help with the next step.

Next, lower complexity by refactoring the function down into simpler parts. The tests you just added will make sure you do it right, but these should be simple mechanical refactors that might even be automatable by your IDE. If they are not trivial, you need to add more tests. Do not restructure or rewrite code unless that’s the goal of the PR—all of your changes should preserve the observable behavior of the code.
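As an illustration (hypothetical code, not from any particular codebase), an extract-function refactor like this lowers the branch count of the function you need to change without altering its behavior:

    # Before: tier and discount branching inline makes this harder to follow and test
    def price_before(quantity, unit_price, is_member):
        if quantity >= 100:
            total = quantity * unit_price * 0.8
        elif quantity >= 10:
            total = quantity * unit_price * 0.9
        else:
            total = quantity * unit_price
        if is_member:
            total *= 0.95
        return total

    # After: the tier logic is extracted, so each function is simpler to read and test
    def volume_discount(quantity):
        if quantity >= 100:
            return 0.8
        if quantity >= 10:
            return 0.9
        return 1.0

    def price_after(quantity, unit_price, is_member):
        total = quantity * unit_price * volume_discount(quantity)
        if is_member:
            total *= 0.95
        return total

Both versions return the same totals, which is exactly what the tests you just added should confirm.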

I start a lot of PRs this way. It’s a good way to get warmed up, and you know that you are improving the codebase in a place that benefits the most from it. You are paying down technical debt right before an interest payment comes due.

First Rule of Refactoring Club

Don’t talk about refactoring club.

A long time ago, I linked to this post by Martin Fowler (author of Refactoring [amazon affiliate link]), where he lamented the misuse of the word “refactoring”:

However the term “refactoring” is often used when it’s not appropriate. If somebody talks about a system being broken for a couple of days while they are refactoring, you can be pretty sure they are not refactoring. [This is] restructuring.

For me, refactoring might be part of every PR. My first commit is often a refactoring that makes the rest of the commits easier to do and understand. I might also refactor at the end, but those commits will be squashed before I PR since you don’t need to see how I got there.

In TDD, there’s a specific practice to Red, Green, Refactor your way to working code (or, as I do it, Green, Refactor, Red) that explicitly treats refactoring as a small thing you do often.

The tell that you are doing refactoring wrong is that you feel like it’s something to talk about. Refactoring, when done well, is about as interesting as variable naming.

It’s not not interesting, but you don’t need to talk about it in a stand-up.

Making Sausage and Delivering Sausage

There’s an article about DevEx, a new developer productivity methodology, in ACM Queue. If you subscribe to the Pragmatic Engineer newsletter, there was an interview with the article’s authors last week. This is the latest methodology from the people behind DORA and SPACE.

DORA’s measurements were grounded in externally visible outcomes.

  • Deployment Frequency
  • Mean Time to Recovery
  • Change Failure Rate
  • Lead Time

The idea was to pick things that engineers could actually control. Even though the elements of DORA are not directly translatable to business outcomes, they are still understandable to external stakeholders.
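As a sketch of how concrete these are, here is roughly how you could compute them from deployment records. The record fields here are hypothetical; real tooling pulls this from your deploy pipeline and incident tracker:

    from datetime import timedelta

    # Hypothetical deployment records: lead time from first commit to deploy,
    # whether the deploy caused a failure, and time to recover if it did
    deploys = [
        {"lead_time": timedelta(days=2), "failed": False, "recovery": None},
        {"lead_time": timedelta(days=5), "failed": True,  "recovery": timedelta(hours=3)},
        {"lead_time": timedelta(days=1), "failed": False, "recovery": None},
        {"lead_time": timedelta(days=3), "failed": True,  "recovery": timedelta(hours=1)},
    ]
    weeks_observed = 4

    deployment_frequency = len(deploys) / weeks_observed
    change_failure_rate = sum(d["failed"] for d in deploys) / len(deploys)
    recoveries = [d["recovery"] for d in deploys if d["failed"]]
    mean_time_to_recovery = sum(recoveries, timedelta()) / len(recoveries)
    lead_time = sum((d["lead_time"] for d in deploys), timedelta()) / len(deploys)

    print(deployment_frequency)   # 1.0 deploy per week
    print(change_failure_rate)    # 0.5
    print(mean_time_to_recovery)  # 2:00:00
    print(lead_time)              # 2 days, 18:00:00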

In SPACE, these metrics are still one kind that we collect, but SPACE also recognizes that there are other things besides Performance and Activity metrics (the P and A of SPACE). It also considers Satisfaction, Communication, and Efficiency, which are more internal to the team.

In DevEx, the emphasis is on internal metrics: Flow, Cognitive Load, and Feedback Loops.

I want to say upfront that I completely agree that these things do help engineers deliver software better and faster. But they are hard to share outside of the team. It’s how the sausage is made. The business ultimately needs to deliver sausage.

Aside from the rest of the business not understanding or caring about these metrics, I also worry that they will try to get too involved in them. Engineering leadership should care a great deal about the cognitive load of the members of their teams, and should work to lower it, but if they report on it outside of engineering, they need to find a better way to express it.

I know the DevEx authors know this, and their emphasis on these “making sausage” metrics doesn’t mean that they think externally visible performance isn’t important (they did, after all, design DORA and SPACE). But if you deliver on, for example, long flow states, and there isn’t more useful software on servers, you have failed in the business objective. This is the same thing I said about Story Points—they are too far removed from things people outside of engineering care about:

[…] regular people just translate [story points] to some notion of time anyway, and in that regard, I am very regular. If you are going to take some random number I provide and assign a date to it, I’d much rather I do the translation for you.

To the extent that you report directly on DevEx, try to emphasize the parts outsiders can help with. Frequency of meetings and speed of external feedback loops (especially from product management) are good examples of that.