Category Archives: Software Development

Metrics that Resist Gaming

The other day I described CRAP, which is a metric that flags code that is risky to change. I suggested that if you need to change that code, you start with tests and refactoring.

Tests and refactoring improve the codebase and make the PR they are in easier to read. That’s why I like CRAP as a metric: it’s hard to game, because the way to lower it is to do things you should be doing anyway.

Story Points (and the velocity based on them) are the exact opposite.

Story Points are completely made-up numbers with low accountability. Velocity is just points over time, so it’s also made up. If a team must improve velocity (for their manager), the easiest thing to do is to overestimate the points per task, and like magic, velocity can meet any target. I don’t think engineers would do this consciously, but it is a well-known phenomenon of metrics (see Goodhart’s Law).

This is one of the reasons I don’t use Story Points. But to be honest, almost any estimation technique is ripe for gaming (à la Scotty from Star Trek).

When I wrote about DevEx, a new developer productivity methodology, I said that its metrics “do help engineers deliver software better and faster”, but that they are most useful to the team itself (not to stakeholders).

Looking over that article, I realized that what I like about these metrics is that they are hard to game. If I get several 3-hour blocks of uninterrupted coding time per week, I am sure I can write more and better code than if I didn’t. Counting lines of code (and judging their quality) is fraught, but hours of uninterrupted time are easy to count, and more is better.
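Counting those blocks is easy to automate. This is a minimal Python sketch. It assumes you can export a day’s meetings from your calendar as (start, end) pairs. The `focus_blocks` name and the three-hour default are illustrative choices, not something DevEx prescribes.

```python
from datetime import datetime, timedelta


def focus_blocks(day_start, day_end, meetings, min_block=timedelta(hours=3)):
    """Return (start, end) gaps between meetings that are at least min_block long.

    meetings is a list of (start, end) datetimes inside the working day;
    overlapping meetings are handled by tracking the latest end seen so far.
    """
    blocks = []
    cursor = day_start  # end of the most recent interruption
    for start, end in sorted(meetings):
        if start - cursor >= min_block:
            blocks.append((cursor, start))
        cursor = max(cursor, end)
    if day_end - cursor >= min_block:
        blocks.append((cursor, day_end))
    return blocks


# Hypothetical Monday: a short standup and a lunch meeting
day = datetime(2023, 5, 1)
standup = (day.replace(hour=9, minute=30), day.replace(hour=9, minute=45))
lunch = (day.replace(hour=12), day.replace(hour=13))
blocks = focus_blocks(day.replace(hour=9), day.replace(hour=17), [standup, lunch])
```

In this example, only the 13:00 to 17:00 afternoon counts. The 15-minute standup at 9:30 splits the morning into pieces that are each shorter than three hours.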

If you are worried that coders will shirk other duties (like code reviews or attending meetings), there is another metric, for feedback loops, that is in tension with the flow metric.

My main critique of DevEx still stands: it’s not something to report outside of the team. But the more I think about it, the more I like it, and I will try implementing it on my (1-person) team.

Use Your First Commit to Fix CRAP

The CRAP metric combines cyclomatic complexity and test coverage to flag functions that are both complex and undertested, so that you can see which functions are risky to change.
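For reference, Alberto Savoia defined the formula for crap4j as CRAP(m) = comp(m)^2 × (1 − cov(m)/100)^3 + comp(m). Here comp is cyclomatic complexity and cov is coverage as a percentage. This is a minimal Python sketch that takes coverage as a fraction instead. The function name is hypothetical.

```python
def crap_score(complexity: int, coverage: float) -> float:
    """CRAP = complexity^2 * (1 - coverage)^3 + complexity.

    complexity: cyclomatic complexity of the function
    coverage: fraction of the function exercised by tests, 0.0 to 1.0
    """
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0.0 and 1.0")
    return complexity ** 2 * (1.0 - coverage) ** 3 + complexity
```

The formula punishes complexity heavily when tests are missing. An untested function with complexity 5 already scores 30, the threshold commonly used with this formula. Full coverage brings any function down to its bare complexity.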

There are extensions for many IDEs that compute the metric directly or show its parts (test coverage and complexity). But you don’t really need them, because you know CRAP-y code when you see it. Run the unit tests to see whether the function is under test, and eyeball the complexity by counting up the branches and logical sub-expressions. You can stop counting at about four, because more than that is probably CRAP.
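If you want to double-check your eyeballing in Python, the standard library’s `ast` module can do the counting. This is a rough sketch, not what any particular IDE extension computes. It simplifies cyclomatic complexity to 1 plus each `if`, loop, conditional expression, and `except` handler, plus each extra `and`/`or` operand. `approximate_complexity` and `SAMPLE` are hypothetical names.

```python
import ast

# Node types treated as one extra decision point each (a simplified, assumed set).
BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler)


def approximate_complexity(source: str) -> int:
    """Estimate cyclomatic complexity as 1 + decision points in the given source."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra paths, one per extra operand
            complexity += len(node.values) - 1
    return complexity


SAMPLE = '''
def shipping_cost(order):
    if order.total > 100 and not order.is_international:
        return 0
    for item in order.items:
        if item.is_oversized:
            return 25
    return 10
'''
```

`approximate_complexity(SAMPLE)` returns 5. That is 1, plus two `if`s, one `for`, and one extra `and` operand, which is past the “about four” rule of thumb.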

So, if you have to change a CRAP-y function, you could start the PR by trying to lower the score.

The first step to reduce CRAP scores is to add tests. Complex functions are often hard to test, but I would add any tests you can to start, because they help with the next step.
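This is a minimal Python sketch of that first step, using a hypothetical `shipping_cost` function. The cases are characterization tests. Each one records what the code returns today for one branch, whether or not that is “correct”, so that later changes can be checked against current behavior.

```python
def shipping_cost(total, is_international, is_member, weight_kg):
    # Hypothetical CRAP-y function: nested branches and no tests yet
    if is_international:
        if weight_kg > 20:
            return 80.0
        else:
            return 40.0
    else:
        if is_member and total > 50:
            return 0.0
        elif weight_kg > 20:
            return 25.0
        else:
            return 10.0


# One case per branch: (total, is_international, is_member, weight_kg) -> today's result
CASES = [
    ((10, True, False, 25), 80.0),
    ((10, True, False, 5), 40.0),
    ((60, False, True, 5), 0.0),
    ((60, False, False, 25), 25.0),
    ((10, False, True, 5), 10.0),
]


def check_characterization(fn):
    """Fail loudly if fn no longer matches the recorded behavior."""
    for args, expected in CASES:
        actual = fn(*args)
        assert actual == expected, f"{fn.__name__}{args} returned {actual}, expected {expected}"
```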

Next, lower the complexity by breaking the function into simpler parts. The tests you just added help make sure you do it right, but these should be simple, mechanical refactors (like Extract Function) that your IDE might even be able to automate. If they are not trivial, you need to add more tests. Do not restructure or rewrite code unless that’s the goal of the PR: all of your changes should preserve the observable behavior of the code.
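This sketch shows a hypothetical `shipping_cost` function in Python before and after a mechanical Extract Function refactor. Each extracted helper has fewer branches than the original. The exhaustive comparison over a small grid of inputs is an extra safety net I’m assuming for illustration. It confirms that the observable behavior didn’t change.

```python
from itertools import product


def shipping_cost_before(total, is_international, is_member, weight_kg):
    # Original nested-branch version (hypothetical)
    if is_international:
        if weight_kg > 20:
            return 80.0
        else:
            return 40.0
    else:
        if is_member and total > 50:
            return 0.0
        elif weight_kg > 20:
            return 25.0
        else:
            return 10.0


# After Extract Function: same logic, split into smaller pieces
def _international_rate(weight_kg):
    return 80.0 if weight_kg > 20 else 40.0


def _domestic_rate(total, is_member, weight_kg):
    if is_member and total > 50:
        return 0.0
    return 25.0 if weight_kg > 20 else 10.0


def shipping_cost(total, is_international, is_member, weight_kg):
    if is_international:
        return _international_rate(weight_kg)
    return _domestic_rate(total, is_member, weight_kg)


def behavior_preserved():
    """Compare both versions on every combination of boundary-ish inputs."""
    totals = [0, 50, 51, 100]
    weights = [0, 20, 21, 30]
    for args in product(totals, [True, False], [True, False], weights):
        if shipping_cost(*args) != shipping_cost_before(*args):
            return False
    return True
```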

I start a lot of PRs this way. It’s a good way to get warmed up, and you know that you are improving the codebase in the place that benefits most from it. You are paying down technical debt right before an interest payment is due.

First Rule of Refactoring Club

Don’t talk about refactoring club.

A long time ago, I linked to this post by Martin Fowler (author of Refactoring), where he lamented the misuse of the word “refactoring”:

However the term “refactoring” is often used when it’s not appropriate. If somebody talks about a system being broken for a couple of days while they are refactoring, you can be pretty sure they are not refactoring. [This is] restructuring.

For me, refactoring might be part of every PR. My first commit is often a refactoring that makes the rest of the commits easier to do and understand. I might also refactor at the end, but I squash those commits before I open the PR, since you don’t need to see how I got there.

In TDD, the Red, Green, Refactor cycle (or, as I do it, Green, Refactor, Red) explicitly treats refactoring as a small thing you do often on the way to working code.

The tell that you are doing refactoring wrong is that you feel like it’s something to talk about. Refactoring, when done well, is about as interesting as variable naming.

It’s not not interesting, but you don’t need to talk about it in a stand-up.

Making Sausage and Delivering Sausage

There’s an article about DevEx, a new developer productivity methodology, in ACM Queue. If you subscribe to the Pragmatic Engineer newsletter, you may have seen last week’s interview with the article’s authors. This is the latest methodology from the people behind DORA and SPACE.

DORA’s measurements were grounded in externally visible outcomes.

  • Deployment Frequency
  • Mean Time to Recovery
  • Change Failure Rate
  • Lead Time

The idea was to pick things that engineers could actually control. Even though the elements of DORA are not directly translatable to business outcomes, they are still understandable to external stakeholders.
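Part of what makes DORA easy to share is that its metrics fall straight out of deployment records. This is a minimal Python sketch. It assumes you log four things for each deployment: the commit time, the deploy time, whether it caused a failure, and when service was restored. The field and function names are hypothetical. The sketch uses simple means for illustration, though real dashboards often prefer medians.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_metrics(deploys: list[Deployment], period_days: float) -> dict[str, float]:
    """Compute simplified versions of the four DORA metrics for one period."""
    if not deploys:
        raise ValueError("need at least one deployment")
    failures = [d for d in deploys if d.failed]
    # Assumption: a failure starts when its deployment lands in production.
    recoveries = [hours(d.restored_at - d.deployed_at)
                  for d in failures if d.restored_at is not None]
    return {
        "deployment_frequency_per_day": len(deploys) / period_days,
        "lead_time_hours": mean([hours(d.deployed_at - d.committed_at) for d in deploys]),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_recovery_hours": mean(recoveries) if recoveries else 0.0,
    }


def example_deploys() -> list[Deployment]:
    """Four hypothetical deployments, one of which failed for an hour."""
    start = datetime(2023, 5, 1, 9)
    day = timedelta(days=1)
    return [
        Deployment(start, start + timedelta(hours=4)),
        Deployment(start + day, start + day + timedelta(hours=2),
                   failed=True, restored_at=start + day + timedelta(hours=3)),
        Deployment(start + 2 * day, start + 2 * day + timedelta(hours=6)),
        Deployment(start + 3 * day, start + 3 * day + timedelta(hours=4)),
    ]
```

Over a 7-day period, the example data gives a 4-hour mean lead time, a 25% change failure rate, and a 1-hour mean time to recovery.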

In SPACE, these are still one kind of metric that we collect, but SPACE also recognizes that there is more to measure than Performance and Activity (the P and A of SPACE). It also considers Satisfaction, Communication, and Efficiency, which are more internal to the team.

In DevEx, the emphasis is on internal metrics: Flow, Cognitive Load, and Feedback Loops.

I want to say upfront that I completely agree that these things do help engineers deliver software better and faster. But they are hard to share outside of the team. It’s how the sausage is made. The business ultimately needs to deliver sausage.

Aside from the rest of the business not understanding or caring about these metrics, I also worry that they will try to get too involved in them. Engineering leadership should care a great deal about the cognitive load of the members of their teams and should work to lower it, but if they report on it, they need to find a better way to express it outside of engineering.

I know the DevEx authors know this, and their emphasis on these “making sausage” metrics doesn’t mean they think externally visible performance is unimportant (they did, after all, design DORA and SPACE). But if you deliver, for example, long flow states, and there isn’t more useful software on servers, you have failed at the business objective. This is the same thing I said about Story Points, which are too far removed from what people outside of engineering care about:

[…] regular people just translate [story points] to some notion of time anyway, and in that regard, I am very regular. If you are going to take some random number I provide and assign a date to it, I’d much rather I do the translation for you.

To the extent that you report directly on DevEx, try to emphasize the parts outsiders can help with. Frequency of meetings and speed of external feedback loops (especially from product management) are good examples of that.

C4 Context Diagrams in GitHub READMEs

I discovered C4 diagrams two years ago and I’ve been using them in my private projects since then. I use Confluence for all of my project documentation, so I’ve been using the draw.io add-on to make the diagrams because that’s the best solution I’ve found that lets me edit the diagrams inside of the document.

As I wrote in Towards a Portfolio Based Interview Process for Programmers, when describing what a GitHub portfolio repository should look like:

I could use an orientation. I need a starting place. The bigger the project, the harder it will be to jump in and take a look around. Give me what you’d give a new contributor.

The purpose of a context diagram is to explain the boundaries of your system. You do this by representing your system as a single box in the center and surrounding it with your various user roles and collaborating systems (see more on the C4 site).

To put one in a README, you could use Mermaid’s C4 support for context diagrams. This support is experimental (as of May 2023), so it’s very hard to get a good diagram. Personally, I am finding it impossible to get anything more than simple diagrams to look good enough.

To give an example, here is some code that I used to add a context diagram to the Page-o-Mat README (here’s a link to the commit in case I change it later).

```mermaid
C4Context
  %% This is a Mermaid diagram for the system context
  Person(designer, "Journal Designer")
  System(pageomat, "Page-o-Mat", "Makes Journal PDFs")

  Person_Ext(journaler, "Journal User")
  System_Ext(printservice, "Print Service", "A PDF printing service (e.g. LuLu).")

  Rel(designer, pageomat, "Creates specs for")
  Rel(pageomat, printservice, "Generates PDFs for")
  Rel(journaler, printservice, "Buys journals from")

  UpdateLayoutConfig($c4ShapeInRow="2", $c4BoundaryInRow="1")
  UpdateRelStyle(designer, pageomat, $offsetX="-40", $offsetY="40")
  UpdateRelStyle(journaler, printservice, $offsetX="-40", $offsetY="40")
```

It looks like this:

Page-o-Mat C4 context diagram

You can see that color is being used to indicate which parts of the entire system are in scope for the project.

If that’s too frustrating, then I suggest using MonoDraw to make an ASCII art version. The downside is that this diagram is hard to edit, but a context diagram doesn’t change much. Another small issue is that you can’t use color. I can live with text tags (in <<>>) instead, because it saves me from adding more files to the repo for diagrams.

More on Copilot for Learners

Yesterday, I wrote that you shouldn’t use GitHub Copilot if you are a new programmer:

But, if you are still learning to program, then I am worried that Copilot would inhibit that. The main way it does this is by not letting you practice remembering.

I just saw a paper called “GitHub Copilot AI pair programmer: Asset or Liability?” that studied this issue and concludes:

Based on our findings, if Copilot is used by expert developers in software projects, it can become an asset since its suggestions could be comparable to humans’ contributions in terms of quality. However, Copilot can become a liability if it is used by novice developers who may fail to filter its buggy or non-optimal solutions due to a lack of expertise.

So, check that paper out if you want to see actual data supporting this belief. One of the core findings is that Copilot generates wrong code that is easy for experts to fix, which was my intuition based on using it.

Mental Representations in Coding

I recently reread Peak by Anders Ericsson, whose research is the source of the term “Deliberate Practice” and of the “10,000 hours” idea popularized by Malcolm Gladwell. The truth is more complicated, and I highly recommend reading both Peak and Kathy Sierra’s Badass, which takes a deep dive into the practicalities of gaining expertise.

I wrote some thoughts on applying deliberate practice to learning how to code a couple of years ago. Looking them over, I was very focused on what I thought you should do, which was to turn vague descriptions into code (not transcribe code tutorials). But I had forgotten another important point: one purpose of the training is to help build a mental representation that makes the task easier to do.

One of the mental representations I have is that UI layouts are primarily about nested rectangles and the relationships of their sizes and positions to each other.

If I were learning to code HTML/CSS now, I would try to learn this first and only make pages of different colored rectangles in common layouts. I would try to limit the CSS to just the layout attributes and the HTML to just <div> elements, with maybe just enough content to see the layout.

Next, I would add the concept of invisible rectangles that are just helping with the relationships.
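To make the nested-rectangles idea concrete outside of a browser, this is a toy Python model. It is not CSS, and the `Box` and `layout` names are made up. Each box either has a fixed size or shares the leftover space, roughly like a flex row or column. The invisible `body` box exists only to put the sidebar and content side by side.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Box:
    name: str
    size: Optional[float] = None  # fixed size along the parent's axis; None shares what's left
    direction: str = "row"        # how this box arranges its children: "row" or "column"
    visible: bool = True          # invisible boxes only position their children
    children: list[Box] = field(default_factory=list)


def layout(box, x, y, width, height, out=None):
    """Assign (x, y, width, height) to every visible box, nesting children inside parents."""
    out = {} if out is None else out
    if box.visible:
        out[box.name] = (x, y, width, height)
    available = width if box.direction == "row" else height
    fixed = sum(c.size for c in box.children if c.size is not None)
    flexible = [c for c in box.children if c.size is None]
    share = (available - fixed) / len(flexible) if flexible else 0.0
    offset = 0.0
    for child in box.children:
        size = child.size if child.size is not None else share
        if box.direction == "row":
            layout(child, x + offset, y, size, height, out)
        else:
            layout(child, x, y + offset, width, size, out)
        offset += size
    return out


# A classic page: header, an invisible row holding sidebar + content, then footer.
PAGE = Box("page", direction="column", children=[
    Box("header", size=100),
    Box("body", direction="row", visible=False, children=[
        Box("sidebar", size=200),
        Box("content"),
    ]),
    Box("footer", size=50),
])
```

On an 800×600 page, `layout(PAGE, 0, 0, 800, 600)` puts the sidebar at (0, 100, 200, 450) and the content at (200, 100, 600, 450). The `body` box never shows up in the result.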

Next, we could explore media query breakpoints and other responsive features of web coding.

Only then would I attempt to make “real” pages, and I would start by getting the layout done.

The key is to identify the mental representation that helps to do the harder task and to break it down to a simple task that you drill over and over to make it automatic.

Standards Help Me Port My Brain, Not My Programs

SQL is a well-established standard, and its core is implemented consistently across a variety of DBMSs. In my career, I’ve worked with MySQL, SQL Server, SQLite, and Oracle, and my knowledge of SQL carries over pretty much unchanged.

My code for all of these systems is hopelessly incompatible.

I can’t grab SQL from my node app and put it in my iPhone app, because there’s more to it than the SQL. The SQL is embedded in the host language and often abstracted in a library (like TypeORM, Django, CoreData, or JDBC). These languages and libraries are completely incompatible with each other. I tend to use ORMs, but this would be true even without them because you often want some kind of composability with your SQL clauses. You’d have to build that in the host language and then spread partial SQL strings around your code.
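ORMs hide this, but this is roughly what composability looks like when you hand-roll it. The minimal Python sketch uses the standard library’s `sqlite3` module and hypothetical `build_select`/`active_users_query` helpers. Even the `?` placeholder is driver-specific: `sqlite3` uses the qmark style, while other Python drivers, such as psycopg2, use `%s`.

```python
import sqlite3


def build_select(base, filters):
    """Join optional (clause, params) filters into one parameterized SELECT."""
    sql = base
    if filters:
        sql += " WHERE " + " AND ".join(clause for clause, _ in filters)
    params = [p for _, clause_params in filters for p in clause_params]
    return sql, params


def active_users_query(min_age=None, city=None):
    # Each optional argument contributes a partial SQL string and its parameters
    filters = [("active = ?", [1])]
    if min_age is not None:
        filters.append(("age >= ?", [min_age]))
    if city is not None:
        filters.append(("city = ?", [city]))
    return build_select("SELECT name FROM users", filters)


def demo(**kwargs):
    """Run the composed query against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, age INTEGER, city TEXT, active INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?, ?)", [
        ("Ada", 36, "London", 1),
        ("Linus", 28, "Helsinki", 1),
        ("Grace", 45, "New York", 0),
    ])
    sql, params = active_users_query(**kwargs)
    names = sorted(row[0] for row in conn.execute(sql, params))
    conn.close()
    return names
```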

Even if you tried to keep your SQL pure and outside the host language, you’d run into portability issues. Each DBMS vendor extends its implementation of SQL in ways that are incompatible with the others (mostly for good reason). User needs often outpace the standards bodies, so the vendors need to innovate outside of the standard.

You run into this with web programming because, for a long time, there was no standard way in SQL to grab rows N through M from a SELECT, and you need to do that efficiently to implement paging. SQL:2008 finally added OFFSET/FETCH, but by then the vendors had shipped their own LIMIT, TOP, and ROWNUM variants. SQL has also been extended in incompatible ways to store external blobs. Even simple things, like dates, are not consistent.
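This is a Python sketch of the paging problem, with a hypothetical `paged` helper that appends a dialect-specific clause. It covers two forms. The first is LIMIT/OFFSET (SQLite, MySQL, PostgreSQL). The second is the SQL:2008-style OFFSET/FETCH form (SQL Server 2012+, Oracle 12c+). Older Oracle versions needed a ROWNUM subquery instead, which isn’t shown. Only the SQLite path is actually executed here.

```python
import sqlite3


def paged(sql, dialect, page, page_size):
    """Append a paging clause for a few dialects (1-based page numbers)."""
    limit = int(page_size)
    offset = (int(page) - 1) * limit
    if dialect in ("sqlite", "mysql", "postgresql"):
        return f"{sql} LIMIT {limit} OFFSET {offset}"
    if dialect in ("sqlserver", "oracle"):
        # SQL:2008-style syntax; needs SQL Server 2012+ or Oracle 12c+,
        # and SQL Server requires an ORDER BY before OFFSET.
        return f"{sql} OFFSET {offset} ROWS FETCH NEXT {limit} ROWS ONLY"
    raise ValueError(f"no paging syntax for dialect {dialect!r}")


def sqlite_page(page, page_size):
    """Fetch one page of ids 1..10 from an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 11)])
    sql = paged("SELECT id FROM items ORDER BY id", "sqlite", page, page_size)
    ids = [row[0] for row in conn.execute(sql)]
    conn.close()
    return ids
```

The same logical request produces different SQL text per vendor, which is exactly the portability gap the standard was supposed to close.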

And, I haven’t even mentioned stored procedures.

But even though SQL doesn’t help me port my programs, SQL is a good way for me to think about data. I can show it to other developers or DBAs, and we can have a conversation. Most of my data code will be based around SQL in some way and will be understandable to any developer who knows SQL.

My programs get many benefits from SQL, but not portability. And this is true with standards in general. Instead of Java’s promise to “write once, run anywhere”, we live in a world that React Native calls “learn once, write anywhere”, and it’s not bad.

Co-working First Impressions

I started working remotely in 2013 and haven’t spent significant time in an office since then. When I went to Trello’s NYC office, it was mostly for offsites or to onboard a new team member, so I wasn’t planning on getting a lot of programming done. Even so, the Trello founders were highly influenced by Peopleware, and knew that company offices needed to provide a quiet working environment.

But now that I’m working alone, I do miss having some interaction with people during the day, so I am trying out a co-working space once a week. Today is my second day.

Some random thoughts.

I am really glad I brought noise-cancelling headphones. They’re just enough (along with listening to ocean sounds) to drown out the one-sided Zoom meetings when I am trying to code or write (luckily, those seem to be somewhat rare).

Sitting by a window is nice. It’s on the third floor, across the street from a residential neighborhood. All I see are shade trees, rooftops, and the big blue sky, which gives me a place to stare and rest my eyes. At home, my window is on my left, slightly behind me.

A view outside of a co-working office showing tree tops, rooftops, and the sky

The weather is hot enough for shorts and a t-shirt, but like everywhere else in Florida, when you get inside, they have the AC cranked up. I get to wear jeans, which I miss.

I thought I would miss my monitor more. They have a place to store one, and I was already planning to do that, but I’m getting a lot done right now without it. If that keeps up, I probably won’t bother.

Doing this on Mondays sets up the week well. I can guarantee that I won’t have a meeting (because I am blocking the whole day), which I would not have done if I were working from home. Since I end each week with a plan for the next one, I can just get going when I get here.

They have regular and counter-height desks. You can stand at the latter.

Audio Copilot

When I first wrote Robotic Pair Programmers I really was imagining that an AI code assistant would be more like a side channel and not inside autocomplete. To be fair, I was programming mostly in Xcode at the time and had not yet experienced VSCode, so my conception of how fast an IDE could be was antiquated.

I do think that the autocomplete integration is genius, but I have also been wanting an audio side channel to my work that would be more like a human assistant. Think of it as a ChatGPT-enabled Siri that is fully integrated into my IDE.

Here’s a script of what that might look like (NOTE: the first draft of this script was generated by ChatGPT, which I am pointing out because I think it’s interesting in this case. I edited it to fit the article better.)

Programmer: “I want to do a binary search in Python. Can you help?”

AI assistant: “Sure. One way to start is to define a function called binary_search that takes in a sorted list and a target element. Inside the function, you can initialize variables called left and right to the beginning and end of the list, respectively.”

Programmer: “Okay, I’ve defined the function. What’s the next step?”

AI assistant: “Next, you can use a while loop to iterate as long as the left index is less than or equal to the right index. Inside the while loop, you can calculate the middle index by taking the average of the left and right indices, rounded down.”

Programmer: “Got it. And if the element at the middle index is the target element, I can return the index. What do I do if it isn’t the target element?”

AI assistant: “If the target element is less than the element at the middle index, you can set the right index to be the middle index minus one. If the target element is greater than the element at the middle index, you can set the left index to be the middle index plus one. This way, the while loop will continue until the target element is found or the left and right indices cross each other.”
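Following the assistant’s steps, the finished function might look like this minimal Python sketch. Returning -1 when the target is missing is my assumption, because the script doesn’t say what to return.

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    left, right = 0, len(items) - 1
    while left <= right:
        mid = (left + right) // 2  # "average" of the indices, rounded down
        if items[mid] == target:
            return mid
        if target < items[mid]:
            right = mid - 1  # target can only be in the left half
        else:
            left = mid + 1   # target can only be in the right half
    return -1  # left and right crossed: not found
```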

I would expect that the assistant would make incorrect assumptions or mistakes and then the programmer would clarify.

More importantly, when the programmer is programming, the AI assistant will still be making suggestions via autocomplete, but now is much more aware of the goal and so we’d expect the suggestions to be better.

The much bigger win will be when the assistant doesn’t wait for my requests, but interrupts me to help me when I am doing something wrong. To continue the binary_search example, if I set left to the middle index (off by one) then the assistant would let me know my mistake via audio (like a human pair would).
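This Python sketch shows the off-by-one mistake described above and why a pair would flag it. The `max_steps` guard is an addition so the demonstration halts. It isn’t part of binary search.

```python
def buggy_binary_search(items, target, max_steps=100):
    """Binary search with an off-by-one bug: left = mid instead of mid + 1.

    Returns None if the loop hasn't finished after max_steps iterations,
    which is how this demo reports "stuck" instead of hanging forever.
    """
    left, right = 0, len(items) - 1
    for _ in range(max_steps):
        if left > right:
            return -1  # target not present
        mid = (left + right) // 2
        if items[mid] == target:
            return mid
        if target < items[mid]:
            right = mid - 1
        else:
            left = mid  # BUG: should be mid + 1, or the window can stop shrinking
    return None  # ran out of steps: the loop was stuck
```

When left and right are adjacent, mid rounds down to left, so `left = mid` leaves the window unchanged. Searches that only ever move `right` still succeed, which is part of why the bug is easy to miss.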

Just like in Assistance Oriented Programming, I think the key is to get intent into Copilot as early as possible.

Addendum

This example is simple, but I generated lots of interesting scripts in ChatGPT where the programmer and assistant collaborated on:

  1. Testing the binary search
  2. Writing quicksort together (for this one, I asked ChatGPT to have the assistant make incorrect assumptions that the programmer then corrects)
  3. Building a burndown chart in a web-based bug tracking program

They were all interesting, but I didn’t include them because that isn’t the point of the article.