Algorithmic Code Needs A Lot of Comments

I recently read an online debate between Bob Martin and John Ousterhout about the best way to write (or re-write) the prime number algorithm in The Art of Computer Programming by Donald Knuth. Having read the three implementations, I think Knuth’s original is the best version for this kind of code and the two rewrites lose a lot in translation.

My personal coding style is closer to Ousterhout’s, but that’s for application code. Algorithmic code, like a prime number generator is very different. For most application code, the runtime performance will be fine for anything reasonable, and the most important thing to ensure is that the code is easy to change, because it will change a lot. Algorithmic code rarely changes, and the most likely thing you would do is a total rewrite to a better algorithm.

I have had to maintain a giant codebase of algorithmic code. In the mid to late 2000’s, I worked at Atalasoft, which was a provider of .NET SDKs for Photo and Document Imaging. We had a lot of image processing algorithms written in C/C++.

In the six or so years I was there, this code rarely changed. It was extensively tested with a large database of images to make sure it didn’t when we updated dependencies or the compiler. The main two reasons why we would change this code was to (a) fix an edge case or (b) improve performance.

The most important thing that this code could have that would help is a lot of documentation. It was very unlikely that the coder would know what this code was doing. It probably would have been years since its last change, and unless it was our CTO making the change, there is no way anyone could understand it quickly just from reading the code. We needed this code to run as fast as possible, and so it probably used C performance optimization tricks that obfuscated the code.

Both Ousterhout and Martin rewrote the code in ways that would probably make it slower at the extremes, which is not what you want to do with algorithmic code. Martin’s penchant for decomposition is especially not useful here.

Worse than that, they both made the code much harder to understand by removing most of the documentation. I think they both admitted that they didn’t totally understand the algorithm, so I’m not sure why they think reducing documentation would be a good idea.

To be fair, this code is just not a good candidate to apply either Ousterhout’s or Martin’s techniques, which are more about API design. Knuth described the goal of his programming style this way:

If my claims for the advantages of literate programming have any merit, you should be able to understand the following description more easily than you could have understood the same program when presented in a more conventional way.

In general, I would not like to have to maintain code with Knuth’s style if the code needed to be changed a lot. But, for algorithmic code, like in his books or in an image processing library, it’s perfect.