Author Archives: Lou Franco

Long-lived Computational Systems

Last month, I turned 51, which isn’t that old for a human. I’ll hopefully live a lot longer, but even now, my uptime is better than that of any computational system.

The closest I can come to finding something as old as me is Voyager. If you take away Voyager, I don’t even know if 2nd place is more than ten years old. It’s probably a TV.

Unlike anything we make today, Voyager was designed from the beginning to have a long life. The design brief said: “Don’t make engineering choices that could limit the lifetime of the spacecraft”. This led the engineers to make extensive use of redundancy and reconfigurability.

For more details, check out this presentation of the design of Voyager by Aaron Cummings.

In the 40+ (and still counting) years that Voyager has been running, having redundancies and being reconfigurable has extended its life and capabilities.

And it hasn’t hurt one bit that the software is not “type-safe”, “late-binding”, or “functional”. It doesn’t make use of a dependency framework, design patterns, or an IO monad. It is not declarative—it’s probably not even structured. None of these things contributed to its long life.

This is why I find many of the arguments over software development paradigms so boring. None of it has resulted in anything like Voyager, and even Voyager hasn’t been replicated to any meaningful extent. In all of our big systems, the adaptability comes from the people still working on them. If we stopped maintaining these systems, they’d stop working.

The only upside is that our AI overlords will probably only run for a month or two before crashing on unexpected input.

April Blog Roundup

This month I realized that this blog is easier to keep up with if I just document my projects. I released episodes 4, 5, 6, and 7 of my podcast about writing. I also wrote articles about how I am self-hosting it, using S3 for the media, and how I get simple stats.

I also wrote an article about Bicycle, an open-source library I am working on with a couple of friends, and I’ve been writing articles about WatchKit on App-o-Mat. And this post itself is an example of just documenting my projects.

Most of the articles I write for this space are about software development and processes.

  • In Defense of Tech Debt encourages you to just think of tech debt as a cost which might be acceptable.
  • But, I think Tech Debt Happens to You most of the time because of dependencies.
  • And then in Timing Your Tech Debt Payments, I compare payments to servicing interest and paying down principal and offer a best time to do the latter.
  • In It’s Bad, Now What?, I talk about pre-planning actions for when monitoring shows problems.
  • In Assume the Failure, I recommend framing all risks as your own failures, not external ones, so that you can personally mitigate them.
  • In Mitigating the Mitigations, I show how you can’t wait until a risk materializes to do your mitigation plan. It might be something you need to do somewhat in parallel.

And, I wrote some articles on gaining expertise.

I have been thinking about the great works in software and software writing as I think about where I want to spend my time. I think there’s an interesting cycle of making -> tool making -> making with the tool, where the result is content in a new medium.

Getting Podcast Stats from S3 Web Access Logs

I self-host the Write While True podcast using the Blubrry PowerPress plugin for WordPress and storing the .mp3 files in S3.

One downside of self-hosting is that you don’t have an easy way to get stats. Luckily, podcast stats aren’t great anyway, so whatever I cobble together is honestly not that different from what a host can give you. The only way to do better is to do something privacy-impairing with the release notes (tracking pixels) or to build a popular podcast player—neither of which I’m going to do.

So, I set up S3 to store standard web-access logs, which gives me a record of each time an .mp3 file was downloaded. The main thing you need to do is get the log files onto your local machine; then you can count up the downloads by filtering with grep and counting with wc.

To download the logs, I use the AWS CLI (command-line) tool. Once you install it, you need to authenticate it with your account (see docs). Then you can use:

aws s3 sync <BUCKET URL> <local folder>

to bring down the latest logs.

The first thing you might notice is that there are a lot of log files. Amazon seems to create a new file rather than ever append to an existing one. Each file has only a few log lines in a typical web-access-log format. I store them all in a folder called logs.

I name the .mp3 file of every episode of Write While True in a particular way, so those lines are easy to find with:

grep 'GET /writewhiletrue.loufranco.com/mp3s/Ep-' logs/*

This is every download of the .mp3 files. There’s no way to know if the user actually listened to it, but this is the best you can do.

It does overcount my downloads, though, so I use grep -v to filter out some of the lines:

  1. Lines containing the IP address inside my house
  2. Lines that contain requests from the Podcast Feed Validator I use
  3. Lines that contain requests from the Blubrry plugin

The basic command is:

grep 'GET ...' logs/* | grep -v 'my ip address' | grep -v 'other filters' | wc -l

This will give you a count for all episodes, but if you want to do it by episode, you just need to grep for each episode number before counting.
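To break that down by episode, a small loop works. Here’s a minimal sketch, assuming the same log layout and file naming as above (the count_episode function name is my own, just for illustration):

```shell
# Count downloads of one episode's .mp3 from S3 web-access logs.
# $1 = episode number, $2 = folder of log files
count_episode() {
  grep "GET /writewhiletrue.loufranco.com/mp3s/Ep-$1" "$2"/* 2>/dev/null | wc -l
}

# Print a count for episodes 1 through 7
for ep in 1 2 3 4 5 6 7; do
  echo "Episode $ep: $(count_episode "$ep" logs) downloads"
done
```

In real use you’d also insert the grep -v filters described above before the wc -l so your own traffic isn’t counted.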

I’ll probably script up something to create a graph with python and matplotlib at some point. If I do, I’ll post the code and blog about it.

Single Case Enums in Swift?

I watched Domain Modeling Made Functional by Scott Wlaschin yesterday, in which he shows how to do Domain-Driven Design in F#. He has the slides and a link to his DDD book on his site as well.

One thing that stood out to me was the pervasive use of single-case choice types, which are a better choice than a typealias for modeling.

To see the issue with typealias, consider modeling some contact info. Here is a Swift version of the F# code he showed.

typealias Email = String

The problem comes when you also want to model phone numbers. You might do:

typealias PhoneNumber = String

The Swift type checker doesn’t distinguish between Email and PhoneNumber (or even between Email and String). So, I could write a function that takes an Email and pass it a PhoneNumber.
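To make the problem concrete, here’s a minimal sketch (the send(to:) function is hypothetical, and the aliases are repeated so the snippet is self-contained):

```swift
typealias Email = String
typealias PhoneNumber = String

// A hypothetical function that should only accept email addresses
func send(to email: Email) -> String {
    return "sent to \(email)"
}

let phone: PhoneNumber = "555-1234"
// This compiles without complaint: the type checker treats both
// aliases as plain String, so nothing stops the mix-up.
let result = send(to: phone)
```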

I think I would have naturally just done this:

struct Email { let email: String }
struct PhoneNumber { let phoneNumber: String }

But, Scott says that F# programmers frequently use single-case choice types, like this:

enum Email { case email(String) }
enum PhoneNumber { case phoneNumber(String) }

And looking at it now, I can’t see much difference. In Swift, the enum is less convenient because Swift won’t know that there is only one case, so I need to switch against it or write code that looks like it can fail (if I use if case let). I could solve this by adding a computed variable to the enum, though:

var email: String {
  switch self {
  case .email(let email): return email
  }
}

And now they are both equivalently convenient to use, but that computed variable feels very silly.

What it might come down to is what kind of changes to the type you might be anticipating. In the struct case, it’s easier to add more state. In the enum case, it’s easier to add alternative representations.

For example, in Sprint-o-Mat, I modeled distance as an enum with:

public enum Distance {
    case mile(distance: Double)
}

I knew I was always going to add kilometers (which I eventually did), and this made sure that I would know every place I needed to change.
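Adding that case later might look like this sketch (the kilometer case and the inMiles property are my own illustration, not the actual Sprint-o-Mat code):

```swift
public enum Distance {
    case mile(distance: Double)
    case kilometer(distance: Double)

    // The compiler now forces every switch over Distance to
    // handle both cases, so no conversion site can be missed.
    public var inMiles: Double {
        switch self {
        case .mile(let d): return d
        case .kilometer(let d): return d / 1.609344
        }
    }
}
```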

If I had done

public struct Distance {
    let distance: Double
    let units: DistanceUnits
}

I could not be 100% sure that I always checked units, since it would not be enforced.

So, in that case, knowing that the future change would be to add alternatives made choosing an enum more natural. This is also reflected in the case name being mile and not distance. It implies what the future change might be.

Even so, I don’t think single case enums will replace single field structs for me generally, but they are worth considering.

Write About What You are Doing

On January 6th this year, I finished reading The Practice by Seth Godin, which is a series of arguments trying to get you to ship every day. When I was done, I was convinced. I have now gone over 3 months without missing a day.

At the beginning, I had a pent-up list of ideas that had just been sitting in my brain. I brainstormed a bunch of them into a topic list, and I am making my way through it.

This has been a good source of blog posts, but honestly, how many of these can I write?

As I look through my recent and upcoming topics, I am starting to see a trend: I am increasingly writing about things I’m actively working on. Each Monday, I publish a podcast episode, so that’s one post per week that I don’t have to “think up”. About every two weeks, I give an update on App-o-Mat articles. And a couple of days ago, I talked about a new project, Bicycle (an open-source Swift library for modeling interdependent variables).

In these three cases, the thing I am doing is a lot more work than a blog post, but at least the blog post is easy to write. And, in the case of the podcast, I also documented my self-hosting setup in what will probably be three posts, and there was a post about podcast accessibility.

Even the things I am doing are a byproduct of something else. My App-o-Mat articles are lessons I learned from making Sprint-o-Mat. My podcast is about what I have learned by writing this blog—woah, full circle.

Since some of the great works of software were created to make the great works of software writing, I see that making begets tool-making begets making.

And even this post, which would have been impossible to write three months ago, is now easy.

Write While True Episode 7: Find Your Voice

Lately, I’m thinking a lot about what this podcast sounds like. I’m new to podcasting and I’m very aware that I have a lot to do to sound more natural, but that’s not exactly what I’m talking about.

Transcript

Use S3 to Serve Podcast Episodes

I started a podcast about a month ago, and for various reasons I decided to self-host it rather than use a podcast service. I am doing this mainly because I want the episodes to be available indefinitely, even if I stop making new ones, and I don’t want to pay for just hosting. I also don’t care about analytics, and I have the skills and desire to learn how to self-host.

I think this is the wrong choice for almost everyone who podcasts.

But, if you got this far, I will say that it’s probably right not to just put your .mp3 files on your web host. I haven’t really done the math, but these are large files, and if you get any kind of traffic, serving them will probably be expensive and possibly send you over your bandwidth caps.

I’ve decided that the minimum I need to do is to use S3. I think it’s probably technically correct to also use a CDN, but I’ll cross that bridge if I get more traffic.

(If you have no idea what S3 or a CDN is, I really recommend you do not go down this route.)

There are a lot of good guides out there for the specifics. I used these two:

In addition to setting up a bucket for your .mp3 files and artwork, I suggest you set up a separate bucket for logs and then send web access logs to that bucket. The AWS official docs are good to see how to do this.

By having the logs stored, you have enough to get some simple analytics. There are services that can read and graph the data in them.

I will post soon about how I scripted a simple way to get episode download counts.

Tech Debt Happens to You

In the original ANSI C library, there are a bunch of functions that use internal static variables to keep track of state. For example, strtok is a function that tokenizes strings. You call it once with the string, and then you call it with NULL over and over until you have read all of the tokens. To do this, strtok uses static variables to keep track of the string and an iterator into it. In early C usage, this was mostly fine, though you had to hope that any third-party library calls you made while iterating tokens weren’t also using strtok, because there could only be one iterator at a time.

But when threads were introduced to UNIX and C, this broke down fast. Now, your algorithms couldn’t live in background threads if they used strtok. This specific problem was solved with thread-local variables, but the pervasive use of global state inside of C-functions was a constant source of issues when multi-threading became mainstream.

The world was switching from desktop apps to web apps, so now a lot of your code lived in a multi-threaded back-end that serviced simultaneous requests. This was a problem because we took C libraries out of our desktop apps and made them work in CGI executables or NSAPI/ISAPI web-server extensions (similar to Apache mod_ extensions).

To make this work, we had to use third-party memory-allocation libraries because the standard malloc/free/new/delete implementations slowed down as you added more processors (from constant lock contention). Standard reference-counting implementations used normal ++ and --, which aren’t thread-safe, so we needed to buy a source-code implementation of the STL that we could alter to use InterlockedIncrement/InterlockedDecrement (which are atomic, lock-free, and thread-safe).

As the world changed around us, we could keep moving forward with these tech-debt payments.

Also, this was a slow-paced problem—strtok/malloc/etc. were written in the 70s and limped through the 90s. That’s actually not that bad.

But, the world doesn’t stop. Pretty soon, it was just too weird to implement back-ends as ISAPI extensions. So, you pick Java/SOAP because CORBA is just nuts, and well, that’s wrong because REST deprecates that, and then GraphQL deprecates that, and you picked Java, but were you supposed to wait for node/npm? Never mind what’s going on on the front-end as JS and CSS frameworks replace each other every 6 months. Even if you are happy with your choice, are you keeping your dependencies up to date, even through the major revisions that don’t follow Substitutable Versioning?

And I think that this, not the intentional debt that you take on or the debt you accumulate from cutting corners due to time constraints, is the main source of tech debt: the debt that comes with dependency and environment changes.

Being able to bring code into your project or build on a framework is probably the only thing that makes modern programming possible, but like mortgages, they come with constant interest payments and a looming balloon payment at some point.

There are some dependencies you can’t avoid, like the OS, language, and probably database, but as you go down the dependency list, remember to factor in the debt they inevitably bring with them.

Timing Your Tech Debt Payments

It’s impossible to ignore that developers have a visceral reaction against tech debt, even if they agree that it’s worth it. That’s because they are the ones who need to service the debt.

Tech debt is a cost similar to real-life debt like a mortgage. If you can use tech debt to bring forward revenue and growth, you can pay off the debt later.

But, until then, the interest must be paid.

So, when you are calculating the cost of taking on some debt, a factor in that calculation is how much future work is going to happen on that code. The more work you do, the more interest you pay. If you fix bugs or add features to debt-laden code, you are servicing the debt by making an interest payment. If you refactor, you are paying off principal, and future interest payments are lowered, but that only matters if there are going to be future interest payments.

If you have a system that works and doesn’t need any changes, the fact that it has tech debt doesn’t matter.

To carry the analogy forward, some mortgages have penalties for early payment. Paying off tech debt also has a penalty, usually in QA and system stability.

This is why my favorite time to pay off tech debt is just before a major feature is added to indebted code. You are trading away the looming interest payments (which are about to balloon), and the penalty is already being incurred, because you need to QA that whole area again anyway.