
Using Market Data in the Python Net Worth Estimator

When I first made the python version of the net worth spreadsheet, I used a function like this:

def netWorthByAge(
    ages,
    savingsRate = 0.18,
    startingNetWorth = 10000,
    startingSalary = 40000,
    raises = 0.025,

    marketReturn = 0.06,
    inflation = 0.02,

    retirementAge = 65
  ):
    ...  # function body omitted here

You could pass in parameters, but only constants. This is how the spreadsheet works as well—each cell contains a constant.

The first thing to do to make it more flexible is to use a lambda instead for the marketReturn parameter:

marketReturn = (lambda age: 0.06),

Then, when you use it, you need to call it like a function:

netWorth = netWorth * (1 + marketReturn(age - 1)) + savings

We use last year’s market return to grow your current net worth and then add in your new savings.

The default function is

(lambda age: 0.06)

This just says that 0.06 is the return at every age, so it’s effectively a constant.
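To make the change concrete, here is a minimal sketch of how the yearly loop might look once marketReturn is a function. This is an illustration rather than the code in the repo: the function name net_worth_by_age_sketch, the loop body, and the choice to save a flat share of that year's salary are assumptions, and retirement and inflation are left out to keep it short.

```python
def net_worth_by_age_sketch(
    ages,
    savingsRate=0.18,
    startingNetWorth=10000,
    startingSalary=40000,
    raises=0.025,
    marketReturn=(lambda age: 0.06),
):
    """Return one net worth value per age (accumulation years only)."""
    netWorth = startingNetWorth
    salary = startingSalary
    result = [netWorth]  # net worth at the first age is the starting value
    for age in ages[1:]:
        savings = salary * savingsRate
        # Grow last year's balance by last year's return, then add savings.
        netWorth = netWorth * (1 + marketReturn(age - 1)) + savings
        salary = salary * (1 + raises)
        result.append(netWorth)
    return result


constant = net_worth_by_age_sketch(list(range(25, 30)))
# A return that depends on age: 8% before 27, 4% from 27 on (made-up numbers).
varying = net_worth_by_age_sketch(
    list(range(25, 30)),
    marketReturn=lambda age: 0.08 if age < 27 else 0.04,
)
```

Because the default is itself a function, callers that don't care about the market can ignore the parameter, while callers that do can pass any function of age.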

But, instead, we could use historical market data. You can see this file that parses the TSV market data file and gives you a simple function to look up a historical return for a given year.
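A parser and lookup along those lines could be sketched as follows. The column names (Year, Return), the startingAge and default parameters, and the fallback to 6% for years outside the data are all assumptions made for this example; the real file and function may differ.

```python
import csv
import io


def read_mkt_data_sketch(tsv_file):
    """Parse TSV rows into {year: return}. Column names are assumptions."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return {int(row["Year"]): float(row["Return"]) for row in reader}


def mkt_return_sketch(mkt_data, age, startingYear, startingAge=25, default=0.06):
    """Look up the return for the calendar year in which you are `age`."""
    year = startingYear + (age - startingAge)
    # Fall back to a constant when the year is not in the data (an assumption).
    return mkt_data.get(year, default)


# Placeholder numbers for illustration only, not real market history.
sample_tsv = io.StringIO("Year\tReturn\n2000\t0.10\n2001\t-0.05\n2002\t0.07\n")
mkt_data = read_mkt_data_sketch(sample_tsv)
```

With this sample, mkt_return_sketch(mkt_data, 26, startingYear=2000) looks up the year 2001, since age 26 is one year after the assumed starting age of 25.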

Then, I just need to create a lambda that looks up a market rate based on the age and starting year:

import mktdata
import networth

# fromYear, toYear, ages, and savingsRate are assumed to be set earlier in the script
mktData = mktdata.readMktData()
for startingYear in range(fromYear, toYear, 5):
  scenario = networth.netWorthByAge(ages = ages,
    savingsRate = savingsRate,
    marketReturn = (lambda age: mktdata.mktReturn(mktData, age, startingYear = startingYear))
  )

And this will create a scenario (an array of doubles representing net worth) for each starting year in the simulation.
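One thing to watch with a lambda inside a loop: it reads startingYear when it is called, not when it is created. That is fine when the lambda is used immediately inside the same iteration, but if you ever collect the lambdas and call them later, they would all see the last starting year. The sketch below shows the difference; fake_lookup and make_market_return are hypothetical names used only for this illustration.

```python
def make_market_return(lookup, startingYear):
    """Return a function of age with startingYear fixed at creation time."""
    return lambda age: lookup(age, startingYear)


def fake_lookup(age, startingYear):
    # Stand-in for a real lookup: encodes its inputs so the result is easy to check.
    return startingYear + age / 1000


# Late binding: every lambda reads startingYear when called, after the loop ended.
late = []
for startingYear in range(1928, 1943, 5):
    late.append(lambda age: fake_lookup(age, startingYear))

# Bound at creation: each function remembers its own startingYear.
bound = [make_market_return(fake_lookup, y) for y in range(1928, 1943, 5)]

late_years = [int(f(0)) for f in late]
bound_years = [int(f(0)) for f in bound]
```

Here late_years comes out as the final year repeated three times, while bound_years keeps 1928, 1933, and 1938.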

You can download and run the code to play with it. Here’s a sample chart it generates:

There are lines for starting the simulation in 1928, 1933, 1938, 1943, 1948, 1953, and 1958. This gives you an idea of the expected range of possibilities.

How Programmers Can Beat the Market

In my last few posts I’ve been trying to model net worth over time as a function of savings. In my last post, I used real historical market data instead of a constant 6%. Looking at a few scenarios, the market returned more like an average of 9.5% over those time periods.

I still think you should use 6% in your plans (if it’s actually 9.5% in the next 60 years, then that’s good news—you can make adjustments every decade if you are way ahead of plan).

Reminder: I am not a financial advisor and this is not advice. I don’t know anything about your personal situation. Talk to a fiduciary if you want advice.

But, you might think: “I bet I can beat the market—I understand tech/bitcoin/stonks better than most people.”

It’s not likely.

S&P keeps a scorecard of actively managed funds against their benchmarks. In a single year, active funds may do OK against the market, but go to page 9 and look at their longer term performance. 75% don’t beat the S&P 500 over 5 years, and 94% don’t beat it over 20 years.

These are funds with professionals with a staff who spend all day, every day thinking about this and are paid based on performance. You can beat 94% of them by just putting your money in an index fund.

But, then what do you do with all of that free time?

To beat the market, remember that you are also a player in a market. If you work full-time, you are in the labor market. You could also create products and sell them (in the market).

Look at your current net worth. To pick a random number, let’s say it’s currently $100k. If the market returns 6%, you’ll have $6k more at the end of the year. If you try to beat the market by picking stocks, and get 10%, you have made $4k more. Let’s say your net worth is $500k. Working to get that 10% return will get you an extra $20k.

Is this really the best way to make an extra $4-20k? It’s not a sure thing—you might not beat the market (like over 50% of active managers each year). You could lose money.

So, instead, invest in yourself.

To “invest” in the labor market, you could take courses to make yourself worth more and then either ask for a raise or change jobs. You could raise your profile with an open-source project, writing, or by giving talks. You do not need to be “famous” to do this. Your projects or work don’t need a million followers. You could beat the market with a few dozen.

Outside of your job, you could seek side-income. If you make $100/hour, you only need 40 hours of consulting (less than 1 hour/week) to “beat the market”. Technically, you beat the market in your first hour.

Or with your first sale.

And there aren’t a lot of costs to consulting, e-books, or software (other than your own time), and at least you know that the time spent will net a positive return. Even if you make no sales, it’s pretty likely that you have made yourself more valuable.

If you spend 40 hours and make $4k in the market, that’s a one-time effect. You haven’t made yourself more valuable, and it’s unlikely you can do this year after year. If you get a $4k raise, your salary is $4k higher the next year too. Beating the market is automatic and you can build on it.

More good news: the less money you have, the more “return” you can make this way. If you have $10k in the bank, making $10k on the side doubles your net worth. Getting a $5k raise is like a 50% return.

You can’t beat that picking stocks.
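The arithmetic in this post is easy to check with a few lines of Python. The function names below are made up for this sketch; the numbers are the ones used above (6% vs. 10%, $100/hour, a $5k raise on $10k of net worth).

```python
def extra_from_beating_market(netWorth, marketRate=0.06, yourRate=0.10):
    """Extra dollars earned in one year by getting yourRate instead of marketRate."""
    return netWorth * (yourRate - marketRate)


def hours_to_match(extraDollars, hourlyRate=100):
    """Hours of paid work needed to earn the same extra dollars."""
    return extraDollars / hourlyRate


def return_equivalent(extraIncome, netWorth):
    """Extra income expressed as a fraction of current net worth."""
    return extraIncome / netWorth


small = extra_from_beating_market(100_000)          # about 4,000
large = extra_from_beating_market(500_000)          # about 20,000
hours = hours_to_match(small)                       # about 40
raise_as_return = return_equivalent(5_000, 10_000)  # 0.5
```

Changing hourlyRate or netWorth shows how quickly the comparison tilts toward side income when your net worth is small.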

Adding Market Data to the Net Worth Spreadsheet

I found some historical market data on an NYU business professor’s home page. I put the first four columns on a new sheet in the Google Sheets version of the net worth estimator.

I also added a sample portfolio that you can play with. It’s 70% stocks and then a mix of bonds.

Then, on the main sheet, I use the portfolio return column instead of the fixed 6% return.
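The portfolio return column is just a weighted average of the asset returns for each year. Here is a small sketch of that calculation. Only the 70% stock weight comes from the sheet; the asset names, the split between the two bond types, and the one-year returns are placeholders.

```python
def portfolio_return(weights, returns):
    """Weighted average of asset returns for one year."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[asset] * returns[asset] for asset in weights)


# 70% stocks as in the sample portfolio; the bond split is an assumption.
weights = {"stocks": 0.70, "tbills": 0.10, "tbonds": 0.20}
# Placeholder returns for one year, not real data.
one_year = {"stocks": 0.10, "tbills": 0.02, "tbonds": 0.04}
blended = portfolio_return(weights, one_year)  # about 0.08
```

In the spreadsheet, the same formula is applied to every historical row, which produces the column that replaces the fixed 6%.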

Historically, a portfolio like that returned more than 6%, more like 9.5% on average. I showed three lines: one for starting in 1928, one for 1948, and one for 1958. Since I am trying to simulate 60 years, that’s about as late as I can start. But, for planning purposes, being conservative is still a good idea in my opinion (which is worthless as I am not a financial advisor and this is not advice).

There are many things wrong with this model that I’ll address soon. The main issue is that the post-retirement spending model seems way too low. I am also using a constant inflation rate instead of historical data. Finally, wages have not necessarily kept up with inflation, and certainly in down market years, we could expect wage freezes or temporary unemployment.

All of these things are a lot easier to model in python.

FIRECalc only simulates net worth after retirement, so it can do many more simulations (because of the shorter duration). Still, there are more than 40 possible scenarios in this data.

Next, I’ll add this data to the python version and try to draw more scenarios.

FIRECalc

In my net worth estimation spreadsheet, I used 6% as a default market return and recommended that you keep it conservative. If the market happens to be better than that, that’s good news. If you estimate it too high, then you might not save enough.

Another way to model this is to use historical market returns year-by-year.

The site FIRECalc does this for understanding post-retirement spending vs. expected market growth. It takes your portfolio size and starting expenses as input and then draws a line on a graph for each possible retirement year using real market data. You get an idea of what would have happened in all possible historical scenarios and what percent of the time you would not have had enough in your portfolio.
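The general idea can be sketched in a few lines of Python. This is a loose illustration of the approach described above, not FIRECalc's actual algorithm: the order of spending and growth within a year, the fixed 2% inflation, and the toy return series are all assumptions.

```python
def survives(returns, portfolio, expenses, inflation=0.02):
    """True if the portfolio never runs out over the given sequence of returns."""
    for r in returns:
        portfolio -= expenses          # spend at the start of the year
        if portfolio < 0:
            return False
        portfolio *= 1 + r             # then apply that year's return
        expenses *= 1 + inflation      # keep spending constant in real terms
    return True


def failure_rate(history, years, portfolio, expenses):
    """Fraction of possible start years whose window would have run out of money."""
    windows = [history[i:i + years] for i in range(len(history) - years + 1)]
    failures = sum(not survives(w, portfolio, expenses) for w in windows)
    return failures / len(windows)


# Made-up returns, not real market history.
toy_history = [0.10, -0.30, -0.20, 0.05, 0.15, 0.08, -0.10, 0.12]
rate = failure_rate(toy_history, years=4, portfolio=100_000, expenses=26_000)
```

Each window plays the role of one line on the graph, and the failure rate is the share of lines that hit zero before the end.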

If you want to do a more sophisticated analysis, there are advanced versions on the site. For example, it assumes constant (inflation-adjusted) expenses, but if you want to do something more sophisticated than that, it offers a few spending models. You can also adjust your portfolio stocks v. bonds balance.

Doing this kind of analysis is the kind of thing that’s fairly easy to do in a program, but not as easy to do in a spreadsheet. It’s not impossible, but you would need a couple of columns for each scenario. In my version, to show 30 different saving scenarios, I need 60 columns, plus a few columns for the actual historical data.

I’ll work on updating the spreadsheet to show a few scenarios to give you an idea.

The Net Worth Spreadsheet in Vanilla Python

This article is based on the spreadsheet I made to estimate net worth over time.

I think that Excel is basically a programming language, and I have an interest in bridging the gap from it to more traditional languages. And I needed an excuse to get matplotlib working on my machine.

Luckily, procrastination worked in my case. A few months ago, numpy was not yet working on M1 Macs, but as of python 3.9.4, it installs normally using pyenv and pip.

I started a GitHub repo to hold the Excel spreadsheet and python port — I’ll be posting more ports to various frameworks. I like it as a simple example because it has conditionals, loops, arrays and is a pretty useful thing to know.

It’s also a good starting point for learning more programming. Excel is great, and there’s a lot you can do, but with the python version, I could add the following features.

  1. Instead of using constants for the various inputs (like market rate), use a lambda and define different ways the market could move rather than a constant
  2. Do similar things with spending models in retirement, inflation, etc.
  3. Make it easier to have lumpier spending/saving: for example, aggressive spending in youth, then a period where you might have children and buy a house, then a house sale in the future, etc.
  4. Make it possible to show many more scenarios at once (like the effect of 30 historical market curves on your plan)

To be fair, all of this is very possible in Excel. It’s just a lot easier in python.
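As a rough sketch of what items 2 and 3 could look like, here the savings rate and one-time events are functions of age, just like the market return. Every age, rate, and dollar amount below is made up, the salary is held constant to keep the example short, and none of these function names come from the repo.

```python
def lumpy_savings_rate(age):
    """Savings rate as a function of age. The ages and rates are made up."""
    if age < 30:
        return 0.05   # aggressive spending in youth
    if age < 45:
        return 0.10   # children and a house
    return 0.25       # catching up later


def one_time_events(age):
    """Lump sums added to (or removed from) net worth at specific ages."""
    events = {35: -50_000, 60: 200_000}  # hypothetical house purchase and sale
    return events.get(age, 0)


def project(ages, salary=40_000, netWorth=10_000, marketReturn=lambda age: 0.06):
    """Return the net worth after each age in `ages`."""
    result = []
    for age in ages:
        savings = salary * lumpy_savings_rate(age) + one_time_events(age)
        netWorth = netWorth * (1 + marketReturn(age)) + savings
        result.append(netWorth)
    return result


curve = project(range(25, 66))
```

Doing the same thing in a spreadsheet means either a lookup table per input or nested IF formulas in every row, which is where the Python version starts to pay off.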

The Net Worth Spreadsheet Documentation

Yesterday, I posted a spreadsheet to help you explore how savings rate relates to eventual net worth in retirement.

It’s a very simple spreadsheet with a simple model, so I wanted to document it here in case you want to play with the formulas.

Here is a description of the inputs and how they are used.

Age/A2: Your age. You can set this to your current age, or a past age if you want to figure out a benchmark.

Year/B2: The year associated with the age in A2.

The sheet allows you to compare two scenarios. I1:I8 corresponds to Scenario A (the blue line) and K1:K8 corresponds to Scenario B (the red line).

Starting Net Worth/I2/K2: Your net worth at the age in A2

Market Return/I3/K3: The market return that will be applied to your net worth at the end of each year to determine the starting point for the next year. Obviously, this is simplistic. I recommend keeping this conservative. Use the nominal return (not the return after inflation).

Starting Salary/I4/K4: The gross salary (or income) at your age in A2

Inflation/I5/K5: The rate of inflation to use on expenses after retirement.

Raises/I6/K6: The expected % raise to your salary you will get each year. I set this a little over inflation. It is expected that your expenses and savings increase proportionally such that your saving rate stays constant.

Retirement Age/I7/K7: The age you plan to retire. At this point, you start to draw down against savings. Inflation is applied to the expenses every year. The starting expense amount is estimated based on your savings rate.

Savings Rate/I8/K8: The % of your gross income that you save.
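Put together, the inputs above describe a model that can be sketched in Python roughly as follows. The parameter names mirror the inputs, but the details are assumptions where the description is silent: in particular, the starting retirement expense is taken to be the unsaved share of the salary at retirement age, and the age range and default values are placeholders.

```python
def spreadsheet_model(
    startAge=25, endAge=90,
    startingNetWorth=10_000, marketReturn=0.06, startingSalary=40_000,
    inflation=0.02, raises=0.025, retirementAge=65, savingsRate=0.18,
):
    """Return {age: net worth at the start of that age}."""
    netWorth, salary, expenses = startingNetWorth, startingSalary, None
    rows = {}
    for age in range(startAge, endAge + 1):
        rows[age] = netWorth
        if age < retirementAge:
            cashFlow = salary * savingsRate            # saving while working
            salary *= 1 + raises
        else:
            if expenses is None:
                # Assumed: spending equals the unsaved share of the salary
                # you would have earned at retirement age.
                expenses = salary * (1 - savingsRate)
            cashFlow = -expenses                        # drawing down
            expenses *= 1 + inflation
        netWorth = netWorth * (1 + marketReturn) + cashFlow
    return rows


scenarioA = spreadsheet_model(savingsRate=0.18)
scenarioB = spreadsheet_model(savingsRate=0.10)
```

The two calls at the end correspond to the two columns of inputs, so comparing scenarioA and scenarioB at any age is the same comparison as the blue and red lines.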

Things this spreadsheet does not try to model:

  • Big expenses like houses and college tuition
  • Increasing expenses because of children
  • Volatile markets
  • Windfalls

Gerry Sussman on Biological Systems

Yesterday, I lamented that our computer systems are so short-lived, as opposed to biological systems (like humans) which routinely live lives twice as long as the longest-lived computer system.

I want to be clear that I am talking about the uptime/runtime of a mostly static system, not something like UNIX that is constantly maintained (unless there’s a 3B2 in Bell Labs somewhere running UNIX from the 70’s processing payroll or something).

I was reminded about this talk from Gerry Sussman titled We Really Don’t Know How to Compute.

My main takeaway has to do with types and correctness. Basically, that they are a dead-end. They are very useful (I use them!), but correctness isn’t an interesting goal for a long-lived system.

Sussman brings up biology—and one point he stresses is adaptability.

If adaptability is a key to long-livedness, then type-systems and correctness appear to be in opposition to that. As do runtime assertions. Imagine if humans “crashed” if they got unexpected input. Or what if humans simply refused to “boot” if they had a minor gene “incorrectness” (admittedly, they do refuse to boot with major gene defects).

Here’s an example of adaptivity in humans: we take food as input. We were designed by evolution to use whole plants and maybe some meat as optimal fuel.

However, a modern human can live on processed food, much more meat and dairy, oils, refined sugar, and many things that did not exist when we were designed. We don’t crash immediately on that input or even reject it. We get fat, we develop heart-disease, diabetes, etc.

In other words, we get feedback that the input is bad—eventually the system will end earlier than it would have with better input, but there are many examples of long-lived humans that have never had perfect input.

Is the Human system “correct”? How would you use Domain Driven Design and types to describe the input to this system?

The reality is that the input is essentially infinite and unknown, and what matters more is the adaptability and feedback.

Gerry’s talk has something to say about what kind of programming language you need for this (Spoiler Alert: Scheme), but generally more dynamic, more data-driven languages will work better.

Long-lived Computational Systems

Last month, I turned 51, which isn’t that old for a human. I’ll hopefully live a lot longer, but even now, my uptime is better than any computational system.

The closest I can come to finding something as old as me is Voyager. If you take away Voyager, I don’t even know if 2nd place is more than ten years old. It’s probably a TV.

Unlike anything we make today, Voyager was designed from the beginning to have a long life. The design brief said: “Don’t make engineering choices that could limit the lifetime of the space craft”. This led the engineers to make extensive use of redundancy and reconfigurability.

For more details, check out this presentation of the design of Voyager by Aaron Cummings.

In the 40+ (and still counting) years that Voyager has been running, having redundancies and being reconfigurable has extended its life and capabilities.

And it hasn’t hurt one bit that the software is not “type-safe”, “late-binding”, or “functional”. It doesn’t make use of a dependency framework, design patterns, or an IO monad. It is not declarative—it’s probably not even structural. None of these things contributed to its long life.

This is why I find many of the arguments over software development paradigms so boring. None of it has resulted in anything like Voyager, and even that hasn’t been replicated to even a minimal extent. In all of our big systems, the adaptability comes from the people still working on it. If we stopped maintaining these systems, they’d stop working.

The only upside is that our AI overlords will probably only run for a month or two before crashing on unexpected input.

Getting Podcast Stats from S3 Web Access Logs

I self-host the Write While True podcast using the Blubrry PowerPress plugin for WordPress and storing the .mp3 files in S3.

One downside of self-hosting is that you don’t have an easy way to get stats. Luckily, podcast stats aren’t great anyway, so whatever I cobble together is honestly not that different from what a host can give you. The only way to do better is to do something privacy-impairing with the release notes (tracking pixels) or to build a popular podcast player, and I’m not going to do either.

So, I use S3 and set it up to store standard web-access logs, so I have a log of each time the .mp3 file was downloaded. The main thing you need to do is get the log files local, and then you can count up the downloads by filtering with grep and counting with wc.

To download the logs, I use the AWS CLI (command-line) tool. Once you install it, you need to authenticate it with your account (see docs). Then you can use:

aws s3 sync <BUCKET URL> <local folder>

This brings down the latest logs.

The first thing you might notice is that there are a lot of log files. Amazon seems to create a new file rather than ever appending to an existing one. Each file has only a few log lines in it, in a typical web access log format. I store them all in a folder called logs.

I name the .mp3 file of every episode of Write While True in a particular way, so those lines are easy to find with

grep 'GET /writewhiletrue.loufranco.com/mp3s/Ep-' logs/*

This is every download of the .mp3 files. There’s no way to know if the user actually listened to it, but this is the best you can do.

It does overcount my downloads though, so I use grep -v to filter out some of the lines:

  1. Lines containing the IP address inside my house
  2. Lines that contain requests from the Podcast Feed Validator I use
  3. Lines that contain requests from the Blubrry plugin

The basic command is:

grep 'GET ...' logs/* | grep -v 'my ip address' | grep -v 'other filters' | wc -l

This will give you a count for all episodes, but if you want to do it by episode, you just need to grep for each episode number before counting.
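The per-episode count could also be done in one pass with a short Python function. This is only a sketch: it assumes the episode number follows the Ep- prefix directly in the file name (only the prefix is given above), and the sample log lines are simplified and made up, since real S3 access logs have many more fields.

```python
import re
from collections import Counter

# Assumes files are named "Ep-<number>..."; only the "Ep-" prefix is known.
EPISODE = re.compile(r"GET /writewhiletrue\.loufranco\.com/mp3s/Ep-(\d+)")


def downloads_by_episode(lines, exclude=()):
    """Count matching GET lines per episode, skipping lines with excluded text."""
    counts = Counter()
    for line in lines:
        if any(text in line for text in exclude):
            continue                      # same idea as grep -v
        match = EPISODE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Simplified, made-up log lines using documentation-only IP addresses.
sample = [
    '203.0.113.7 "GET /writewhiletrue.loufranco.com/mp3s/Ep-1.mp3 HTTP/1.1" 200',
    '203.0.113.7 "GET /writewhiletrue.loufranco.com/mp3s/Ep-2.mp3 HTTP/1.1" 200',
    '198.51.100.9 "GET /writewhiletrue.loufranco.com/mp3s/Ep-1.mp3 HTTP/1.1" 200',
    '198.51.100.9 "GET /writewhiletrue.loufranco.com/index.html HTTP/1.1" 200',
]
counts = downloads_by_episode(sample, exclude=["198.51.100.9"])
```

The exclude list plays the role of the chain of grep -v filters, so the home IP address and the validator and plugin requests would each be one entry.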

I’ll probably script up something to create a graph with python and matplotlib at some point. If I do, I’ll post the code and blog about it.

Single Case Enums in Swift?

I watched Domain Modeling Made Functional by Scott Wlaschin yesterday where he shows how to do Domain Driven Design in F#. He has the slides and a link to his DDD book up on his site as well.

One thing that stood out to me was the pervasive use of single case choice types, which is a better choice than a typealias for modeling.

To see the issue with typealias, consider modeling some contact info. Here is a Swift version of the F# code he showed.

typealias Email = String

The problem comes when you want to model phone numbers. You might do

typealias PhoneNumber = String

The Swift type checker doesn’t distinguish between Email and PhoneNumber (or even Email and String). So, I could make functions that take an Email and pass a PhoneNumber.

I think I would have naturally just done this:

struct Email { let email: String }
struct PhoneNumber { let phoneNumber: String }

But, Scott says that F# programmers frequently use single case choice types, like this

enum Email { case email(String) }
enum PhoneNumber { case phoneNumber(String) }

And looking at it now, I can’t see much difference. In Swift, it is less convenient to deal with the enum as Swift won’t know that there is only one case, and so I need to switch against it or write code that looks like it can fail (if I use if case let). I could solve this by adding a computed variable to the enum though

extension Email {
  var email: String {
    switch self {
      case .email(let email): return email
    }
  }
}

And now they are both equivalently convenient to use, but that computed variable feels very silly.

What it might come down to is what kind of changes to the type you might be anticipating. In the struct case, it’s easier to add more state. In the enum case, it’s easier to add alternative representations.

For example, in Sprint-o-Mat, I modeled distance as an enum with:

public enum Distance {
    case mile(distance: Double)
}

I knew I was always going to add kilometers (which I eventually did), and this made sure that I would know every place I needed to change.

If I had done

public struct Distance {
    let distance: Double
    let units: DistanceUnits
}

I could not be 100% sure that I always checked units, since it would not be enforced.

So, in that case, knowing that the future change would be to add alternatives made choosing an enum more natural. This is also reflected in the case name being mile and not distance. It implies what the future change might be.

Even so, I don’t think single case enums will replace single field structs for me generally, but they are worth considering.