Getting Podcast Stats from S3 Web Access Logs

I self-host the Write While True podcast using the Blubrry PowerPress plugin for WordPress and storing the .mp3 files in S3.

One downside of self-hosting is that you don’t have an easy way to get stats. Luckily, podcast stats aren’t great anyway, so whatever I cobble together is honestly not that different from what a host can give you, The only way to do better is to do something privacy-impairing with the release notes (tracking pixels) or by building a popular podcast player—neither of which I’m not going to do.

So, I use S3 and set it up to store standard web-access logs, so I have a log of each time the .mp3 file was downloaded. The main thing you need to do is get the log files local, and then you can count up the downloads by filtering with grep and counting with wc.

To download the logs, I use the AWS CLI (command-line) tool. Once you install it, you need to authenticate it with your account (see docs). Then you can use:

aws s3 sync <BUCKET URL> <local folder>

To bring down the latest logs.

The first thing you might notice is that there are a lot of log files. Amazon seems to create a new file rather than ever need to append to an existing one. Each file only has a few log lines in it in a typical web access log format. I store them all in a folder called logs.

I name the .mp3 file of every episode of Write While True in a particular way, so those lines are easy to find with

grep 'GET /' logs/*

This is every download of the .mp3 files. There’s no way to know if the user actually listened to it, but this is the best you can do.

It does overcount my downloads though, so I use grep -v to filter out some of the lines

  1. Lines containing the IP address inside my house
  2. Lines that contain requests from the Podcast Feed Validator I use
  3. Lines that contain requests from the Blubrry plugin

The basic command is:

grep 'GET ...' logs/* | grep -v 'my ip address' | grep -v 'other filters' | wc -l

This will give you a count for all episodes, but if you want to do it by episode, you just need to grep for each episode number before counting.

I’ll probably script up something to create a graph with python and matplotlib at some point. If I do, I’ll post the code and blog about it.