Command Line Analytics
data zsh unix

I’m starting a new Data Scientist position in a few weeks, and I decided to clean up a few things on my computer before starting the new job. While organizing some source code, I unearthed a tool I wrote last year that outputs some interesting analytics. Since switching to the Z shell a few years ago, I’ve set up my shell to automatically append every command I run to a zsh history file. The primary use of this file is quick history search, so that previous commands can be found and re-executed. It’s also sometimes useful to pipe history to ack or grep to find a complex command run months ago.

An unexpected use case of the zsh history file is tracking personal analytics. You can look at the ten most frequent commands without much effort in bash or zsh with this history one-liner (in zsh, use history 1 instead of history, since a bare history lists only the most recent entries):

$ history | awk '{a[$2]++}END{for(i in a){print a[i] " " i}}' | sort -rn | head -10

Although the output of this command is interesting, there is actually quite a bit more information stored within history. For example, look at the last 25,000 commands I’ve run on my personal computer. Each point is the total number of commands executed in the shell on a given day:

Most of my work happens on remote servers, but I’ve found that the shell history on my personal machine is a good proxy for my collective work habits. As the above plot shows, my command usage has varied substantially over the last few months. The number of commands I run each day has a median of 109 and a range of [0, 774) commands/day, about 3-fold fewer commands than I run on my remote servers each day. The most obvious anomaly in my command frequency is in March, when I took a ski vacation. You can also see that I wasn’t running many commands in the weeks flanking that vacation: I was busy shutting down projects before the trip and getting back into work after it. Between April and June there is also some nice phasing corresponding to a period of 5-6 days. The number of commands I run gradually increases through the week, starting on Monday and peaking on Friday or Saturday before dropping on Sunday. June, July, and August were more variable because I was traveling.
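As a rough illustration of how daily counts like these could be computed, here is a minimal sketch. It is not the tool mentioned above. It assumes the history file was written with zsh's EXTENDED_HISTORY option, so that each entry starts with ": <epoch seconds>:<duration>;<command>". The function name daily_counts is hypothetical, and days are bucketed in UTC rather than local time.

```shell
# Count commands per UTC day from a zsh history file (EXTENDED_HISTORY format).
# daily_counts is a hypothetical helper name, not part of zsh.
# Output: "<epoch seconds of UTC midnight> <number of commands>", oldest day first.
daily_counts() {
  awk -F'[:;]' '
    # Only lines that start a history entry carry a timestamp;
    # continuation lines of multi-line commands are skipped.
    /^: [0-9]+:[0-9]+;/ {
      day = int($2 / 86400)      # $2 is the epoch timestamp
      count[day]++
    }
    END { for (d in count) print d * 86400, count[d] }
  ' "$@" | sort -n
}
```

Something like daily_counts ~/.zsh_history would then print one line per day (the file path depends on your HISTFILE setting). The first column can be turned into a calendar date with date -d @SECONDS on GNU systems or date -r SECONDS on BSD and macOS.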

Another interesting analysis is to look at the same data but plotted by hourly frequency:

I was surprised at the hourly distribution in the above plot. I would have guessed that the frequency of commands I run on my personal computer would dip in the middle of the day, but the data is surprisingly consistent from 6am to 6pm. I avoid working in the evening and I’m almost always sleeping from 11pm to 4am, so the drop in command frequency makes sense across this time interval.
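An hourly breakdown can be sketched in the same way, again assuming the EXTENDED_HISTORY entry format (": <epoch seconds>:<duration>;<command>"). The function name hourly_counts and its optional whole-hour UTC offset argument are hypothetical, and a fixed offset ignores daylight saving time, so treat the result as approximate.

```shell
# Count commands per hour of day from a zsh history file (EXTENDED_HISTORY format).
# hourly_counts is a hypothetical helper name.
# $1: history file
# $2: optional whole-hour offset from UTC (default 0), e.g. -5; ignores DST
hourly_counts() {
  awk -F'[:;]' -v offset="${2:-0}" '
    /^: [0-9]+:[0-9]+;/ {
      # Seconds since UTC midnight -> hour, shifted by the offset and wrapped to 0..23
      hour = (int(($2 % 86400) / 3600) + offset + 24) % 24
      count[hour]++
    }
    # Print all 24 hours so that empty hours show up as 0
    END { for (h = 0; h < 24; h++) printf "%02d %d\n", h, count[h] }
  ' "$1"
}
```

Printing every hour, including the empty ones, keeps the output easy to plot as a fixed 24-bin histogram.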

One helpful aspect of analyzing command line history is that I can see what commands I frequently execute and then build aliases and shortcuts for the especially verbose commands. Here are all the commands that I’ve run more than 50 times:

The top 6 commands dominate in frequency, with ls and cd comprising a full 25% of the total commands executed. It’s amazing that these two commands account for such a large fraction of the data.[1] Most of the other top commands are fairly terse, but I decided to shorten several, such as aliasing python to p and xelatex to xlx.
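A list like this, along with each command's share of the total, could be produced along the following lines. This is a sketch under the assumption that the history file uses the EXTENDED_HISTORY entry format (": <epoch seconds>:<duration>;<command>"). The function name frequent_commands and its threshold argument are hypothetical, and only the first word of each entry is counted, so commands inside pipelines are not counted separately.

```shell
# List commands run more than a given number of times, with their share of the total.
# frequent_commands is a hypothetical helper name.
# $1: history file
# $2: threshold, exclusive (default 50)
# Output: "<count> <percent of all commands> <command>", most frequent first.
frequent_commands() {
  awk -v min="${2:-50}" '
    /^: [0-9]+:[0-9]+;/ {
      sub(/^: [0-9]+:[0-9]+;/, "")   # drop the timestamp prefix; fields are re-split
      if (NF == 0) next              # skip empty entries
      count[$1]++                    # $1 is now the first word of the command
      total++
    }
    END {
      for (c in count)
        if (count[c] > min)
          printf "%d %.1f%% %s\n", count[c], 100 * count[c] / total, c
    }
  ' "$1" | sort -rn
}
```

Commands that show up near the top with long names are the natural candidates for an alias.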

Of all the Python commands that I run, pretty-printing JSON is by far the most frequent, so I decided to wrap the full command in a short shell function (an alias cannot take arguments, so "$1" would not expand to the file name):

# original example command
$ cat infile.json | python -mjson.tool

# function
$ ppj() { cat "$1" | python -mjson.tool; }

Now it’s also easy to inspect the data structure of a huge JSON file by examining the last few lines with:

$ ppj infile.json | tail -15

  1. Note that zsh has a nice AUTO_CD option (enabled with setopt AUTO_CD) that eliminates much of the need for cd. Unfortunately, old habits die hard and I still have a hard time weaning myself off of cd.