Background

For some reason, I had the dubious idea to watch Snakes On A Plane yesterday. If you haven’t seen the movie, the summary says it all:

An FBI agent takes on a plane full of deadly and poisonous snakes, deliberately released to kill a witness being flown from Honolulu to Los Angeles to testify against a mob boss.

As I struggled to pay attention throughout the movie, I started to wonder just how many snakes could possibly be on that plane? Then I started to wonder what would be the maximum number of snakes that could theoretically fit on the plane?

Estimation is integral to most of the things I do.[1] I routinely use back-of-the-envelope calculations in my work and in everyday situations to estimate how long a project will take or to eat cheeseburgers. Making accurate approximations is an invaluable skill, part art and part science. I’ve learned a lot about estimation simply by observing how other people work through these types of problems. There’s usually no right or wrong approach to making estimates. Accurate solutions can come from disparate strategies. This post is about some of the techniques I routinely use to make estimations and how I applied these techniques to estimate how many snakes could fit on a plane.

Estimation

My approach to this estimation problem showcases 4 main techniques which will make repeated appearances throughout the estimation. I’ve highlighted these techniques because they are generalizable to most estimation problems:

• Divide/Conquer
• Ratios
• Limits
• Properties of error

In the movie, the snakes on the plane[2] range in size from tiny to large. To simplify the problem, I wanted to treat a single snake as a fixed spherical volume. Here in New England, the majority of snakes are small, so for this problem, I used a golf ball as a proxy for the volume of a snake. Using a golf ball also allowed me to determine how accurate my estimation was because the size of a golf ball is easy to determine.[3]

The first thing I usually do in estimation problems is to break them apart into smaller sub-problems. I call this approach Divide and Conquer based on the algorithm design paradigm of the same name. I use Divide/Conquer because it’s conceptually easier to reason about smaller estimations. How far is it from Chicago to Coffee Club Island? I don’t know. It is easier for me to start by estimating how far it is from Chicago to Cleveland, or other familiar distances, and then use this information to construct a final estimate.[4] Following this logic, I divided the problem into 3 main sub-problems:

1. How many golf balls would fit in a shoebox?
2. How many shoe boxes would fit in an airplane seat?
3. How many airplane seats are in a 747 cabin?

The sub-problems I’ve chosen may seem odd, but I deliberately set them up as ratio estimations. Ratios are dimensionless and frequently remove unwieldy unit conversions. I prefer ratios because I find them easier to reason about.[5] The number of rows in a 747 cabin is much easier for me to estimate than the length of the cabin in meters. I’ve walked the aisle of a 747 many times, so my estimation in units of seats is more precise than if I were to try to make an arbitrary estimation in meters.

I attempted the first sub-problem by approximating the number of golf balls that would lie along each dimension of a shoe box. I estimated that a shoe box is between 7 and 14 golf balls long, 5 and 9 golf balls wide, and 2 and 4 golf balls high. These estimations are limits: not a single value, but a range of plausible values. I’ve bounded my approximations with a minimum and maximum value between which I feel reasonably confident the true value lies. One of the over-arching concepts of good estimation technique is not necessarily to approximate the correct answer, but rather to make estimations that are unlikely to be egregiously wrong. Estimation is less about finding the right answer and more about avoiding massive errors in logic and estimation.
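To make the idea of limits concrete, here is a minimal Python sketch that multiplies the low ends and the high ends of the three ranges quoted above. The dictionary layout and variable names are my own; only the six numbers come from the text.

```python
import math

# (low, high) limits from the text, in golf balls along each shoebox dimension.
limits = {"length": (7, 14), "width": (5, 9), "height": (2, 4)}

# Multiplying all the low ends (or all the high ends) bounds the total count.
fewest = math.prod(low for low, _ in limits.values())
most = math.prod(high for _, high in limits.values())

print(f"between {fewest} and {most} golf balls per shoebox")
```

The resulting range is wide, which is the point: the true value is very unlikely to fall outside it.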

Finding limits is a very powerful estimation technique, especially when the range of the limits is large. I worked in bioinformatics for many years and routinely needed to make estimations on extremely large and extremely small numbers. One trick I learned to exploit was to make estimations using the geometric mean of the limits. Imagine trying to estimate a value where the true answer is somewhere between 10 and 1000. Many people would be tempted to make an estimate of 500 by approximating the average of the limits. The problem with an estimate of 500 is that if the true value is 1000, I would be off by a factor of 2. If the true value was 10, I would be off by a factor of 50. A better approach is to use the geometric mean. The geometric mean is also easy to approximate by averaging the exponents of the limits: $10 = 10^1$ and $1000 = 10^3$, so the estimate is $10^2$. In this case, the geometric mean produces a more sensible estimation of 100. In the worst case I can only be off by a factor of 10 in either direction. The geometric mean is yet another application of ratios: if $i$ and $j$ are the limits, the estimate $x$ satisfies the cross-product property $\frac{i}{x} = \frac{x}{j}$, so $x = \sqrt{ij}$.
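The comparison between the two kinds of mean can be checked with a few lines of Python. This is only a sketch: the helper names `geometric_mean` and `worst_case_factor` are mine, and applying the geometric mean to each shoebox dimension is my assumption about how the ranges might be collapsed into single values.

```python
import math

def geometric_mean(low: float, high: float) -> float:
    """Geometric mean of two positive limits: the x with low/x == x/high."""
    return math.sqrt(low * high)

def worst_case_factor(estimate: float, low: float, high: float) -> float:
    """Largest factor the estimate can be off by if the truth sits at a limit."""
    return max(estimate / low, high / estimate)

low, high = 10, 1000
arithmetic = (low + high) / 2
geometric = geometric_mean(low, high)

print(arithmetic, worst_case_factor(arithmetic, low, high))  # off by ~50x at worst
print(geometric, worst_case_factor(geometric, low, high))    # off by 10x at worst

# Shortcut from the text: average the exponents of the limits.
exponent_average = (math.log10(low) + math.log10(high)) / 2
print(10 ** exponent_average)

# Assumption: collapse each shoebox range from the text to its geometric mean.
shoebox_limits = {"length": (7, 14), "width": (5, 9), "height": (2, 4)}
shoebox_points = {name: geometric_mean(lo, hi) for name, (lo, hi) in shoebox_limits.items()}
print({name: round(value, 1) for name, value in shoebox_points.items()})
```

Rounded to whole golf balls, the three shoebox values come out at 10, 7, and 3.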

I knew that the volume of a rectangular prism is the product of its width, height, and length. Using the point estimates in the table below, I estimated the number of golf balls in a shoe box, $B_s$, to be about 200:

$$B_s = 10 \times 7 \times 3 = 210 \approx 200$$

I then repeated the logic I used for the first sub-problem to generate estimates for the number of shoe boxes on an airplane seat and the number of seats in a 747 cabin. Here are all the estimations I made in the sub-problems:

| Sub-problem | Length | Width | Height |
|---|---|---|---|
| Golf balls in a shoebox | 10 | 7 | 3 |
| Shoeboxes in a seat | 3 | 3 | 6 |
| Seats in a Boeing 747 | 50 | 8 | 3 |
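Each row of the table is a product of three counts, and the rows multiply together like a unit conversion (golf balls per shoebox, shoeboxes per seat, seats per cabin). A minimal Python sketch of that chain, using the table values as I read them and variable names of my own choosing:

```python
import math

# Point estimates as read from the table: (length, width, height) counts.
estimates = {
    "golf balls per shoebox": (10, 7, 3),
    "shoeboxes per seat": (3, 3, 6),
    "seats per 747 cabin": (50, 8, 3),
}

# Each sub-problem is a dimensionless ratio: the product of its three counts.
ratios = {name: math.prod(dims) for name, dims in estimates.items()}

# The intermediate units cancel, leaving golf balls per cabin.
golf_balls_per_cabin = math.prod(ratios.values())

for name, value in ratios.items():
    print(f"{name}: {value}")
print(f"golf balls per 747 cabin: {golf_balls_per_cabin:.2e}")
```

The straight product of the nine table values is about $1.4 \times 10^7$, the same order of magnitude as the rounded figure quoted in the text.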

Chaining these sub-problems together as a unit conversion (golf balls per shoebox, shoeboxes per seat, seats per cabin) yielded a final estimate of $1.5 \times 10^7$ golf balls per 747 cabin.

By using many smaller estimations I am able to leverage statistics in my favor. I constructed my sub-problems so that they would produce approximations of roughly the same magnitude even though the units are different. I made nine total estimates with values ranging from 3 to 50. I did this because estimation error frequently follows a symmetrical distribution. In a series of estimates, the probabilities of underestimating and overestimating a quantity are often approximately equal. By linking a series of estimations together as I’m doing in this problem, estimation error can be mitigated because the underestimates and overestimates tend to cancel each other out in the final estimation. Choosing sub-problems where the estimations are roughly the same size keeps any one bad estimation from skewing the final estimation.
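The cancellation argument can be illustrated with a small simulation. This is a sketch under assumptions of my own, not a measurement: each of the nine estimates is off by a random multiplicative factor whose logarithm is normally distributed around zero, and the 30% typical error is a hypothetical figure chosen for illustration.

```python
import math
import random
import statistics

def simulate_chain(n_estimates: int, sigma: float, trials: int, seed: int = 0) -> float:
    """Median multiplicative error of a product of n noisy estimates.

    Assumption: each estimate is off by a factor exp(e), where e is drawn
    from a normal distribution centred on zero (symmetric in log space).
    """
    rng = random.Random(seed)
    factors = []
    for _ in range(trials):
        # Errors multiply in the product, so their logarithms add.
        log_error = sum(rng.gauss(0.0, sigma) for _ in range(n_estimates))
        # Fold over- and under-estimates into a single factor >= 1.
        factors.append(math.exp(abs(log_error)))
    return statistics.median(factors)

sigma = math.log(1.3)  # hypothetical: each estimate is typically off by ~30%
typical = simulate_chain(n_estimates=9, sigma=sigma, trials=20_000)
aligned = 1.3 ** 9     # all nine estimates off by 30% in the same direction

print(f"typical error factor with cancellation: {typical:.2f}")
print(f"error factor if nothing cancels:        {aligned:.2f}")
```

Under these assumptions the typical error of the chained product stays well below the error that would result if every estimate missed in the same direction.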

According to Boeing, the interior cabin volume of a 747 is 876 cubic meters. The diameter of a golf ball is 42.67 millimeters. Using these measurements and the density of close-packed spheres, $\frac{\pi}{3\sqrt{2}} \approx 0.74$, I can determine an accurate solution for how many golf balls would fit in a 747:

$$\frac{\pi}{3\sqrt{2}} \cdot \frac{876}{\frac{4}{3}\pi\,(2.1335\times 10^{-2})^{3}} \approx 1.59 \times 10^7$$
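The same calculation in Python, as a quick check. The constant names are mine; the cabin volume and ball diameter are the figures quoted above, and the comparison uses the $1.5 \times 10^7$ estimate from the sub-problems.

```python
import math

CABIN_VOLUME_M3 = 876.0                         # 747 interior cabin volume, from the text
GOLF_BALL_DIAMETER_M = 42.67e-3                 # golf ball diameter, from the text
PACKING_DENSITY = math.pi / (3 * math.sqrt(2))  # close-packed equal spheres, ~0.74

radius = GOLF_BALL_DIAMETER_M / 2
ball_volume = (4 / 3) * math.pi * radius ** 3   # volume of one golf ball in cubic meters

# Only the packing-density fraction of the cabin volume is actually filled by balls.
golf_balls = PACKING_DENSITY * CABIN_VOLUME_M3 / ball_volume

estimate = 1.5e7
relative_error = (golf_balls - estimate) / golf_balls

print(f"{golf_balls:.3g} golf balls")
print(f"estimate is low by {relative_error:.1%}")
```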

My estimation for the number of snakes on a plane was $1.50 \times 10^7$, compared to the true value that I worked out above, $1.59 \times 10^7$. Any time I can make an estimate that’s within an order of magnitude of the true answer, I consider it a win. My estimate was only off by about 900,000 golf balls, or roughly 6%. This is a very good approximation. The remarkable property of estimation is that it frequently involves almost no math. My solution only required that I know how to calculate the volume of a rectangular prism and be able to do simple multiplication.

If you’re reading this, Samuel L. Jackson, I’m available for Snakes On A Plane 2. If you should need me, I already have an idea for the plot.

1. Learning estimation is important. Sound estimation techniques should be a mandatory component of high school education.

2. In the movie, it’s explicitly stated that the plane is a Boeing 747.

3. I worked through this estimation in my notebook prior to calculating the real solution and writing this blog post.

4. Google Maps has no idea. Using code from Road Trip Algorithms and the Haversine formula, I calculated Chicago and Coffee Club Island to be 3,539 miles apart.

5. Percentages and probabilities are also useful for many of the same reasons.