A few days ago I found myself in the midst of an interesting knapsack problem. I was preparing for an endurance race and I wanted to minimize the weight of the gear I needed to carry during the race. I was concerned with weight because I was running The Death March, a classic test-piece among endurance athletes that weaves through the Pemigewasset Wilderness. With 20,000 feet of vertical elevation change over 34 miles, The Death March is well-deserving of its name. This post is about The Death March and the algorithm I used to help me select the optimal gear to carry in the race.
Endurance racing is an interesting resource allocation problem because of the inherent trade-offs between weight and speed. The longer a racer estimates it will take to complete a race, the more food, water, and supportive gear they need to take with them. However, the more weight a racer carries, the longer it will take them to complete the race. Reducing weight improves speed, but at the expense of safety and comfort. Take too little gear and risk hypothermia and dehydration. Take too much gear and risk not finishing the race in a reasonable time frame. Selecting the most useful combination of gear at the least possible weight is crucial for a successful race.
Completing The Death March in October added another layer of complexity to the race. Capricious autumn weather in the Pemi forces racers to carry extra gear to deal with fast moving winter storms. At one point in my race, the weather changed from a calm and sunny 55°F to blasting sleet and 70 mph wind all within about 2 hours.^{1} It was crucial that I carry extra gear to accommodate for these weather conditions, while also limiting the amount of added weight in my pack.
I formulated my optimization problem as a knapsack problem. To explain how I found the best set of items to carry for my race, I’ll start with some defining some notation. Let $x_i$ denote an item that I considered carrying in the race. Associated with each item were two variables—the physical weight of the item, $w_i$ and a value of importance for the item, $v_i$. If $x_i$ was a topographic map for example, it would have a large value of importance because maps are needed for navigation. The objective of this problem was then to find a subset of items that maximized the value of importance carried in the race without exceeding the total weight capacity, $C$:
$$ \begin{eqnarray} \mbox{maximize} && \sum_{i=1}^n v_i x_i \\ \mbox{subject to} && \sum_{i=1}^n w_i x_i \lt C \\ && x_i \in \{0,1\}, \: \{i = 1, \dots, n\} \end{eqnarray} $$
The naive approach to this problem is to compute all $2^n$ possible item combinations and then find the best subset of items to pack. Brute-force would be a good strategy if the item set was not overly large; however, I had dozens of items I was considering carrying in the race, so I needed to use a solution that was faster to compute than exponential time.
The knapsack problem is well-studied and many algorithms exist to solve different applications of the problem. A common improvement to the brute-force approach is to use backward induction and dynamic programming techniques to find intermediate solutions to some of the problem’s sub-problems so that these solutions can be reused throughout the main problem. The advantage of using dynamic programming is that it generally leads to faster solutions, but at the expense of a larger memory footprint.
I’ll use a minimal working example to help explain the dynamic programming approach I used for my race. Consider a racer who wishes to carry a total weight of less than 8 kilograms in a race and has only 5 items to choose from. The goal is to maximize the value of items in the knapsack such that the total capacity is less than 8 kilos. Here are the weights and values of each of the 5 items numbered 1–5:
Most implementations start by building a matrix, $A$ of $N$ columns and $M$ rows where the $j$th column represents a specific integer weight of the knapsack from $0$ to $N$ and the $i$th row represents an index to the item being considered for inclusion in the knapsack, $\{0,\dots,M\}$, where $i=0$ indicated no item. The first objective is to complete the matrix from top left to bottom right and find the value in each cell of the matrix. Once the value in a given cell has been determined, the solution can be reused to help find other solutions in subsequent cells.
It’s clear that if there are no items in the knapsack and/or the weight of the knapsack is 0, the total value in the knapsack must also be 0. For all cells that satisfy one or both of these statements, the value in the cell is set to 0. For all other cells it is necessary to find the maximum of two possible solutions: 1. inherit the solution from the cell above the current cell, $A_{i-1,j}$ or 2. choose a better solution if it exists. This second solution is found by first computing the residual weight in the knapsack by finding the difference between the column weight and the weight of the current item. This quantity gives a shifted column index in the row above the current cell, $A_{i-1, residual}$. This index specifies the cell containing the value of the residual weight. Once the residual value has been found using the index, this value is added to the value of the current item and the maximum of this sum and solution 1 is the value chosen to enter into the current cell. Each cell value is computed in this manner until the entire matrix is completed. The completed matrix for this problem looks like this:
Next, the optimal set of items to include in the knapsack can be determined by retracing the correct branch of the recurrence that produced the optimal value in the last cell, $A_{N,M}$. I’ve drawn the correct branch as a line through the matrix in the drawing above starting at $A_{5,8}$ and moving backward in the matrix toward $A_{0,j}$. The circles containing values greater than zero denote which items are to be include in the knapsack. These cells were found by starting with the optimal value in $A_{N,M}$ and subtract the weight of each item in the row above the current cell and then shifting left in the the row by this index value. This process is essentially reversing the residual computation performed above when the matrix was being filled. The best set of items to carry in this minimal working example subject to the weight constraint of 8 kilos is: $\langle 5, 4, 3 \rangle$.
I estimated that I would need to carry no more than 6 kilograms of gear in my race. My goal was to then find the best combination of items to carry in the race which would provide me with the most value subject to this weight constraint. The complete data and code I used in my knapsack problem is available from the GitHub repository that accompanies this blog post. I wrote an implementation for this algorithm in Python. The main function knapsack
takes two parameters, an integer value, C
indicating the capacity of the knapsack and an array of dictionaries, items
where each map represents one item containing a name, integer weight, and integer value:
#!/usr/bin/env python3 # -*- coding: utf-8 -*- import sys import json from collections import defaultdict, namedtuple, deque from itertools import product def item_set(filename): with open(filename) as infile: return json.load(infile) def knapsack(C, items): Datum = namedtuple('Datum', 'x, w, v, url') items = [Datum(**_) for _ in items] A = defaultdict(int) Q = deque() S = len(items) N = range(1, S + 1) M = range(1, C + 1) for i, j in product(N, M): abv_val = A[i-1, j] prev_elem = items[i-1] res_wi = j - prev_elem.w if j < prev_elem.w: A[i, j] = abv_val else: A[i, j] = max(abv_val, A[i-1, res_wi] + prev_elem.v) while S > 0: nxt = S - 1 if A[S, C] != A[nxt, C]: Q.appendleft(items[nxt].x) C -= items[nxt].w S = nxt return Q def output_fmt(packed_items, knapsack_data): item_set = set(packed_items) for item in knapsack_data: if item['x'] in item_set: line = '[{x}]({url})'.format(**item) print(line) def main(): json_data = item_set(sys.argv[1]) knapsack_data = [ {'x': _.get('item'), 'w': _.get('weight'), 'v': _.get('value'), 'url': _.get('url')} for _ in json_data ] packed_items = knapsack(6000, knapsack_data) output_fmt(packed_items, knapsack_data) if __name__ == '__main__': main()
In total, my algorithm suggested I take 27 items with a combined weight of 5,964 grams, which was only 37 grams less than my desired maximum weight capacity. I took almost the exact set of gear my algorithm suggested with the exception of only a few items. Here are the items my algorithm selected for me to carry in the race:
Below, I’ve highlighted a few of the items I carried in the race and my rationale for selecting each item.
Choosing the appropriate footwear for racing was paramount. I gave footwear a high value of importance to ensure my algorithm would select shoes for the race; I wasn’t going to run barefoot. Footwear selection is critically important for racing because carrying weight at the extremities is costly. Over the course of many miles, repeatedly lifting extra weight on the feet is especially fatiguing. Therefore, I wanted my algorithm to select a race shoe that was light but with enough protection and rigidity to handle moving quickly without damaging the feet from impact with rocks. My algorithm selected the Salomon Speedcross paired with Kahtoola MICROspikes.
The Speedcross is a perfect shoe for this race because it is light-weight and breathable while still providing enough rigidity, cushioning, and protection to handle the rock-laden trails. In October, The Death March is frequently veneered with ice, so a traction system was necessary to prevent slipping on icy scree. Kahtoola’s microspikes are lightweight, easy to take on and off, and provide most of the functionally of traditional crampons.
Along with footwear, I placed a high value on a layering system for clothing I felt would be important for the race. The logic of layering certain pieces of clothing together was to provide maximum versatility for a broad range of weather conditions, ecosystems, and activity levels. The layering concept is similar in theory to my Matryoshka Hardwear. Each layer can be used alone or in combination depending on conditions. My layering system was tailored for the three main ecosystems I encountered in the race—acadian forests, the taiga, and alpine tundra.
A good layering system has two main functions; it draws perspiration away from the body and protects the body from the environment. These two jobs require fabrics that are highly breathable, fast drying, and remain warm even when wet all while shielding the body from wind, rain, sun, and snow. Perspiring in alpine climates can be especially dangerous because lingering moisture can quickly cool and induce hypothermia. Selecting outerwear composed of the right materials is vital for keeping the body at the proper temperature.
The job of a base layer like the Patagonia Lightweight Crew was to wick moisture away from the body and keep the skin dry. On top of the base layer, I wore a Patagonia R2 fleece as my mid layer to help evaporate moisture, transport it away from the base layer, and vent moisture into the air. The mid layer also provided added insulation to help retain heat close to the body. When the temperature dropped or winds intensified, I added a Patagonia Nanopuff Pullover synthetic jacket for warmth over my base and mid layer. I prefer synthetic fabrics at this time of year because they stay warm even when wet. My final layer was an Arc’teryx Alpha FL shell jacket and Venta SV gloves, which protected my skin and the other under layers from wind and precipitation. Together or in different combinations, these layers allowed me to stay safe while moving fast and light through the mountains.
Altogether, a very fun albeit exhausting trip!
A passing hiker coming from Greenleaf informed me of the wind speed. I later confirmed the information was accurate using data from the Mt. Washington Observatory. ↩