Road Trip Algorithms
haskell algorithm

Background

As you’re reading this, I’m likely enveloped by corn on a lonely road in the Midwest on my way to the East coast. My family is moving back to New England. As we started planning our route to the East coast, we had a difficult time deciding how best to choose an optimal route. We wanted to take some time to visit old friends and interesting places as we tic-tacked across the United States. Who to visit? What attractions to see? Best order of destinations to visit? We decided to visit 6 destinations along the way, but we were unsure how to traverse our route in a way that would minimize our total distance traveled.

To start this project, I constructed a list of zip codes corresponding to places we wanted to potentially visit. In total, I had about 50 zip codes representing the location of family and friends’ homes, National Parks, and other attractions like cheeseburgers. How could I find the shortest route through 6 destinations among all the possible zip code combinations on my list? To solve this problem I turned to graphs and Haskell.

Graph Algorithms

Let $ G = (V,E) $ denote a connected and undirected graph composed of a finite set of vertices, $ V $ and edges, $ E $. Vertices represent potential zip code destinations to visit and edges represent paths connecting vertices. Edge weights, $ W $ represent the distance from the one vertex to another. The graph follows these assumptions:

  • weights are positive: $ \lbrace w \in \mathbb{R}: w > 0 \rbrace$
  • no self-loops: an edge cannot have the same vertex as source and destination
  • repetition is not allowed: in a given path, a vertex cannot be visited twice

There are many well-known algorithms for solving graph problems. One of the most famous is the shortest path problem where the objective is to find the minimal cost through a graph from a source to a destination vertex. For example, Dijkstra’s algorithm finds the minimal cost through a graph assuming all edge weights are non-negative.1 Since the 1950’s, many additional algorithms have been invented to solve this problem. Notable methods include: A-star, Breadth-first, and Bellman-Ford just to name a few. Dijkstra’s algorithm remains popular because it is reasonably fast and relatively easy to implement.

The problem I wanted to solve was not a shortest path problem, but a variant problem I call the v-shortest path (VSP) problem where the objective is to find the shortest path through a fixed number of vertices. I am not aware of an algorithm that solves this problem out of the box. To illustrate what I’m describing, consider the hypothetical graph below. What is the shortest path from Smallville to Dover which travels through exactly 4 vertices?

It’s obvious for this contrived example that there is only one solution:

Smallville -> Oakdale -> Hill -> Dover

I am not an algorist, but finding a solution to the VSP problem is, I think, NP-hard as it reduces to a traveling salesman problem. In thinking about how to go about solving the VSP problem, I borrowed an idea from a related family of algorithms called the k-shortest path (KSP) algorithms, which are an extension of the shortest path problem. The objective is to find multiple, k-shortest paths. The native implementations of most KSP algorithms give the top k-shortest paths, which can pass through any number of nodes.

To solve the VSP problem, I decided to implement my own variant of a KSP algorithm called Yen’s algorithm. Yen’s algorithm takes advantage of an interesting property of a shortest path: optimal substructure. A subpath of a shortest path is itself a shortest path. The idea of Yen’s algorithm is to find the shortest path through a graph by using a shortest path algorithm. Once the shortest path has been found, take each vertex in the shortest path and spawn another shortest path called a spur, then repeat until the specified number of shortest paths are found.

My VSP variant of Yen’s algorithm makes a few concessions. For one, a solution is not guaranteed. If a VSP solution exists, but it is not a shortest path, we ignore it. I made this compromise since finding the v shortest path through 50 vertices in a graph composed of many paths would take a very long time as every path permutation and associated costs through all vertices would potentially need to be computed.

Solution

I’ve chosen to implement my solution in Haskell because this problem is suited to a functional style. Haskell is an exceptional language that looks and acts quite a bit differently than most other programming languages.2 Haskell is purely functional, lazy, higher-order, and strongly typed. Purity makes a program easier to reason, it frees my mind from the mental overhead of trying to remember the side effects of mutable state. Another advantage of Haskell is its strict and unobtrusive type system. If a Haskell program compiles, it is almost always correct. Haskell has many more advantages—elegant parallelism and concurrency, an easy to read mathematical-like syntax, code brevity, and exceptional performance for a high-level language—it can even sometimes outperform C. These advantages are nice, however the real reason to learn Haskell is that it makes you a better logician and programmer in whatever problems you encounter or future languages you may learn.

Below I highlight only some of the important parts of my solution to the VSP problem. If you want to see the entire solution, I’ve placed the code on my project page.

Graph Construction

In the Smallville to Dover problem above, cost weights across paths are given. Unfortunately, I started the road trip problem without knowing the weights between zip code destinations. How can distance weights be generated in a graph of zip codes? I was able to find a government data file with all the US zip codes and corresponding latitude and longitude coordinates. From this data, I wrote a recursive function that reads the data file as a bytestring and matches zip code data to my list of zip code destinations. This function produces a list of three-membered lists containing a zip code destination from my list of possible destinations and the latitude and longitude for each corresponding zip code.

-- |collect matching zipcode and location data from a bytestring
filterZips :: [String] -> [[BC.ByteString]] -> [[String]]
filterZips       _ [] = []
filterZips       [] _ = []
filterZips (x:xs) dat = elimEmpty $ map (filterByzip x) dat ++ filterZips xs dat
    where 
        elimEmpty = filter (\i -> i /= [])
        filterByzip z lst =  matchHead z $ map BC.unpack lst
        matchHead match lst 
            | match == head lst = [head lst, lst!!7, lst!!8]
            | otherwise = []

If you know the latitude and longitude coordinates of two places, the distance between the places can be calculated using the Haversine formula. Fortunately, the GPS package on hackage provided the necessary data types and functions for this calculation. This is the function and data type I wrote to calculate distance in kilometers from two zip codes:

-- |find distances in km from 2 lat/long coordinates
zipDist :: [Waypoint] -> (String, String, Float)
zipDist [cs,cs'] = ((zipcode cs),(zipcode cs'),(mDist cs cs'))
    where 
        mDist xs xs' = fromIntegral 
            (round $ GPS.distance (dms xs) (dms xs') / 1000) :: Float
        dms p = GPS.degreePairToDMS (lat p, lon p)

data Waypoint = Waypoint { zipcode :: String
                         , lat :: Double 
                         , lon :: Double
                         } deriving (Read, Show, Eq, Ord)  

Using the above function, I wrote a list comprehension function called edges, which generates all the edges in the graph. If you’ve ever written any Python, you will recognize the list comprehension, which was borrowed from Haskell. The final function binds together all the functions I’ve discussed thus far. I’ve added an additional filter function to exclude paths greater than 1000 kilometers because we did not want to drive more than 1000 km on any given day.3 Here is an example of what the final code looks like in action:

module Gengraph where
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC

main :: IO()
main = do 
    let zips = ["98109", -- Seattle, WA
                "98404", -- Tacoma, WA              
                "99362"] -- Walla Walla, WA     
    bytes <- BS.readFile "zip-code-data.txt"    
    print $  genGraph zips bytes

-- | Build a graph from a list of zipcodes
genGraph :: [String] -> BC.ByteString -> [(String, String, Float)]
genGraph zipcodes bytes = filter (\(x,y,z) -> z < 1000) 
                          $ map zipDist . edges $ filterByZips' zipcodes bytes
    where 
        filterByZips' zips b = filterZips zips $ parseBytes b
        parseBytes dat = map BC.words . BC.lines $ dat

Reading files in Haskell is a big deal. Interacting with files constitutes an impure action, so I’ve wrapped this impurity inside a do block, which is syntactic sugar for the IO monad. The output of the above code is a list of nested tuples containing a source vertex, destination vertex, and distance in kilometers as edge weight:

[("98109","99362",351.0),("98404","98109",47.0),("98404","99362",337.0)]

Next, I needed a shortest path algorithm. I choose Dijkstra’s algorithm because I was able to find a really nice Haskell implementation from Bonsai Code. Using this base algorithm, I wrote an additional function that calculates the VSP through a given number of nodes, if the solution exists. I call the function roadTrip. Here’s the function in action on an example road trip through 6 destinations from Seattle to Boston:

module Main where
import qualified Data.List as DL
import qualified Data.ByteString as BS
import qualified Gengraph as GG
import qualified Dijkstra as DJ

main :: IO ()
main = do 
   let zips = ["98109", -- Seattle, WA
               "84148", -- Salt Lake, UT
               "89503", -- Reno, NV
               "89311", -- Great Basin, NV
               "83704", -- Boise, ID
               "59434", -- Glacier, MT
               "82190", -- Yellowstone, WY
               "57750", -- Interior, SD
               "80002", -- Arvada, CO
               "66605", -- Topeka, KS
               "52245", -- Iowa City, IA
               "55405", -- Mineapolis, MN
               "63101", -- St. Louis, MO
               "44105", -- Cleveland, OH
               "16254", -- Shippenville, PA
               "19099", -- Philadelphia, PA
               "15201", -- Pittsburgh, PA
               "60563", -- Naperville, IL
               "02109"] -- Boston, MA

   bytes <- BS.readFile "zip-code-data.txt"
   let graph = GG.genGraph zips bytes
   print $ roadTrip 6 "98109" "02109" graph

roadTrip :: Ord a => Int -> a -> a -> [(a, a, Float)] -> [[a]]
roadTrip n s t g = filter (\path -> length path == n) $ soln s t g 
    where
        soln source target gr = DL.nub $ map (optpath source target) $ permute gr
        optpath source target dat = DJ.shortestPath source target $ DJ.buildGraph dat
        permute xs = [beginning ++ end | (beginning, ignored:end) <- div xs]
            where
                div xs = zip (DL.inits xs) (DL.tails xs)

The shortest path though 6 destinations in this graph is:

[["98109","82190","57750","52245","44105","02109"]]

Seattle -> Yellowstone -> Interior -> Iowa City -> Cleveland -> Boston

  1. I’ve heard that Edsgar Dijkstra invented his famous algorithm in about 20 minutes while at a coffee shop. ↩

  2. If you are interested in Haskell, I highly recommend Learn You a Haskell for Great Good. Yann Esposito’s Learn Haskell Fast and Hard post is also excellent. ↩

  3. Distance is calculated ‘as the crow flies’. ↩