Ski It If You Can
javascript visualization design data

Introduction

This is a meta-post about data visualization and a project I’ve been working on to determine what mountain has the most difficult ski terrain in North America. In this post, I discuss the design and implementation of my project—Ski It If You Can.

Background

My motivation for this project came from a recent ski trip I took to Mad River Glen. MRG is an iconoclast—the last vestige of what East Coast skiing used to be. The parking lot is dirt, the trails are ungroomed, and the terrain is a melange of rock, grass, ice, and powder. It’s a blast.

MRG’s moniker is “Ski It If You Can,” and they are not exaggerating. Knowing firsthand that MRG has a bevy of difficult terrain, I was curious whether I could measure the ski difficulty of different mountains to answer two questions:

  1. How does MRG rank in terms of difficulty against other mountains?
  2. What mountain has the most difficult skiing in North America?

I decided to measure the difficulty of different mountains using trail composition. Ski resorts in North America use a rating system to measure trail difficulty. The rating system exists for safety: it helps skiers gauge which runs are suitable for their ability level. Most mountains classify trail difficulty into three categories—green, blue, and black—signifying beginner, intermediate, and expert trails, respectively. The classification system is widely used and fairly standardized across different mountains, making it a convenient metric for measuring difficulty.

Data Collection

I started this project by gathering data. The best data source I could find for ski information was OnTheSnow. Using this data and supplementary information from various resort websites, I collected mountain statistics for 2028 ski mountains in North America.

Prior to using the data, I had to standardize trail difficulty between mountains. Rating systems at the expert level differ slightly between mountains. Some mountains sub-divide expert trails into separate classifications; the most common gradation is to use a black diamond to symbolize expert trails and a double black diamond to represent the most extreme expert trails. Not all mountains use this classification, so I needed to standardize trail ratings across the data by aggregating all sub-classifications of expert trails into a unified class. This transformation allowed me to compile standardized data for all resorts where trail difficulty was expressed on a three-class ordinal scale—green for beginner, blue for intermediate, and black for the aggregated expert classifications.
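As a minimal sketch of this aggregation, assuming one record per resort with hypothetical field names (my actual schema differs), the transformation looks roughly like this:

// collapse expert sub-classes into a single black class and express
// each class as a proportion of total trails
// NOTE: field names here are illustrative, not my actual schema
var standardize = function(resort) {
  var black = (resort.black || 0) + (resort.doubleBlack || 0)
    , total = resort.green + resort.blue + black;

  return {
    name: resort.name,
    green: resort.green / total,
    blue: resort.blue / total,
    black: black / total
  };
};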

Design

Creating visual representations of ordinal or categorical data is difficult. The problem is especially thorny when the number of samples exceeds roughly 20. The canonical data visualization for ordinal and categorical data is the bar chart. As the number of samples in a bar chart increases, the visualization has to extend along one axis to capture all the samples. A large sample size forces the designer into an undesirable trade-off between eliding data to simplify the visualization and creating a figure with unwieldy dimensions.

I struggled with this problem in my whisky analysis and I frequently see the same problem in the media. Here’s an example from the Economist where a viewer would like to see the attitudes about wealth for all countries, but, due to size constraints, the Economist chose to show only a few countries:

I encountered the problem I just described again in this project. With over 2000 samples of ski data, a bar chart was not a viable option for visualizing the data. For this project, I was able to exploit a property of the data that allowed me to build a visual representation in a fixed space without the need to elide samples.

When working with compositional data, that is, data where the components of each sample sum to a constant, a ternary diagram can be used to visualize the data. In my project, and in the Economist chart above, the composition is a percentage. My data is the proportion of beginner, intermediate, and expert ski trails at a given resort. Because the three proportions always sum to 100%, each sample has only two degrees of freedom, which is what allows three-dimensional data to be rendered in a two-dimensional space.
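To make the constraint explicit, let $g$, $b$, and $k$ denote the proportions of green, blue, and black trails at a resort (symbols introduced here just for illustration):

$$g + b + k = 1, \qquad g, b, k \ge 0.$$

Any one proportion is determined by the other two, so the data lives on a two-dimensional simplex, which is exactly the triangle a ternary diagram draws.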

Implementation

I built the ternary plot and small multiple visualizations for this project with Data-Driven Documents (D3). I’ve never seen a ternary diagram implemented in D3, so I’ve summarized some of the salient features of my implementation in this section.

In the outer level of my code, I created a self-invoking anonymous function that contains the logic to make the design responsive. When the browser window is resized, the ternary diagram auto-scales with the width of the browser. I’ve attached a resize event handler whose callback recomputes the dimensions and redraws the visualization at the current width of the browser. This feature also makes the chart reusable.

All the logic for generating the SVG is contained inside the render function. The function has an arity of one and accepts an object containing the dimension parameters for the visualization. Here’s the outer wrapper that computes those parameters and invokes render:

(function() {
  // compute responsive dimensions for the plot and pass them to a callback
  var dims = function(sel, fn) {
    var viz = $(sel)
      , thisWidth = viz.parent().width()
      , s = thisWidth > 600 ? 600 : thisWidth   // cap the plot at 600px square
      , t = s * 0.08                            // margin is 8% of the plot size
      , m = {top: t, right: t, bottom: t, left: t}
      , w = s - m.left - m.right
      , h = s - m.top - m.bottom
      , params = {s: s, t: t, m: m, w: w, h: h};
    fn(params);
  };

  // initial draw
  dims('#ternary-plot', function(d) { render(d); });

  // on resize, remove the existing SVG and redraw at the new width
  $(window).on("resize", function() {
    d3.select("svg").remove();
    dims('#ternary-plot', function(d) { render(d); });
  });
})();

Each datum in the ternary visualization encodes the trail composition of one ski mountain. For example, Stowe Mountain is composed of 16% green, 59% blue, and 25% black. These 3-tuples represent points in three dimensions and are commonly referred to as barycentric coordinates.

To build the diagram, it was necessary for me to map the barycentric coordinates into Cartesian space. The Cartesian coordinate $(x, y)$ is obtained as a linear combination of the vertex coordinates of the ternary diagram. Wikipedia provides a clear explanation of this mapping. Here’s my implementation in JavaScript:

// map barycentric coordinates (a, b, c) to Cartesian coordinates
// (assumes input is normalized so that a + b + c = 1; the triangle's
// vertices sit at (0,0), (1,0), and (1/2, sqrt(3)/2), so a drops out)
var bary = function(a, b, c) {
  var x = b + c / 2
    , y = Math.sqrt(3) / 2 * c;

  return {x: x, y: y};
};
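For example, plugging in Stowe’s composition from above (normalized to fractions) gives:

// Stowe Mountain: 16% green, 59% blue, 25% black
var stowe = bary(0.16, 0.59, 0.25);
// stowe.x = 0.59 + 0.25 / 2         = 0.715
// stowe.y = Math.sqrt(3) / 2 * 0.25 ≈ 0.217

These are coordinates in a unit-width triangle, which can then be scaled by the plot width computed in the dims helper.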

I initially visualized the data as a scatter plot in my ternary diagram where each point represented the composition of one mountain. I didn’t like the results of this implementation. With over 2000 points, the data was too dense. Many of the points were overlapping and it was hard to see specific mountains because they were occluded by other points.

To remedy this problem, I quantized the data. Only three regular polygons tessellate the Euclidean plane, so I chose the shape with the most vertices, the hexagon, to minimize the distance from any coordinate in my visualization to the centroid of the closest bin. I then used color and hue to encode the mode and density of each bin.
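My binning code is specific to this project, but as a rough sketch of the idea, the d3.hexbin plugin performs this kind of quantization; here svg, points (already mapped to Cartesian coordinates), and w are assumed to exist from the setup above:

// quantize Cartesian points into hexagonal bins (a sketch, not my exact code)
var hexbin = d3.hexbin()
    .size([w, w])       // the ternary plot fits inside a w-by-w region
    .radius(w / 40);    // bin radius is a judgment call

var bins = hexbin(points.map(function(d) { return [d.x * w, d.y * w]; }));

// shade each hexagon by the number of points that fall into it
var shade = d3.scale.linear()
    .domain([0, d3.max(bins, function(d) { return d.length; })])
    .range(["#f0f0f0", "#252525"]);

svg.selectAll(".hex")
    .data(bins)
  .enter().append("path")
    .attr("class", "hex")
    .attr("d", hexbin.hexagon())
    .attr("transform", function(d) { return "translate(" + d.x + "," + d.y + ")"; })
    .style("fill", function(d) { return shade(d.length); });

This sketch encodes only density; keying each hexagon’s hue to its modal trail class covers the mode.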

Supplemental

Once I had identified the most difficult ski resorts using the ternary plot, I wanted to create another visualization to highlight how the two most difficult resorts differed from each other. Vertical elevation change from the top to the bottom of the mountain was one of the most striking features in the data, so I decided to design something around this variable. I chose a bubble chart and small multiples to convey the size, composition, and elevation change of different mountains.
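One encoding detail worth noting: circle size should be mapped through area rather than radius, which in D3 means a square-root scale. A sketch, with a hypothetical acres field standing in for mountain size:

// encode mountain size as circle area by running the data through a sqrt scale
var r = d3.scale.sqrt()
    .domain([0, d3.max(data, function(d) { return d.acres; })])
    .range([0, 40]);   // maximum radius in pixels

// each small multiple then gets a circle with radius r(d.acres)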

At the time I was creating this visualization, I was (and still am) inspired by William Playfair’s unorthodox usage of circumferences and small multiples in his analysis of European economics at the turn of the 19th century. This visualization was my main inspiration: