Naming & Searching Files P3
osx workflow

Background

Of all the things I’ve written on this site, I’ve received the most email about Naming and Searching Files Part 1. The most common question I get relates to how my naming methods integrate with my photography workflow. In my original post, I skirted discussing photography because I’ve found this type of data is hard to properly manage. This post extends my previously discussed naming and searching methods to photos and describes how they fit into my photography workflow.

Overview

Photo organization is a hard problem. The crux of the issue is that there is no mapping from pixels to words. Text-based data is semantic: bytes map to characters, which have meaning. This property allows text, unlike photos, to be searched, filtered, compared, and classified.

Another problem with photos is that they resist taxonomy. Hierarchical file systems classify files and bind them to physical addresses in a 1-to-1 relationship. This association mimics our physical world: an object can’t be in two places at once. For the majority of data, this model works well. A mouse is a rodent, a rodent is a mammal, and a mammal is an animal. The taxonomy of photos is sometimes less clear. Photo data can exhibit a many-to-many mapping. A single photo can belong to several categories, and each category can hold many photos. This breaks the analogy to the physical world and makes photo organization more complex.

Photo data must also be robust. Unlike other documents, photos become more important with age. Safeguarding these memories from corruption, software incompatibilities, and other catastrophes is of foremost importance. For robustness, my photo system decouples data from tools. Orthogonality is a principle of fault-tolerant design. I use open standards as much as possible, which allows me to move freely among different photo software with the knowledge that my data is not dependent on any particular tool.

An effective photo system must also deal with consolidation and distribution challenges. If I take a vacation with my family, multiple people will have many unique photos that the group would like to obtain. My workflow must be able to integrate different types of data input streams and then redistribute the consolidated data.

A major issue with any workflow is finding the right balance between the investment required to build and maintain a system and the benefits it returns. Annotating every photo in great detail isn’t something I’m willing to do. My workflow strives for something akin to the Pareto principle. I try to find the few things that provide the most functionality and then implement them.

Setup

Despite all of the difficulties with photo organization, I’ve converged on a system that works for me. I store my photos on a Synology DS413 NAS[1] using Cloud Station. I use the DS413 because it solves many of the robustness and distribution challenges I discussed. Cloud Station is a Dropbox-style private syncing service. I use Cloud Station because it facilitates collaborative editing of my library and allows me to edit my library from multiple computers. Synology also provides photo support for Mac/iOS as well as a web interface for sharing photos. For robustness, I’ve configured the DS413 in a 4-bay RAID configuration.

Aperture is my primary tool for working with photos. I shoot most of my pictures with my iPhone and use Photostream as a common ingress point to automatically import images into Aperture.[2] I use the same method to capture photos from other people using the Family and Friend’s Photostream. Whenever I go on vacation or to events where multiple people are shooting photos, I make sure everyone sets up a shared Photostream prior to shooting so that I receive all the photos.

Metadata

At the foundation of my photo workflow is metadata. Almost all the metadata that I work with resides in two open standards: IPTC Core and EXIF. I use the IPTC Core schema because of its stability, controlled vocabulary, and broad compatibility with other tools. The core specification focuses on the content and context of the photo and contains fields for things like caption and keywords. EXIF is a natural complement to IPTC. It contains metadata about the camera and technical details of the photo such as camera make/model, lens type, shutter speed, and location.

Most cameras automatically pre-populate the EXIF and IPTC metadata fields when they shoot a photo. Some cameras even allow the shooter to control how these fields get populated. This limits the amount of manual annotation that’s needed during import.

One field I manually annotate is the IPTC Keywords field. I use keywords primarily for search. In practice, I find that I construct searches based on three things: people, actions, and ratings. These keywords can be combined with numerous other metadata fields to efficiently find most photos. Each keyword is prefixed with an abbreviation of its label name, such as p:: for a person or r:: for a rating. A photo can contain any number of keywords or none at all. Here’s an example picture I took a few years ago on a ski trip in British Columbia:

20070321-R8567

The IPTC metadata associated with this image has an action and rating keyword:

I prepend keywords with a label abbreviation primarily for fuzzy searching. Using a label allows me to construct label-specific searches. Sometimes I’m looking for an image of someone, but I can’t remember their name, only that it starts with an A. In this circumstance I can find the image by fuzzy match on p::A. Aperture’s startswith option allows me to perform this query directly inside Aperture.
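Outside Aperture, the same label-prefix match can be approximated with a few lines of shell. This is only a minimal sketch: match_keywords is a hypothetical helper, the sample names are invented, and the a:: label for actions is an assumption (only p:: and r:: appear in this post). It reads one keyword per line and keeps those that start with the given prefix.

```shell
# match_keywords PREFIX
# Hypothetical helper: read keywords from stdin, one per line, and print
# only those that begin with PREFIX (a literal, case-sensitive match).
match_keywords() {
    prefix=$1
    while IFS= read -r keyword; do
        case $keyword in
            "$prefix"*) printf '%s\n' "$keyword" ;;
        esac
    done
}

# Invented sample keywords; prints only p::Alice.
printf '%s\n' 'p::Alice' 'p::Bob' 'a::skiing' 'r::2' | match_keywords 'p::A'
```

Because the prefix includes the label, a search for p::A never matches an action or rating keyword that happens to start with the same letter.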

Workflow

As photos are imported into Aperture, I immediately tag them with some base metadata.[3] New imports get flagged with the keyword r::-1 to signify that the photos have not been fully processed yet. I also use a few of Aperture’s presets to batch fill metadata fields. For example, one of my presets fills me in as the author, adds copyright information, flags the photo, and renames the image file. Keywords are added on a per-project or per-shot basis. I also sometimes use the headline and caption fields to logically group photos belonging to an event like a wedding or vacation.

My system uses a rating keyword to represent the importance of the image. I rate the majority of images as r::1. This rating lets me filter out photos that aren’t particularly good compositions or whose subject is uninteresting. Ratings r::2 and r::3 are reserved for photos I’m likely to search for again. A rating of r::3 represents the best one hundred or so photos I’ve ever taken.
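The rating scheme lends itself to a simple numeric filter. The sketch below is illustrative only: has_min_rating is a hypothetical helper that reads one photo's keywords, one per line, and succeeds when any r:: keyword meets a minimum rating. It assumes the rating is a plain integer after the r:: prefix, and the a::skiing keyword is an invented sample.

```shell
# has_min_rating MIN
# Hypothetical helper: read keywords from stdin and return success if any
# rating keyword (r::<n>) has n greater than or equal to MIN.
has_min_rating() {
    min=$1
    while IFS= read -r keyword; do
        case $keyword in
            r::*)
                value=${keyword#r::}   # strip the label, leaving the number
                # A non-numeric value makes the test fail quietly.
                if [ "$value" -ge "$min" ] 2>/dev/null; then
                    return 0
                fi
                ;;
        esac
    done
    return 1
}

# Invented sample keywords; prints "keep" because r::3 meets the minimum of 2.
if printf '%s\n' 'a::skiing' 'r::3' | has_min_rating 2; then
    echo "keep"
fi
```

Unprocessed imports flagged r::-1 fall below any positive minimum, so they drop out of the result without special handling.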

The wealth of information contained in metadata largely obviates the need for a complex file naming system. In the parlance of Naming and Searching Files Part 1, I use an ID string, token, and an incrementing counter for photos. A basic file name looks like this:

The ID string is an ISO 8601 date stamp followed by a record token indicating that the file is a photo and then an incrementing counter. The counter provides a helpful way to view a consecutive series of files while editing.
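As a rough illustration of how such a name could be assembled, the sketch below joins a date stamp, a token, and a zero-padded counter. The hyphen separators, the token value pho, the four-digit counter width, and the photo_name helper itself are all assumptions made for the example, not necessarily the convention used here.

```shell
# photo_name DATE COUNTER EXTENSION
# Hypothetical helper: build a name of the assumed form
# <YYYYMMDD>-<token>-<counter>.<extension>
photo_name() {
    date_stamp=$1   # ISO 8601 basic-format date, e.g. 20070321
    counter=$2      # plain integer without leading zeros
    extension=$3
    # 'pho' is an invented record token standing in for "photo".
    printf '%s-%s-%04d.%s\n' "$date_stamp" 'pho' "$counter" "$extension"
}

photo_name 20070321 7 jpg   # prints 20070321-pho-0007.jpg
```

Zero-padding the counter keeps a consecutive series in order when file names are sorted as plain text.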

I store my photos in Aperture as referenced images rather than using Aperture to manage my photos. I use referenced images because the files are easier to access with other tools. I can also move the location of the master images independent of where the Aperture library is located. Apple also claims that referenced images take up less disk space.

With my referenced images I use a shallow folder system where files are housed in a parent directory named for the year the photo was taken.[4] My system doesn’t need folders for organization; I use these shallow folders mostly for performance. The file system, and indirectly Aperture, can take a performance hit when a single directory holds a large number of photos.
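A per-year layout like this can be sketched in shell. The example assumes each file name begins with a YYYYMMDD date stamp and uses that as a stand-in for the capture year. The file_into_year helper, the library directory name, and the sample file name are hypothetical.

```shell
# file_into_year ROOT FILE
# Hypothetical helper: move FILE into ROOT/<year>, taking the year from the
# first four characters of the file name (assumes a leading YYYYMMDD stamp).
file_into_year() {
    root=$1
    file=$2
    base=$(basename "$file")
    year=$(printf '%s' "$base" | cut -c1-4)
    case $year in
        [0-9][0-9][0-9][0-9]) ;;
        *)
            echo "skipping $file: name does not start with a year" >&2
            return 1
            ;;
    esac
    mkdir -p "$root/$year" && mv "$file" "$root/$year/"
}

# Demonstration in a throwaway directory.
demo=$(mktemp -d)
touch "$demo/20070321-pho-0001.jpg"
file_into_year "$demo/library" "$demo/20070321-pho-0001.jpg"
ls "$demo/library/2007"   # lists the moved file
rm -r "$demo"
```

Reading the year from the EXIF capture date would be more faithful to "the year the photo was taken", but the file name keeps the sketch free of external tools.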

Batch Editing

I like to use Aperture’s batch editing tools in my workflow. Applying metadata, basic image manipulations, and other post-processing steps in batch is straightforward. Aperture provides a simple way to lift metadata from one photo and apply it to another. Sometimes I need to perform a specific batch operation that isn’t possible in Aperture. For these situations, I like to use command line tools.

Exiv2 is one of my favorite command line tools because it can be used to view, search, and modify both EXIF and IPTC metadata. One common operation I do with Exiv2 is to strip metadata when sharing photos:

exiv2 -d a *.jpg

Exiv2 can also be used to add, modify, or delete metadata. This command adds the keyword foo to a group of photos:

exiv2 -M "add Iptc.Application2.Keywords foo" *.jpg
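To check the result, the keywords can be read back. The wrapper below is a minimal sketch: show_keywords is a hypothetical name, and it relies on my understanding of two exiv2 options, -g to restrict output to a given key and -Pv to print plain values only. It is worth confirming both against the manual for the installed version.

```shell
# show_keywords FILE...
# Hypothetical wrapper: print the values stored in the IPTC Keywords field
# of each file given as an argument.
show_keywords() {
    # -g limits output to the named key; -Pv prints the plain value only.
    exiv2 -g Iptc.Application2.Keywords -Pv "$@"
}
```

Running show_keywords *.jpg after the add command above should list foo among each photo's keywords.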

There are a number of other tools like ImageMagick that I also use for bulk watermarking and other manipulations.


  1. Thanks to Gabe for turning me on to Synology

  2. I like to use Dropbox/Hazel for photos outside of Photostream. My Dropbox/Hazel integration is pretty basic. Nate Boateng has an efficient workflow for photo organization with these tools. 

  3. Metadata is not written to master images in Aperture by default. It’s critical to make sure Aperture applies metadata to the master images. Here’s how

  4. One annoyance I’ve found with Aperture is that the default import behavior adds photos as managed, not referenced. Aperture provides an option to import as referenced, but I frequently forget to check this option when importing photos. A handy trick is to set up a smart folder that watches for any managed images that accidentally enter the photo library.