Naming & Searching Files P1
workflow

Naming Files

Developing a systematic file naming method for your digital files is probably far more important than you think. As time progresses, software changes and memory fades. This can leave your hard work abandoned in a file format that is no longer readable or lost in a sea of meaninglessly named files on your system. Good luck finding and opening that WordPerfect document named paper.wp4 from 1987 or an MS Word file named 1.docx in the year 2040.

Ambiguous file naming is sometimes combated with an extensive folder system. However, this method leads to problems as the amount of data in your digital life grows. As the topology of your data becomes more complex, it is not always obvious which folder should contain that paper.wp4 file. The problem becomes even more apparent when trying to locate the file 24 years later. An extensive folder system is also cumbersome to navigate on a daily basis. Drilling down through 15 folders to find your paper.wp4 file is unnecessary, inelegant, and time-consuming.

File naming is also sometimes ignored because it is very convenient to simply use the magical search powers of Spotlight or some other file searching tool to search for a string of text in the file you are looking for. Relying on this method is also a very bad idea. Good luck finding your favorite picture of your dog Timmy that you took at Aunt Mildred’s 100th birthday party in 1997 by typing "Timmy and Mildred" into Spotlight. Often a certain file type cannot be indexed by Spotlight, or the file contains no text at all, like a picture. For these reasons, file naming is very important.

This post is part one of a two-part series explaining the method I use to name digital files on my computer and how I quickly find the files I am looking for. My method eschews extensive folders in favor of encoding descriptive data about the file in the file name. I feel that this method has several advantages over a folder-centric system. I hope that by sharing my strategy you can at least take away something that will allow you to be more productive and efficient at collecting, processing, and searching your digital data. [1]

The Overview

Most digital workflows incorporate one to three main tools to capture and organize data. These are:

  1. File Names
  2. Directories (Folders)
  3. Metadata

In a typical workflow, descriptive words or phrases are used to name files with semantic signifiers. For example, a folder named pets might have a nested file named dog_info.txt about dogs. Additional information in the form of metadata can also be associated with a file. Descriptive metadata can take the form of things like keywords (tags). For the file dog_info.txt, the tag dog_tricks might be used. Structural metadata can be associated with groups of files, such as a series of numbers describing an ordered collection of files. For example, dog_info_1.txt, dog_info_2.txt, dog_info_3.txt, etc. Administrative metadata, such as when the file was created or its file type, can also be associated with files.

A main advantage of using the traditional hierarchical file system with many files inside of many folders is that it scales. A system starting with one file inside one folder can readily expand to accommodate thousands of folders with millions of files. As more files flow into this system, more and more folders are easily created, like smaller and smaller Russian nesting dolls, to house the files. In contrast, keywords do not scale well. Starting with the initial system of one file inside one folder, imagine starting with one keyword. As more files and folders are added, more keywords must be used. The problem is that if we decide to add a new keyword called puppy_training to our 10,565th added file, it is likely that many of the previous 10,564 files should also be tagged with the keyword puppy_training. Thus, we are faced with two ugly options:

  1. search 10,564 files to determine which files should get the keyword puppy_training
  2. create a file system with an incomplete index of keywords

Despite their drawbacks, keyword-based filing systems do have advantages. A main benefit of the keyword approach is that one file can be bound to a virtually unlimited number of keywords. This creates a powerful relational database where files are linked to related files through the use of common keywords. Such a system is analogous to the internet or a SQL database, where pieces of data are linked to each other to create a data-rich network of interconnections.

The Method

My method started with my own experimentation in 2007 and borrows heavily from the excellent ideas shared on the Scrivener forum. In 2007 I was finishing my Ph.D. and I was working with thousands of text files. This work necessitated a highly organized method to name and retrieve files. I also wanted a file naming convention that would be future-proof, that is: 1) a method that could be used on any operating system I choose to use in the future, and 2) a method not reliant on any proprietary software. [2]

The only future-proof data are folders and the file name. Therefore, the method I use incorporates only three folders and four unique text descriptors in the file name. When combined, this strategy allows each file I create to be easily retrieved by a quick search method. That’s it. No fancy tools or software.

The three folders I use are:

  • Inbox: Unprocessed files (downloads, etc.) [3]
  • Active: Files I am currently working with
  • Archive: Files that I am not currently working with

The four classes of descriptors I use in the file name are:

  • ID strings
  • Tokens
  • Keys
  • File extensions

An example file name might look like this:

[Figure: File naming syntax]

The above example might seem illegible at first, but stick with me. After I explain the purpose of each descriptor I think you will see the power of this method.

ID strings

In my workflow, a file name always starts with an ID string that contains a date and time stamp organized as YearMonthDay_HourMinuteSecond (YYYYMMDD_HHMMSS). [4]

The main advantage of this descriptor is that the ID string contains all three metadata forms:

  • descriptive (a unique string of numbers)
  • structural (a series of consecutive numbers)
  • administrative (a unique date and time of creation for each file)

Naming files starting with this ID string creates a unique prefix for every file on my computer since, in practice, no two of my files are created at the exact same date and time. The unique prefix also lets all files be sorted by date and time, creating a timeline of files ordered by creation time. In Part 2 I will also explain how the ID string helps with searching and finding files.
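A minimal Python sketch of producing such an ID string is shown here. The function name make_id_string is hypothetical, and this is only one way to generate the YYYYMMDD_HHMMSS format; it is not the tooling used in this post.

```python
from datetime import datetime


def make_id_string(moment: datetime) -> str:
    """Format a timestamp as YYYYMMDD_HHMMSS (hypothetical helper)."""
    return moment.strftime("%Y%m%d_%H%M%S")


# A fixed timestamp keeps the example reproducible.
# In everyday use you would pass datetime.now() instead.
example = make_id_string(datetime(2011, 5, 3, 9, 15, 42))
print(example)  # 20110503_091542
```

Because every field is zero-padded to a fixed width, the resulting string is always 15 characters long.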

This chronological ordering allows me to easily retrace my daily steps, for example to quickly find a file that I created this morning. The ID string is also helpful when I can remember creating a file in the fall of 2001 but have no recollection of what I called it. As we will see in Part 2 of this post, using the date and time ID string makes finding files like this extremely easy.
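To illustrate why the prefix gives a timeline for free, here is a small Python sketch. It assumes only that each name begins with a 15-character YYYYMMDD_HHMMSS prefix; the file names, the function created_between, and the choice of September through November as "fall" are all made up for the example.

```python
def created_between(names, start, end):
    """Return names whose ID-string prefix lies in [start, end), oldest first.

    Plain string comparison is enough: the prefix is fixed-width and
    zero-padded, so alphabetical order equals chronological order.
    """
    return sorted(n for n in names if start <= n[:15] < end)


# Hypothetical file names; only the leading ID string matters here.
names = [
    "20110503_091542_b.txt",
    "20011012_140501_a.txt",
    "20010704_080000_c.txt",
    "20011130_235959_d.txt",
]

timeline = sorted(names)  # oldest file first
fall_2001 = created_between(names, "20010901_000000", "20011201_000000")
print(fall_2001)
```

The same property is what makes a file browser's ordinary sort-by-name view behave like a sort-by-creation-time view.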

Tokens

The second class of descriptors in my file names is called tokens. In computer programming, tokens are reserved words, operators, or punctuation marks that have specific meanings. In my file naming system, tokens are reserved combinations of three characters: a letter followed by two numbers. Each letter or number in the token describes something about an individual file. The meaning of each token component is listed below.

Letter tokens can be either uppercase or lowercase. Uppercase denotes an internal file that is private. A lowercase letter indicates the file has been shared with someone or is in the public domain. The first number in a token can be either 1 or 2: a personal file is denoted with 1, while a work-related file is denoted with 2. The second number, written after a period, gives the category of the file (.1 through .5 in the list below). Putting it all together, a file with the token r1.3 would be a file that contained something I created and shared externally that was a personal file relating to design and visuals.

Using this system, 90 unique token combinations are available for each file. The number of tokens can be expanded at any time by adding additional letters or numbers.

An important point to remember is that sometimes a file can satisfy several tokens. For example, a published scientific manuscript could have the token r1.2, but if it also contained figures it could have the token r1.3. When a file can satisfy several tokens, the token with the lowest number value is always chosen for the file name. In this example, the file would take r1.2 instead of r1.3.

  • -R  Record: something created (writing, pictures, etc.)
  • -I  Information: something collected (articles, bookmarks, etc.)
  • -C  Communication: something exchanged (email, IM, etc.)
  • .1  Important documents: backups, finance, taxes, etc.
  • .2  Writing: blog, manuscripts, books, cover letters, reviews, etc.
  • .3  Design and visuals: art, scientific figures, seminar slides, etc. [5]
  • .4  Life: recipes, productivity, vacations, etc.
  • .5  Commerce: transactions, returns, etc.
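The token scheme can be sketched in Python as a small decoder. This is an illustration under stated assumptions, not part of the method itself: it takes the example token r1.3 as the model (a letter, then 1 or 2 for personal or work, then a period and a category number from 1 to 5), and the names decode_token and choose_token are hypothetical.

```python
import re

LETTERS = {"R": "Record", "I": "Information", "C": "Communication"}
SCOPES = {"1": "personal", "2": "work"}
CATEGORIES = {
    "1": "Important documents",
    "2": "Writing",
    "3": "Design and visuals",
    "4": "Life",
    "5": "Commerce",
}
# Assumed token shape, modelled on the example r1.3.
TOKEN_PATTERN = re.compile(r"([RICric])([12])\.([1-5])")


def decode_token(token):
    """Translate a token such as 'r1.3' into its plain-language meaning."""
    match = TOKEN_PATTERN.fullmatch(token)
    if match is None:
        raise ValueError(f"not a valid token: {token!r}")
    letter, scope, category = match.groups()
    return {
        "kind": LETTERS[letter.upper()],
        "shared": letter.islower(),  # lowercase letter means shared or public
        "scope": SCOPES[scope],
        "category": CATEGORIES[category],
    }


def choose_token(candidates):
    """Pick the candidate with the lowest number value."""
    for token in candidates:
        decode_token(token)  # raises ValueError on a malformed token
    # Characters 1 and 3 are the two digits; character 2 is the period.
    return min(candidates, key=lambda t: (int(t[1]), int(t[3])))


print(decode_token("r1.3"))
print(choose_token(["r1.3", "r1.2"]))  # r1.2
```

Keeping the three lookup tables separate from the pattern means a new letter or category only has to be added in one table and one character class.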

Keys

Keys are the third class of descriptor in my file naming convention. Keys are analogous to keywords, but unlike keywords (tags), keys are stored within the file name. Most people already use keys to name their files. A file about cats might be labeled cat_info.txt. In this example, the word cat is a key. Keys are usually words that describe a file, and their implementation is usually unique to each person. I do not have a rigid method for using keys in my file naming, but I do adopt some common phrasing, such as starting all files about recipes with the key ‘recipe.’ I find it useful to always ask myself “what keys would I use to remember this file if I were looking for it in 15 years?”

File Extensions

The final class of descriptor is the file extension. A file extension is added as a suffix to indicate the file format of its contents. Since file extensions are already part of the file name, I incorporate them into my file naming method. In Part 2 I will explain how to search by filtering through the file name and how this can be useful to find certain files.
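As a rough Python sketch, the four descriptors can be joined into one name and split apart again. The order (ID string, token, keys, extension) follows the order in which they are described in this post, but the underscore separator, the sample values, and the function names build_name and split_name are assumptions made purely for illustration.

```python
from pathlib import PurePath


def build_name(id_string, token, keys, extension, sep="_"):
    """Join the four descriptors into one file name (separator is assumed)."""
    stem = sep.join([id_string, token, *keys])
    return f"{stem}.{extension}"


def split_name(name, sep="_"):
    """Split a name made by build_name back into its descriptors.

    Assumes the name has an extension and a 15-character ID string.
    """
    path = PurePath(name)
    stem = path.stem  # everything before the final ".ext"
    id_string, rest = stem[:15], stem[16:]  # skip the separator at index 15
    token, _, key_part = rest.partition(sep)
    return {
        "id": id_string,
        "token": token,
        "keys": key_part.split(sep) if key_part else [],
        "extension": path.suffix.lstrip("."),
    }


# Hypothetical values for a private, personal recipe note.
name = build_name("20110503_091542", "R1.4", ["recipe", "bread"], "txt")
print(name)  # 20110503_091542_R1.4_recipe_bread.txt
print(split_name(name))
```

Because the ID string has a fixed width, it can be sliced off by position even though it contains the same underscore used as the separator.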

Make sure to check out Part 2 where I discuss how to search for files.


  1. I recommend also looking at the very interesting workflows of Brett Terpstra, Ben Brooks, and those discussed on the Mac Power Users Podcast episode MPU045. ↩

  2. My file organization method has the added benefit of allowing me to move into and out of a myriad of software applications such as DEVONthink, EagleFiler, or Vector NTI to gain additional functionality without corrupting my data. ↩

  3. The Inbox folder resides on my desktop, and any file on the desktop is automagically moved to this folder via Hazel. ↩

  4. Naming files starting with YYYYMMDD_HHMMSS may seem like a lot of work, but I use TextExpander or Hazel to do this automatically. ↩

  5. Note that my photography collection contains some additional syntax for finding relevant pictures using text searching that is beyond the scope of this post and is something I am still experimenting with. ↩