Information Processing
cognition workflow

Motivation

The human brain has the capacity to store an estimated 2.5 petabytes of data and execute roughly 6.4 × 10^18 nerve impulses per second. Even with such a formidable tool, the amount of information passing through us each day is impossible for the human brain to fully process and retain. This point is illustrated by Timothy Wilson in his book Strangers to Ourselves:

The human mind can take in 11 million pieces of information at any given moment. The most generous estimate is that people can be consciously aware of 40 of these.

The dawn of the information age has laid out a veritable feast of information to consume. Digitization has made knowledge more accessible than ever before, and the resulting wake is challenging the limits of human cognition. There is no solution for processing volumes of information that exceed human faculty, but that does not mean proficiency cannot be improved.

This post is motivated by my own interest in how information can be processed most effectively. Below I explore how to digest the never-ending streams of data brought forth by the birth of the information age and summarize my personal heuristics for solving this problem. This post is written as a broad overview of my workflow. I view the techniques described herein as media independent and use the same approach for consuming information in text, audio, or video form. I use the term information processing throughout this post to describe the procedure of finding, consuming, and deriving insight from data.

I subdivide information processing into three distinct phases:

  1. Filtration
  2. Consolidation
  3. Retrieval

Filtration

The goal of filtering is to maximize the signal-to-noise ratio of information deemed important. The sad, beautiful fact is that the amount of information in this world is almost infinite; only by focusing on a subset of related information can learning be achieved. Effective filtering requires at least two elements: defining filtering objectives and automation.

The purpose of defining a filtering objective is to maintain focus on a predefined goal. Failure to develop a clear filtering objective can lead to procrastination and distraction. I’ve lost many hours following interesting links around the Internet on topics that had nothing to do with what I was supposed to be researching. Constructing an explicit filtering objective keeps me focused and less likely to follow information that is not related to the task at hand. Defining a filtering objective may be as simple as stating the intended goal of filtering, for example:

I will collect information about the life of George Meade for a biography I am writing.

Sometimes the goal of filtering is complex. How should news, RSS feeds, tweets, or posts on Reddit be filtered? How can a filtering objective be defined for these goals? For these cases, I find the best solutions use reference characteristics to ‘train’ an automated tool as to what filtering should produce. An example of this type of tool is Zite’s Worio algorithm, which suggests potentially interesting content based on a training data set built from a user’s RSS feeds and other data. Applications like Zite are in their infancy, but tools of this type are the only long-term solution to filtering at large scale.
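To make the idea concrete, here is a toy sketch of training-based filtering. It is not Zite's algorithm, just the same principle in miniature; I use scikit-learn for brevity, and the article titles and labels are invented for illustration.

```python
# Toy relevance filter: learn from items I have already judged, then
# score new items from a feed. Not Zite's algorithm; just the same idea
# in miniature. All titles below are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Reference characteristics: items marked interesting (1) or not (0),
# matching the George Meade filtering objective above.
training_texts = [
    "George Meade's correspondence during the Gettysburg campaign",
    "Logistics of the Army of the Potomac in 1863",
    "Ten gadgets you absolutely need this holiday season",
    "Celebrity diet secrets doctors won't tell you",
]
training_labels = [1, 1, 0, 0]

classifier = make_pipeline(TfidfVectorizer(), MultinomialNB())
classifier.fit(training_texts, training_labels)

# New, unseen items; keep only those the model scores as relevant.
incoming = [
    "Union command decisions at Gettysburg",
    "This week's hottest smartphone deals",
]
for item, label in zip(incoming, classifier.predict(incoming)):
    if label == 1:
        print("keep:", item)
```

With only four training examples the predictions are obviously fragile; the point is the shape of the workflow, not the accuracy.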

The best filtering workflows use automation to select relevant content, which frees time for consolidation. Manually filtering content is time consuming and inefficient. In addition to Zite, I’ve successfully used crowdsourcing tools like Mechanical Turk and oDesk, as well as search APIs like DuckDuckGo and the MediaWiki API, to automate filtering.
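As an example of how little code this kind of automation needs, the sketch below queries the MediaWiki search API (mentioned above) for a filtering objective and prints candidate article titles; the query string and result limit are arbitrary placeholders.

```python
# Minimal sketch: use the MediaWiki search API to gather candidate
# sources for a filtering objective instead of browsing by hand.
import json
import urllib.parse
import urllib.request

query = "George Meade biography"          # filtering objective, as a query
params = urllib.parse.urlencode({
    "action": "query",
    "list": "search",
    "srsearch": query,
    "srlimit": 5,
    "format": "json",
})
url = "https://en.wikipedia.org/w/api.php?" + params
request = urllib.request.Request(url, headers={"User-Agent": "filtering-sketch/0.1"})

with urllib.request.urlopen(request) as response:
    data = json.load(response)

for hit in data["query"]["search"]:
    print(hit["title"])
```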

Prebuilt tools are great, but to grok filtering I find it necessary to build my own tools. Machine learning algorithms can be extremely useful for creating custom filtering tools.1 If generating custom filtering tools seems daunting, consider a simple crowdsourcing solution that replaced manually reading most of my RSS feeds. Every geek reading this post probably uses Pinboard. Pinboard represents a human-curated, pre-filtered data set containing the best Internet content, updated in near real time. I’ve found Pinboard superior to social networks and news aggregation sites for finding information because Pinboard costs money and people only save things they themselves find valuable. Writing a custom script to filter this data set by popularity or a specific tag using the Pinboard API is trivial and saves an appreciable amount of time for downstream consolidation. The output of this workflow is an amazingly high-quality corpus of reference material, links, and blog posts, annotated by highly educated people and devoid of link bait.
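For instance, here is a rough sketch of the kind of script I mean. It pulls the public JSON feed for a single tag from feeds.pinboard.in and keeps only items whose titles match a few keywords; the feed URL and the field names (u for the link, d for the title) are from memory, so check them against Pinboard's feed documentation before relying on this.

```python
# Rough sketch of a Pinboard pre-filter: pull the public JSON feed for a
# tag and keep only items whose title matches the current filtering
# objective. Feed URL and field names ("u" = link, "d" = title) are as I
# recall them; verify against Pinboard's feed docs.
import json
import urllib.request

tag = "machinelearning"                       # the tag I want to skim
keywords = {"filtering", "classifier", "recommendation"}

url = f"https://feeds.pinboard.in/json/t:{tag}/"
with urllib.request.urlopen(url) as response:
    bookmarks = json.load(response)

for bookmark in bookmarks:
    title = bookmark.get("d", "")
    if any(word in title.lower() for word in keywords):
        print(title, "->", bookmark.get("u"))
```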

Consolidation

Once relevant data has been obtained through filtering, the consolidation phase of information processing archives important information for long-term storage and converts it into human long-term memory (LTM). I’ve adapted the term consolidation from the neuroscience term memory consolidation, which refers to the process of biologically encoding a memory so that it can be retrieved at a later time.2

A detailed description of memory consolidation is outside the scope of this post, but a few things about human memory are worth noting with respect to enhancing information processing:

  • There is strong evidence that short pulses of learning (distributed learning) spread out over days, weeks, and years are superior to massed learning, where information is consumed all at one time.

  • LTM is subject to destabilization, modification, degradation, and loss. Reconsolidation is triggered by LTM recall and strengthens existing long-term memories.

  • Many of the essential molecular processes of memory consolidation and reconsolidation require sleep. See here and here for excellent reviews.

  • Insulin and insulin-like growth factor II are emerging as important factors that improve memory consolidation and LTM persistence if induced during the initial phases of memory formation.3

Setting up an effective filtering system ensures that related information is repeatedly encountered over time during the consolidation phase. The net effect is a distributed learning system in which continual memory reconsolidation is triggered by LTM recall, thus improving memory.

In addition to the behaviors supported by science, I’ve found a few anecdotal practices that also help with consolidation:

  • Learning right before sleeping vastly improves consolidation
  • Exposure to the same material from different sources improves consolidation more than repeatedly studying the same source
  • Drawing, particularly mind maps and graphs, improves consolidation

I have tried to outline ways in which consolidation can be improved, but I think the best advice comes from Steven Pinker:

The best way to gain proficiency at a task or skill is by doing: No gimmicks. If you want to get a lot out of reading, read a lot; if you want to get better at remembering errands or birthdays, practice remembering errands or birthdays. No shortcuts, no cross-training - no Sudoku

Retrieval

Not all information can be retained in human memory. An important aspect of information processing is the ability to recall bits of data from memory and then re-find the original source using a trusted system. Effective retrieval requires remembering at least enough detail about the original source to construct an effective query, as well as adopting a system where that query actually produces the original source.

My retrieval system makes heavy use of search, so I try to limit the number of places where I archive information: one server, one computer, and one filing cabinet. Experience has taught me that archiving information in many places is a bad idea.

Some people eschew search for retrieval systems that use location. Location systems organize retrieval by physical address; examples include products in grocery stores, books in a library, or digital files within folders. Try finding toothpicks in the grocery store; they can be in 10 different places! For retrieval, it is almost always best to use search instead of location. I’ve written about the advantages of search and how to search effectively in my Naming and Searching Files: Part 1 and Part 2. IBM has even conducted research on the superiority of search compared with location as it pertains to email.

I find creating subject specific wikis extremely helpful for both consolidation and retrieval. Wikis act as an intermediary between data stored in human memory and the original source. They summarize large amounts of data in an abridged form that is searchable. The act of recalling, rewording, and annotating summary data also improves learning and helps to organize the information both in my mind and in the wiki. I keep all my wiki files version controlled within a git repo inside Dropbox using Bitbucket’s wiki system. This system gives me the power of version control coupled with the ability to edit the wiki on any computer or iOS device using a Dropbox-compatible writing tool.
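To make the search side of this concrete, here is a minimal sketch of full-text retrieval over a local wiki: it walks a directory of Markdown pages and prints every page containing a query phrase. The directory path is only a placeholder for wherever the Dropbox-synced repo lives.

```python
# Minimal retrieval sketch: full-text search over local wiki pages.
# The path is a placeholder for wherever the Dropbox-synced wiki lives.
from pathlib import Path

wiki_root = Path.home() / "Dropbox" / "wiki"   # placeholder path
query = "memory consolidation"

for page in wiki_root.rglob("*.md"):
    text = page.read_text(encoding="utf-8", errors="ignore")
    if query.lower() in text.lower():
        print(page.relative_to(wiki_root))
```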


  1. Programming Collective Intelligence is a useful resource for learning how to build filtering tools. ↩

  2. If you are interested in a more comprehensive overview of human memory, The cognitive neuroscience of human memory since H.M. is an excellent read. ↩

  3. Given the essential role of protein synthesis in learning and memory, it is tempting to speculate that the role of insulin in memory formation may be to stimulate translation. ↩