March 18, 2010

Like most people, I’m frequently asked what I do for a living. I tell people that I work with companies that have a large quantity of mission-critical information that they need to be able to find in an instant, and that information had better be right every time.

The Economist just published a really good special report on the data deluge that individuals, companies and governments face.

Some things that resonated with me from the report:

In 1971, Herbert Simon, an economist, wrote:

“What information consumes is rather obvious: it consumes the attention of its recipients.”

I like his conclusion:

“Hence a wealth of information creates a poverty of attention.”

The term “data exhaust” was used to describe the trail of clicks that Internet users leave behind from a transaction. This exhaust can be mined and put to use. Google refining its search engine to take into account the number of clicks on an item when determining search relevance is one example of using data exhaust. I really like the term and believe it fits well with trying to make sense of large data sets. Properly mined, areas thick with exhaust could indicate that the instructions in service documentation are not clear enough, or they could indicate an impending issue with a particular component in a product.
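To make the click idea concrete, here is a minimal sketch of how click counts might nudge a ranking. This is not how Google or any real search engine does it. The function name `rerank`, the `weight` parameter, the document ids and the click numbers are all made up for illustration.

```python
import math

def rerank(results, clicks, weight=0.5):
    """Reorder search results using click counts (hypothetical scheme).

    results: list of (item_id, base_score) pairs from some existing ranker.
    clicks:  dict mapping item_id to the number of recorded clicks.
    weight:  assumed tuning knob for how much clicks matter.
    """
    def blended(item):
        item_id, base_score = item
        # log1p dampens large counts so popular items do not swamp the base score.
        return base_score + weight * math.log1p(clicks.get(item_id, 0))

    return sorted(results, key=blended, reverse=True)

# Illustrative data only: three service documents and their click counts.
results = [("manual-a", 1.0), ("manual-b", 0.9), ("manual-c", 0.8)]
clicks = {"manual-b": 40, "manual-c": 2}
ranked = rerank(results, clicks)
```

With these made-up numbers, the heavily clicked "manual-b" moves ahead of "manual-a" even though its base score is lower.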

“Delete,” by Viktor Mayer-Schönberger, argues that systems should “forget” portions of a digital record over time. Systems could be designed so that parts of digital files degrade over time to protect privacy, while the items that remain could possibly benefit all of humankind. Donating your digital corpse (medical reports, test results, etc.) to science comes to mind as a good example. While I might not want people to be able to link my name to my medical records, the records themselves, with no name attached, would provide a lifetime of data that could be used to advance many different fields.
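One way to picture a record that “forgets” is to give each field its own shelf life. The sketch below is my own illustration, not something taken from the book. The field names, the one-year retention period and the `degrade` function are all assumptions.

```python
from datetime import date, timedelta

# Assumed retention policy: identifying fields expire, clinical data is kept.
# None means "keep indefinitely"; fields not listed here are dropped.
RETENTION = {
    "name": timedelta(days=365),
    "address": timedelta(days=365),
    "test_results": None,
}

def degrade(record, created, today):
    """Return a copy of record with expired or unlisted fields removed."""
    age = today - created
    kept = {}
    for field, value in record.items():
        if field not in RETENTION:
            continue  # unknown fields are dropped as the safe default
        limit = RETENTION[field]
        if limit is None or age <= limit:
            kept[field] = value
    return kept

# Illustrative record only.
record = {"name": "Jane Doe", "address": "1 Main St", "test_results": [4.2, 4.5]}
old = degrade(record, created=date(2008, 1, 1), today=date(2010, 3, 18))
recent = degrade(record, created=date(2010, 1, 1), today=date(2010, 3, 18))
```

Here the two-year-old record keeps only its test results, while the recent one is still intact. Stripping a name does not by itself make data anonymous, so treat this as a picture of the idea rather than a privacy guarantee.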

Consistently creating the right set of rules for the ethical use of various types of data exhaust will be tricky. The Economist article mentions six broad principles for an age of big data sets that I liked: