Wake Up and Smell the Data

I ran across several particularly interesting articles by Mike Orren, president and founder of an online local newspaper venture in Dallas called Pegasus News, in which he reviewed some of the lessons learned in launching this new site.

One of his most intriguing findings was that the primary traffic draw to his site was not the highly localized news content that has become the new sine qua non of so many newspaper publishers these days, but rather data. Indeed, he reports that a full 75% of the traffic to his site was from users looking for specific pieces of data as opposed to news. Whether it was restaurant reviews, movie guides or local event calendars, the big benefit to users of online newspapers appears to be the compilation and aggregation of locally relevant facts, not local news.

This may be a somewhat chilling finding from the perspective of the newspaper industry, but it does tend to support what we're seeing in the world of online databases: there seem to be continuing and potentially endless opportunities in developing, compiling and aggregating highly specialized datasets for both business and consumer use. Users are drowning in information, creating a growing need to organize, normalize and summarize this information to make it more useful and easier to act upon. If you build a quality, useful dataset, there is ample evidence that you'll have an audience for it, provided users know you exist -- and that's increasingly where the business challenge and expense appear to reside.

In another fascinating online posting, Orren muses intelligently on the difference between news stories and data, a favorite hobby horse of mine. Writing for journalists, he explains in a clear and concise way that news content stored in a database is not really databased content. It's only when you break out key aspects of the news story into separate fields, and do so in a consistent manner, that you are building a database. This is what Orren refers to as storing data "atomically," and only after this is done can you extract maximum value from news content.
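
To make the distinction concrete, here is a minimal sketch in Python of the same item held first as a block of story text and then as consistently fielded records. Everything in it is invented for illustration: the restaurant names, the field names and the find_openings helper are assumptions, not anything drawn from Orren's posts or from the Pegasus News site.

```python
from dataclasses import dataclass
from datetime import date

# A story kept as one block of text. It can sit in a database,
# but the only way to reach its facts is full text search.
story_blob = (
    "A new Thai restaurant, Example Thai, opened on Main Street in Dallas "
    "on March 3, 2008. Entrees run about $12."
)


# The same kind of fact stored "atomically": one field per key aspect,
# filled in the same way for every record. Field names are hypothetical.
@dataclass
class RestaurantOpening:
    name: str
    cuisine: str
    street: str
    city: str
    opened: date
    typical_entree_usd: float


# Invented placeholder records, not real listings.
records = [
    RestaurantOpening("Example Thai", "Thai", "Main Street", "Dallas",
                      date(2008, 3, 3), 12.0),
    RestaurantOpening("Sample Grill", "Barbecue", "Elm Street", "Dallas",
                      date(2008, 1, 15), 18.0),
    RestaurantOpening("Demo Noodle", "Thai", "Oak Avenue", "Plano",
                      date(2007, 11, 20), 9.5),
]


def find_openings(records, city, cuisine, max_price):
    """Return names of openings matching city, cuisine and a price ceiling."""
    return [
        r.name
        for r in records
        if r.city == city
        and r.cuisine == cuisine
        and r.typical_entree_usd <= max_price
    ]


# A question the text blob cannot answer reliably, but fielded records can.
dallas_thai = find_openings(records, city="Dallas", cuisine="Thai", max_price=15.0)
print(dallas_thai)
```

A keyword search for "Thai" in the story text would find the article, but it could not filter by city or price without someone first reading the prose. With the fields broken out, that filter is a single line.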

What Orren is really examining are the limitations of full text content and full text search. While both are useful, convenient and have a clear and needed role, neither can do the full job for the user, because they both limit and obscure information in the process of finding and delivering it. In a world increasingly driven by the automated discovery and processing of information, the most useful, discoverable and valuable information will be that which is optimized for these automated systems (read: computers).

As regular Perspective readers know, we have been underwhelmed by the newspaper industry’s digital initiatives, and this industry insider nails it when he says, "The news business as we know it is only going to continue to contract and weaken unless and until news organizations start treating everything as data rather than stories." We couldn’t make a better case for infocommerce ourselves.