If it sometimes feels like the data business is full of buzzwords and hipster technical jargon, then that’s probably because it is. But don’t panic! I’ve been at loads of hip and non-hip data talks here and there and, buzzwords aside, I’ve come across four actual categories of data business model in this hip data ecosystem. Here they are:

Big storage for big people

Money in, insight out: Vertically integrated data analysis

Internal data analysis on an organization’s own data

Quantitative finance

1) Big storage for big people

This is mostly Hadoop. For example,

Teradata

Hortonworks

MapR

Cloudera

Some people are using NoHadoop. (I just invented this word.)

Datastax (Cassandra)

Couchbase (Couch but not the original Couch)

10gen (Mongo)

Either way, these companies sell consulting, training, hosting, proprietary special features &c. to big businesses with shit tons of data.

2) Money in, insight out: Vertically integrated data analysis

Several companies package data collection, analysis and presentation into one integrated service. I think this is pretty close to “research”. One example is AIMIA, which manages the Nectar card scheme; as a small part of this, they analyze the data that they collect and present ideas to clients. Many producers of hip data tools also provide hip data consulting, so they too fall into this category.

Data hubs

Some companies produce suites of tools that approach this vertical integration; when you use these tools, you still have to look at the data yourself, but it is made much easier. This approaches the ‘data hubs’ that Francis likes talking about.

Lots of advertising, web and social media analytics tools fall into this category. You just configure your accounts, let data accumulate, and look at the flashy dashboard. You still have to put some thought into it, but the collection, analysis and presentation are all streamlined and integrated and thus easier for people who wouldn’t otherwise do this themselves.

Tools like Tableau, ScraperWiki and RStudio (combined with its tangential R services) also fall into this category. You still have to do your analysis, but they let you do all of it in one place, and connections between that place, your data sources and your presentation media are easy. Well, that’s the idea, at least.

3) Internal data analysis

Places with lots of data have internal people do something with them. Any company that’s making money must have something like this. The mainstream companies might call these people “business analysts”, and they might do all their work in Excel. The hip companies are doing “data science” with open source software before it gets cool. And the New York City government has a team that just analyzes New York data to make the various government services more efficient. For the current discussion, I see these as similar sorts of people.

I pondered distinguishing analysis that affects businessy decisions from models that get written into software. Since I’m just categorising business models, and both of these things could be produced by the same guy working inside a company with lots of data, I chose not to distinguish between them.

4) Quantitative finance

Quantitative finance is special in that the data analysis is very close to a product in itself. The conclusion of an analysis or algorithm is “Make these trades when that happens” rather than “If you market to these people, you might sell more products.”

This has some interesting implications. For one thing, you could have a whole company doing quantitative finance. On a similar note, I suspect that analyses can be more complicated because they might only need to be conveyed to people with quantitative literacy; in the other categories, it might be more important to convey insights to non-technical managers.
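To make the “make these trades when that happens” idea a bit more concrete, here is a toy sketch in which the output of the analysis is the decision rule itself. Everything in it is my own assumption for illustration: the moving-average crossover rule, the function names `moving_average` and `signal`, and the window sizes. It is not any firm’s actual strategy, and certainly not trading advice.

```python
def moving_average(prices, window):
    """Mean of the last `window` prices."""
    return sum(prices[-window:]) / window


def signal(prices, short=3, long=5):
    """Toy rule (hypothetical): compare a short and a long moving average.

    Returns "buy" when the short average is above the long one,
    "sell" when it is below, and "hold" otherwise, including when
    there are not yet enough prices to fill the long window.
    """
    if len(prices) < long:
        return "hold"
    short_avg = moving_average(prices, short)
    long_avg = moving_average(prices, long)
    if short_avg > long_avg:
        return "buy"
    if short_avg < long_avg:
        return "sell"
    return "hold"
```

The point is only that the function’s return value is already the thing you act on; nobody has to turn it into a slide deck for a manager first.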

The end

Pretend that I made some insightful, conclusionary conclusion in this sentence. And then get back to your number crunching.