A chart of the big data ecosystem

My colleague Shivon Zilis has been obsessed with the Terry Kawaja chart of the advertising ecosystem for a while, and a few weeks ago she came up with the great idea of creating a similar one for the big data ecosystem. Initially, we were going to do this as an internal exercise to make sure we understood every part of the ecosystem, but we figured it would be fun to “open source” the project and get people’s thoughts and input.

So here is our first attempt.

A few things became apparent very quickly:

1) Many companies don’t fall neatly into a specific category

2) There are only so many companies we can fit on the chart; subcategories such as NoSQL or advertising applications, for example, would almost deserve their own chart.

3) The ecosystem is evolving so quickly that we’re going to need to update the chart often – companies evolve (e.g., Infochimps), and large vendors make aggressive moves in the space (VMware with Serengeti and the Cetas acquisition).

Upon first glance, you may consider adding Pervasive Software, Cirro, and Kitenga to Analytics Solutions, FeedZai and ParStream to Real-Time, IBM Infosphere BigInsights and Greenplum HD/MR to Hadoop Related, Actuate and Quantum 4D to Data Visualization. Will suggest more later.

Hi Matt & Shivon, Dave Feinleib for Forbes did something similar recently http://www.forbes.com/sites/davefeinleib/2012/06/19/the-big-data-landscape/ but yours is by far more comprehensive. Well done. Two things:
1) I found Todd P’s breakdown of the Big Data Landscape quite interesting: Infrastructure/Plumbing, Dev/Mgmt Tools, Analytics & Apps. Some of the Mgmt Tools are under Infrastructure in your schema.
2) Search or Information Access seems to be missing. We hope you’ll add Q-Sensei in that box. Thanks!

1) Ah, that’s true, Todd Papaioannou did come up with that breakdown… mmm, let’s see if we can fit that in, space-wise.
2) As to search, who else would you put in that category, that’s specific enough to Big Data? Elastic Search?
As to the Forbes chart, yes, I know… we had been working on this for weeks on and off, but Dave beat us to it!

Thanks for putting this together. With such a broad landscape it’s difficult to capture all the key players. MarkLogic is missing from the infrastructure group. We’re an enterprise software company powering over 500 of the world’s most critical Big Data Applications.

No worries, with so many players having recently entered the Big Data Landscape it’s gotten to be a very crowded sector, as your chart clearly shows. You are correct that MarkLogic was a NoSQL database solving Big Data issues for clients long before the term was popular.

Hi Matt,
Thanks to BV, Shivon and you for doing this.
Companies I don’t see (some of these might actually be a big, maybe huge, stretch or not fit your wiser criteria) that come to mind are:

Magnetic – looking to go public just three years out of the blocks
C3 Metrics – very powerful attribution models cutting through mountains of well-accepted myth.
VisibleMeasures – I can see why vm wouldn’t seem like big data, but video on the internet is big and very few people actually understand the punch, breadth and impact of VisibleMeasures’ capabilities.
GE Software’s Silicon Valley Industrial Internet
Medialets
MyCityWay – I’m biased to anyone that produces accurate meaningful subway realtime info. They’re improving.
Ensequence – interactive TV will tip scales imho
Altruik
SAP Hana
Brilig
Dtex Systems – when Dtex looks at big data, people get fired.
Adaptivity
Glue Networks
Lookingglass – these guys looked at big data and found very bad guys hidden within good guy domains

Also, missing beyond SAP’s Hana DB is a different subcategory altogether: eDiscovery or what I deem forensic analytics. The ability to datamine 3 million emails, legal, court, and brief docs in the law industry. It’s changing the way legal discovery has been conducted.

If you are to answer the Grids for each industry vertical, you must reach out to experts within that sector who already understand the lay of the land. My experience, and my company’s focus, is the Architecture-Engineering-Construction (AEC) industry. There’s a paucity of analytics in the industry, because it’s stuck in the legacy past.

While you have Vertica, you are missing a big part of HP’s big data solutions, e.g. Autonomy. http://www.autonomy.com/content/News/Releases/2012/0604a.en.html
IDOL 10 (Intelligent Data Operating Layer) is a single processing layer that enables organizations to extract meaning and act on all forms of information, including audio, video, social media, email and web content, as well as structured data such as customer transaction logs and machine-based sensor data (http://idol.autonomy.com/). It provides the platform for solutions across Information Management, Information Governance, Web Commerce, Customer Interaction, Optimization and Marketing.

Thanks… that’s one of the challenges of putting this chart together: there are a few companies like Autonomy that were around a number of years before anyone started talking about “big data”, and it’s not that easy to know where to draw the line. Let us figure out how/where we could include Autonomy in the next version. Others have suggested search and/or eDiscovery as missing pieces, maybe that could be an appropriate spot, assuming we can somehow fit all of it in on just one page…

It is more than Search/eDiscovery; it really encompasses intelligent information processing to extract meaning from data, automate business processes and achieve whatever business results one can envision. All the “solutions” are really just “packaged” interfaces with business logic to achieve specific business objectives. However, the IDOL platform can be integrated into any information-intensive application/business process to create additional insight and automation. You really need to think of it as an information platform, but unlike other Core Infrastructure providers, IDOL has connectivity to all repositories (500+) and can actually manage information in place (e.g., leave it in Sharepoint or on the Z: drive, but gain insight and automate processes from its existence in those “systems of record”).

Hi Matt, Terracotta should be included in this graphic as well… they are a leading in-memory data core solution (just acquired by Software AG) and would fit in the cross-infrastructure analytics category. We are the only leading in-memory data management solution that can linearly scale to terabytes of capacity, with predictable low latency.

Hey Matt, Thanks for all the work and responses to all the folks who are weighing in… Just wanted to make sure that you reference Terracotta — not Teradata :) This is getting to be a big, deep exercise!
Thanks!