When is a technology offering a platform? Arguably, when people build products assuming it will be there. Or extend their existing products to support it, or add versions designed to run on it. Hadoop is there. The age of Bring Your Own Hadoop (BYOH) is clearly upon us. Specific support for components such as Pig and Hive varies, as do capabilities and levels of partnership in development, integration and co-marketing. Some vendors are in many categories – for example, Pentaho and IBM, at opposite ends of the size spectrum, interact with Hadoop in development tools, data integration, BI, and other ways. A few category examples, by no means exhaustive:

Database: some vendors, like IBM and Teradata, offer their own distributions, and even appliances. Others, like Actian, Calpont, Oracle and Microsoft, partner with pureplay vendors. All provide connectors, interfaces to their own management tools, etc. MarkLogic adds an “enterprise NoSQL” flavor; Rainstor adds an archiving solution for a highly compressed Hadoop environment.

Data Integration: Informatica and Talend both support HDFS and even have specific offerings for ETL, data quality, etc. Revelytix Loom offers data prep and metadata creation capabilities to shorten time-to-use cycles.

Development platforms: Continuuity is out to an early lead here, but is hardly alone and won’t be the last. For example, SQL Server development tool player Red Gate will enter the market soon.

Hadoop as a Service: Altiscale, Amazon, Qubole, Rackspace, Savvis, and Xplenty (who mask Hadoop development complexity) offer varying degrees of control and surrounding capabilities – and marketing, as the links demonstrate.

In-memory data grid (IMDG) engine: GridGain offers GGFS, one of several HDFS substitutes, and, like ScaleOut hServer, offers an in-memory grid for execution of MapReduce code. Longtime IMDG player Terracotta has added a Hadoop connector.

Lifecycle Management: WANdisco is offering ALM and support for highly available distributed network deployments, and has recently partnered with Hortonworks.

Merv Adrian is an analyst following database and adjacent technologies as extreme data transforms assumptions about what to persist as well as when, where and how. He also watches the way the software/hardware boundary…

Thoughts on BYOH – Hadoop’s a Platform. Get Used To It.

@Merv – great market landscape, good to see how far reaching it is. There are clearly many tools connecting to various constituents of Hadoop – HDFS, Hive, etc. – but Hadoop really becomes a platform when tools run inside/on top of it.

In the data integration/data quality space, that means running the transformation and cleansing jobs inside Hadoop, through MapReduce code generation for example (a bit like ELT jobs run inside an RDBMS). In the BI space, that would mean crunching the queries in MapReduce, not just extracting the raw data to load it into a separate cube.
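To make the "run the cleansing inside Hadoop" idea concrete, here is a minimal sketch of a map and reduce pair in the style of a Hadoop Streaming job, simulated in a single local process. Everything in it is an assumption for illustration: the `customer_id,email` record layout, the validation rule, and the function names `mapper`, `reducer` and `run_local` are hypothetical, and it is not the code any vendor's tool actually generates.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Cleanse raw 'customer_id,email' records and emit (key, value) pairs.

    The record layout and the validation rule are illustrative assumptions.
    """
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2 or not parts[0] or "@" not in parts[1]:
            continue  # malformed record: dropped inside the job, not upstream
        yield parts[0], parts[1].lower()


def reducer(pairs):
    """Deduplicate emails per customer. Input must already be sorted by key."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        emails = sorted({value for _, value in group})
        yield key, emails


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in one process."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return list(reducer(shuffled))


if __name__ == "__main__":
    raw = [
        "42, Ann@Example.com",
        "42,ann@example.com",
        "bad line",
        "7,bob@example.com ",
    ]
    for customer_id, emails in run_local(raw):
        print(customer_id, emails)
```

On a real cluster the framework performs the sort between the two phases; the point of the sketch is only that the normalization and deduplication happen where the data lives rather than in a separate engine.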

I know this is the direction the market is taking, but not all legacy architectures will be able to transition to that concept.

Thanks, Yves. It is really just a quick sketch; doing it right would take many pages. I’m hoping folks weigh in and say “Add this” – and I will, keeping this as a living post for a while.
There’s a good discussion to be had here about how the architecture will evolve, starting with “which pieces have to be in the stack to consider it Hadoop?” – and that will change as YARN proliferates and people use other, non-MR engines in their stacks. But certainly MR will remain the vehicle of choice for DI work in the HDFS-and-alternatives data layer.

Good post about what Hadoop is all about: the ecosystem! Far too often Hadoop is seen only as a cheap way to store massive amounts of data or to batch process it. Seeing it from a platform or ecosystem perspective emphasises the potential it has to utilise the stored (or streamed) data.

I believe that the wide range of appliances and other so-called big data solutions that integrate with Hadoop or are “powered by Hadoop” makes it more and more obvious that this platform is here to stay (HP’s HAVEn, IBM’s InfoSphere BigInsights, Microsoft’s Parallel Data Warehouse etc.).

Great overview. One category you might consider adding is ‘In-Memory Hadoop’, a la ScaleOut Software, GridGain, Terracotta, Pivotal etc. Basically, the vendors that are combining in-memory computing technology with Hadoop to enable developers to run MapReduce on live, fast-changing data and speed up job execution.

Nice overview. Don’t forget the ODBC connection plumbing offered by Simba. We have most of the major Hadoop players (Cloudera, MapR, Hortonworks, Intel, Microsoft, etc.) offering our ODBC drivers as part of their distributions, as well as solutions for MongoDB, Cassandra, Salesforce, Google BigQuery and others.

We enable the ecosystem to talk to each other in a standardized and reliable way, via the most common interface.
