Sorting Through What's Really Going on in the Hadoop Stack

HCatalog is a table and storage management service for Hadoop data. It manages schemas and enables interoperability across data processing tools such as Pig, MapReduce, and Hive.

Everyone tends to focus on the “big” in Big Data, so much so that it’s easy to lose sight of the fact that Hadoop is really about data. Let’s regroup for a minute and look at what’s actually going on with the data on Hadoop.

The Hadoop Distributed File System. What’s it doing with the data? It’s distributing it across nodes and storing it there.

MapReduce. This does the real work in the Hadoop core. If you want to run a process or computation on the data, it “maps” that work out to the nodes, runs the process there, and then “reduces” the results down to your answer. So, it’s processing the data.
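To make the map/shuffle/reduce flow concrete, here is a minimal sketch in plain Python of the classic word-count example. This is a toy illustration of the idea, not Hadoop itself: a real cluster would run the map and reduce functions in parallel across nodes, with the framework handling the grouping step in between. All function names here are hypothetical.

```python
from collections import defaultdict

def map_phase(records):
    # "Map": emit a (key, value) pair for every word in every input
    # record, the way a Hadoop mapper emits pairs per input split.
    for record in records:
        for word in record.split():
            yield (word, 1)

def shuffle(pairs):
    # "Shuffle": group values by key. Hadoop performs this step
    # between the map and reduce phases, moving data across nodes.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # "Reduce": combine each key's values into a final answer
    # (here, a simple sum of the counts).
    return {key: sum(values) for key, values in groups.items()}

records = ["big data", "big hadoop data"]
counts = reduce_phase(shuffle(map_phase(records)))
# counts == {"big": 2, "data": 2, "hadoop": 1}
```

The point of the pattern is that map and reduce are both embarrassingly parallel: Hadoop can run many mappers and reducers at once, each on the slice of data stored on its own node.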

This is where the growing list of Apache Hadoop-related projects comes into play.

These projects go by an odd assortment of names: Pig, Hive, Flume, ZooKeeper. But they’re often short-changed when we talk about Hadoop. Loraine has seen them referred to as the “Hadoop stack,” though some programmers prefer “Hadoop ecosystem.” Forrester refers to them as “functional layers.”

For the most part, they’re of interest to developers more than executives, but hopefully a high-level view of these solutions will add some depth to your understanding of Hadoop and its capabilities.
