Apache Hadoop In Focus

Administrator

Posted on 1.06.2013

Lately, Apache Hadoop has entered the
lexicon of Web professionals thanks to
the big data explosion. But talking about
Hadoop can be tricky, and that’s without
considering all of the associated technologies
and architectural mumbo jumbo.

At the most basic level, Hadoop is simply an open
source software framework built for the distributed
processing of large data sets. In traditional
non-distributed architectures, scaling meant adding
more CPU and storage to a single machine; in Hadoop's
distributed architecture, data and processing are
spread across clusters of commodity servers and run
in parallel. But
that’s only part of the story when it comes to
Hadoop.

Thanks to the Hadoop framework, there are
now a variety of products that can help companies
mine and use big data — something your enterprise
will likely do more of in the future. But first, you
will need to know more about how Hadoop works.

Building Blocks
The core of Apache Hadoop consists primarily of
two sub-projects — Hadoop MapReduce (a parallel-
processing engine) and the Hadoop Distributed
File System or HDFS (which makes it
possible to scale across servers and store data on
compute nodes).

While MapReduce and HDFS are unquestionably
the most important Hadoop-related projects
from Apache, there are others. The most notable are
the query languages Hive and Pig. The SQL-like
Hive acts as a data warehouse infrastructure that allows
for data summarization and ad hoc querying,
while Pig is a data flow language and execution
framework for parallel computation. Both make it
possible to process a lot of data without having to
write MapReduce code.
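To get a feel for the MapReduce code that Hive and Pig let you avoid writing, here is a minimal sketch of the classic word-count job in plain Python. It imitates the map and reduce phases on a single machine; on a real cluster, the same logic would run in parallel across many nodes (for instance via Hadoop Streaming). The function names and sample input are illustrative, not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Reduce: sum the counts for each distinct word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

if __name__ == "__main__":
    sample = ["big data needs big tools",
              "hadoop handles big data"]
    # "big" appears three times across the two lines.
    print(reduce_phase(map_phase(sample)))
```

In HiveQL or Pig Latin, the same job collapses to a short query or a few data-flow statements, which is precisely the convenience those projects provide.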

To meet the varied requirements of different
organizations, there are many other Apache-endorsed
products and projects for Hadoop, including Flume,
ZooKeeper, Oozie, and HBase.
Learn more about these projects by visiting Website
Magazine at http://wsm.co/MWB6NO.

There are many Apache-created and third-party
distribution and management software offerings
that make Hadoop easier to deploy and run. As the framework grows in popularity, more of these systems
are emerging. Cloudera, Hortonworks, and MapR, for
example, each have their own distribution products,
while others, including Platform Computing and
Zettaset, offer software for managing Hadoop
clusters (most of it agnostic to which distribution
is used).

Using Hadoop
You may have a better idea of what Hadoop is, what
it’s made of and some of the software and service
options that developers build around it, but what is
the point of Apache Hadoop, and what can it do for
your business?

If you’ve read anything on big data in the past,
you’ll know that by collecting, organizing and
studying it, you can “find insights, discover emerging
data types and spot trends” that can be used to
assess the real business value of these hefty data sets,
and then turn these large volumes of information
into actionable resources.

How does Hadoop help? Well, when you’re interested
in looking at this data in a way that is both
deep and computationally intensive (such as clustering
or targeting), Hadoop offers a scalable, flexible,
fault tolerant and cost effective way to do so.
That is why it is the basis of numerous third-party
application software products like IBM Infosphere
BigInsights and HStreaming.

In short, Apache Hadoop offers a framework for
structuring and studying a significant amount of
data that may or may not be well-organized, and
can be built upon to provide a variety of products to
analyze and leverage big data in different ways depending
on the needs of the business.

Hadoop in the Future
For now, Apache Hadoop will be most useful to
companies with IT infrastructures far more
sophisticated than that of the average relational
database adopter, largely because very few
applications can simply be installed and run on a
Hadoop cluster.
And, of course, these larger businesses are more
likely to have massive amounts of data on hand to
be leveraged.

However, as big data becomes an increasingly
common issue for all businesses, expect to see
more shrink-wrapped Hadoop applications that
can be quickly and easily installed and used by
companies of all shapes and sizes.

Who is Using Hadoop?

There are already plenty of companies out there
utilizing Apache Hadoop every day to analyze and
make sense of all of their data, and in most cases
it has been a smashing success. If you're only kind
of familiar with Hadoop, head over to Website
Magazine online to get a quick look at five
companies, including major names like Amazon and
Facebook, that are using this fast-emerging
technology. You can find them at