Working with Big Data: Assembling Your Toolkit

Posted on Mon, Mar 18, 2013 by DC Denison

When does it make sense to start up a Big Data program? If your email marketing system isn't talking to your sales force automation system, and neither is synched up with your online purchase system, are you really ready to tackle a Big Data project? The answer may surprise you as we examine Big Data and its impact on the next generation digital experience in this third installment of our ongoing series "Are You Ready for Big Data?"

The good news for those who want to tap into the power of Big Data: The rise of this data revolution has been powered by a number of prominent open source and low-cost cloud computing projects, in addition to an explosion of commercial offerings.

Hadoop

The most important software in Big Data, and the one that sits at the white hot center of this revolution, is Apache Hadoop, an open source project that runs on commodity Linux hardware.

Hadoop, which is named after a stuffed toy elephant that belonged to its creator's son, was developed largely at Yahoo, and was initially inspired by papers published by Google outlining its approach to handling an avalanche of data.

Hadoop implements a programming framework named MapReduce, in which an application is divided into many small fragments of work that Hadoop assigns to the nodes in a cluster. Hadoop also provides a distributed file system, HDFS, that spans all the nodes in a Hadoop cluster for data storage. HDFS links together the local file systems on many nodes so that they behave as one big file system.
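To make the "many small fragments of work" idea concrete, here is a minimal sketch of the map, shuffle, and reduce steps as a word count in plain Python. It is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and everything runs in a single process, whereas a real cluster would hand each fragment to a different node.

```python
from collections import defaultdict


def map_phase(fragment):
    """Map: emit a (word, 1) pair for every word in one fragment of input."""
    return [(word.lower(), 1) for word in fragment.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, a step the framework performs between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(fragments):
    """Run map over each fragment, then shuffle and reduce the combined output."""
    emitted = []
    for fragment in fragments:  # on a cluster, each fragment could go to a different node
        emitted.extend(map_phase(fragment))
    grouped = shuffle(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Two illustrative fragments standing in for pieces of a much larger data set.
    fragments = ["big data big clusters", "data on many nodes"]
    print(word_count(fragments))
```

Because each call to map_phase depends only on its own fragment, the map work can be spread across as many machines as are available; only the grouping step needs to bring results for the same key together.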

Hadoop's strength is that it can process huge amounts of data in parallel across inexpensive, industry-standard servers that both store and process the data. Capacity grows by adding more servers to the cluster, which makes Hadoop well suited to working with ever-expanding sources of data.

Hadoop is also supplemented by an ecosystem of open source Apache projects, such as Pig, Hive, and ZooKeeper, that further extend the value of Hadoop and improve its usability.

Cloud computing has also come into the picture as an option for those considering a Big Data project. "Infrastructure as a Service" (or IaaS) providers enable users to buy computing time and to install and configure their own software, such as a Hadoop cluster. Budget-constrained companies can use these services to launch a Big Data project without having to invest in expensive hardware.

The next level up: cloud services that provide an application layer. Some of these "Platform as a Service" (PaaS) providers have already implemented Big Data solutions.

Amazon Web Services and Microsoft’s Azure cross the boundary between infrastructure and platform services, offering hybrid solutions. Google’s approach focuses on the application layer. California-based Joyent also offers hybrid products.

Because a lot of Big Data already lives in the cloud – such as data from social media and device sensors – cloud platforms are making more sense for hosting and analyzing Big Data.

However, merging this data with what a company has on-premises will continue to be a challenge in the near term.

Editor’s Note: This is the third post in the ongoing series “Are You Ready for Big Data?” by DC Denison. Download the complete “Are You Ready for Big Data?” ebook to learn more about Big Data, its applications in creating the next generation digital experience, and what it takes to get into the game.