graduated technologies

Recently, jStart promoted its expertise in Apache™ Hadoop from an exploratory technology to a jStart offering, in the sense that the team can help companies get started with Hadoop.

As massive amounts of data create significant business challenges, and opportunities, jStart has been investigating how distributed computing might address some of those needs. Apache Hadoop™, an open source Apache project, is a technology that jStart has been using with clients who generate significant amounts of data that is not being put to use as effectively as it could be.

How Hadoop is being used today

Today, a number of jStart clients are using Hadoop in conjunction with other IBM and jStart technologies. Technovated used IBM's BigSheets (and with it Hadoop) to help it build a system to "build a better web" for retail. USC Annenberg's Innovation Lab used Hadoop on the back end and BigSheets on the front end to do everything from predicting how well movies will do on opening weekend to analyzing the Oscars, conducting sentiment analysis on premier sporting events (the Super Bowl and baseball's World Series), and measuring sentiment in politically charged locations around the world. In short, Hadoop is being used to tackle big data analytics in a variety of markets and for a variety of business needs.

All of these scenarios illustrate situations in which tremendous amounts of data need to be processed in order to understand not only what challenges the businesses might face, but also what opportunities the data may reveal.

What is Hadoop?

How can a business process tremendous amounts of data, and do so in an efficient and timely manner? Hadoop allows developers to create distributed applications: applications capable of running on clusters of computers. This infrastructure can then be used to tackle very large data sets by breaking up the data into "chunks" and coordinating the processing of those chunks across the distributed, clustered environment.
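To make the "chunks" idea more concrete, here is a minimal sketch in Python of the split, process, and combine pattern described above. It is an illustration only and does not use Hadoop or its APIs: the function names (`split_into_chunks`, `map_chunk`, `reduce_counts`, `word_count`) are hypothetical, the word-counting task is an assumed example workload, and local threads stand in for the nodes of a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Break the input into fixed-size chunks, one per simulated worker task."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def map_chunk(chunk):
    """Process one chunk independently of the others: count its words."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partial_counts):
    """Combine the per-chunk results into a single overall result."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total


def word_count(records, chunk_size=2, workers=4):
    """Split the records, process the chunks in parallel, then merge the results."""
    chunks = split_into_chunks(records, chunk_size)
    # Assumption: threads stand in for cluster nodes. In a real cluster,
    # each chunk would be processed on a separate machine.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    return reduce_counts(partials)


if __name__ == "__main__":
    sample = ["big data big insight", "data in the cluster", "big cluster"]
    print(word_count(sample).most_common(3))
```

Because each chunk is processed without reference to any other chunk, the work can be spread over as many workers as are available, which is the property that lets this pattern scale to very large data sets.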

Image: An example of what a Hadoop Cluster infrastructure diagram might look like.

The Business Bottom Line

Hadoop applications can then process your data rapidly and efficiently. In fact, once the data has been distributed to the cluster, follow-up queries can be handled efficiently because the data already resides on the various nodes. The bottom line: businesses can finally get their arms around massive amounts of data, and mine that data for valuable insights, in a more efficient, optimized, and scalable way.