What Is Hadoop and Why Are We Interested?

Recently, the Computer Science Department at Worcester State has added two tracks to the major. One is Software Development, which is similar to the current single track, and the other is Big Data Analytics. The new track system required changes to the courses offered, which in turn meant changes to some course materials. Part of my independent study this semester has been to set up a Hadoop cluster, a tool that can be used for data analytics.

Big data analytics involves working with very large data sets. In many cases, especially at companies like Google or Amazon, it becomes clear that a single machine, or even a few machines, cannot get the job done. Distributing the work across many machines is then far more efficient. Apache Hadoop is one software platform designed for this job. It is installed on a cluster of machines, which together can store large data sets and run jobs on that data much more efficiently than a single machine could.

From the Hadoop website (hadoop.apache.org):

“The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.”
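The "simple programming model" Hadoop uses for processing is MapReduce. A map function turns each piece of input into key/value pairs, the framework groups those pairs by key, and a reduce function combines each group into a result. As a rough illustration only, the sketch below runs the classic word-count example on a single machine in plain Python. It is not Hadoop's API (real Hadoop MapReduce jobs are typically written in Java), and the function names here are hypothetical. On a real cluster, the map calls would run in parallel on different machines, each working on its own chunk of the data.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Map step: emit a (word, 1) pair for every word in one line of input.
    # On a cluster, many map tasks would run at once on different machines.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Shuffle step: group all values by key. In Hadoop the framework does
    # this between the map and reduce phases; here it is simulated locally.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Reduce step: combine all the counts collected for one word.
    return key, sum(values)


def word_count(lines):
    # Run map over every line, group the results, then reduce each group.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    documents = ["big data big clusters", "data on clusters"]
    print(word_count(documents))
```

Because each map call only looks at its own line and each reduce call only looks at one key, neither step depends on the rest of the data, which is what lets Hadoop spread the work across many machines.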

In this independent study, I plan to build a cluster running Hadoop and test some applications on it. By the end of the semester, the goal is to have a fully functioning cluster that can be used in the big data analytics courses, along with a complete set of installation instructions for rebuilding the Hadoop cluster at Worcester State if something goes wrong (documentation that is currently hard to find).

Keep checking back for updates on the installation, setup, and use of Hadoop here at Worcester State University.