The personal view on the IT world of Johan Louwers, with a special focus on Oracle technology, Linux and UNIX technology, programming languages and all kinds of nice and cool things happening in the IT world.

Thursday, December 15, 2011

Hadoop explained by Mike Olson

Hadoop is an Apache project that aims to build a framework for running applications on large clusters built of commodity hardware. The Hadoop framework transparently provides applications with both reliability and data motion. Hadoop implements a computational paradigm named Map/Reduce, where the application is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, it provides a distributed file system (HDFS) that stores data on the compute nodes, providing very high aggregate bandwidth across the cluster. Both Map/Reduce and the Hadoop Distributed File System are designed so that node failures are automatically handled by the framework.
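To make the Map/Reduce idea a bit more concrete, here is a minimal sketch in plain Python. It does not use Hadoop or any Hadoop API; it only imitates the paradigm on a single machine with a word count. The function names (map_fragment, shuffle, reduce_counts, word_count) are made up for this illustration.

```python
from collections import defaultdict


def map_fragment(fragment):
    """Map step: emit a (word, 1) pair for every word in one fragment of input."""
    return [(word.lower(), 1) for word in fragment.split()]


def shuffle(pairs):
    """Group the emitted values by key, between the map and the reduce step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: combine all values emitted for one key into a single result."""
    return key, sum(values)


def word_count(fragments):
    """Run map, shuffle and reduce over a list of independent input fragments."""
    pairs = []
    # Each fragment is mapped on its own; in a real cluster these calls
    # could run on different nodes, or be re-run after a node failure.
    for fragment in fragments:
        pairs.extend(map_fragment(fragment))
    grouped = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    fragments = ["Hadoop stores data", "Hadoop processes data"]
    print(word_count(fragments))
```

The point of the sketch is that every fragment is mapped independently of the others, which is what allows a framework such as Hadoop to spread the work over many nodes and to simply repeat a fragment when a node fails.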

When we look at the future of computing and the future of data, we can see Hadoop in a very strategic position on the roadmap and within the overall framework. Hadoop is one of the key building blocks in this framework: it is responsible for parallelism and can be seen as one of the main engines for handling big data.

In the video below, Mike Olson explains some parts of the future framework of computing and discusses Hadoop and some other parts in depth. Mike Olson is the CEO of Cloudera, one of the leading companies investing in the Hadoop community.