The Big Data world has its share of epic battles. In November 2012 Amazon announced Redshift, its cutting-edge data warehouse-as-a-service, priced at under $1,000 per terabyte per year. Apache Hadoop, created in 2005, is no longer the only big data superhero on the block. Now that we have our own Superman vs. Batman, we gotta ask: how does Hadoop compare with Amazon Redshift? Let's get them in the ring and find out.

In the left corner, wearing a black cape, we have Apache Hadoop. Hadoop is an open-source framework for distributed storage and processing of Big Data on commodity machines. It uses HDFS, a dedicated distributed file system that splits files into fixed-size blocks (typically 64 MB or 128 MB), replicates each block, and spreads the copies across the machines in a cluster. The data is then processed in parallel on those machines via MapReduce. Hadoop 2.0 introduced YARN, a resource manager that lets other processing engines run on the cluster as well.
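
To make the split-then-process idea concrete, here is a toy word count in plain Python. It is only a sketch of the MapReduce pattern, not Hadoop's actual API: the function names (`split_into_blocks`, `map_block`, `shuffle`, `reduce_counts`, `word_count`) are made up for illustration, the "blocks" are a couple of lines each rather than tens of megabytes, and a thread pool stands in for the machines of a cluster.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(lines, block_size):
    """Cut the input into fixed-size blocks, loosely mimicking HDFS blocks."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block):
    """Map phase: emit a (word, 1) pair for every word in one block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_blocks):
    """Shuffle phase: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce phase: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines, block_size=2, workers=4):
    """Run the whole pipeline; the pool is a stand-in for cluster nodes."""
    blocks = split_into_blocks(lines, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_block, blocks))
    return reduce_counts(shuffle(mapped))


if __name__ == "__main__":
    sample = [
        "Hadoop stores data in blocks",
        "Redshift stores data in columns",
        "Hadoop processes blocks in parallel",
    ]
    print(word_count(sample))
```

The point to take away is that each block is mapped independently, so adding more workers (or, in real Hadoop, more machines) lets more blocks be processed at the same time; only the shuffle and reduce steps need to bring results back together.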