The Top 10 Posts of 2014 from the Cloudera Engineering Blog

Our “Top 10” list of blog posts published during a calendar year is a crowd favorite (see the 2013 version here), in particular because it serves as informal, crowdsourced research about popular interests. Page views don’t lie, although skew from publishing date has to be taken into account: posts published earlier in the year clearly have pole position.

In 2014, readers showed clear and strong interest in the various new components that bring real-time or near-real-time capabilities to the Apache Hadoop ecosystem. And we’re particularly proud that the most popular post was authored by a non-employee.

The Truth About MapReduce Performance on SSDs
by Karthik Kambatla & Yanpei Chen
It turns out that cost-per-performance, not cost-per-capacity, is the better metric for evaluating the true value of SSDs. (See the session on this topic at Strata+Hadoop World San Jose in Feb. 2015!)

Based on the above, a significant number of you are at least exploring Apache Spark as an eventual replacement for MapReduce, as well as tracking Impala’s progress as the standard analytic database for Apache Hadoop. What will next year bring, do you think?