Apache Spark’s popularity as part of big data analytics solutions is exploding. Spark is an open-source cluster computing framework for data analytics, originally developed in the AMPLab at UC Berkeley. Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS). However, Spark promises performance up to 100 times faster than Hadoop MapReduce for certain applications, and that’s why you should care!

Spark’s in-memory cluster computing is very well suited to machine learning algorithms. These videos give you a nice introduction to Spark, how it’s being used in business, and why you should care. Watch Spark Videos…

So what is HBase? Yes, it’s the Bigtable-like structured storage for Hadoop HDFS, but how exactly does it work? What is the architecture? When is it a good choice, and when is it not? This post helps answer those questions. More…

Watch this pre-recorded webinar to learn what Machine Learning is, why you should use machine learning algorithms, what the common challenges of machine learning are, and how Cloudera’s enterprise data hub supports machine learning. More…

Authors: Foster Provost and Tom Fawcett
Department of Information, Operations, and Management Sciences, Leonard N. Stern School of Business, New York University

Abstract: Companies have realized they need to hire data scientists, academic institutions are scrambling to put together data-science programs, and publications are touting data science as a hot, even “sexy,” career choice. However, there […]