Machine learning. Artificial Intelligence

Big Analytics Roundup (April 20, 2015)

Top news this week: a couple of Spark maintenance releases, some interesting new Apache projects, an announcement from Hortonworks and some interesting content from Databricks and Teradata.

Also in the news this week, North Bridge and Black Duck Software release their ninth annual Future of Open Source survey. Meanwhile, Hortonworks, IBM and Pivotal announce ODP harmonization and round up endorsements from their own executives. It’s touching to see such excitement.

Also, the Open Data Science Conference has released the schedule for its Boston events in May.

The Spark team announces two double-dot releases, Spark 1.2.2 and Spark 1.3.1. The former includes bug fixes in Spark Core and PySpark; the latter includes bug fixes for Spark Core, PySpark, Spark SQL and Spark Streaming. Ninety developers contributed to the two releases.

Huawei’s global big data team guest-posts on the Databricks blog and summarizes the newly added FP-Growth and Power Iteration Clustering algorithms. The article includes a performance comparison of FP-Growth in Spark versus a similar algorithm in Mahout. Spoiler: Spark is a lot faster.

Bob DuCharme uses Spark’s GraphX library to build a graph from the U.S. Library of Congress’ subject headings.

Hortonworks announces GA for Spark 1.2.1 in HDP 2.2.4. Horton’s announcement includes ORC file support for Spark, Ambari integration and an endorsement for Apache Zeppelin, a notebook for data science. Horton also announces that it has “worked with the community to ensure that Spark runs on a Kerberos-enabled cluster.” I don’t know what that means, exactly (you either support a feature or you don’t), but it sounds positive.

Saptak Sen offers a hands-on tour of Spark in the Hortonworks Sandbox.

Loraine Lawson asks whether Apache Spark is enterprise-ready, which is kind of ironic given the seven previous items.

Databricks

Databricks publishes two primers, one for Apache Spark and the other for Databricks Cloud.