Apache Spark 2.2 is Out

Posted by jgp on 2017/07/12

The Best Spark in Town

Yesterday, Apache Spark v2.2.0 was released. Excitement started a few months ago, reaching a “summit” during Spark Summit, where a lot of the features were described and discussed. I mention some of those updates in the write-up of Spark Summit I shared on SlideShare.

Databricks has already announced on its blog that Spark 2.2 is available on its platform.

Cost-Based Optimizer

With every new release, you have your favorite features. In Spark v2.1, it was checkpoints. In Spark v2.0, it was the generalization of the dataframe as an abstraction layer for data storage (still luv it!).

In Spark 2.2, my favorite feature is the cost-based optimizer added to Catalyst. If you want to know more about Spark's internals and performance, I highly recommend Dr. Kazuaki Ishizaki’s talk from Spark Summit. Some key takeaways include:

Java storage is expensive; Tungsten made it a lot more efficient.

Auto-boxing is crazy expensive. Avoid it.

Java seems more efficient than Scala. Ok, this one is really personal :).
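
To make the auto-boxing point concrete, here is a minimal sketch in plain Java (no Spark involved). The class and method names are mine, purely for illustration. Both methods compute the same sum, but the boxed version unboxes and re-boxes its `Long` accumulator on every iteration, which typically allocates far more objects. The timings it prints are only indicative; a serious measurement would need a proper benchmark harness.

```java
public class BoxingSketch {

    // Boxed accumulator: each "+=" unboxes total, adds, then boxes the result again.
    static long sumBoxed(long n) {
        Long total = 0L;
        for (long i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    // Primitive accumulator: no wrapper objects are created in the loop.
    static long sumPrimitive(long n) {
        long total = 0L;
        for (long i = 0; i < n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        long n = 10_000_000L;

        long t0 = System.nanoTime();
        long boxed = sumBoxed(n);
        long t1 = System.nanoTime();
        long primitive = sumPrimitive(n);
        long t2 = System.nanoTime();

        // Same result either way; only the cost differs.
        System.out.println("boxed sum     = " + boxed + " in " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("primitive sum = " + primitive + " in " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

The same reasoning applies to collections: a `List<Long>` stores references to wrapper objects, while a `long[]` stores the values directly.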

Spark Java Cookbook for Spark v2.2

I updated my Spark Java Cookbook on GitHub to Spark v2.2. Note that I now use one branch per Spark version. So far everything runs, but I have not double-checked everything…

Release Notes

Apache Spark 2.2.0 is the third release on the 2.x line. This release removes the experimental tag from Structured Streaming. In addition, this release focuses more on usability, stability, and polish, resolving over 1100 tickets.

To download Apache Spark 2.2.0, visit the downloads page. You can consult JIRA for the detailed changes. We have curated a list of high level changes here, grouped by major modules.

Changes of Behavior

SPARK-19291: This added log-likelihood for SparkR Gaussian Mixture Models, but doing so introduced a SparkR model persistence incompatibility: Gaussian Mixture Models saved from SparkR 2.1 may not be loaded into SparkR 2.2. We plan to put in place backwards compatibility guarantees for SparkR in the future.

Known Issues

None

Credits

Last but not least, this release would not have been possible without the following 233 contributors: