On the Edureka! blog, an anonymous blogger describes four ways to use R and Hadoop together: RHadoop, ORCH, RHIPE and Hadoop Streaming. That’s like saying there are four ways to fly from New York to Los Angeles: American, Delta, United and by flapping your arms vigorously.

(1) Databricks Previews Spark 1.6

Databricks announces availability of an Apache Spark 1.6 preview package. The preview is early release software; the general release is still planned for mid-December. Key new Spark bits:

IBM revs the PR machine for SystemML; the campaign has produced at least eight stories. Jessica Davis appears to be confused by the publicity: she describes IBM's donation of SystemML as a "milestone" for Spark, which is a stretch.

In case you missed last week's story, SystemML is a high-level declarative machine learning language; the user can choose between an "R-like" syntax and a "Python-like" syntax. Users specify the model in general terms; SystemML converts the request into an execution plan and runs it on either MapReduce or Spark.

It’s hard to imagine why one would ever run a machine learning algorithm in MapReduce, so you can write an “optimizer” with one rule: If Spark is installed, run it there, otherwise…
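Taken literally, that one-rule "optimizer" fits in a few lines. The sketch below is a tongue-in-cheek illustration that assumes the only input is whether Spark is installed; `Backend` and `choose_backend` are hypothetical names, and nothing here models SystemML's actual execution planner.

```python
from enum import Enum


class Backend(Enum):
    """Execution engines mentioned in the post (illustrative names only)."""
    SPARK = "spark"
    MAPREDUCE = "mapreduce"


def choose_backend(spark_installed: bool) -> Backend:
    """Hypothetical one-rule planner: prefer Spark whenever it is available.

    This is NOT SystemML's planner; it only encodes the rule from the text.
    """
    if spark_installed:
        return Backend.SPARK
    # Fall back to MapReduce only when there is no alternative.
    return Backend.MAPREDUCE


if __name__ == "__main__":
    print(choose_backend(True).value)   # spark
    print(choose_backend(False).value)  # mapreduce
```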

SystemML's library of MapReduce algorithms dates back a couple of years; IBM was unable to commercialize it. While the Spark algorithms align roughly with the existing capabilities of Spark MLlib, it appears that IBM rewrote them and added a few, including stepwise regression and survival analysis.

IBM donated the software to open source last June. All active contributors are IBM employees.