Monday, August 5, 2013

Scala: 6 silver bullets

Bye bye Java, Hello Scala

In the JVM world, Scala is certainly the rising star. Created at EPFL in 2001, it is steadily gaining in popularity. Depending on the index, it now ranks as a "serious" language, reaching far beyond the academic world and adopted by mainstream companies (Twitter backend, eBay research, Netflix, Foursquare etc.).
For data scientists, this language is a breeze. Rising above the religious war between functional and object-oriented believers, it succeeds by merging the best of both worlds, with a strong drive towards "let's be practical."
If Grails/Groovy was a big step forward in productivity on the JVM, Scala goes even further, mixing static typing (thus efficiency) with many improvements in language structure, collection handling and concurrency, all backed by solid frameworks and a very active community.
In this post, I have picked six major (and subjective) improvements, to show my hardcore Java colleagues that jumping on this train holds the promise of a great journey.

Bullet #1: Object orientation

For many, Scala means "the comeback of functional programming." That is certainly true, but its designers also totally revisited the object approach on the JVM. Thanks to static typing, more or less everything you can imagine a compiler doing is possible.

Immutable variables

By default, variables are immutable: once a value has been defined with val, you cannot change it. This is key for the functional paradigm, where a call on an object should always return the same value, as no state is stored. It can be disturbing at first, but one soon realizes that most of our code can be written with immutable variables.
    val x: Int = 42
    x = 12 // error: reassignment to val
    var i: Int = 0
    i += 1 // OK

Type inference

Why specify the variable type wherever it can be inferred (apart from readability, on public methods for example)?
    val x = 42 // x is an Int

Companion object

If an object is defined in the same source file as a class, with the same name, it has a "special" relation to it (implicit conversions, factories etc.).
    object Peak {
      // implicit conversion from a pair of doubles to a Peak
      implicit def fromPair(p: (Double, Double)): Peak = new Peak(p._1, p._2)
    }
    val p: Peak = (1.3, 42.0)

Traits

Why couldn't an interface implement methods, and why couldn't a class inherit from multiple implemented parents?
While abstract classes still exist, a trait plays a more general role. It can declare methods "to be defined in the implementing class" as well as actual methods (but it takes no constructor parameters).
    trait HasIntensity {
      def intens: Double                 // to be defined in the implementing class
      def sqrtIntens = math.sqrt(intens) // actual method
    }
    class Peak(val x: Double, val intens: Double) extends HasIntensity {...}
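To make the mix-in idea concrete, here is a minimal self-contained sketch. HasIntensity and Peak reuse the names from the text; the second trait HasLabel and the sample values are hypothetical, added only to illustrate that a class can inherit implemented methods from several traits.

```scala
// A trait can combine abstract members with concrete methods built on them.
trait HasIntensity {
  def intens: Double                          // abstract: supplied by the class
  def sqrtIntens: Double = math.sqrt(intens)  // concrete: inherited as-is
}

// Hypothetical second trait, to show that several traits can be mixed in.
trait HasLabel {
  def x: Double                               // abstract: supplied by the class
  def label: String = s"peak@$x"              // concrete: inherited as-is
}

// The constructor vals implement the abstract members of both traits.
class Peak(val x: Double, val intens: Double) extends HasIntensity with HasLabel

val peak = new Peak(12.5, 16.0)
println(peak.sqrtIntens)
println(peak.label)
```

The class body stays empty: everything it offers beyond its two fields comes from the traits.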

Bullet #2: Collections

Among many other variants, there are three types of collections:

List[T]: to be traversed, with direct access to head and tail,

Vector[T]: with random access on integer index,

Map[K, V]: for dictionary or hash map structure.

Instantiation

As shown above:
    val l = List(1, 2, 3, 4)
    val m = Map("x" -> 3, "y" -> 1984)
By default, collections are also immutable, although they also exist in a mutable flavor:
    import scala.collection.mutable.ListBuffer
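As a small illustration of these three collection types and of the immutable/mutable distinction, here is a sketch with made-up values (all names below are illustrative, not taken from a real code base). It uses ListBuffer as the mutable counterpart of List.

```scala
import scala.collection.mutable.ListBuffer

val l = List(1, 2, 3, 4)
val v = Vector(10, 20, 30, 40)
val m = Map("x" -> 3, "y" -> 1984)

val first = l.head                 // direct access to the head of a List
val third = v(2)                   // access by integer index on a Vector
val missing = m.getOrElse("z", 0)  // dictionary lookup with a fallback value

// "Modifying" an immutable collection returns a new one; the original is untouched.
val l2 = 0 :: l
val m2 = m + ("z" -> 7)

// The mutable flavor is updated in place.
val buf = ListBuffer(1, 2)
buf += 3
```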

A more complex example

Let's pretend we have a list of peaks as defined above, Peak(val x:Double, val intens:Double). We'd like to group them by integer bin on x, sum up the intensities in each bin and keep only the 2 binned values with the highest total intensity.
    peaks.groupBy(p => math.floor(p.x))
      .map({ pl => (pl._1, pl._2.map(_.intens).sum) })
      .toList
      .sortBy(-_._2)
      .take(2)
Each operation returns a new collection, to which the next operation can be applied, so the whole chain of operations is described concisely.
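To see the chain at work, here is a self-contained version run on a few made-up peaks. Peak is declared as a case class here purely for brevity, and the sample values are assumptions chosen so that the sums are easy to check by hand.

```scala
case class Peak(x: Double, intens: Double)

// Hypothetical sample data: three integer bins (100, 101 and 102).
val peaks = List(
  Peak(100.2, 10.0),
  Peak(100.7, 5.0),
  Peak(101.1, 2.0),
  Peak(102.4, 7.0),
  Peak(102.9, 1.0)
)

val top2 = peaks
  .groupBy(p => math.floor(p.x))                                     // bin -> peaks in that bin
  .map { case (bin, binPeaks) => (bin, binPeaks.map(_.intens).sum) } // bin -> total intensity
  .toList
  .sortBy { case (_, total) => -total }                              // highest total first
  .take(2)

println(top2) // List((100.0,15.0), (102.0,8.0))
```

The pattern-matching form `{ case (bin, binPeaks) => ... }` is equivalent to the `_1`/`_2` accessors used above, only with named parts.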

for loops

Ever written for(i=0; i<arr.size();i++){arr[i]}? Well, we can do better. Let's consider two nested loops that build a list of points.
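A possible sketch of such nested loops, written as a single for comprehension. The Point case class and the ranges are assumptions made for illustration.

```scala
case class Point(x: Int, y: Int)

// Two nested loops in one expression: every (x, y) pair is yielded as a Point.
val points: List[Point] =
  (for {
    x <- 0 until 3
    y <- 0 until 2
  } yield Point(x, y)).toList

// A guard filters inside the loop: keep only the points off the diagonal.
val offDiagonal = for {
  x <- List(0, 1, 2)
  y <- List(0, 1)
  if x != y
} yield Point(x, y)

// The compiler rewrites the first comprehension into flatMap/map calls like these.
val desugared = (0 until 3).flatMap(x => (0 until 2).map(y => Point(x, y))).toList
```

No index variable, no bounds to get wrong, and the result is an immutable list rather than something filled in by side effect.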

Bullet #4: concurrency

Easy list parallelization

Imagine we have a heavy function to be called on each member of a list (here, it simply sleeps for 100 ms). Having immutable variables makes it much easier to parallelize such code across the available cores with the .par call.
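Here is a hedged sketch of that idea. The .par call is shown as a comment because it is only built in up to Scala 2.12; from 2.13 on it lives in the separate scala-parallel-collections module. So that the example runs with the standard library alone, the executable part swaps in Future.traverse, which is a different mechanism reaching the same goal. The heavy function and the input list are made up.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical heavy function: sleeps 100 ms, then doubles its argument.
def heavy(i: Int): Int = { Thread.sleep(100); i * 2 }

val inputs = (1 to 8).toList

// With parallel collections available, this is all it takes:
//   val doubled = inputs.par.map(heavy)

// Standard-library alternative: one Future per element, gathered back in order.
val doubled: List[Int] =
  Await.result(Future.traverse(inputs)(i => Future(heavy(i))), 10.seconds)

println(doubled)
```

Since heavy touches no shared mutable state, the calls can run in any order on any core, and the result list still follows the input order.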

Actors

Again, the functional flavor of Scala makes it easy for actors to communicate via message passing. Here a master sends an integer to a slave, which decreases it by 1 at each step and sends it back. Once finished, the master sends the word "stop". Pattern matching is used to select the correct behavior.

    import scala.actors.Actor
    import scala.actors.Actor._

    object main {

      // Receives an integer, prints it and sends it back decreased by 1.
      class MinusActor(val master: Actor) extends Actor {
        def act(): Unit = {
          loop {
            react {
              case (i: Int) => {
                println(s"$i--")
                master ! (i - 1)
              }
              case "stop" => {
                println("ciao")
                exit()
              }
              case _ => println("whatever")
            }
          }
        }
      }

      // Starts the countdown at 10 and stops the slave once 0 is reached.
      class MasterActor extends Actor {
        val minusActor = new MinusActor(this)

        def act(): Unit = {
          minusActor.start()
          minusActor ! 10
          loop {
            react {
              case i: Int if i > 0 =>
                minusActor ! i
              case _ =>
                minusActor ! "stop"
                exit()
            }
          }
        }
      }

      new MasterActor().start()
      // leave the actors some time to finish
      Thread.sleep(1000)
    }

And much more with Akka, futures, async

Bullet #5: the ecosystem

No matter how brilliant, a language cannot succeed if it is not supported by a strong ecosystem, which encompasses many aspects.

Java integration

Scala code is compiled into JVM bytecode, just like Java. A very good side effect is that existing Java libraries can be used transparently from Scala code. Here is a simple example using Apache Commons Math.
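The sketch below illustrates the same kind of interoperability using only classes shipped with the JDK (java.util and java.time, the latter requiring Java 8 or later) rather than Apache Commons Math, so that it runs without any extra dependency. That substitution and the sample values are assumptions made for illustration.

```scala
import java.time.{DayOfWeek, LocalDate}
import java.util.{ArrayList, Collections}

// A plain Java collection, instantiated and filled from Scala.
val names = new ArrayList[String]()
names.add("scala")
names.add("java")
names.add("groovy")

// A Java static method, called like any Scala method.
Collections.sort(names)
val first = names.get(0)

// Java objects and enums behave like ordinary Scala values.
val day: DayOfWeek = LocalDate.of(2013, 8, 5).getDayOfWeek

println(s"$first, $day")
```

A third-party jar works the same way once it is on the classpath: import the classes and call them.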

IDE

While Typesafe supports an Eclipse package, NetBeans is used by many. Some development environments, such as Activator/Play!, are strongly embedded in the browser and let you use any text editor for the source code.

Web frameworks

While Play! is the most comprehensive one, some lighter alternatives, such as Scalatra, are available.

RDBMS integration

More than an ORM, Slick is a mainstream solution. It offers the comfort of an ORM with the flexibility to manipulate query results as lists, in the Scala fashion. The generated SQL is optimized for the connected database.

And NoSQL

Any Java driver is usable. But some tools are natively Scala oriented, such as Spark (in-memory, large-scale data processing) or ReactiveMongo.

Bullet #6: REPL

Experimenting is a key component of discovering a language. A REPL (Read-Eval-Print Loop) lets you see the code executed on the fly.

worksheets

Even better, the Eclipse IDE (at least) offers worksheets. Enter code, and each time the file is saved, it is evaluated. This makes it possible to evaluate interactively the objects defined in the source base.

This is the shortest bullet in this list, but it can sometimes be the most efficient.

Plenty more bullets

The selection of six was pretty subjective. I could have gone further, with many more bullets that make daily life much more productive:

XML parsing: mixing XPath and list manipulation for a mixed SAX/DOM approach;

HTTP queries;

file system traversal and file parsing;

JSON serialization;

dynamic classes, to handle calls to arbitrary methods;

regular expression (ok, Perl is the king, but Scala is not bad);

macros;

string interpolation and reverse;

optional ';'

more Java integration;

abstract classes;

profiling with yourkit or jProfiler

context bound & parametric types

streams

foldLeft /:

for loops structure

map getOrElse

case class

implicit

Mockito, EasyMock

...
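To give a taste of a few items from this list (case classes, foldLeft, map getOrElse, string interpolation and regular expressions), here is a short sketch. All values and names are made up for illustration.

```scala
// case class: fields, equality, copy and pattern matching for free
case class Peak(x: Double, intens: Double)

val peaks = List(Peak(1.0, 2.0), Peak(2.0, 3.5))

// foldLeft: accumulate a value while walking the list
val total = peaks.foldLeft(0.0)((acc, p) => acc + p.intens)

// copy, pattern matching with a guard, and string interpolation
val shifted = peaks.head.copy(x = 1.5)
val description = shifted match {
  case Peak(x, i) if i > 1.0 => s"strong peak at $x"
  case Peak(x, _)            => s"weak peak at $x"
}

// map getOrElse: lookup with a default instead of a null check
val charges = Map("H" -> 1)
val unknown = charges.getOrElse("He", 0)

// regular expression used as an extractor in a pattern match
val Version = """(\d+)\.(\d+)""".r
val major = "2.10" match {
  case Version(maj, _) => maj.toInt
  case _               => -1
}
```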

Cons

Several criticisms of Scala keep coming back. Without endorsing all of them, here are a few common ones:

Backward incompatibilities between major versions: a jar compiled in 2.8 is not usable in 2.9. That may be the price to pay for a maturing language that is not tied to full backward compatibility. It may be a serious issue for some, but personally, upgrading my code from 2.8 to 2.9 and 2.10, with its dependencies, has never been more than a couple of hours' work.

sbt is a pain: OK. But I've never made my way smoothly through Maven either. For most of the issues I faced, googling was enough, and even though I cannot claim a full understanding of the tool, it does the job. Slowly, for sure, but it does it.

It's hard to hire people. That should hopefully evolve, with more and more adopters and major companies committing to the language.

It's hard to learn: that is common wisdom among Scala developers. It is definitely not a language a normal developer picks up in a week (contrary to Groovy, where the learning curve is gentler). Reading a book or, better, following Martin Odersky's online course could be a worthy investment. But common wisdom also says that once you have picked up the basics, the return on investment is great.