Friday, September 12, 2008

Scala runs on the JVM and can directly call and be called from Java,
but source compatibility was not a goal.
Scala has a lot of capabilities not in Java,
and to help those new features work more nicely,
there are a number of differences between Java and Scala syntax
that can make reading Scala code a bit of a challenge for Java programmers
when first encountering Scala.
This primer attempts to explain those differences.
It is aimed at Java programmers, so some details about syntax which are
the same as Java are omitted.

This primer is not intended to be a complete tutorial on Scala;
rather, it is more of a syntax reference.
For a much better introduction to the language,
you should buy the book
Programming in Scala by Martin Odersky, Lex Spoon and Bill Venners.

Most of these syntax differences can be explained by two of Scala's
major goals:

Minimize verbosity.

Support a functional style.

The logic behind some of these syntax rules may at first seem arbitrary,
but the rules support each other surprisingly well.
Hopefully by the time you finish this primer,
you will have no trouble understanding code fragments like this one:

(0/:list)(_+_)

Scala is an integrated object/functional language.
In the discussion below, the terms "method" and "function" are
used interchangeably.

You can run the Scala interpreter and type in these examples
to get a better feel for how these rules work.

Basics

Scala classes do not need to go into files that match the class name,
as they do in Java.
You can put any Scala class in any Scala file.
The only time it makes a difference is when you have a class
and an object of the same name and want them to
be companions, in which case they must be in the same file.

Semicolons are optional.
If you put one statement on each line, you don't need semicolons.
If you want to put multiple statements on a line, you can do so
by separating them with semicolons.
There are specific rules about when statements can span lines,
so sometimes you have to be a bit careful when doing so.

Every value is an object on which methods can be called.
For example, 123.toString() is valid.

Scala includes implicit transformations that allow objects to be used
in unexpected ways.
If you see some source code where a method call is operating on an
instance of a class which does not define that method,
then probably the instance is being implicitly converted to a class
on which that method is defined.

Scala's "Uniform Access Principle" means that variables and functions
without parameters are accessed in the same way.
In particular, a variable definition using the val or
var keyword can be converted to a function definition
simply by replacing that keyword with the def keyword.
The syntax at the calling sites does not change.
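
A minimal sketch of this principle, using two hypothetical classes that differ only in the keyword used to define area. The class and member names are assumptions made for illustration.

```scala
object UniformAccessExample {
  // area is a stored value, computed once at construction.
  class StoredArea(radius: Double) {
    val area: Double = math.Pi * radius * radius
  }

  // area is a parameterless function, recomputed on every access.
  class ComputedArea(radius: Double) {
    def area: Double = math.Pi * radius * radius
  }

  // The calling sites are written identically in both cases.
  val stored = new StoredArea(2.0).area
  val computed = new ComputedArea(2.0).area
}
```

Switching between val and def in the class does not require touching the code that reads area.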

Scala includes direct support for XML.
This code fragment assigns an instance of type
scala.xml.Elem to val x:

val x = <head><title>The Title</title></head>

You can mix Scala code in with the XML by putting it in braces.
This code fragment produces the same resulting value as the above code:

val title = "The Title"
val x = <head><title>{ title }</title></head>

Just about everything can be nested.
Packages can be nested inside packages,
classes can be nested inside classes,
defs can be nested inside defs.

As in Java, annotations are indicated by the @ character.

Keywords

There is no static keyword. Methods and variables
that you would declare static in Java go into an object
rather than a class in Scala. Objects are singletons.
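
As a rough sketch, a counter that would be a static field in Java can live in an object with the same name as the class. Ticket, nextId and count are hypothetical names used only for this illustration.

```scala
object StaticExample {
  class Ticket {
    // Each instance asks the singleton object for its id.
    val id: Int = Ticket.nextId()
  }

  // The object holds what Java would declare as static members.
  object Ticket {
    private var issued = 0
    def nextId(): Int = { issued += 1; issued }
    def count: Int = issued
  }
}
```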

Scala has no break or continue statements.
Fortunately, Scala's support of a functional programming style
reduces the need for these.

Access modifiers such as protected
and private can include a scope
in square brackets, such as private[this] or
protected[MyPackage1.MyPackage2].
The default access is public.

The val keyword declares an immutable value (a val),
similar to the final keyword in Java.
The var keyword declares a mutable variable.

Multiple items can be imported with one import statement:

import java.text.{DateFormat,SimpleDateFormat}

Imported symbols can be renamed to other names,
which provides a means to work around the problem of importing
two symbols of the same name.
For example, if you want to import both
java.util.Date and java.sql.Date
and be able to use them both without having to type the whole
qualified name each time, you could do this:

import java.util.{Date=>UDate}
import java.sql.{Date=>SDate}

If an import is renamed to _, that symbol will not
be imported. This allows importing everything except a specified
symbol:

import java.util.{Date=>_,_}

The abstract keyword is only used for abstract classes
and traits.
To declare an abstract function (def), val, variable (var), or type, you
omit the = character and body of the item.

When overriding a method in a superclass, the override
modifier must be specified.
Overriding a method without using the override modifier
or using the modifier when not overriding a superclass method
will result in a compilation error.
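
The following sketch shows abstract members and the override modifier together. Shape and Square are hypothetical names; the only method that overrides an existing implementation here is toString.

```scala
object OverrideExample {
  abstract class Shape {
    def area: Double   // abstract function: no "=" and no body
    val name: String   // abstract val

    // toString is already defined in a superclass, so override is required.
    override def toString: String = name + " with area " + area
  }

  class Square(side: Double) extends Shape {
    def area = side * side   // implements the abstract function
    val name = "square"      // implements the abstract val
  }
}
```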

Some other keywords:
lazy, implicit

Symbols and Literals

Scala allows multi-line strings quoted with triple-quotes:

val longString = """Line 1
Line 2
Line 3""";

Symbol names can include almost any character.
In particular, they can include all of the characters normally used
as operators, such as *, +, ~ and :.
The backslash character (\) is a valid symbol character, and in fact
is used as a method name in the scala.xml.Elem class.
Note that "abc\"def" is a seven character String
with a double quote in the middle,
but if abc is an instance of scala.xml.Elem,
then abc\"def" is a call passing a three character String
to the backslash method, a method
that accepts a String argument and returns an instance of
scala.xml.NodeSeq.

The underscore character (_) is used as a wildcard character rather
than asterisk (*), such as in an import statement
or in a case statement to represent a "don't care" value.
This is because asterisk is a valid symbol character in Scala.

As in Java, by convention class names and object names start with
an upper case letter, variable names start with a lower case letter.

In one case, whether a symbol
starts with an upper or lower case letter actually matters to the compiler:
in a case statement, a name that starts with an upper case letter
is treated as a constant value, such as PI
if Math.PI has been imported,
whereas a name that starts with a lower case letter
is treated as a placeholder name being introduced (whose scope is then limited
to the body of the case statement).

Expressions

Any place an expression is expected, a block of expressions
surrounded by braces can be used instead.
The braces act as parentheses.
For example, the expression 5 * { val a = 1; a + 2 }
is valid and yields the value 15.

As mentioned above, symbol names can include almost any character,
such as * and +. This is useful for defining methods
that will be used as operators (see the rules under
Function Calls for functions with one argument).

The precedence of operators is determined by their first character,
and is hardwired to match the usual precedence.
Thus if you create the operator methods
*^ and +^,
the *^ will have higher precedence.
The one exception to this rule is that if the operator ends with
an equal sign (=) and is not one of the standard
relational operators, then it will have the same precedence
as simple assignment.
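
A small sketch of the first-character rule, using a hypothetical Num class that defines the two operator methods named above.

```scala
object PrecedenceExample {
  class Num(val v: Int) {
    def +^(other: Num): Num = new Num(v + other.v)
    def *^(other: Num): Num = new Num(v * other.v)
  }

  val two = new Num(2)
  val three = new Num(3)
  val four = new Num(4)

  // *^ starts with *, so it binds tighter than +^:
  // the expression is parsed as two +^ (three *^ four).
  val result = (two +^ three *^ four).v
}
```

With these definitions result is 14 rather than 20.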

When used as binary operators, any symbol which ends with a colon (:)
is right-associative; all other symbols are left associative.
The controlling object for right-associative operators goes on
the right side of the operator, with its one argument on the left side.
However, the left hand argument is still evaluated before the
right hand argument.

The characters +, -, ! and ~ can be used as prefix operators in any
class by defining the method unary_+, unary_-
etc.
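
The two rules above can be sketched with a hypothetical Stack class. The operator +: ends with a colon, so the Stack goes on the right; unary_- is given an arbitrary meaning (reversal) purely for illustration.

```scala
object OperatorExample {
  class Stack(val items: List[Int]) {
    // Ends with a colon: "x +: s" is a call to s.+:(x).
    def +:(x: Int): Stack = new Stack(x :: items)

    // Prefix operator: "-s" is a call to s.unary_-
    def unary_- : Stack = new Stack(items.reverse)
  }

  val empty = new Stack(Nil)
  val s = 1 +: 2 +: empty   // parsed as 1 +: (2 +: empty)
  val reversed = -s
}
```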

Every statement is an expression whose value is the last expression
within that statement that was evaluated.
For example, there is no ?: ternary operator in Scala.
Instead, you use a standard if/else expression:

val x = if (n>0) "positive" else "negative"

When used on the right hand side of a variable (var) declaration,
the underscore character means assign the default value for the declared type.
This is the same as not specifying a value in Java;
in Scala, not specifying an initial value declares an abstract variable.

Cases and Patterns

Instead of the switch keyword as in Java,
Scala uses a match expression.
The match keyword comes after the value being matched,
unlike the relative positions of the switch keyword
and the value in Java.

match works on all types, not just ints.
For example, you can match on a String variable and have
case statements each with a constant String value.

No break statement is required, and execution does not
fall through to the next case.

match statements return values.
The value of a match statement is the value of whichever
branch was executed.

The underscore is used to indicate the default case.
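
A minimal sketch of a match on a String that returns a value, with no fall-through and an underscore default. The function name and the color codes are hypothetical.

```scala
object MatchExample {
  // The whole match is an expression; its value is the selected branch.
  def colorCode(color: String): Int = color match {
    case "red"   => 1
    case "green" => 2
    case _       => 0   // default case
  }
}
```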

In addition to constants, a case expression can include
patterns, which allow for more complex matching.
Case matching is handled by extractors, which can be implemented by
writing an unapply method in an object.

A case pattern can include a variable declaration with
a type, in which case the variable is defined with that type and
set to the value of the matched data within the body of that case.

n match { //assume n is of type Number
case i:Int => //i is an Int here, like (int)n would be in Java
case d:Double => //d is a Double here, like (double)n in Java
case _ => //no values were defined in this case
}

When matching a more complex expression, you can assign a variable
name to an internal part of the pattern by writing the variable
name and the @ character before the pattern:

case Foo(a,b @ Bar(_)) //b gets set to the part that matches Bar(_)

Case expressions can be followed by a pattern guard before the =>.
The pattern guard is the keyword if followed by
a boolean expression:

x match {
case Foo(a,b) if a==b => //here only when Foo with a==b
case Foo(a,b) => //here for all other Foo
case _ => //here for all non-Foo
}
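
A runnable sketch that combines a pattern guard with the @ binding described earlier. Foo and Bar are assumed here to be simple case classes, and the returned strings are only for illustration.

```scala
object GuardExample {
  case class Bar(n: Int)
  case class Foo(a: Any, b: Any)

  def classify(x: Any): String = x match {
    case Foo(a, b) if a == b => "equal"        // guard: only when a == b
    case Foo(_, b @ Bar(_))  => "bar " + b.n   // b is bound to the matched Bar
    case Foo(_, _)           => "foo"          // all other Foo
    case _                   => "other"        // all non-Foo
  }
}
```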

Case expressions work for XML.
In this example, the variable b gets set to whatever
is inside the body element if there is only one
element there:

case <body>{ b }</body> => //b is the contents

To match multiple elements, use _* to match any sequence:

case <body>{ b @ _* }</body> => //b is the contents

The catch block of a try/catch statement uses the same
syntax as the body of a match statement:
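
A minimal sketch of a catch block written as a series of case clauses. The function name and the choice of exceptions are assumptions made for illustration.

```scala
object TryCatchExample {
  def parseOrDefault(s: String, default: Int): Int =
    try {
      s.toInt
    } catch {
      // Each clause is a typed pattern, just as in a match.
      case e: NumberFormatException => default
      case e: Exception =>
        throw new RuntimeException("unexpected: " + e.getMessage)
    }
}
```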

A pattern can also be used on the left side of an assignment that declares
new vals or vars, including when the right side is a call to a method that
returns a value which is an object.

A pattern can be used on the left side of the <-
operator in a generator in a
for expression.

For Expressions

For Expressions are also called For Comprehensions.

A for expression consists of the for keyword,
a sequence of specific kinds of elements separated by semicolons or
newlines and surrounded by parentheses, and the yield
keyword followed by an expression:

for ( n <- ...; e = n%2; if e==0 ) yield ...

The elements inside the parentheses can be any of the following:

A generator, of the form n <- expr, which produces
multiple values and assigns them to a new val (here n).
The new val name appears to the left of the <-
operator; to the right of that operator is a value which
implements the foreach method to generate a
series of values.

A definition, such as e = n%2, which introduces a
new value by performing the specified calculation.

A filter, such as if e==0, which filters out
the values which do not satisfy that expression.
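
A minimal sketch of a for expression that uses all three kinds of elements. The range 0 to 6, the yielded expression n * n and the object name are assumptions chosen for illustration.

```scala
object ForExample {
  val evenSquares =
    for (
      n <- 0 to 6;   // generator
      e = n % 2;     // definition
      if e == 0      // filter
    ) yield n * n
}
```

Only the even values of n pass the filter, so the result holds the squares of 0, 2, 4 and 6.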

The val name in a generator can instead be a pattern,
similarly to how a pattern can be used in an assignment statement
or a case expression.
For example:

val list = List((1,2),(3,4),(5,6))
for ( (a,b) <- list ) yield a+b

yields List(3, 7, 11).

The elements can be placed inside of braces rather than parentheses
and separated by newlines rather than semicolons:

for {
n <- ...
e = n%2
if e==0
} yield ...

When multiple generators are specified, each generator is repeated for
each value produced by the preceding generator.
For example, the expression

for ( x <- 0 to 4; y <- 0 to 2 ) yield (x,y)

produces a value starting with (0,0), (0,1), (0,2) and ending with (4,2).

The type of the value produced by a for expression is
the same as the type of the first generator.

As an alternative to using yield followed by an expression,
you can omit the yield keyword and use a block of code
in place of a single expression.

A for statement can always be translated into a series
of foreach, filter,
map and flatMap method calls.
In that sense, the for statement is syntactic sugar.
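
As a rough sketch of that translation, the two values below are built from the same hypothetical lists, once with a for expression and once by hand. The hand-written form uses filter as named above; newer compilers call a lazy variant named withFilter instead, so this is an approximation of what the compiler generates rather than its exact output.

```scala
object ForDesugarExample {
  val xs = List(1, 2, 3)
  val ys = List(10, 20)

  val viaFor = for (x <- xs; y <- ys; if x != 2) yield x * y

  // Outer generator becomes flatMap, the filter becomes filter,
  // and the last generator plus yield becomes map.
  val byHand = xs.flatMap(x => ys.filter(y => x != 2).map(y => x * y))
}
```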

Arrays

Array indexes are specified with parentheses rather than square brackets.

Array access is implemented the same way as function access,
using the apply method.

The code arr(index) is converted to
arr.apply(index).

The code arr(index) = newval
is converted to arr.update(index,newval).
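
Because these are ordinary method calls, any class can support the same syntax. A minimal sketch with a hypothetical Grid class:

```scala
object ApplyUpdateExample {
  class Grid(size: Int) {
    private val cells = new Array[Int](size)
    def apply(i: Int): Int = cells(i)
    def update(i: Int, v: Int): Unit = { cells(i) = v }
  }

  val g = new Grid(3)
  g(1) = 42         // converted to g.update(1, 42)
  val read = g(1)   // converted to g.apply(1)
}
```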

Arrays are declared using the Array keyword
and with the element type in square brackets,
rather than
using empty square brackets after a type as is done in Java.
For example, an array with space for three Strings would be declared
like this:

val x = new Array[String](3)

A two dimensional 3 by 3 array of Strings would be declared like this:

val x = new Array[Array[String]](3, 3)

Tuples

Scala has built in support for Tuples, from one element to 22 elements.
A Tuple is a small ordered collection of objects,
where each object can have a different type.

The types for Tuples of various sizes are Tuple1 through Tuple22.
These types have N type parameters, where N is the Tuple size.
For example, a two element Tuple with an Int and a String has type
Tuple2[Int,String].

The Pair object allows that word to be used instead of
Tuple2 for building and matching two element Tuples.

The Triple object allows that word to be used instead of
Tuple3 for building and matching three element Tuples.

You can create a tuple by enclosing the objects in parentheses
and separating them by commas:
(1, 2, "foo") is a Tuple3[Int,Int,String].

You can create a Tuple2 (a Pair) by using the -> operator,
which works on any value:
"a" -> 25 is the same as ("a", 25).
The following expression is true:

("a",25)=="a"->25

This is done by an implicit conversion from Any to
Predef.ArrowAssoc,
which contains the -> method.

The elements of a Tuple can be accessed as member fields _1, _2, _3 etc.

If an expression returns a Tuple, that can be assigned to a set
of variables or vals.
The following code assigns 5 to the new val tens
and 8 to the new val ones.
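
A minimal sketch of such an assignment. The helper digits and the input 58 are assumptions chosen so that the result matches the values described above.

```scala
object TupleExample {
  // Hypothetical helper that returns a two element Tuple.
  def digits(n: Int): (Int, Int) = (n / 10, n % 10)

  // The Tuple is taken apart into two new vals.
  val (tens, ones) = digits(58)

  // The same elements accessed as member fields.
  val t = digits(58)
  val alsoTens = t._1
}
```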

This is a case of using a pattern on the left hand side of an assignment,
as mentioned in the section on
Cases and Patterns.

Classes

The primary constructor for a class
is coded in-line in the class definition,
i.e. the constructor statements are not contained within a definition
inside the class.
The constructor parameters are declared immediately after the class
name, and superclass arguments are placed after the name of the
class being extended.

Class parameters can be preceded by val to make them
immutable instance values (vals), or by var
to make them instance variables.

Class parameters can be preceded by an access modifier such
as private or protected.
By default, class parameters using val or var
are public.

The primary constructor can be made private by adding the
access modifier private before the parameter list.

trait is like interface in Java,
but can include implementation code.
Classes in Scala don't implement traits; they
extend them, the same as classes.
If a class extends multiple traits, or extends a class plus
traits, the keyword with is used rather than commas
as in Java.
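
A short sketch of a class that extends two traits, one of which carries implementation code. All names here are hypothetical.

```scala
object TraitExample {
  trait Greeter {
    def name: String                     // abstract, like an interface method
    def greet: String = "Hello, " + name // implementation code in the trait
  }

  trait Shouter {
    def shout(s: String): String = s.toUpperCase + "!"
  }

  // extends for the first trait, with for each additional one.
  class Person(val name: String) extends Greeter with Shouter
}
```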

Case classes are defined by adding the case keyword
before the class keyword.
This automatically does the equivalent of the following:

Prepends val to all parameters, making them
immutable instance values.

Creates equals and hashCode methods
so that instances of that class can safely be used in collections.

Creates a companion object of the same name with
an apply method with the same args as declared for
the class, to allow creation of instances without using the
new keyword,
and with an unapply method to allow the class name
to be used as an extractor in case statements.
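
The generated members listed above can be seen in a small sketch with a hypothetical Point case class.

```scala
object CaseClassExample {
  case class Point(x: Int, y: Int)

  val p = Point(1, 2)                 // companion apply, no new keyword
  val sameValue = p == Point(1, 2)    // generated equals
  val sameHash = p.hashCode == Point(1, 2).hashCode
  val xCoord = p.x                    // parameters are immutable instance values

  // The class name used as an extractor in a case statement.
  val sum = p match { case Point(a, b) => a + b }
}
```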

Anonymous classes can be defined without naming a class to extend,
in which case they extend Object:

val x = new {
def cat(a:String, b:String) = a+b
}

The type-parameterized isInstanceOf method
is used to determine
if an object is an instance of a specific class:

if (x.isInstanceOf[Double]) ...

Similar to isInstanceOf, a value can be cast to a specific
type by using the type-parameterized asInstanceOf method:
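
A minimal sketch of a test followed by a cast, using a hypothetical value x of type Any.

```scala
object CastExample {
  val x: Any = 3.5

  // Test the type first, then cast; like (double)x in Java.
  val d: Double =
    if (x.isInstanceOf[Double]) x.asInstanceOf[Double] else 0.0
}
```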

However, the above construct is not typically used;
instead, that functionality is implemented with a
case statement, which simultaneously tests for a type
and sets a new value of that type:

x match {
case d:Double => //operate on d
case _ => //not a double
}

The isInstanceOf method can be used to test if an object
matches a trait as well as a class.
It can also be used to test an instance against a structural definition,
which can be used to test if an instance implements a specific method:

Class literals are written classOf[MyClass] as opposed
to MyClass.class as in Java.

Types

All values in Scala are objects, so (except for compatibility with
Java) there is no int/Integer or double/Double distinction.
All integers are of type Int and all doubles are
of type Double.
(In previous versions of Scala, either upper case Int
or lower case int was accepted, but convention now is
to use only the upper case version, and this may be enforced by
the compiler in the future.)

Type specifications are written as name:type rather than
type name as in Java.
This is to allow the type to be omitted in many cases, since Scala
does type inference.
For example, write n:Int rather than int n.

Types for generics are specified in square brackets [T]
rather than in angle brackets <T> as in Java.
Thus a generic type might be specified as F[A,B,C].

Scala supports covariant and contravariant type specifications at
the definition site.
These are declared with a leading + for covariant types and
a leading - for contravariant types.
Thus a type declaration F[+A,-B] means F is
qualified by a covariant type A and a contravariant type B.
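
A rough sketch of what the two annotations permit, using hypothetical Animal, Cat, Box and Printer types.

```scala
object VarianceExample {
  class Animal
  class Cat extends Animal

  class Box[+A](val value: A)                       // covariant in A
  trait Printer[-A] { def show(a: A): String }      // contravariant in A

  // Covariance: a Box[Cat] can be used where a Box[Animal] is expected.
  val animalBox: Box[Animal] = new Box[Cat](new Cat)

  // Contravariance: a Printer[Animal] can be used where a Printer[Cat] is expected.
  val animalPrinter: Printer[Animal] =
    new Printer[Animal] { def show(a: Animal): String = "animal" }
  val catPrinter: Printer[Cat] = animalPrinter
}
```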

Types can be specified with upper and lower bounds.
The expression T<:U means type U is an upper bound for T,
whereas T>:U means type U is a lower bound for type T.

Types can be specified with view bounds, which are similar to upper bounds:
The expression T<%U means type U is a view bound for T,
which requires only that T be implicitly convertible to U and can thus support more
actual types.

A higher kinded type with two type qualifiers, such as
Pair[String,Int], can be written in infix notation
by placing the higher kinded type name between its two type qualifiers,
such as String Pair Int.
This makes more sense if the higher kinded type name happens to use
operator characters such as +.
Thus when you see a type such as Quantity[M + M2],
as used in the Quantity class in
this file,
that is the same as Quantity[+[M,M2]], so look for a
type called + that takes two type qualifiers.

Existential types are supported with an expression like this:

T forSome { type T }

where the contents of the braces is some type declaration.
This is mainly used when interfacing to Java code that either has
raw types or uses Java's ? wildcard type.

The type T in an existential type specification can be
replaced by a more complex expression:

List[T] forSome { type T <: Component }

In the above example, we are saying T is some type which is a subtype
of Component.

The shorthand

List[_]

is the same thing as

List[T] forSome { type T }

The shorthand

List[_ <: Component]

is the same thing as

List[T] forSome { type T <: Component }

Type variables can be defined by using the type keyword.
Similar to a typedef in C, the type variable simplifies code when
a complicated type is used many times:

type ALS = Array[List[String]]
val a:ALS
val b:ALS

Type variables can also be abstract, in which case they must
eventually be defined by a subclass.

A trait may include code that accesses another trait, in which case
the class that includes the first trait must also include the second trait.
In order to make this work, the first trait must include a "self type"
referencing the second trait.
The self type declaration is the first line of the body, usually
declaring a type for this, but optionally using a
different name in place of this:
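
A minimal sketch of a self type, with hypothetical traits Logger and Service. Here the name self is used in place of this.

```scala
object SelfTypeExample {
  trait Logger {
    def log(msg: String): String = "LOG: " + msg
  }

  trait Service {
    self: Logger =>   // any class mixing in Service must also mix in Logger
    def run(): String = log("running")   // so log is available here
  }

  class MyService extends Service with Logger
  // class Broken extends Service   // would not compile: Logger is missing
}
```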

Inner class types can be referenced using the outer and inner class
names separated by a dot (.) as in Java, or using a pound sign (#).
The dot syntax specifies a path-dependent type; the pound syntax
specifies the generic inner class.
For example, if you had this code:

class Outer {
class Inner {}
}

then you would use Outer#Inner to refer generically
to that inner class.
If you had an instance x of class Outer, you would
refer to the specific class Inner in that instance by using
x.Inner, which is a distinct type from the Inner
class within any other instance of Outer,
and a subtype of the generic Outer#Inner class.
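
The difference can be sketched as follows. The accept method is a hypothetical addition to the Outer class above, included only to show that a path-dependent type is enforced.

```scala
object InnerTypeExample {
  class Outer {
    class Inner
    // Inside Outer, Inner means this.Inner.
    def accept(i: Inner): String = "accepted"
  }

  val x = new Outer
  val y = new Outer

  val xi: x.Inner = new x.Inner            // path-dependent type
  val general: Outer#Inner = new y.Inner   // any Outer's Inner

  val ok = x.accept(xi)
  // x.accept(new y.Inner)   // would not compile: y.Inner is not x.Inner
}
```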

Function Definitions

The return type of a function is written after the function's
parameter list and preceded by a colon, similar to the type
specification for a variable.
For example, a function which would be declared in Java as

//Java code
public String toString(StringBuffer buf)

would be declared in Scala as

def toString(buf:StringBuffer):String

Functions which do not return a value are declared as having the
type Unit rather than void as in Java.
If a function never returns (such as if it always throws a Throwable)
the return type is Nothing.

A function with no parameters can be declared without parentheses,
in which case it must be called with no parentheses.
This provides support for the Uniform Access Principle,
such that the caller does not know if the symbol is a variable
or a function with no parameters.

The function body is preceded by "=" if it returns a value (i.e. the
return type is something other than Unit), but the return type
and the "=" can be omitted when the type is Unit (i.e. it looks
like a procedure as opposed to a function).

Braces around the body are not required (if the body is a single
expression); more precisely, the body of a function is just an
expression, and any expression with multiple parts must be
enclosed in braces (an expression with one part may optionally
be enclosed in braces).
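
The definition rules above can be sketched together in one place. All function names are hypothetical.

```scala
object DefExample {
  // Single-expression body: no braces needed.
  def square(x: Int): Int = x * x

  // Multi-part body in braces; the return type is inferred from the last expression.
  def describe(x: Int) = {
    val s = square(x)
    "square is " + s
  }

  // Never returns normally, so the return type is Nothing.
  def fail(msg: String): Nothing = throw new IllegalArgumentException(msg)

  // Declared without parentheses, so it must be called without them.
  def greeting = "hello"
}
```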

Vararg parameters are declared by appending an asterisk to the argument,
like this:

def printf(format:String, args:Any*):String

The parameter gets turned into an array within the method, so
in the above example the args parameter
would have the type Array[Any]
within the body of the printf function.
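
A minimal sketch of a vararg function that relies only on the length of the collected arguments. The function name and label format are assumptions for illustration.

```scala
object VarargsExample {
  // args collects every argument after the first.
  def countArgs(label: String, args: Any*): String =
    label + ":" + args.length

  val three = countArgs("n", 1, "two", 3.0)
  val none = countArgs("n")
}
```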

Function Calls

When a class has an apply method, foo(bar)
(where foo is an instance of that class)
translates to foo.apply(bar).

Likewise for an object.
If you see Foo(bar) that is most likely a
call to the apply method of object Foo.

As with any method, the apply method can be overloaded,
with different versions having different signatures.

Functions are instances of a class (Function1, Function2, etc),
so the same rule applies to any function object.

A method named unapply in an object definition
is also treated specially:
it is invoked as an extractor when the object name is used
in a case statement pattern.
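
A small sketch of an object that defines both apply and unapply. The object Even and its meaning (doubling and halving) are hypothetical.

```scala
object ExtractorExample {
  object Even {
    // Even(4) translates to Even.apply(4).
    def apply(half: Int): Int = half * 2

    // Invoked when Even(...) appears in a case pattern.
    def unapply(n: Int): Option[Int] =
      if (n % 2 == 0) Some(n / 2) else None
  }

  val eight = Even(4)

  def halve(n: Int): String = n match {
    case Even(h) => "twice " + h
    case _       => "odd"
  }
}
```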

Functions with zero or one argument can be called without the
dot and parentheses.

But any expression can have parentheses around it, so you can
omit the dot and still use parentheses.

And since you can use braces anywhere you can use parentheses,
you can omit the dot and put in braces,
which can contain multiple statements.
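
The one-argument forms described above can be sketched with the charAt method of String. Newer compilers may warn about some of these forms, so treat this as an illustration of the syntax rather than a style recommendation.

```scala
object CallSyntaxExample {
  val withDot = "abc".charAt(1)

  // Dot and parentheses omitted.
  val noDot = "abc" charAt 1

  // Dot omitted, parentheses kept.
  val noDotParens = "abc" charAt (1)

  // Dot omitted, braces containing multiple statements.
  val noDotBraces = "abc" charAt { val i = 0; i + 1 }
}
```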

Functions with no arguments can be called without the parentheses.
For example, the length() function on String
can be invoked as "abc".length
rather than "abc".length().
If the function is a Scala function defined without parentheses, then
the function must be called without parentheses.

By convention, functions with no arguments that have side effects,
such as println, are called with parentheses;
those without side effects are called without parentheses.

Function Sugar

"Syntactic sugar" is added syntax to make certain constructs
easier or more natural to specify.
The step in which the compiler replaces these constructs by their more verbose
equivalents is called "desugaring".

Functions with one parameter (including anonymous functions)
are instances of type Function1[A,B],
functions with two parameters are of type Function2[A,B,C],
etc.
The last type in the list of parameter types is the return value type,
so there is always one more than the number N of parameters.
A function with no parameters is an instance of Function0[A].
The name Function with no number is equivalent
to Function1.

(A,B)=>C is shorthand ("syntactic sugar")
for Function2[A,B,C].

A Function1[A,B] can be written as
(A)=>B, or as just A=>B.

A Function0[A] (i.e. a function with no parameters)
can be written as ()=>A.
This function can be called with or without parentheses (as
mentioned in the Function Definition section).
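
A minimal sketch showing that the sugared and unsugared type names refer to the same types. The value names are hypothetical.

```scala
object FunctionTypeExample {
  val add: (Int, Int) => Int = (a, b) => a + b
  val addLong: Function2[Int, Int, Int] = add   // same type, written out

  val inc: Int => Int = _ + 1
  val incLong: Function1[Int, Int] = inc

  val answer: () => Int = () => 42
  val answerLong: Function0[Int] = answer
}
```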

A function with no parameter list can also be specified
with no parentheses as =>A.
This function must be called without parentheses.
If you are declaring a variable x of this type,
the declaration looks like x: =>A.
This signature is often used for call-by-name parameters.
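
A rough sketch of a call-by-name parameter. The names unless, expensive and evaluations are hypothetical; the counter exists only to show when the argument is evaluated.

```scala
object ByNameExample {
  var evaluations = 0
  def expensive(): Int = { evaluations += 1; 42 }

  // body is evaluated only if and when it is used inside the function.
  def unless(cond: Boolean, body: => Int): Int =
    if (cond) 0 else body

  val skipped = unless(true, expensive())    // expensive() is never run
  val countAfterSkip = evaluations
  val ran = unless(false, expensive())       // expensive() runs once
}
```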

When passing an anonymous function (also called a function literal),
you can use a shorthand in which you directly write the body of the
function, using underscores where each of the function parameters is
to go (as long as Scala has enough information to infer the type).
For example, if you are folding a list to sum all the
elements, you can write it the long way:

val list = List(1,2,3,4,5)
list.foldLeft(0) { (a:Int, b:Int) => a+b }

or, by taking advantage of Scala's type inference
(and using the same value for list):

list.foldLeft(0) { (a, b) => a+b }

or, using underscores as in-line parameter placeholders:

list.foldLeft(0) { _ + _ }

or using the equivalent method /: (which also does a
foldLeft):

(0/:list)(_+_)

In this last form, we are taking advantage of the following shorthands:

The /: operator is equivalent to the
foldLeft method
(the List class defines both methods).

The foldLeft method
(and the equivalent /: operator method)
uses a curried parameter list, with the first parameter list having
only one parameter. This allows us to take advantage of the next step.

Since the foldLeft method takes only one parameter
(in the first parameter list), we can invoke it without
the dot and parentheses.

Since the operator name ends with a colon, it is right-associative,
so the list object goes on the right and the 0
argument goes on the left.

The second parameter list contains only one item (the function to
apply to the fold), and the function we are passing in has only
one expression, so we can use parentheses rather than braces.

Scala has enough information to infer the types of the two parameters
in the function literal, so we do not need to specify the types of
the parameters.

We are only using each parameter in the literal once, so we can use
the underscore shorthand and not have to declare the names of the
parameters in the function literal.

We can remove all the space without creating ambiguity.
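
The steps above can be checked by writing the intermediate forms side by side. This is a sketch under the assumption that the /: method is still available on List; newer Scala versions mark it as deprecated, so the compiler may warn.

```scala
object FoldExample {
  val list = List(1, 2, 3, 4, 5)

  // The long way, with explicit parameter types.
  val longForm = list.foldLeft(0) { (a: Int, b: Int) => a + b }

  // The /: method called like any other method, with dot and parentheses.
  val explicitCall = list./:(0)(_ + _)

  // Dot dropped; the operator ends in a colon, so list moves to the right.
  val shortForm = (0 /: list)(_ + _)
}
```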

If a function literal, as used in the above example, is a single method
call that takes only one argument, then the method name alone may
be specified.
Under this rule, this:

args.foreach( (x:Any) => println(x) )

becomes this (the other intermediate forms given above are also valid):

args.foreach(println)

Instead of using an underscore as a placeholder for an argument,
if a function name is followed by a space and an underscore,
the underscore is a placeholder for an entire argument list.
This is a partially applied function.
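
A minimal sketch of the last two rules, using hypothetical functions add and double. Newer compilers may treat the trailing underscore as optional.

```scala
object PartialExample {
  def add(a: Int, b: Int): Int = a + b
  def double(x: Int): Int = x * 2

  // Method name alone in place of a one-argument function literal.
  val doubled = List(1, 2, 3).map(double)

  // Underscore as a placeholder for the entire argument list.
  val addFn: (Int, Int) => Int = add _

  // Underscore as a placeholder for a single argument.
  val addOne: Int => Int = add(1, _)
}
```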