About Jason Baldridge

Variations for computing results from sequences in Scala

Introduction

A common question from students who are new to Scala is: What is the difference between using the map function on lists, using for expressions and foreach loops? One of the major sources of confusion with regard to this question is that a for expression in Scala in not the equivalent of for loops in languages like Python and Java — instead, the equivalent of for loops is foreach in Scala. This distinction highlights the importance of understanding what it means to return values versus relying on side-effects to perform certain computations. It also helps reinforce some points about fixed versus reassignable variables and immutable versus mutable data structures.

The task and its functional solution

To demonstrate this, let’s consider a simple task. Given a List of words, compute two lists: one has the lengths of each word and the second indicates whether a word starts with a capital letter or not. For example, start with the following list.

So, that’s it. However, let’s do this without using different calls to the map function (or multiple foreach loops, as we’ll see below). The easiest way to do this is to map each word to a tuple containing the length and the Boolean indicating whether its first character is capitalized; this produces a list of tuples, which we unzip to get a tuple of Lists.

The key thing here is that the map function turns the List[String] words into a List[(Int, Boolean)] — which is to say it returns a value. We can assign that value to a variable, or use it immediately by calling unzip on it, which in turn returns a value that is a Tuple2(List[Int],List[Boolean]). Before moving on let’s define a simple function to display the results of performing this computation, which we will do in various ways (and which all produce precisely the same results).

Okay, so now let’s start doing it the hard way. Rather than mapping over the original list, we’ll loop over the list with foreach, and perform a side-effect computation that builds up the two result sequences. This is the sort of thing that is typically done in non-functional languages with for loops, hence the use of foreach in Scala. We’ll explore each of these in turn.

The second variation: use reassignable, immutable Lists

We can use reassignable variables which are initialized to be empty Lists, and then prepend to them as we loop through the words list. We are thus using a variable that has the type of List, which is an immutable sequence data structure, but its value is being reassigned each time we pass through the loop.

Note that we build up the lists by prepending, which means they come out of the loop in reverse order and thus must be reversed before being displayed. You can of course append to a List by creating a singleton List and concatenating the two Lists with the ::: operator.

However, this is not recommended because it is computationally costly. Adding an element to the front (left) of a List is a constant time operation, whereas concatenating two lists requires time proportional to the length of the first list. That might not seem like a big deal until you are dealing with lists with thousands of elements, and then you’ll find that the same bit of code that prepends many times and then reverses is much faster than one which appends using the above strategy.

Third variation: use unreassignable, mutable (growable) ListBuffers

Next, we can use a ListBuffer, which is a mutable sequence data structure that also happens to support constant time append operations. We can thus declare it as a val, and then use the method append to mutate the sequence so that it has a new element at the end. So, the variables referring to the sequences are not reassignable, but their values are mutable.

Note that we must convert the ListBuffers to Lists for the call to display in order to have the right types as arguments to that function. Since they can efficiently grow (i.e., get longer), ListBuffers are a good choice for many problems where we need to accumulate a set of results, and especially when we don’t know how many results we will be accumulating. However, if you know the number of results you’ll be accumulating it’s probably better to use Arrays, as shown next.

Both of the above alternatives probably look a little strange to people coming from Java. In Java, you’d be more likely to do an imperative solution that involves initializing arrays that have the same length as words and then filling in respective indices as appropriate. To do this, use Array.fill(lengthOfArray)(initialValue).

We go through the indices and for each one compute the value and assign it to the appropriate index in the corresponding Array. Again, we need to convert the results to Lists before calling display. The indices method does exactly what you’d expect — it gives you the indices of the List.

A problem with the above foreach loop is that it requires indexing into Lists, which is generally a bad idea. Why? Because to get the i-th item from a list requires time proportional to i operations. Why? Because the implementation for obtaining an item at a particular index i involves peeling off the head of the list to get its tail, and then seeking for the i-1-th item of the tail, which requires peeling off its head and then seeking for the i-2-th item, and so on. So, if you want to get the 10000th item in a list, you have to perform 10,000 operations to get it. If the words list had 10,000 elements, you can now see that you’d perform 10,000 basic computations just on the foreach, and for each element you do 2*index operations to get the word at that index, which means doing 20,000 operations on the last index alone. Note that indexing into Arrays is a constant time operation, so there is no problem with the left hand side of the assignments in the above loop. You might think you can do better by first storing the word and then using it twice, e.g.

This is better, but it only saves us half the operations. Since we were perfectly happy to loop over the words themselves before, we actually shouldn’t have to do this look up — we can do better by having a reassignable counter index that allows us to set values to the correct positions in the new Arrays we are creating.

IndexedSeq is a supertype of sequences which are designed to be efficient for indexing, and Vector is the default “backing” implementation when you call toIndexedSeq on a List. Of course, if you are only ever going over all the elements of a sequence in order, then Lists are likely to be preferable since they have a bit less overhead and they have some nice properties for pattern matching in match statements.

Using predefined funtions for mapping over a sequence

Another thing worth pointing out is that if you have a predefined function, you can pass that as the argument to map, which can lead to very concise code for this task. Assume you have defined a function that takes a String and produces a Tuple of its argument’s length and whether it starts with an upper-case letter.

Of course, you would probably only do this if you needed that same function in other places. If not, it’s preferable to just use the anonymous function like in the first map example in this blog post. However, you can see that if you have a library of simple functions like this, you can now start writing much clearer and simpler code by reusing them when mapping over different lists.

For expressions

Notice that the previous loops were all foreach ones, whereas Java programmers and Pythonistas will be used to for loops. Scala doesn’t have forloops — it has forexpressions. A common question then is: What’s the difference? What is a for expression for and why isn’t it a for loop? The difference is that an expression returns a value, so while foreach allows you to plow through a sequence and do some operation to each element, a for expression allows you to return a value for each element. Consider the following, in which we yield the square of each integer in a List[Int].

Having said all this, it turns out you can use a for expression as a loop, without returning any values, e.g. as follows.

scala> for (num <- numbers) { println(num*num) }
16
81
81
4
9
64

I think it is generally better to use a foreach loop for such cases so that it is clear that you are only performing side-effects, like printing, reassigning the values of var variables, or modifying mutable data structures. However, there are some cases where a for expression can be more convenient, for example when working through multiple lists and doing various filtering operations. Here’s a quick example to give a flavor of this. Given two lists, we can enumerate the cross product of all their elements

There is much more to this, but I’ll leave it here since using for expressions in this way is a rich enough topic for several blog posts in and of itself. Also, there is a detailed discussion of it in Odersky, Spoon, and Venner’s book “Programming in Scala.”

Fifth variation: use a recursive function

It’s worth pointing out one other way of building up lengths and caps lists. Recursive functions are functions which look at their input and then either return a result for a base case or compute a result and then call themself with that result. It’s pretty standard stuff that computer scientists love and which tends to get used a lot more in functional programming than in imperative programming. Here, I’ll show how to do the same task done before using recursion, but without an in-depth explanation, so either you’ll already know how to do recursion and you can see it in Scala for the same problem context as above, or you don’t know much about recursion but can use this as an example of how it is employed for a task you already understand from the vantages given above. So, in the later case, hopefully it will be useful in conjunction with other tutorials on recursion. First, we need to define the recursive function, given below. It has three parameters: one for the list of words, one for the already computed lengths and another for the already computed caps. It returns a pair that has first the list of lengths with one additional item prepended to it and then the list of caps values with one additional item prepended to it. The items being prepended are computed from the head of the inputWords list.

We can call this function directly, but it is often convenient to provide a secondary function that makes the initial call to this function with empty result lists as the second and third parameters. The secondary function can then perform the reversal and return the desired computed lists.

A slight variation on this that is slightly cleaner is to “hide” the recursive function inside the secondary function, which then effectively acts as a wrapper to the recursive function. This is often considered cleaner because the programmer can ensure that the initialization is done correctly and that the recursive function itself isn’t given malformed inputs.

Conclusion

So, that provides an overview of different ways of obtaining the same results and some explanation of the different properties of each solution in terms of computational considerations that are likely to crop up in your code and you should be aware of.

Clearly there are many ways of getting the same thing done in Scala. This can be hard for newcomers to the language since they don’t have good intuitions about which approach is better in different circumstances, but it is quite valuable to have these options as you become more savvy and understand what the costs and benefits of using different data structures and different ways of iterating are.

All of the code from the above snippets are gathered together in the Github gist ListComputations.scala. You can save it as a file and run it as “scala ListComputations.scala“ to see the output and play around with modifications to the code.

Newsletter

Join them now to gain exclusive access to the latest news in the Java world, as well as insights about Android, Scala, Groovy and other related technologies.

Email address:

Recent Jobs

No job listings found.

Join Us

With 1,240,600 monthly unique visitors and over 500 authors we are placed among the top Java related sites around. Constantly being on the lookout for partners; we encourage you to join us. So If you have a blog with unique and interesting content then you should check out our JCG partners program. You can also be a guest writer for Java Code Geeks and hone your writing skills!

Disclaimer

All trademarks and registered trademarks appearing on Java Code Geeks are the property of their respective owners. Java is a trademark or registered trademark of Oracle Corporation in the United States and other countries. Examples Java Code Geeks is not connected to Oracle Corporation and is not sponsored by Oracle Corporation.