Sunday, September 25, 2011

kinna' stream attitude calling by-name my Clojure and my Scala

Hello, some very simple stuff - cross my heart - this week.

I recently decided to connect to the Project Euler site in order to practice on a regular base mathematics. The website proposes in order to begin some simple problems based on logic and solvable - most of them - by computer implementation.
I found the site problems fitting nicely in the context of execution of regular code kata. The idea being to (re)use some basic concepts of my favorite languages (java included, don't smile).
These problems matter to me because some of them involve the use and calculus of prime numbers. Believe me or not I actually use prime numbers almost every day.
In addition to the well known usage of prime numbers in computer science, from a personal point of view, prime numbers are very suitable in writing tests when you find yourself challenging the binding of long properties using range of parameters run by JUnit Theories, but they also can provide unique output results when provided as input to algorithms processes.
Test takes between a fifth to a third of my coding time whether doing TDD or laying test harness around c... sorry legacy code.
In order to generate prime numbers, you got to know how to identify them, but also how to generate them.

Why is it so ? Because as with natural integers, working with prime numbers means working with infinite suite, Yeah cool :)
This is where working with a recursive suite is not really possible because of the nested aspect of methods calls. A function in charge of defining one prime number, then the other and so on would run ad infinitum.
What we would like would be running a little our prime number generator, then use its result in some processing , than run it again and so on. This approach breaks a little bit our habits of thinking in terms of function invoking other functions (or routines invoking subroutines). The astute functional programmer can see clearly now where to this talk is going.
A solution is in delaying the calls. This approach has been nicely exposed in the sicp book but also in the incredible Peter Henderson's book: Functional programming: Application and Implementation. You can also watch H. Abelson lecture 6A here.

Let's take a naive approach. Just imagine we want to generate on will the natural numbers starting from 1. Invoking the recursive:

(defn integers-from [n]
(cons n [integers-from (+ n 1)]))

using

(first (integers-from 1))

would blow our stack because the function integers-from would call herself , evaluating first the (+ n 1) expression. Calling the function evaluating first (+ n is) is named a call by-value
What if we could delay this evaluation, promising to the function to evaluate herself if necessary (on demand). The implementation would look like

There is one simple way to delay this evaluation using a simple macro:

(defmacro promised [expression]
`(fn [] ~expression))

The macro evaluation will lead to the creation of a closure holding the integers-from expression. The example is contrived and only serves our purpose. When needed we could then force the delayed sequence to execute the following evaluation:

but we would have to redefine all the nice higher order functions in Clojure in order to serve our purpose, explicitly forcing the fulfillment of promises .
The generated sequences are named Streams.

Well, Clojure embeds nicely this lazy evaluation mechanics (in clojure.core ) using the force and delay macros, but even more nicely providing the lazy-seq macro, in extenso producing implicitly lazy sequences. Using lazy sequences you don't even deal with force and delay. Quoting Stuart Halloway and Aaron Breda, you pay only for what you need (Programming Clojure 2nd edition is out and covers Clojure 1.3, just released by the way).

The idea of delegating the evaluation of an expression passed to a function is named the call by-name in opposition to the call by-value.

But wait a minute. This means, that the expression should be evaluated each time it is explicitly used. Nope pals ! Creators of Clojure have inserted a memoization mechanic allowing the lazy sequence to cache the result on its very first invocation. So now we have call by-need invocation, the expression being evaluated once.
But beware. Although lazy sequences can be very valuable to delay the evaluation of big data structures, or very long I/O operations, there can be a memory saturation effect, specifically if you keep a reference to the head of the sequence.A kind of dog tail effect.

Using lazy sequences our number generator becomes:

(defn numbers-from [n]
(lazy-seq (cons n (numbers-from (+ 1 n)))))

and you can try taking the first ten numbers

algorithms.tools=> (take 10 (numbers-from 1))
(1 2 3 4 5 6 7 8 9 10)

using the nice embedded higher order functions.
What about my prime number list so I can solve my first problems ? We only need to create a stream of prime numbers:

I played with the function variable arity in Clojure and made the stream construction the result of a function instead of a var so no reference to the head is kept, in consequence no memory leak due to caching. Each time you need to manipulate it, you create it.
For reason of symmetry I kept the lazy-seq chain of calls in the empty signature declaration in order to create the beginning of the stream.
Making of (prime-stream 5) a lazy sequences is making the promise of calculating the upcoming prime numbers.
Good...almost.

Now I need to process the primes. My implementation is inspired from well known heuristics taking into consideration known facts like:

primes are not even

primes are certainly not dividable by five

every prime greater than 3 can be written in the form 6k+/-1

...

These known heuristics save the day avoiding a brute force approach consisting in trying to divide each number N by numbers from 1 to N-1, which can become very expensive in terms of CPU consumption and very long versus the time of completion. The algorithm can be refined and its time of execution shortened.
A first draft looks like :

Congratulation you have solved one of the Project Euler problems. The following problems are harder, believe me :). So take a shot if you have time.

Coming to Scala, streams are implemented as... Stream objects and are nicely exposed by Joshua Suereth in his book. The nice thing with streams in Scala is that methods can be defined or redefined with almost whatever symbols you may like and the cons method is specifically pleasant:

The sealed trait exists because of experiments I am achieving. The tail recursive optimization can be marked with the @tail.recur annotation. So far, so good, the other methods are carbon copy of the Clojure code:

although this time I preferred using the idiomatic match/case pattern of Scala. The Stream is guaranteed is to cache the values using some internal mechanics. And that is nice because having a look at the #:: method I discovered this:

This little exploration reveals the Scala form expected to reference a call by-name parameter:

=> Stream[A]

only invoked when explicitly used. That is slightly different from expecting

() => Stream[A]

where the expression is supposed to have been evaluated beforethe method call. (Nice and clean demo in Martin Odersky's book)
Playing a little bit with the call by-name in Scala lead me to confirm that called by-name parameters should always be free from side effect.

Basically I fire a runnable in charge of calling by name the method f, passing through a closure, closing onto the variable item (never use mutable values, never).
The standard output provides me with:

I volunteerly changed the mutable value and came to an unstable state. And yes indeed the call by-name parameter has been invoked twice, meanwhile the mutable value was altered.
We can simulate a call by-need effect using a lazy value:

4
comments:

Ouups, thank you for your remark. I was quite light on that one. I rewrote the small part which summarizes harshly the document referenced by the link. Thanks again and do not hesitate to come back to me if something is incorrect.I started reading your blog and really enjoyed it :)