Sunday, September 25, 2011

I recently decided to sign up on the Project Euler site in order to practice mathematics on a regular basis. To begin with, the website proposes some simple problems based on logic, most of them solvable by a computer implementation.
I found the site's problems fit nicely into the routine of regular code katas, the idea being to (re)use some basic concepts of my favorite languages (Java included, don't smile).
These problems matter to me because some of them involve the use and computation of prime numbers. Believe it or not, I actually use prime numbers almost every day.
Beyond the well-known uses of prime numbers in computer science, I personally find them very suitable for writing tests: they come in handy when challenging the binding of long properties with ranges of parameters run by JUnit Theories, and they produce distinctive output values when fed as input to algorithms.
Testing takes between a fifth and a third of my coding time, whether I am doing TDD or laying a test harness around c... sorry, legacy code.
In order to work with prime numbers, you have to know how to identify them, but also how to generate them.

Why is that? Because, as with the natural integers, working with prime numbers means working with an infinite sequence. Yeah, cool :)
This is where a plainly recursive definition is not really workable, because of the nesting of method calls: a function in charge of defining one prime number, then the next, and so on, would run ad infinitum.
What we would like is to run our prime number generator a little, use its result in some processing, then run it again, and so on. This approach breaks our habit of thinking in terms of functions invoking other functions (or routines invoking subroutines). The astute functional programmer can see clearly where this talk is going.
A solution lies in delaying the calls. This approach has been nicely exposed in the SICP book, but also in Peter Henderson's incredible book, Functional Programming: Application and Implementation. You can also watch H. Abelson's lecture 6A here.

Let's take a naive approach. Just imagine we want to generate at will the natural numbers starting from 1, invoking the recursive:

(defn integers-from [n]
  (cons n (integers-from (+ n 1))))

using

(first (integers-from 1))

would blow our stack, because the function integers-from would call itself, evaluating the (+ n 1) expression first. Invoking a function after having evaluated its arguments, (+ n 1) here, is called call by value.
What if we could delay this evaluation, promising to evaluate the expression only when necessary (on demand)?

There is a simple way to delay this evaluation, using a macro:

(defmacro promised [expression]
  `(fn [] ~expression))

The macro expansion leads to the creation of a closure holding the integers-from expression. The example is contrived and only serves our purpose. When needed, we could then force the delayed sequence by explicitly invoking the closure:
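For illustration (the original snippet is gone), here is a sketch of the promised macro at work on a delayed integer generator; the delayed-integers-from name is my own, chosen to avoid clashing with the naive version above:

```clojure
(defmacro promised [expression]
  `(fn [] ~expression))

(defn delayed-integers-from [n]
  ;; the head is computed now; the tail is a promise (a closure)
  (list n (promised (delayed-integers-from (+ n 1)))))

(first (delayed-integers-from 1))              ; => 1
;; forcing the promise means invoking the closure explicitly:
(first ((second (delayed-integers-from 1))))   ; => 2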

But we would then have to redefine all of Clojure's nice higher-order functions for our purpose, explicitly forcing the fulfillment of promises.
The sequences generated this way are named streams.

Well, Clojure nicely embeds this lazy evaluation mechanism (in clojure.core) with the force and delay macros and, even more nicely, provides the lazy-seq macro, which produces implicitly lazy sequences. Using lazy sequences, you don't even deal with force and delay. Quoting Stuart Halloway and Aaron Bedra, you pay only for what you need (Programming Clojure, 2nd edition, is out and covers Clojure 1.3, just released by the way).

The idea of deferring the evaluation of an expression passed to a function is named call by name, as opposed to call by value.

But wait a minute. Does this mean that the expression is evaluated each time it is explicitly used? Nope, pals! The creators of Clojure have inserted a memoization mechanism allowing the lazy sequence to cache the result at its very first realization. So we have call-by-need invocation, the expression being evaluated once.
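A tiny demonstration of this call-by-need behaviour, first with delay/force, then with a side-effect counter on a lazy-seq (the names are mine):

```clojure
;; delay/force: the delayed body runs once, force returns the cached value
(def d (delay (* 6 7)))
(force d)    ; => 42, computed now
(force d)    ; => 42, served from the cache

;; lazy-seq has the same call-by-need behaviour: its body runs only at
;; the first realization, even if the sequence is walked several times
(def calls (atom 0))
(def s (lazy-seq (do (swap! calls inc) '(1 2 3))))
(doall s)
(doall s)
@calls       ; => 1, the body ran exactly once
```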
But beware: although lazy sequences can be very valuable for delaying the evaluation of big data structures or very long I/O operations, there can be a memory saturation effect, specifically if you keep a reference to the head of the sequence, since every realized element is then retained. A kind of dog-tail effect.

Using lazy sequences our number generator becomes:

(defn numbers-from [n]
  (lazy-seq (cons n (numbers-from (+ 1 n)))))

and you can try taking the first ten numbers

algorithms.tools=> (take 10 (numbers-from 1))
(1 2 3 4 5 6 7 8 9 10)

using the nice built-in higher-order functions.
What about my prime number list, so I can solve my first problems? We only need to create a stream of prime numbers:
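The original listing is gone; here is a hedged reconstruction matching the description below. A naive trial-division prime? stands in for the refined predicate discussed later:

```clojure
(defn prime? [n]
  ;; naive trial division, good enough to build the stream
  (and (> n 1)
       (not-any? #(zero? (rem n %))
                 (range 2 (inc (long (Math/sqrt n)))))))

(defn prime-stream
  ;; zero-arity: build a brand-new stream starting at the first prime
  ([] (lazy-seq (prime-stream 2)))
  ;; one-arity: lazily cons the next prime at or above n
  ([n] (lazy-seq
         (if (prime? n)
           (cons n (prime-stream (inc n)))
           (prime-stream (inc n))))))
```

For instance, (take 5 (prime-stream)) yields (2 3 5 7 11).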

I played with variable arity in Clojure and made the stream construction the result of a function call instead of a var, so no reference to the head is kept and, consequently, there is no memory leak due to caching. Each time you need to manipulate the stream, you create it.
For reasons of symmetry, I kept the lazy-seq call in the zero-arity signature in order to create the beginning of the stream.
Making (prime-stream 5) a lazy sequence amounts to promising the calculation of the upcoming prime numbers.
Good...almost.

Now I need to produce the primes. My implementation is inspired by well-known heuristics that take into consideration known facts like:

apart from 2, primes are not even

primes are certainly not divisible by 5 (apart from 5 itself)

every prime greater than 3 can be written in the form 6k ± 1

...

These heuristics save the day by avoiding the brute-force approach of trying to divide each number N by every number from 2 to N-1, which can become very expensive in CPU time and very slow to complete. The algorithm can be refined and its execution time shortened.
A first draft looks like:
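The first draft itself is missing from the post; a sketch following the 6k ± 1 heuristic could look like this:

```clojure
(defn prime? [n]
  ;; handle the small cases, rule out multiples of 2 and 3, then try
  ;; only divisors of the form 6k - 1 and 6k + 1 up to (sqrt n)
  (cond
    (< n 2) false
    (<= n 3) true
    (or (zero? (rem n 2)) (zero? (rem n 3))) false
    :else (loop [k 5]
            (cond
              (> (* k k) n) true
              (or (zero? (rem n k)) (zero? (rem n (+ k 2)))) false
              :else (recur (+ k 6))))))
```

As a sanity check, (filter prime? (range 2 30)) yields (2 3 5 7 11 13 17 19 23 29).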

Congratulations, you have just solved one of the Project Euler problems. The following problems are harder, believe me :). So take a shot if you have time.

Coming to Scala, streams are implemented as... Stream objects, nicely exposed by Joshua Suereth in his book. The nice thing with streams in Scala is that methods can be defined or redefined with almost any symbols you like, and the cons method is especially pleasant:

The sealed trait exists because of experiments I am conducting. Tail-recursive methods can be marked with the @tailrec annotation. So far, so good; the other methods are carbon copies of the Clojure code:

although this time I preferred using the idiomatic match/case pattern of Scala. The Stream is guaranteed to cache its values using some internal machinery. And that is nice, because when having a look at the #:: method I discovered this:

This little exploration reveals the Scala form used to declare a call-by-name parameter:

=> Stream[A]

only invoked when explicitly used. That is slightly different from expecting

() => Stream[A]

where the expression is supposed to have been evaluated before the method call. (Nice and clean demo in Martin Odersky's book.)
Playing a little with call by name in Scala led me to confirm that call-by-name parameters should always be free of side effects.

Basically, I fire a runnable in charge of calling the method f by name, passing a closure that closes over the variable item (never use mutable values, never).
The standard output provides me with:

I deliberately changed the mutable value and reached an inconsistent state. And yes, indeed, the call-by-name parameter was invoked twice, while in the meantime the mutable value had been altered.
We can simulate a call-by-need effect using a lazy val, which caches its result upon its first evaluation.

Sunday, September 18, 2011

For once, no Clojure nor Scala in this post. It is no whining to say that, as a Java developer by day, I too rarely have the opportunity to learn or teach something, because of the attitude of dilettante people more concerned with "making a career" than committing to their developer's job. But last Friday a nice thing happened. I got lucky playing with JUnit rules.

Let us set the scene. I am working on the worst project I have ever worked on. In itself the project is quite small (less than 70k lines of code).
It clearly breaks every rule of decent architectural reasoning, its internals deeply rotted by terrible misuses of the already invasive and useless Spring framework, plus tons of other underused dependencies (typically Apache open-source libraries).
I won't debate the use of Spring, because I don't want this blog polluted by framework advocates' uninteresting arguments. I am an old dog, paid to use his brain rather than to dumbly press buttons like a monkey. I have never needed anything but the standard APIs, and I certainly do not need a jackhammer framework invading my permgen just to pass parameters to a constructor. End of debate.

As a TDD advocate, armed with the Martin Fowler and Kerievsky books on refactoring, but also with Michael Feathers' and Meszaros' writings, I patiently implement or fix the modules using JUnit. JUnit is indeed a (really small) framework per se, but above all JUnit is a tool (which Spring is not). JUnit is a carpenter's chisel compared to the aforementioned jackhammer.
One impact of this unfortunate pervasive injection is the difficulty of testing classes that sometimes host nearly twenty injected dependencies. Another common problem when dealing with legacy code is testing exceptions and the freeing of resources; these two aspects rarely live in perfect harmony.

I chose to solve the invasive injection problem using (alas) mocks, accessible via seam points, as presented by M. Feathers in his book.
As far as exceptions are concerned, JUnit provides us with helpful classes and annotations.

Let's see through an example. The following code reproduces a typical recurring pattern in the application, though of course I changed the names in order not to face copyright problems (who would want to copyright that anyway :)):

This recurring pattern mixes technical exceptions with generic exceptions, expects some resource freeing, and also some logging. Of course there are also unconventional error codes attached to the exceptions.

The translation from one error code to another is one of the strangest things I have ever witnessed. I cannot describe their purpose, but take for granted that an exception in this system is not passive: it fires actions on creation (yes, who can believe it).

All the logged information must be tested, and at least the code of the exiting exception must also be tested before refactoring. It is also vital to check that resources are released and messages are logged (the content matters, guess why ;)).

The complete demo code I wrote for the purpose of testing looks like:

We would like to test more about the exception: that these damned resources have been released, and that the messages have been logged.

The first point can be handled, sort of. Here come the JUnit rules. I encourage you to search for JUnit rules on Google and you will find deeper explanations than mine (here and there, for example). Basically, JUnit rules provide you with a nice way to intercept the call to your test, using a decorating class named a Statement. The statements are ruled, meaning that a rule implementation can choose on which tests to apply the statement.

In the previous links, Dale H. Emery provides a nicely detailed explanation of the sequence of events during the execution of ruled tests.
See the order of events as a kind of @AroundInvoke, like in Java EE 5/6.

I used an ExpectedException instance. At the very start I declare an empty expectation that will be applied to all my tests. An empty expectation expects... nothing. If I do not want to challenge an exception, the behaviour is crystal clear for the test: nothing gets matched.

Before firing the tests per se, I set my rule expectations using standard JUnit matchers (such an elegant DSL...). The ExpectedException definition allows me to challenge both the class and the message content.
So far so good, but an exception is still raised and no check is applied to the system under test (SUT) afterwards. The test stays green even though we willingly declared a wrong assertion.

Hum, what about making our own rule?

What do we want? Some contract that would check at least that the resources are freed, something like that:

I propose a solution implementing the TestRule interface, the MethodRule interface being deprecated as of version 4.9. No copyright, still I did the job for the guys on 4.8 using MethodRule ;):

The apply method takes as input parameter a statement in charge of invoking the underlying test method.
This base statement is then wrapped by a ContractedStatement that will run the action, throwing a Throwable if anything goes wrong.

The contracted statement always executes the postCheck method hosted by the Contract.
In essence, postCheck executes a set of registered blocks of code.

Unfortunately it is not Scala we are working with here, so the Java abstraction closest to a block of code is a Callable, which is able to throw an exception if necessary and returns a value (although we do not need that feature).

From the test side, everything becomes clearer. We only have to create callables on demand, each callable wrapping the expected assertions:

For this step, I decided to hack the ExpectedException class, creating my own ExpectedGenericException.
This test helper is specific to the family of tests in my business layer, as it targets the whole hierarchy
of business/technical exceptions rooted at the same GenericException.
The Rule class is the ExpectedGenericException:

As in our previous example, the apply method offers an entry point where, after decorating the incoming base
statement with our own statement, we return it. Our statement basically filters a raised generic exception and applies the defined JUnit matchers.

The matchers are registered by invoking the expect method from the test class.
The expect method, which reuses the Hamcrest both method, chains all the matchers in charge of validating the exception content.
There we reuse the code pattern of the default ExpectedException in order to merge the different matching expectations.

We basically match a class type, a message content, then a code value. In order to gracefully merge the matchers into
a chain of matchers, we also decorate each of them in package-scoped instances of the GenericExceptionMessageMatcher and GenericExceptionCodeMatcher classes:

Let us resist the temptation of "generifying" the wrappers more than necessary and making our ExpectedGenericException
more... generic. We are not building a framework; more complexity is not needed (the YAGNI principle).

Tests green. We are done!

Soon we will talk about Clojure and Scala again, I hope, so
be seeing you!!! :)

Sunday, September 11, 2011

Hello again. When time flies and you find yourself with your hands full of books and articles to read, finding a subject and writing about it can become very hard. In such times I dig into my bag of things I-always-wanted-to-do-but-never-took-the-time-to-do (you can breathe). I had a little more than two hours to kill yesterday before digging again into Clojure in Action and the new chapter released by Joshua Suereth for Scala in Depth.
There remain a few basic algorithmic problems I want to tackle before switching to distributed algorithms. One of them is the famous eight queens problem. Martin Odersky presented a Scala version in his book, in order to expose a nice use of Scala's comprehensions; so, unfortunately for me, Scala was not an option for implementing a solution.

I wanted it recursive and, having committed myself to the study of Clojure for a few months now, that language was an option. Although I am not at the level of expert Lisp and Clojure developers, I took the liberty of trying my very first personal implementation.
The result is not the one I expected, but as usual any criticism will be welcome: although I am French, I do like learning from critiques as much as from failures and retries (I really need to embrace a new nationality).
In conclusion a very brief and relaxing exercise today.

As a reminder, you can get details about the eight queens puzzle there. Basically our purpose is to place eight queens on a chessboard so that none of them can attack another. Applying the basic rules of chess, two queens cannot be located on the same row, column or diagonal.
Numerous readers must have stopped there, finding this too basic.
For the others, let's continue. Indulge me and consider that it would be more elegant to provide a solution placing n queens on an n-sized chessboard (the others should not have left).
In order to solve the problem, I adopted the same attitude as when solving the Towers of Hanoi problem.
Let us consider the problem partially solved - say - on a four-sized chessboard. At step 3, my only partial solutions can be expressed as a list of lists of... lists:
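The original listing is gone; a hypothetical reconstruction (my own enumeration) of the step-3 partial solutions on a 4x4 board, in the notation described below - most recent position first, the no-solution seed's empty list at the tail:

```clojure
;; hypothetical reconstruction: the four conflict-free placements of
;; queens on rows 1 to 3 of a 4x4 board
(def step-3-partial-solutions
  '(((3 2) (2 4) (1 1) ())
    ((3 1) (2 4) (1 2) ())
    ((3 4) (2 1) (1 3) ())
    ((3 3) (2 1) (1 4) ())))
```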

A tuple represents a queen's position.
The adopted convention is to index the upper-left corner as (1 1) and the lower-right one as (4 4), the car of the pair being the row number and the cadr the column number.
A list of positions is a partial solution to be completed,
and the list of lists of lists (:)) is the list of solutions.
No big deal there.

The solutions are to be read from left to right, the first element being the most recent position found. The annoying empty list inside each solution is due to my choice of starting from no solution.
All I have to do is consider all the possible positions on the fourth row and choose the matching ones, those leading to acceptable solutions. The scenario applies whatever the step index in an n-sized chessboard.
On the four-sized chessboard the two solutions are:
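The listing of the two solutions is missing; in the post's notation (most recent position first, the no-solution seed at the tail), they would read:

```clojure
(def four-queens-solutions
  '(((4 3) (3 1) (2 4) (1 2) ())      ; columns 2 4 1 3 for rows 1..4
    ((4 2) (3 4) (2 1) (1 3) ())))    ; the mirror solution: 3 1 4 2
```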

One should notice that the solutions are symmetric; it might be an interesting exercise to reduce the set of solutions to its canonical forms using symmetry considerations.

For the eight queens problem we are to find 92 solutions.

I usually do TDD, and I will present the unit tests. But I have to confess that I did not start like that. One generally sketches an algorithm in pseudo-code before writing anything, but the conciseness of Lisp languages (homoiconic as they are) makes them natural candidates for writing the algorithm as-is.

I wrote what came to mind and tested my functions afterwards, one after the other, adjusting the first shot. It worked nicely for me.
I worked with both IntelliJ and Leiningen, with the following project.clj configuration:
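The original project.clj is gone; a minimal configuration of that era might have looked like this (the project name, version and description are my assumptions):

```clojure
(defproject algorithms "0.1.0-SNAPSHOT"
  :description "Code katas: n-queens and friends"
  :dependencies [[org.clojure/clojure "1.3.0"]])
```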

The entry point of my algorithm is where I state my intention: matching the different positions for a queen at step k, within an n-dimension chessboard. Using Abelson and Sussman's wishful thinking approach led me to:
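The original listing is gone; here is a hedged reconstruction of the entry point. In wishful-thinking style, the helper functions are merely declared here; the post details them further down:

```clojure
;; the helpers are only declared: we wish them into existence for now
(declare possible-positions possible-solutions)

(defn placed-queens [at-step dimension]
  (if (zero? at-step)
    (list (list (list)))   ; the no-solution seed
    (mapcat (possible-solutions (possible-positions at-step dimension))
            (placed-queens (dec at-step) dimension))))
```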

This is where the little annoying empty list comes from.
Like Martin Odersky, I define my no-solution expression as an empty list. In Clojure, though, the empty list is not a bottom form marking the end of a list, like nil. I learned that during the exercise :), but too late.

At the very top we find the working namespace definition (algorithms.n-queens), so my file is located under some algorithms directory and the code implemented in an n-queens.clj file.
At step 0, literally the start, I return (list (list (list))); otherwise I define my set of solutions from the solutions recursively found for a decremented number of steps and the range of positions to be tested.
I then return the list of possible new solutions resulting from comparing all possible positions against the already found solutions.
The easiest function to test is possible-positions, in charge of extracting all the possible positions in the row at-step. Driven by tests, we start with something like:

Provided at the top is the namespace definition, including the clojure.test namespace.
At step 1, on a two-dimensional chessboard, I expect ((1 1) (1 2)) as possible positions.
The matching implementation is:
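The listing itself is missing; a hedged reconstruction, its signature guessed from the surrounding description:

```clojure
(defn possible-positions [at-row within-dimension]
  ;; all (row column) tuples of the given row; range's upper bound is
  ;; exclusive, hence the inc on the dimension
  (map #(list at-row %) (range 1 (inc within-dimension))))
```

For instance, (possible-positions 1 2) yields ((1 1) (1 2)), the expectation stated above.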

We use the map higher-order function to apply the #(list at-row %) lambda to the whole range from 1 to the boundary limit (the upper limit being exclusive, we have to increment the boundary value).
In the placed-queens function, (possible-solutions for-positions) is surrounded by parentheses because I decided to use a closure: I found it more natural to close over the set of possible positions and then find a match for each possible solution. That's the purpose of

(mapcat (possible-solutions for-positions) solutions)

mapcat allows me to flatten a list of lists of solutions into a single list of solutions; I must confess, once more, that I found it using TDD. No magic, just the good effect of testing. A typical test scenario for the possible-solutions implementation would be:

where, starting at row 4 with the already found partial solution ((3 1) (2 4) (1 2)), we expect to find ((4 3) (3 1) (2 4) (1 2)).
Of course, keep the test warm; it won't pass, as we do not have the function yet. My function definition can be:
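The definition is missing from the post; here is a hedged reconstruction satisfying the test scenario above. The names matching and safe? come from the post, but their exact signatures are my guess:

```clojure
(defn safe? [position]
  ;; closes over the candidate position; the returned closure validates
  ;; it against every position of a partial solution
  (fn [solution]
    (every? (fn [[row col]]
              (let [[r c] position]
                (and (not= c col)
                     (not= (Math/abs (- r row)) (Math/abs (- c col))))))
            (filter seq solution))))

(defn matching [solution]
  ;; closes over a solution; challenges a possible position against it
  (fn [position] ((safe? position) solution)))

(defn possible-solutions [positions]
  ;; closes over the possible positions; for one solution, builds the
  ;; new solutions obtained by adding each safe position to it
  (fn [solution]
    (map #(cons % solution)
         (filter (matching solution) positions))))
```

With this sketch, ((possible-solutions '((4 1) (4 2) (4 3) (4 4))) '((3 1) (2 4) (1 2))) yields the single new solution (((4 3) (3 1) (2 4) (1 2))), as the test scenario above expects.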

with the help of matching and of safe?.
The expression (matching solution) defines yet another closure, which captures a solution and allows a possible position to be challenged against all the positions of the captured solution.
This closure relies on the safe? function to filter the possible positions that match, in effect, the partial solution. From these filtered positions, combined with the enclosed solution, we build new solutions. Take your time; higher-order functions and their use are not always simple.

The possible-solutions function then rebuilds a list of new solutions, each new solution adding a filtered position to the enclosed solution. From each solution we can derive multiple possible new solutions. We return a list of lists (solutions) of... lists (positions).
The bunch of tests used to qualify this approach is:

trying different effective scenarios on four-dimension chessboards... But you can't test them until the safe? function invoked from matching is implemented.

The safe? function basically creates yet (yet!) another closure, closing over a new possible position. This closure tests all the existing positions of a solution in order to ensure that the enclosed position can be added to form a new solution.
I need tests to challenge the safe? function:
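The test listing is missing; a self-contained sketch, assuming (as my reconstruction does) that safe? closes over the candidate position and the returned closure validates it against a whole partial solution:

```clojure
(defn safe? [position]
  ;; closes over the candidate position; the returned closure checks it
  ;; against every position of a partial solution
  (fn [solution]
    (every? (fn [[row col]]
              (let [[r c] position]
                (and (not= c col)
                     (not= (Math/abs (- r row)) (Math/abs (- c col))))))
            (filter seq solution))))

;; (4 3) does not attack any queen of the partial solution:
((safe? '(4 3)) '((3 1) (2 4) (1 2)))   ; => true
;; (4 1) shares a column with (3 1):
((safe? '(4 1)) '((3 1) (2 4) (1 2)))   ; => false
;; (4 2) attacks (3 1) diagonally (and shares a column with (1 2)):
((safe? '(4 2)) '((3 1) (2 4) (1 2)))   ; => false
```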