Wednesday, January 25, 2012

Today I want to share some ramblings from exploring the Disruptor framework through a Scala eye. It is never too late. This little inspection was encouraged by David Greco, and Martin Krasser kindly suggested that I also have a look at a Scala port implemented by Jamie Allen.

Before dissecting this last reference I decided to read the classic material on the subject, from the seminal technical article up to Trisha Gee's blog, and to read again the very interesting articles written by Martin Thompson here. In order to get a better understanding I also suggest this reading and that one. Naturally, on top of that pile of documentation, do not forget Martin Fowler's must-read article over here.

Everything has been said about the Disruptor, and I really encourage you to read about the subject before reading the rest of this article if interested. My summary in a few lines below is too harsh to be considered a serious presentation, and I hopefully expect that some of the experts will one day read the article so as to correct my misconceptions.

Roughly cloning Martin Fowler's article introduction, let us say that, like all financial trading platforms, the LMAX platform relies on business logic core components that must interact with the outside world as fast as possible, supporting a throughput of dozens of millions of messages per second.

The chosen approach of the implementation finds its name in the disruptive way of thinking that one must adopt so as to understand the internals of the Disruptor. Working with disruptors involves first thinking "asynchronous", then accepting that simple and elegant solutions to exchange messages can be better than messaging based on queues. The articles cited above explain in a detailed manner the why and the how of the astounding level of performance of Disruptors compared to standard queue systems.

The core business logic is surrounded by Disruptors. At the heart of the Disruptor we find a ring buffer as the cornerstone of the architecture, fed by one or more producers, and feeding one or more readers. And guess what, the one producer/many readers configuration is the optimal combination.

Readers and producers run in their own thread of execution. The communication between readers/producers and the ring buffer is achieved through barriers created by the ring buffer, although on the producer side the barrier is merged with the ring buffer itself.

The fixed-size ring buffer creates all its indexed slots during instantiation, the memory being allocated once and for all, avoiding annoying GC peaks.

The producer claims the available slot indexes, gets the matching slots, updates the slots' content, and commits them.

A reader waits through its sequence barrier for the available slots.

On producer side, events are created by an event factory.

On the consumer side, event handlers process the dispatched events.

The framework provides a lot of support in order to ease the creation of typical consumer configurations, like pipeline configurations:

or a diamond configuration, for example:

Notice that the performance tests in the Disruptor source code implement these configurations.

Before diving into the Scala port, I wanted to reproduce some configurations in Scala, the code exploration allowing me to get a better understanding of how things go. This is me: I need a client API point of view before understanding it all.

I decided not to adopt the Disruptor wizard approach and to dive down to the layer just under the wizard, creating by myself the barriers, handlers and so on. BatchEventProcessor remains the only generic class definition I kept in my explorations. Roughly said, a BatchEventProcessor instance, while executed in its own thread of execution, takes charge of accessing the sequence barrier, getting the events, and passing them to an EventHandler instance. So a BatchEventProcessor basically impersonates a consumer.

The (little) code project I produced is located here (yes, a Github project finally :)).
What I want to run are little scenarios reproducing some of the configurations. Compared to the LMAX tests I oversimplified the design. The (very basic) event management is hosted in the body of the EventModule object declaration:
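Something along these lines, assuming the Disruptor 2.x EventFactory/EventHandler interfaces and the PaddedLong utility class (the names below are my reconstruction, not necessarily those of the project):

```scala
import java.util.concurrent.CountDownLatch
import com.lmax.disruptor.{EventFactory, EventHandler}
import com.lmax.disruptor.util.PaddedLong

// Hypothetical reconstruction of the EventModule object described below.
object EventModule {
  // The event wraps a long value, padded to avoid false sharing.
  class ValueEvent { val value = new PaddedLong() }

  val valueEventFactory = new EventFactory[ValueEvent] {
    def newInstance() = new ValueEvent
  }

  // Counts the received events; the optional latch is released once the
  // handler has seen the expected number of events.
  class CountingHandler(expected: Long, latch: Option[CountDownLatch] = None)
      extends EventHandler[ValueEvent] {
    val counter = new PaddedLong()
    def onEvent(event: ValueEvent, sequence: Long, endOfBatch: Boolean) {
      counter.set(counter.get + 1)
      if (counter.get == expected) latch.foreach(_.countDown())
    }
  }
}
```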

Our events wrap a long value using the LMAX PaddedLong class definition. Handlers count the number of received instances. I used the counter for personal manual testing, checking that all the events were passed to the handlers. I also introduced an optional countdown latch, very useful to track the moment when the last consumer in my configurations has achieved its duty.
What we basically need is to measure execution times. So we need something like this:
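A minimal sketch of such timing tools (hypothetical names):

```scala
// Hypothetical timing tools: a time function measuring a block, and an
// implicit view adding a throughput calculus to any Long duration.
object TimeTools {
  // Measures the execution time of a block, in milliseconds.
  def time[T](block: => T): (T, Long) = {
    val start = System.currentTimeMillis()
    val result = block
    (result, System.currentTimeMillis() - start)
  }

  // Pimps a Long duration (ms) with a messages-per-second calculus.
  class ThroughputOps(durationMs: Long) {
    def throughput(iterations: Long): Long = iterations * 1000L / durationMs
  }
  implicit def toThroughputOps(durationMs: Long): ThroughputOps =
    new ThroughputOps(durationMs)
}
```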

The BatchEventProcessor instances must be run in their own thread of execution. The implicit throughput method will allow the application of a small mean calculus to any Long in the scope of the definition.

Therefore, it would be nice if we could run our tests without having to take into account all that blurry management of threads and executor services.

So far so good. I have all the tools I need to run a Scenario. The * character at the end of the type declaration of the first parameter expresses a variable number of processor arguments. Why not create a trait to gather all the basics:
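A possible shape for this trait, with hypothetical names, abstracting the processors as plain Runnables:

```scala
import java.util.concurrent.{Executors, TimeUnit}

// Hypothetical reconstruction of the bench trait discussed below.
case class Configuration(iterations: Long, ringBufferSize: Int, runs: Int)

trait Scenario {
  def configuration: Configuration

  // Submits the processors to a dedicated pool, runs the producing code,
  // and returns the elapsed time in milliseconds (a real version would
  // halt the BatchEventProcessors gracefully instead of interrupting).
  def play(processors: Runnable*)(producing: => Unit): Long = {
    val executor = Executors.newFixedThreadPool(math.max(1, processors.size))
    val start = System.currentTimeMillis()
    processors.foreach(p => executor.submit(p))
    producing
    executor.shutdownNow()
    executor.awaitTermination(1, TimeUnit.SECONDS)
    System.currentTimeMillis() - start
  }

  // Entry point: plays the bench `runs` times and returns the list of
  // measured throughputs, in messages per second.
  def run(bench: => Long): List[Long] =
    List.fill(configuration.runs) {
      val elapsedMs = math.max(1L, bench)
      configuration.iterations * 1000L / elapsedMs
    }
}
```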

A Scenario allows playing a (micro!) bench execution with a set of processors. A Configuration case class instance groups all that is needed to play a fixed number of iterations multiple times. Notice that we use our implicit throughput method on the result of the time measurement, so as to return after a play the measured number of messages per second.
The run method is the entry point of the little bench execution, returning a list of throughput values, their number also being set by an input configuration instance.

No better start than implementing the simplest possible example: a producer fills a ring buffer while a consumer handles the events. This code sample will provide us with an overview of the main classes involved in the process:

From the creation of the barrier to the setting of the ring buffer gating sequence, the trained eye can recognize a typical Disruptor code pattern. But what is a Shooter? I defined a class able to feed the ring buffer from its own thread of execution, taking a minimum number of parameters, aka the number of iterations, a ring buffer, and a way to set up a created event (the fillEvent method).
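In sketch form, the wiring and the Shooter could look like this (assuming a Disruptor 2.x-era API, whose constructor signatures changed across versions; the post's Shooter is an actor, but a plain Runnable keeps the sketch simple):

```scala
import com.lmax.disruptor._

// Hedged sketch of the unicast wiring; names are mine.
class ValueEvent { var value = 0L }
val factory = new EventFactory[ValueEvent] { def newInstance() = new ValueEvent }

val iterations = 1000 * 1000L
val ringBuffer = new RingBuffer[ValueEvent](
  factory, new SingleThreadedClaimStrategy(1024), new YieldingWaitStrategy())
val barrier = ringBuffer.newBarrier()
val handler = new EventHandler[ValueEvent] {
  def onEvent(event: ValueEvent, sequence: Long, endOfBatch: Boolean) {}
}
val processor = new BatchEventProcessor[ValueEvent](ringBuffer, barrier, handler)
// Gating: the producer must not wrap over slots not yet processed.
ringBuffer.setGatingSequences(processor.getSequence)

// The Shooter feeds the ring buffer from its own thread of execution.
class Shooter(iterations: Long,
              ringBuffer: RingBuffer[ValueEvent],
              fillEvent: (Long, ValueEvent) => Unit) extends Runnable {
  def run() {
    var i = 0L
    while (i < iterations) {
      val sequence = ringBuffer.next()        // claim the next slot index
      fillEvent(i, ringBuffer.get(sequence))  // update the slot's content
      ringBuffer.publish(sequence)            // commit it
      i += 1
    }
  }
}
```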

If you did not read Philipp Haller and Frank Sommers' book on actors and actors in Scala... you should :).
In order to shorten the size of the react method implementation, I used an implicit method declaration allowing for an "ad-hoc" extension of the ring buffer API, allowing it to shoot some event update strategy by itself.

After all, who better than a ring buffer knows how to get a sequence and publish it?
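For instance, a hypothetical implicit along these lines:

```scala
import com.lmax.disruptor.RingBuffer

// Hypothetical "ad-hoc" extension: the ring buffer itself claims a
// sequence, applies an update strategy to the slot, and publishes it.
class ShootingRingBuffer[T](rb: RingBuffer[T]) {
  def shoot(update: T => Unit) {
    val sequence = rb.next()
    update(rb.get(sequence))
    rb.publish(sequence)
  }
}
implicit def toShootingRingBuffer[T](rb: RingBuffer[T]): ShootingRingBuffer[T] =
  new ShootingRingBuffer(rb)
```

so the shooter's loop shrinks to something like `ringBuffer.shoot(_.value = i)`.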

Although not fully comfortable with implicits, I do not find it disturbing to use them very close to their context of application, specifically when they help in solving a kind of expression problem.

When it comes to reusing the same patterns of code I can become very lazy (like a lot of people). Having a little time at home, I decided I could simplify a little bit the creation of other configurations. What if I could create a pipeline configuration like this:

in order to pipe three consumers, just getting back the BatchEventProcessors paired with the handlers for testing purposes.

This is when I switched to panic mode. Of course we are building a stateful configuration for the ring buffer, so we have to meet again with the state monad we studied last time. Implementing it was hard the first time, as I was not sure I had understood everything about it. I have difficulties with the state monad, so here comes an opportunity to learn how to use the pattern... again.

Everything I need can be gathered in the pipe method, located in the Builder object:

Aliasing a pair (BatchEventProcessor/EventHandler) as a Context for a given Event type, I define this type alias as my mutable state.

We are going to pile up each step, because we need all the created processors at the end of the configuration execution, and maybe we need the handlers too, so as to check their internal counters. The value returned by the application of the state monad instance will be a sequence barrier. Returning a sequence barrier can be useful when attaching multiple consumers to the same barrier.

Ideally we would need only the processors, to submit them to some executor service. Piping the consumers consists in creating the barrier of one consumer using the sequence of the previous one, so the former can handle an event only after the latter did. I encourage you once more to study a little bit the articles I linked to, before diving into the implementation of the tool methods used for configuration in the Builder object.
As a result the implementation of a pipeline becomes:
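In spirit (leaving the state monad plumbing aside), the chaining boils down to something like this hypothetical helper, still assuming the 2.x-era API:

```scala
import com.lmax.disruptor._

// Each new barrier tracks the sequence of the previous processor, so
// consumer n only sees an event once consumer n-1 has handled it.
def pipe[T](ringBuffer: RingBuffer[T],
            handlers: List[EventHandler[T]]): List[BatchEventProcessor[T]] = {
  var barrier = ringBuffer.newBarrier()
  val processors = handlers.map { handler =>
    val processor = new BatchEventProcessor[T](ringBuffer, barrier, handler)
    barrier = ringBuffer.newBarrier(processor.getSequence)
    processor
  }
  // The producer gates on the last consumer of the pipeline only.
  ringBuffer.setGatingSequences(processors.last.getSequence)
  processors
}
```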

Returning a list of tuples (BatchEventProcessor/EventHandler) requires a little trickery with the unzip method, but that cluttering can easily be removed, because except for testing we do not need to get back the handlers.

The Github repository project content provides two more methods used to build diamonds and I am certain, other possible variations on configurations.

Running my tests provides the same results as running the tests located in the LMAX source code in terms of throughput values.

Having reduced the number of iterations to 25 million and generously extended the ring buffer size, depending on the tested configuration, I get from 3.5 million to 5 million messages per second.

My old machine not being sized for this kind of stress (only two cores), the result values are smaller than the benchmark values presented by the LMAX team.

But my old laptop still runs the Disruptor tests five to ten times faster than the message queue tests. I find the result quite interesting, being certain that I do not host on this laptop any parallel desktop application that can handle 5 million messages per second, or even one million messages per second.

Somehow, adopting a disruptive mind could be a smart move, and not only in large-scale applications.

Wednesday, January 18, 2012

Between administrative intricacies this week, among other things, I took the time to reproduce both in Clojure and Scala a small exercise found in Real World Haskell (RWH).

This blog entry will be very short, as I simply provide in each language a way to implement the algorithm.

The algorithm is the Adler-32 checksum algorithm as presented in RWH. (You will be able to see the link once the protest blackout on the Wikipedia site ends.) Trying to decode the three code samples while the Wikipedia link is blacked out in protest can also be seen as an interesting exercise :).

The Adler-32 algorithm was invented by Mark Adler in 1995 and is used in the zlib compression library. I see these katas as an interesting means of learning new things on a daily basis (isn't it our job to learn and understand rather than blindly use external frameworks?).

For copyright purposes I provide here my version of the algorithm and not the one in the book, as I tried to produce my own Haskell version.

where I naively used a derive routine in order to dispatch my multimethod using the class function as a dispatcher. My dispatching mechanism now resolves the clojure.lang.LazySeq instances as children of ::collection:
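As for the Scala side, the algorithm boils down to two running sums modulo 65521 (this sketch is mine, not necessarily the post's exact version):

```scala
// Adler-32: a starts at 1, b at 0; for each byte, a += byte, b += a,
// both kept modulo the largest prime below 2^16; checksum = b * 2^16 + a.
object Adler32 {
  val Base = 65521 // largest prime smaller than 2^16

  def checksum(data: Array[Byte]): Long = {
    val (a, b) = data.foldLeft((1L, 0L)) { case ((a, b), byte) =>
      val a2 = (a + (byte & 0xff)) % Base
      (a2, (b + a2) % Base)
    }
    (b << 16) | a
  }
}
```

For instance, `Adler32.checksum("Wikipedia".getBytes)` yields 0x11E60398.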

Monday, January 16, 2012

While looking for more real-world examples in order to complete a previous blog entry, I found myself struggling with the State monad while trying to solve what I supposed to be a typical State monad problem.

My ramblings to solve this specific problem did not succeed, while in the meantime I successfully reproduced the canonical sample of stack manipulation extracted from Learn You a Haskell (LYAH). Although I am not satisfied with the result, I would like to expose this "kata", and will ask for your feedback in order to reach the expected goal.

I specially thank Nilanjan Raychaudhuri - the author of Scala in Action - for his precious help. Reading chapter 10 of his book confirmed I was working in the right direction.

Reproducing the LYAH example remains a fruitful exercise in the sense that it constrains you to use Scala idioms (typing, self type annotations, etc.) and forces you to think about some of the inner mechanics of for comprehensions.

In order to expose the interest of reproducing state management in functional programming languages, Miran Lipovača presents a three-coins problem simulating the results of tossing a coin, and a stack manipulation problem. From the point of view of imperative languages, the random generator internals or the stack internals would be easily modifiable, mutable objects allowing one to generate new numbers or alter the stack state.

In a pure functional language, we manipulate immutable data. We have to create a new value object each time the equivalent of a state change occurs. But what if we could separate the flow of data from the side-effecting manipulation of the change of state?

And that is specifically our purpose: embedding a change of state in a dedicated instance. The secret lies in the abstract representation of this change of state as a function:

def apply(state: S): (T, S)

where S references the type of the state to be changed, and T is the type of the result of the stateful computation. The whole class hosts the apply function (so it is applicable by itself in Scala), and impersonates the context that contains the state management. You apply the context in order to get your result value:

contextInstance(previousState) = (result, newState)

For the same price, you get the altered state.
In the case of a stack, the state is the stack content. The provided manipulation contexts will be class instances implementing context templates for stack manipulations like pop and push. We will represent a stack state as a List of items of type A:

List[A]

Consequently, if we choose to name our state context StateMonad, the pop and push operations can be gathered in a Stack scope definition like the following:

I believe the case pattern matching in the pop method body to be self-explanatory.
Chaining the state modifications can then be achieved using the definitions of both map and flatMap. The application of the map method is helpful in transforming the result embedded in the context, producing a new state monad taking into account the expected transformation:

The map function produces a new container instance in charge of applying the new state transformation to the transformed result of the original container instance. The typed self annotation, owner, helps in referencing the original container from the apply method body of the new anonymous StateMonad instance:

owner =>

How do we extract the result from the previous container? Again, by applying the previous container itself:

val (a, y) = owner(state)

The result of the new anonymous StateMonad container will be

(f(a), y)

The body of the apply method in the container of the StateMonad instance resulting from the flatMap application will lead to

the application of the previous container (so to extract the previous result and state),

then the application of the transformation function to the result

and finally the application of the new StateMonad instance f(a) to the y intermediate state.

We have chained the previous container's state change to the state change expected after the f function application. All this chaining is itself transparently hosted by a containing monad.
The complete stack example can be reproduced:
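Here is a self-contained sketch of the whole thing (a hypothetical reconstruction, with names as discussed above):

```scala
// The state context: applying it to a state yields (result, new state).
trait StateMonad[T, S] { owner =>
  def apply(state: S): (T, S)

  // Transforms the embedded result, keeping the state transition.
  def map[U](f: T => U): StateMonad[U, S] = new StateMonad[U, S] {
    def apply(state: S) = {
      val (a, y) = owner(state)
      (f(a), y)
    }
  }

  // Chains this state change with the one produced by f.
  def flatMap[U](f: T => StateMonad[U, S]): StateMonad[U, S] =
    new StateMonad[U, S] {
      def apply(state: S) = {
        val (a, y) = owner(state)
        f(a)(y)
      }
    }
}

object Stack {
  type Stack[A] = List[A]

  def pop[A]: StateMonad[A, Stack[A]] = new StateMonad[A, Stack[A]] {
    def apply(state: Stack[A]) = state match {
      case head :: tail => (head, tail)
      case Nil          => sys.error("pop on empty stack")
    }
  }

  def push[A](a: A): StateMonad[Unit, Stack[A]] =
    new StateMonad[Unit, Stack[A]] {
      def apply(state: Stack[A]) = ((), a :: state)
    }
}

// Chaining the manipulations with a for comprehension:
import Stack._
val program = for {
  _ <- push(3)
  a <- pop[Int]
  b <- pop[Int]
} yield a + b
// program(List(5, 8, 2)) == (8, List(8, 2))
```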

Wednesday, January 11, 2012

I wanted to share today the implementation of one of my explorations of the SICP book, and get your feedback on the solution I came up with.

While reading the book, I take the time to challenge both the exercises and code samples in Scheme - although I do regret not knowing it very well - and in Clojure. Somehow, trying to reproduce each line of code in both languages can explain why it takes so much time :).

Part two of the book is dedicated to the exploration of data abstraction and, more specifically, section 2.4 leads us slowly to a data-directed programming example. Starting from the canonical example of operations on complex numbers, Abelson and Sussman create a layered complex-arithmetic system, starting at the bottom from primitive list structures implementing complex numbers, up to program functions that use selector and constructor methods to manipulate complex numbers.

The top higher-order operations are not supposed to be aware of the lower details of the implementation, as we should be able to use as many representations of complex numbers as we would like to in the bottom layer.

Complex numbers can be represented by pairs describing rectangular or polar coordinates. Each of these representations suits specific operations better. Adding two complex numbers is easier using the rectangular representation, while the polar representation seems more adapted to multiplication, for example. At any step we should be able to create new complex number instances from polar or rectangular data, which we can express as:

(make-from-real-imag 1 1)

(make-from-mag-ang 1 1)

In the first line we create a complex number from rectangular coordinates. The idioms real and imag respectively stand for the real (x) and imaginary (y) components.

In line two we create a complex number from polar coordinates. mag and ang respectively stand for the magnitude (r) and angle (theta) components.

At any step we should be able to select (x,y) components in rectangular representation and (r, theta) components in polar representation. What we wish to express is something like the following:

(real-part z1)
(imag-part z1)
(magnitude z1)
(angle z1)

All these constructors and selectors should be totally unaware of the underlying representations.

But in addition, to conform to the SICP state of mind, the previous make-from-real-imag and make-from-mag-ang constructors should be adapted in order to choose the most suitable implementation. There, serious things begin.
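As an aside, the layering itself can be sketched in Scala (my own analogy, with hypothetical names, not the Clojure code of this post):

```scala
// Selectors unaware of the underlying representation.
trait Complex {
  def realPart: Double
  def imagPart: Double
  def magnitude: Double
  def angle: Double
}

// Rectangular representation: stores (x, y), derives (r, theta).
case class Rectangular(realPart: Double, imagPart: Double) extends Complex {
  def magnitude = math.hypot(realPart, imagPart)
  def angle = math.atan2(imagPart, realPart)
}

// Polar representation: stores (r, theta), derives (x, y).
case class Polar(magnitude: Double, angle: Double) extends Complex {
  def realPart = magnitude * math.cos(angle)
  def imagPart = magnitude * math.sin(angle)
}

// Constructors free to choose the most suitable representation.
def makeFromRealImag(x: Double, y: Double): Complex = Rectangular(x, y)
def makeFromMagAng(r: Double, theta: Double): Complex = Polar(r, theta)

// Addition picks the rectangular view, multiplication the polar one;
// neither ever touches the concrete pair layout.
def add(a: Complex, b: Complex): Complex =
  makeFromRealImag(a.realPart + b.realPart, a.imagPart + b.imagPart)
def mul(a: Complex, b: Complex): Complex =
  makeFromMagAng(a.magnitude * b.magnitude, a.angle + b.angle)
```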

In order to both explore the SICP book and play with Clojure idioms, I voluntarily chose to use protocols. In a certain way, like Abelson and Sussman (see section 2.4.1), I do not want my higher-order operations to depend on specific implementations because there can be many of them, but I want my implementations to be constrained by my higher-order module operations.

What I am expecting looks like basic Inversion Of Control (IOC is a much higher-level concept than dependency injection, so check that you do not confuse the two ideas). Quoting Michael Fogus and Chris Houser's Joy of Clojure: "protocols are sets of function signatures, each with at least one parameter, that are given a collective name."

To dig a little deeper into protocols, check also Stuart Halloway and Aaron Bedra's book Programming Clojure and Stuart Halloway's presentation over here.

To be perfectly honest, I am not at ease with protocols as they remind me too much of OOP in a Functional Programming context. That is debatable, I think.
Both sets of constructor and selector functions are perfect candidates for protocols.

For the higher-order operations that would make use of the make-from-real-imag and the make-from-mag-ang functions, I would like to be able to choose a suitable data representation for each.

Wait a minute: I wanted to invert the control, so naturally at the time of defining both functions, in the same module as the protocols, I have no idea which representation could be used. In order to get out of this trap, enter multimethods. The trick I used is not - I think - so different from Abelson and Sussman's data-directed programming.

Their proposal aims to update or get data in a two-dimensional table, one axis being the "type" of the representation and the other axis being a symbol defining the expected operation.

So each table cell hosts a reference to the function matching both a type and its operation. Implementing the protocols provides me with dispatch on type. All I need is to identify my functions. Each function can be a multimethod declaration in the same module as the complex numbers protocols definition:

where rectangular-datum and polar-datum reference instances of the ComplexConstructor protocol, each of them being respectively invoked through the defmethods make-from-real-imag and make-from-mag-ang.

The dispatch functions used in the sicp.complex module being very simple, our binding too remains very simple. We could maybe experiment with, and evolve, the dispatching function to extend the power and flexibility of the binding in the sicp.test.complex-spec module.

Tests green.

So we have been able to define protocols, then we used elements from these protocols in higher-order functions and multimethods, imposing our control on upcoming implementations, and finally we have been able to bind our higher-order functions to the expected implementations. Not that bad for a simple morning. Do not hesitate to come back with comments.

Monday, January 9, 2012

I would like to thank last year's readers, as I did not expect so many people to visit the blog.

I hope the readings provided you with helpful information, and thanks again to the people who kindly commented, providing precious feedback each time.

In addition to health and good fortune, I wish you the chance to explore all the domains you would like to explore and to learn as many things as possible (even Spring, if you truthfully want to explore it :)).

Among all resolutions, I promised myself to find a day job allowing for more regular practice of Scala and/or Clojure, in order to forge the necessary base of a practice aiming to sustain my night ramblings in functional programming, type theory, and category theory.

The incoming year would be a nice year if these ramblings also drove me to explore a NoSQL database and to read both Thompson's Type Theory and Pierce's TAPL. So what are your programming wishes? :)

For this first blog entry of the year I would like to share my modest ramblings on "contextual" programming in Scala, experiments started as a continuation of my reading of Learn You a Haskell. This experience naturally led me to Martin Odersky's articles Generics of a Higher Kind and Scalable Component Abstractions, then to Joshua Suereth's Scala in Depth, specifically chapter 11 (Joshua, if you ever read this modest blog entry, thank you for providing us with such an enlightened book).

My first purpose is to mimic, in many different upcoming proofs of concept, the Haskell way of separating the pure from the impure, aka separating the referential transparency of functions from side effects. Yes, we are talking about Monads.

It took me time to start dealing with Monads because of all the negative fuss around the Category Theory world. Like many of you, my mathematical background is far behind me. I belong to that group of people who think it will be necessary to awaken this part of my knowledge in order to leverage it up to a Category Theory understanding from the mathematical point of view, while in the meantime trying things from the ground up, so to say from nearly real-world examples.

I am also convinced that part of the defiance toward the monadic world comes from the name of the concept itself. Working on a canonical example, I realized that after all we are dealing with Design Patterns just named after a mathematical domain. Quoting the Gang of Four, Monads should only be "recurring solutions to common problems in software design".
Naturally, people will be less frightened by the name "Decorator Pattern" than by the expression "Monad Pattern". "Decorator" remains a familiar term, a kind of everyday-life expression, while the word Monad looks like a magic, mysterious formula.

Let's (nearly) forget about the name and focus on today's canonical example. Acting with Abelson and Sussman's wishful thinking, we will not use tests today, just exploration, based on correct compilation of our code until we run a sample.
The canonical example is a simple Logger.
Many may have written about that in Scala, and I guess the impatient ones should jump to Tony Morris' blog here to get a full, brilliant implementation.
For the others, let's work on the current exploration, and why not express criticism in the comments so as to improve the implementation. The following example is an exploration and surely can be improved. So do comment and do give feedback if you feel the urge to. We may travel to this new land together :).

We would like to execute a pure function while logging information about the different steps of the process. Logging can become a very serious source of side effects, as you might want to log to external systems, flush information into files, etc. For the sake of simplicity, just imagine we append the logs. For example, multiplying 3 and 5 while logging could produce the following result:

[15, Got number 3 Got number 5 ]

where we clearly have traced the fact that 3 was created first and then 5. Providing 3 and 5 to the multiplication operator will always produce the same result, but here we have been able to trace the input processes at the origin of the values.

Our approach relies on the use of a context holding both the processed value and the cumulative logs. One must be able to extract at will the embedded value while continuing to append the log information. This is when higher-order functions, handling functions as first-class citizens, save us once again.

How ?
Let's imagine we can handle such a container of type M, holding both a value of type T and a log, a kind of

Container[T, M[_]]

A first step in the pattern implementation implies that we are able to split the responsibilities between creating container instances and applying the pure function alone to the embedded value. The following map method on our container can grant this separation:

def map[U](f: T => U): M[U];

nearly the same map as in the List class definition, for example, or any Traversable implementation in Scala. Having internally applied the pure function f during the method execution, the container handler must be able to produce a new container embedding a new value of type U. At this step the container should implement a set of behaviors in accordance with what is called a Functor. As we are growing our knowledge step by step, assume we conform to the expected behavior and try to define the following:

Implicitly, I suppose that my container of type M will also implement the Functor trait. Joshua Suereth provides more elegant ways to separate the container implementation from the pattern trait definitions themselves. For now, we just try to progress in our little exploration.

That's nice, but our multiplication hosts two operands and we are just mapping one embedded value to another!
As a tribute to Learn You a Haskell (LYAH), and naming our logging context StringWriter, imagine having at hand the following value factory:

def logNumber(x: Int) = StringWriter(x, "Got number " + x + " ")

We could execute our multiplication :

logNumber(5).map(y => x * y)

Where the hell would the x come from?
Could we simply imagine producing this:

logNumber(3).map(x => logNumber(5).map(y => x * y))

Well... almost, because the type of the produced result would be StringWriter[StringWriter[Int]] and not StringWriter[Int]. The flattening operation is then provided by the Monad Pattern per se (hurray!) as a flatten operation:

def flatten[T](m: M[M[T]]): M[T]

Just a minute! Is this it? No; as you can imagine, Monads should obey laws like their cousins the Functors, but for now I just propose that you use the following flatMap method:

and so on and so on, producing at each flatMap invocation a new StringWriter[Int]-typed container instance.

The beauty from my point of view, comes from the separation of the extraction/insertion of the value into the working context from the function application (z => x * y * z), cleanly isolated.
The complete definition I provided was
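It was along these lines (a hedged reconstruction; the exact code may differ):

```scala
// The Functor contract: transform the embedded result.
trait Functor[T, M[_]] {
  def map[U](f: T => U): M[U]
}

// The generic flatten operation, bound only to the container type.
trait MonadHelper[M[_]] {
  def flatten[T](m: M[M[T]]): M[T]
}

// flatMap = map then flatten; the helper is resolved implicitly.
trait Monad[T, M[_]] { this: Functor[T, M] =>
  def flatMap[U](f: T => M[U])(implicit helper: MonadHelper[M]): M[U] =
    helper.flatten(map(f))
}

// The writer holds a value and the cumulated log (concatenated strings).
case class StringWriter[T](value: T, log: String)
    extends Functor[T, StringWriter] with Monad[T, StringWriter] {
  def map[U](f: T => U): StringWriter[U] = StringWriter(f(value), log)
}

object StringWriter {
  implicit object writerMonadHelper extends MonadHelper[StringWriter] {
    def flatten[T](m: StringWriter[StringWriter[T]]): StringWriter[T] =
      StringWriter(m.value.value, m.log + m.value.log)
  }
}

def logNumber(x: Int) = StringWriter(x, "Got number " + x + " ")

// The for comprehension desugars to the flatMap/map chain:
val result = for {
  x <- logNumber(3)
  y <- logNumber(5)
} yield x * y
// result == StringWriter(15, "Got number 3 Got number 5 ")
```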

I voluntarily split the flatten function into a second trait, as this generic operation is bound only to the type of the container. I expect the compiler to check at compile time that implicit helper objects are available in the scope. (I don't particularly like implicits, but they can become very handy if not overused.)
As I expect my StringWriter to become reusable some day, I reused one of my existing traits, the Monoid, to define my writer trait as in LYAH. Monoids not being the purpose of today's blog entry, just keep in mind the LYAH definition, remembering that these are simple structures allowing one to isolate, in a handy way, the manipulation of elements of the same type behind a simple function operation:

The StringWriter class will implicitly use the String concatenating monoid.

The StringWriter class is itself both a Monad and a Functor

The specific Monad helper is defined in the scope of the companion object as an implicit internal object writerMonadHelper for the compiler to implicitly bind it to the flatMap method of the Monad trait

This is when we can play, and where the magic of the Scala compiler plays a nice part.

We can of course reproduce the LYAH example using chained invocations of the flatMap/map methods, but as we implement both flatMap and map, the Scala environment permits us to use a for comprehension, in extenso:

Not fond of the imperative style of for comprehensions, I concede that their use does clarify the code. Unlike Tony Morris, I have not provided more detailed examples of usage, but I promise to work on them as soon as I have finished writing this blog entry.

Thank you for following me into this first exploration and do not hesitate to produce helping comments.