Gabble - Some stories are great fun. Where it comes to Polity, it's
uncomfortably close to the great Iain Banks' Culture. Neal Asher
should've tried something original and not rip off Banks. Still, worth reading

Sunday, June 16, 2013

ForkJoin has been available to us Java folks since Java 7 and if you consider the JSR 166 packages, then even longer. I found the time to explore this API only recently.

Having written about Phasers a couple of years ago and realizing that I'd still not found a use for it in production, I was not too eager to explore another "thread-pool" (just kidding - where would we be today without j.u.c classes).

Mind you, the JavaDocs for ForkJoin and related classes are quite elaborate and expect you to set aside some time to go through it in detail... which you can probably postpone if you read this post.

ForkJoin is recommended as a thread-pool if your main task has to divide itself into a lot of smaller tasks, usually recursively. Usually in such scenarios the number of children tasks are not known upfront. Technically, the work-stealing aspect of ForkJoin and the claim that it scales well when faced with a large number of tasks makes it a good fit for such workloads.

The RecursiveAction is fairly simple. It embodies the logic to work on the root of your computation problem. It also splits its work into smaller sub-tasks recursively. Very similar to a binary search but searching each half will be spawned off as a sub-task recursively. The computations for this tree of tasks completes when the leaf nodes are processed.

I can think of a simplified but realistic use case where you'd want to do a mix of sync and async, parallel sub-tasks:

Receive purchase order request from client

Convert request payload (JSON, XML) to Java object

Make synchronous authorization check with LDAP

Make some async requests

Make async request to inventory service to check and reserve stock

Make async request to shipment service and find closest free shipment date to requested destination

(Ok, this is turning out to be a longer post than I had expected. Not a quick exploration after all)

Going back to our example, incorporating the 3 asynchronous operations in step 4 might constitute as sub-tasks of step 4. Although in reality, the JavaDoc for ForkJoinPool says that the ForkJoinTasks should ideally not block on external resources like I/O. This is called "unmanaged synchronization" as it involves waiting for resources outside the fork-join system. For that the ManagedBlocker is recommended, although to me it looks like it was added only as an after thought.

So, sadly the above seemingly real-world example might not be a good case for ForkJoin. Which means the ideal use case is something that involves recursively decomposing and pure computation - a.k.a in-memory map-reduce.

So, we make our way back to the overly geeky sort-merge example used in the JavaDocs. In my case, I decided to dispense with the sorting part and simplified the problem even further - purely for illustration purposes.

In my examples, I use ForkJoin to recursively split and list numbers from "start" to "end". At each step if the start to end range is larger than 5 it splits that range into 2 equal halves and forks them off as sub-tasks. Otherwise that task is the leaf level and just adds the numbers in a for-loop from start to end into a queue that is passed around to all tasks.

The first test is a naive implementation of RecursiveAction where it just keeps forking away sub-tasks till the leaf levels. So, the thread that created the root level task attempts to wait for the whole tree of computations to complete. Since each level that spawns the next level of 2 sub-tasks asynchronously and does not wait ("fork()") for the children to complete, the whole tree completes asynchronously. This way the caller thread in the "main()" method comes out of "invoke()" prematurely. As a result this recursive task is almost what we wanted but not entirely.

Since the naive approach of forking did not suffice, we make a small change by making the parent task wait for its children to complete by calling the "join()" method on its children.

An even better approach is to allow each task to fork away sub-tasks and not have to "join()" on them. Because waiting only means that a thread is not in idle-wait state where it should've been "stealing" work from other threads and making progress. What we need is for a way to let the sub-tasks notify the parent task that it has completed. We can let this bubble up all the way and register a listener at the root.

For the listener we will even use the fancy Lamda feature and something from the new java.util.function package to register a listener. In fact completion listeners can be registered at any level - for example to print to the console that certain % of the tree is complete and so on. There are 2 versions of this - one that sub-classes CountedCompleter to simply let the completions bubble up and then eventually notifies the blocked calling thread in "main()".

The more sophisticated implementation using Lambdas.

Here's an even more sophisticated example that wraps the fork-join pool as a CompletionService and submits 5 tasks and then picks up the results as they complete.

There are a few other things worth reading up about. I skipped the use of RecursiveTask. I also skipped mentioning the differentmethods to "steal" tasks. The RecursiveAction JavaDocs even has an example that keeps track of spawned sub-tasks and then follows that chain to try and complete them if another thread has not already done it. The reason I did not venture into this bit is because I'm not sure as of now whether it is worth doing this instead of letting the FJ framework do the scheduling internally.

Without studying the source code I can only guess that manually keeping track of spawned sub-tasks and then trying and unforking them would be to help complete that sub-tree of tasks quickly. If we were to use a simple queue to just dump sub-tasks like in the ThreadPoolExecutor, then they would get mixed up with other sub-tasks from other threads in the pool. This means that the current sub-tree may not complete on time because the dependent sub-tasks are somewhere at the back of the queue. This is where FJ shines in addition to it scalability.

One thing we do lose with FJ is that tasks do not have priorities unlike using a PriorityBlockingQueue with TPE so you might end up using multiple FJ pools.

Saturday, June 08, 2013

Nice, secluded, close to I-280. Trails always in the shade, good even in summer.They even have camp sites and picnic benches.

$6 entrance fee. No maps, trail directions are a little confusing. Especially when you are coming back to the parking area there are many roads and unmarked trails branching off. Funnily, we had trouble finding our parking lot at the end. We did not have such problems while hiking though. The single used map they did have was lacking in detail about the parking areas.

Wednesday, June 05, 2013

In part 1 we explored some simple ways to fake a DSL using JSON and YAML. In part 2 we will expend a little more effort to build a more powerful mini-DSL.

By "powerful", I mean a DSL that not only supports the language structure we like but also allows for more complex constructs like method calls and expressions.

For this task, I chose Groovy, which is a really nice, Ruby-like language that is tightly integrated with Java. Later, I will also explore Ruby using JRuby just to show how similar Groovy and Ruby are in many aspects.

Groovy runs on the JVM and so it integrates seamlessly with Java. Even IntelliJ comes with native support for Groovy.Groovy can be used like a plain scripted, interpreted language or even compiled. It is a little slower than Java but offers a lot of powerful metaprogramming features to balance it out. I will not go into the details but suffice to say that it is particularly useful for building (surprise!) DSLs. For full blown examples see Cloudify, Gradle, this or this.

At first, I chose the simple approach of just using Groovy as a scripting language to specify the stocks. It didn't really look like a DSL because it is not.

Next, with just a tiny bit of setup where I make some ready made expressions and methods available to the script, I was able to specify my stocks in a nicer and more powerful format. The cool "with" syntax in Groovy also helped.

To demonstrate that I could also write executable code, I had the script print the date and name of the file in the first line.

I've just scratched the Groovy surface because I could've spent a lot more time overloading numeric types like the goodFor property where I could've used "30.days" but I didn't. There are obviously holes in my implementation but you cannot dismiss the speed at which you can get at least this much functionality. Perhaps with a little more Groovy proficiency and time, I could've done better.

Now on to Ruby. To show how similar Groovy and Ruby are, I used JRuby to build this DSL:

I also have a simpler, raw JRuby script but then it's not a DSL but just a script. Again note the similarities with Groovy.

Getting JRuby to integrate neatly with my Stocks beans was a little challenging. It required a slightly different setup. I also ran into some issues which are not documented well in JRuby. So, I asked for help on the mailing list but haven't heard from them. This coupled with the fact that JRuby startup takes several seconds made testing and experimenting a little frustrating.

Ruby in itself is used in a lot of places to build DSLs like Chef, RSpec and a lot of other projects.

One option I obviously overlooked is Scala. Scala is well known (among the Scala users) for its "apparently" powerful language features. However, in my opinion Scala's complex, sometimes bizarre, dense and obtuse syntax might keep it out of reach of average engineers like myself. I've shared this opinion earlier too.

So, that's it for now. I may extend this preliminary work on Diesel later when I have the time. Or better yet, you can fork it and share it.

I've been meaning to write about my series of little experiments building a mini-DSL on the JVM.

I've worked with and written about expression evaluators before.
However, there've been many instances where I've felt the need to
quickly build a part pseudo-language and part configuration script.

Also, I did not have the time, resources nor the justification to
build a full fledged grammar/parser. There were times where I did morph
an existing ANTLR grammar for something else, but in the end I realized
that a simple hand built tokenizer and AST would've done the trick. (Note to self: try Parboiled)

So, I was curious to see what options I had to build an "almost
language", quickly. "Quickly" being the operative word. "Dirty" being
the unsaid word.

I'm not going to go into the details of what a
DSL is, or spend time debating over internal or external DSLs etc. Enough
material is available on the internet and some books too.

If you want to learn more about Java API based DSLs - more commonly
known as a Fluent DSL, there are several good places to start learning by example - Jooq, Google Guava ComparisonChain etc. I've built Fluent DSLs several times and it is a cleaner and better way to implement the Builder design pattern - like Google Protocol Buffers' Builder.

This time though, I wanted to evaluate options to build something
that could read/parse/load files that look like structured, readable
English. Some common cases where you'd need this:

Configuration
scripts are prime candidates for this. In the early-mid 2000's, XML
would've been the way to go; with XPath and XSDs/DTDs; built in support
in the JDK and support for hierarchical structures

Glue to stitch together different modules in a program -
something that usually involves some configuration code and basic
expressions

Actual mini-languages that allow business
analysts or IT/DevOps people to plug in some logic without writing
complex Java code. Also without having to get developers and a full build cycle involved

So, let's cut to the chase and see what I came up with.

For
my tests, I wanted to accomplish something very simple. I wanted a way
to describe stock buying or selling instruction to my stock broker. It's
a completely contrived example of course but it seemed valid for this
test. I wanted a way to specify which stock to buy or sell, at what price,
for how long the instruction is valid and some other little things.

Since I brought up XML, I'll talk about the simplest approach first - XML's slightly less ugly cousin JSON:

Using JSON and calling it a DSL is not only dumb but also cheating. But there are obviously a lot of places where this would suffice. Unlike XML, this is less verbose, but it still needs the user to know where to put double quotes, braces, square brackets and all this without a schema to validate the file.

It does have its advantages. All I had to do was create a JavaBean with all the possible combinations my "stock specification" could have and then use Google Gson to do the serialization/deserialization to/from JSON.

This is what the JavaBeans look like:

Assuming that this was enough, all I had to do was read the JSON into the Stocks bean and related inner classes and start using it.

The keen reader will notice that the Order class has some properties - limit, stopLimit, market which are really mutually exclusive. JSON does not prevent me from providing values to all 3 which would be wrong. I could've spent some more time fleshing those properties into an enum or a complex string but I'll leave that as an exercise for later (or the reader).

The full source along with scripts can be found on my GitHub Diesel repo for your reference.

So, I decided that JSON wouldn't cut it. A while ago I had played with YAML briefly, which is JSON's distant cousin. Actually, the latest YAML spec makes it JSON's parent (how convenient).

YAML is like JSON but without the frivolous double quotes and braces. Compare this YAML file with the previous JSON file, it speaks for itself:

It is without doubt, cleaner and more usable than JSON. You use SnakeYaml to do automatic ser/deser into the Stocks JavaBean like Gson.

Also, like Gson, if you don't have a bean or your configuration makes it difficult to map directly to a bean, you can just read it free form as a map of maps. This would be a poor man's AST. Gson's free form structure is actually better that way, in that it almost looks like XML Nodes.

If YAML is good enough for you, you can stop reading right here. In fact, YAML is also my favorite for simple configuration files that involve lists and hierarchies. This is miles ahead of and better than the flat format used in Java Properties.

But, defining YAML still has the same issues that JSON had with regards to semantic validations like limit, market etc. However this is really an issue with the way I've created the beans. Think of the YAML file as a free form AST. You'd have to write your semantic and syntactic validations in your Java code by walking this AST. I prefer to do this in Java because it's easier to have all the validations and exception messages in one file than split it across multiple ANTLR and Java files.

In part 2, we will explore other framework and language choices. Until next time, take care!
Ashwin.