Wednesday, June 10, 2009

How to implement Accumulate Functions

Posted by
Edson Tirelli

Developing solutions to problems is not an easy task, especially when the tools we have are good enough for 80% of the task but fail to help us solve the remaining 20%.

Drools was built from scratch with extensibility in mind, and this is one of the characteristics that distinguish it from other products on the market. From support for higher-level abstractions, like Domain Specific Languages and Decision Tables, to engine extensions like pluggable evaluators and functions, Drools enables technical people to make business people feel more comfortable while writing rules, using a known vocabulary, constraints and abstractions.

In my talk at the October Rules Fest I will dive into all the ways in which Drools can be extended to improve the development of domain-specific solutions. For now, I just want to throw some bones while saving the meat for the conference.

In this spirit I would like to show you one of the easiest ways to extend the engine: Accumulate Functions.

Rules quite commonly need to execute operations on sets of data. The operations range from actual set operations, to calculation and scoring, to whatever else you need to run on a set of facts. The Drools accumulate CE (conditional element) supports inline custom code in its init/action/reverse/result blocks, but that code is not declarative, cannot be reused across rules, and is only good for a one-time need.
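To make the four blocks concrete, here is a minimal plain-Java model of what an inline accumulate does for a running sum. It is only an illustration of the lifecycle, not Drools code: the class name `InlineAccumulateModel` and its methods are hypothetical and simply mirror the init/action/reverse/result block names.

```java
// Plain-Java model of the four inline accumulate blocks for a sum.
// Hypothetical illustration only; the engine drives these steps itself.
class InlineAccumulateModel {

    // State that the init block would declare.
    private double total;

    // init: runs when a new calculation starts.
    public void init() {
        total = 0.0;
    }

    // action: runs for each fact that matches the source pattern.
    public void action(double price) {
        total += price;
    }

    // reverse: undoes the action for a fact that no longer matches.
    public void reverse(double price) {
        total -= price;
    }

    // result: the value handed back to the rule.
    public double result() {
        return total;
    }

    public static void main(String[] args) {
        InlineAccumulateModel sum = new InlineAccumulateModel();
        sum.init();
        sum.action(10.0);
        sum.action(20.0);
        sum.action(30.0);
        sum.reverse(20.0); // one product is retracted; no full recalculation
        System.out.println(sum.result()); // prints 40.0
    }
}
```

Every rule that needs a sum would have to repeat these four pieces inline, which is the duplication that accumulate functions remove.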

Accumulate Functions to the rescue: implementing an accumulate function is a 20-minute task. It makes all your rules easier to write, read and maintain. It is unit-test friendly, and the Drools Eclipse plugin understands and validates rules that use accumulate functions.

Let's look at an example scenario so that everyone understands what accumulate functions are. Imagine that you have a rule that needs to calculate the sum of the prices of all products. Without accumulate functions, the rule would look like:

As you can see, even a very simple case is quite verbose. Worse, if another rule needs to calculate the sum of something else, you need to rewrite all the code, which makes maintenance very difficult.

With Accumulate Functions, things get much nicer:

Now the intent of the rule is explicit. It is much shorter and less error prone. Drools ships with several accumulate functions that are available out of the box, like sum, average, min, max, count, collectSet and collectList.

Now imagine that your application needs a set operation. How hard is it to implement it as an accumulate function? As I mentioned before, so hard that you can have it done in 20 minutes and then re-use it everywhere. Imagine complex financial interest calculations, or stream processing functions, or monitoring correlations... all of these can be implemented as accumulate functions and re-used by every rule author in your company.

For this example, I will implement something simple but very unusual, with the goal of, hopefully, opening the readers' minds. Imagine a store that runs a marketing promotion that says: "if the customer order is above $100, the customer is entitled to a gift that is randomly chosen among a list of available gifts". How would you implement that? Exactly:

The randomSelect Accumulate Function

Drools is designed to enable sharing of the KnowledgeBase among multiple sessions. For this reason, an accumulate function cannot contain any attribute or data that is specific to a single session or rule. Any data specific to a rule is stored in a "context" object. The context object can be an instance of any class. It is instantiated by the createContext() method. So, let's say we have a RandomSelectData class that will store all the context data for us. The method will look like:
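As a minimal sketch of that idea, assuming createContext() returns a Serializable, the method only has to hand back a fresh data holder. The wrapper class `CreateContextSketch` is hypothetical, and the `RandomSelectData` shown here is a stripped-down placeholder used only so the snippet compiles on its own.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class used only to make the sketch self-contained.
class CreateContextSketch {

    // Minimal placeholder for the per-rule data holder.
    static class RandomSelectData implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<Object> list = new ArrayList<Object>();
    }

    // The function object keeps no fields of its own: everything that
    // varies per rule lives in the context object returned here.
    public Serializable createContext() {
        return new RandomSelectData();
    }

    public static void main(String[] args) {
        CreateContextSketch function = new CreateContextSketch();
        RandomSelectData first = (RandomSelectData) function.createContext();
        RandomSelectData second = (RandomSelectData) function.createContext();
        first.list.add("teddy bear");
        // Each call yields an independent context, so the second stays empty.
        System.out.println(first.list.size() + " " + second.list.size()); // prints "1 0"
    }
}
```

Because each call returns a new object, a single function instance can serve many rules and sessions without their data interfering.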

As we can see from the method signature, our data class needs to be Serializable. So, let's create a private static inner class to use as a data store:

/**
 * A private static class to hold all the rule-specific data for the random select function
 */
private static class RandomSelectData implements Serializable {
    // the list of objects to choose from
    public List<Object> list = new ArrayList<Object>();
    // a random number generator (transient, so it is not serialized with the context)
    public transient Random random = new Random(System.currentTimeMillis());
}

Since the class is private, we will just keep the attributes public for ease of use.

Now we need to implement the other methods of the AccumulateFunction interface. The first is the init() method, which is called every time a new calculation is started. In this case, we will just clear the list of available objects:

The second method is the accumulate() method, which is called every time a new object is added to the calculation process. In this case, all we want to do is add the object to the list of available objects:

The third method is the reverse() method, which is called every time an object is removed from the calculation, i.e., when it should no longer contribute to the result. This method is optional, but implementing it improves the performance of the function, because removals, not just additions, are then calculated incrementally.

The fourth method tells the engine whether your function supports (implements) the reverse method above. Since we implemented it, we will just return true.

public boolean supportsReverse() { return true; }

And finally, the fifth method is getResult(), which must return the result of the calculation for the current set of data. In our case, we will just randomly pick one element from the list of available elements:

An AccumulateFunction is Externalizable, so we must also implement the read/writeExternal() methods. In most cases, these methods will be empty, but if the function contains any attributes that are shared among sessions, they should be serialized here.
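Putting the methods described above together, a complete function might look like the sketch below. To keep it compilable without Drools on the classpath, it uses a local stand-in interface that approximates the AccumulateFunction method set as described in this post; the real interface may differ in detail. The wrapper class `RandomSelectSketch`, the null result for an empty list, and the lazy re-creation of the transient Random are assumptions of this sketch.

```java
import java.io.Externalizable;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical wrapper so the sketch is a single self-contained unit.
class RandomSelectSketch {

    // Local stand-in approximating the interface described in the text.
    interface AccumulateFunction extends Externalizable {
        Serializable createContext();
        void init(Serializable context) throws Exception;
        void accumulate(Serializable context, Object value);
        void reverse(Serializable context, Object value) throws Exception;
        Object getResult(Serializable context) throws Exception;
        boolean supportsReverse();
    }

    // Holds all the rule-specific data for the random select function.
    static class RandomSelectData implements Serializable {
        private static final long serialVersionUID = 1L;
        List<Object> list = new ArrayList<Object>();
        transient Random random = new Random(System.currentTimeMillis());
    }

    public static class RandomSelectAccumulateFunction implements AccumulateFunction {

        public Serializable createContext() {
            return new RandomSelectData();
        }

        // A new calculation starts: forget all previously collected objects.
        public void init(Serializable context) {
            ((RandomSelectData) context).list.clear();
        }

        // A new object joins the calculation: add it to the candidates.
        public void accumulate(Serializable context, Object value) {
            ((RandomSelectData) context).list.add(value);
        }

        // An object leaves the calculation: drop it from the candidates.
        public void reverse(Serializable context, Object value) {
            ((RandomSelectData) context).list.remove(value);
        }

        // Pick one candidate at random.
        public Object getResult(Serializable context) {
            RandomSelectData data = (RandomSelectData) context;
            if (data.list.isEmpty()) {
                return null; // assumption of this sketch: nothing to choose from
            }
            if (data.random == null) {
                data.random = new Random(); // transient field is null after deserialization
            }
            return data.list.get(data.random.nextInt(data.list.size()));
        }

        public boolean supportsReverse() {
            return true;
        }

        // The function has no state shared among sessions, so these stay empty.
        public void readExternal(ObjectInput in) {
        }

        public void writeExternal(ObjectOutput out) {
        }
    }

    public static void main(String[] args) throws Exception {
        AccumulateFunction function = new RandomSelectAccumulateFunction();
        Serializable context = function.createContext();
        function.init(context);
        function.accumulate(context, "mug");
        function.accumulate(context, "t-shirt");
        function.accumulate(context, "keyring");
        function.reverse(context, "t-shirt");
        System.out.println(function.getResult(context)); // "mug" or "keyring"
    }
}
```

In a real project the class would implement the Drools interface directly instead of the stand-in; the method bodies would stay the same.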

And that is it! Our function is implemented. I will not show the unit test here, as this post is already huge, but you can see that since the class is completely self-contained, implementing the test for the methods is a piece of cake.

The last step is to make the function available to the rules engine. Again, there are several ways of doing that. My preferred way is to create a configuration file in the classpath with the following path and name:

META-INF/drools.packagebuilder.conf

Using the configuration file allows the Eclipse plugin to discover and support the function in the rules. The file is a regular properties file, and to configure the function you need to use the following format:
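As a sketch of that format, the entry maps the name used in rules to the implementing class through a key that starts with `drools.accumulate.function.`. The snippet below embeds one such line and parses it with java.util.Properties to show how the key and value split. The class name `com.example.RandomSelectAccumulateFunction` and the helper class `AccumulateFunctionConfigSketch` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper showing the shape of the configuration entry.
class AccumulateFunctionConfigSketch {

    // Key prefix; the suffix is the function name used inside rules.
    static final String PREFIX = "drools.accumulate.function.";

    // What the properties file would contain (class name is a placeholder).
    static final String FILE_CONTENTS =
            "drools.accumulate.function.randomSelect = com.example.RandomSelectAccumulateFunction\n";

    // Parses the text exactly as a regular properties file would be parsed.
    static Properties parse(String contents) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(contents));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    // Looks up the implementing class for a function name, or null if absent.
    static String classFor(String functionName) {
        return parse(FILE_CONTENTS).getProperty(PREFIX + functionName);
    }

    public static void main(String[] args) {
        // prints "randomSelect -> com.example.RandomSelectAccumulateFunction"
        System.out.println("randomSelect -> " + classFor("randomSelect"));
    }
}
```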

The other ways to configure accumulate functions are through the API, using the KnowledgeBuilderConfiguration class, or by setting a system property, but in these cases the Eclipse plugin will not automatically understand your accumulate function.
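For the system property route, a minimal sketch follows. It assumes the property uses the same `drools.accumulate.function.<name>` key as the configuration file and that it is set before the builder configuration is created. The class names `SystemPropertySketch` and `com.example.RandomSelectAccumulateFunction` are hypothetical.

```java
// Hypothetical sketch: registering the function through a system property.
class SystemPropertySketch {

    // Assumed to be the same key as in the properties file.
    static final String KEY = "drools.accumulate.function.randomSelect";

    // Placeholder for the fully qualified name of the implementing class.
    static final String FUNCTION_CLASS = "com.example.RandomSelectAccumulateFunction";

    // Equivalent to passing -Dkey=value on the JVM command line.
    static void register() {
        System.setProperty(KEY, FUNCTION_CLASS);
    }

    public static void main(String[] args) {
        register();
        // prints "com.example.RandomSelectAccumulateFunction"
        System.out.println(System.getProperty(KEY));
    }
}
```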