My last blog entry explained how you can combine Kafka and the ruleengine and what the advantages are. Here comes a step-by-step example. The data, shell script and libraries required are available in my GitHub repository.

You will need to have at least one Kafka broker running. If you run Kafka on a different machine, adjust the settings in the properties file accordingly.

The program reads data from a Kafka topic named "travel_discount", processes it using the ruleengine and outputs the result to a topic named "travel_discount_ruleengine".
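
A minimal sketch of what the relevant settings in the properties file might look like. The key names here are illustrative only and may differ from the actual file in the repository:

# illustrative settings only - the actual key names in the repository's properties file may differ
kafka.brokers=localhost:9092
kafka.topic.source=travel_discount
kafka.topic.target=travel_discount_ruleengine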

The business logic (rules) will do the following:

Check if the person travelling is below the age of 3. If so, a 100% discount is applied.

Check if the person is 50 or older. If so, a 25% discount is applied.

Data from the "destination_region" field is translated to the appropriate three-letter region code (a lookup).
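
To make this concrete, an input message could look roughly like the following flat JSON record. The "destination_region" field name is taken from the rules above; the other field names and values are assumptions for illustration and may differ from the sample data in the repository:

{"age": 79, "destination_region": "Europe", "price": 1200.00}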

Ok. We have the data in a Kafka topic. Let's now process it. But let's first have a quick look at the Business Rules Maintenance Tool - the web UI for designing the business rules/logic.

The complete logic is encapsulated in a project. Projects contain rulegroups, which in turn contain subgroups, rules and actions.

Here is a screenshot of the content of one rulegroup:

It shows the project and the selected rulegroup at the top. Rules that build a certain logic can be collected in a rulegroup. Rulegroups can have subgroups to allow for "and" and "or" conditions between subgroups. There are two rules: one checks the age (50+) and the other checks the destination region (Europe). The rules are connected by an "and". So if both rules pass, the rulegroup passes. If either or both of the rules fail, the rulegroup fails.

At the bottom there are three actions. The first sets the discount percentage to 25% if the rulegroup passes and the next sets it to 0% if the rulegroup fails. The last action calculates the actual discount price based on the regular price and the discount percentage.
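
Expressed in plain Java, the logic of this rulegroup corresponds roughly to the sketch below. This is not the ruleengine's API - just an illustration of what the two rules and three actions do; the variable names and example values are assumptions:

public class DiscountRuleSketch {
    public static void main(String[] args) {
        // values as they might appear in one input message (assumed example values)
        int age = 79;
        String destinationRegion = "Europe";
        double price = 1200.00;

        // two rules connected by "and": age 50+ and destination region Europe
        boolean rulegroupPassed = age >= 50 && "Europe".equals(destinationRegion);

        // actions: set the discount percentage depending on pass/fail,
        // then calculate the discounted price from the regular price
        double discountPercentage = rulegroupPassed ? 25.0 : 0.0;
        double discountPrice = price * (1.0 - discountPercentage / 100.0);

        System.out.println("discount_percentage=" + discountPercentage
                + ", discount_price=" + discountPrice);
    }
}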

The content of the whole project is shown below. It contains three rulegroups. These groups contain the rules/logic to calculate the discount percentages and map the region code.

From the web UI the project can be exported into a single zip file containing all the logic.

Now run the shell script "run_kafka_ruleengine.sh". It will connect to the Kafka topic, run all rules against the data, calculate the discount percentage and map the correct region code for each individual message.

Compared to the original records in the Kafka input topic, the following fields have been added:

the discount_percentage was set by the actions

the field destination_region_short was mapped from the destination_region

the discount_price was calculated from the original price and the discount percentage.

This is what the result looks like in the output Kafka topic. The first record has no discount: the age is above 78 but the destination region is Asia, so no discount applies according to the rules. The second record gets the 25% price reduction because the age is greater than 78 and the destination region is Europe.
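
If you want to inspect the output topic yourself, Kafka's console consumer is one way to do it (assuming a broker on localhost:9092):

kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic travel_discount_ruleengine --from-beginning

Each message should then contain the added fields described above.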

So now I have shown you a generic way of processing data from a Kafka topic with the ruleengine. The good thing is that you can process any Kafka topic (in flat JSON format) with it. Just orchestrate your business logic in the Business Rules Maintenance Tool according to your requirements and the given data, export the project and then run the program. The results will be in the configured output topic.

You can also specify a second output topic for logging the detailed results of the rule execution (set it in the properties file). The data and the results, including the message explaining why each rule failed or passed, will be in that topic. Be aware that you will get one output message per rule per input message: if you have 1000 messages and 10 rules, you will get 10000 messages in the detailed results topic.

Hope you enjoyed it. I will soon work on supporting the processing of CSV and Avro formats as well. Another idea is to route data to a topic based on the results of the logic (passed or failed).

And finally I will try to also integrate the ruleengine into Kafka Connect, so that you can apply complex business logic while running your Kafka Connect ingestion from a file, database or other source. That would give us an enormous amount of flexibility to process data: filter, check, route and massage it - all without writing code, with the logic kept centrally in one place for maintenance and review.

Check my other repositories on GitHub for the ruleengine and the Business Rules Maintenance Tool.

Kafka is a great platform for storing and handling real-time data. Once you have your data in Kafka, there is a multitude of ways to process and work with it.

Now we all know that the data is often in a shape that needs further massaging for other downstream processes. If data is adjusted (quality!) or completed early in the flow, then multiple downstream processes can benefit from it. This can be done e.g. using KSQL or the Streams API - just to name these two.

But hardcoding business or processing logic either in (K)SQL statements or in your code has some disadvantages:

there is no central place where somebody (developers or business people) can look it up or maintain it: this often results in duplication of logic.

it is difficult to understand and get an overview of the business logic if it is spread all over the place: that is both a time and a quality issue.

the logic is maintained by a KSQL or coding specialist - by IT: IT does all the work...

business users are lost if you try to explain the logic in your code, e.g. to find errors or discuss changes: it is not transparent to the business

there is no proper separation of responsibilities between the business users and IT: the business users - the experts for their domain - should really be the ones to maintain the business logic

code in any form that contains business logic is difficult to maintain. It clutters the code and thus slows down maintenance and development in general. Removing the logic from the code makes the process more agile, and clear code is a definite plus for quality.

I have written a ruleengine in Java. It is already in production in at least two larger companies that I know of. There is also a plugin available for the Pentaho PDI (Data Integration/ETL) tool and a processor available for Apache NiFi.

But the ruleengine is also a good fit for Kafka. What does the ruleengine do? It allows you to execute logic against data. Based on the results of the execution, one can decide how to further process the data - data either passes or fails the applied logic. The ruleengine gives you tremendous flexibility to construct (complex) logic.

There is a web interface available to do the central maintenance of the logic. The logic is maintained in projects that collect the logic for a certain purpose. It comes with features such as user access, search, history of changes, copy functions and more. Groups of rules - building a certain logic - have a validity date and can have dependencies on other rule groups.

The ruleengine addresses all the disadvantages listed above: a plus for centralizing rule and business logic, separating IT and business responsibilities and making the logic more transparent.

Now how can this be used with Kafka? I am showing two scenarios here for which the ruleengine can be used. There are others, but I focus this time on these two:

Adjust or modify the data from a Kafka source topic: data is corrected or supplemented and output to a target topic

Adjust or modify the data from a Kafka source topic and route the failed and passed data to different target topics: the data is divided into "failed" and "passed" or "correct" and "incorrect"

In both cases - and I think it is a very cool feature - there is also the possibility to output the detailed results of the execution of the business logic to another topic. The detailed results show the outcome of each rule that ran and why it passed or failed. It is a sort of logging of the quality of the data.

Scenario 1: Data is read from a Kafka topic and runs through the ruleengine, which applies the logic to it. The data is then passed onwards to the target topic. Data is checked and can subsequently be modified based on the results of the check. New data (fields) can also be created to supplement the data: e.g. based on the content of one field, a value for a second field is set. One can concatenate field values, do calculations, look up values from a map and much more.

As indicated above, the detailed results of running the rules against the data may be output to a separate topic, so you have an explanation of why certain updates were made and others not. It allows you to detect errors in the data, with explanations of why a record passed or failed. A trail of the decisions, if you will.

Scenario 2: Similar to the first scenario, but here the passed and failed data (records) are separated into different topics, based on the results of running the rules. What counts as passed or failed can be based on very complex logic.
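
Conceptually, the routing in this scenario boils down to something like the following sketch using the plain Kafka client API. This is not the actual kafka-ruleengine code - the topic names and the evaluate() placeholder are assumptions for illustration:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class RoutingSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "ruleengine-routing-sketch");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {

            consumer.subscribe(Collections.singletonList("source_topic"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // evaluate() stands for running the business logic against the message
                    boolean passed = evaluate(record.value());
                    // route the record to the passed or failed topic
                    String targetTopic = passed ? "target_topic_passed" : "target_topic_failed";
                    producer.send(new ProducerRecord<>(targetTopic, record.key(), record.value()));
                }
            }
        }
    }

    // placeholder for the rule evaluation - the real logic comes from the ruleengine project
    private static boolean evaluate(String jsonMessage) {
        return !jsonMessage.isEmpty();
    }
}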

And here is a screenshot of the web-based tool for the central maintenance of the business logic. It is a (business) user-friendly way to construct logic for multiple purposes. It is of course not limited to Kafka or the other tools I mentioned; it can be used in any Java-related project, be it web-based or a standalone application.

Massaging data from incomplete sources or from systems with data quality issues is a task we all know. Often, the sources cannot be changed. Applying logic to standardize, enhance or complete data - and doing it in a centralized way - improves quality and transparency. It also relieves our code and processes from clutter and thus makes them more agile for changes.

All of this is freely available under the Apache license: the ruleengine and the web tool, as well as the Pentaho ETL plugin and the NiFi processor.

If this all makes sense to you, then stay tuned for the second part, where I will show some samples and more details of how to use the ruleengine with Kafka and how things come together.