The ruleengine contains a suite of so-called "checks". These are small modules that evaluate a condition or state of the data.

Each business rule uses one check. The combination of multiple rules - and thus multiple checks - allows you to create complex logic and flexible ways of checking your data.

You can easily extend the functionality and implement your own checks, if you want to.

Here is a list of these checks. I only list the "positive" checks. Most checks also have a "negative" counterpart, such as CheckIsNotEqual, CheckIsNotBetween, etc.

- CheckContains: Checks if a String is contained in another String
- CheckIsEqual: Checks if one String is equal to another String
- CheckIsEmpty: Checks if a String is empty (zero length)
- CheckStartsWith: Checks if a String starts with another String
- CheckEndsWith: Checks if a String ends with another String
- CheckIsBetween: Checks if a number is between two given numbers
- CheckIsEven: Checks if a number is an even number
- CheckIsGreater: Checks if a number or a date is greater than another one. In case of a String, checks if the length of the String is greater than the given length
- CheckIsGreaterOrEqual: Checks if a number or a date is greater than or equal to another one. In case of a String, checks if the length of the String is greater than or equal to the given length
- CheckIsSmaller: Checks if a number or a date is smaller than another one. In case of a String, checks if the length of the String is smaller than the given length
- CheckIsSmallerOrEqual: Checks if a number or a date is smaller than or equal to another one. In case of a String, checks if the length of the String is smaller than or equal to the given length
- CheckIsInList: Checks if a String is contained in a comma-separated list of values
- CheckListHasMember: Checks if a comma-separated list of values contains a given String
- CheckIsLowercase: Checks if a String is all lowercase
- CheckIsUppercase: Checks if a String is all uppercase
- CheckIsNegativeNumber: Checks if a number is smaller than zero (0)
- CheckIsNull: Checks if a given value is null
- CheckIsNumeric: Checks if all characters of a given String are digits
- CheckIsPrime: Checks if a number is a prime number
- CheckLength: Checks if the length (number of characters) of a String or number is equal to a given length
- CheckMatches: Checks if a String matches a given regular expression pattern
- CheckSoundsLike: Checks if a given String sounds like another String, using the Soundex algorithm
- CheckDistanceIsEqual: Checks if the Levenshtein distance between two Strings is equal to a given number
- CheckDistanceIsGreater: Checks if the Levenshtein distance between two Strings is greater than a given number
- CheckDistanceIsGreaterOrEqual: Checks if the Levenshtein distance between two Strings is greater than or equal to a given number
- CheckDistanceIsSmaller: Checks if the Levenshtein distance between two Strings is smaller than a given number
- CheckDistanceIsSmallerOrEqual: Checks if the Levenshtein distance between two Strings is smaller than or equal to a given number
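The concept of a check can be sketched roughly like this in Java. This is a simplified illustration only - the interface and method names here are hypothetical and not the ruleengine's actual API:

```java
// Simplified sketch of the "check" idea - hypothetical names, not the real ruleengine API.
public class ChecksSketch {

    // every check evaluates a condition on a value and returns true or false
    interface Check<T> {
        boolean evaluate(T value);
    }

    // in the spirit of CheckIsBetween: a number lies between two given numbers
    static Check<Integer> isBetween(int low, int high) {
        return v -> v > low && v < high;
    }

    // in the spirit of CheckContains: one String is contained in another
    static Check<String> contains(String part) {
        return v -> v.contains(part);
    }

    // a "negative" counterpart is simply the negation of the positive check
    static <T> Check<T> not(Check<T> check) {
        return v -> !check.evaluate(v);
    }

    public static void main(String[] args) {
        System.out.println(isBetween(1, 10).evaluate(5));        // true
        System.out.println(contains("air").evaluate("airport")); // true
        System.out.println(not(contains("air")).evaluate("sea")); // true
    }
}
```

Because every check reduces to a boolean, checks can be combined freely with and/or logic inside rule groups, which is what makes the rule combinations flexible.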

At times we need some test data to play with. Sometimes for a quick proof of concept, sometimes because we don't have access to production system data.

But generating "dummy" data is not always useful for testing. A while ago I wrote a data generator tool. It allows you to generate mass data using word lists, regular expressions, or purely random values.

Word lists are nice because they provide real-life data. They are simple files containing words or expressions; during data generation, entries are pulled randomly from these files. Several files are already available for use, in different languages.
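The word-list mechanism boils down to picking random entries from a list. A minimal sketch (the entries are inlined here; in the real tool they would come from a word list file):

```java
import java.util.List;
import java.util.Random;

// Minimal sketch of the word-list idea: values are pulled randomly from a list.
// In the real tool the entries come from word list files; here they are inlined.
public class WordListSketch {

    static String pick(List<String> words, Random random) {
        return words.get(random.nextInt(words.size()));
    }

    public static void main(String[] args) {
        List<String> cities = List.of("Berlin", "Paris", "Rome", "Madrid");
        Random random = new Random();
        for (int i = 0; i < 5; i++) {
            // each generated row gets a randomly picked real-life value
            System.out.println(pick(cities, random));
        }
    }
}
```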

Regular expressions are a bit particular in this case: usually you have data and test whether it matches a regular expression. With the data generator it is the opposite: you define a regular expression and the tool generates data that matches it.
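For one simple pattern the "reverse regex" idea can be sketched like this. A general regex-to-data generator is considerably more involved; this only illustrates the principle for a digit pattern such as `[0-9]{4,7}`:

```java
import java.util.Random;

// Sketch of the "reverse regex" idea for one simple pattern: [0-9]{4,7}.
// A general regex-to-data generator is more involved; this only shows the principle.
public class RegexDataSketch {

    // generate a random string of digits whose length is between min and max
    static String randomDigits(int min, int max, Random random) {
        int length = min + random.nextInt(max - min + 1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < length; i++) {
            sb.append(random.nextInt(10));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Random random = new Random();
        String value = randomDigits(4, 7, random);
        // the generated value matches the pattern it was produced from
        System.out.println(value + " matches: " + value.matches("[0-9]{4,7}"));
    }
}
```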

Another big advantage of the data generator is that you can create related date data across several columns. For example, one column contains the full date, e.g. 2015-04-11, and another column the quarter. The quarter is then not a random number but derived from the full date: quarter 2 in this example. Another column could carry the weekday, which in this example would be Saturday.
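Deriving all the related columns from one single date is what keeps them consistent with each other. A minimal sketch using the Java time API:

```java
import java.time.LocalDate;

// Sketch of deriving related date columns from a single date,
// so the generated columns stay consistent with each other.
public class DateColumnsSketch {

    // quarter 1..4 derived from the month of the date
    static int quarterOf(LocalDate date) {
        return (date.getMonthValue() - 1) / 3 + 1;
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2015, 4, 11);
        System.out.println("date:    " + date);                 // 2015-04-11
        System.out.println("year:    " + date.getYear());       // 2015
        System.out.println("quarter: " + quarterOf(date));      // 2
        System.out.println("month:   " + date.getMonthValue()); // 4
        System.out.println("weekday: " + date.getDayOfWeek());  // SATURDAY
    }
}
```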

The data generator uses an XML file to define the columns of the generated data. Here is an example:

At the top, some reference fields are created. They can also reference each other, like the id "quarter" referencing the id "date1". Further below, the references are used to generate data. The fields of type "category" use word lists. There is also a field of type "regex": the data generator will generate data matching this regular expression. And at the end there is a field of type "random", which generates purely random data.
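To picture the structure described above, here is an illustrative fragment. The element and attribute names below are hypothetical guesses for the sake of illustration, not the tool's actual schema:

```xml
<!-- illustrative sketch only: element and attribute names are hypothetical -->
<datagenerator>
    <!-- reference fields: "quarter" is derived from "date1", keeping the values consistent -->
    <reference id="date1" type="date" format="yyyy-MM-dd"/>
    <reference id="quarter" type="quarter" reference="date1"/>

    <!-- field of type "category": values are pulled randomly from a word list file -->
    <field name="city" type="category" list="cities_en.txt"/>

    <!-- field of type "regex": data is generated so that it matches the pattern -->
    <field name="code" type="regex" pattern="[0-9]{4,7}"/>

    <!-- field of type "random": purely random data -->
    <field name="filler" type="random" length="10"/>
</datagenerator>
```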

Here is an example output according to the xml definition file above:

I loaded the generated data into OpenOffice Calc. You can see that column E of the spreadsheet contains the date, column F contains the year of the date, column G the quarter, column H the month and column I the day of month. Column J contains the regular expression data: a number with a minimum length of 4 and a maximum length of 7, according to the XML definition file. I believe you will get the rest yourself.

You can adjust the output of the data in several ways through a properties file.

So if you need to generate mass data - e.g. 1 million rows - and the data has to have some meaning in real life, the data generator helps you to generate it. Or you could generate the dates for a date dimension in your database/DWH. You can then further process the data in Pentaho PDI or use it for other purposes.

The tool is written in Java and is available on github/uwegeercken, like all of my tools.

In the last blog post I showed you how to write rules, so you can check your data. Today I show you that the plugin step for Pentaho PDI not only returns the statistics of how many groups and rules failed; you can also retrieve the full details of what happened when the ruleengine ran.

The plugin step has two output streams: one gives you only the statistics of how many groups and rules failed. You can use this to split your data into failed rows and passed rows and further process them. For each input row there will be one output row.

The second output stream (which is optional) will give you one output row for each combination of input row and rule. So if you have 10000 rows and 10 rules, you will get 100000 output rows. Each row shows the name of the group, subgroup and rule, and whether the group, the subgroup and the rule failed. It also gives you a message (the message you defined when writing the rule) showing why the rule failed or passed. With these details you can easily debug your rule engine results. And of course you can also further process the detail rows in PDI.

I have added two more steps to the transformation we used the last time: one select step to reduce the number of fields (so it is easier to demonstrate to you) and a dummy step.

When you open the rule engine step, you can select the step where the rule results go to (Rule Result Step). And in the dropdown below, you can select which data to output to the step. You can pass all rows to the step or all rows where the group failed. Or also all rows where the group passed.

The screenshot above shows a preview of the detailed results from the rule engine as described above. On the very right there is a column showing the message that was produced by the rule engine, indicating why the rule passed or failed.

This is of course just a quick example for the purpose of demonstration. A typical use case would be a set of data where you would expect no errors. When errors do occur, the rule engine details as described above give you a very good way to determine what happened.

Yesterday we saw the basics of how to create a project with the Business Rules Maintenance Web Application and connect the resulting Zip file to a Pentaho PDI transformation.

Today I show you how to write two rules. The transformation starts with a CSV file input step. The file contains data of US airports. We will check two fields of the file which contain information about the airport type and the US State it is located in.

The final result will be a list of rows of airports that are of large size and located in the state of New York.

Below you find the screenshots for a step-by-step walkthrough.

The project we created yesterday with the reference to the project file exported from the web application

Modified the transformation to read data from a CSV file, run it through the rule engine and then filter all rows where the number of failed rulegroups is zero - meaning that the rulegroup passed.

A preview on the data from the CSV file. We will write rules for Field_002 (the type of airport) and Field_009 (the US State)

Login to the Business Rules Maintenance Web Application

Click on "Project" to go to the project overview page

Select the project we created the last time - click on the project name

Click to add a rule group.

Enter a name and description for the group. Select a valid from and valid until date for the rule group. Then save it. Finally click the arrow to go back

The rule group we created. If necessary do maintenance of the group from here

Click on the rule group name

On top there is the project and rulegroup info displayed. Rule groups contain one or many subgroups, which group rules together. When a rulegroup is created, a first subgroup is created automatically.

Click to add the first rule

Enter a name and description for the rule. Next enter the exact name of the field (here: "Field_002" - the name of the field from the PDI transformation) and select the data type of the field. Next select which check you want to use (here: CheckIsEqual to check for equality). Next enter the value to check against (here: "large_airport") and select its data type. Then enter a message for the case the rule fails and one for the case it passes. Placeholders can be used: $0 is the value to check against that you entered before ("large_airport") and $1 is the actual value from the CSV file in the transformation. Click on save and then go back.
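The placeholder mechanism can be pictured roughly like this. This is a hypothetical sketch of the substitution step; the ruleengine's actual implementation may differ:

```java
// Hypothetical sketch of resolving a rule message with placeholders:
// $0 = the value to check against (from the rule definition),
// $1 = the actual value from the data row.
public class MessageSketch {

    static String resolve(String template, String expected, String actual) {
        return template.replace("$0", expected).replace("$1", actual);
    }

    public static void main(String[] args) {
        String message = resolve("airport type is not [$0] but [$1]",
                "large_airport", "small_airport");
        System.out.println(message); // airport type is not [large_airport] but [small_airport]
    }
}
```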

Click to add another rule

Add another rule like before but for field "Field_009" containing information about the state the airport is located in. We want to check that the airport is in the state "US-NY" (New York). Click on save and go back.

Back on the overview page, the two rules are visible. The first checks the airport type and the second the state the airport is located in. The subgroup shows that the rules are connected by an [and] condition, so both must be true for the rule group to pass.

Click on "Export" to export the project

Select the project and generate the Zip file

Select the folder and enter the filename and click on "Save". Use the same folder and filename as yesterday, so you won't need to change the transformation.

Go back to Pentaho PDI. Right-click the last step to get a preview of the data. The data will come from the CSV file, run through the rule engine, and all rows that pass the rulegroup we defined (consisting of the two rules: airport type = large_airport AND state = US-NY) will go to the step labeled "Passed Rows".

This is the preview result. It shows all rows having an airport type = large_airport AND state = US-NY. Only 5 rows of about 45000 passed the test. The columns on the right are added to the stream by the rule engine and give you some statistics and results to filter on.

You can now easily add more rules - additional logic - without cluttering the transformation! You will have noticed that after writing the rules, we did not touch the transformation anymore. That is because the logic is external to the transformation.

There are a lot of predefined checks: for smaller, larger, equality, matching strings, null or empty values, whether a value is in a list, is negative, and much more. The checks can easily be extended with new functionality.

Next time I will show how to add actions. Actions allow you to modify data or add data, do calculations or many other things based on the results of a failed or passed rule group.

Below are some screenshots, showing how to create a project in the Business Rules Maintenance Web Application and use it in a Pentaho PDI transformation.

The project in this sample will not contain any rules. I will show in more detail how to do this at a later point in time.

when the web application is running, click on "Login"

enter your userid and password to authenticate

once logged in, click on "Project" in the menu on the left

click the plus sign to add a new project

enter the project details: name, description and group

the project was added - click the arrow to go back to the project overview

you may edit or delete the project or attach a Pentaho PDI transformation

click "Export" in the menu on the left to create a project zip file

select the project to export then click on "generate"

select a folder where to store the file, enter a filename and click on "Save"

in Pentaho PDI create a new transformation and drag the "Rule Engine" step onto the canvas

add an input and output step and double-click the "Rule Engine" step. Enter the path and name of the zip file exported, select the main output step (next step) and click "Ok"

Now you should be able to run the transformation. As there are no rules defined in the project, not much will happen in regards to the rule engine step; the data will just flow through it. In another post I will show you how to create rules and combine rules with complex logic - but in an easy way. Remember: no coding is required to define rules - it's all done through the web interface.