2016/01/26

Advanced queries in jBPM 6.4

While working with BPM, access to data that are being processed by the engine are very important. In many cases users would like to have options to easily and efficiently search for different data:

process instances started by...

process instances not completed until...

tasks assigned to ... for a given project

tasks not started for a given amount of time

process instances with given process variable(s)

tasks with given task variable(s)

These are just few examples of advanced queries that are useful but might be tricky to provide out of the box because:

different data bases have different capabilities when it comes to efficient searches

ORM in between adds layer of complexity while it helps to mitigate db differences

out of the box solution relies on compile time data - that can be used in queries - like jpa entities

not possible to build data structure that will fit all cases and that will be efficient to query on

Again, just few items that makes the query out of the box limited in terms of functionality. jBPM in version 6.3 comes with efficient query builders based on JPA Criteria API that aims at solving many issues that are listed above but is blocked by compile time dependency as this is JPA based solution so the entity manager must be aware of all possible types used in queries.

What's new in 6.4?

jBPM 6.4 comes with solution to address these problems. And this solution is based on DashBuilder DataSets. DataSets are like data base views - users can define them to pre filter and aggregate data before they will be queried or filtered if you like.

QueryService is part of jbpm services api - a cross framework api build to simplify usage of jBPM in embedded use case. At the same time jbpm services api is backbone of both KIE workbench and KIE Server (with its BPM capabilities).

QueryService exposes simple yet powerful set of operations:

Management operations

register query definition

replace query definition

unregister query definition

get query definition

get queries

Runtime operations

query - with two flavors:

simple based on QueryParam as filter provider

advanced based on QueryParamBuilder as filter provider

DashBuilder DataSets provide support for multiple data sources (CSV, SQL, elastic search, etc) while jBPM - since its backend is RDBMS based - focuses on SQL based data sets. So jBPM QueryService is a subset of DashBuilder DataSets capabilities to allow efficient queries with simple API.

How to use it?

Let's define use case that we can use throughout this article...

We are about to sale software and for doing that we define very simple process that deal with the sale operation. For that we have data model defined that represents our produce sale:

To be able to take advantage of the advanced queries we need to make sure we have various data being processed by jBPM so we can actually measure properly how easy we can find the relevant data. For that we create 10 000 process instances (and by that 10 000 user tasks) that we can then try to search for using different criteria.

Define query definitions

First thing user needs to do is to define data set - view of the data you want to work with - so called QueryDefinition in services api.

second query - by process id that matches "org.jbpm%" and status is in active or aborted

but that's still not very advanced, isn't it?? Let's look at how to work with variables.

Perform queries with process and task variables

Common use case is to find process instances or tasks that have given variable or have given variable with particular value.

jBPM from version 6.4 indexes task variables (and in previous versions it already did that for process instance variables) in data base. The indexation mechanism is configurable but default is to simple toString on the variable and keep it in single table:

Process instance variables - VariableInstanceLog table

Task variables - TaskVariableImpl table

equipped with this information we can define data sets that will allow us to query for task and process variables.

now we have registered new query definitions that will allow us to search for process and task and return variables as part of the query.

NOTE: usually when defining query definitions we don't want to have always data set to be same as the source tables so it's good practice to initially narrow down the amount of data for example by defining it for given project (deploymentId) or process id etc. Keep in mind that you can have query definitions as many as you like.

... and here we can find tasks that have task variable 'Comment' and with value 'Write a Document'.

So a bit of a progress with more advanced queries but still nothing that couldn't be done with out of the box queries. Main limitation with out of the box variables indexes is that they are always stored as string and thus cannot be efficiently compared on db side like using operators >, < between, etc

... but wait with query definitions you can take advantage of the SQL being used to create your data view and by that use data base specific functions that can cast or convert string into different types of data. With this you can tune the query definition to provide you with subset of data with converted types. But of course that comes with performance penalty depending on the conversion type and amount of data.

So another level of making this use case covered is to externalize process and task variables (at least some of them that shall be queryable) and keep them in separate table(s). jBPM comes with so called pluggable variable persistence strategies and ships out of the box JPA based one. So you can create your process variable as entity and thus it will be stored in separate table. You can then take advantage of mapping support (org.drools.persistence.jpa.marshaller.VariableEntity) that ensures that mapping between your entity and process instance/task will be maintained.

Here is sample ProductSale object that is defined as JPA entity and will be stored in separate table

When such entity is then used as process or task variable it will be stored in productsale table and referenced as mapping in mappedvariable table so it can be joined to find process or task instances holding that variable.

Here we can make use of different types of data in that entity - string, integer, double, date, long and by that make use of various type aware operators to filter efficiently data. So let's define another data set that will provide use with tasks that can be filtered by product sale details.

second same as first but joined with potential owner information to get tasks only for authorized users

In second query you can notice third parameter in the constructor which defines the target - this is mainly to instruct QueryService to apply default filters like user or group filter for potential. Same filter parameters can be set manually so it's just short cut given by the API.

Marked in blue are variables from custom table and in orange task details

Now we can perform queries that will benefit from externally stored variable information to be able to find tasks by various properties (of different types) using various operators

Here you can see how easy and efficient queries can be using variables stored externally. You can take advantage of type based operators to effectively narrow down the results.

As you might have noticed, this time we use another type of mapper - UserTaskInstanceWithCustomVarsQueryMapper - that is responsible for mapping both task information and custom variable. Thus we need to provide column mapping - name and type - so mapper know how to read data from data base to preserve the actual type.

Mappers are rather powerful and thus are pluggable, you can implement your own mappers that will transform the result into whatever type you like. jBPM comes with following mappers out of the box:

Mappers are registered by name to simplify lookup of them and to avoid compile time dependency to actual mapper implementation. Instead you can use:

org.jbpm.services.api.query.NamedQueryMapper

that simple expects the name of the actual mapper that will be resolved on time when the query is performed

Here you can find complete product-sale project that can be imported into KIE Workbench for inspection and customization.

QueryParamBuilder

Last but not least is the QueryParamBuilder that provides more advanced way of building filters for our data sets. By default when using query method of QueryService that accepts zero or more QueryParam instances (as we have seen in above examples) all of these params will be joined with AND operator meaning all of them must match. But that's not always the case so that's why QueryParamBuilder has been introduced so users can build up their on builders can provide them at the time the query is issued.
QueryParamBuilder is simple interface that is invoked as long as its build method returns non null value before query is performed. So you can build up an complex filter options that could not be simply expressed by list of QueryParams.

Here is basic implementation of QueryParamBuilder to give you a bit of jump start to implement your own - note that it relies on DashBuilder Dataset API.

Hi Maciej.I have many classes that I want to use queries with them. So when the MappedVariable table is created, its column "MAP_VAR_ID" generate a foreign key referencing to only one of my classes, so then I have an error with constraints violation with this foreing key.What I must do to resolve this?

Hi Maciej. Let me explain with more details. I have two classes (Requirement and Budget) that extends from "org.drools.persistence.jpa.marshaller.VariableEntity". When kie-server start, hibernate try to make a FK between Requirement_id and Budget_id on column "MAP_VAR_ID" of the MappedVariable table. Obviously, only one of them is successfull, because a FK can only reference to one table. The other one throws an error that the FK can not be created.So, for example, if hibernate create the FK for Budget, and I try to persist a Requirement, I have an error of constraint violation.There is a way to say to hibernate or kie that not create a FK for each class that extends from VariableEntity or create them on different columns?

thanks for explanation, now I understand the issue, though can't think of any generic solution.

Have you tried to drop the FK manually on the table?

You could try to use http://docs.oracle.com/javaee/6/api/javax/persistence/JoinColumn.html#columnDefinition() to provide the SQL DDL for creating the column maybe it can help in anyway. Alternatively you could generate the table yourself and set hibernate auto ddl to none to avoid any work done by hibernate automatically.

The query should return the "task-input-data" for custom Data Object as actual input value but instead it return like "addressIn" : "com.sample.params.Address@d22b867" (this is how it stores in TaskVariableImpl)

I am using mapper=UserTasksWithVariables

But I get the result properly when I am using GET tasks/{task_id}?withInputData=true

TaskVariableImpl table contains only toString value of the object and not the actual object instance as it's "indexed" to be query able. While when you get single instance of a task you have an option to load it's variables - as actual objects. So use taskvariableimpl for query purpose and task inputs/outputs of given task for more advanced operations on data.

Thanks for the reply, so If I get understand this correctly, if I need to get custom variables (business objects) as input parameters part of my query above I need to implement something similar to what has been described above?

Note - when I run dependency tree on my sample project for hibernate i see this[INFO] com.sample:jbpm-example:jar:1.0.2[INFO] \- org.jbpm:jbpm-test:jar:6.4.0.Final:provided[INFO] \- org.hibernate:hibernate-core:jar:4.2.21.Final:provided

Hi, i would like to ask for JBPM 6, i have added a custom task event listener to my task. During the afterTaskCompletedEvent, i would like to insert the task variable my custom table. How can i use back the same transaction as in afterTaskCompletedEvent? Is code below the correct way to use the jta transaction in jbpm? Is it possible to use the same transaction during task event in the listener?

if you use JPA entity as your custom table mapping then it should be enough to add that entity to persistence.xml of workbench and use the code that you linked, though I'd recommend to rather use em.persist rather than query.executeUpdate

TaskEventListener methods are executed within active transaction so when you use entity manager that comes from persistence unit with transaction type JTA it will automatically join transaction.

Hi Maciej, i would like to ask if there is some exception being throw in the event listener, will the transaction be rollback? I tried to insert some data to my custom table in the beforeProcessStarted event and when exception is throw in the listener, the process is still continue to start.

Hi Maciej, in task delegate event, is it possible to get the user id who perform the task delegation? The method below pass in a userId and i need to get the userId in the listener event. taskService.delegate(taskId, userId, targetUserId);

Hi Maciej, thanks for clarification. I try to catch the exception but this is the reason that cause my transaction not rollback. After remove the try catch clause the entire transaction is now rollback. Thanks a lot.

Hi Maciej, from jboss admin console, i notice the jpa "org.jbpm.domain" hibernate session is opened and not closed after success workflow call. i am using remote rest api to call to workflow. i also save some extra data to my custom table in the listener event as below:EntityManagerFactory emf = EntityManagerFactoryManager.get().getOrCreate(PERSISTENCE_UNIT); EntityManager em = emf.createEntityManager();em.persist(customObj);

Can you please explain in detail the steps to create a custom Mapper. We are trying to do it but the plugin jar is not loaded by JBPM and hence the mapper never gets registered. We are using JBPM 7.0.3

Lets say there are 4 potentials owners - group1, group2, group3, group4 for a human task. Now if a User A belongs to both group1 & group2. When this advance query is called via REST API by User A, BPMS internally creates the following query

Now this works fine if pageSize & page is not used in the REST API as query parameters, but when pageSize & page are used as query parameters the REST API returns incorrect results. If you look at the above SQL closely, it returns multiple records for the same task (1 task record per oeid) thereby returning incorrect number of tasks with pagination parameters.

I don't think it is related to pagination in anyway as even if you don't give them explicitly as query params defaults are applied (0 and 10).

In general the problem is in the query which obviously is duplicating it in case there are many matching potential owners on the given task. This is because you use use potential owners/org entities in the select clause and thus it will duplicate it.

I have to use the oeid in the select clause as target set for this advanced query is 'PO_TASK' which expects oeid as one of the columns to filter tasks only for the potential owner/actual owner. If I don't set the oeid, I get an error that column oeid doesn't exist.

I have noticed that this issue occurs only when the user belongs to more than one groups which are set in the GroupID attribute of the human task. In this scenario, the query returns more than 1 record for the task (1 record per oeid) which is expected. I think the implementation needs to support this scenario. Everything works fine if the user just belongs to one group.

Let's consider a scenario where the user A belongs to 3 Groups (GRP_1, GRP_2, GRP_3) which are part of the GroupID attribute (GRP_1, GRP_2, GRP_3, GRP_6, GRP_7) of this human task A. Let's assume that there are 5 instances of Task A in Ready state.If we set the pageSize to 3 when calling the REST API for advanced query, the REST API will return only 1 task record instead of 5 as the first 3 records returned by the query belong to the same task.

version 7 provides support for group by operations when running queries so that would allow to remove duplicated entries. For v6 you could try to use CUSTOM as target for the query and then use QueryParamBuilder that would collect group for given user or pass in the groups as part of the request.

Thanks Maciej! Group by operation in v7 will certainly help removing duplicates. Regarding your suggestion to use CUSTOM target in v6, I think I will still run into duplication issue if I were to pass multiple groups for an user in the QueryParamBuilder. Am I missing something?

I have a use case where I need to expose a API that needs to query running case instances from remote kie-server based on different values of the caseFile parameters. I found this article very interesting, as this is exactly what I want to achieve. However, looks like I am missing a very simple part of the puzzle which is how to get instance/reference of queryService so that I could carry out all these things that you have described here? I see so many folks have managed to use it, so certainly this is something straight forward but for some reason, I am not able to figure it out. I will appreciate any pointers here.

I am trying to associate a custom entity similar to ProductSale that you explained here with the case instance. However, workbench is not able to resolve VariableEntity that my entity is extending and build fails with error org.drools.persistence.jpa.marshaller.VariableEntity cannot be resolved to a type. I tried to add dependency of org.drools:drools-persistence:7.7.0.Final , however I see that this also fails with error:

Can you suggest what dependency should I add here? I am trying this on 7.7.0.Final release and essentially I want to be able to pass a custom object as caseData while starting a case, and then query it using custom queries.