A few days ago I have read the interesting interview with Ming-Yee Iu about JINQ. JINQ is, like the name already suggests, the attempt to provide something similar to LINQ for Java. The basic idea is to close the semantic gap between object-oriented code that executes queries on a relational data model. The queries for the relational database model should be easily integrated into the code such that it feels more natural.

The research behind LINQ came to the conclusion that the algorithms transforming the code into relational database queries work best with functional code. As Java 8 comes with the streams API, the author uses it to implement the ideas of his PhD in Java.

To get our hands dirty, we start with a simple project that uses Hibernate over JPA together with an H2 database and JINQ:

Using the method streamAll() of the previously created JinqJPAStreamProvider gives us access to all persons within the database. In this simple example we only want to output the first name of each person; hence we map the list and collect all results into a List. This list gets printed using the forEach() method and a reference to the println() method.

Taking a look at the generated SQL code, we see that all columns are selected:

select
person0_.id as id1_4_,
person0_.FIRST_NAME as FIRST_NA2_4_,
person0_.ID_CARD_ID as ID_CARD_4_4_,
person0_.LAST_NAME as LAST_NAM3_4_,
from
T_PERSON person0_

As can be seen from the code above, we use the first where() clause to select all time and material projects. The joinList() invocation joins the geek table while the subsequent where() clause also restricts to only select geeks with first name “Christian”. Et voila, that is the created SQL query:

Conclusion: Having worked with JPA’s criteria API some time ago, I must say that the first steps with JINQ are more intuitive and where easier to write down. JINQ really helps to close the gap between the relational database world by using streams in Java 8.

Since JPA version 2.0 the EntityManager offers the method getCriteriaBuilder() to dynamically build select queries without the need of string concatenation using the Java Persistence Query Languge (JPQL). With version 2.1 this CriteriaBuilder offers the two new methods createCriteriaDelete() and createCriteriaUpdate() that let us formulate delete and update queries using the criteria API.

For illustration purposes lets use a simple inheritance use case with the two entities Person and Geek:

Like with pure SQL we can use the method from() to specify the table the delete query should be issued against and where() to declare our predicates. This way the criteria API allows the definition of bulk deletion operations in a dynamic way without using too much string concatenations.

But how does the SQL look like that is created? First of all the ORM provider has to pay attention that we are deleting from an inheritance hierarchy with the strategy JOINED, meaning that we have two tables T_PERSON and T_GEEK where the second tables stores a reference to the parent table. Hibernate in version 4.3.8.Final creates the following SQL statements:

As we can see, Hibernate fills a temporary table with the ids of the geeks/persons that match our search criteria. Then it deletes all rows from the geek table and then all rows from the person table. Finally the temporary table gets purged.

The sequence of delete statements is clear, as the table T_GEEK has a foreign key constraint on the id column of the T_PERSON table. Hence the rows in the child table have to be deleted before the rows in the parent table. The reason why Hibernate creates a temporary table is explained in this article. To summarize it, the underlying problem is that the query restricts the rows to be deleted on a column that only exists in the child table. But the rows in the child table have to be deleted before the corresponding rows in the parent table. Having deleted the rows in the child table, i.e. all geeks with FAV_PROG_LANG='Java', makes it impossible to delete afterwards all corresponding persons as the geek rows have already been deleted. The solution to this problem is the temporary table that first collects all row ids that should be deleted. Once all ids are known, this information can be used to delete the rows first from the geek table and then from the person table.

The generated SQL statements above are of course independent from the usage of the criteria API. Using the JPQL approach leads to the same generated SQL:

When we change the inheritance strategy from JOINED to SINGLE_TABLE, the generated SQL statements also changes to a single one (here the discriminator column is DTYPE):

delete
from
T_PERSON
where
DTYPE='Geek'
and FAV_PROG_LANG=?

Conclusion: The new additions to the criteria API for deletion and update let you construct your SQL statements without the need of any string concatenation. But be aware that bulk deletions from an inheritance hierarchy can force the underlying ORM to use temporary tables in order to assemble the list of rows that have to be removed in advance.

The N+1 problem is a common issue when working with ORM solutions. It happens when you set the fetchType for some @OneToMany relation to lazy, in order to load the child entities only when the Set/List is accessed. Let’s assume we have a Customer entity with two relations: a set of orders and a set of addresses for each customer.

Hibernate 4.3.5 (as shipped with JBoss AS Wildfly 8.1.0CR2) will generate the following series of SQL statements out of it for only two(!) customers in the database:

Hibernate:
select
customeren0_.id as id1_1_,
customeren0_.name as name2_1_,
customeren0_.numberOfPurchases as numberOf3_1_
from
CustomerEntity customeren0_
Hibernate:
select
orders0_.CUSTOMER_ID as CUSTOMER4_1_0_,
orders0_.id as id1_2_0_,
orders0_.id as id1_2_1_,
orders0_.campaignId as campaign2_2_1_,
orders0_.CUSTOMER_ID as CUSTOMER4_2_1_,
orders0_.timestamp as timestam3_2_1_
from
OrderEntity orders0_
where
orders0_.CUSTOMER_ID=?
Hibernate:
select
orders0_.CUSTOMER_ID as CUSTOMER4_1_0_,
orders0_.id as id1_2_0_,
orders0_.id as id1_2_1_,
orders0_.campaignId as campaign2_2_1_,
orders0_.CUSTOMER_ID as CUSTOMER4_2_1_,
orders0_.timestamp as timestam3_2_1_
from
OrderEntity orders0_
where
orders0_.CUSTOMER_ID=?

As we can see, the first query selects all customers from the table CustomerEntity. The following two selects fetch then the orders for each customer we have loaded in the first query. When we have 100 customers instead of two, we will get 101 queries. One initial query to load all customers and then for each of the 100 customers an additional query for the orders. That is the reason why this problem is called N+1.

A common idiom to solve this problem is to force the ORM to generate an inner join query. In JPQL this can be done by using the JOIN FETCH clause like demonstrated in the following code snippet:

In situations where you know that you will have to load all orders for each customer, the JOIN FETCH clause minimizes the number of SQL statements from N+1 to 1. This comes of course with the drawback that you now transfer for all orders of one customer the customer data again and again (due to the additional customer columns in the query).

The JPA specification introduces with version 2.1 so called NamedEntityGraphs. This annotation lets you describe the graph a JPQL query should load in more detail than a JOIN FETCH clause can do and therewith is another solution to the N+1 problem. The following example demonstrates a NamedEntityGraph for our customer entity that is supposed to load only the name of the customer and its orders. The orders are described in the subgraph ordersGraph in more detail. Here we see that we only want to load the fields id and campaignId of the order.

Hibernate supports the @NamedEntityGraph annotation since version 4.3.0.CR1 and creates the following SQL statement for the JPQL query shown above:

Hibernate:
select
customeren0_.id as id1_1_0_,
orders1_.id as id1_2_1_,
customeren0_.name as name2_1_0_,
customeren0_.numberOfPurchases as numberOf3_1_0_,
orders1_.campaignId as campaign2_2_1_,
orders1_.CUSTOMER_ID as CUSTOMER4_2_1_,
orders1_.timestamp as timestam3_2_1_,
orders1_.CUSTOMER_ID as CUSTOMER4_1_0__,
orders1_.id as id1_2_0__
from
CustomerEntity customeren0_
left outer join
OrderEntity orders1_
on customeren0_.id=orders1_.CUSTOMER_ID

We see that Hibernate does not issue N+1 queries but that instead the @NamedEntityGraph annotation has forced Hibernate to load the orders per left outer join. This is of course a subtle difference to the FETCH JOIN clause, where Hibernate created an inner join. The left outer join would also load customers for which no order exists in contrast to the FETCH JOIN clause, where we would only load customers that have at least one order.

Interestingly is also that Hibernate loads more than the specified attributes for the tables CustomerEntity and OrderEntity. As this conflicts with the specification of @NamedEntityGraph (section 3.7.4) I have created an JIRA issue for that.

Conclusion: We have seen that with JPA 2.1 we have two solutions for the N+1 problem: We can either use the FETCH JOIN clause to eagerly fetch a @OneToMany relation, which results in an inner join, or we can use @NamedEntityGraph feature that lets us specify which @OneToMany relation to load via left outer join.

Using an ORM to abstract from your specific database and to let it create and issue all the SQL statements you would have to write by hand yourself seems handy. This is what made ORM solutions popular.

But it also comes with a downside: As the ORM does a lot of work for you, you lose to some degree control over the generated SQL and you have to rely on the ORM to create a high-performance statement for you. But it can happen that the SQL generated by the ORM is not what you might have written by hand and expected the ORM to do for you. In this case you have to get back control over the SQL and put your hands on the code again.

In huge applications this task is not as trivial, as there might be hundreds of statements issued to the database that stem from hundreds of lines of Java code that makes heavy usage of JPA features. Tracing the SQL statement that your database profiling tool has identified as problematic down to the actual code line becomes tedious.

We know that we can enable SQL statement logging for Hibernate with the following two lines in our persistence.xml:

But this will only output the already generated SQL; the actual Java code line is still not visible. For smaller applications it might be feasible to attach a debugger to the application server and debug through the code until you have found the line that logs the problematic SQL statement, but for bigger applications this is time consuming.

As Hibernate itself does not provide any means of intercepting the logging and enhance it with more information, we will have to do this on our own. The JBoss documentation indicates that it is possible to write your own custom logging handler. As this logging handler receives all the logging messages and therewith also the messages produces by Hibernate with enabled SQL logging, we can try to find the line we are looking for and then output a stack trace to our own log file.

Writing a custom logging handler turns out to be very simple. All you have to do is setup a small project with a class that extends the class Handler from the JDK package java.util.logging:

The publish() method receives all logging output in form of an instance of LogRecord. Its method getMessage() lets us access the output directly. Hence we can match this message against some keywords we have loaded from some configuration file:

Buffer is here some simple data structure (e.g. guava’s EvictingQueue) that buffers the last lines, as the method publish() is called for each line(!) of output. As a complete SQL statement spans more than one line, we have to remember a couple of them. Next to the buffered lines and the current line we also output a String representation of the current stack trace. This tells us later in the log file from where we are called and therewith which line of Java code in our project causes the current statement.

Once we have compiled the project we can copy the resulting jar file to the newly created folder structure under: $JBOSS_HOME/modules/system/layers/base/com/mydomain/mymodule/main (for JBoss AS 7.2). In order to tell JBoss AS about our new module, we have to create a XML file called module.xml with the following content:

Here we can see clearly which OneToMany relation causes the problematic select statement we were looking for.

Conclusion: Using a custom logging handler to inject the current stack trace into the logging of the SQL statements may help you when you want to find the
exact location in the source code where a concrete query is issued. It turned also out that writing your own custom logging handler for JBoss AS is also a straight forward task.

You may come to the situation where you have to perform a bulk deletion on a huge amount of datasets stored in a relational database. If you use JPA with Hibernate as underlying OR mapper, you might try to call the remove() method of the EntityManager in a way like the following:

First of all, we load a reference representation of the entity we want to delete and then pass this reference to the EntityManager. Let’s assume the RootEntity from above has a child relation to a class called ChildEntity:

If we now turn on the property show_sql of hibernate, we will wonder what SQL statements are issued:

select
rootentity0_.id as id5_1_,
rootentity0_.field1 as field2_5_1_,
rootentity0_.field2 as field3_5_1_,
childentit1_.PARENT as PARENT5_3_,
childentit1_.id as id3_,
childentit1_.id as id4_0_,
childentit1_.field1 as field2_4_0_,
childentit1_.field2 as field3_4_0_,
childentit1_.PARENT as PARENT4_0_
from
ROOT_ENTITY rootentity0_
left outer join
CHILD_ENTITY childentit1_
on rootentity0_.id=childentit1_.PARENT
where
rootentity0_.id=?
delete
from
CHILD_ENTITY
where
id=?
delete
from
ROOT_ENTITY
where
id=?

Why does Hibernate first load all data into memory in order to delete this data immediately afterwards? The reason is that JPA’s lifecycle requires that the object is in “managed” state, before it can be deleted. Only in this state all lifecycle functionality like interceptors is available (see here). Therefore Hibernate issues a SELECT query before the deletion in order to transfer both RootEntity and ChildEntity to the “managed” state.
But what can we do, if we just want to delete RootEntity and ChildEntity, if we know the id of RootEntity? The answer is to use a simple DELETE query like the following one. But due to the integrity constraint on the child table, we first have to delete all depending child entities. The following code demonstrates how:

The above code results in the three SQL statements we would have expected by calling remove(). Now you may argue, that this way of deletion is more complicated than just calling the EntityManager’s remove() method. It also ignores annotations like @OneToMany and @ManyToOne we have placed in the two entity classes.
So why not write some code that uses the knowledge about the two entities that already exists in the two class files? First of all, we look for @OneToMany annotations using reflection in the RootEntity class, extract the type of the child entity and then look for its back relation field annotated with @ManyToOne. Having done this, we can easily write the three SQL statements in a more generic way:

The methods getFirstActualTypeArgument(), getManyToOneField(), getIdField() and getOneToManyFields() in the code above are not depicted here, but do what their name sounds like. Once implemented we can easily delete all entities beginning with the root of the tree.

A simple example application that can be used to examine the behavior and solution described above, can be found on github.