A few days ago I have read the interesting interview with Ming-Yee Iu about JINQ. JINQ is, like the name already suggests, the attempt to provide something similar to LINQ for Java. The basic idea is to close the semantic gap between object-oriented code that executes queries on a relational data model. The queries for the relational database model should be easily integrated into the code such that it feels more natural.

The research behind LINQ came to the conclusion that the algorithms transforming the code into relational database queries work best with functional code. As Java 8 comes with the streams API, the author uses it to implement the ideas of his PhD in Java.

To get our hands dirty, we start with a simple project that uses Hibernate over JPA together with an H2 database and JINQ:

Using the method streamAll() of the previously created JinqJPAStreamProvider gives us access to all persons within the database. In this simple example we only want to output the first name of each person; hence we map the list and collect all results into a List. This list gets printed using the forEach() method and a reference to the println() method.

Taking a look at the generated SQL code, we see that all columns are selected:

select
person0_.id as id1_4_,
person0_.FIRST_NAME as FIRST_NA2_4_,
person0_.ID_CARD_ID as ID_CARD_4_4_,
person0_.LAST_NAME as LAST_NAM3_4_,
from
T_PERSON person0_

As can be seen from the code above, we use the first where() clause to select all time and material projects. The joinList() invocation joins the geek table while the subsequent where() clause also restricts to only select geeks with first name “Christian”. Et voila, that is the created SQL query:

Conclusion: Having worked with JPA’s criteria API some time ago, I must say that the first steps with JINQ are more intuitive and where easier to write down. JINQ really helps to close the gap between the relational database world by using streams in Java 8.

The JPA version 2.1 brings a new way how to handle the synchronization between the persistence context and the current JTA transaction as well as the resource manager. The term resource manager comes from the Java Transaction API and denotes a component that manipulates one resource (for example a concrete database that is manipulated by using its JDBC driver). Per default a container-managed persistence context is of type SynchronizationType.SYNCHRONIZED, i.e. this persistence context automatically joins the current JTA transaction and updates to the persistence context are propagated to the underlying resource manager.

By creating a persistence context that is of the new type SynchronizationType.UNSYNCHRONIZED, the automatic join of the transaction as well as the propgation of updates to the resource manager is disabled. In order to join the current JTA transaction the code has to call the method joinTransaction() of the EntityManager. This way the EntityManager’s persistence context gets enlisted in the transaction and is registered for subsequent notifications. Once the transaction is commited or rolled back, the persistence context leaves the transaction and is not attached to any further transaction until the method joinTransaction() is called once again for a new JTA transaction.

Before JPA 2.1 one could implement a conversation that spans multiple method calls with a @Stateful session bean as described by Adam Bien here:

The persistence context is of type EXTENDED and therefore lives longer than the JTA transactions it is attached to. As the persistence context is per default also of type SYNCHRONIZED it will automatically join any transaction that is running when any of the session bean’s methods are called. In order to prevent that to happen for most of the bean’s methods, the annotation @TransactionAttribute(TransactionAttributeType.NEVER) tells the container to not open any transaction for this bean. Therefore the methods persist() and list() run without a transaction. This behavior is different for the method commit(). Here the annotation @TransactionAttribute(TransactionAttributeType.REQUIRES_NEW) tells the container to create a new transaction before the method is called and therefore the bean’s EntityManager will join it automatically.

With the new type SynchronizationType.UNSYNCHRONIZED the code above can be rewritten as depicted in the following listing:

Now that the EntityManager won’t automatically join the current transaction, we can omit the @TransactionAttribute annotations. Any running transaction won’t have an impact on the EntityManager until we explicitly join it. This is now done in the method commit() and could even be done on the base on some dynamic logic.

In order to test the implementation above, we utilize a simple REST resource:

This resource provides methods to persist a person, list all persisted person and to commit the current changes. As we are going to use a stateful session bean, we annotate the resource with @SessionScoped and let the container inject the Controller bean.

By calling the following URL after the application has been deployed to some Java EE container, a new person gets added to the unsynchronized persistence context, but is not stored in the database.

http://localhost:8080/jpa2.1-unsychronized-pc/rest/persist

Even a call of the list() method won’t return the newly added person. Only by finally synchronizing the changes in the persistence context to the underlying resource with a call of commit(), the insert statement is send to the underlying database.

Conclusion: The new UNSYNCHRONIZED mode of the persistence context lets us implement conversations over more than one method invocation of a stateful session bean with the flexibility to join a JTA transaction dynamically based on our application logic without the need of any annotation magic.

The N+1 problem is a common issue when working with ORM solutions. It happens when you set the fetchType for some @OneToMany relation to lazy, in order to load the child entities only when the Set/List is accessed. Let’s assume we have a Customer entity with two relations: a set of orders and a set of addresses for each customer.

Hibernate 4.3.5 (as shipped with JBoss AS Wildfly 8.1.0CR2) will generate the following series of SQL statements out of it for only two(!) customers in the database:

Hibernate:
select
customeren0_.id as id1_1_,
customeren0_.name as name2_1_,
customeren0_.numberOfPurchases as numberOf3_1_
from
CustomerEntity customeren0_
Hibernate:
select
orders0_.CUSTOMER_ID as CUSTOMER4_1_0_,
orders0_.id as id1_2_0_,
orders0_.id as id1_2_1_,
orders0_.campaignId as campaign2_2_1_,
orders0_.CUSTOMER_ID as CUSTOMER4_2_1_,
orders0_.timestamp as timestam3_2_1_
from
OrderEntity orders0_
where
orders0_.CUSTOMER_ID=?
Hibernate:
select
orders0_.CUSTOMER_ID as CUSTOMER4_1_0_,
orders0_.id as id1_2_0_,
orders0_.id as id1_2_1_,
orders0_.campaignId as campaign2_2_1_,
orders0_.CUSTOMER_ID as CUSTOMER4_2_1_,
orders0_.timestamp as timestam3_2_1_
from
OrderEntity orders0_
where
orders0_.CUSTOMER_ID=?

As we can see, the first query selects all customers from the table CustomerEntity. The following two selects fetch then the orders for each customer we have loaded in the first query. When we have 100 customers instead of two, we will get 101 queries. One initial query to load all customers and then for each of the 100 customers an additional query for the orders. That is the reason why this problem is called N+1.

A common idiom to solve this problem is to force the ORM to generate an inner join query. In JPQL this can be done by using the JOIN FETCH clause like demonstrated in the following code snippet:

In situations where you know that you will have to load all orders for each customer, the JOIN FETCH clause minimizes the number of SQL statements from N+1 to 1. This comes of course with the drawback that you now transfer for all orders of one customer the customer data again and again (due to the additional customer columns in the query).

The JPA specification introduces with version 2.1 so called NamedEntityGraphs. This annotation lets you describe the graph a JPQL query should load in more detail than a JOIN FETCH clause can do and therewith is another solution to the N+1 problem. The following example demonstrates a NamedEntityGraph for our customer entity that is supposed to load only the name of the customer and its orders. The orders are described in the subgraph ordersGraph in more detail. Here we see that we only want to load the fields id and campaignId of the order.

Hibernate supports the @NamedEntityGraph annotation since version 4.3.0.CR1 and creates the following SQL statement for the JPQL query shown above:

Hibernate:
select
customeren0_.id as id1_1_0_,
orders1_.id as id1_2_1_,
customeren0_.name as name2_1_0_,
customeren0_.numberOfPurchases as numberOf3_1_0_,
orders1_.campaignId as campaign2_2_1_,
orders1_.CUSTOMER_ID as CUSTOMER4_2_1_,
orders1_.timestamp as timestam3_2_1_,
orders1_.CUSTOMER_ID as CUSTOMER4_1_0__,
orders1_.id as id1_2_0__
from
CustomerEntity customeren0_
left outer join
OrderEntity orders1_
on customeren0_.id=orders1_.CUSTOMER_ID

We see that Hibernate does not issue N+1 queries but that instead the @NamedEntityGraph annotation has forced Hibernate to load the orders per left outer join. This is of course a subtle difference to the FETCH JOIN clause, where Hibernate created an inner join. The left outer join would also load customers for which no order exists in contrast to the FETCH JOIN clause, where we would only load customers that have at least one order.

Interestingly is also that Hibernate loads more than the specified attributes for the tables CustomerEntity and OrderEntity. As this conflicts with the specification of @NamedEntityGraph (section 3.7.4) I have created an JIRA issue for that.

Conclusion: We have seen that with JPA 2.1 we have two solutions for the N+1 problem: We can either use the FETCH JOIN clause to eagerly fetch a @OneToMany relation, which results in an inner join, or we can use @NamedEntityGraph feature that lets us specify which @OneToMany relation to load via left outer join.

Using an ORM to abstract from your specific database and to let it create and issue all the SQL statements you would have to write by hand yourself seems handy. This is what made ORM solutions popular.

But it also comes with a downside: As the ORM does a lot of work for you, you lose to some degree control over the generated SQL and you have to rely on the ORM to create a high-performance statement for you. But it can happen that the SQL generated by the ORM is not what you might have written by hand and expected the ORM to do for you. In this case you have to get back control over the SQL and put your hands on the code again.

In huge applications this task is not as trivial, as there might be hundreds of statements issued to the database that stem from hundreds of lines of Java code that makes heavy usage of JPA features. Tracing the SQL statement that your database profiling tool has identified as problematic down to the actual code line becomes tedious.

We know that we can enable SQL statement logging for Hibernate with the following two lines in our persistence.xml:

But this will only output the already generated SQL; the actual Java code line is still not visible. For smaller applications it might be feasible to attach a debugger to the application server and debug through the code until you have found the line that logs the problematic SQL statement, but for bigger applications this is time consuming.

As Hibernate itself does not provide any means of intercepting the logging and enhance it with more information, we will have to do this on our own. The JBoss documentation indicates that it is possible to write your own custom logging handler. As this logging handler receives all the logging messages and therewith also the messages produces by Hibernate with enabled SQL logging, we can try to find the line we are looking for and then output a stack trace to our own log file.

Writing a custom logging handler turns out to be very simple. All you have to do is setup a small project with a class that extends the class Handler from the JDK package java.util.logging:

The publish() method receives all logging output in form of an instance of LogRecord. Its method getMessage() lets us access the output directly. Hence we can match this message against some keywords we have loaded from some configuration file:

Buffer is here some simple data structure (e.g. guava’s EvictingQueue) that buffers the last lines, as the method publish() is called for each line(!) of output. As a complete SQL statement spans more than one line, we have to remember a couple of them. Next to the buffered lines and the current line we also output a String representation of the current stack trace. This tells us later in the log file from where we are called and therewith which line of Java code in our project causes the current statement.

Once we have compiled the project we can copy the resulting jar file to the newly created folder structure under: $JBOSS_HOME/modules/system/layers/base/com/mydomain/mymodule/main (for JBoss AS 7.2). In order to tell JBoss AS about our new module, we have to create a XML file called module.xml with the following content:

Here we can see clearly which OneToMany relation causes the problematic select statement we were looking for.

Conclusion: Using a custom logging handler to inject the current stack trace into the logging of the SQL statements may help you when you want to find the
exact location in the source code where a concrete query is issued. It turned also out that writing your own custom logging handler for JBoss AS is also a straight forward task.

One year has passed by since the Java EE 7 specification has been published. Now that Wildfly 8 Final has been released, it is time to take a closer look at the new features.

One thing which was missing since the beginning of the Java EE days is the ability to work with fully-fledged Java EE threads. Java EE 6 has already brought us the @Asynchronous annotation with which we could execute single methods in the background, but a real thread pool was still out of reach. But all this is now history since Java EE 7 introduced the ManagedExecutorService:

@Resource
ManagedExecutorService managedExecutorService;

Like the well-known ExecutorService from the Standard Edition, the ManagedExecutorService can be used to submit tasks that are executed within a thread pool. One can choose if the tasks submitted should implement the Runnable or Callable interface.

In contrast to a normal SE ExecutorService instance, the ManagedExecutorService provides threads that can access for example UserTransactions from JNDI in order to execute JPA transactions during their execution. This feature is a huge difference to threads started like in a SE environment.

It is important to know, that the transactions started within the ManagedExecutorService’s thread pool run outside of the scope of the transaction of the thread which submits the tasks. This makes it possible to implement scenarios in which the submitting thread inserts some information about the started tasks into the database while the long-running tasks execute their work within an independent transaction.

Now, after we have learned something about the theory, let’s put our hands on some code. First we write a @Stateless EJB that gets the ManagedExecutorService injected:

Here we can inject the EntityManager to persist some entities into our database. The UserTransaction that we need for the commit has to be retrieved from the JNDI. An injection using the @Resource annotation is not possible within a normal managed bean.

To circumvent the UserTransaction we could of course call the method of another EJB and use the other EJB’s transaction to commit the changes to the database. The following code shows an alternative implementation using the injected EJB to persist the entity:

That’s it. With these few lines of code we have implemented a fully working Java EE application whose functionality can be called over a REST interface and that executes its core functionality asynchronously within worker threads with their own transactions.

Conclusion: The ManagedExecutorService is a great feature to integrate asynchronous functionality using all the standard Java EE features like JPA and transactions into enterprise applications. I would say the waiting was worthwhile.

You may come to the situation where you have to perform a bulk deletion on a huge amount of datasets stored in a relational database. If you use JPA with Hibernate as underlying OR mapper, you might try to call the remove() method of the EntityManager in a way like the following:

First of all, we load a reference representation of the entity we want to delete and then pass this reference to the EntityManager. Let’s assume the RootEntity from above has a child relation to a class called ChildEntity:

If we now turn on the property show_sql of hibernate, we will wonder what SQL statements are issued:

select
rootentity0_.id as id5_1_,
rootentity0_.field1 as field2_5_1_,
rootentity0_.field2 as field3_5_1_,
childentit1_.PARENT as PARENT5_3_,
childentit1_.id as id3_,
childentit1_.id as id4_0_,
childentit1_.field1 as field2_4_0_,
childentit1_.field2 as field3_4_0_,
childentit1_.PARENT as PARENT4_0_
from
ROOT_ENTITY rootentity0_
left outer join
CHILD_ENTITY childentit1_
on rootentity0_.id=childentit1_.PARENT
where
rootentity0_.id=?
delete
from
CHILD_ENTITY
where
id=?
delete
from
ROOT_ENTITY
where
id=?

Why does Hibernate first load all data into memory in order to delete this data immediately afterwards? The reason is that JPA’s lifecycle requires that the object is in “managed” state, before it can be deleted. Only in this state all lifecycle functionality like interceptors is available (see here). Therefore Hibernate issues a SELECT query before the deletion in order to transfer both RootEntity and ChildEntity to the “managed” state.
But what can we do, if we just want to delete RootEntity and ChildEntity, if we know the id of RootEntity? The answer is to use a simple DELETE query like the following one. But due to the integrity constraint on the child table, we first have to delete all depending child entities. The following code demonstrates how:

The above code results in the three SQL statements we would have expected by calling remove(). Now you may argue, that this way of deletion is more complicated than just calling the EntityManager’s remove() method. It also ignores annotations like @OneToMany and @ManyToOne we have placed in the two entity classes.
So why not write some code that uses the knowledge about the two entities that already exists in the two class files? First of all, we look for @OneToMany annotations using reflection in the RootEntity class, extract the type of the child entity and then look for its back relation field annotated with @ManyToOne. Having done this, we can easily write the three SQL statements in a more generic way:

The methods getFirstActualTypeArgument(), getManyToOneField(), getIdField() and getOneToManyFields() in the code above are not depicted here, but do what their name sounds like. Once implemented we can easily delete all entities beginning with the root of the tree.

A simple example application that can be used to examine the behavior and solution described above, can be found on github.