On page 179 of Pro EJB 3 (Java Persistence API) by Mike Keith and Merrick Schincariol (2006), there is exists a cautionary paragraph which reads : "Caution. The setFirstResults() and setMaxResults() methods should not be used with queries that join across collection relationships (one-to-many and many-to-many) because those queries may return duplicate values. The duplicate values in the result set make it impossible to use a logical result position".

I did leverage these methods of the Query interface with Hibernate 4.0 as my JPA provider and did appear to in fact use joining across collection relationships, but the results I was seeing were exactly what I had hoped for, which were different than what I would have expected due to the cautionary paragraph cited above. So, I reached out to Mike Keith to attempt to explain what I was seeing, and am including that email exchange here for anyone wishing to have more detail on this topic.

Here is my note to Mike :
-------------------------
I was recently very motivated to try and make use of the pagination support in the javax.persistence.Query class via setFirstResult() and setMaxResults(). So, I wire up an implementation using Hibernate as the JPA 2.0 provider and my results after my change match my expectations. *Then* I see your caution on page 179 in Pro EJB 3 stating that this technique "should not be used with queries that join across collection relationships ...". So, I am wondering if your statement could be more precise, or if possibly my query is working due to what has been implemented in Hibernate 4.0.

Here are the relevant points of the Domain Model. There are User, Resource, and UserPractice classes. There is a OneToMany from User to Resource. There is a OneToMany from User to UserPractice. The JQPL query passed to EntityManager.createQuery() is :

The JPQL query also contains an "order by" clause that leverages non-relationship fields in the User class, but I don't expect that is a contributing factor.

Given the statement on page 179, I would not expect the Query to work due to the impossibility of using a logical result position. Hence my question. Can your statement there on page 179 be refined to explain this behavior, or do you rather expect instead this is exposing a defect in the Hibernate JPA 2.0 implementation?

Here is Mike's response to my note :
------------------------------------
The warning was really there because of the comment in the previous paragraph (at the bottom of the previous page). Different providers can implement it in different ways, but the most common way would be just to pass the calls directly to JDBC, and this could result in the problems I listed in the cautionary note. For example, if A has an m-m to B, and A has a1 and a2, and a1 contains b1, and a2 contains b1, b2, and b3 then when you do the join you will get four rows in your result:

{ (a1, b1) (a2, b1) (a2, b2) (a2, b3) }

If you set the page size to 2, you would likely get the first two rows in the first page (a1, and a2) and the second page will contain a2 and a2 again. Basically you just need to be careful about duplicates when you join. Some providers can remove them (so you only get one a2 in the second page) but they generally would not be able to notice duplicates across pages.

If you have determined how your provider works across joins and you don't get duplicates or are happy with whatever you get then you're fine.
------------------------------------

For those of you that do not have a copy of the book, here is that previous paragraph cited by Mike:
"The Query interface provides support for pagination via the setFirstResults() and setMaxResults() methods. These methods specify the first result to be received (numbered from zero) and the maximum number of results to return relative to that point. A persistence provider may choose to implement support for this feature in a number of different ways, as not all databases benefit from the same approach. It's a good idea to become familiar with how your vendor approaches pagination and what level of support exists in the target database platform for your application."

There are two issues with pagination and join fetching, the first is that because you get back more rows than root objects the number of results may not match the page size.
i.e.
If you select 10 employees each with 10 phone numbers you get back 100 rows. If you page size was 10, you would only get back 1 employee, not 10.

This may not be a major issue as long as your application expects it.

The second is that if the page ends before all of the joined rows, then you will not get the correct result in your collection.
i.e.
If your page size for the above was 25, then the first two employees would be ok, but the last would only have 5 phone numbers instead of 10.
This works in EclipseLink because EclipseLink ensure the query is ordered by the id, and does not use join fetching on the last (and first) objects in the page, as they may not be complete. This means you get 3 queries instead of 1, but get the correct data back.

EclipseLink also supports batch fetching, which is much better to use for 1-m as their is no duplicate data. You can use IN batch fetching with pagination without any issues.