Pages

Friday, December 17, 2010

Java performance optimization is part analysis, and part superstition and witch craft.

In deciding how to optimize their applications a lot of developers search the web for what people say is faster, or go by what they have heard from their co-workers sister's boyfriend's cousin, or by what seems to be the "public opinion". Sometimes the public opinion is correct, and their application benefits, and sometimes it is not, and they waste their time and effort and make little performance improvement, or make things worse.

True performance optimization involves measuring the current performance. Profiling the application and determining the bottlenecks. Investigating and implementing optimizations to avoid the bottlenecks.

But how should one code on a day to day basis for optimal performance? Are synchronized methods slow? Are final methods fast? What is faster Vector, ArrayList or LinkedList? What is the optimal way to iterate a List? Is refection slow? I will attempt to answer these questions in this post.

In the EclipseLink project we are generally very concerned about performance, as we are a persistence library used by other applications, so like the JDK we are part of the plumbing and need to be as optimized as possible.

We have a number of performance test suites in EclipseLink, mainly to measure our persistence performance, but we have one test suite that has nothing to do with persistence. Our Java performance test suite just measures the performance of different operations in Java. These tests help us determine how to best optimize our code.

The tests function by running a certain operation for a set amount of time and measuring the number of operations in the time period. Next the next operation is run and measured the same way. This is then repeated 5 times, and the max/min values are rejected, the average of the middle 3 is computed, the standard deviation is computed, and the averages of the two operations are compared. Tests are single threaded.

The tests were run on Oracle Sun JDK 1.6.0_07 on 32 bit Windows.

Maps

There are several different Map implementations in Java. This test compares the performance for various sizes of Maps. The test instantiates the Map, does n (size) puts, then n gets and n removes.

Map Operation Performance Comparison

Map

Size

Average (operations/10 seconds)

%STD

%DIF (with Hashtable)

Hashtable

10

3605235

0.03%

0%

HashMap

10

2908854

0.02

-23%

LinkedHashMap

10

2594492

0.03%

-38%

IdentityHashMap

10

1346278

0.01%

-167%

ConcurrentHashMap

10

1009259

0.0%

-257%

HashSet

10

2927254

0.02%

-23%

Hashtable

100

357229

0.01%

0%

HashMap

100

274587

0.03%

-30%

LinkedHashMap

100

269840

0.03%

-32%

IdentityHashMap

100

110801

0.02%

-222%

ConcurrentHashMap

100

119068

0.01%

-200%

HashSet

100

281960

0.04%

-26%

Hashtable

1000

34034

6.2%

0%

HashMap

1000

27818

0.06%

-22%

LinkedHashMap

1000

25514

0.03%

-33%

IdentityHashMap

1000

11650

0.04%

-192%

ConcurrentHashMap

1000

12420

2.9%

-174%

HashSet

1000

26888

0.04%

-26%

These results are quite interesting, I never would have expected such a big difference in performance between classes doing basically the same well defined thing, that has been around for quite some time. It is surprising that Hashtable performs better than HashMap, when HashMap is suppose to be the replacement for Hashtable, and does not suffer from its limitation of using synchronized methods, which are suppose to be slow. Note: After exploring other JVMs in my next post, it seems that this anomaly only exists in JDK 1.6.7, in the latest JVM, and most other JVM, HashMap seems to be faster than Hashtable.

I would expect LinkedHashMap and ConcurrentHashMap to be somewhat slower, as they have additional overhead and perform better in specific use cases, but it is odd that IdentityHashMap is so much slower, given my test object did not define a hashCode(), so was using its identity, so the map was doing the identical thing as HashMap and Hashtable. Also interesting that HashSet has the same performance as HashMap, given it in theory could be a simpler data structure (in reality it is a subclass of HashMap, so performs the same).

Note that these are single threaded results. Although Hashtable outperformed HashMap in this test, in a multi-threaded (and multi-CPU) environment, Hashtable would perform much worse because of the method synchronization. Assuming read-only access of coarse, as concurrent access to a HashMap would blow up. ConcurrentHashMap does perform the best in a concurrent read-write environment.

Lists

For Maps, Hashtable turned out to have the best performance. Lets now investigate Lists, will the old Vector implementation have better performance than ArrayList? The list test instantiates a list, then does n (size) adds followed by n gets by index.

List Operation Performance Comparison

List

Size

Average (operations/10 seconds)

%STD

%DIF (with Vector)

Vector

10

11625453

0.006%

0%

ArrayList

10

18448453

0.08%

+56%

LinkedList

10

12362290

0.03%

+6.0%

Vector

100

1241420

0.2%

0%

ArrayList

100

1893581

0.1%

+52%

LinkedList

100

1012740

0.01%

-22%

Vector

1000

132182

0.2%

0%

ArrayList

1000

223969

0.1%

+69%

LinkedList

1000

12689

0.02%

-941%

So, it seems that ArrayList is much faster than Vector. Given both Vector and Hashtable are synchronized, I assume it is not the synchronized methods, just the implementation. LinkedList surprisingly performs as good as Vector for small sizes, but then performs much worse on larger sizes because of the indexed get. This is expected as LinkedList performs good in specific use cases, such as removing from the head or middle, but this was not tested.

Iteration

There are many way to iterate a List in Java. For a Vector a Enumeration can be used, or an Iterator in any List. A simple for loop with an index can also be used for any List. Java 5 also defines a simplified for syntax for iterating any Collection. This test compares the performance of iteration, the List has already been pre-populated.

Iteration Performance Comparison

List

Average (operations/10 seconds)

%STD

%DIF (with Vector)

for index (Vector)

6079272

0.06%

0%

for index (ArrayList)

6048324

0.03%

-0.5%

Enumeration (Vector)

4736049

0.02%

-28%

Iterator (Vector)

2840094

0.05%

-114%

Iterator (ArrayList)

1935122

0.008%

-214%

for (Vector)

2841567

0.05%

-113%

for (ArrayList)

1933576

0.04%

-214%

So using a for loop with an index is faster than an Enumerator or Iterator. This is expected, as it avoids the cost of the iteration object instance. It is interesting that a Enumerator is faster than an Iterator, and a Vector Iterator is faster than an ArrayList Iterator. Unfortunately there is no magic in the Java 5 for syntax, it just calls iterator().

However, the Java 5 syntax is simpler, so I must admit I would use it in any non performance critical code, and maybe some day the JVM will do some magic and for will have better performance.

Method Execution

A method in Java can be defined in several different ways. It can be synchronized, final, called reflectively, or not called at all (in-lined). This next test tries to determine the performance overhead to a method call. The test executes a simple method that just increments an index. The test executes the method 100 times to avoid the test call overhead. For the reflective usage the Method object and arguments array are cached.

Method Execution Performance Comparison

Method

Average (100 operations/10 seconds)

%STD

%DIF (with normal)

Normal

25533130

0.7%

0%

synchronized

13383707

4.3%

-93%

Block synchronized

8244087

4.7%

-203%

final

26873873

1.2%

+6.1%

In-lined

26816109

1.5%

+5.2%

volatile

1503727

0.1%

-1539%

Reflection

159069

3.2%

-15646%

Interesting results. Synchronized methods are slower, and using a synchronized block inside a method seems to be slower than just making the method synchronized. Final methods seem to be slightly faster, but very minor, and seem to have the same improvement as in-lining the method, so I assume that is what the JVM is doing.

Calling a method through reflection is significantly slower. Reflection has improved much since 1.5 added byte-code generation, but if you profile a reflective call you will see several isAssignable() checks before the method is invoked to ensure the object and arguments are of the correct type, it is odd that these cannot be handled through casts in the generated code, or typing errors just trapped.

CORRECTION
Originally volatile was showing an improvement in performance because of a bug in the volatile test. After correcting the bug the test now shows a huge overhead in performance. The usage of volatile on the int field that is being incremented in the test method is causing a 15x performance overhead. This is surprising, and much larger than the synchronized overhead, but still smaller than the reflection overhead. This is especially odd as the field is an int which one would think is atomic anyway. I imaging the affect the volatile has on the JVM memory usage is somehow causing the overhead. In spite of the overhead, I would still use volatile when required, in concurrent pieces of code where the same variable is concurrently modified. I have done multi-threaded tests comparing its concurrency with synchronizing the method, and using volatile is better than using synchronization for concurrency.

Note that this difference in cost of execution is on a method that does basically nothing. Generally, performance bottlenecks in applications are in places that do something, not nothing, and it is the something that is the bottleneck, not how the method is called. Even in the reflective call, the cost was less than 1 millionth of a second, so on a method that toke 1 millisecond the overhead would be irrelevant. Also, I assume these results are very JVM specific, other JVMs, older JVMs, and the JVMs of the future may behave much differently.

Reflection

There are two types of reflection, field reflection and method reflection. In JPA this normally corresponds to using FIELD access of PROPERTY access (annotating the fields or the get methods). Which is faster? This test sets a single variable, either directly, by calling a set method, or through field or set method reflection.

Reflection Performance Comparison

Type

Average (operations/10 seconds)

%STD

%DIF (with in-lined)

In-lined

44105730

0.7%

0%

Set method

46189638

1.2%

+4.7%

Field Reflection

23488987

1.4%

-87%

Method Reflection

10862684

0.8%

-306%

So its appears that in JDK 1.6 field reflection is faster than set method reflection. Part of this difference is that for method reflection an Object array must be created to call the set method, where as field reflection does not require this. Method reflection improved a lot in JDK 1.5, but it still seems slower than field reflection, at least for set methods. Both have an overhead over a compiled method execution or direct variable access.

If you look at the results the difference between reflection and normal execution is much less than in the previous test. This is because in the previous test the method was executed 100 times inside the test, so the overhead of calling the test method was reduced, where as this test only called it once, so had the test method call overhead in it. This highlights an important fact, that how you call the method that sets the field has a big impact on the performance. If you use a complex layering of interfaces and generated code to call a set method directly, versus calling it reflectively, the overhead of the layering may be worse than the reflective call.

In EclipseLink field reflection is used by default, but it depends if you annotate your fields or get methods. If you annotate your get methods, then method reflection is used. If you use weaving (enabled through either, the EclipseLink agent, static weaving, JEE, or Spring) and use FIELD access, then EclipseLink will weave in a generic get and set method into your classes to avoid reflection. This amounts to a minor optimization in the context of database persistence, and can be disabled using the persistence unit property "eclipselink.weaving.internal"="false".

Summary

So this post has presented a lot of data on how different things perform in Oracle Sun JDK 1.6 on Windows. It would be interesting to see how the same tests perform in different JVMs in different environments. Perhaps I will investigate that next.

The source code to the above tests is in the EclipseLink SVN repository,here

Thursday, November 18, 2010

In my last post I investigated different ways to optimize the loading of objects and their relationships. I presented some performance results, but from a very small database with small query result sets. In this post I will investigate how those same query optimization strategies scale to larger data sets.

The previous post only had a database of 12 rows. So for this run I will increase the database size by 5,000 to 60,000 rows. Still not a huge database, but should be big enough to highlight any performance differences in the results. I will also increase the size of the query result set from 6 employees, to 5,000 employees.

The first thing that I noticed in this run was that Oracle has a limit of 1,000 parameters per statement. Since the IN batch fetching binds a large array, and I'm reading 5,000 objects, this limit was exceeded and the run blew up with a database error. The BatchFetchPolicy in EclipseLink accounts for this and defines a size for the max number of ids to include the an IN. The default size limit in EclipseLink was suppose to be 500, but I think I remember increasing it to 100,000 to test something when I was developing the feature, and, well..., I guess never set it back, oops, I will fix this...

EclipseLink defines a JPA Query hint "eclipselink.batch.size" that allows the size to be set. So I will set this to 500 for the test. This means that to read in all of the 5,000 objects, the IN batch fetch will need to execute 10 queries per batch fetched relationship. It will be interesting to see how it compares to the other query optimization techniques.

The run time was also increased to 10 minutes from 1 minute because reading 5,000 objects obviously takes longer than reading 6. Also, since the last run our lab got a new database machine. The old database was running on Linux on an old server machine, and the new database is running on a new Oracle Sun server machine running Linux on a virtualized environment. The client machine is the same, my old desktop running Oracle Sun JDK 1.6 on Windows XP. Both databases were Oracle 11g.

Big Data Results, run 1, simple (fetch address, phoneNumbers)

Query

Average (queries/10 minutes)

%STD

%DIF (of standard)

standard

27

4.5%

0%

join fetch

307

0.1%

+1037%

batch fetch (JOIN)

310

0.2%

+1048%

batch fetch (EXISTS)

309

0.1%

+1044%

batch fetch (IN)

261

0.1%

+866%

The results show that join fetching and batch fetching have basically equivalent performance, and about 10x better performance than the non-optimize query. IN batch fetching does not perform as well as the others with this larger result set. It performs better than I expected, given it has huge IN statement and has to execute 10 queries per relationship. Note that these results differ from the previous post that showed IN batch fetching performing the best for queries with small result sets.

The second run uses the complex query which fetches 9 different relationships.

The results show batch fetching having about 2x the performance of join fetching, and 20x the performance of the non-optimize query. Join fetching still performs 10x faster than the non-optimize case, which is different than the small result set run which gave it worse performance than the non-optimized query. IN batch fetching again did not perform as well as JOIN and EXISTS batch fetching, but still out performed join fetching and was 15x faster than the non-optimized query.

Note that these results are not universal. Expect that every database, every machine, every environment, every query, and every object model will give different results. The basics should be the same though, batch fetch should have better performance than non-optimized queries, and better performance than join fetching for objects with complex relationships. IN batch fetching will perform worse for large result sets, but have similar performance for small result sets. Join fetching will perform well for objects with a small number of relationships.

To see how the results differ in different environments, I did a few more runs on different databases. The next run is for a local Oracle 10g database installed on my desktop. This database is slower than the new server, but will not require a network trip since it is on the same machine as the Java client.

These results show similar results for the previous simple run, but about a 30x improvement over the non-optimize query, which is a bigger difference. The JOIN batch fetch did not seem to perform as well as the EXISTS or join fetch.

The complex run only completed a single run in the 10 minutes for the non-optimized query. The JOIN and EXISTS batch fetching performed the best, over 100x faster than the non-optimized query. Join fetching and IN batch fetching did not perform as well, but were both still over 30x faster than the non-optimized query.

Monday, August 9, 2010

Probably the biggest impedance mismatch between object-oriented object models and relationaldatabase data models, is the way that data is accessed.

In a relational model, generally a single big database query is constructed to join all of the desired data for a particular use case or service request. The more complex the data, the more complex the SQL. Optimization is done by avoiding unnecessary joins and avoiding fetching duplicate data.

In an object model an object or set of objects are obtained and the desired data is collected by traversing the object's relationships.

With object relation mapping (ORM/JPA) this typically leads to multiple queries being executed and typically the dreaded "N queries" problem, or executing a separate query per object in the original result set.

Big database query

You would then need to group and format this data to get all the phone numbers for each employee to display.

ID

Name

Address

Phone

6578

Bob Jones

17 Mountainview Dr., Ottawa

519-456-1111, 613-798-2222

7890

Jill Betty

606 Hurdman Ave., Ottawa

613-711-4566, 613-223-5678

...

The corresponding object model defines Employee, Address, Phone classes.
In JPA, Employee has a OneToOne to Address, and a OneToMany to Phone, Phone has a ManyToOne to Employee.

To display employee data including address and phone for all part-time employees you would have the following JPQL,

Simple JPQL

Select e from Employee e where e.status = 'Part-time'

To display this data you would write the Employee data, get and write the Address from the Employee, and get and write each of the Phones. The displayed result is of coarse the same as using SQL, but the SQL generated is quite different.

ID

Name

Address

Phone

6578

Bob Jones

17 Mountainview Dr., Ottawa

519-456-1111, 613-798-2222

7890

Jill Betty

606 Hurdman Ave., Ottawa

613-711-4566, 613-223-5678

...

N+1 queries problem

SELECT E.* FROM EMPLOYEE E WHERE E.STATUS = 'Part-time'
... followed by N selects to ADDRESSSELECT A.* FROM ADDRESS A WHERE A.ADDRESS_ID = 123SELECT A.* FROM ADDRESS A WHERE A.ADDRESS_ID = 456
...
... followed by N selects to PHONESELECT P.* FROM PHONE P WHERE P.OWNER_ID = 789SELECT P.* FROM PHONE P WHERE P.OWNER_ID = 135
...

This will of coarse have very pathetic performance (unless all of the objects were already in the cache). There are a few ways to optimize this in JPA. The most common method is to use join fetching. A join fetch is where an object, and its related objects are fetched in a single query. This is quite easy to define in JPQL, and is similar to defining a join.

SQL for JPQL with join fetch

The code to display the Employee results is the same, the objects are just loaded more efficiently.

JPA only defines join fetches using JPQL, with EclipseLink you can also use the @JoinFetch annotation to have a relationship always be join fetched. Some JPA providers will always join fetch any EAGER relationship, this may seem like a good idea, but is generally a very bad idea. EAGER defines if the relationship should be loaded, not how it should be accessed from the database. A user may want every relationship loaded, but join fetching every relationship, in particular every ToMany relationships will lead to a huge join (outer joins at that), fetching a huge amount of duplicate data. Also for ManyToOne relationships such as parent, owner, manager where there is a shared reference (that is probably already in the cache), join fetching this duplicate parent for every child will perform much worse than a separate select (or cache hit).

JPQL does not allow the aliasing of the join fetch, so if you wish to also query on the relationship, you have to join it twice. This is optimized out to a single join for ToOne relationships, and for ToMany relationships you really need the second join to avoid filtering the object's related objects. Some JPA providers do support using an alias for a join fetch, but the JPA spec does not allow it, and EclipseLink does not support this as of yet (but there is a bug logged).

Nesting join fetches are not directly supported by JPQL either (there is also a bug for this). EclipseLink supports nested join fetches through the Query hints "eclipselink.join-fetch" and "eclipselink.left-join-fetch".

Nested join fetch query hint

query.setHint("eclipselink.join-fetch", "e.projects.milestones");

Join fetching is fine, and the best solution in many use cases, but it is a very relational database centric approach. Another, more creative and object-oriented solution is to use batch fetching. Batch fetching is much harder for the traditional relational mindset to comprehend, but once understood is quite powerful.

JPA only defines join fetches, not batch fetches. To enable batch fetching in EclipseLink the Query hint "eclipselink.batch" is used, in our example this would be,

Batch fetch query hint

In a batch fetch the original query is executed normally, the difference is how the related objects are fetched. Once the employees are retrieved and their first address is accessed, ALL of the addresses for ONLY the SELECTED employees are fetched. There are several different forms of batch fetching, for a JOIN batch fetch the SQL will be,

The first observation is that 3 SQL statements occurred instead of 1 with a join fetch. This may lead one to think that this is less efficient, but it actually is more efficient in most cases. The difference between 1 and 3 selects is pretty minimal, the main issue with the unoptimized case was that N selects were executed, which could be 100s or even 1000s.

The main benefit to batch fetching is that only the desired data is selected. In the join fetch case, the EMPLOYEE and ADDRESS data were duplicated in the result for every PHONE. If each employee had 5 phones numbers, 5 times as much data would be selected. This is true for any ToMany relationship and becomes exasperated if you join fetch multiple or nested ToMany relationships. For example if an employee's projects were join fetched, and the project's milestones, for say 5 projects per employee and 10 milestones per project, you get the employee data duplicated 50 times (and project data duplicated 10 times). For a complex object model, this can be a major issue.

Join fetching typically needs to use an outer join to handle the case where an employee does not have an address or phone. Outer joins are general much less efficient in the database, and add a row of nulls to the result. With batch fetching if an employee does not have an address or phone, it is simply not in the batch result, so less data is selected. Batch fetching also allows for a distinct to be used for ManyToOne relationships. For example if the employee's manager was batch fetched, then the distinct would ensure that only the unique managers were selected, avoiding the selecting of any duplicate data.

The draw backs to JOIN batch fetching is that the original query is executed multiple times, so if it is an expensive query, join fetching could be more efficient. Also if only a single result is selected, then batch fetching does not provide any benefit, where as join fetching can still reduce the number of SQL statements executed.

There are a few other forms of batch fetching. EclipseLink 2.1 supports three different batch fetching types, JOIN, EXISTS, IN (defined in the BatchFetchType enum). The batch fetch type is set using the Query hint "eclipselink.batch.type". Batch fetching can also be always enabled for a relationship using the @BatchFetch annotation.

Batch fetch query hints and annotations

query.setHint("eclipselink.batch.type", "EXISTS");

@BatchFetch(type=BatchFetchType.EXISTS)

The EXISTS option is similar to the JOIN option, but the batch fetch uses an exists and a sub-select instead of a join. The advantage of this is that no distinct is required, which can be an issue with lobs or complex queries.

The IN option is quite different than the JOIN and EXISTS options. For the IN option the original select is not included, instead an IN clause is used to filter the related objects just for the original object's id. The advantage of the IN option is that the original query does not need to be re-executed, which can be more efficient if the original query is complex. The IN option also supports pagination and usage of cursors, where as the other options do not work effectively as they must select all of the related objects, not just the page. In EclipseLink IN also makes use of the cache, so that if the related object can be retrieved from the cache, it is, and is not included in the IN, so only the minimal required data is selected.

The issues with the IN option is that the set of ids in the IN can be very big, and inefficient for the database to process. If the set of ids is too big, it much be split up into multiple queries. Also composite primary keys can be an issue. EclipseLink supports nested INs for composite ids on databases such as Oracle that support it, but some databases do not support this. IN also requires the IN part of SQL to be dynamically generated, and if the IN batch sizes are not always the same, can lead to dynamic SQL on the database.

SQL for batch fetch (IN)

Batch fetching can also be nested, by using the dot notation in the hint.

Nested batch fetch query hint

query.setHint("eclipselink.batch", "e.projects.milestones");

Something that batch fetching allows that join fetching does not is the optimal loading of a tree. If you set the @BatchFetch annotation on a children relationship in a tree structure, then a single SQL statement will be used for each level.

So, what does all of this mean? Well, every environment and use case is different, so there is no perfect solution to use in all cases. Different types of query optimization will work better in different situations. The following results are provided as an example of the potential performance improvements from following these approaches.

The results were obtained running in a single thread in JSE accessing an Oracle database over a local network on low end hardware. Each test was run for 60 seconds, and number of operations recorded. Each test run was run 5 times. The high/low results were rejected and the middle 3 results average, the % standard deviation between the runs in included. The numbers themselves are not important, only the % difference between the results.

The source code for this example and comparison run can be found here.

The example performs a simple query for Employees using an unoptimized query, join fetching and each type of batch fetching. After the query each employee's address and phone numbers are accessed. The JPQL NamedQuery queries a small result set of 6 employees by salary from a tiny database of 12 employees.

Simple run (fetch address, phoneNumbers)

Query

Average (queries/minute)

%STD

%DIF (of standard)

standard

5897

0.5%

0

join fetch

14024

1.1%

+137%

batch fetch (JOIN)

11190

4.5%

+89%

batch fetch (EXISTS)

13764

0.4%

+133%

batch fetch (IN)

14341

0.6%

+143%

From the first run's results it seems that join fetching and batch fetching were similar, and about 1.9 to 2.4 times faster than the non optimized query. IN batch fetching seemed to perform the best, and JOIN batch fetching the worst for this small data set.

The first run was kind of simple though. It only fetched two relationships. What happens when more relationships are fetched? This next run fetches all 9 of the employee relationships, including OneToOnes, OneToManys, ManyToOnes, ManyToManys, and ElementCollections.

The second run results show that as more relationships are fetched, join fetching starts to have major issues. Join fetching actually had worse performance that the non optimized query (-22%). This is because the join becomes very big, and a lot of data must be processed. Batch fetching on the other hand did even better than the simple run (2.3 to 2.7x faster). IN batch fetching still performed the best, and JOIN batch fetching the worse.

The amount of data is very small however. What happens when the size of the database and the query are scaled up? I will investigate that next.

We provide consulting, development services, and products in the field of intelligence automation.
Intelligence automation projects are complex and unpredictable, if your organization is in development,
or investigating an automation project, we can provide advice, development services and sub-contracting.