This case study describes the analysis, design, implementation and deployment of a distributed Java EE 6 application that makes use of the EJB 3.1, JPA 2.0, JSF 2.0, Servlet 3.0 and JAX-RS APIs implemented as part of the Oracle GlassFish Server 3.1 distribution. The Java IDE used for this tutorial is the tightly integrated Sun NetBeans 7.0 release, which will be in beta until April 2011, but Eclipse Helios 3.6 can be used as well.

This tutorial also demonstrates the use of alternate client and presentation frameworks, including any number of concurrent distributed Java SE clients that get, process and submit work units to the central server via a remote stateless session bean.

Why distributed? We need to investigate the concurrent behavior and exception handling of a server-based JPA application under near-real-world load from multiple clients. We need an architecture that induces contention for shared memory (either static variables or shared database records). Specifically, I am interested in how we implement 2-phase commit, handle OptimisticLockExceptions and design for a mix of transaction attributes involving REQUIRED (the default), REQUIRES_NEW and NOT_SUPPORTED, which happen to be the only attributes supported by EJB 3.1 @Asynchronous beans. We may also integrate different isolation levels.

We will be concentrating on how to leverage the features of JPA 2.0 that are implemented by EclipseLink 2.x. These features should include...

Document History

Date | Author | Version Description & Notes
20110209 | Michael O'Brien | 1.0 Initial draft starting

Source

Work in progress: this implementation has not been completed yet; however, it is fully functional using manual deployment of the remote clients.

See the ongoing enhancement request tracking bug 337037 with diffs and some exported EAR and JAR archives.

See the latest version of the source for the EAR, EJB jar, WAR, entities jar and SE client jar at the EclipseLink Examples SVN repository.

Technology Summary

Technology Statement: Develop an n:1 distributed application with many clients connected to one central persistence server.

Normally we do not decide on what APIs will be in use before we analyse the requirements. However, here is the list of technologies we are using - as we finalize the implementation.

JPA 2.0 : All database interaction will be on the main server via a container managed @PersistenceContext on EJB session beans. The clients will modify detached entities and return them to the server for merging/persistence.

JTA : We will continue to use container-managed transactions via the dependency-injected proxy so we do not have to manage transaction events ourselves.

EJB 3.1 : We will be using @Stateless or @Stateful (depending on our level of conversational state) @Remote beans but may be using @Singleton and/or container-managed JTA persistence units in the WAR for @Local beans

We will likely require the use of @Asynchronous methods or beans to enable greater parallelism.

Part of our strategy of handling OptimisticLockExceptions may involve @Singleton beans.

JSF 2.0 : We will use the existing @ManagedBean and new .XHTML controller/view separation pattern

JAX-RS 1.1 : The ability to perform get/put/delete/update operations on URL resources will be required.

JMS : An external message consumer will be used to do asynchronous operations such as collating data

JNI : (possible optimization via C++ using IA32/64, SSE or even CUDA), where the Java client is just a wrapper around the computation engine.

We will not be using an L2 cache such as Coherence, ehCache or Terracotta at this point - we will be communicating using standard EJB beans such as session or message-driven beans.

Here is a screen capture of the current state of our UI development for this Java EE application. On the left we have a brute force liveAJAX client connected to a standard Servlet; on the right we have a Java EE 6 JSF 2.0 .xhtml client. Both are backed by a @ManagedBean injected with an @EJB session bean that is injected with a @PersistenceContext.

Problem

Instead of the usual Employee demo or even the simple entity/JSP format of previous JPA tutorials, we will attempt to provide a useful distributed Java application that could be deployed to a live server hosted outside our firewall.

We require a real-world distributed app that can be used as a case study for the following issues.

- performance (we need a way to hammer a JPA based server and change the client load at runtime)

- management (test framework to try out central management of the server and clients)

- distributed memory (for clients that are not running persistence on their own - where they would benefit from an L2 cache like Coherence, ehCache or Terracotta - we need to prototype scenarios where the database or EclipseLink L1 in-memory cache can act as a distributed shared memory for the remote clients).

Specifically, how do we propagate changes from some clients to others using both a

1) Star network of ManyToOne for clientsToServer

2) Mesh network of ManyToMany for clientsToClients

We will develop an n:1 distributed application with many clients connected to one central persistence server.

20110226: After working on this for a couple of weeks I realized that I was re-inventing MapReduce (originally developed by Google), where a work unit is mapped to a distributed network of processors (possibly recursively) and then reduced back into a single solution by merging the results of the mapped sub-problems. However, this distributed system is more complicated and specializes in continuous packet distribution and collation.

Our selected real-world problem is a type of Blue-Sky algorithm. In reality, however, it can be regarded as a toy problem, or as another easily parallelizable problem like the Mandelbrot set. For example, in the performance section below we show that distributing the calculations as evenly as possible over all the cores in an individual node generates significant, almost O(n) performance gains. In the graph below of several performance runs on an Intel Core i7-920, we decreased an 800 second zoom calculation to 67 seconds by using up to 512 threads for a problem size of 1024 lines.

How can we help prove the Collatz conjecture (that every integer path leads to 1)?

The Collatz problem presents us with several attributes that are very helpful in exploring concurrency issues.

1) SIMD: Each calculation of an individual collatz sequence is independent of any other - it can be done in parallel - however the threads are not synchronized and are therefore a type of MIMD processing.

2) Asynchronous threads: different calculation times for different data sets requires a thread scheduler.

3) Shared Memory: Optimization requires data sharing between threads

Collatz Numbers

Actually, Collatz cannot currently be solved - it is a "research problem" (Richard Bellman and Donald E. Knuth).

We pursue it in the interest of advancing science, specifically the science of very large (and I mean very large, as in near-googol-class) numbers and their sequences.

The Collatz conjecture, or (3n + 1) problem, has not been proven yet. There have been attempts at verifying Collatz up to 2^61; however, massive amounts of scalar processing power are required because the problem is non-linear and therefore must be simulated by brute force, even with optimizations.

The algorithm is as follows for the set of positive integers to infinity.

odd numbers are transformed by 3n + 1

even numbers are divided by 2

all numbers are conjectured to eventually reach the cycle 4-2-1
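As a point of reference, the rules above can be sketched directly on java.math.BigInteger. The class names CollatzResult and CollatzSimulator are hypothetical. This sketch counts every step until 1 is reached, so for 27 it reports 111 steps; the path lengths quoted elsewhere in this document may use a slightly different counting convention (for example stopping at the 4-2-1 loop).

```java
import java.math.BigInteger;

/** Path length and maximum value of one Collatz (hailstone) sequence. */
final class CollatzResult {
    final long steps;          // iterations needed to reach 1
    final BigInteger maximum;  // largest value seen along the way

    CollatzResult(long steps, BigInteger maximum) {
        this.steps = steps;
        this.maximum = maximum;
    }
}

/** Plain 3n + 1 simulation on arbitrary-precision integers. */
final class CollatzSimulator {
    private static final BigInteger THREE = BigInteger.valueOf(3);

    static CollatzResult simulate(BigInteger start) {
        if (start.signum() <= 0) {
            throw new IllegalArgumentException("start must be a positive integer");
        }
        BigInteger n = start;
        BigInteger max = start;
        long steps = 0;
        while (!n.equals(BigInteger.ONE)) {
            // odd: 3n + 1, even: n / 2
            n = n.testBit(0) ? n.multiply(THREE).add(BigInteger.ONE) : n.shiftRight(1);
            max = max.max(n);
            steps++;
        }
        return new CollatzResult(steps, max);
    }
}
```

For example, CollatzSimulator.simulate(BigInteger.valueOf(27)) reports a maximum of 9232.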

The Collatz Conjecture states that all sequences end in 1; we just cannot prove this yet without brute-force simulation. Verifying it by simulation is the goal of this search and of this distributed application.

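If you think in base 2, we see that for odd numbers we shift the bits left by one, set bit 0 and add the original number ((n << 1) + 1 + n = 3n + 1). For even numbers we shift the bits right by one. We therefore have the following simplified algorithm (the parentheses matter, because in Java the shift operators bind more loosely than +).

odd: next binary = (number << 1) + number + 1
even: next binary = number >> 1
or the following combined odd + even rule, where we do both steps at once
odd: next binary = (number >> 1) + number + 1
- I found this result somewhat odd and surprising, as it differs only in the direction of the shift. It works because for odd n, n >> 1 = (n - 1) / 2, so (n - 1) / 2 + n + 1 = (3n + 1) / 2: exactly the odd step followed by the (always even) halving step.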
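The shift formulation can be checked mechanically. This sketch (hypothetical class CollatzShift) implements both odd rules with BigInteger shifts, plus a primitive long version of the plain odd rule to show where the parentheses are required.

```java
import java.math.BigInteger;

final class CollatzShift {
    /** Odd rule 3n + 1 written as (n << 1) + n + 1. */
    static BigInteger oddStep(BigInteger n) {
        return n.shiftLeft(1).add(n).add(BigInteger.ONE);
    }

    /** Even rule n / 2 written as n >> 1. */
    static BigInteger evenStep(BigInteger n) {
        return n.shiftRight(1);
    }

    /** Combined rule (3n + 1) / 2 written as (n >> 1) + n + 1; only valid for odd n. */
    static BigInteger oddThenHalve(BigInteger n) {
        return n.shiftRight(1).add(n).add(BigInteger.ONE);
    }

    /** Primitive version: without parentheses, n << 1 + n + 1 would parse as n << (1 + n + 1). */
    static long oddStepLong(long n) {
        return (n << 1) + n + 1;
    }
}
```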

Here is an example of the sequence for number 27.

This number reaches a maximum of 9232 during a path of 110 before it reaches the terminating sequence 4-2-1.

Here is the graph of the sequence for 670,617,279 with a path of 986 and a maximum of 966,616,035,460

Observation 1: the bit length of the maximum value remains at or around twice the bit length of the start number - at least so far in my own simulations up to 640 billion.

We stop iterating, and record the path length and maximum value, when the sequence enters the 4-2-1 loop after the first 1 is reached. This sequence must be simulated for all positive integers up to the limit of the software being used. Fortunately, in Java (and .NET) we can use BigInteger, which supports arbitrary-length integers; we would quickly overflow a 64-bit long as soon as we started iterating start numbers over 32 bits, since (per Observation 1) their maximums approach 64 bits.
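One way to avoid paying BigInteger costs on every step, sketched here as an assumption rather than the application's actual code, is to iterate with primitive long and fall back to BigInteger only when Math.multiplyExact or Math.addExact report an overflow. HybridCollatz is a hypothetical name, and the *Exact methods require Java 8 or later.

```java
import java.math.BigInteger;

/** Hypothetical hybrid: fast long arithmetic until it would overflow, then BigInteger. */
final class HybridCollatz {
    private static final BigInteger THREE = BigInteger.valueOf(3);

    /** One step in long arithmetic; Math.*Exact throws ArithmeticException on overflow. */
    static long nextExact(long n) {
        return (n & 1L) == 0 ? n >>> 1 : Math.addExact(Math.multiplyExact(3L, n), 1L);
    }

    /** Maximum value reached from start, switching to BigInteger if a long would overflow. */
    static BigInteger maxValue(long start) {
        if (start < 1) {
            throw new IllegalArgumentException("start must be positive");
        }
        long n = start;
        long max = start;
        while (n != 1) {
            try {
                n = nextExact(n);
            } catch (ArithmeticException overflow) {
                // n still holds the last value that fit; continue exactly from there
                return maxValueBig(BigInteger.valueOf(n), BigInteger.valueOf(max));
            }
            if (n > max) {
                max = n;
            }
        }
        return BigInteger.valueOf(max);
    }

    /** Continues the sequence from n with BigInteger, carrying the maximum seen so far. */
    static BigInteger maxValueBig(BigInteger n, BigInteger max) {
        while (!n.equals(BigInteger.ONE)) {
            n = n.testBit(0) ? n.multiply(THREE).add(BigInteger.ONE) : n.shiftRight(1);
            max = max.max(n);
        }
        return max;
    }
}
```

The design choice is that almost all steps stay in a single 64-bit register; only the rare sequences whose maximum crosses 2^63 pay for arbitrary precision.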

Requirements

R0: Unbounded Scalar Precision

Essentially unbounded integer precision is required, but if we (my research division, the R in R&D, at Oracle anyway) are going to persist something we need a fixed column size. I think we are safe with 256 or 512 bit precision for now.

Java (and more recently .NET and Android) provide a BigInteger type for this. We could consider java.util.BitSet for binary operations, but although a BitSet grows as needed it provides no arithmetic (addition, multiplication), so it won't help us; we need at least 256-bit arithmetic. We will also need a conversion strategy for persisting unlimited numbers into limited-length NUMERIC database fields.

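The long datatype in Java, the corresponding __int64 datatype in C/C++ (Visual Studio 10) and the BIGINT datatype in SQL all overflow at 64 bits. An unsigned 64-bit value can address 16 exbibytes (about 18 exabytes) or represent integers up to 2^64 - 1 = 18,446,744,073,709,551,615 (about 1.8 x 10^19, or 18 quintillion); the signed types (Java long, __int64, SQL BIGINT) stop at 2^63 - 1 = 9,223,372,036,854,775,807.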

R1: Increased Superscalar Performance

We need to get better performance from a group of separate JVMs running in parallel on the same or different machines than we would get from a single instance of the client.

The impact of distribution and processing of the client data packets should incur very little overhead on the central processing server.

However, contention for shared memory (the current maximums) will be fierce and will require a strategy for handling OptimisticLockExceptions when attempting to update the same database record (where the version field will differ as a result of out-of-order execution).

R1: Local Client Access

A browser-based JSF console will be developed.

R2: Remote RMI Client Access Inside Firewall

EJB 3.x remote session beans will be available

R3: Remote WebService Client Access Outside Firewall

We will implement this by generating a WSDL from the JPA model and exposing a web service facade around the EJB 3.x session beans

R4: Browser based Interface to Client Data on Server

We will implement this using JSF 2.0 to start.

R5: Separation of components and concerns

The data model in the form of a JPA persistence context will be in a separate JAR project allowing us to share the model among the EJB beans, the WAR web project and the distributed clients of the business layer that includes the SE clients, the web services clients and any JMS client listeners.

R6: Full abstraction of the database

We will use JPA 2.0 to manage the persistence of the model.

R7: JEE6 API usage

Where available we will leverage any JEE6 features that help our implementation.

R: Remote Update

We need some sort of utility that will remotely update all the client code (including client classes and EJB session bean interfaces).

We will likely use Java Web Start to initially download the SE client and to keep it current.

R: Thread Modulation

A way to reduce the processor load of individual clients would be very useful, allowing the overall distributed system to be throttled down (likely with wait states). We would use a process very similar to PWM (pulse width modulation), used for example in the brightness control of LED systems by varying the on-time of a square-wave signal. In our case we could increase the thread wait/suspend time from 0 (the default) to something like 60 seconds.
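A minimal sketch of the PWM idea, assuming a hypothetical DutyCycleThrottle class: after each work unit the client sleeps long enough that computation occupies only the configured fraction (the duty cycle) of wall-clock time, with the off time capped at 60 seconds.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical PWM-style throttle for a client's work loop. */
final class DutyCycleThrottle {
    static final long MAX_OFF_MILLIS = TimeUnit.SECONDS.toMillis(60);

    private final double dutyCycle; // fraction of time spent computing, in (0, 1]

    DutyCycleThrottle(double dutyCycle) {
        if (dutyCycle <= 0.0 || dutyCycle > 1.0) {
            throw new IllegalArgumentException("dutyCycle must be in (0, 1]");
        }
        this.dutyCycle = dutyCycle;
    }

    /** Off time so that on / (on + off) == dutyCycle, capped at 60 s. */
    long offTimeMillis(long onTimeMillis) {
        double off = onTimeMillis * (1.0 - dutyCycle) / dutyCycle;
        return Math.min(MAX_OFF_MILLIS, Math.round(off));
    }

    /** Runs one unit of work, then waits for the computed off time. */
    void run(Runnable unitOfWork) throws InterruptedException {
        long begin = System.nanoTime();
        unitOfWork.run();
        long onMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
        Thread.sleep(offTimeMillis(onMillis));
    }
}
```

A fixed wait of 0 to 60 seconds is the special case where the off time ignores the work time; the duty-cycle form keeps the throttle proportional when work units vary in length.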

Analysis

Like all architectural projects - we will proceed in 3 phases.

Develop the API

Optimize Performance

Optimize Volumetrics

This Collatz application is an example of an Embarrassingly Parallel Problem.

The solution or simulation of this problem is easily described by a SIMD (Single Instruction Multiple Data) architecture - where each thread can run independently using the same algorithm on its own data set. There is however a part of this problem that requires synchronization between the threads - the determination of the global maximums.
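The SIMD-plus-one-synchronized-step structure can be sketched with a plain ExecutorService: each thread searches its own partition without touching shared state, and the global maximum is determined in a single reduction step afterwards. ParallelSearch and Best are hypothetical names, and long arithmetic is only safe here because the demo range is small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelSearch {
    /** Longest path found in a range: the start number and its step count. */
    static final class Best {
        final long start;
        final int steps;

        Best(long start, int steps) {
            this.start = start;
            this.steps = steps;
        }
    }

    /** Steps to reach 1; long is sufficient for the small demo ranges used here. */
    static int steps(long n) {
        int s = 0;
        while (n != 1) {
            n = (n & 1) == 0 ? n >>> 1 : 3 * n + 1;
            s++;
        }
        return s;
    }

    /** Independent work unit: touches no shared state, so it needs no synchronization. */
    static Best searchRange(long from, long toExclusive) {
        Best best = new Best(from, steps(from));
        for (long n = from + 1; n < toExclusive; n++) {
            int s = steps(n);
            if (s > best.steps) {
                best = new Best(n, s);
            }
        }
        return best;
    }

    /** Splits [1, limit) into one partition per thread, then reduces the partial maxima. */
    static Best search(long limit, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Best>> partials = new ArrayList<>();
            long chunk = (limit - 1 + threads - 1) / threads; // ceiling division
            for (long from = 1; from < limit; from += chunk) {
                final long lo = from;
                final long hi = Math.min(limit, from + chunk);
                partials.add(pool.submit(() -> searchRange(lo, hi)));
            }
            Best global = null;
            for (Future<Best> f : partials) {
                Best b;
                try {
                    b = f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
                // the only "synchronized" part: merging into the global maximum in one thread
                if (global == null || b.steps > global.steps) {
                    global = b;
                }
            }
            return global;
        } finally {
            pool.shutdown();
        }
    }
}
```

Using a strict > in both the per-range loop and the reduction keeps the smallest start on ties, so the parallel result matches a sequential scan.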

Data Model

The following UML class diagram details the data model for the business objects. We will be using JPA entities and @MappedSuperclass artifacts.

Initially I used aggregation and a unidirectional @OneToOne from a Maximum or Path entity to a CollatzRecord entity to differentiate value maximums from path maximums (e.g., for start 27, the value maximum is 9232 and the path maximum is 110 iterations).

After some simulation it became apparent that the schema needs an inheritance model where PathRecord and MaximumRecord should subclass from CollatzRecord instead.

Shared Memory

AI1: Unidirectional or Bidirectional communication between clients and server

At this point we will be implementing a protocol similar to stateless HTTP where each client requests from or posts resources to the server. The server does not initiate communication - it only responds to clients.

Alternatively we will likely add JMS listener registration where the server will post messages to clients and the clients that subscribe to the JMS queue may choose to asynchronously respond to the message.

AI2: Synchronous or Asynchronous access to session beans from clients

We have the choice of getting a reference to a remote session bean and holding that reference for the duration of the client work packet until we return results to the server. Or, we can perform separate calls to separate references to get and put the work unit. It will depend on the length of time to process the unit, the bean lifetime and how many beans are in the server pool.

AI3: JEE6 Technology State for major EE Servers

As we will be deploying at least one implementation to one of the major EE servers - we need some selection criteria.

Using all the physical cores on a multicore processor significantly increases performance. For example, I get around a 350% speedup if I use all four physical cores of an Intel Core i7 processor. Using the other 4 hyperthreaded (logical) cores as well, however, starts to slow down all the cores significantly.

In order to use the cores of a system we can either run multiple instances of our client code or spawn multiple threads from a single application, as long as we use a 1:1 ratio of threads to physical cores.

We need to answer the question - should I use the hyperthreaded cores as well?
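Runtime.getRuntime().availableProcessors() reports logical processors, so on a hyperthreaded Core i7 it counts the hyperthreads too and cannot answer this question by itself. A small timing harness is one way to answer it empirically; the sketch below (hypothetical ThreadScaling class, with a Collatz step count as a stand-in workload) times the same range with 1, 2, 4, ... threads up to twice the reported processor count.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

final class ThreadScaling {
    /** Accumulates results so the JIT cannot discard the work as dead code. */
    static final LongAdder SINK = new LongAdder();

    /** CPU-bound stand-in for a work unit: total Collatz steps for starts in [from, to). */
    static long work(long from, long to) {
        long total = 0;
        for (long start = from; start < to; start++) {
            long n = start;
            while (n != 1) {
                n = (n & 1) == 0 ? n >>> 1 : 3 * n + 1;
                total++;
            }
        }
        return total;
    }

    /** Wall-clock milliseconds to process [1, limit) split evenly over 'threads' workers. */
    static long timeWithThreads(long limit, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long chunk = (limit - 1 + threads - 1) / threads;
        long begin = System.nanoTime();
        for (long from = 1; from < limit; from += chunk) {
            final long lo = from;
            final long hi = Math.min(limit, from + chunk);
            pool.execute(() -> SINK.add(work(lo, hi)));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin);
    }

    /** Times 1, 2, 4, ... threads up to twice the logical processor count the JVM reports. */
    static Map<Integer, Long> profile(long limit) {
        int logical = Runtime.getRuntime().availableProcessors();
        Map<Integer, Long> timings = new LinkedHashMap<>();
        for (int t = 1; t <= 2 * logical; t *= 2) {
            timings.put(t, timeWithThreads(limit, t));
        }
        return timings;
    }
}
```

Timings vary with JIT warm-up and background load, so in practice each configuration should be run several times with a much larger limit before drawing conclusions about the hyperthreaded cores.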

WebLogic 10.3.5.0

Oracle WebLogic 10.3.4.0 was released on 15 Jan 2011, with WebLogic 10.3.5.0 released on 15 May 2011; the following JEE6 APIs are implemented on top of its JEE5 certification.

JBoss 6

JBoss JEE6 Functionality

AI4: Network Topology

Use Cases

UC1: Request unit of work

UC2: Post completed unit of work

This use case is where most of our concurrency issues will arise. If the work unit period is small enough that results start arriving at the server more often than once per second, we will see a lot of OptimisticLockExceptions when accessing shared memory (or records), because another thread may have modified the value in the short time between our read and our update. We see this in production when the work unit period is less than 15 bits.

The solution will likely be one or more of: EJB 3.1 @Asynchronous methods or beans, synchronized blocks, @Stateful beans or some sort of retry mechanism.
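The retry option can be sketched without a container. This is an in-memory analogue only: VersionedMaximum mimics a row with a version column, StaleVersionException is a hypothetical stand-in for javax.persistence.OptimisticLockException, and MaximumUpdater re-reads and retries when another writer committed first. In the real application the version check would come from a JPA @Version field, and each retry would presumably run in a new transaction.

```java
import java.math.BigInteger;

/** Hypothetical stand-in for javax.persistence.OptimisticLockException. */
final class StaleVersionException extends RuntimeException {
    StaleVersionException(String message) {
        super(message);
    }
}

/** Immutable read of the record: value plus the version it was read at. */
final class MaxSnapshot {
    final BigInteger value;
    final long version;

    MaxSnapshot(BigInteger value, long version) {
        this.value = value;
        this.version = version;
    }
}

/** In-memory analogue of a row with a version column holding a global maximum. */
final class VersionedMaximum {
    private MaxSnapshot current = new MaxSnapshot(BigInteger.ZERO, 0);

    synchronized MaxSnapshot read() {
        return current;
    }

    /** Like UPDATE ... SET value = ?, version = version + 1 WHERE version = ? */
    synchronized void update(MaxSnapshot seen, BigInteger newValue) {
        if (current.version != seen.version) {
            throw new StaleVersionException("read version " + seen.version + ", found " + current.version);
        }
        current = new MaxSnapshot(newValue, seen.version + 1);
    }
}

final class MaximumUpdater {
    /** Offers a candidate maximum, re-reading and retrying on conflict; returns attempts used. */
    static int offer(VersionedMaximum record, BigInteger candidate, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            MaxSnapshot seen = record.read();
            if (candidate.compareTo(seen.value) <= 0) {
                return attempt; // an equal or larger value is already stored
            }
            try {
                record.update(seen, candidate);
                return attempt;
            } catch (StaleVersionException conflict) {
                // another client committed first; loop around and re-read the fresh value
            }
        }
        throw new IllegalStateException("no commit after " + maxAttempts + " attempts");
    }
}
```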

Variant Use Cases

UC101: Communication Errors

UC101.1: RMI Host not available

UC101.2: RMI Bean not available

UC101.3: RMI Host busy

UC102: Handle discarded unit of work

Algorithm Optimization

Brute-force simulation alone does not work when trying to prove Collatz. With (perhaps overly) optimistic enthusiasm, we need to apply what we know about the behavior of hailstone numbers.

Even numbers don't reach milestones - especially powers of 2, which reach 1 in the fastest time possible: log2(n) steps (kind of the opposite of milestones).

Path sequences repeat - we can look up parts of the current path/orbit based on already completed sequences in the solution tree.

O1: Optimization by Truncation

Assumption: This optimization depends on whether we actually need to compute the paths and maximums for a range of values below a higher range that has just found a new maximum value and maximum path, which renders our current lower search largely irrelevant - except where the sub-path is required for other types of optimization.

I have determined - via a week of simulation distributed among 16 different machines in parallel - that we will need native computation.

Let's put things in perspective:

With brute-force Java on around 8 parallel JVMs I can search around 1 million (~2^20) number sequences per second. At this rate, in order to search past the current record at 64 bits I would need 2^64 / 2^20 = 2^44 seconds. Since there are about 31.5 million seconds in a year, or roughly 2^25, I would still need 2^(44-25) = 2^19 years. This works out to just over half a million years.

Obviously I need to increase the efficiency of my search and/or incorporate x86/SSE/GPU native scalar C/C++ optimized DLLs and link to them via JNI. I require an increase of at least 6 orders of magnitude - likely 3 orders of magnitude will need to be in minimizing the search path by keeping track of past paths keyed by start number in a HashMap.
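The HashMap idea can be sketched as a memoized step counter (hypothetical class MemoCollatz). Instead of keying only by start number, this version caches every value it passes through, so any later path that joins an already-explored path stops simulating at that point. The trade-off is memory: the cache grows with every distinct value visited, so a real search would need an eviction policy or a bounded range.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Caches steps-to-1 for every value seen, so repeated path tails are looked up, not re-simulated. */
final class MemoCollatz {
    private final Map<Long, Integer> stepsToOne = new HashMap<>();

    MemoCollatz() {
        stepsToOne.put(1L, 0);
    }

    int steps(long start) {
        if (start < 1) {
            throw new IllegalArgumentException("start must be positive");
        }
        Deque<Long> unknown = new ArrayDeque<>();
        long n = start;
        Integer known;
        // walk forward until we reach a value whose path length is already cached
        while ((known = stepsToOne.get(n)) == null) {
            unknown.push(n);
            n = (n & 1) == 0 ? n >>> 1 : Math.addExact(Math.multiplyExact(3L, n), 1L);
        }
        // unwind: each earlier value is one more step away from 1
        int s = known;
        while (!unknown.isEmpty()) {
            stepsToOne.put(unknown.pop(), ++s);
        }
        return s;
    }

    int cachedValues() {
        return stepsToOne.size();
    }
}
```

After steps(27) has run, a later call such as steps(9232) is answered from the cache without any simulation, because 9232 lies on the path of 27.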

Contrary to traditional computer science doctrine - every software developer benefits from being architecture-aware. Knowledge of the underlying hardware, operating system and implementation language is required. For example...

1) If you are running directly on a multi-core (quad-core) machine as opposed to a virtual machine (cloud) image - you will be able to take advantage of the parallelism available in the former (direct-OS) but not the latter (cloud) without replicating the cloud instances.

2) When we develop in Java, knowing that we are ultimately running native machine code (JIT-compiled by a JVM that is itself written in C/C++) will aid us in optimizing for word boundaries. An example is the speedup of any use of long (64-bit words) on 64-bit native operating systems like Windows 7.

Design

DI 1: Distributed Communication Strategy

How are we going to link the distributed clients? Are we linking them one-to-many (to a server) or many-to-many, where all clients communicate with each other (which would necessitate multiple EE servers)?

1) Multiple SE clients linked to multiple SE clients - non EE

2) Multiple EE clients linked to multiple SE clients - overhead

3) Multiple SE clients linked to multiple EE clients - possible

4) Multiple EE clients linked to multiple EE clients - complex

5) Multiple SE clients linked to single SE server - non EE

6) Multiple SE clients linked to single EE server - in use

This model is the most promising and offers the least overhead. The SE clients will get and post work packets. If any user or admin needs to check the data they can do so via a browser based interface to the server.

7) Multiple EE clients linked to single SE server - invalid

8) Multiple EE clients linked to single EE server - possible

DI 2: Module Separation

All code should be separated by functionality

Model layer: JPA persistence Entities/MappedSuperclasses/Embeddables should be in a separate model jar (with no persistence.xml)

The model layer ideally has 2 jars (one with entity interfaces), the other with the actual entities and their possible mapped superclasses and embeddables.

Business layer: The business objects (session beans) should be in a separate EJB jar, and only their interfaces need to be exported to clients (not the SSB implementation class). Why? Because clients only interact with the instrumented $Proxy of the session bean, not the bean itself (which is a field of the server proxy).

The business layer ideally has 2 jars (one with the business interface classes) and one with the business implementation classes.

Presentation layer: The JSF managed and backing beans should be separate from both the model (entities) and the implementation (session beans) - these are delegates of the controller servlet (FacesServlet) - which implements the FrontController design pattern.

Stateful or Stateless Session Beans

Whether we use @Stateful or @Stateless session beans will depend on whether we have a conversational message exchange between our clients and server. If our operation to get or post results to or from the server is atomic, then a @Stateless session bean is sufficient. However, if our business process is conversational and spans multiple message calls, or even multiple calls to multiple resources (using the XA 2-phase commit pattern), then a @Stateful session bean is required.

DI 3: Remote RMI/EJB Communication type

Remote Session Beans on WebLogic 10.3.4.0

Remote Session Beans on GlassFish 3.1

DI 4: Type of client/server setup

1) multiple SE clients communicate to a single EE server

2) multiple EE clients communicate to a single EE server

Decision DI4:

We will be using 1) and only run a single EE server with multiple SE clients

The core of this application is the use of the java.math.BigInteger class, which allows us to use arbitrary-length integers during scalar computation. BigInteger is not implemented on top of the native 64-bit long datatype, which would overflow; in OpenJDK it stores its magnitude as an int[] array of 32-bit digits.

An issue occurs when a BigInteger is persisted to a database. Depending on the database (in this case Derby XA) and the JPA persistence provider (in this case EclipseLink, although we tested Hibernate as well), the BigInteger is mapped to a fixed-size numeric field that cannot hold arbitrarily large values.

The issue is that any BigInteger larger than 63 bits cannot currently be stored in a NUMERIC field without an overflow. The maximum signed 64-bit value is 2^63 - 1 = 9,223,372,036,854,775,807, about 9.2 x 10^18 (9 quintillion).

We regularly encounter numbers larger than this in the following scenarios, so we need a persistence strategy for users who wish to use them with JPA.

- factorials of 21 and above, i.e. the number of ways to order more than 20 objects (20! = 2,432,902,008,176,640,000 is the largest factorial that fits in a 64-bit long)

In the above scenarios, scalar truncation must not be done by using FLOAT or DOUBLE types, as their mantissas are also limited (23 explicit bits for single precision and 52 for double precision, roughly 7 and 16 significant decimal digits).

/** Maximum BigInteger that can be stored in SQL field NUMERIC = 0x7fffffffffffffffL or
* 2^63 - 1 = 9,223,372,036,854,775,807 (about 9.2 x 10^18) or 9 Quintillion.
* Numbers greater than this are encountered in scientific, cryptographic and nanosecond time sensitive calculations.
*/
private static final long MAX_BIGINTEGER_IN_SQL = Long.MAX_VALUE;

This issue is independent of whether the JVM is 32 or 64 bit; it is related to the size of a long in Java, which is always 64 bits.

Results DI5:

If we use JPA out of the box to persist a BigInteger that is larger than 64 bits, such as the maximum value for Collatz path record #88 (start 1,980,976,057,694,848,447 at 61 bits, maximum 64,024,667,322,193,133,530,165,877,294,264,738,020 at 125 bits, found by Tomás Oliveira e Silva and verified by Eric Roosendaal, and the first maximum whose bit length is more than twice that of its start), the update fails. Here is the result with Hibernate on GlassFish and Derby:

INFO: New max value: 1414236446719942480
INFO: Hibernate: update Parameters set bestIterationsPerSecond=?, globalDuration=?, globalStartTimestamp=?, maxPath=?, maxValue=?, nextNumberToSearch=?, partitionLength=?, version=? where id=? and version=?
WARNING: SQL Error: -1, SQLState: 22003
SEVERE: The resulting value is outside the range for the data type DECIMAL/NUMERIC(19,2).
SEVERE: Could not synchronize database state with session
org.hibernate.exception.DataException: could not update: [org.dataparallel.collatz.business.Parameters#32768]
at org.hibernate.exception.SQLStateConverter.convert(SQLStateConverter.java:77)
at org.hibernate.exception.JDBCExceptionHelper.convert(JDBCExceptionHelper.java:43)
at org.hibernate.persister.entity.AbstractEntityPersister.update(AbstractEntityPersister.java:2425)
at org.hibernate.persister.entity.AbstractEntityPersister.updateOrInsert(AbstractEntityPersister.java:2307)
at org.hibernate.persister.entity.AbstractEntityPersister.update(AbstractEntityPersister.java:2607)
at org.hibernate.action.EntityUpdateAction.execute(EntityUpdateAction.java:92)
at org.hibernate.engine.ActionQueue.execute(ActionQueue.java:250)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:234)
at org.hibernate.engine.ActionQueue.executeActions(ActionQueue.java:142)
at org.hibernate.event.def.AbstractFlushingEventListener.performExecutions(AbstractFlushingEventListener.java:298)
at org.hibernate.event.def.DefaultFlushEventListener.onFlush(DefaultFlushEventListener.java:27)
at org.hibernate.impl.SessionImpl.flush(SessionImpl.java:1000)
at org.hibernate.impl.SessionImpl.managedFlush(SessionImpl.java:338)
at org.hibernate.ejb.AbstractEntityManagerImpl$1.beforeCompletion(AbstractEntityManagerImpl.java:523)
at com.sun.enterprise.transaction.JavaEETransactionImpl.commit(JavaEETransactionImpl.java:412)
at com.sun.enterprise.transaction.JavaEETransactionManagerSimplified.commit(JavaEETransactionManagerSimplified.java:837)
at com.sun.ejb.containers.BaseContainer.completeNewTx(BaseContainer.java:5040)
at com.sun.ejb.containers.BaseContainer.postInvokeTx(BaseContainer.java:4805)
at com.sun.ejb.containers.BaseContainer.postInvoke(BaseContainer.java:2004)
at com.sun.ejb.containers.BaseContainer.postInvoke(BaseContainer.java:1955)
at com.sun.ejb.containers.EJBObjectInvocationHandler.invoke(EJBObjectInvocationHandler.java:208)
at com.sun.ejb.containers.EJBObjectInvocationHandlerDelegate.invoke(EJBObjectInvocationHandlerDelegate.java:75)
at $Proxy199.postUnitOfWork(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.sun.corba.ee.impl.presentation.rmi.ReflectiveTie.dispatchToMethod(ReflectiveTie.java:146)
at com.sun.corba.ee.impl.presentation.rmi.ReflectiveTie._invoke(ReflectiveTie.java:176)
at com.sun.corba.ee.impl.protocol.CorbaServerRequestDispatcherImpl.dispatchToServant(CorbaServerRequestDispatcherImpl.java:682)
at com.sun.corba.ee.impl.protocol.CorbaServerRequestDispatcherImpl.dispatch(CorbaServerRequestDispatcherImpl.java:216)
at com.sun.corba.ee.impl.protocol.CorbaMessageMediatorImpl.handleRequestRequest(CorbaMessageMediatorImpl.java:1841)
at com.sun.corba.ee.impl.protocol.CorbaMessageMediatorImpl.handleRequest(CorbaMessageMediatorImpl.java:1695)
at com.sun.corba.ee.impl.protocol.CorbaMessageMediatorImpl.handleInput(CorbaMessageMediatorImpl.java:1078)
at com.sun.corba.ee.impl.protocol.giopmsgheaders.RequestMessage_1_2.callback(RequestMessage_1_2.java:221)
at com.sun.corba.ee.impl.protocol.CorbaMessageMediatorImpl.handleRequest(CorbaMessageMediatorImpl.java:797)
at com.sun.corba.ee.impl.protocol.CorbaMessageMediatorImpl.dispatch(CorbaMessageMediatorImpl.java:561)
at com.sun.corba.ee.impl.protocol.CorbaMessageMediatorImpl.doWork(CorbaMessageMediatorImpl.java:2558)
at com.sun.corba.ee.impl.orbutil.threadpool.ThreadPoolImpl$WorkerThread.performWork(ThreadPoolImpl.java:492)
at com.sun.corba.ee.impl.orbutil.threadpool.ThreadPoolImpl$WorkerThread.run(ThreadPoolImpl.java:528)
Caused by: java.sql.SQLDataException: The resulting value is outside the range for the data type DECIMAL/NUMERIC(19,2).
at org.apache.derby.client.am.SQLExceptionFactory40.getSQLException(Unknown Source)
at org.apache.derby.client.am.SqlException.getSQLException(Unknown Source)
at org.apache.derby.client.am.PreparedStatement.executeUpdate(Unknown Source)
at org.hibernate.persister.entity.AbstractEntityPersister.update(AbstractEntityPersister.java:2407)
... 37 more
Caused by: org.apache.derby.client.am.SqlException: The resulting value is outside the range for the data type DECIMAL/NUMERIC(19,2).

Analysis DI5:

I suspect that it would be simplest to just convert the BigInteger to a string (VARCHAR2) and convert back when reading from the database. There may be a more efficient algorithm that involves partitioning or variable length scalar fields as well.
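A minimal sketch of the string approach (hypothetical BigIntegerStringCodec, not an EclipseLink API): plain toString() and new BigInteger(String) round-trip any value, and left-padding non-negative values to a fixed width keeps lexicographic VARCHAR ordering consistent with numeric ordering, which matters if the database is asked for ORDER BY or MAX on the column.

```java
import java.math.BigInteger;

/** Hypothetical helper for storing BigInteger values in a VARCHAR column. */
final class BigIntegerStringCodec {
    /** Plain base-10 text; null stays null for nullable columns. */
    static String toColumn(BigInteger value) {
        return value == null ? null : value.toString();
    }

    static BigInteger fromColumn(String column) {
        return column == null ? null : new BigInteger(column); // leading zeros are accepted
    }

    /** Zero-padded to a fixed width so string order matches numeric order (non-negative only). */
    static String toSortableColumn(BigInteger value, int width) {
        if (value.signum() < 0) {
            throw new IllegalArgumentException("only non-negative values are supported");
        }
        String digits = value.toString();
        if (digits.length() > width) {
            throw new IllegalArgumentException("value needs " + digits.length() + " digits, column holds " + width);
        }
        StringBuilder padded = new StringBuilder(width);
        for (int i = digits.length(); i < width; i++) {
            padded.append('0');
        }
        return padded.append(digits).toString();
    }
}
```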

In reality, DDL generation differs between EclipseLink and Hibernate, and DDL generation in general should not be used in production. It would be better to fine-tune the table creation myself.

DDL generation should, however, pick up the @Column precision and scale attributes for numeric columns (the length attribute applies only to string-valued columns).

TypeConverter

We will be using @TypeConverter, a native EclipseLink ORM annotation provided beyond the JPA specification.

Use of a TypeConverter (not an ObjectTypeConverter) may be one option. We would map the BigInteger type to a String, which could be stored in a column larger than the current 128 bit (16 byte) length of NUMERIC. This would eventually hit a maximum column size of 1-2K; if we represented each bit as a '0' or '1' character we could still represent at least 1024 bits with this strategy.
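To size such a column, it helps to compare encodings. This sketch (hypothetical ColumnWidth class) computes the base-10 width needed for any unsigned n-bit value and shows the one-character-per-bit encoding. A base-10 string needs about log10(2), or roughly 0.301, characters per bit, so 1024 bits fit in 309 decimal characters versus 1024 binary characters.

```java
import java.math.BigInteger;

final class ColumnWidth {
    /** Characters needed to store any unsigned 'bits'-bit value as a base-10 string. */
    static int decimalChars(int bits) {
        return BigInteger.ONE.shiftLeft(bits).subtract(BigInteger.ONE).toString().length();
    }

    /** One '0'/'1' character per bit. */
    static String toBitString(BigInteger value) {
        return value.toString(2);
    }

    static BigInteger fromBitString(String bits) {
        return new BigInteger(bits, 2);
    }
}
```

For example, ColumnWidth.decimalChars(512) gives 155, so a VARCHAR(155) holds any 512-bit value.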

DI 6: EAR Redeploy should not affect remote clients

If the server application is temporarily down due to a redeploy - it should not affect clients.

The fix is to catch the NoSuchObjectException and perform a series of timed re-posts to the session bean.
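The timed re-post can be sketched as a small retry wrapper. RemoteCall and RedeployRetry are hypothetical names; the call passed in is assumed to perform a fresh lookup of the session bean so that a retry gets a new stub rather than the stale one, and a delay of 0 retries immediately. The sketch only retries java.rmi.NoSuchObjectException; the WebLogic WorkRejectedException seen during the same redeploy arrives wrapped in a javax.ejb.EJBException and would need its own handling.

```java
import java.rmi.NoSuchObjectException;
import java.rmi.RemoteException;

/** Hypothetical wrapper for one remote call, including a fresh lookup of the session bean. */
interface RemoteCall<T> {
    T invoke() throws RemoteException;
}

final class RedeployRetry {
    /** Retries a call that fails because the server-side object is gone (e.g. during a redeploy). */
    static <T> T callWithRetry(RemoteCall<T> call, int maxAttempts, long delayMillis)
            throws RemoteException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        NoSuchObjectException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.invoke();
            } catch (NoSuchObjectException stale) {
                last = stale;
                if (attempt < maxAttempts && delayMillis > 0) {
                    Thread.sleep(delayMillis); // timed re-post; 0 means retry immediately
                }
            }
        }
        throw last; // all attempts failed: surface the last NoSuchObjectException
    }
}
```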

Analysis DI6:

Error on the remote client is as follows when the server app is hot-redeployed (without clustering) at the same time as the client is pushing a data post to the server.

java.rmi.NoSuchObjectException: The object identified by:'312' could not be found. Either it was has not been exported or it has been collected by the distributed garbage collector.
at weblogic.rjvm.ResponseImpl.unmarshalReturn(ResponseImpl.java:234)
at weblogic.rmi.cluster.ClusterableRemoteRef.invoke(ClusterableRemoteRef.java:348)
at weblogic.rmi.cluster.ClusterableRemoteRef.invoke(ClusterableRemoteRef.java:259)
at org.eclipse.persistence.example.distributed.collatz.business.CollatzFacade_of6sps_CollatzFacadeRemoteImpl_1034_WLStub.postUnitOfWork(Unknown Source)
at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at weblogic.ejb.container.internal.RemoteBusinessIntfProxy.invoke(RemoteBusinessIntfProxy.java:85)
at $Proxy0.postUnitOfWork(Unknown Source)
at org.eclipse.persistence.example.distributed.collatz.presentation.SEClient.processUnitOfWork(SEClient.java:216)

Solved by a finite number of repeated lookup operations - without a wait

_collatz: results sent to server after 390 ms
javax.ejb.EJBException: [WorkManager:002917]Enqueued Request belonging to WorkManager default, application org.eclipse.persistence.example.distributed.CollatzEAR is cancelled as the WorkManager is shutdown; nested exception is: weblogic.work.WorkRejectedException: [WorkManager:002917]Enqueued Request belonging to WorkManager default, application org.eclipse.persistence.example.distributed.CollatzEAR is cancelled as the WorkManager is shutdown
weblogic.work.WorkRejectedException: [WorkManager:002917]Enqueued Request belonging to WorkManager default, application org.eclipse.persistence.example.distributed.CollatzEAR is cancelled as the WorkManager is shutdown
at weblogic.rjvm.ResponseImpl.unmarshalReturn(ResponseImpl.java:234)
at weblogic.rmi.cluster.ClusterableRemoteRef.invoke(ClusterableRemoteRef.java:348)
at weblogic.rmi.cluster.ClusterableRemoteRef.invoke(ClusterableRemoteRef.java:259)
at org.eclipse.persistence.example.distributed.collatz.business.CollatzFacade_of6sps_CollatzFacadeRemoteImpl_1034_WLStub.requestUnitOfWork(Unknown Source)
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at weblogic.ejb.container.internal.RemoteBusinessIntfProxy.invoke(RemoteBusinessIntfProxy.java:85)
at $Proxy0.requestUnitOfWork(Unknown Source)
at org.eclipse.persistence.example.distributed.collatz.presentation.SEClient.processUnitOfWork(SEClient.java:161)
at org.eclipse.persistence.example.distributed.collatz.presentation.SEClient.main(SEClient.java:224)
javax.ejb.EJBException: [WorkManager:002917]Enqueued Request belonging to WorkManager default, application org.eclipse.persistence.example.distributed.CollatzEAR is cancelled as the WorkManager is shutdown; nested exception is: weblogic.work.WorkRejectedException: [WorkManager:002917]Enqueued Request belonging to WorkManager default, application org.eclipse.persistence.example.distributed.CollatzEAR is cancelled as the WorkManager is shutdown
at weblogic.ejb.container.internal.RemoteBusinessIntfProxy.unwrapRemoteException(RemoteBusinessIntfProxy.java:124)
at weblogic.ejb.container.internal.RemoteBusinessIntfProxy.invoke(RemoteBusinessIntfProxy.java:96)
at $Proxy0.requestUnitOfWork(Unknown Source)
at org.eclipse.persistence.example.distributed.collatz.presentation.SEClient.processUnitOfWork(SEClient.java:161)
at org.eclipse.persistence.example.distributed.collatz.presentation.SEClient.main(SEClient.java:224)
Caused by: weblogic.work.WorkRejectedException: [WorkManager:002917]Enqueued Request belonging to WorkManager default, application org.eclipse.persistence.example.distributed.CollatzEAR is cancelled as the WorkManager is shutdown
at weblogic.rjvm.ResponseImpl.unmarshalReturn(ResponseImpl.java:234)
at weblogic.rmi.cluster.ClusterableRemoteRef.invoke(ClusterableRemoteRef.java:348)
at weblogic.rmi.cluster.ClusterableRemoteRef.invoke(ClusterableRemoteRef.java:259)
at org.eclipse.persistence.example.distributed.collatz.business.CollatzFacade_of6sps_CollatzFacadeRemoteImpl_1034_WLStub.requestUnitOfWork(Unknown Source)
at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at weblogic.ejb.container.internal.RemoteBusinessIntfProxy.invoke(RemoteBusinessIntfProxy.java:85)
... 3 more

If I reduce the interval for each UnitOfWork from a comfortable 16 to 22 bits down to 8 bits (256 searches), requests to the server increase to about 5 per second. If we run more than one client, we almost immediately get an OptimisticLockException when one of the clients tries to overwrite shared memory (in the Parameters singleton entity). We expect this because of the concurrent nature of our distributed application. We will re-read, evaluate the change against our unsaved changes, and retry if needed. We may need to do this a couple of times, because this manual two-phase read-and-write still leaves a small window of unmanaged concurrency between the read and write operations.

How do we test for this?

On one or two separate machines, set the search interval very low (for example 18 or 16 bits) so that they generate more than one request per second.

Run the server in debug mode in Eclipse, and set a breakpoint in the catch block of a client that is also running from Eclipse.

Now when the remote machines hammer the WebLogic server, the SE client in Eclipse will eventually hit the breakpoint where it would usually crash on an unhandled OptimisticLockException.

Client Log Exception

_collatz: Remote Object: ClusterableRemoteRef(1326838513503838804S:10.156.52.246:[7001,7001,-1,-1,-1,-1,-1]:base_domain:AdminServer [1326838513503838804S:10.156.52.246:[7001,7001,-1,-1,-1,-1,-1]:base_domain:AdminServer/322])/322
javax.ejb.EJBException: BEA1-21603518C2783057A4BD: javax.persistence.OptimisticLockException: Exception [EclipseLink-5006] (Eclipse Persistence Services - 2.3.0.qualifier): org.eclipse.persistence.exceptions.OptimisticLockException
Exception Description: The object [org.eclipse.persistence.example.distributed.collatz.model.Parameters@2( id: 2)] cannot be updated because it has changed or been deleted since it was last read.
Class> org.eclipse.persistence.example.distributed.collatz.model.Parameters Primary Key> 2
at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.commitToDatabase(RepeatableWriteUnitOfWork.java:623)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.commitToDatabaseWithChangeSet(UnitOfWorkImpl.java:1486)
at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.issueSQLbeforeCompletion(UnitOfWorkImpl.java:3109)
at org.eclipse.persistence.internal.sessions.RepeatableWriteUnitOfWork.issueSQLbeforeCompletion(RepeatableWriteUnitOfWork.java:331)
at org.eclipse.persistence.transaction.AbstractSynchronizationListener.beforeCompletion(AbstractSynchronizationListener.java:157)
at org.eclipse.persistence.transaction.JTASynchronizationListener.beforeCompletion(JTASynchronizationListener.java:68)
at weblogic.transaction.internal.ServerSCInfo.doBeforeCompletion(ServerSCInfo.java:1239)
at weblogic.transaction.internal.ServerSCInfo.callBeforeCompletions(ServerSCInfo.java:1214)
at weblogic.transaction.internal.ServerSCInfo.startPrePrepareAndChain(ServerSCInfo.java:116)
at weblogic.transaction.internal.ServerTransactionImpl.localPrePrepareAndChain(ServerTransactionImpl.java:1316)
at weblogic.transaction.internal.ServerTransactionImpl.globalPrePrepare(ServerTransactionImpl.java:2132)
at weblogic.transaction.internal.ServerTransactionImpl.internalCommit(ServerTransactionImpl.java:272)
at weblogic.transaction.internal.ServerTransactionImpl.commit(ServerTransactionImpl.java:239)
at weblogic.ejb.container.internal.BaseRemoteObject.postInvoke1(BaseRemoteObject.java:625)
at weblogic.ejb.container.internal.StatelessRemoteObject.postInvoke1(StatelessRemoteObject.java:49)
at weblogic.ejb.container.internal.BaseRemoteObject.__WL_postInvokeTxRetry(BaseRemoteObject.java:444)
at weblogic.ejb.container.internal.SessionRemoteMethodInvoker.invoke(SessionRemoteMethodInvoker.java:53)
at org.eclipse.persistence.example.distributed.collatz.business.CollatzFacade_of6sps_CollatzFacadeRemoteImpl.postUnitOfWork(Unknown Source)
at org.eclipse.persistence.example.distributed.collatz.business.CollatzFacade_of6sps_CollatzFacadeRemoteImpl_WLSkel.invoke(Unknown Source)
at weblogic.rmi.internal.BasicServerRef.invoke(BasicServerRef.java:667)
at weblogic.rmi.cluster.ClusterableServerRef.invoke(ClusterableServerRef.java:230)

DI7: Analysis

It would be better to handle this on the server, in the session bean. We can then use this single solution regardless of which client we use (RMI/EJB, web service, JAX-RS).
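The read, evaluate and retry cycle described above can be sketched in plain Java. This is a minimal, self-contained illustration under assumptions, not the project's session bean: VersionedStore stands in for the Parameters row and its version column, StaleVersionException stands in for javax.persistence.OptimisticLockException, and the change function stands in for the application's own merge logic. In the container the conflict only surfaces at commit time (note the beforeCompletion frames in the client log above), so a server-side retry would have to wrap the whole transaction, for example by calling a REQUIRES_NEW method through the bean's business interface, rather than a single merge() call.

```java
import java.util.function.UnaryOperator;

public final class OptimisticRetry {

    private OptimisticRetry() { }

    /** Hypothetical stand-in for javax.persistence.OptimisticLockException. */
    public static final class StaleVersionException extends RuntimeException {
        public StaleVersionException(String message) { super(message); }
    }

    /** A value plus the version it was read at, like an entity with an @Version field. */
    public static final class Snapshot {
        public final long value;
        public final long version;
        Snapshot(long value, long version) { this.value = value; this.version = version; }
    }

    /** Hypothetical shared record; the synchronized methods model the database row. */
    public static final class VersionedStore {
        private long value;
        private long version;

        public synchronized Snapshot read() { return new Snapshot(value, version); }

        /** Writes only if nobody else has written since readVersion was read. */
        public synchronized void write(long newValue, long readVersion) {
            if (readVersion != version) {
                throw new StaleVersionException("changed since version " + readVersion);
            }
            value = newValue;
            version++;
        }
    }

    /** Re-reads, re-applies the change and writes, up to maxAttempts times; returns attempts used. */
    public static int updateWithRetry(VersionedStore store, UnaryOperator<Long> change, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Snapshot current = store.read();               // 1. read
            long proposed = change.apply(current.value);   // 2. evaluate against fresh state
            try {
                store.write(proposed, current.version);    // 3. conditional write
                return attempt;
            } catch (StaleVersionException e) {
                // another client wrote between our read and our write; loop and retry
            }
        }
        throw new StaleVersionException("gave up after " + maxAttempts + " attempts");
    }
}
```

The change is re-applied to freshly read state on every attempt, which is what keeps the retry from silently overwriting another client's update.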

DI 8: Variable Partition between Different Client Capabilities

We will attempt to use a homogeneous set of distributed processors; however, we will need to accommodate processing nodes with varying capabilities.

The Collatz problem is well suited to parallelization because the calculations on individual sequences are relatively independent. However, if we wish to optimize the algorithm to reduce calculation times by an order of magnitude, we will need to reuse the results of previous calculations.

Example: a large proportion of the numbers greater than 27 pass through 27 and therefore contain the 27:110:9232 record (27 = start, 110 = sequence path length, 9232 = maximum value). One optimization would be to abort any sequence that hits 27: the rest of its path and its maximum are already known, so if merging its current path:max with 27:110:9232 cannot produce a new maximum path or maximum value, there is no need to continue.
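A minimal sketch of the merge idea, written as memoization rather than as the abort rule itself: once a sequence drops to a start value whose path length and maximum are already known, the rest of the sequence is taken from a table instead of being iterated. CollatzMemo and its methods are hypothetical names; the table only covers start values below a fixed bound, and overflow of long for very large starts is ignored. This sketch counts steps until the value reaches 1, which gives 111 steps for 27; the 110 quoted above may count the path differently.

```java
import java.util.Arrays;

public final class CollatzMemo {

    private final int[] pathOf;  // pathOf[n] = steps from n to 1, or -1 if not yet known
    private final long[] maxOf;  // maxOf[n]  = largest value on n's sequence

    public CollatzMemo(int bound) {
        pathOf = new int[bound];
        maxOf = new long[bound];
        Arrays.fill(pathOf, -1);
        if (bound > 1) {
            pathOf[1] = 0;
            maxOf[1] = 1;
        }
    }

    /** Returns {pathLength, maxValue} for start >= 1, reusing known records. */
    public long[] evaluate(long start) {
        long n = start;
        long max = start;
        int steps = 0;
        // iterate until we reach 1 or a value whose record is already in the table
        while (n != 1 && !(n < pathOf.length && pathOf[(int) n] >= 0)) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            max = Math.max(max, n);
            steps++;
        }
        if (n != 1) {                    // merge with the known record (e.g. the one for 27)
            steps += pathOf[(int) n];
            max = Math.max(max, maxOf[(int) n]);
        }
        if (start < pathOf.length) {     // remember this start for later sequences
            pathOf[(int) start] = steps;
            maxOf[(int) start] = max;
        }
        return new long[] {steps, max};
    }

    /** Plain iteration without the table, for comparison. */
    public static long[] bruteForce(long start) {
        long n = start;
        long max = start;
        int steps = 0;
        while (n != 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            max = Math.max(max, n);
            steps++;
        }
        return new long[] {steps, max};
    }
}
```

When start values are evaluated in increasing order, every sequence eventually drops below its start and hits a value that is already in the table, which is where the saving comes from.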

Therefore, we will need an evaluation step for new nodes, so that we can distribute the appropriate number of UnitOfWork packets and all the processors work for about the same amount of time.
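One way to act on such an evaluation, sketched under assumptions: each new node first runs a short benchmark that reports how many numbers per second it can search, and packets stay powers of two like the 2^23-number standard packet. The node then gets the largest interval of 2^bits numbers it can finish within a target time. This sizes packets rather than counting them, and PacketSizer, the target time and the bit bounds are all hypothetical.

```java
public final class PacketSizer {

    public static final int MIN_BITS = 8;   // hypothetical lower bound (256 numbers)
    public static final int MAX_BITS = 32;  // hypothetical upper bound

    private PacketSizer() { }

    /** Largest interval size, in bits, that a node can search within targetSeconds. */
    public static int intervalBits(double numbersPerSecond, double targetSeconds) {
        long numbers = (long) (numbersPerSecond * targetSeconds);
        if (numbers < 1) {
            return MIN_BITS;
        }
        int bits = 63 - Long.numberOfLeadingZeros(numbers);  // floor(log2(numbers))
        return Math.max(MIN_BITS, Math.min(MAX_BITS, bits));
    }
}
```

For example, a node that benchmarks at 2^16 numbers per second with a 128-second target gets 23 bits, the standard packet; a node half as fast gets 22 bits and finishes in about the same time.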

DI10: Entity search for WebLogic should not require <class> elements when <jar-file> is specified

There may be an issue with entity search in WebLogic 10.3.4.0 when using a separate <jar-file> for entities.

On GlassFish 3, specifying only <jar-file> is sufficient; on WebLogic we also need to specify <class> elements, which should not be necessary for a managed @PersistenceContext.

In my Java EE 5 projects I always run with explicit <class> elements, whether I am using a managed @PersistenceContext or an unmanaged @PersistenceUnit. I have 2 nearly identical projects that use an external jar to contain the entity classes, which are referenced from a persistence.xml in the separate ejb-jar file.

I turned off the <class> elements and deferred to <jar-file> and/or the manifest entry, as instructed by the Java EE 5 spec and our own "Pro JPA 2" (p.413). Note: I do not directly reference the EntityManager from the WAR, so I don't need a reference there.

On GlassFish v3 via NetBeans 6.9, the application runs fine with no <class> elements, as the spec allows; the log shows the entities being registered with the weaver:

FINER:Class[org.dataparallel.collatz.business.CollatzRecord] registered to be processed by weaver.

However, on WebLogic 10.3.4 I have tried everything: the relative paths ../lib, /lib, lib, etc. (there seems to be some difference in whether to state the path to the default EAR/lib directory), and I can only get WebLogic to find the entities if I also list them as <class> elements. The jar is evidently being found on the classpath; the entities are simply not processed unless they are also listed, which should not be necessary because they are annotated. I still need to check an older JPA 1.0 server that does not use the patch jar.

I connected a brute-force AJAX client in the form of a JSP page, before getting into the AJAX support that ships with JSF 2.0, and quickly realized that I was hammering the server on each request, possibly unnecessarily.

Since the current client is read-only and is refreshed every 200 ms by having a @Servlet contact an @EJB that has access to a JTA @PersistenceContext, I don't need to read from the database on every request. Fortunately we are using EclipseLink as the JPA 2.0 provider, so between database writes from the server we are usually reading from its in-memory shared (L2) cache. It may be better to use a Map-based in-memory object managed by JPA to avoid even these cache hits.
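A minimal sketch of the read-only caching idea, assuming the monitor page can tolerate data up to a configurable age: the servlet keeps one snapshot and only calls the loader (for example a method on the @EJB facade) when the snapshot is older than the TTL. CachedSnapshot is a hypothetical helper, not part of JPA or EclipseLink, and the injectable clock exists only so the behavior can be tested.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public final class CachedSnapshot<T> {

    private final Supplier<T> loader;   // e.g. a call into the session bean (hypothetical)
    private final long ttlNanos;
    private final LongSupplier clock;
    private T value;
    private long loadedAt;
    private boolean loaded;

    public CachedSnapshot(Supplier<T> loader, long ttlMillis) {
        this(loader, ttlMillis, System::nanoTime);
    }

    public CachedSnapshot(Supplier<T> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlNanos = TimeUnit.MILLISECONDS.toNanos(ttlMillis);
        this.clock = clock;
    }

    /** Returns the cached value, reloading it only when it is older than the TTL. */
    public synchronized T get() {
        long now = clock.getAsLong();
        if (!loaded || now - loadedAt >= ttlNanos) {
            value = loader.get();
            loadedAt = now;
            loaded = true;
        }
        return value;
    }
}
```

Making get() synchronized means that however many AJAX requests arrive at once, only one of them triggers a reload per expiry; the trade-off is that the others wait briefly while that reload runs.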

DI 14: 20110518: Optimize JPQL Queries

Some of the queries, especially those used for the web client display in the JSF 2.0 Facelet/ManagedBean pair index.xhtml/MonitorManagedBean.java, are naive and not optimized at all. When we retrieve a list of thousands of UnitOfWork entities just to get a count, for example, we hit the default 30-second database transaction timeout.

DI 15: 20110711: Add JAX-WS 2.2 Web Service Endpoint

DI 110: Volumetrics

We need to track calculation iterations so that we can report a sort of scalar MIPS figure.
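A sketch of the counting side, assuming each worker thread reports how many Collatz iterations it performed; IterationMeter is a hypothetical name. LongAdder is used rather than a synchronized counter because many threads update it while the total is read only occasionally. The result is iterations per second, not true machine instructions, which is why it is only a sort of scalar MIPS.

```java
import java.util.concurrent.atomic.LongAdder;

public final class IterationMeter {

    private final LongAdder iterations = new LongAdder();
    private final long startNanos;

    public IterationMeter() {
        this(System.nanoTime());
    }

    /** Explicit start time, mainly so the arithmetic can be tested. */
    public IterationMeter(long startNanos) {
        this.startNanos = startNanos;
    }

    /** Called by worker threads; LongAdder keeps contention low under many writers. */
    public void add(long count) {
        iterations.add(count);
    }

    public long total() {
        return iterations.sum();
    }

    /** Millions of iterations per second between the start time and nowNanos. */
    public double millionsPerSecond(long nowNanos) {
        double seconds = (nowNanos - startNanos) / 1e9;
        return seconds <= 0 ? 0.0 : total() / seconds / 1e6;
    }
}
```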

DI 111: Analytics

DI 112: Reporting

DI 113: Management

DI 201: Refactor as Framework

Issue 201

As is normal computer science behavior, I am thinking of rewriting the distributed Collatz application as a more generic framework. This way I can distribute different types of UnitOfWork behind an interface, for Mandelbrot fractal generation for example. As we all know, there are already frameworks out there such as Apache Hadoop, which is an implementation of MapReduce. So I did a quick search first (usually I search at the end of a project) with the terms Fractal+MapReduce. I was a bit shocked at what I found on the 5th link, from the technoticles article on Google, Hadoop and CouchDB.

It looks like I was incorrect: MapReduce is now a patented framework from Google. I know a bit about the history of the framework as a way to reduce the overhead of each development team writing its own distribution and merging of work units, but I did not realize that I might be infringing on a patent by unknowingly writing distributed applications that package and merge pieces of a parallel problem. I modelled the distributed Collatz application more on SETI@home, except that the Collatz problem really does not have an end: it never finishes, and just keeps checking packets until infinity or until the electricity goes out.

Therefore, in the spirit of disclosure, I am stating that I am not looking at anything related to the patent; I am only following the original computer science concepts of map and reduce.

Normally I stay away from anything to do with patents, and I have never been directly affected by anything patent-related in my work, but U.S. patent 7,650,331, awarded for "system and method for efficient large-scale data processing", really scares me. How are we supposed to develop software that breaks a parallel problem into concurrent pieces without infringing on this patent, awarded in January 2010?
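A minimal sketch of what the generic framework's core abstraction could look like, assuming packets are independent and the server only needs to collect their results. WorkUnit, CollatzRange and MandelbrotRow are hypothetical names, unrelated to the existing UnitOfWork JPA entity, and the sketch deliberately leaves out distribution, persistence and merge policy.

```java
import java.util.ArrayList;
import java.util.List;

public final class WorkUnits {

    private WorkUnits() { }

    /** Hypothetical generic packet: any independent piece of a parallel problem. */
    public interface WorkUnit<R> {
        R process();
    }

    /** Collatz packet: the start value in [start, end) with the longest path to 1. */
    public static final class CollatzRange implements WorkUnit<long[]> {
        private final long start;
        private final long end;

        public CollatzRange(long start, long end) {
            if (start < 1) {
                throw new IllegalArgumentException("start must be >= 1"); // 0 would loop forever
            }
            this.start = start;
            this.end = end;
        }

        @Override
        public long[] process() {
            long bestStart = start;
            long bestPath = -1;
            for (long s = start; s < end; s++) {
                long n = s;
                long path = 0;
                while (n != 1) {
                    n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
                    path++;
                }
                if (path > bestPath) {
                    bestPath = path;
                    bestStart = s;
                }
            }
            return new long[] {bestStart, bestPath};
        }
    }

    /** Mandelbrot packet: escape-time iteration counts for one row of pixels at height y. */
    public static final class MandelbrotRow implements WorkUnit<int[]> {
        private final double y;
        private final double xMin;
        private final double xStep;
        private final int width;
        private final int maxIter;

        public MandelbrotRow(double y, double xMin, double xStep, int width, int maxIter) {
            this.y = y;
            this.xMin = xMin;
            this.xStep = xStep;
            this.width = width;
            this.maxIter = maxIter;
        }

        @Override
        public int[] process() {
            int[] counts = new int[width];
            for (int i = 0; i < width; i++) {
                double cr = xMin + i * xStep;
                double zr = 0;
                double zi = 0;
                int k = 0;
                while (k < maxIter && zr * zr + zi * zi <= 4.0) {
                    double t = zr * zr - zi * zi + cr;
                    zi = 2 * zr * zi + y;
                    zr = t;
                    k++;
                }
                counts[i] = k;
            }
            return counts;
        }
    }

    /** The generic side only needs the interface: process each unit and collect results. */
    public static <R> List<R> processAll(List<? extends WorkUnit<R>> units) {
        List<R> results = new ArrayList<>();
        for (WorkUnit<R> unit : units) {
            results.add(unit.process());
        }
        return results;
    }
}
```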

SE Client

The SE client needs a reference to the server's EE client libraries. In the case of GlassFish, gf-client.jar is a manifest-only jar that references the other EE jars by relative paths (so do not move it). In the case of WebLogic, the wlfullclient.jar library must be generated ahead of time (in WebLogic 10.3 by running java -jar wljarbuilder.jar in WL_HOME/server/lib).

Four 1-core P630 clients (with up to 8 threads, but in practice I run the CPUs at 50% with a single thread)
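For WebLogic, a remote lookup from the SE client typically passes the WebLogic initial context factory and a t3:// provider URL to InitialContext. A minimal sketch, assuming wlfullclient.jar is on the classpath at run time; the host, port and JNDI name are placeholders (7001 is the port visible in the client log earlier on this page, and in WebLogic 10.3 a remote business interface with a mappedName is typically bound as mappedName#fully.qualified.RemoteInterface). WebLogicRemoteLookup is a hypothetical name.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public final class WebLogicRemoteLookup {

    private WebLogicRemoteLookup() { }

    /** Environment for a remote InitialContext; host and port are placeholders. */
    public static Hashtable<String, String> env(String host, int port) {
        Hashtable<String, String> props = new Hashtable<>();
        props.put(Context.INITIAL_CONTEXT_FACTORY, "weblogic.jndi.WLInitialContextFactory");
        props.put(Context.PROVIDER_URL, "t3://" + host + ":" + port);
        return props;
    }

    /** Looks up a remote business interface; needs wlfullclient.jar at run time. */
    public static <T> T lookup(String host, int port, String jndiName, Class<T> type)
            throws NamingException {
        Context ctx = new InitialContext(env(host, port));
        try {
            return type.cast(ctx.lookup(jndiName));
        } finally {
            ctx.close();
        }
    }
}
```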

One 2-core T4400 client (1 thread)

One 2-core E8400 client (1 thread)

One 1-core P520 client (1 thread)

Here is a screencap of the H1 NOC with 13 threads on 8 physical machines.

Configuring TCP/IP Traffic between 2 Networks

When you develop distributed, or even multithreaded, applications on a separate cluster, the cluster usually sits on a private network such as 192.168.n.N. There are many reasons why your cluster may need to be off your corporate network; usually it comes down to the fact that your private router cannot or should not be directly connected to the rest of the network. In most cases your primary development PC will act as a bridge between the corporate and development networks and will require two network interfaces. This is usually accomplished by adding a second network card to your PC, or by using a wireless dongle to connect wirelessly to the private router.

Issues will arise around packet routing when both networks are live. Occasionally you will get lucky and the interfaces will route external HTTP traffic to the right network, but this will not always work because of the variable nature of the TCP/IP routing tables.

The solution is to override the interface metric under Network properties | Internet Protocol (TCP/IP) | Internet Protocol (TCP/IP) properties | Advanced | Advanced TCP/IP Settings | Automatic metric. By default Automatic metric is checked, and the metric, which depends partly on the route length statistic, will vary.

For each physical or wireless network interface, uncheck Automatic metric and enter a manual value, for example 100 and 200 for the two networks; the smaller number marks the network that should be used first.

If you also switch the private network interface from DHCP to a static address, with the gateway and DNS servers set to your private router, everything will function correctly: your internet and intranet traffic will go through your corporate gateway, and your private cluster traffic (192.168.n.N) will go through your private network. You will no longer get "page not found" errors.

Performance

We need to set a baseline and run simulations varying 2 or more variables so that we find the optimum configuration before we start a large run that could last for months.

Performance Criteria

The following criteria will be used to optimize the distributed performance of the application.

Multicore Usage

All of the CPUs currently available have 1, 2, 3 or 4 cores. Some of them are hyperthreaded.

Q) Do we really get an N-times performance gain if we use all the hardware cores? Nearly, as the answer below shows.

A) On a dual-core (non-HT) system, each core runs about 12% slower when both are active, but together they deliver 2 × 0.88 = 1.76 times the single-core throughput, a 76% speedup.

Multithreaded Usage

For each core above we may have hyperthreading available: it was available on the Pentium 4 and again on the Core i7-9xx. What is the optimum number of threads? I would expect that we need to determine how many soft cores the machine has (hard + hyperthreaded) and not use more than this number. This would be 2 for a single-core P4-630, 2 for a dual-core E8400 or T4400, 4 for a quad-core Q6600 and 8 for a Core i7-920.

The other question is whether we really get a performance gain from the hyperthreaded cores: if I use 2 threads on a single-core P4-630, or 8 threads on a Core i7-920, what is the gain? From initial experimentation it looks like we can get up to a 50% performance increase by using the HT cores. Use of the HT cores will slow down the hard cores, as both share the same cache and RAM. Therefore, if the algorithm were kept in registers, we would be able to make better use of the superscalar execution queues.

Multicore Analysis

A problem that maps very well to multiple cores and is easily parallelized is the computation of the Mandelbrot set.

The following graph is the result of an experiment where I varied the number of threads used to render each frame of a deep zoom to the limit of double floating-point precision. When I run this algorithm as a traditional single-threaded application, it takes up to 800 seconds to render a 1024x1024 grid from 1.0 to 1 x 10^-16. When I start adding threads, I see the best speedup when I use the same number of threads as there are hard (non-hyperthreaded) processors. The performance increase nears its maximum of 8 times for an Intel Core i7-920 as I approach a thread/line count of 512 threads.

As you can see from the graph, we benefit more from a massive number of threads, as long as they are independent. The Mandelbrot calculation, however, is not homogeneous: computing the central set requires many more iterations than the outlying areas. This is why each parallel algorithm must be fine-tuned to the problem it is solving. If you look at the screen captures of performance during the runs with various thread counts, you will see what I mean: the processor is not exercised at its maximum capacity when the bands assigned to some threads finish before other threads that are performing more calculations than their peers. If we increase the number of bands, we distribute the unbalanced load among the cores more evenly, at a slight expense in thread coordination, creation and destruction.

Multicore Rendering of Mandelbrot Set

The following runs are on a 1024x1024 grid zooming from 1.0 to 0.0000000000000001 (10^-16) and take from 800 to 67 seconds, depending on the number of threads used concurrently. Notice that I have a temporary issue with shared variable access between threads, as some of the pixel coloring is off.

As you can see, processor usage goes from 12% for a single thread, through 50% for 8 threads, to 100% for 128+ threads.

Why do we need so many threads? If even one thread is still working after the others have completed their work units, the entire computation is held up. We therefore use more work units than there are threads.

A better algorithm would be to distribute work units asynchronously, instead of in the synchronous MapReduce-style way we currently use. When a thread is finished, it can work on a part of the image that is still waiting to be processed. We would need to distribute work units more like packets in this case.
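A minimal sketch of that asynchronous alternative, assuming a single JVM and the standard escape-time Mandelbrot loop: rows act as small packets, and each worker claims the next unprocessed row from a shared AtomicInteger, so no thread sits idle while rows remain. RowQueueRenderer and its parameters are illustrative, not the code used for the timed runs on this page. Results go into a plain int[] rather than a shared Graphics context, and each array slot is written by exactly one thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

public final class RowQueueRenderer {

    private RowQueueRenderer() { }

    /** Escape-time iteration count for the point c = cr + ci*i. */
    public static int iterations(double cr, double ci, int maxIter) {
        double zr = 0;
        double zi = 0;
        int k = 0;
        while (k < maxIter && zr * zr + zi * zi <= 4.0) {
            double t = zr * zr - zi * zi + cr;
            zi = 2 * zr * zi + ci;
            zr = t;
            k++;
        }
        return k;
    }

    /** Renders width x height points starting at (xMin, yMin) with spacing step. */
    public static int[] render(int width, int height, double xMin, double yMin, double step,
                               int maxIter, int threads) throws InterruptedException {
        int[] counts = new int[width * height];
        AtomicInteger nextRow = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                int row;
                // claim rows until none are left; a fast thread simply claims more of them
                while ((row = nextRow.getAndIncrement()) < height) {
                    double ci = yMin + row * step;
                    for (int col = 0; col < width; col++) {
                        counts[row * width + col] = iterations(xMin + col * step, ci, maxIter);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();  // join() also makes the workers' writes visible to this thread
        }
        return counts;
    }

    /** Logical processors seen by the JVM, i.e. hard plus hyperthreaded cores. */
    public static int defaultThreads() {
        return Runtime.getRuntime().availableProcessors();
    }
}
```

Because rows are claimed on demand, the expensive rows through the centre of the set no longer hold up the whole frame the way a fixed band per thread does.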

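1 thread on an 8-thread (4-core, hyperthreaded) i7-920 takes 778 sec

2 threads on an 8-thread (4-core, hyperthreaded) i7-920 take 466 sec

16 threads on an 8-thread (4-core, hyperthreaded) i7-920 take 138 sec

128 threads on an 8-thread (4-core, hyperthreaded) i7-920 take 114 sec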
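Taking the single-thread time as the baseline, the speedup for n threads is T_1 / T_n; applied to the timings above it works out as follows.

```latex
S_n = \frac{T_1}{T_n}, \qquad
S_2 = \frac{778}{466} \approx 1.67, \qquad
S_{16} = \frac{778}{138} \approx 5.64, \qquad
S_{128} = \frac{778}{114} \approx 6.82
```

So 128 threads reach about 85% of an ideal 8-times speedup over the chip's 8 hardware threads (6.82 / 8 ≈ 0.85).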

Thread Contention for Shared Resources

For our multithreaded Mandelbrot application - which currently is not @ThreadSafe - we encounter resource contention specific to the Graphics context. This type of contention is the same for any shared resource such as a database. The issue is that setting a pixel on the screen is not an atomic operation - it consists of setting the current color and then drawing the pixel (The Java2D API may require multiple internal rendering steps as well). The result of this is that another thread may change the color of the graphics context before the current thread actually writes the pixel - resulting in noise - or more accurately - Data Corruption.

Note that no noise or data corruption occurs when we run a single thread. We only get a problem when we run multiple threads concurrently.

color = Mandelbrot.getCurrentColors().get(iterations);
color2 = color;
// setColor() and drawRect() below need to be executed atomically - however we do not control the shared graphics context
synchronized (color) { // this does not help us with drawRect()
    mandelbrotManager.getgContext().setColor(color);
    // drawRect is not atomic, the color of the context may be changed by another thread before the pixel is written
    mandelbrotManager.getgContext().drawRect((int) x, (int) y, 0, 0);
}
if (!color2.equals(mandelbrotManager.getgContext().getColor())) {
    System.out.println("_Thread contention: color was changed mid-function: (thread,x,y) " + threadIndex + "," + x + "," + y);
    // The solution may be to rewrite the pixel until the color is no longer modified
}
_Thread contention: color was changed mid-function:(thread,x,y)2,298,22
_Thread contention: color was changed mid-function:(thread,x,y)15,140,155
_Thread contention: color was changed mid-function:(thread,x,y)15,140,156
_Thread contention: color was changed mid-function:(thread,x,y)15,140,157
_Thread contention: color was changed mid-function:(thread,x,y)15,141,151
_Thread contention: color was changed mid-function:(thread,x,y)2,307,25
_Thread contention: color was changed mid-function:(thread,x,y)15,143,154
_Thread contention: color was changed mid-function:(thread,x,y)15,144,152
_Thread contention: color was changed mid-function:(thread,x,y)13,0,130
_Thread contention: color was changed mid-function:(thread,x,y)11,0,110

The better solution would be to designate a host thread that coordinates all the unit-of-work threads and acts as the single proxy to the GUI: only one thread should update AWT or Swing UI elements, as most of them are not thread safe by design. Multithreaded distributed applications need to be very careful when using GUI elements. For example, if I do not introduce at least a 1 ms sleep between GUI frames, the entire machine may lock up when 100% of the CPU is given to the calculating threads.
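A sketch of the host-thread side, assuming the worker threads return iteration counts (an int[] with one slot per pixel) instead of drawing: a single thread converts the counts to pixels and writes the image, so there is no shared Graphics color to race on. SingleThreadPainter and its blue-to-white palette are illustrative assumptions.

```java
import java.awt.image.BufferedImage;

public final class SingleThreadPainter {

    private SingleThreadPainter() { }

    /** Maps an iteration count to RGB: black inside the set, blue to white outside. */
    public static int colorFor(int iterations, int maxIter) {
        if (iterations >= maxIter) {
            return 0x000000;
        }
        int shade = 255 * iterations / maxIter;
        return (shade << 16) | (shade << 8) | 255;
    }

    /**
     * Builds the image on the calling thread only. In a Swing GUI this would run on
     * the event dispatch thread, for example via SwingUtilities.invokeLater.
     */
    public static BufferedImage toImage(int[] counts, int width, int height, int maxIter) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        int[] rgb = new int[counts.length];
        for (int i = 0; i < counts.length; i++) {
            rgb[i] = colorFor(counts[i], maxIter);
        }
        image.setRGB(0, 0, width, height, rgb, 0, width);
        return image;
    }
}
```

The workers still do all of the expensive iteration; only the cheap count-to-color step is serialized, which keeps the single GUI thread from becoming the bottleneck.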

Local vs remote threads

If I use 4 threads on a 4-core chip like the 920, or single threads on 4 separate P630 machines, what kind of gain do I see?

Network Bandwidth

Not an issue.

We are using Gigabit Ethernet. Because our application transmits so little data, I rarely see network utilization go above 1%.

Preliminary Performance Numbers

There are 3 networks that I am simulating on (two at work, one at home) with a total of 23 cores available. The standard work packet at this point is 2^23 numbers, or 8,388,608. Processing these 8 million numbers takes from 55 to 330 seconds, depending on the processor, for numbers below 12 billion.

This is my first, unoptimized machine-language routine for the Parallax Propeller chip. It is a testament to the tutorial by deSilva that I had it running within 2 hours, having not written assembly since the 80386, the 8085 (TRS-80 Model 100) and the 6809E (TRS-80 CoCo).

We get a raw performance of 2.4 million iterations/sec per core for 32-bit scalar arithmetic running machine language at PLL16 (80 MHz, or 20 MIPS per core). With 8 cores per chip we should therefore get close to 19.2 million iterations/sec per chip, or about 1540 million iterations/sec for a 640-core (80-chip) compute grid.

Management and Reporting

There are formal tools and methods we can use to measure performance and also track and modify parameters of our application and the EE frameworks it runs on - at runtime.

JRockit Mission Control

When we run on an Oracle JRockit JVM - either on the server, the client or both - we have a lot of tools at our disposal. The key is JRMC.exe or JRockit Mission Control. JRMC enables us to use JMX MBeans exposed by WebLogic and the JPA provider - EclipseLink.

JRMC Method Profiler

The method profiler of JRMC is just one of the tools we can use when running JRockit to determine where our performance issues are - or test out a performance fix before and after a change.

For example, one of our biggest performance issues is String allocation and concatenation in toString(). We should be able to answer whether using StringBuffer will alleviate this performance hit.
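The difference the profiler should expose can be sketched with a hypothetical PathFormatter that renders a Collatz path. Repeated += inside a loop builds a new String and copies everything so far on every iteration, while a single StringBuilder appends in place. StringBuilder has the same API as StringBuffer without the synchronization, so it is usually the better fit inside one toString() call; whether either matters for this application is exactly what the JRMC measurements should confirm. A single a + b + c expression is already compiled efficiently, so loops are the case to look for.

```java
public final class PathFormatter {

    private PathFormatter() { }

    /** Concatenation in a loop: each += allocates a new String and copies the old one. */
    public static String withConcat(long[] path) {
        String s = "";
        for (long n : path) {
            s += n + ",";
        }
        return s;
    }

    /** One growable buffer, sized up front as a guess to avoid most resizing. */
    public static String withBuilder(long[] path) {
        StringBuilder sb = new StringBuilder(path.length * 8);
        for (long n : path) {
            sb.append(n).append(',');
        }
        return sb.toString();
    }
}
```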

JMX Management

We can use the JMX MBeans exposed by the JPA provider - EclipseLink - to view and modify attributes of our persistence context running on the central server at runtime.

For instance, we may wish to change the logging level of the persistence context so we can temporarily track SQL statements to the database - without redeploying the EAR.

Either JConsole or JRMC can be used; we will concentrate on JRMC (JRockit Mission Control).

As you can see above, we let a single remote client run for a couple seconds so that it reported a completed UnitOfWork entity packet back to the session bean that holds the dependency injected persistence context on the server.

Statistics

JOPS per Watt

Since we have not implemented the core scalar integer processing in native C, SSE C or even GPU C, the current "Java operations (JOPS) per watt" metric will suffice until we are able to state MIPS (but not FLOPS, as the work is integer-only).

Appendix

Enabling JPA 2.0 on WebLogic 10.3.4

Either follow the instructions on my other tutorial page, or let Eclipse 3.6 Helios change the order of the javax.persistence library and add the JPA 2.0 patch so that container-managed dependency injection works.

Enabling JSF 2.0 on WebLogic 10.3.4

Use Eclipse Helios 3.6 EE edition to enable the JSF 2.0 facet after you have installed Oracle WebLogic Server Tools as a server plugin to eclipse.

Note: JSF 2.0 managed beans, declared either via @ManagedBean or in faces-config.xml, work fine when using the supplied JSF 2.0 library (2.0/1.0.0.0_2-0-2 in my case) in WebLogic Server 10.3.4.0. I have verified that the new .xhtml Facelet-based pattern, using either annotations or XML, works fine with an @EJB-injected (in this case @Local) @Stateless session bean that is itself injected with a JPA 2.0 container-managed @PersistenceContext handling persistence of the model.

Enabling JAX-RS 1.1 on WebLogic 10.3.4

JAX-RS is recommended over traditional JAX-WS.

Similar to enabling JSF - we must enable the JAX-RS shared-library WAR on the server.

See C:\opt\wls1034r20110115\wlserver_10.3\common\deployable-libraries

Jersey Servlet Implementation = jersey-bundle-1.1.5.1.war

JAX-RS API = jsr311-api-1.1.1.war

Class Introspection = asm-3.1.jar (part of ?)

JSON processor = jackson-core-asl-1.1.1.war

JSON processor = jackson-jaxrs-1.1.1.war

JSON processor = jackson-mapper-asl-1.1.1.war

JSON Streaming = jettison-1.1.war

ATOM processing = rome-1.0.war

We don't have to do this manually, there is a script supplied to register the Jersey JAX-RS 1.1 RI on WebLogic 10.3.4

The issue where a JSF 2.0 EAR created for an Eclipse 3.6-managed WebLogic 10.3.4.0 does not work on another, remote WebLogic 10.3.4.0 that the EAR is exported to is solved by enabling the same JSF 2.0 library from Eclipse against that remote server.

You should see the following logs on the 2nd WebLogic server console after the specific JSF 2.0 library has been enabled.

An alternate solution to enabling JSF 2 and JAX-RS 1.1 on WebLogic Server 10.3.4 is to deploy the deployable shared libraries manually via the WebLogic console. Your deployed libraries should include the following.

Avoiding Obstructed SVN Web Project

For some reason SVN is reporting an obstructed web project whenever I modify the web project. I narrowed it down to Eclipse Helios causing the entries file at the root to be deleted on refresh.

Using Thread Unsafe API in a Thread Safe Way

Use InheritableThreadLocal to make SimpleDateFormat Thread Safe

The SimpleDateFormat implementation of DateFormat is not thread safe by design. If you wish to use this API you have several architectures to choose from in your application.

Use synchronization in your implementation: not advisable, since this will queue your requests

Use clones of the format for every thread or parse call

Use InheritableThreadLocal storage by using get/set to use an instance of SimpleDateFormat per thread - recommended for environments running in thread pools - like in EE application servers.

Here we use an InheritableThreadLocal map entry to store an instance of the DateFormat object for each thread. See p.45, section 3.3 "Thread confinement", of "Java Concurrency in Practice" by Brian Goetz. Also, do not set this field before starting this thread, or the ThreadLocal map value will be reset to initialValue().

However, we go further than the book, because we must also override the childValue() method of InheritableThreadLocal to handle the case where shared variables like the formatter are set before child threads are created. In that case we must clone the variable to maintain thread safety.
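A minimal sketch of this pattern; the class name and the date pattern (chosen to resemble the timestamps used on this page) are illustrative assumptions. initialValue() gives each thread its own formatter on first use, and childValue() clones the parent's formatter when a child thread is created after the parent has already used it, so parent and child never share one SimpleDateFormat instance.

```java
import java.text.DateFormat;
import java.text.SimpleDateFormat;
import java.util.Date;

public final class ThreadSafeDateFormat {

    private static final String PATTERN = "yyyyMMdd:HHmmz";  // hypothetical pattern

    private static final InheritableThreadLocal<DateFormat> FORMAT =
            new InheritableThreadLocal<DateFormat>() {
                @Override
                protected DateFormat initialValue() {
                    return new SimpleDateFormat(PATTERN);  // one instance per thread
                }

                @Override
                protected DateFormat childValue(DateFormat parentValue) {
                    // the child thread gets its own copy instead of the parent's instance
                    return (DateFormat) parentValue.clone();
                }
            };

    private ThreadSafeDateFormat() { }

    /** The formatter confined to the calling thread. */
    public static DateFormat current() {
        return FORMAT.get();
    }

    public static String format(Date date) {
        return FORMAT.get().format(date);
    }
}
```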

Get the JNDI name working for remote session bean lookup from an SE client in another JVM for GlassFish (this is working for WebLogic). Currently I can only get the case where the SE client is run in the same JVM as the server to work (where we use the no-arg constructor of InitialContext())
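One approach commonly used for a remote GlassFish 3 lookup from a separate JVM, sketched under assumptions: gf-client.jar is on the classpath (referenced in place, as noted in the SE Client section), the ORB host and port are passed as org.omg.CORBA.ORBInitialHost and org.omg.CORBA.ORBInitialPort (3700 is GlassFish's default IIOP port), and the bean is looked up by its portable Java EE 6 global name, java:global/app/module/bean!fully.qualified.Interface. GlassFishRemoteLookup and all names passed to it are placeholders, not verified against this project's EAR.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

public final class GlassFishRemoteLookup {

    private GlassFishRemoteLookup() { }

    /** Environment pointing the client ORB at a remote GlassFish instance. */
    public static Hashtable<String, String> env(String host, int iiopPort) {
        Hashtable<String, String> props = new Hashtable<>();
        // Normally picked up automatically when gf-client.jar is on the classpath.
        props.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.enterprise.naming.SerialInitContextFactory");
        props.put("org.omg.CORBA.ORBInitialHost", host);
        props.put("org.omg.CORBA.ORBInitialPort", String.valueOf(iiopPort));
        return props;
    }

    /** Portable global JNDI name defined by EJB 3.1. */
    public static String globalName(String app, String module, String bean, String iface) {
        return "java:global/" + app + "/" + module + "/" + bean + "!" + iface;
    }

    public static <T> T lookup(String host, int iiopPort, String jndiName, Class<T> type)
            throws NamingException {
        Context ctx = new InitialContext(env(host, iiopPort));
        try {
            return type.cast(ctx.lookup(jndiName));
        } finally {
            ctx.close();
        }
    }
}
```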

Progress

As of 20110228:1100EDT, after 9.75 days of server uptime since 20110218:1620EDT, using an average of 9 distributed JVMs on 7 machines running an unoptimized version of the distributed Collatz software in parallel, we are at: