Description

Clustered Environment: constraint violation if clones are started at the same time.

Exception thrown:

com.ibm.websphere.ce.cm.DuplicateKeyException: [IBM][CLI Driver][DB2/6000] SQL0803N One or more values in the INSERT statement, UPDATE statement, or foreign key update caused by a DELETE statement are not valid because the primary key, unique constraint or unique index identified by "2" constrains table "PORTLET_APPLICATION" from having duplicate rows for those columns. SQLSTATE=23505

at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java(Compiled Code))
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java(Compiled Code))
at java.lang.reflect.Constructor.newInstance(Constructor.java(Compiled Code))
at com.ibm.websphere.rsadapter.GenericDataStoreHelper.mapExceptionHelper(GenericDataStoreHelper.java:502)
at com.ibm.websphere.rsadapter.GenericDataStoreHelper.mapException(GenericDataStoreHelper.java:545)
at com.ibm.ws.rsadapter.jdbc.WSJdbcUtil.mapException(WSJdbcUtil.java:902)
at com.ibm.ws.rsadapter.jdbc.WSJdbcPreparedStatement.executeUpdate(WSJdbcPreparedStatement.java:555)
at org.apache.ojb.broker.accesslayer.JdbcAccessImpl.executeInsert(JdbcAccessImpl.java:216)
at org.apache.ojb.broker.core.PersistenceBrokerImpl.storeToDb(PersistenceBrokerImpl.java:1754)
at org.apache.ojb.broker.core.PersistenceBrokerImpl.store(PersistenceBrokerImpl.java:813)
at org.apache.ojb.broker.core.PersistenceBrokerImpl.store(PersistenceBrokerImpl.java:726)


Joachim Müller
added a comment - 14/Sep/07 09:49

The problem here is that the (re)registering defined in

PortletApplicationManager.registerPortletApplication(...)

is not encapsulated in one and only one transaction, and the transactions do not block other cluster nodes. To change the data of the PortletApplication it uses methods of the PersistenceBrokerPortletRegistry which are encapsulated by transactions for removing and creating a portlet application, i.e.

PersistenceBrokerPortletRegistry.registerPortletApplication(PortletApplicationDefinition)
PersistenceBrokerPortletRegistry.removeApplication(PortletApplicationDefinition)

Since (re)registering removes and inserts data from the database and is neither fully encapsulated by one transaction nor write locked, there may be conflicts. A sample:
A = cluster node 1
B = cluster node 2

A removes PA from DB
B removes PA from DB again (with no effect)
A inserts PA into DB
B inserts PA into DB (fails with a duplicate key constraint violation)
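The interleaving above can be replayed deterministically with an in-memory stand-in for the unique key column of the PORTLET_APPLICATION table. This is only an illustration of the race; the class and key names are made up and none of this is Jetspeed code:

```java
import java.util.HashSet;
import java.util.Set;

public class DuplicateKeyRace {
    // Stand-in for the table's unique-key column.
    static final Set<String> table = new HashSet<>();

    static void delete(String pk) { table.remove(pk); } // no effect if absent

    static void insert(String pk) {
        if (!table.add(pk)) {            // add() returns false on duplicate
            throw new IllegalStateException("duplicate key: " + pk);
        }
    }

    public static void main(String[] args) {
        insert("demo-pa");       // PA already registered
        delete("demo-pa");       // node A removes PA
        delete("demo-pa");       // node B removes PA again (no effect)
        insert("demo-pa");       // node A inserts PA
        try {
            insert("demo-pa");   // node B inserts PA -> duplicate key
        } catch (IllegalStateException e) {
            System.out.println("node B failed: " + e.getMessage());
        }
    }
}
```

Because B's delete is a no-op and its insert comes after A's, B always hits the duplicate key, just as in the stack trace above.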
What would the options be?

1.) Make sure only one cluster node can (re)deploy the portlet application at a time.

A first approach could be:

delete and insert should only be executed if not already executed by another cluster node
to synchronize, add a kind of "monitor" to the database (e.g. a new table with a monitoring "flag" and optimistic locking)
every cluster node checks the monitor
if the monitor is not set, the cluster node sets it and executes the deletion/insertion
if the monitor is set, the cluster node waits until the monitor is "free" and only reloads the registry (with the Portlet Application already written by the other cluster node)
if both cluster nodes want to update the monitor, optimistic locking leads to an exception on one side; that side should then also wait and reload
make sure the cluster node retries to (re)deploy the portlet application on exception (see 2.))
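The monitor idea could be sketched roughly as below. An in-memory flag plus version stamp stands in for the proposed monitor table; in the real proposal the compare-and-set would be an optimistically locked UPDATE against that table, and every name here is hypothetical:

```java
// Stand-in for the proposed "monitor" row: a busy flag plus a version
// column for optimistic locking. In the database this would be e.g.
// "UPDATE monitor SET busy = ?, version = version + 1 WHERE version = ?".
public class DeployMonitor {
    private boolean busy = false;
    private long version = 0;

    // Succeeds only if no other node changed the row since we read it
    // and the monitor is currently free.
    synchronized boolean tryAcquire(long expectedVersion) {
        if (version != expectedVersion || busy) return false;
        busy = true;
        version++;
        return true;
    }

    synchronized void release() {
        busy = false;
        version++;
    }

    synchronized long currentVersion() { return version; }
}
```

The losing node's failed tryAcquire corresponds to the optimistic-locking exception in the list above: it would then wait for release and only reload the registry.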
2.) Catch the exception, roll back, and keep trying to (re)deploy the portlet.xml.

I am not sure if this is a good solution because multiple transactions on multiple cluster nodes could produce invalid data in the database tables, or deadlocks. (I am not a clustered environment database expert.)
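A bounded retry loop for option 2.) might look like this sketch; the Registration callback stands in for the real registration call, and the names are illustrative only:

```java
public class RetryingDeployer {
    // Stand-in for the actual registration work, which runs in its own
    // transaction and may fail with e.g. a DuplicateKeyException.
    interface Registration { void run() throws Exception; }

    static boolean deployWithRetry(Registration reg, int maxRetries)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                reg.run();          // transaction commits on success
                return true;
            } catch (Exception e) { // transaction rolled back; back off
                Thread.sleep(100L * attempt);
            }
        }
        return false;               // give up after maxRetries attempts
    }
}
```

The retry budget bounds how long a node keeps hammering the database when another node got there first.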
3.) Change the (re)deploy process:

avoid deletion of the portlet application
step through the object tree and insert/update only if necessary
combine this with optimistic locking (requires a data model change)
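Option 3.) amounts to an "upsert" walk over the object tree: insert rows that are missing, update rows that are stale, and leave unchanged rows alone. A minimal sketch, with a Map standing in for a table and all names hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

public class UpsertRegistry {
    // Stand-in for one table: primary key -> row content.
    static final Map<String, String> table = new HashMap<>();

    static void upsert(String pk, String value) {
        String existing = table.get(pk);
        if (existing == null) {
            table.put(pk, value);        // row missing -> insert
        } else if (!existing.equals(value)) {
            table.put(pk, value);        // row stale -> update
        }
        // row already up to date -> no write at all
    }
}
```

Because nothing is deleted first, two nodes performing the same upsert converge on the same rows instead of colliding on an insert.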
4.) Another slick solution that makes everything much easier (maybe at the OJB level?).

I would like to synchronize with the core developers before starting to implement a solution. What do you think?

The quickest solution for now, with the least impact on data model and code base, would be 2.), but I am not sure if it is a really robust solution. Please comment.
To generally avoid problems in clustered environments we may have to change some aspects of the database access via OJB, as stated in:

http://db.apache.org/ojb/docu/howtos/howto-work-with-clustering.html
http://db.apache.org/ojb/docu/guides/lockmanager.html#LockManagerRemoteImpl


Joachim Müller
added a comment - 18/Sep/07 15:35

I have attached a patch that addresses the problem. It solves the problem as follows:

1.) It introduces an (optional) configurable maxRetriedStart parameter (defaults to 10). This parameter defines how often the PA (portlet application) manager will try to restart a PA on error.

2.) A PA registration error on startup no longer prevents the PA from ever being registered. The descriptor change monitor is always started for the PA, also in case of a registration error.

3.) The descriptor change monitor tries to start the PA if

a.) the PA descriptors have changed, OR
b.) the previous start of the PA was unsuccessful, as long as the number of unsuccessful starts does not exceed maxRetriedStart (defaults to 10).

This means that in a cluster (we presume identical portlet descriptors here) the cluster nodes can "delay" the PA registration if a node encounters registration problems (like the described constraint violation). If the problem is not recoverable (e.g. the portlet.xml is destroyed), the re-registration is deactivated after a number of retries (but registration restarts on PA descriptor changes).

Still, the registration of the PA and the synchronization of the PA descriptors with the database are based on picked-up changes of the (file based) PA descriptors. The cluster nodes will not pick up changes of the PA introduced by another cluster node as long as they are not restarted.
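The start decision described in 3.) could be sketched like this; the names are illustrative only, not the actual patch code:

```java
// Retry policy: start the PA when its descriptors changed, OR when the
// previous start failed and fewer than maxRetriedStart attempts were made.
public class StartPolicy {
    private final int maxRetriedStart;   // defaults to 10 in the patch
    private int failedStarts = 0;

    StartPolicy(int maxRetriedStart) { this.maxRetriedStart = maxRetriedStart; }

    boolean shouldStart(boolean descriptorsChanged, boolean previousStartFailed) {
        if (descriptorsChanged) {
            failedStarts = 0;            // a descriptor change resets the budget
            return true;
        }
        return previousStartFailed && failedStarts < maxRetriedStart;
    }

    void recordFailure() { failedStarts++; }
}
```

This captures both behaviors described above: retries stop once the budget is exhausted, but a descriptor change always triggers a fresh registration attempt.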

Ate Douma
added a comment - 09/Dec/07 02:36

As I had to apply the provided patch by hand to be able to review it, I made a new one.
Note: I haven't reviewed the patch yet, but plan to do so later this weekend / early next week.


Ate Douma
added a comment - 09/Dec/07 23:52 - edited

Reviewing the new patch I made earlier, it turned out I had made a few errors when applying Joachim's patch by hand.
After fixing those, this patch tested out very well and definitely provides some improvements as well as more robust handling for clustered environments.
So, for the record, I'm attaching a fixed version of the patch I created, and I will then commit it to both the 2.2 trunk and the 2.1.3 branch, against which I actually tested it.