Cluster synchronization feature

Details

Description

We developed a cluster synchronization feature that can synchronize the execution of an object's methods cluster-wide. The approach is similar to the Spring transaction proxy, so the cluster synchronization can be AOP-injected via Spring configuration. The component is highly customizable: a new database table is used for synchronization, but any other data container (e.g. a broadcast-distributed hash map) could be used.
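
As a rough illustration of the approach, the sketch below shows how such an interceptor could wrap a method invocation the same way a Spring transaction proxy does. This is a minimal example, not the actual patch code; ClusterLockService and its method names are assumptions.

{code:java}
import org.aopalliance.intercept.MethodInterceptor;
import org.aopalliance.intercept.MethodInvocation;

/** Hypothetical cluster-wide lock abstraction; the patch backs this with a DB table. */
interface ClusterLockService {
    void acquire(String resource) throws InterruptedException;
    void release(String resource);
}

/**
 * Acquires a cluster-wide lock before the target method runs and releases it
 * afterwards, mirroring how a Spring transaction proxy wraps an invocation.
 */
public class ClusterSyncInterceptor implements MethodInterceptor {

    private final ClusterLockService lockService;

    public ClusterSyncInterceptor(ClusterLockService lockService) {
        this.lockService = lockService;
    }

    public Object invoke(MethodInvocation invocation) throws Throwable {
        // Use the method signature as the name of the cluster-wide resource.
        String resource = invocation.getMethod().toGenericString();
        lockService.acquire(resource);      // blocks/retries until the lock is held
        try {
            return invocation.proceed();    // execute the synchronized method
        } finally {
            lockService.release(resource);  // always free the lock for other nodes
        }
    }
}
{code}

Such an interceptor would typically be applied through a ProxyFactoryBean in the Spring configuration, analogous to TransactionProxyFactoryBean.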

One main application of this functionality is starting many Jetspeed cluster nodes and synchronizing the deployment of the PAs to the database. (Even with the VersionedApplicationManager we experienced DB constraint failures on startup with a fresh database, preventing some PAs from registering.)

Randy Watler
added a comment - 29/Apr/09 07:10
PAM implementations are now capable of successfully rolling back and retrying portlet application registrations in both the 2.2 trunk and the 2.1.3-POST branch (to become 2.1.4).
The fixes included adding a retry loop to the startPortletApplication APIs and cleaning up the prefs cache in the 2.1.3-POST branch. The 2.2 trunk has a new prefs/registry implementation that already recovers successfully from failed registrations, so no improvements were required there.
2.1.3 svn commit: 769669
2.2 svn commit: 769670
The new service proposed as a fix for these problems was not incorporated. A new JIRA issue should be created to contribute the service to 2.2 and/or 2.1.4 if that is still desired.
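
For illustration, the retry approach described above might look roughly like the following sketch; the class and method names are invented for the example and do not correspond to the committed code (r769669/r769670).

{code:java}
/** Illustrative sketch only; the actual fix lives in the PAM implementations. */
public abstract class RetryingRegistrationSketch {

    /** One registration attempt, assumed to run in its own DB transaction. */
    protected abstract void attemptRegistration(String contextName) throws Exception;

    /** Assumed to roll back partial registration state and clear stale caches. */
    protected abstract void rollbackRegistration(String contextName);

    public void startPortletApplication(String contextName) throws Exception {
        final int maxRetries = 5;   // illustrative bound, not the real setting
        Exception lastFailure = null;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                attemptRegistration(contextName);
                return;             // registration succeeded, stop retrying
            } catch (Exception e) {
                lastFailure = e;    // remember the failure in case we give up
                rollbackRegistration(contextName); // recover before the next try
            }
        }
        throw lastFailure;          // every attempt failed
    }
}
{code}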

Randy Watler
added a comment - 24/Apr/09 23:51
Unit tests committed in both the 2.1.3-POST and 2.2 versions.
2.2: 768433
2.1.3: 768434
The reported prefs problems do not appear in 2.2, so they will not be addressed as part of this issue.
An attempt will be made to fix the concurrent startPortletApplication issue without using the patch submitted above. The primary reason for avoiding that patch is to prevent cluster-wide locking: database transactions should be sufficiently robust to support concurrent PAM access. We will reconsider this approach if that cannot be achieved.

Randy Watler
added a comment - 24/Apr/09 06:08
Investigation into this issue has confirmed that there are issues as reported when using the VersionedPortletApplicationManager implementation on 2.1.3-POST. At least two issues are evident at this point: concurrent access conflicts between multiple nodes in the cluster, and problems writing preferences for a portlet application more than once during the lifetime of a single server.

David Sean Taylor
added a comment - 13/Oct/08 18:24
I am changing the fix version to 2.2, since I cannot modify the database tables in a post release.
I will review this before the 2.2 release and decide whether it should be included in the release as an optional or default feature.

Joachim Müller
added a comment - 26/Aug/08 10:41 - edited
The patch should be modified in deployment.xml to use the VersionedPortletApplicationManager instead of the PortletApplicationManager (or at least to set descriptorChangeMonitorInterval to 0). In some setups we saw the DescriptorChangeMonitor fire startPA() before startPortletApplication had executed, because the latter was blocked by the cluster synchronization.
Unfortunately, startPA() cannot be proxied because it is protected. What about making the method public so that the cluster sync can hook in?
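
For context, Spring's proxy-based AOP only advises public methods, which is why the protected startPA() escapes the cluster synchronization proxy. A minimal sketch of the visibility problem (the class shape is assumed for illustration, not the real PortletApplicationManager):

{code:java}
public class PortletApplicationManagerSketch {

    // Public: callable through the Spring proxy, so the cluster-sync
    // interceptor wraps this invocation.
    public void startPortletApplication(String contextName) {
        startPA(contextName);
    }

    // Protected: Spring's JDK/CGLIB proxies do not advise non-public methods,
    // so the DescriptorChangeMonitor can trigger this without any cluster lock.
    protected void startPA(String contextName) {
        // ... registration work ...
    }
}
{code}

Note that even if startPA() were made public, it would only pass through the interceptor when invoked via the proxied bean reference; self-invocation (this.startPA(...)) still bypasses Spring proxies.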

Joachim Müller
added a comment - 06/Aug/08 16:51
Some more comments:
A new table, CLUSTER_RESOURCE, is created to store the lock tokens; it also records the TTL and the client that obtained the lock.
The parameters in the new deployment.xml make it possible to influence several behaviours: the TTL, the time to wait before retrying to obtain a cluster-wide lock, and the maximum number of retries before the proxy interrupts the method execution.
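
To make the TTL and retry semantics concrete, here is a rough sketch of how lock acquisition against such a table could work; the column names, the clientId handling, and the JDBC wiring are assumptions, not the actual patch.

{code:java}
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

/** Sketch of TTL-based locking over a CLUSTER_RESOURCE-style table. */
public class ClusterResourceLockSketch {

    private final DataSource dataSource;
    private final long ttlMillis;        // how long an obtained lock stays valid
    private final long retryWaitMillis;  // wait between acquisition attempts
    private final int maxRetries;        // attempts before giving up

    public ClusterResourceLockSketch(DataSource dataSource, long ttlMillis,
                                     long retryWaitMillis, int maxRetries) {
        this.dataSource = dataSource;
        this.ttlMillis = ttlMillis;
        this.retryWaitMillis = retryWaitMillis;
        this.maxRetries = maxRetries;
    }

    /** Tries to obtain the lock; returns false once all retries are exhausted. */
    public boolean acquire(String resource, String clientId)
            throws SQLException, InterruptedException {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            Connection con = dataSource.getConnection();
            try {
                // Clear the row if the previous holder's TTL has expired.
                PreparedStatement expire = con.prepareStatement(
                        "DELETE FROM CLUSTER_RESOURCE WHERE RESOURCE_NAME = ? AND EXPIRES_AT < ?");
                expire.setString(1, resource);
                expire.setLong(2, System.currentTimeMillis());
                expire.executeUpdate();

                // A unique key on RESOURCE_NAME makes this insert the atomic lock step.
                try {
                    PreparedStatement insert = con.prepareStatement(
                            "INSERT INTO CLUSTER_RESOURCE (RESOURCE_NAME, CLIENT_ID, EXPIRES_AT) VALUES (?, ?, ?)");
                    insert.setString(1, resource);
                    insert.setString(2, clientId);
                    insert.setLong(3, System.currentTimeMillis() + ttlMillis);
                    insert.executeUpdate();
                    return true;                   // row inserted: lock obtained
                } catch (SQLException alreadyLocked) {
                    Thread.sleep(retryWaitMillis); // another node holds the lock
                }
            } finally {
                con.close();
            }
        }
        return false; // caller may now interrupt the proxied method execution
    }
}
{code}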