I originally intended to open this thread to talk through some design decisions I'd made in implementing the fix proposed on the JIRA issue, but after doing some testing today I think it's better for me to get clarification on exactly what the issue is in JBAS-1476.

As I understood it, the problem was that if a client persisted a serialized handle to a clustered EJB and then the cluster was restarted, deserializing the handle could pollute the FamilyClusterInfo with the outdated set of targets.

I'm not sure how this could happen, as calls to getEJBObject() on the different EJB handle classes all result in creating a new InitialContext, followed by a JNDI lookup and an invocation on the resulting proxy. This should ensure the proper target set is available.

I tested this today as follows:

1) Create two server configs, node1 and node2.
2) Start node1.
3) Look up an SFSB home on it, call create() and invoke a method on the bean.
4) Persist a handle to the SFSB to disk.
5) Client sleeps 60 secs while I stop node1 and start node2.
6) Do a fresh JNDI lookup and create() on the bean. This ensures the FamilyClusterInfo on the client has the current cluster topology.
7) Deserialize the handle to the SFSB and call getEJBObject() on it. This fails on the server side because the bean instance can't be found (the passivated bean didn't survive step 5), but there is no problem making a call to the server.
8) Make another call using the bean from step 6. No problem, showing that the FamilyClusterInfo was in no way corrupted.

I've tried all sorts of variations on this (using SLSBs, using a two-node cluster, etc.) and in no case did I see an indication that the deserialized handle was causing a problem.

The one possible thing I saw was that if a user set the -Dorg.jboss.ejb.sfsb.handle.V327 JVM flag, a reference to the Invoker would be stored in the EJB handle and would thus be serialized. Deserializing this could pollute the FamilyClusterInfo, but AFAICT only until getEJBObject() was called on the deserialized handle. Typically, I would expect that call to be made almost immediately after deserializing the handle.

Here is how to reproduce it (you can probably come up with a simpler pattern :-)

1) Set up a cluster with two nodes (e.g. S1, S2).
2) Recycle the nodes of the cluster a few times.
3) From client A, make a request on the cluster (e.g. it gets view id 10 - S1, S2).
4) Remove all the nodes (the cluster is dead - back to view 0).
5) Bring S1 and S2 back online and let S3 join.
6) From client B, make a request on the cluster (e.g. it gets view id 2 - S1, S2, S3).
7) Remove S1 and S2 from the cluster.
8) Make an invocation from A to B, passing the HA proxy as a parameter.
9) B will now have A's view of the cluster (view 10 > view 2).

View 10 was S1, S2, but S3 is now the only valid member. There is nothing B can do to rejoin the cluster until the view id goes back above 10 (unless either S1 or S2 is rebooted).

It is step 8, where deserialization of the HA proxy updates the cluster family info (even though it does not come directly from the server), that is the problem.
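The core of that failure mode can be sketched in isolation. The following is a hypothetical model, not the actual JBoss classes: a serializable proxy whose readObject unconditionally pushes its embedded target list into a shared client-side registry, so deserializing a stale proxy clobbers newer topology information.

```java
import java.io.*;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the problem pattern; all names here are
// illustrative, not the real JBoss AS classes.
public class StaleProxySketch {

    // Stands in for the client-side registry of FamilyClusterInfo target lists.
    static final Map<String, List<String>> FAMILY_TARGETS = new ConcurrentHashMap<>();

    static class HAProxy implements Serializable {
        final String familyKey;
        final List<String> targets;

        HAProxy(String familyKey, List<String> targets) {
            this.familyKey = familyKey;
            this.targets = targets;
        }

        // The problematic part: deserialization unconditionally replaces the
        // shared target list, no matter how old this proxy is.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            FAMILY_TARGETS.put(familyKey, targets);
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Client A's old proxy, captured when the view was {S1, S2}.
        byte[] oldProxy = serialize(new HAProxy("ejb/Foo", Arrays.asList("S1", "S2")));

        // Meanwhile the cluster has moved on: only S3 is alive.
        FAMILY_TARGETS.put("ejb/Foo", Arrays.asList("S3"));

        // B deserializes A's proxy (e.g. as an invocation parameter), and the
        // current topology is clobbered by the stale one.
        deserialize(oldProxy);
        System.out.println(FAMILY_TARGETS.get("ejb/Foo")); // prints [S1, S2]
    }
}
```

The point is that the damaging update happens as a side effect of deserialization itself, before any invocation is attempted.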

I suppose another solution would be to change the update of the family cluster info such that it is only updated explicitly on an invocation response (not from any deserialization of the object when it is passed over the wire).

Thanks Adrian. I missed this problem before when looking at the jira issue.

IMO, there are two ways to address this:

1. Devise a way that cluster ids can be compared to determine which is the most recent. The only problem with this is that we can't depend on including a timestamp, since the machines' clocks may not be in sync. I also don't see how using a GUID can indicate which GUID is the most recent (to me that is the same problem we have now).

2. Make an explicit call to the server to get the correct view. This should guarantee that we get the correct version of the view, but it is extra overhead.

There is no need for any kind of cluster restart to be involved to have this problem. ClusteringTargetsRepository.initTarget() will replace the target list any time it's called, regardless of the value of the viewId.

That's not a trivial problem to fix by adding logic to only update the target list if the new viewId is greater than the old. This is because the viewId is *not* a counter that is incremented as the list of replicants changes; it's a hash of the names of the nodes on which the service is deployed.
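A small sketch of why a hash-based viewId can't be compared for recency. The hash function here is made up for illustration (the real JBoss computation differs), but any hash-of-membership scheme shares these properties:

```java
import java.util.*;

// Illustrative only: the real JBoss viewId computation differs, but any
// scheme that hashes the membership list behaves this way.
public class ViewIdSketch {

    // A viewId derived from the (sorted) node names: deterministic for a
    // given membership, but with no notion of "before" or "after".
    public static long viewIdOf(Collection<String> nodes) {
        List<String> sorted = new ArrayList<>(nodes);
        Collections.sort(sorted);
        return sorted.hashCode();
    }

    public static void main(String[] args) {
        long v1 = viewIdOf(Arrays.asList("S1", "S2"));        // initial view
        long v2 = viewIdOf(Arrays.asList("S1", "S2", "S3"));  // S3 joins later
        long v3 = viewIdOf(Arrays.asList("S1", "S2"));        // restart, same nodes

        // The later view's id is not necessarily greater than the earlier
        // one, so "only accept a larger viewId" logic is unsound...
        System.out.println(v2 > v1);
        // ...and a restarted cluster with the same members reproduces the
        // old id exactly, so equality doesn't prove freshness either.
        System.out.println(v3 == v1); // prints true
    }
}
```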

If we change this hash-based viewId to a counter, we have to ensure that the counter stays in sync across the cluster (which I'm sure is why the hash approach was chosen :)

I haven't thought this through yet, but my instinct tells me a solution might lie in the fact that there are two paths by which a set of targets is received:

1) As part of an HARMIResponse. This list of targets can be considered authoritative, as an HARMIResponse only comes from the server.
2) As part of a serialized proxy. This list is not necessarily authoritative, as there is no way to know where the proxy came from. If no FamilyClusterInfo exists for the proxy's key, the proxy's targets are authoritative; otherwise, they are not.

The trick is what to do with the non-authoritative targets. They can be stored in FamilyClusterInfo (and flushed any time a set of authoritative targets comes in), but how to make them available to the load balance policy? Just append them at the end of the target list returned by getTargets()?
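Here is one way that idea could look, as a hedged sketch. The class and method names are mine, not the actual FamilyClusterInfo API: authoritative targets replace the list and flush the tentative ones, while non-authoritative targets are only appended to the end of what getTargets() returns.

```java
import java.util.*;

// Hypothetical sketch of the authoritative/non-authoritative split discussed
// above; this is not the real FamilyClusterInfo API.
public class FamilyInfoSketch {

    private final List<String> authoritative = new ArrayList<>();
    private final List<String> tentative = new ArrayList<>();

    // Called with targets from an HARMIResponse: trusted, so replace the
    // list and flush any tentative targets.
    public synchronized void updateFromResponse(List<String> targets) {
        authoritative.clear();
        authoritative.addAll(targets);
        tentative.clear();
    }

    // Called with targets from a deserialized proxy: trusted only if we
    // have nothing yet; otherwise kept separately as a fallback.
    public synchronized void updateFromDeserializedProxy(List<String> targets) {
        if (authoritative.isEmpty()) {
            authoritative.addAll(targets);
        } else {
            for (String t : targets) {
                if (!authoritative.contains(t) && !tentative.contains(t)) {
                    tentative.add(t);
                }
            }
        }
    }

    // The load balance policy sees authoritative targets first,
    // non-authoritative ones appended at the end.
    public synchronized List<String> getTargets() {
        List<String> all = new ArrayList<>(authoritative);
        all.addAll(tentative);
        return all;
    }
}
```

Whether appending tentative targets is safe depends on the load balance policy; a policy that always picks the first target would simply never reach them until the authoritative list fails.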

Having to have a valid cluster-unique id for stability is a fragile construct in my view. I think we need to rethink this problem from the perspective of view ids as a hint, with the ability to simply punt and say that the current view needs to be recoverable the same as if the entire cluster had been brought down.

I agree with this, and it is something that Ben and I have been talking about. IMO, the first path on the client should be to try executing on the target it has (old or new does not matter). It will either succeed (and get an updated target list in the response) or fail. If it fails (because of an old version from a deserialized client proxy, the whole cluster having changed since the last client call, whatever), there should be a way for the client to discover the new cluster topology.

The problem with discovery on the client is that it adds more to the client. We will have to ship extra jars for clustering discovery, and embed some kind of configuration so the client knows how to do the discovery. Also, as things currently stand, there may be an issue where the client environment cannot use the preferred method of discovery (i.e. multicast, because the client lives on a different network node). So it will possibly require some kind of change on the client, which makes it not seamless for the client user.

I'd like to think a bit about where we want to go on this for 4.0.4. I'd thought through how to implement the GUID concept, but passing around the GUID will affect a lot of classes and will only partly solve one use case -- node A passes a bad proxy to node B with a cluster restart in between. I say partly solve because when node B uses the bad proxy, the call will still fail, but at least B's own FamilyClusterInfo won't be corrupted.

I agree with Tom's thinking on adding an ability to do discovery after a failure, but adding a full-fledged framework for that for 4.0.4 seems a stretch.

For EJBs at least, we have an existing technology that AFAIK works -- the RetryInterceptor. It has two flaws that I see:

1) To configure the InitialContext, the user has to call a static method on the interceptor class. I think this can be improved by using NamingContextFactory.lastInitialContextEnv.get() like Adrian did in the EJB handle classes.
2) The RetryInterceptor keeps retrying forever, whereas what we want is a version that attempts one JNDI lookup and then gives up.

One possibility for 4.0.4 is to create an OnlyOnceRetryInterceptor and add that to our standard clustered EJB stacks.
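The retry-once behavior could look roughly like this. This is a generic sketch; the real interceptor works against the JBoss Invocation/Interceptor classes and re-fetches its proxy via JNDI, both of which I've stubbed out here as plain callables:

```java
import java.util.concurrent.Callable;

// Generic sketch of "try once; on failure, re-lookup and retry exactly once".
// Unlike RetryInterceptor's infinite loop, this gives up after one retry.
public class SingleRetrySketch {

    public interface Lookup {
        // Stands in for the JNDI lookup that refreshes the target list.
        void refreshTargets() throws Exception;
    }

    public static <T> T invokeWithSingleRetry(Callable<T> invocation, Lookup lookup) throws Exception {
        try {
            return invocation.call();
        } catch (Exception firstFailure) {
            // One re-lookup, one retry; if this call fails too, the
            // exception propagates to the caller.
            lookup.refreshTargets();
            return invocation.call();
        }
    }
}
```

A more careful version would distinguish "target unreachable" failures (worth a retry) from application exceptions (which should propagate immediately).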

This could be an implementation of a more generic DiscoveryInterceptor concept, where the discovery mechanism is pluggable. (Tom, don't know if you were thinking along these kinds of lines).

The "OnlyOnceRetryInterceptor" (renamed "SingleRetryInterceptor") mentioned above is implemented and added to the default EJB client interceptor stacks. I think that reasonably addresses the problem of invalid target lists for EJB2 ejbs.

That leaves EJB3 ejbs (where I assume a similar approach could be used) and other non-EJB clustered proxies.

Did we earlier reach consensus that a "discovery-after-failure" mechanism is the right solution to this problem rather than the GUID? If so, I would like to close JBAS-1476, which specifies the GUID approach, and open a new issue to track adding a discovery process for the other proxy types.

But we don't support this MC style of configuration in the legacy stack.

Other things that need to be cleaned up in HA remoting are:

1. HA proxy invokers like JRMPInvokerProxyHA should not exist. The only difference between an HA proxy and a regular proxy would be an interceptor that selected from the available targets and set up the invocation such that the common Invoker proxy was told which transport proxy to use. Even an unclustered proxy could have a recovery/HA semantic, like blocking until the server is available, with this configuration. HA is an aspect that should not be embedded in the transport-specific proxy.
2. We need a consistent level of debugging info such that under trace-level logging we can identify what is going on.
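Point 1 could be sketched like this. The interfaces are hypothetical, not the actual JBoss proxy framework: HA becomes one interceptor in the chain that picks a target and hands it to a transport-agnostic invoker.

```java
import java.util.List;

// Hypothetical sketch of HA-as-an-aspect: the transport invoker knows
// nothing about clustering; a separate interceptor chooses the target.
public class HAAspectSketch {

    public interface Invoker {
        // A transport-specific call to one concrete target.
        String invoke(String target, String method);
    }

    // The only "HA" piece: target selection (round-robin here) layered in
    // front of any invoker, clustered or not.
    public static class TargetSelectingInterceptor {
        private final List<String> targets;
        private final Invoker next;
        private int index;

        public TargetSelectingInterceptor(List<String> targets, Invoker next) {
            this.targets = targets;
            this.next = next;
        }

        public synchronized String invoke(String method) {
            String target = targets.get(index);
            index = (index + 1) % targets.size();
            return next.invoke(target, method);
        }
    }
}
```

The same Invoker implementation would then serve both clustered and unclustered deployments; only the interceptor chain differs.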

We were discussing a potential problem if a cluster got restarted, and after restart the viewId recalculated to the same number it had before. This could happen if JGroups were configured with a bind_port, as then all the JGroups IpAddress instances would use the same address and port.

In this case, we were concerned there might be a problem if a client contacted the cluster, transmitting the old viewId. The server wouldn't know the viewId was from the "old" cluster instance and therefore wouldn't send the new targets with the response. Would this cause a problem that adding a GUID would solve?

I've thought about this some more and have come to the following conclusions:

1) Not an issue w/ JRMPInvokerProxyHA. If the cluster is restarted, none of the old client-side targets will be valid, so the client will not be able to contact the cluster without doing a RetryInterceptor-style lookup. That lookup will bring in the current targets.

2) Not an issue with HttpInvokerProxyHA. The targets are just URLs; even if the targets aren't refreshed, the old ones are still usable. On the off chance that one of the restarted servers changed its HttpInvoker URL, that target will be incorrect, but this will be detected as soon as an attempt is made to contact the target. That call will fail, and the client will thus know its target set has an issue and will get fresh targets on the next successful call.

3) AFAICT, not a problem with PooledInvokerProxyHA, for largely the same reason as HttpInvokerProxyHA -- the targets are just address/port combinations.