Our real-time evaluation showed us that failover from the primary server to the backup server made up over 90% of our recovery time for faulty invocations. As a result, our goal in Phase IV will be to improve this failover time by optimizing the failover process.

One bottleneck in our failover mechanism is the replication manager, which takes a considerable amount of time to update its server list. As a result, the client spends too much time waiting for the replication manager to provide a new server name. The other bottleneck is the creation of a new bean once the replication manager has provided a valid server name.

Both of these delays can be greatly reduced by always having a second bean ready on the client. When the client starts, it can ask the replication manager for the name of the next backup server along with the name of the primary server. The client can then create two beans, one pointing to each of these servers. If the client cannot invoke a method on the primary server, it can immediately begin using the secondary bean. It can then continue processing as usual while, in the background, it gets the name of the next backup server from the replication manager and creates a new secondary bean. With this approach, the client always has a secondary bean readily available if the primary server goes down. This of course assumes that the backup server will not go down before the primary, but if that does happen, the delay would be no worse than in our current setup. By always keeping two beans ready on the client, we can significantly reduce the end-to-end latency in the presence of primary server failure.
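The approach above can be sketched in Java as follows. This is only a minimal illustration under stated assumptions, not our actual client code: the class name FailoverClient, the nextServerName supplier (standing in for the replication manager query), and the beanFactory function (standing in for the JNDI lookup and bean creation) are all hypothetical, and a failed invocation is modelled as a RuntimeException rather than the checked exceptions a real remote call would throw.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical client-side wrapper that always keeps a secondary bean ready. */
class FailoverClient<B> {
    // Assumption: stands in for asking the replication manager for a server name.
    private final Supplier<String> nextServerName;
    // Assumption: stands in for the lookup and creation of a bean on that server.
    private final Function<String, B> beanFactory;

    private B primary;
    private CompletableFuture<B> secondary;

    FailoverClient(Supplier<String> nextServerName, Function<String, B> beanFactory) {
        this.nextServerName = nextServerName;
        this.beanFactory = beanFactory;
        // At startup, create both beans up front: primary first, then the backup.
        this.primary = beanFactory.apply(nextServerName.get());
        this.secondary = CompletableFuture.completedFuture(
                beanFactory.apply(nextServerName.get()));
    }

    /** Runs the call on the primary bean; on failure, switches to the secondary and retries once. */
    synchronized <R> R invoke(Function<B, R> call) {
        try {
            return call.apply(primary);
        } catch (RuntimeException primaryFailure) {
            // Usually returns at once; blocks only if the refill below is still running,
            // which is the "no worse than the current setup" case.
            primary = secondary.join();
            // Fetch the next backup name and build a new secondary bean in the background.
            secondary = CompletableFuture.supplyAsync(
                    () -> beanFactory.apply(nextServerName.get()));
            return call.apply(primary);
        }
    }
}
```

A client would wrap each remote call, for example client.invoke(bean -> bean.someMethod()), where someMethod is a placeholder for any business method. If the retry on the secondary bean also fails, the exception simply propagates to the caller in this sketch.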

Tips

JBoss and Java 5

For system evaluation, you may wish to make use of Java 5's System.nanoTime() method. Unfortunately, JBoss has difficulties working under Java 5. To get around this, you can delete the javax.management.* classes in your Java 5 installation. The following commands should accomplish this for you.

SSH provides a nice way of performing remote execution. This is very beneficial for 749 projects which need to remotely start and stop servers and clients. To start the JBoss server on machine risk, for example, you could execute

ssh risk $JBOSS_HOME/bin/run.sh 2>&1 &

Unfortunately, when using the ssh method of remote execution, you do not have access to all the environment variables you would normally have when logging into machine risk. You can, however, explicitly specify variable values for ssh to use by adding them to the file ~/.ssh/environment on the machine you are connecting to, provided its SSH server permits this (the PermitUserEnvironment option in OpenSSH). So in our example, you would modify the file on machine risk and not on your source machine. If you're running everything on ECE machines, it doesn't matter, though, because AFS gives you the same home directory on every machine. So your file might look something like