1. Instead of resetting the agent's "In-Transit" flag separately in each place where the teleport can fail, reset it in a "finally" clause. This is more robust because it removes the need to remember to reset the flag on every failure path.

2. In addition, reset the "In-Transit" flag when the client disconnects. This ensures that if a bug does slip through and leaves the agent In-Transit, simply disconnecting and reconnecting will fix the problem. Without this change, only a region restart would fix the agent's status.
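A minimal sketch of the second point, written in Python for brevity (the actual module is C#). The class and function names mirror the identifiers in this discussion (`EntityTransferStateMachine`, `ResetFromTransit()`, `OnClientClosed()`) but are illustrative stand-ins, not the real signatures:

```python
class EntityTransferStateMachine:
    """Illustrative stand-in for the ETM's transit-state tracker."""

    def __init__(self):
        self._in_transit = set()  # agent IDs currently teleporting/crossing

    def set_in_transit(self, agent_id):
        """Mark the agent as in transit; fail if it already is."""
        if agent_id in self._in_transit:
            return False
        self._in_transit.add(agent_id)
        return True

    def reset_from_transit(self, agent_id):
        """Clear the in-transit flag; safe to call even if not set."""
        self._in_transit.discard(agent_id)

    def is_in_transit(self, agent_id):
        return agent_id in self._in_transit


def on_client_closed(state, agent_id):
    # Resetting here means a disconnect/reconnect clears a stuck
    # in-transit flag, instead of requiring a region restart.
    state.reset_from_transit(agent_id)
```

The key property is that `reset_from_transit()` is idempotent, so hooking it to the client-close event is harmless when the agent was never in transit.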

I am certainly happy with changes to improve the readability of ResetFromTransit().

However, I believe this patch suffers from a problem if DoTeleport() exits early because the agent is already teleporting. In this case, from my reading ResetFromTransit() will be invoked when it should not be.

In addition, have you encountered any cases where the client was closed in the middle of a teleport? I am extremely wary of belt-and-braces solutions here because they tend to hide underlying problems. I would far rather a bug surface so that it can be fixed properly than have it hidden, possibly manifesting in other ways that are much harder to debug.

However, it strikes me that perhaps a client close by the user is possible in the middle of teleport. Have you tested this?

For instance, hooking OnClientClosed() also has the effect of resetting the transit flag at sp.Scene.IncomingCloseAgent() in ETM.DoTeleport() rather than right at the very end. This would open up a race condition with sp.Scene.NeedSceneCacheClear(). As it happens, I just looked through this old method and it appears completely pointless, so I intend to comment it out or remove it. However, this is the kind of unexpected interaction that could be introduced without very careful analysis.

Also, why make the EntityTransferStateMachine (and OnClientClosed()) protected rather than private? ETSM should be managed purely by the ETM module and not touched by subclasses unless there's a very good reason.

1. Correctly handle the case where an agent that is already in transit attempts to start another teleport/cross. A simple rule ensures this works correctly: calls to SetInTransit() and ResetFromTransit() must come in pairs. If SetInTransit() succeeds, start a try/finally block that calls ResetFromTransit() when it's done.

2. Changed m_entityTransferStateMachine to private. I actually think it would be OK to keep it protected: that was done to allow HGEntityTransferModule to access it, and since HGEntityTransferModule is closely related to EntityTransferModule, it could reasonably call its internal members. But after this latest refactoring I no longer need to access m_entityTransferStateMachine from HGEntityTransferModule, so I made it private.
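The pairing rule in point 1 can be sketched as follows, again in Python for brevity (the real code is C#; `set_in_transit`/`reset_from_transit` stand in for `SetInTransit()`/`ResetFromTransit()`). The crucial detail is that an agent already in transit exits early *without* touching the flag, so the original teleport's eventual reset is not clobbered:

```python
in_transit = set()  # illustrative stand-in for the state machine's flag store

def set_in_transit(agent_id):
    if agent_id in in_transit:
        return False
    in_transit.add(agent_id)
    return True

def reset_from_transit(agent_id):
    in_transit.discard(agent_id)

def do_teleport(agent_id, perform_transfer):
    # Early exit: agent already in transit. No reset on this path,
    # because this call never acquired the flag.
    if not set_in_transit(agent_id):
        return False
    try:
        perform_transfer()
        return True
    finally:
        # Runs on success OR failure, but only when set_in_transit()
        # succeeded -- the pairing rule described above.
        reset_from_transit(agent_id)
```

With this shape, every successful `set_in_transit()` is matched by exactly one `reset_from_transit()`, and a failed acquisition is matched by none.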

To answer your questions:

Yes, I did encounter a case where the client was closed in the middle of a teleport; this is easy to do. A bigger problem occurred when the teleport failed: in some cases the user's transit state wasn't reset, with the serious consequence that the user couldn't teleport any longer. Prior to this patch, this was a region-level error that required a region restart to fix. Now, if it happens again, it's just a user-level error that can be fixed simply by logging out and back in.

Hi Oren. I certainly agree with the try/finally changes so I've put these in as git commit ccb7cce. I separated out the other changes into jc.reset-state-on-client-close.patch above.

I actually do like the idea of resetting the transit state on the client close but not for the 'belt and braces' reason. Rather, it's because it has the potential to stop other transfer-related things from happening when the client is closed during teleport.

However, on quickly testing this, I do notice that if a client is closed whilst the first region is waiting for the 'everything is fine' reply from the destination region, the following kind of thing occurs.

This works, but the exception spew is extremely ugly. In this case one could prevent it by checking for the InTransit state after receiving false from WaitForAgentArrivedAtDestination(). However, I think this emphasizes that the change swaps a known set of behaviours when a client is closed during teleport for a new set of behaviours whose knock-on effects we need to understand well.

This may also be a crude way of implementing teleport cancel. At the moment, this is not hooked up at all and hitting cancel can strand an avatar and force a relog.

But of course, these changes narrow the window for races rather than close it. I think that what is ultimately needed is to extend the state model to encompass client close and login as well as inter-region transfers. It may well make sense for a presence to always have some kind of transfer state, where one of those states is 'not transferring'. Thoughts?

On a minor point, I did not carry through the logging change from debug to warn after a QueryAccess fails. This is because this call could fail for all kinds of legitimate reasons (region full, viewer banned, etc.) and so a log warning would be misleading.

I effectively implemented the client-closing check in git master 7471bc7, after previously implementing teleport cancellation as well as can be done at the moment. This is done in a more controlled way, with a new Aborting transfer state that the teleport process checks at strategic points.
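The shape of that approach can be sketched as a small state machine (Python for brevity; the state names and `teleport_steps` driver are illustrative, though an Aborting state is what the commit describes). The teleport sequence polls for the Aborting state at checkpoints and bails out cleanly instead of continuing:

```python
from enum import Enum

class TransferState(Enum):
    NONE = 0          # not transferring
    TRANSFERRING = 1
    ABORTING = 2      # cancel/close requested mid-transfer

class TransferStateMachine:
    def __init__(self):
        self.states = {}

    def start(self, agent_id):
        self.states[agent_id] = TransferState.TRANSFERRING

    def abort(self, agent_id):
        # Only a transfer in progress can be aborted.
        if self.states.get(agent_id) == TransferState.TRANSFERRING:
            self.states[agent_id] = TransferState.ABORTING

    def check_abort(self, agent_id):
        # Polled at strategic points in the teleport sequence.
        return self.states.get(agent_id) == TransferState.ABORTING

def teleport_steps(sm, agent_id, steps):
    sm.start(agent_id)
    for step in steps:
        if sm.check_abort(agent_id):  # strategic checkpoint
            return "aborted"
        step()
    return "completed"
```

The windows between checkpoints are exactly the remaining "unpredictable" gaps mentioned below: an abort arriving mid-step is only noticed at the next checkpoint.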

This still leaves windows where behaviour is unpredictable. However, extending the existing AgentCircuitData locking that controls login/logout into entity transfer quickly became extremely hairy.