After what seems like ages we finally managed to get to the bottom of the weird issues we were having with a load-balanced instance of Umbraco hosted at Amazon Web Services. Despite having followed Umbraco’s load balancing guide and tested extensively, we were seeing some weird behaviours for multi-page publishing and node copy operations. Multiple threads were created for each request, which led to unexpected outcomes for users.

Eventually we managed to isolate this to the default (and unchangeable at time of writing) 60 second timeout behaviour of AWS’ ELB infrastructure combined with the long-running nature of the POST requests from Umbraco. The easiest way to see the real side effect of the 60 second timeout is to put a debugging proxy like Fiddler or Charles between your browser and the ELB. What we saw is below.

So, you can see the culprit right there in the red square – the call to the publish.aspx page is terminated at 60 seconds by the ELB which causes the browser to resubmit it – Ouch! This also occurs when you copy or move nodes and the process exceeds 60 seconds – you get multiple nodes!

To be clear – this is not a problem that is isolated to Umbraco – there is a lot of software that relies on long-running HTTP POST operations with the expectation that they will run to completion.

Now there is probably a range of reasons why AWS has this restriction. The forum posts (dating back to 2009) don’t enlighten, but it’s not hard to see why: in an “elastic” environment anything that takes a long time to complete may be a bad thing (you can’t “scale up” your ELB if it’s still processing a batch of long-running requests). I can see the logic to this restriction – it simplifies the problems the AWS engineers need to solve – but it does introduce a limitation that isn’t covered clearly enough in any official AWS documentation.

The real solution here has to come from better software design that takes into account this limitation of the infrastructure and makes use of patterns like Post-Redirect-Get to submit a short POST request to initiate the process on the server, redirect to another page and then utilise async calls from the browser to check on the status of the process.
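To make that pattern a bit more concrete, here is a minimal sketch in Python rather than anything Umbraco-specific. It assumes an in-memory job registry and a background thread; the names start_job, get_status, slow_publish and the /jobs/ URL are all made up for illustration, and a real load-balanced site would need shared storage so that any node behind the ELB could answer the status check.

```python
import threading
import time
import uuid

# Hypothetical in-memory job registry. A real load-balanced site would need
# shared storage (database, cache) so any node can answer status requests.
_jobs = {}
_lock = threading.Lock()


def _run(job_id, work):
    """Run the slow work on a background thread and record the outcome."""
    try:
        update = {"state": "done", "result": work()}
    except Exception as exc:
        update = {"state": "failed", "error": str(exc)}
    with _lock:
        _jobs[job_id].update(update)


def start_job(work):
    """Handle the short POST: register the job, start it, return at once.

    Returns (303, status_url) so the browser follows the redirect with a GET
    instead of holding the POST open past the load balancer timeout.
    """
    job_id = uuid.uuid4().hex
    with _lock:
        _jobs[job_id] = {"state": "running"}
    threading.Thread(target=_run, args=(job_id, work), daemon=True).start()
    return 303, "/jobs/" + job_id


def get_status(status_url):
    """Handle the cheap GET that the browser polls asynchronously."""
    job_id = status_url.rsplit("/", 1)[-1]
    with _lock:
        job = _jobs.get(job_id)
        return dict(job) if job else {"state": "unknown"}


def slow_publish():
    """Stand-in for a long-running publish or copy operation."""
    time.sleep(0.2)
    return "published 3 pages"


if __name__ == "__main__":
    code, url = start_job(slow_publish)   # the POST returns immediately
    status = get_status(url)
    while status["state"] == "running":   # the browser would poll like this
        time.sleep(0.05)
        status = get_status(url)
    print(code, status)
```

The point of the design is that no single HTTP request lasts longer than a few milliseconds, so the 60 second timeout never comes into play and the browser has nothing to resubmit.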

Yes, I know, we could probably run our own instances with HAProxy on, but why build more infrastructure to manage when what’s there is perfectly fit for purpose?

Updated – You Have An Alternative

10 September – I’ve been lucky enough to be attending the first AWS Architecture course run by Amazon here in Sydney and the news on this front is interesting. By default you get 60 seconds, *but* you can request (via your AWS Account Manager or Architect) that this timeout be increased up to a maximum of 17 minutes. This is applied on a per-ELB basis, so if you create more ELB instances you would need to make the same request to AWS.

My advice: fix your application before you ask for a non-standard ELB setup.

Not Just For AWS ELB

Now, chaps (and ladies), you also need to be aware that this issue will raise its head in Windows Azure as well, but most likely after a longer duration. A very obliquely written blog post on MSDN suggests it will now be based on the duration AND the number of concurrent connections you have.

Having worked with every version of Team Foundation Server (TFS) since its inception I was keen to see what API support “TFSPreview.com” has. The good news is that (at time of blogging) the API accessibility is all there, is free and aligns with the on-premise API and client object model.

I’ve always felt the strongly-typed client object model and library is a strength of the TFS offering over many of its competitors and the classes that compose it provide some good extensibility possibilities – I’ve been on a project where all “Requirements” work item types from an MSF Agile project were exported via the API to a Word document and formatted for presentation to the customer (and we could re-run it any time!)

This past week has seen the RTM availability for a bunch of Microsoft products including Visual Studio and Team Foundation Server 2012, which means that an RTM set of the TFS Client Object Model assemblies is now available. After grabbing them I fired up Visual Studio, added in the correct references and was able to connect to our TFS Preview instance and perform some query magic.

So the above is just a simple and incomplete example – but it will connect and return results for you. The TFS extensibility options are pretty wide and varied as can be seen on MSDN! I’ll post more stuff up here over time as I work through my planned use of this server (a custom team board in our office… too big perhaps, but hey…).

If you don’t have a TFSPreview account I’d recommend getting one and having a play – Microsoft has said the platform will be free through until the end of 2012 so I’d say there’s no better way to try it out than that. They are also shipping updates for the platform every 3 weeks which will be ahead of the on-premise version which will get quarterly updates (based on the TFSPreview updates). Get in and get informed.