Introduction

Microsoft ASP.NET Ajax is a very powerful Ajax framework. However, when you build a real Ajax site like those out there in the Web 2.0 world, you face many problems that you will hardly find documented anywhere. In this article, I will show some advanced ideas that I learned while building Pageflakes. We will look at the advantages and disadvantages of batch calls, Ajax call timeouts, browser call jam problems, a bug in ASP.NET 2.0's web service response caching, and so on.

Why Use ASP.NET Ajax

When others see Pageflakes, the first question they ask me is, "Why did you not use the Prototype or Dojo library? Why Atlas?" Microsoft Atlas (renamed to ASP.NET Ajax) is a very promising Ajax framework. Microsoft is putting a lot of effort into it, making lots of reusable components that can really save you a lot of time and give your web application a complete face lift with reasonably little effort or change. It integrates with ASP.NET very well and is compatible with the ASP.NET Membership and Profile providers. The Ajax Control Toolkit project contains 28 extenders that you can drag & drop onto your page, tweak some properties and add pretty cool effects on the page. Check out the examples to see how powerful the ASP.NET Ajax framework has really become.

When we first started developing Pageflakes, Atlas was in its infancy. We were only able to use the page method and Web Service method call features of Atlas. We had to make our own drag & drop, component architecture, pop-ups, collapse/expand features, etc. Now, however, you can have all these from Atlas and thus save a lot of development time. The web service proxy feature of Atlas is a marvel. You can point a <script> tag to an *.asmx file and get a JavaScript class generated right out of the web service definition.

The JavaScript class contains the exact methods that you have on the web service class. This makes it really easy to add or remove web services, or methods within them, without requiring any changes to the client-side plumbing. It also offers a lot of control over the Ajax calls and provides rich exception trapping features in JavaScript. Server-side exceptions are nicely thrown to the client-side JavaScript code, and you can trap them and show nicely formatted error messages to the user. Atlas works really well with ASP.NET 2.0, eliminating the integration problem completely. You need not worry about authentication and authorization on page methods and web service methods. You thus save a lot of code on the client side -- of course, the Atlas Runtime is huge for this reason -- and you can concentrate more on your own code than on building up all this framework-related code.

The recent version of Atlas works nicely with ASP.NET Membership and Profile services, giving you login/logout features from JavaScript without requiring page post-backs. You can read/write Profile objects directly from JavaScript. This comes in very handy when you heavily use ASP.NET Membership and Profile providers in your web application, which we do at Pageflakes.

In earlier versions of Atlas, there was no way to make HTTP GET calls. All calls were HTTP POST and were thus quite expensive. Now you can specify which calls should be HTTP GET. Once you have HTTP GET, you can utilize HTTP response caching features, which I will show you soon.

Batch Calls Are Not Always Faster

ASP.NET Ajax had a feature in the CTP release (and previous releases) that allowed batching of multiple requests into one request. It worked transparently, so you wouldn't notice anything, nor would you need to write any special code. Once you turned on the Batch feature, all web service calls made within a duration got batched into one call. Thus, it saved round-trip time and total response time.

The actual response time might be reduced, but the perceived delay is higher. If three web service calls are batched, the first call does not finish first; all three finish at the same time. If you update the UI upon completion of each web service call, those updates do not happen one-by-one. All of the calls complete in one shot and then the UI gets updated in one shot.

As a result, you do not see incremental updates on the UI. Instead, you see a long delay before the UI updates. If any of the calls -- say the third call -- downloads a lot of data, the user sees nothing happening until all three calls complete. So, the perceived duration of the first call becomes nearly the sum of the durations of all three calls. Although the actual total duration is reduced, the perceived duration is higher. Batch calls are handy when each call is transmitting a small amount of data. In that case, three small calls get executed in one round trip.
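The trade-off can be put into numbers with a rough timing model. The sketch below is only an illustration under stated assumptions: every request pays one network latency plus its own processing and download time, the browser allows two connections, and the millisecond figures and function names (`individualCalls`, `batchedCall`) are made up. It ignores bandwidth sharing and compression.

```javascript
// Hypothetical timing model: each request costs one network latency plus its
// own server + download time. All numbers are illustrative milliseconds.
function individualCalls(durations, latency, maxConnections) {
  // Each slot holds the time at which that connection becomes free.
  const slots = new Array(maxConnections).fill(0);
  return durations.map(function (duration) {
    // Use the connection that frees up first.
    const i = slots.indexOf(Math.min.apply(null, slots));
    slots[i] = slots[i] + latency + duration;
    return slots[i]; // completion time of this call
  });
}

function batchedCall(durations, latency) {
  // One round trip, but nothing is delivered until every response is ready.
  const total = durations.reduce(function (sum, d) { return sum + d; }, 0);
  const done = latency + total;
  return durations.map(function () { return done; });
}

const durations = [100, 100, 800]; // the third call returns a lot of data
const latency = 200;

const separate = individualCalls(durations, latency, 2);
const batched = batchedCall(durations, latency);

console.log("individual completion times:", separate);
console.log("batched completion times:   ", batched);
```

With these assumed numbers, the batch finishes slightly earlier overall, but the first UI update arrives much later than it would with individual calls.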

Let's work on a scenario where three calls are made one-by-one. Here's how the calls actually get executed.

The second call takes a little bit of time to reach the server because the first call is eating up the bandwidth. For the same reason, it takes longer to download. Browsers open two simultaneous connections to the server so, at a time, only two calls are made. Once the second/first call completes, the third call is made. When these three calls are batched into one:

Here the total download time is reduced (if IIS compression is enabled) and there's only one network latency overhead. All three calls get executed on the server in one shot and the combined response is downloaded in one call. To the user, however, the perceived speed is slower because all the UI updates happen after the entire batch call completes. The total duration the batch call will take to complete will always be higher than that for two calls. Moreover, if you do a lot of UI updates one after another, Internet Explorer freezes for a while, giving the user a bad impression. Sometimes, expensive updates on the UI make the browser screen go blank white. Firefox and Opera do not have this problem.

Batch calls have some advantages, too. The total download time is less than that for downloading individual call responses because if you use gzip compression in IIS, the total result is compressed instead of individually compressing each result. So, generally, a batch call is better for small calls. However, if a call is going to send a large amount of data or is going to return, say, 20 KB of response, then it's better not to use batch. Another problem with batch calls occurs when, say, two calls are very small but the third call is quite big. If these three calls get batched, the smaller calls are going to suffer from the long delay due to the third larger call.

Bad Calls Make Good Calls Time Out

If two HTTP calls somehow get stuck for too long, those two bad calls are going to make the good calls that got queued behind them in the meantime expire, too. Here's a nice example:

I am calling a method named Timeout on the server which does nothing but wait for a long time so that the call gets timed out. After that, I am calling a method which does not time out. Guess what the output is:
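A rough simulation, under assumed timings, shows what to expect from such a sequence. It models a browser with two connections and a client-side timeout whose clock starts when the script issues the call, not when the browser actually sends it. The call names, the 5000 ms timeout, the server times, and the `simulate` function are all hypothetical.

```javascript
// Simulates a browser that allows only `maxConnections` simultaneous requests.
// Every call is issued by the script at time 0, and its timeout clock starts
// then, even if the request has to wait for a free connection.
function simulate(calls, maxConnections, timeoutMs) {
  const slots = new Array(maxConnections).fill(0); // when each connection frees up
  return calls.map(function (call) {
    const i = slots.indexOf(Math.min.apply(null, slots));
    const start = slots[i];
    const finish = start + call.serverMs;
    const timedOut = finish > timeoutMs;
    // A timed-out request is abandoned at the deadline, which frees the
    // connection (never earlier than the moment the request was picked up).
    slots[i] = timedOut ? Math.max(start, timeoutMs) : finish;
    return { name: call.name, result: timedOut ? "timed out" : "succeeded" };
  });
}

const results = simulate([
  { name: "Call 1", serverMs: 100 },   // good call
  { name: "Call 2", serverMs: 10000 }, // stuck on the server
  { name: "Call 3", serverMs: 10000 }, // stuck on the server
  { name: "Call 4", serverMs: 100 },   // good call, queued behind the stuck ones
  { name: "Call 5", serverMs: 100 }    // good call, queued behind the stuck ones
], 2, 5000);

results.forEach(function (r) { console.log(r.name + ": " + r.result); });
```

In this model, calls 4 and 5 would each take only 100 ms on the server, yet they expire because both connections stay occupied until their own deadline has passed.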

Only the first call succeeded. So, if at any moment the browser's two connections get jammed, then you can expect that other waiting calls are going to time out as well. In Pageflakes, we used to get nearly 400 to 600 timeout error reports from users' browsers. We could never figure out how this could happen. First, we suspected slow internet connections, but that cannot happen for so many users. Then we suspected something was wrong with the hosting provider's network. We did a lot of network analysis to find out whether there were any problems on the network or not, but we could not detect any.

We used SQL Profiler to see whether there were any long-running queries that exceeded the ASP.NET request execution timeout, but no luck. We finally discovered that it mostly happened due to some bad calls which got stuck and made the good calls expire, too. So, we modified the Atlas Runtime and introduced automatic retry in it, and the problem disappeared completely. However, this auto-retry requires a sophisticated open heart bypass surgery on the ASP.NET Ajax framework JavaScript. The idea is to make each and every call retry once when it times out. In order to do that, we need to intercept all web method calls and implement a hook on the onFailure call-back, which will call the same web method again if the failure reason was a timeout.

Another interesting discovery we made while we were traveling was that whenever we tried to visit Pageflakes from a hotel or an airport wireless internet connection, the first visit always failed and all the web service calls on first attempt always failed. Until we did a refresh, nothing worked. This was another major reason why we implemented immediate auto-retry of web service calls, which fixed the problem.

Here's how to do it. The Sys$Net$WebServiceProxy$invoke function is responsible for making all Web Service calls. So, we replace this function with a custom implementation that passes a custom onFailure call-back. That custom call-back gets fired whenever there's an error or timeout. So, when there's a timeout, it calls this function again and thus a retry happens.
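The retry idea can be sketched independently of the framework. In the example below, `invoke(methodName, params, onSuccess, onFailure)` and the `error.timedOut` flag are hypothetical stand-ins; the framework's own invoke function takes more arguments, and its error object exposes the timeout state differently. `flakyInvoke` is a fake transport that exists only for the demonstration.

```javascript
// Wraps an invoke-style function so that a call failing with a timeout is
// retried exactly once. Signature and error shape are assumptions.
function withRetryOnTimeout(invoke) {
  return function (methodName, params, onSuccess, onFailure) {
    let retried = false;
    function attempt() {
      invoke(methodName, params, onSuccess, function (error) {
        if (error && error.timedOut && !retried) {
          retried = true;
          attempt();          // same method, same parameters, one more time
        } else if (onFailure) {
          onFailure(error);   // real errors and second timeouts reach the caller
        }
      });
    }
    attempt();
  };
}

// Fake transport for demonstration: the first attempt of every method times out.
const attempts = {};
function flakyInvoke(methodName, params, onSuccess, onFailure) {
  attempts[methodName] = (attempts[methodName] || 0) + 1;
  if (attempts[methodName] === 1) {
    onFailure({ timedOut: true });
  } else {
    onSuccess("Hello " + params.name);
  }
}

const invokeWithRetry = withRetryOnTimeout(flakyInvoke);
const log = [];
invokeWithRetry("HelloWorld", { name: "Call 1" },
  function (result) { log.push(result); },
  function () { log.push("failed"); });

console.log(log, attempts);
```

Limiting the retry to a single attempt keeps a genuinely broken method from looping forever; only the second failure is reported to the caller.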

Here you see that the first method succeeded and all the others timed out and retried. However, you will see that after a retry, they all succeeded. This happened because server side methods do not time out on retry. So, this proves that our implementation is correct.

Browsers Allow Two Calls at a Time and Don't Expect any Order

Browsers make at most two concurrent Ajax calls at a time to a domain. If you make five Ajax calls, the browser is going to make two calls first and then wait for any one of them to complete. Each time a call completes, it issues the next queued call, until the remaining three calls have been made. Moreover, you cannot expect calls to complete in the same order as you make them. Here's why:

Here you see that call 3's response download is quite big, and thus takes longer than call 5. So, call 5 actually gets executed before call 3. The world of HTTP is unpredictable.
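The reordering is easy to reproduce with a small model of the two-connection limit. The durations below are invented for illustration and `completionOrder` is a hypothetical helper; the only assumption is that each queued call is sent on whichever connection frees up first.

```javascript
// Returns the order in which calls complete when at most `maxConnections`
// requests run at once. Durations are illustrative milliseconds.
function completionOrder(durations, maxConnections) {
  const slots = new Array(maxConnections).fill(0); // when each connection frees up
  const finished = durations.map(function (duration, index) {
    const i = slots.indexOf(Math.min.apply(null, slots));
    slots[i] += duration;
    return { call: index + 1, finishedAt: slots[i] };
  });
  finished.sort(function (a, b) { return a.finishedAt - b.finishedAt; });
  return finished.map(function (f) { return f.call; });
}

// Call 3 has a large response; the others are small.
const order = completionOrder([100, 150, 600, 100, 120], 2);
console.log("completion order:", order);
```

Code that depends on one call's result being available before another's therefore has to chain the calls explicitly rather than rely on the order in which they were issued.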

Browsers Do Not Respond when More Than Two Calls Are in Queue

Try this: go to any start page in the world that will load a lot of RSS on the first visit (e.g. Pageflakes, Netvibes, Protopage) and, while loading, try to click on a link that will take you to another site or try to visit another site. You will see that the browser is stuck. Until all queued Ajax calls in the browser complete, the browser will not accept any other activity. This is worst in Internet Explorer; Firefox and Opera do not have this much of a problem.

The problem is that when you make a lot of Ajax calls, the browser keeps all calls in a queue and executes two at a time. So, if you click on something or try to navigate to another site, the browser has to wait for running calls to complete before it can take another call. The solution to this problem is to prevent more than two calls from being queued in the browser at a time. We need to maintain a queue ourselves and send calls to the browser's queue from our queue one-by-one. The solution is quite shocking; brace for impact:

QueuedCall encapsulates one web method call. It takes all the parameters of the actual web service call and overrides the onSuccess and onFailure call-backs. We want to know when a call completes or fails so that we can issue another call from our queue. GlobalCallQueue maintains the list of all web service calls. Whenever a web method is called, we first queue the call in the GlobalCallQueue and execute calls from the queue one-by-one ourselves. It ensures that the browser does not get more than 2 web service calls at a time and thus the browser does not get stuck. In order to enable the queue-based call, we need to override the ASP.NET Ajax web method invocation again, as we did before.
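The queueing idea described above can be sketched in a few lines. This is a simplified illustration, not the Pageflakes implementation: `CallQueue` and its `send(call, done)` transport are hypothetical names, and the transport is expected to invoke `done` when a request has either succeeded or failed.

```javascript
// Minimal client-side call queue: never hands the browser more than
// `maxInFlight` requests at a time.
function CallQueue(send, maxInFlight) {
  this.send = send;
  this.maxInFlight = maxInFlight;
  this.pending = [];   // calls waiting in our own queue
  this.inFlight = 0;   // calls currently handed to the browser
}

CallQueue.prototype.enqueue = function (call) {
  this.pending.push(call);
  this.pump();
};

CallQueue.prototype.pump = function () {
  const self = this;
  while (this.inFlight < this.maxInFlight && this.pending.length > 0) {
    const call = this.pending.shift();
    this.inFlight++;
    this.send(call, function () {
      self.inFlight--;
      self.pump();     // a slot is free: hand over the next queued call
    });
  }
};

// Demonstration with a fake transport that completes calls only on demand.
const completions = [];
let peak = 0;
const queue = new CallQueue(function (call, done) {
  peak = Math.max(peak, queue.inFlight);
  completions.push(done);
}, 2);

["a", "b", "c", "d", "e"].forEach(function (name) { queue.enqueue(name); });
console.log("in flight:", queue.inFlight, "waiting:", queue.pending.length);

while (completions.length > 0) { completions.shift()(); }
console.log("peak in flight:", peak, "waiting:", queue.pending.length);
```

Because the remaining calls wait in script-level memory rather than in the browser's own queue, the browser stays free to start a navigation or another request.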

Caching Web Service Response on the Browser and Saving Bandwidth Significantly

Browsers can cache images, JavaScript and CSS files on a user's hard drive, and they can also cache XML HTTP calls if the calls are HTTP GET. The cache is based on the URL. If it's the same URL and it's cached on the computer, then the response is loaded from the cache and not from the server when it is requested again. Basically, the browser can cache any HTTP GET call and return cached data based on the URL. If you make an XML HTTP call as HTTP GET and the server returns some special header that informs the browser to cache the response, on future calls, the response will be immediately returned from the cache. This saves the delay of network round trip and download time.

At Pageflakes, we cache the user's state so that when the user visits again the following day, the user gets a cached page that loads instantly from the browser cache, not from the server. Thus, the second-time load becomes very fast. We also cache several small parts of the page that appear on users' actions. When the user does the same action again, a cached result is loaded immediately from the local cache, which saves on the network round trip time. The user gets a fast-loading site and a very responsive site. The perceived speed increases dramatically.

The idea is to make HTTP GET calls while making Atlas web service calls and return some specific HTTP Response headers that tell the browser to cache the response for some specific duration. If you return the Expires header during the response, the browser will cache the XML HTTP response. There are two headers that you need to return with the response, which will instruct the browser to cache the response:

HTTP/1.1 200 OK
Expires: Tue, 01 Jan 2030 00:00:00 GMT
Cache-Control: public

This will instruct the browser to cache the response until January, 2030. As long as you make the same XML HTTP call with the same parameters, you will get a cached response from the computer and no call will go to the server. There are more advanced ways to get further control over response caching. For example, here is a header that will instruct the browser to cache for 60 seconds, but contact the server and get a fresh response after 60 seconds. It will also prevent proxies from returning cached responses when the browser local cache expires after 60 seconds.
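As an illustration of the second variant, the sketch below assembles one plausible set of header values for a 60-second browser-only cache. `buildCacheHeaders` is a hypothetical helper, not part of ASP.NET or ASP.NET Ajax; the directives are standard Cache-Control values, but the exact combination is an assumption about what such a response would carry.

```javascript
// Builds response headers that let the browser cache a GET response for
// `seconds` seconds. Hypothetical helper for illustration only.
function buildCacheHeaders(seconds, now) {
  const expires = new Date(now.getTime() + seconds * 1000);
  return {
    "Expires": expires.toUTCString(),
    // private: only the browser may cache, shared proxies may not
    // must-revalidate / proxy-revalidate: stale copies must be re-checked
    // max-age: lifetime of the cached copy in seconds
    "Cache-Control": "private, must-revalidate, proxy-revalidate, max-age=" + seconds
  };
}

const headers = buildCacheHeaders(60, new Date(Date.UTC(2007, 0, 1, 12, 0, 0)));
console.log(headers);
```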

The Expires header is set properly, but the problem is with the Cache-Control header. It is showing that max-age is set to zero, which will prevent the browser from doing any kind of caching. If you seriously want to prevent caching, you should emit such a Cache-Control header. It looks like exactly the opposite thing happened. The output is, as usual, incorrect and not cached:

There's a bug in ASP.NET 2.0 where you cannot change the max-age header. As max-age is set to zero, ASP.NET 2.0 sets Cache-Control to private because max-age = 0 means that no cache is needed. There's no way you can make ASP.NET 2.0 return proper headers that cache the response through the public API. Time for a hack. After decompiling the code of the HttpCachePolicy class (the Context.Response.Cache object's class), I found the following code:

Somehow, this._maxAge is getting set to zero and the check "if (!this._isMaxAgeSet || (delta < this._maxAge))" is preventing it from getting set to a bigger value. Due to this problem, we need to bypass the SetMaxAge function and set the value of the _maxAge field directly, using Reflection.
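To see why the guard blocks the update, it helps to model it outside .NET. The following is a JavaScript model built only from the check quoted above; the real class is .NET code, `CachePolicyModel` is a hypothetical name, and the direct field assignment at the end merely stands in for writing the private field via Reflection.

```javascript
// JavaScript model of the quoted guard; not the actual .NET implementation.
function CachePolicyModel() {
  this._isMaxAgeSet = false;
  this._maxAge = 0;
}

CachePolicyModel.prototype.setMaxAge = function (delta) {
  // Same condition as quoted: only the first value, or a smaller one, is accepted.
  if (!this._isMaxAgeSet || delta < this._maxAge) {
    this._maxAge = delta;
    this._isMaxAgeSet = true;
  }
};

const policy = new CachePolicyModel();
policy.setMaxAge(0);    // something sets max-age to zero first
policy.setMaxAge(60);   // ignored: 60 is not smaller than 0
const viaSetter = policy._maxAge;

policy._maxAge = 60;    // stand-in for setting the private field directly
console.log("via setter:", viaSetter, "after direct write:", policy._maxAge);
```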

Now max-age is set to 60 and thus the browser will cache the response for 60 seconds. If you make the same call again within 60 seconds, it will return the same response. Here's a test output that shows the date time returned from the server:

After 1 minute, the cache expires and the browser makes a call to the server again. The client-side code is like this:

What happens when you call the call method? Do you get 1 on the debug console? No, you get null on the debug console because this is no longer the instance of the class. This is a common mistake that everyone makes. As this is not yet covered in the Atlas documentation, I have seen many developers spend time finding out what's wrong. Here's the reason: we know that whenever JavaScript events are raised, this refers to the HTML element that produced the event. So, if you do this:

If you click the button, you see ButtonID instead of 1. The reason is that the button is making the call. So, the call is made within the button object's context and thus this maps to the button object. Similarly, when XML HTTP raises the onreadystatechange event, which Atlas traps before firing the call-back, the code execution is still in the XML HTTP object's context. It's the XML HTTP object that raises the event. As a result, this refers to the XML HTTP object, not your own class where the call-back is declared. In order to make the call-back fire in the context of the instance of the class so that this refers to the instance of the class, you need to make the following change:
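The effect and its fix can be demonstrated in plain JavaScript. In this sketch, `SampleClass` and `fireCallback` are hypothetical: `fireCallback` stands in for any framework or DOM object that invokes a call-back from its own context, and the standard `Function.prototype.bind` is used to pin `this` to the instance.

```javascript
function SampleClass() {
  this.id = 1;
}

SampleClass.prototype.onSuccess = function () {
  // `this` is whatever object the caller used as context.
  return this ? this.id : undefined;
};

// Hypothetical stand-in for a framework that fires a call-back later, from
// some other object's context (a button, an XML HTTP object, and so on).
function fireCallback(callback) {
  const otherContext = { id: "ButtonID" };
  return callback.call(otherContext);
}

const instance = new SampleClass();

// Passing the bare method loses the instance: `this` is the other object.
const unbound = fireCallback(instance.onSuccess);

// Binding the method first keeps `this` pointing at the instance.
const bound = fireCallback(instance.onSuccess.bind(instance));

console.log("unbound:", unbound, "bound:", bound);
```

ASP.NET Ajax ships a helper, Function.createDelegate(instance, method), that serves the same purpose as `bind` here.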

HTTP POST Is Slower than HTTP GET, but It Is Default in ASP.NET Ajax

ASP.NET Ajax, by default, makes HTTP POST for all web service calls. HTTP POST is more expensive than HTTP GET. It transmits more bytes over the wire, thus taking precious network time, and it also makes ASP.NET do extra processing on the server end. So, you should use HTTP GET as much as possible. However, HTTP GET does not allow you to pass objects as parameters. You can pass only numbers, strings and dates. When you make an HTTP GET call, Atlas builds an encoded URL and makes a hit to that URL. So, you must not pass so much content that the URL becomes longer than 2048 characters. As far as I know, that is about the maximum URL length browsers reliably support. In order to enable HTTP GET on a web service method, you need to decorate the method with the [ScriptMethod(UseHttpGet=true)] attribute.
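To make the URL-length concern concrete, here is a sketch of how parameters might be folded into a GET URL and checked against the limit. `buildGetUrl`, the service path, and the method name are hypothetical, and the JSON-then-URL-encode scheme is an assumption for illustration rather than a description of the framework's exact wire format.

```javascript
const MAX_URL_LENGTH = 2048; // limit quoted in the text

// Hypothetical helper: serializes each parameter as JSON, URL-encodes it, and
// refuses to build a URL that exceeds the limit.
function buildGetUrl(servicePath, methodName, params) {
  const query = Object.keys(params).map(function (name) {
    return encodeURIComponent(name) + "=" +
           encodeURIComponent(JSON.stringify(params[name]));
  }).join("&");
  const url = servicePath + "/" + methodName + (query ? "?" + query : "");
  if (url.length > MAX_URL_LENGTH) {
    throw new Error("URL too long for HTTP GET; use POST for " + methodName);
  }
  return url;
}

const exampleUrl = buildGetUrl("TestService.asmx", "GetPage",
  { pageId: 5, title: "my page" });
console.log(exampleUrl);
```

A check like this makes the fallback decision explicit: small, cacheable reads go out as GET, and anything that would overflow the URL is sent as POST instead.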

Another problem of POST vs. GET is that POST makes two network round trips. When you first make a POST, the web server sends an "HTTP 100 Continue" message, which means that the web server is ready to accept the content. After that, the browser sends the actual data. So, initiation of a POST request takes more time than GET. Network latency (round trip time between your computer and the server) is the biggest concern in Ajax applications because Ajax makes many small calls that need to be done within milliseconds. Otherwise, the application does not feel smooth and annoys users. Ethereal is a nice tool to see what happens under the hood on POST and GET:

From the above picture, you see that POST requires a confirmation from the web server, "HTTP/1.1 100 Continue," before sending the actual data. After that, it transmits the data. On the other hand, GET transmits the data without waiting for any confirmation. So, you should use HTTP GET while downloading data from a server like parts of pages, contents in a grid, a block of text, etc. However, you should not use HTTP GET to send data to a server like submission of web forms.

Conclusion

The above extreme hacks are already implemented in Pageflakes, not exactly as described here, but the principles are the same. So, you can happily rely on these techniques. They will save you from many problems that you will probably never notice in your development environment, but that people from all over the world will face when you go for large-scale deployment. Having these tricks implemented right from the beginning will save you a lot of development and customer support effort. Keep both eyes on my blog for more tricks to come.