Thursday, September 10, 2009

Yahoo has a great performance analysis tool in the form of a Firefox add-on: YSlow (yes, you need to install the also great Firebug add-on first). The YSlow site already explains all of the best practices in detail here.

Yahoo's explanations are in general clear enough for the average Java EE web application developer, but when YSlow's Server category comes into the picture, Yahoo unfortunately only gives examples based on Apache HTTP Server and PHP, and in a few cases also IIS. In this article I'll "translate" the relevant subcategories into the Java EE approach based on Apache Tomcat 6.0. As a bonus, a few more best practices are added and explained in detail.

The first rule of YSlow's Server category is to use a content delivery network (CDN). Well, the idea is nice, but this is in my opinion not a "must". Having a secondary domain (no, not a subdomain) for pure static content is a more general practice to gain performance in serving static content. A web browser is namely limited to a certain maximum number of simultaneous open connections per domain. The HTTP/1.1 specification recommended a limit of 2, which the older browser versions follow, while current browsers typically allow around 6 connections per host. This can also be changed using a simple regedit (MSIE) or by editing about:config (Firefox). These kinds of tweaks are usually only done by the more advanced users with an above average knowledge of the software they use.

So, to give a broader range of visitors a better performance experience, it may be better to have a secondary domain for pure static content only. E.g. onedomain.com for JSP files and anotherdomain.com for CSS/JS/Flash/etc files. Or of course a CDN as suggested by Yahoo, but again, a CDN for private static data is in my opinion a bit nonsensical. After all, if you apply the performance rules for static content correctly, then the static content will only be requested when it is really needed, which makes a secondary domain or CDN largely superfluous, unless you have a web application which needs to serve a lot of non-layout-related images, such as photography.

For 3rd party public static content it is however definitely worth the effort to link to a CDN offered by the vendor itself, if any. For example, jQuery offers several CDN hosts. It's a win-win situation for both your server and the client.

The second rule of YSlow's Server category is to add an Expires header. A very good point. The Expires header prevents the browser from re-requesting the same static content (JS/CSS/images/etc) every time, which is only a waste of the available time, connections and bandwidth. When you're serving static content from the public webcontent in Tomcat, then the DefaultServlet is responsible for serving the content. It unfortunately does nothing with the Expires header. Although it supports the Last-Modified header, revalidating against it still costs a conditional GET request (answered with a 304 Not Modified), which is already one connection and request too many when the content has not changed after all. You can however override the DefaultServlet with your own implementation as outlined here. How to do it effectively is already covered by the earlier FileServlet article on this blog. This servlet is a well suited solution for the second, third as well as the fourth rule of YSlow's Server category.
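To give an idea of what a far-future expiry boils down to on the wire, here is a minimal sketch in plain Java. It is not the FileServlet code; the class name StaticCacheHeaders and the choice to also emit a Cache-Control max-age are my own assumptions for illustration. The returned values are what you would pass to response.setHeader() in a servlet.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: builds the caching headers for a static resource.
final class StaticCacheHeaders {

    // HTTP dates are always in GMT with a two-digit day of month.
    private static final DateTimeFormatter HTTP_DATE = DateTimeFormatter
        .ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH)
        .withZone(ZoneOffset.UTC);

    static Map<String, String> forStaticContent(Instant now, Duration lifetime) {
        Map<String, String> headers = new LinkedHashMap<>();
        // Absolute expiry moment, understood by HTTP 1.0 and 1.1 clients.
        headers.put("Expires", HTTP_DATE.format(now.plus(lifetime)));
        // Relative expiry in seconds, the HTTP 1.1 counterpart.
        headers.put("Cache-Control", "public, max-age=" + lifetime.getSeconds());
        return headers;
    }
}
```

With a lifetime of one week, a browser that obeys these headers will not ask for the resource again for seven days, unless the user forces a refresh.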

As to the Cache-Control header for dynamic content, the general practice is that we just want to avoid caching of dynamic content, especially the pages containing forms or the pages in a restricted area. You can do that by adding the following response headers in the base controller Servlet or Filter of your web application:
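A minimal sketch of those three headers, kept free of the Servlet API so that it runs standalone. The class name NoCacheHeaders is made up for illustration; in a real Servlet or Filter you would simply call response.setHeader(name, value) for each entry.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper holding the headers which disable caching of dynamic pages.
final class NoCacheHeaders {

    static Map<String, String> create() {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Cache-Control", "no-cache, no-store, must-revalidate"); // HTTP 1.1.
        headers.put("Pragma", "no-cache"); // HTTP 1.0.
        headers.put("Expires", "0"); // Proxies; an invalid date counts as already expired.
        return headers;
    }
}
```

Setting them in one central Filter mapped on the dynamic pages saves you from repeating them in every servlet.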

There is a little story behind the no-store and must-revalidate attributes of the Cache-Control header: some web browsers (including Firefox) still cache the page when those attributes are omitted! According to the HTTP specification, no-cache alone should have been sufficient. But OK, now we at least have the 'magic' three headers which should work for all decent web browsers and proxies.

The Expires header is useful, but .. with a (too) far-future Expires header, the client won't check for any updates of the static resource anymore until the expiry date has passed, or you clear the browser cache, or you do a hard refresh (CTRL+F5)! A common practice is then to append a unique query string to the URL of the static content, denoting a timestamp of the last file modification or the server startup time, so that the browser is forced to re-request it whenever the query string changes.

Determining the last modification time on every request is more expensive than determining the server startup time only once in the application's lifetime, and the latter is generally sufficient. Whenever the server restarts, the query string changes and the browser requests the static resources anew. Assuming that your server doesn't restart every minute or so, this doesn't do much harm. Here's an example of how to do it using a ServletContextListener:
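A minimal sketch of the idea, written in plain Java so that it runs without a container. The class name StaticResourceVersion and the query parameter name "v" are assumptions of mine. In a real webapp you would create the instance with System.currentTimeMillis() inside ServletContextListener#contextInitialized() and store it in the application scope with event.getServletContext().setAttribute().

```java
// Hypothetical helper: remembers the startup time once and appends it to static URLs.
final class StaticResourceVersion {

    private final long startupTime;

    StaticResourceVersion(long startupTime) {
        this.startupTime = startupTime; // Determined only once, at application startup.
    }

    // Appends the startup time as a query string, so the URL changes after each restart.
    String apply(String url) {
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + "v=" + startupTime;
    }
}
```

For example, with a startup time of 1252540800000, apply("/css/style.css") yields /css/style.css?v=1252540800000, which the browser treats as a brand new resource after the next restart.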

Appending a timestamp query string to static CSS files is nice, but .. this doesn't cover the CSS background images! Each of those counts as a separate request. If you don't append a timestamp query string to them, then they won't be checked for any updates. How to handle it may differ per environment, so I'll only describe my general approach to give the idea. You might need to finetune it further to suit your environment. I myself use a batch job using YUI Compressor (yes, it's a Java API!) to minify all CSS and JS files before deployment. After getting the minified result, a regular expression is used to find all background images in the CSS source, File#lastModified() is used to get the last modification timestamp of each image, and finally the original URLs are replaced by the timestamped ones. Here's a basic example of the Minifier (keep in mind that it may need to be modified to suit your environment):
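A minimal sketch of the regular expression step only. It leaves out the YUI Compressor part altogether, and the class name CssImageVersioner and the way the timestamp is looked up (a function passed in by the caller) are my own assumptions, so adapt it to your own batch job.

```java
import java.util.function.ToLongFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: appends a timestamp to every url(...) reference in a CSS source.
final class CssImageVersioner {

    // Matches url(path), url('path') and url("path"), with optional surrounding whitespace.
    private static final Pattern CSS_URL =
        Pattern.compile("url\\(\\s*(['\"]?)([^'\")]+?)\\1\\s*\\)");

    static String appendTimestamps(String css, ToLongFunction<String> lastModified) {
        Matcher matcher = CSS_URL.matcher(css);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String quote = matcher.group(1);
            String path = matcher.group(2);
            String replacement = matcher.group(); // Default: leave the match untouched.
            if (!path.startsWith("data:")) { // Inline data URIs have no file to check.
                String separator = path.contains("?") ? "&" : "?";
                replacement = "url(" + quote + path + separator
                    + lastModified.applyAsLong(path) + quote + ")";
            }
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

In the batch job the lookup could be something like path -> new File(cssFolder, path).lastModified(), where cssFolder is the folder of the CSS file being processed.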

The third rule of YSlow's Server category is to gzip components. Yes, that's also a very good point. Gzip is relatively fast and can save up to 70% of the network bandwidth. For static text content you can just use the aforementioned FileServlet article on this blog. For dynamic text content you'll need to configure the application server so that it uses GZIP compression. This is usually explained in the documentation of the application server in question. In case of Apache Tomcat 6.0 you can find it here. You need to extend the <Connector> element in Tomcat/conf/server.xml with a compression attribute which is set to "on". Here's a basic example (note the last attribute):
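As a sketch, assuming the stock HTTP connector from the default server.xml (the port, connectionTimeout and redirectPort values may differ in your setup), it could look like this:

```xml
<Connector port="8080" protocol="HTTP/1.1"
    connectionTimeout="20000"
    redirectPort="8443"
    compression="on" />
```

If you also want other content types than the default HTML, XML and plain text to be compressed, such as CSS and JS, have a look at the compressableMimeType attribute in the Tomcat documentation.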

The fourth rule of YSlow's Server category is to configure ETags. Again a good point, and again also covered by the aforementioned FileServlet article on this blog. ETags are not needed for dynamic content, as it is usually not to be cached anyway.
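To give an idea of what the ETag handling for a static file comes down to, here is a minimal sketch in plain Java. The class name ETags and the choice to build the tag from the file name, length and last modified timestamp are assumptions for illustration; any value which changes whenever the file changes will do.

```java
// Hypothetical helper: creates an ETag for a file and checks it against If-None-Match.
final class ETags {

    // The tag changes whenever the file name, size or modification time changes.
    // Assumes the file name itself contains no double quotes.
    static String of(String fileName, long length, long lastModified) {
        return "\"" + fileName + "_" + length + "_" + lastModified + "\"";
    }

    // Returns true when the client's cached copy is still valid (respond with 304).
    static boolean matches(String ifNoneMatch, String eTag) {
        if (ifNoneMatch == null) {
            return false; // No conditional header sent, so serve the full content.
        }
        for (String candidate : ifNoneMatch.split(",")) {
            String value = candidate.trim();
            if (value.equals("*") || value.equals(eTag)) {
                return true;
            }
        }
        return false;
    }
}
```

In a servlet you would compare the If-None-Match request header against the tag of the requested file and send a 304 Not Modified without a body when matches() returns true.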

The fifth rule of YSlow's Server category is to flush the buffer early.
Well, that's also a good point: flushing the response between </head> and <body>. But that's one of the 0.01% of cases wherein you can't easily avoid a (cough) scriptlet, and thus its use is more or less forgivable.

...
</head><% response.flushBuffer(); %><body>
...

However, in case of Apache Tomcat 6.0 the HTTP connector uses a buffer size of 2KB (2048 bytes) by default, which is configurable using the bufferSize attribute. This is generally more than good enough. The average HTML head with the "default" minimum tags (doctype, html, head, meta content type, meta description, base, favicon, CSS file, JS file and title) already amounts to 1 to 1.5KB in size. Anyway, in one of my latest webapps I have used a slightly modified WhitespaceFilter which removes all whitespace inside the <body> and flushes the stream right before the <body>.

When your web application needs to handle more than around 1,000 concurrent connections, or when your web server is also used for other purposes than only serving the web, then it's generally better to use non-blocking IO streams instead of blocking IO streams. It scales much better, as you don't need one implicitly opened thread per opened IO resource anymore; instead, basically all resources are managed by a single thread. This saves the server from a lot of threads, the overhead of controlling them, and the steeply growing performance drop when the number of concurrent threads (HTTP connections) gets high. Performance also no longer depends on the number of available threads, but rather on the amount of available heap memory. It can go up to around 20,000 concurrent connections on a single thread instead of around 5,000 concurrent connections on as many threads.
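To illustrate the "all resources managed by a single thread" part, here is a minimal sketch using the java.nio Selector API. It is not how Tomcat's connector is implemented; it uses in-memory pipes instead of HTTP connections, and the class name SingleThreadMultiplexer is made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo: one thread reads from many channels by means of a single Selector.
final class SingleThreadMultiplexer {

    // Opens the given number of pipes, writes one message into each of them
    // and reads them all back on the calling thread.
    static Map<Integer, String> readAll(int count) {
        try (Selector selector = Selector.open()) {
            Pipe[] pipes = new Pipe[count];
            for (int i = 0; i < count; i++) {
                pipes[i] = Pipe.open();
                pipes[i].source().configureBlocking(false); // Required for registration.
                pipes[i].source().register(selector, SelectionKey.OP_READ, i); // Attach pipe id.
                byte[] message = ("message " + i).getBytes(StandardCharsets.UTF_8);
                pipes[i].sink().write(ByteBuffer.wrap(message));
            }
            Map<Integer, String> received = new TreeMap<>();
            ByteBuffer buffer = ByteBuffer.allocate(64);
            while (received.size() < count) {
                selector.select(); // Blocks until at least one channel is readable.
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    buffer.clear();
                    int read = ((Pipe.SourceChannel) key.channel()).read(buffer);
                    if (read <= 0) {
                        continue; // Nothing available after all, wait for the next select.
                    }
                    buffer.flip();
                    String text = StandardCharsets.UTF_8.decode(buffer).toString();
                    received.put((Integer) key.attachment(), text);
                    key.cancel(); // Done with this channel.
                }
            }
            for (Pipe pipe : pipes) {
                pipe.sink().close();
                pipe.source().close();
            }
            return received;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that no thread is created anywhere: the calling thread waits in select() and handles whichever channel has data available, which is the same principle a NIO based HTTP connector applies to sockets.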

Most decent servers support NIO, as does Apache Tomcat 6.0 in the HTTP connector. Basically all you need to do is to replace the default protocol attribute of "HTTP/1.1" with "org.apache.coyote.http11.Http11NioProtocol". In some full-fledged Java EE application servers like Sun Glassfish, whose NIO framework is known as "Grizzly", this is turned on by default.
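Applied to the stock HTTP connector in Tomcat/conf/server.xml, the change could look like this (a sketch; the port, connectionTimeout and redirectPort values are assumed defaults, only the protocol attribute matters):

```xml
<Connector port="8080" protocol="org.apache.coyote.http11.Http11NioProtocol"
    connectionTimeout="20000"
    redirectPort="8443" />
```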

That's basically all! Restart Tomcat and now it will use NIO to handle HTTP connections. Just make sure that you give it enough memory (also in the IDE when developing with it). You can start with 512MB, but 1024MB is better.

Copyright - No text of this article may be taken over without explicit authorisation. Only the code is free of copyright. You can copy, change and distribute the code freely. Just mentioning this site should be fair.

Tuesday, September 1, 2009

What to write?

It is not that I'm out of inspiration. On the contrary, I have so much inspiration and so little time that I don't know what to write and finish. When I get inspiration, I start with some introductory text and some code samples and/or notes. But when I run out of time in the meantime, I leave it as it is for too long, until I get inspiration for another subject .. and the story continues. I have several unfinished drafts. Here are some examples of unfinished articles:

Doing the SQL JOIN in DAO

Uploading files in JSP

Table display, paging and sorting in JSP

Export to Excel

Using jQuery with JSP/Servlet

Website performance tips and tricks

Using Google Analytics, I also analyzed the 'missing hits': Google searches which incorrectly listed this blog in the results, but whose subjects are in my opinion indeed worth a blog article, because there is almost no clear information about them on the world wide web. Unfortunately I don't have any practical experience with them, so I can't write something clear and robust enough about them yet; I need more time to play with them first:

Using JPA in Eclipse, Tomcat and/or JSF

Populate child menus with Ajax in JSF

Geez. I like my job and my family, but they are taking up too much of my time :o) Which one should I continue with now?
