I know this site is about scalable web site design, but as I can't find any site about graceful failure under "slashdotted"-like pressure, I'll ask here.

Does anyone have a sensible way, once you have a "web application" that either won't scale or can't scale, to give some users a good, consistent experience and bounce other users to a busy-site page? I have seen sites do this to varying degrees, some of which work better than others, but no explanations beyond simply bouncing requests to a "we're busy" page server when you have more than a given number of connections. That is obviously useless, as a web page usually requires multiple connections (ignoring keep-alive, pipelining, etc.) to render completely.

The usual problem is users getting a page but not the "furniture" for that page, like images or CSS. Other problems are having to wait ages to get the busy page, or the site being slow even if you do "get in". And some sites let a user "in" and then, as they browse around, suddenly bounce them out to the busy page.

Obviously, not being the developer for the sites I deal with (I am an infrastructure bod), I can't solve the problem where it should have been pre-emptively solved. That is to say, I can't write the code to be scalable, or rewrite it to do some simple session filtering or the like (and, not being a developer, I get dirty looks when I point developers at information like your site ... I can hear them thinking "how dare you suggest I don't know how to code a web site, you lowly infrastructure cretin").

Before developers lynch me online, I should point out that sometimes the reason a site can't scale is that I can't get new hardware in quickly enough, but then who knows when you will get slashdotted, right? So my question applies even when a developer of genius-level brilliance has built an unsurpassably scalable web site for me to run the infrastructure for.

My best guess so far is using something like HAProxy to load-balance sessions, and then use its more advanced total-session counting and cookie-issuing abilities to track users and bounce some at a given "heavy load" point. This isn't ideal, as the heavy-load point would have to be based on connection counts rather than server load or server response times, but it's the best I can come up with so far.
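For what it's worth, that connection-count approach can be sketched directly as an HAProxy config. Everything below is a hypothetical sketch: the backend names, addresses and thresholds are invented, and the "fullness" test really is connection-based rather than load-based, as noted above.

```haproxy
frontend www
    bind :80
    # Users we have already admitted carry the persistence cookie
    acl admitted  hdr_sub(cookie) SRV=
    # Crude load signal: total concurrent sessions on the app backend
    acl app_full  be_conn(app) gt 500
    # New visitors arriving while the app is full get the busy page
    use_backend busy if app_full !admitted
    default_backend app

backend app
    balance roundrobin
    cookie SRV insert indirect
    server app1 10.0.0.11:8080 cookie a1 maxconn 250
    server app2 10.0.0.12:8080 cookie a2 maxconn 250

backend busy
    # A tiny static server that only ever returns the "we're busy" page
    server sorry 10.0.0.20:8080
```

The cookie doubles as the admission ticket: anyone who got in keeps being routed to the app backend until their cookie or session expires, which is one way to age out current users over time.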

Also, having mentioned that brilliant developers writing great sites don't always make my question redundant, could I ask: do people normally think about coping with overload when designing scalable solutions? Surely they should, but I don't see much talk about it. Couldn't a simple Java filter, or the equivalent in other stacks, be built into applications? It'd be nice to have a site that not only scales, but "is nice" while waiting for the infrastructure it runs on to be scaled, which could be several days when you have to purchase new hardware.

Use something like nginx to serve images/CSS, etc. (this will help a lot with a large number of connections). For example, use nginx as a reverse proxy: it serves everything it can and passes the rest of the requests to the application servers.
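As a rough illustration (the paths and upstream address here are invented, not from any real setup), the reverse-proxy split might look something like this nginx fragment; a complete nginx.conf would also need an `events {}` block:

```nginx
http {
    upstream app {
        server 10.0.0.11:8080;
    }

    server {
        listen 80;

        # Serve the page "furniture" (images, CSS, JS) straight from nginx
        location ~* \.(png|jpe?g|gif|ico|css|js)$ {
            root /var/www/static;
            expires 1h;
        }

        # Everything nginx can't serve itself goes to the app servers
        location / {
            proxy_pass http://app;
        }
    }
}
```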

Benchmark the app before putting it into production. Something like Apache JMeter might be useful.

Put all static data in RAM (or solid state media) if possible. Combine this with a high performance server such as nginx / lighttpd and you will see a huge difference.

Now on the application side, if you can get dev and ops working together:

* Cache, cache, cache.
* Build pages in parts, using server-side and client-side tricks.

Find the bottlenecks by breaking the big problem into smaller problems. There is no general solution; each case has to be studied. But let's take one example.
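To make the "cache, cache, cache" point concrete, here is a minimal in-process TTL cache sketch. The function names are mine, invented for illustration; in production you would more likely reach for something like memcached, but the shape of the idea is the same.

```javascript
// Tiny in-process cache where entries expire after ttlMs milliseconds
function makeCache(ttlMs) {
  const store = new Map();
  return {
    get(key) {
      const hit = store.get(key);
      if (!hit || Date.now() > hit.expires) return undefined;
      return hit.value;
    },
    set(key, value) {
      store.set(key, { value, expires: Date.now() + ttlMs });
    },
  };
}

// Wrap an expensive render function so repeat requests hit the cache
function cached(fn, ttlMs) {
  const cache = makeCache(ttlMs);
  return function (key) {
    let v = cache.get(key);
    if (v === undefined) {
      v = fn(key);
      cache.set(key, v);
    }
    return v;
  };
}
```

Wrapping the block-rendering code this way means the 5.x-second work described below is paid once per TTL window instead of once per request.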

Let's take this page, for example. The main part of the page is the article text. The sub-parts of the page are:

* "Who's online" block (left)
* "Most popular articles" block (right)
* "Active forum topics" block (right)
* "Recent comments" block (right)

Let's assume that a server-side script does the following:

1. Fetch the template from the filesystem
2. Fetch the article text from the cache and embed it in that template
3. Run the code that builds the left and right blocks
4. Merge and print the page

Let's assume that the page is being served in 6 seconds.

Let's assume that we have benchmarked it and found something like this:

1. Fetching the template takes 0.0001 sec
2. Fetching the article from the cache takes 0.0001 sec
3. The block code takes 5.7 sec
4. Merging and printing takes 0.0001 sec

Result: I, as the end user, have to wait around 5.x seconds just for those blocks on the side, when I am really more interested in the article. I would prefer to go without the extra blocks but get the main content, rather than not being able to reach the page at all.

I think your clients might want something like that too. So perhaps one possible way to handle heavy load is to degrade gracefully.

How? I have played a little with this so just take it with a grain of salt.

Separate the block-loading logic. Let's assume we create a script called blockLoader.pl, whose job is to load the correct logic for the requested block and render it. This script can be compact, as it might not need to load all of the code for every block. For example, the script might load the "Who's online" block in 0.01 seconds using only 100 KB of memory, whereas it may take 4.5 seconds and 5 MB of memory to load "Active forum topics". This script alone will help in optimizing the code that is the bottleneck.

Let's go back to the page-loading script. It might now do just the following:

1. Fetch the template from the filesystem (0.001 sec)
2. Fetch the article text from the cache and embed it in that template (0.001 sec)
3. Merge and print the page (0.001 sec)

Note: we have taken away the "3. Run the code that builds the left and right blocks (5.x seconds)" part.

The template can now use block placeholders such as <div id="block_2"></div>, <div id="block_3"></div> and <div id="block_4"></div>.

A client-side script that executes on load knows how to handle these placeholders. Lazy loading...

* Get data via AJAX from /blockLoader.pl?id=2 and place the content in the div block_2
* Get data via AJAX from /blockLoader.pl?id=3 and place the content in the div block_3

etc.
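A minimal client-side sketch of that lazy loading (the helper names here are mine, and /blockLoader.pl is the hypothetical script from above; the fetch and insert steps are injected as callbacks so a failed block just renders empty instead of breaking the page):

```javascript
// Build the (hypothetical) block-loader URL for a block id
function blockUrl(id) {
  return '/blockLoader.pl?id=' + encodeURIComponent(id);
}

// Fetch each block in parallel and hand the HTML to an inserter callback.
// A block that fails to load degrades to an empty placeholder.
function loadBlocks(ids, fetchFn, insertFn) {
  return Promise.all(ids.map(function (id) {
    return fetchFn(blockUrl(id))
      .then(function (html) { insertFn('block_' + id, html); })
      .catch(function () { insertFn('block_' + id, ''); });
  }));
}

// In a browser you would wire it up on load, e.g.:
// window.addEventListener('load', function () {
//   loadBlocks([2, 3, 4],
//     function (url) { return fetch(url).then(function (r) { return r.text(); }); },
//     function (divId, html) { document.getElementById(divId).innerHTML = html; });
// });
```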

This way your main page will load in 0.xxx seconds, and the blocks will render as they are loaded (or not). The images/CSS will come from another server, so the browser will be able to load multiple resources simultaneously.

Not wanting to be funny, but I already know all the things you have listed, and you are still in the mindset of "how do I make my page work properly" in the first place. For example, I have used squid, even to the extent of telling it to break all internet caching standards so that it actually caches the images that, for some reason, are served by an application and marked as not cacheable (don't ask; nobody knows why).

The question is: the site IS BROKEN, as it can't handle more than X users (in one instance more hardware didn't help, because the application killed itself when trying to deal with more than a certain number of servers in a "cluster" - again, don't ask, nobody knows ... can you see a trend here?). How do I put something in the infrastructure to ensure that up to X people get to the site and get to browse around, and everyone after the first X users gets a simple "we are busy" page? From there on, obviously, some sort of timeout of the current users would be needed, due to the way the web works.

Again, I know this is a site about how to make the thing work in the first place, but there isn't a site about how to deal with it when it doesn't and the developers are only now reading the information on this site that they should have read in the first place.

How brilliant can you be if you haven't fashioned a hardware replicator from chewing gum wrappers scrounged from a nearby garbage can? :-)

Other than the excellent advice given above, you might want to consider a static-page-only version of your site, and switch to it when load hits a certain point. Users will get a reasonable experience without being turned away, and you should be able to handle very high loads.