When a service is very popular, a single
machine probably will not be able to keep up with the number of
requests the service has to handle. In this situation, the solution
is to add more machines and to distribute the load amongst them. From
the user's point of view, the use of multiple
servers must be completely transparent; users must still have a
single access point to the service (i.e., the same single URL) even
though there may be many machines with different server names
actually delivering the service. The requests must also be properly
distributed across the machines: not simply by giving equal numbers
of requests to each machine, but rather by giving each machine a load
that reflects its actual capabilities, given that not all machines
are built with identical hardware. This calls for some smart
load-balancing techniques.
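
To make the idea of capacity-aware distribution concrete, here is a
minimal sketch of one possible approach, a "smooth" weighted
round-robin. It is an illustration only, not the method of any
particular load-balancing product. The class name, the server names,
and the weights are all hypothetical; a weight is assumed to be a
rough measure of how many requests a machine can handle relative to
the others.

```python
class WeightedBalancer:
    """Smooth weighted round-robin over a fixed set of servers.

    All names and weights are hypothetical; a weight stands for a
    machine's relative capacity.
    """

    def __init__(self, weights):
        # weights: mapping of server name -> positive integer weight
        self.weights = dict(weights)
        self.current = {name: 0 for name in self.weights}

    def pick(self):
        """Return the name of the server that should get the next request."""
        total = sum(self.weights.values())
        # Every server earns credit in proportion to its weight...
        for name, weight in self.weights.items():
            self.current[name] += weight
        # ...the server with the most credit is chosen...
        chosen = max(self.current, key=self.current.get)
        # ...and pays back one full round of credit.
        self.current[chosen] -= total
        return chosen


if __name__ == "__main__":
    balancer = WeightedBalancer({"big": 3, "small": 1})
    print([balancer.pick() for _ in range(8)])
```

With these weights, the machine called "big" receives three requests
for every one sent to "small", which is the kind of proportional
distribution described above.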

Most load-balancing techniques are based on a central machine
that dispatches all incoming requests to machines that do the real
processing. Think of it as the only entrance into a building with a
doorkeeper directing people into different rooms, all of which have
identical contents but possibly a different number of clerks.
Regardless of which room they are directed to, all
people use the same entrance door to enter and exit the building, and
an observer located outside the building cannot tell which room
people are visiting. The same thing happens with the cluster of
servers: users send their browsers to URLs, and back come the
pages they requested. They remain unaware of the particular machines
from which their browsers collected their pages.

No matter what load-balancing technique is used, it should always be
straightforward to tell the central machine that a new machine is
available or that an existing machine is no longer available.

How does this long introduction relate to the upgrade problem?
Simple. When a particular machine requires upgrading, the dispatching
server is told to stop sending requests to that machine. All the
requests currently being executed must be left to complete, at which
point whatever maintenance and upgrade work is to be done can be
carried out. Once the work is complete and has been tested to ensure
that everything works correctly, the central machine can be told that
it can again send requests to the newly upgraded machine. At no point
has there been any interruption of service or any indication to users
that anything has occurred. Note that for some services, particularly
ones to which users must log in, the wait for all the users to either
log out or time out may be considerable. Thus, some sites stop
requests to a machine at the end of the working day, in the hope that
all requests will have completed or timed out by the morning.
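
The upgrade procedure just described can be sketched as a toy
in-memory dispatcher. This is only an illustration under stated
assumptions: the class, its method names, and the server names are
hypothetical, and a real load balancer exposes its own interface for
taking a machine out of rotation. The sketch assumes that the
dispatcher is told when each request starts and finishes, so that it
can tell when a machine has no requests left to complete.

```python
class Dispatcher:
    """Toy dispatcher that can take a server out of rotation.

    Every name here is hypothetical; this is not a real product's API.
    """

    def __init__(self, servers):
        self.accepting = {name: True for name in servers}
        self.in_flight = {name: 0 for name in servers}

    def start_request(self):
        """Assign a new request to the least busy server in rotation."""
        candidates = [n for n, ok in self.accepting.items() if ok]
        if not candidates:
            raise RuntimeError("no servers available")
        chosen = min(candidates, key=self.in_flight.get)
        self.in_flight[chosen] += 1
        return chosen

    def finish_request(self, name):
        """Record that a request handled by `name` has completed."""
        self.in_flight[name] -= 1

    def drain(self, name):
        """Stop sending new requests; running ones are left to complete."""
        self.accepting[name] = False

    def is_idle(self, name):
        """True once a drained server has no requests still running."""
        return not self.accepting[name] and self.in_flight[name] == 0

    def restore(self, name):
        """Put an upgraded and tested server back into rotation."""
        self.accepting[name] = True
```

An upgrade would then consist of calling drain( ) for the machine,
waiting until is_idle( ) returns true, doing the maintenance work, and
finally calling restore( ). While the machine is drained, every new
request goes to the remaining servers.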

How do we talk to the central machine? This depends on the
load-balancing technology that is implemented and is beyond the scope
of this book. The references section at the end of this chapter gives
a list of relevant online resources.