If there are no idle backends, an exhaustive search over all backends
is performed. This should improve scheduling quality noticeably.
Also, remove some cruft from the main scheduler logic function
and place almost everything in the main loop.
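
A minimal sketch of the resulting selection logic, assuming a flat
array of peers with a per-peer pending-request counter and a
precomputed score (lower is better); peer_t and choose_fair_peer are
illustrative names, not the module's actual identifiers:

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        uint64_t  nreq;    /* currently pending requests */
        uint64_t  score;   /* scheduling score, lower is better */
    } peer_t;

    static size_t
    choose_fair_peer(const peer_t *peers, size_t npeers)
    {
        size_t  i, best;

        /* fast path: an idle backend (no pending requests) wins outright */
        for (i = 0; i < npeers; i++) {
            if (peers[i].nreq == 0) {
                return i;
            }
        }

        /* no idle backend: exhaustive search for the lowest score */
        best = 0;
        for (i = 1; i < npeers; i++) {
            if (peers[i].score < peers[best].score) {
                best = i;
            }
        }

        return best;
    }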

This commit removes the FS_TIME_SCALE_OFFSET constant, replacing it with
a shift count (SCHED_TIME_BITS). The score is now calculated simply as:
    score = (nreq << SCHED_TIME_BITS) | ~delta
which reverses the comparison (a lower score is now more suitable); a
sketch follows the list below.
The components are now saturated instead of wrapping around. This has two
consequences:
1. The scheduler now always chooses the backend with the smallest number
of pending requests (FS_TIME_SCALE_OFFSET is effectively infinite)
2. The value of SCHED_TIME_BITS is not critical as it affects scheduling
quality only when the components exceed their maximum values
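
A hedged sketch of the saturating score construction; the concrete
SCHED_TIME_BITS value below is an assumption, as are the helper names:

    #include <stdint.h>

    #define SCHED_TIME_BITS  22   /* hypothetical value */
    #define SCHED_TIME_MASK  ((UINT64_C(1) << SCHED_TIME_BITS) - 1)
    #define SCHED_NREQ_MAX   ((UINT64_C(1) << (64 - SCHED_TIME_BITS)) - 1)

    /* delta is the time since the backend's last activity */
    static uint64_t
    sched_score(uint64_t nreq, uint64_t delta)
    {
        /* saturate both components instead of letting them wrap */
        if (nreq > SCHED_NREQ_MAX) {
            nreq = SCHED_NREQ_MAX;
        }
        if (delta > SCHED_TIME_MASK) {
            delta = SCHED_TIME_MASK;
        }

        /* lower is better: pending requests dominate, and a large
         * delta (long idle) shrinks the ~delta component */
        return (nreq << SCHED_TIME_BITS) | (~delta & SCHED_TIME_MASK);
    }

Because delta is masked into the low bits, it can never spill into the
nreq component, which is what makes the pending-request count always
win the comparison.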

Traverse the tree of shm segments post-order (instead of in-order).
This ensures that every visited node is a valid part of the tree,
reachable from the root, and that no null pointers appear in its
->left and ->right members.
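
A minimal sketch of the idea on a plain binary tree (the module walks
an rbtree, but the ordering argument is identical); node_t and the
visit callback are illustrative:

    #include <stddef.h>

    typedef struct node_s {
        struct node_s  *left;
        struct node_s  *right;
        /* shm segment bookkeeping would live here */
    } node_t;

    /* Post-order: both subtrees are handled before the parent, so every
     * node passed to visit() is still reachable from the root and its
     * ->left/->right pointers have not been invalidated yet. */
    static void
    walk_post_order(node_t *n, void (*visit)(node_t *))
    {
        if (n == NULL) {
            return;
        }

        walk_post_order(n->left, visit);
        walk_post_order(n->right, visit);
        visit(n);    /* safe to unlink/free n here */
    }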

If you are using lots and lots of backend servers, you may need to
increase this value. The raw data size is two native words (8 bytes
on 32-bit arches, 16 bytes on 64-bit ones), but the overhead may be
pretty big, so you just have to experiment.
Example usage:
http {
    # ...
    upstream_fair_shm_size 32K; # this is the minimum
    # ...
}

Replace the array of timestamps with a single value of an atomic
type and remove the padding. On i386, the struct will most likely be
8 bytes long (16 bytes on 64-bit arches), which gives a decent
chance of fitting a whole upstream block in a single cacheline
(depending on the CPU and the number of backends).
Padding every ngx_http_upstream_fair_shared_t to its own cacheline
has been pretty pointless for a few commits now, because we touch all
of them at the same time when looking for an idle backend.
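
For illustration, the slimmed-down per-backend struct could look
roughly like this, with C11 atomics standing in for nginx's atomic
types (the real ngx_http_upstream_fair_shared_t may differ):

    #include <stdatomic.h>
    #include <stdint.h>

    typedef struct {
        _Atomic uintptr_t  nreq;         /* pending request count */
        _Atomic uintptr_t  last_active;  /* single timestamp; replaces
                                            the old array of slots */
    } fair_shared_t;

Two native words per backend, no padding: 8 bytes on 32-bit arches,
16 bytes on 64-bit ones, as noted above.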

The shared memory segments are now tracked in an rbtree, with proper
lifetime management via refcounting. Starting with this commit,
nginx survives a continuous reload-every-1s cycle while serving
requests all the time.
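
A hedged sketch of the refcounting half of the scheme; the node
layout and the seg_get/seg_put names are assumptions, not the
module's real definitions:

    #include <stdatomic.h>
    #include <stdlib.h>

    typedef struct seg_node_s {
        struct seg_node_s  *left;    /* rbtree links (color omitted) */
        struct seg_node_s  *right;
        _Atomic unsigned    refs;    /* users of this shm segment */
    } seg_node_t;

    static seg_node_t *
    seg_get(seg_node_t *n)
    {
        atomic_fetch_add(&n->refs, 1);
        return n;
    }

    static void
    seg_put(seg_node_t *n)
    {
        /* dropping the last reference frees the segment, which is
         * what lets back-to-back reloads recycle old generations */
        if (atomic_fetch_sub(&n->refs, 1) == 1) {
            free(n);
        }
    }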

Now the time since last activity is "positive", i.e. the bigger the time,
the greater the chance of selecting this backend.
The point of this change is to keep scheduler scores roughly decreasing in
round-robin order, and to keep the iteration count of the main loop of
ngx_http_upstream_choose_fair_peer as short as possible.
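
To make the ordering concrete (illustrative numbers, equal nreq): if
backends A, B and C were last active 30, 20 and 10 ms ago, matching the
round-robin dispatch order, then delta(A) > delta(B) > delta(C), so
~delta and therefore the scores satisfy score(A) < score(B) < score(C).
Scanning from A, the first peer examined is usually already the best
one, which keeps the loop short.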

The algorithm is very simple but deserves a bit more explanation
than a commit message, so please wait patiently for more
thorough documentation.
Note: the lockless access will probably be insufficient;
expect spinlocks soon.

The shared memory is accessed without any locks. The number of
requests is updated atomically, and the last activity time uses
a rotating set of slots, à la ngx_times.c.
Note: the number of time slots may need to be increased under heavy
load but, on the other hand, corrupted reads should at worst lead
to non-optimal load balancing. This needs further testing.
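
A sketch of the lockless update path under these assumptions, with
C11 atomics in place of nginx's primitives and an arbitrary slot
count; all names here are illustrative:

    #include <stdatomic.h>
    #include <stdint.h>

    #define NSLOTS 64    /* may need raising under heavy load */

    typedef struct {
        _Atomic uintptr_t  nreq;            /* updated atomically */
        uintptr_t          times[NSLOTS];   /* rotating activity slots */
        _Atomic unsigned   slot;            /* index of freshest slot */
    } fair_shm_t;

    static void
    on_request_start(fair_shm_t *shm, uintptr_t now)
    {
        unsigned  s;

        atomic_fetch_add(&shm->nreq, 1);

        /* write a fresh slot first, then publish its index; a reader
         * racing with us normally sees either the old slot or the
         * fully written new one, and with enough slots a torn read
         * stays rare and merely skews one scheduling decision */
        s = (atomic_load(&shm->slot) + 1) % NSLOTS;
        shm->times[s] = now;
        atomic_store(&shm->slot, s);
    }

    static uintptr_t
    read_last_active(fair_shm_t *shm)
    {
        return shm->times[atomic_load(&shm->slot)];
    }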