The busy freebsd-update server

Based on what I saw when FreeBSD 6.3-RELEASE was announced, I didn't
expect any problems -- there was a visible increase in traffic, but it
didn't come anywhere close to tying up the server. I hadn't accounted
for two important factors:

Upgrading from FreeBSD 6.x to FreeBSD 7.0 involves more and larger
updates than upgrading to FreeBSD 6.3.

In total, update1.freebsd.org handled 5.01 million HTTP
requests -- an average of 58 requests per second -- serving up 130939
distinct files and patches totalling 39.9 GB -- an average data rate
of 3.7 Mbps (not counting HTTP/TCP/IP overhead). The effect of this
traffic on the server is perhaps best illustrated by the following two
MRTG graphs; the first graph shows total and active Apache processes,
while the second shows incoming and outgoing bandwidth:

A few notes are in order concerning the above graphs:

This server has an uplink capped at 10 Mbps; on several occasions
it came very close to that limit.

The primary reason it didn't hit the 10 Mbps limit more often is that for
five hours Apache was at the maximum number of processes I had
configured (100) and all of them were busy handling requests. When I
woke up on Thursday morning (around 1800 UTC -- 10AM in my time zone)
I logged in and increased Apache's process limit.

When I wrote the code for converting MRTG bandwidth statistics into
95th percentile and GB/month values, I didn't bother handling leap
years.
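For anyone curious what such a conversion involves, here is a minimal Python sketch. It is an illustration, not the code that produced these statistics. It assumes the input is a list of MRTG's 5-minute average rates in bits per second, uses the common "sort, discard the top 5% of samples, report the highest remaining one" convention for the 95th percentile, and treats 1 GB as 10^9 bytes; the function names are hypothetical. Handling leap years just means asking the calendar how many days a month really has instead of assuming a fixed length.

```python
import calendar


def percentile_95(samples_bps):
    """Return the 95th-percentile rate (bits/s) from a list of samples.

    Uses the usual billing convention: sort the samples, discard the
    highest 5%, and report the largest sample that remains.
    """
    if not samples_bps:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_bps)
    # ceil(0.95 * n), computed with integers to avoid floating-point rounding.
    keep = (95 * len(ordered) + 99) // 100
    return ordered[keep - 1]


def gb_per_month(avg_bps, year, month):
    """Convert an average rate (bits/s) into GB (10**9 bytes) for one month.

    calendar.monthrange() returns the real number of days in the month,
    so February has 29 days in a leap year and 28 otherwise.
    """
    days = calendar.monthrange(year, month)[1]
    seconds = days * 24 * 60 * 60
    return avg_bps * seconds / 8 / 10**9
```

For example, a sustained 3.7 Mbps over all of February 2008 works out to about 1159 GB with this code, versus about 1119 GB if February is assumed to always have 28 days.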

In short, the FreeBSD Update server was handling about as much traffic
as it is capable of handling (at least unless its uplink is upgraded
and I switch from Apache to a faster web server), and there were
most likely some people who tried to use FreeBSD Update between 1200
UTC and 1800 UTC and found that the server was either very slow or
completely unresponsive. If you had problems upgrading, please try
again later -- perhaps a random day next week, since as I write this
I already see the load increasing as Friday afternoon (UTC) approaches.
For myself, I've learned an important lesson: Next time there's a
FreeBSD release, I'm going to make sure there are several FreeBSD
Update mirrors ready to share the load.

One final addendum: While my bsdiff binary patching tool is usually
highly efficient -- for security updates, it routinely provides a
greater than fifty-fold reduction in download size -- it performed
quite poorly overall at producing patches for upgrading from FreeBSD
6.x to FreeBSD 7.0, providing only a five-fold reduction in download
size. Why? Because FreeBSD 6.x uses gcc 3.4, while
FreeBSD 7.0 uses gcc 4.2. Such a major change in compiler means that
even binaries compiled from identical source code differ throughout,
dramatically reducing the potential for bsdiff (or any other binary
patch tool) to identify similarities. Let this be a lesson to
anyone who uses binary patches to update devices: Think twice before
changing compilers!

The (good) deal with freebsd-update(8)

Earlier today, I stumbled across a blog post by Radu Cristian Fotescu
entitled "The (bad) deal with freebsd-update(8)", which (as the title suggests)
casts FreeBSD Update in a rather unfavourable light. Since the author
is misinformed about several details, I'm taking this opportunity to
set the record straight.

First, the author points out that there is an older version of FreeBSD
Update available in the ports tree, which he states "can only fetch
updates for FreeBSD 6.1". In fact, the version in the ports tree works
for releases dating back to FreeBSD 4.7 (although it obviously doesn't
provide binary updates to fix bugs which were uncovered after a release
ceased to be supported by the FreeBSD Security Team). The only releases
which the version of FreeBSD Update in the ports tree does not support
are FreeBSD 6.2 and up -- versions of FreeBSD which contain a new (and
vastly improved) version of FreeBSD Update in the base system. Once
FreeBSD Update is in all supported FreeBSD releases (i.e., in June) I'll
remove the old FreeBSD Update code from the ports tree.

Next, the author questions the logic of having "64-byte keys"
(actually, 64 hexadecimal digit keys) as file names, and suggests that
this makes FreeBSD Update overly complex. Nothing could be further
from the truth: In fact, as I described in my
BSDCan'07 talk, the "Reference
by [SHA256] hash" method makes both FreeBSD Update and Portsnap far
simpler than they would otherwise be.
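To make the naming scheme concrete: a SHA256 hash is 32 bytes, which comes to 64 hexadecimal digits when written out, and that string simply becomes the file's name. The sketch below is a toy in-memory store written for illustration only (the class name HashStore is hypothetical, and this is not FreeBSD Update or Portsnap code). It shows two properties of naming by hash: identical contents always get the same name, and anyone holding a name can check that the data they received actually matches it.

```python
import hashlib


def hash_name(data: bytes) -> str:
    """Name a blob by its SHA-256 hash: 32 bytes, i.e. 64 hex digits."""
    return hashlib.sha256(data).hexdigest()


class HashStore:
    """Toy content-addressed store (illustrative only)."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        name = hash_name(data)
        # Storing the same contents twice just reuses the same name.
        self._blobs[name] = data
        return name

    def get(self, name: str) -> bytes:
        data = self._blobs[name]
        # The name doubles as an integrity check on what we hand back.
        if hash_name(data) != name:
            raise ValueError(f"{name}: contents do not match hash")
        return data
```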

The author then moves on to speaking of "a patch applied to a given
release and patch level", thereby demonstrating a fundamental
misunderstanding of how FreeBSD Update works. In the author's mind
(apparently), to update a system from FreeBSD 6.2-RELEASE-p9 to
FreeBSD 6.2-RELEASE-p10, FreeBSD Update downloads a (single) patch
and applies it. Not so; rather, FreeBSD Update fetches a file which tells
it what FreeBSD 6.2-RELEASE-p10 looks like. FreeBSD Update then makes
the system look like that: It can leave files alone if they are already
up to date (or if the user has asked it to leave those files alone); or
it can download or generate the new versions of files. Put another
way, in most patching systems, the server will answer the question "how
do I get there from here?" -- with FreeBSD Update, the server merely
answers the question "where should I be going?" and leaves it up to
the FreeBSD Update client to figure out how to get there.
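The client's side of this can be sketched as a comparison between two indexes. In the Python sketch below, both the target release and the local system are represented as hypothetical dictionaries mapping each path to the SHA256 hash of its contents; the real FreeBSD Update metadata format differs and carries more information. For each file, the client either leaves it alone (already correct, or excluded by the user) or marks it to be brought up to date, by downloading it or generating it from a patch.

```python
def plan_update(target, local, ignored=frozenset()):
    """Work out how to make the local system match the target state.

    target, local: dict of path -> SHA-256 hex digest (hypothetical format).
    ignored: paths the user has asked us not to touch.
    Returns a dict of path -> "leave" or "update".
    """
    plan = {}
    for path, wanted in target.items():
        if path in ignored or local.get(path) == wanted:
            # Already up to date, or the user wants it left alone.
            plan[path] = "leave"
        else:
            # Missing or different: download or generate the new version.
            plan[path] = "update"
    return plan
```

In this model the server never needs to know where the client is starting from: the same target index works for a system at -p9, one at -p3, or one with locally modified files.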

Related to this error is another mistake which immediately follows: The
author asserts that the "full new binaries" are not available. In
fact, every file which appears in a (recent) FreeBSD release, or in
a FreeBSD release plus patches, is available via the FreeBSD Update
server. (I was concerned that I might be technically violating the
GPL on some files by this fact, until I remembered that the FreeBSD
source code is also distributed via FreeBSD Update.) FreeBSD Update
uses patches in exactly the same way as Portsnap: As I described in
my BSDCan'07 talk (linked above), FreeBSD Update and Portsnap rely on
"opportunistic patching" -- they start out by attempting to fetch
patches and apply them, but if anything goes wrong (the patch isn't
available, the file generated by patching has the wrong SHA256 hash,
et cetera), they gracefully fall back to fetching the complete file.
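The fallback logic is simple enough to sketch in a few lines of Python. The three callables passed in (fetch_patch, apply_patch, fetch_full) are hypothetical stand-ins for the network and patch-application steps, not real FreeBSD Update functions; the point is the control flow: try the cheap path first, trust its output only if the SHA256 hash matches, and otherwise fetch the whole file.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def obtain_file(old_data, wanted_hash, fetch_patch, apply_patch, fetch_full):
    """Produce the file whose SHA-256 is wanted_hash, preferring a patch.

    fetch_patch(old_hash, new_hash) -> patch bytes   (hypothetical)
    apply_patch(old_data, patch)    -> new bytes     (hypothetical)
    fetch_full(new_hash)            -> complete file (hypothetical)
    """
    if old_data is not None:
        try:
            patch = fetch_patch(sha256_hex(old_data), wanted_hash)
            new_data = apply_patch(old_data, patch)
            if sha256_hex(new_data) == wanted_hash:
                return new_data  # patching worked and the result verified
        except Exception:
            pass  # patch unavailable or failed to apply; fall back below
    # Fallback: download the complete file, and verify it too.
    new_data = fetch_full(wanted_hash)
    if sha256_hex(new_data) != wanted_hash:
        raise ValueError("downloaded file has the wrong SHA256 hash")
    return new_data
```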

Next, the author points out that the list of binary patches used for
updating to FreeBSD 6.3 is publicly visible. Oops -- this is fixed
now. I don't have any desire to keep this list of file names secret,
but there are two very good practical reasons for turning off the
directory indexing: First, Apache processes chew up lots of RAM when
generating large directory listings; and
second, I was having problems with robots ignoring my "don't crawl
here" directives in robots.txt and loading down my server
with large numbers of pointless requests.

Moving on, the author points to the approach of Red Hat, Debian, and
Mandriva, of distributing entirely new package tarballs, as a model
to be emulated. I don't know how fast the author's internet connection
is, but I know one of the most frequent comments I hear about FreeBSD
Update is how incredibly fast it is. This is what binary patches do
for you -- provide a fifty-fold reduction in the bandwidth needed to
download security updates. The tool I wrote for this purpose -- bsdiff
-- is now used by Apple, Firefox, Sophos, and probably Amazon's Kindle
(in this last case, I haven't heard from any developers, but they have
bsdiff code on the device, so presumably they're using it) in addition
to FreeBSD, and in the summer of 2006 I calculated that it had saved
users upwards of 100 person-years of waiting for updates to download.
Returning to downloading complete tarballs every time a small change
is made might be simple, but it wouldn't be very popular with many
people who have to wait for said tarballs to download!

Finally, the author complains that he can't find the FreeBSD Update
server code. As a comment to the blog entry points out, the server
code is in the FreeBSD projects repository.