FreeBSD + Vantec NexStar 3

A few days ago, I added an entry to the
FreeBSD
Developers Want List indicating that I would like to have a large
hard drive and USB-attachable enclosure, in order to permit me to
perform backups in a more sane manner. Santa (a.k.a. Daniel Seuffert)
provided me with a 250GB Seagate Barracuda 7200.9 SATA2 hard drive and
a Vantec NexStar 3 USB 2.0 enclosure, and since several people were
curious as to how well this hardware was supported by FreeBSD, I
thought I should provide a brief report.

The good: It works. I installed the drive into the enclosure,
plugged in the power, and plugged the USB cable into my Dell D600
laptop, and FreeBSD 6.0-RELEASE-p4 recognized it immediately:

I could then read and write to /dev/da0 just like I would any
other 250GB hard drive. I could partition it, label it, create
filesystems on it -- everything Just Worked.

The bad: It is a bit slow. USB 2.0 can, in theory, transmit
data at 60 MB/s (480 Mbit/s), while according to
StorageReview
the Seagate drive has a transfer rate varying from 34.4 MB/s to 62.0
MB/s. In contrast, the transfer rate I obtained via USB was constant
at approximately 25 MB/s across the entire drive.

FreeBSD's diskinfo -c explains the reason for the poor
performance: command overhead. In contrast to my laptop's hard drive,
where there is an overhead cost of 97 microseconds for a single read
request, the USB-attached drive has an overhead cost of 730
microseconds. I imagine that this increased cost is largely due to the
USB<-->SATA translation, but also partly due to my laptop's poor
interrupt routing -- the USB controller is sharing IRQ 11 with several
other devices, and the FreeBSD kernel needs to pick up the Giant lock
to handle each interrupt.
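
As a rough illustration of why per-command overhead caps throughput, the
following sketch models each read as a fixed overhead plus the time needed
to move the data. The 64 kB request size and the function name are
assumptions made for illustration, not measurements from this drive; only
the 97 and 730 microsecond overheads and the 60 MB/s figure come from the
text above.

```python
def effective_throughput(request_bytes, overhead_s, raw_bytes_per_s):
    """Bytes per second when every request pays a fixed overhead."""
    transfer_s = request_bytes / raw_bytes_per_s
    return request_bytes / (overhead_s + transfer_s)


REQUEST_BYTES = 64 * 1024  # assumed request size (hypothetical)
RAW_RATE = 60e6            # USB 2.0 theoretical rate, 60 MB/s

# Compare the two per-request overheads reported by diskinfo -c.
for overhead_us in (97, 730):
    rate = effective_throughput(REQUEST_BYTES, overhead_us * 1e-6, RAW_RATE)
    print(f"{overhead_us} us overhead -> {rate / 1e6:.1f} MB/s")
```

The model ignores everything except overhead and raw transfer time, so it
only shows the direction of the effect: the larger the fixed cost per
request, the further the achieved rate falls below the raw rate.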

The ugly: FreeBSD doesn't handle removal of drives very
gracefully. When I unplug the USB cable, FreeBSD recognizes that the
device is gone -- but if there is a filesystem mounted from the device,
that filesystem remains mounted. FreeBSD doesn't want to unmount the
filesystem, since it thinks the underlying device is busy; but at the
same time you (obviously) can't do anything with that filesystem. If
you ask FreeBSD to forcibly unmount the filesystem -- or if FreeBSD
shuts down, at which point it forcibly unmounts every filesystem --
then it will panic.

I imagine that this could be fixed by teaching the kernel to forcibly
unmount filesystems at the point when their underlying device is being
removed (but before freeing the data structures associated with the
device), but I'm not comfortable enough in the FreeBSD kernel to try
to make that sort of change myself. In any case, there is a very
simple answer to unplugging the drive while it has a filesystem mounted:
Don't do that!

Canadian election results trivia.

Now that the results of the 39th Canadian general election are (mostly)
in, I have looked through the numbers (helpfully provided by Elections
Canada in CSV format) and pulled out some of the more interesting
statistics:

Widest margins of victory: The widest margin of victory is in
Crowfoot, where the Conservative candidate, Kevin Sorenson, is
39,134 votes ahead of the NDP candidate, Ellen Parker. The top 14
margins of victory are all Conservative wins in Alberta; the only other
margin of victory of 25,000 votes or more is in Beauce, where
the Conservative candidate, Maxime Bernier, is 25,918 votes ahead of
the Bloc Quebecois candidate, Patrice Moore.
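
A computation of this kind can be sketched in a few lines of Python. The
column names (riding, candidate, party, votes) and the sample rows below
are made up for illustration; the actual layout of the Elections Canada
CSV files is not reproduced here.

```python
import csv
import io
from collections import defaultdict


def victory_margins(csv_file):
    """Return (margin, riding, winner, runner_up) tuples, widest first."""
    by_riding = defaultdict(list)
    for row in csv.DictReader(csv_file):
        by_riding[row["riding"]].append((int(row["votes"]), row["candidate"]))
    margins = []
    for riding, results in by_riding.items():
        if len(results) < 2:
            continue  # an uncontested riding has no margin
        results.sort(reverse=True)
        (first_votes, winner), (second_votes, runner_up) = results[:2]
        margins.append((first_votes - second_votes, riding, winner, runner_up))
    margins.sort(reverse=True)
    return margins


# Placeholder data, not real results.
SAMPLE = """riding,candidate,party,votes
Riding A,Candidate 1,Party X,30000
Riding A,Candidate 2,Party Y,12000
Riding A,Candidate 3,Party Z,5000
Riding B,Candidate 4,Party Y,21000
Riding B,Candidate 5,Party X,20500
"""

for margin, riding, winner, runner_up in victory_margins(io.StringIO(SAMPLE)):
    print(f"{riding}: {winner} leads {runner_up} by {margin:,} votes")
```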

Votes cast: Thanks to a growing population and increased voter
turnout, 14,816,000 votes were cast, exceeding the previous record
(13,667,671 votes cast, in the 1993 federal election) by over a
million.

Votes received by the winning party: The Conservative party
received 5,371,000 votes, the third-largest total ever, after the
Progressive Conservative party in 1984 (6,278,818 votes) and the
Liberal party in 1993 (5,647,952 votes).

Votes received by the second-place party: The Liberal party
received 4,477,000 votes, the second-largest total ever for a losing
party, after the Liberal party in 1979 (which received 4,595,319 votes
-- almost half a million more than Joe Clark's Progressive
Conservatives -- but came second in the number of seats).

Proportion of votes received by the winning party: The
Conservative party received 36.3% of the votes cast, the second-lowest
proportion ever for a winning party, after the Progressive
Conservative party in 1979 (which received 35.89% of the popular vote
and formed a short-lived minority government, in spite of the Liberal
party receiving 40.11% of the popular vote).

The Green party did not win any seats, but did come second in
the riding of Wild Rose, where Sean Maw trails the
Conservative candidate, Myron Thompson, by 33,558 votes. The Green
party came third in two ridings, Bruce--Grey--Owen Sound and
Calgary West, and fourth in 223 ridings.

Note to media and blogs: Feel free to republish the above (in part or
in whole), giving credit to Colin Percival or a link to this post.

Garbage collection is evil.

For several days I've been wrestling with a peculiar performance
problem in Maple. In the Quadratic Sieve code I'm currently writing,
I use external C code to perform the sieving -- that is, I have a
QuadraticSieveSieveInterval() function which I wrote in C and call
from Maple -- and the relations are collected and filtered in Maple.
This allows me to keep the amount of C code needed to a minimum by
using Maple for some of the messy initialization (e.g., computing
modular square roots).

The performance problem arose in the "collecting relations in Maple"
part. My code is roughly as follows:

With NumberOfRelationsWanted equal to 30000, I noticed something very
odd: If I commented out the "rtab[Nrels] := rel" line -- that is, if I
counted the relations, but didn't store them -- then the code would be
faster by roughly 150 seconds. However, after collecting all the
relations, I could copy them all into a new hash table in under one
second. Somehow adding the relations to a table while they were
being generated was 200 times slower than adding the relations to a
table after they were generated.

After some exploration of Maple's profiling capabilities, I noticed an
unexpected function was (according to the profiler) using 15% of the
total CPU time: the garbage collector. This made me immediately
suspicious, since the most significant difference (aside from the very
much increased time taken) between throwing the relations away and
collecting them in a table is that collecting them means that the total
memory usage increases over time. With a bit more searching, I found
a kernel option, "gcfreq", which controls the frequency with which
Maple's garbage collector is called. The default value is "every
million words allocated"; I changed this to "every hundred million
words allocated", and suddenly my code was 160 seconds faster -- even
with the "store the relation in a table" operation which had been
peculiarly slow, my code was now faster than it had been without that
operation before.
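
The same kind of experiment can be tried in other garbage-collected
environments. The sketch below is an analogy in Python, not Maple code: it
uses CPython's gc.get_threshold() and gc.set_threshold() to make the
cyclic collector run less often while a table of small objects is being
built. The function names, the stand-in "relations" and the thresholds are
assumptions for illustration, and whether any speedup appears depends on
the interpreter version and the workload.

```python
import gc
import time


def collect_relations(count):
    """Store `count` small container objects in a table as they are made."""
    table = {}
    for n in range(count):
        table[n] = [n, n + 1, n + 2]  # stand-in for a relation
    return table


def timed_with_threshold(count, threshold0):
    """Build the table under a temporary generation-0 GC threshold."""
    old = gc.get_threshold()
    gc.set_threshold(threshold0, old[1], old[2])
    try:
        start = time.perf_counter()
        table = collect_relations(count)
        return time.perf_counter() - start, len(table)
    finally:
        gc.set_threshold(*old)  # always restore the previous settings


default_threshold = gc.get_threshold()[0]
for threshold in (default_threshold, 100 * default_threshold):
    elapsed, stored = timed_with_threshold(300_000, threshold)
    print(f"threshold {threshold}: stored {stored} in {elapsed:.3f}s")
```

Restoring the old thresholds in a finally block keeps the experiment from
changing collector behaviour for the rest of the program.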

I'm not sure quite why my code was causing the garbage collector to
perform so poorly, but it might be related to the combination of very
small memory allocations (used by Maple) and rather large memory
allocations (in the sieving code itself). Whatever the cause, it's
worth remembering that while garbage collection isn't always
slow, it certainly can be slow and should be investigated as
a possible cause of unexplained poor performance. J.K. Rowling
remarked, via a character in the second Harry Potter book, that one
should "never trust anything that can think for itself, if you can't
see where it keeps its brain"; in much the same vein, I would suggest
that one should never trust a programming language if you can't see
where and how it allocates and deallocates memory.