Sunday, 27 February 2005

The whole family went to Auckland's annual Lantern Festival last night, marking the end of Chinese New Year celebrations. It's quite an experience --- lots of crafts and activities in tents in Albert Park, all kinds of Asian food stalls on Princes Street, huge crowds everywhere, and of course thousands of lanterns. We had a wonderful time. The highlight was a Chinese opera troupe at the end, many of them doing amazing things on stilts.

Quite a few kids got separated from their parents in the crush of the crowds. I expect they were eventually reunited, but I hate to imagine what a parent goes through in that situation. I wonder why we don't have some sort of RF tracking bracelet for kids to wear at events like this. Fortunately our kids are still small enough to carry, which is good not only for their safety but also so they can see what's going on.

In my memory Albert Park is simply the park I enjoyed during my undergrad years (it's right next to the university), and I hadn't been there since I got back. It was strange to return and only barely recognize the familiar paths and greenery beneath all the festival apparatus. But it was also wonderful.

Wednesday, 23 February 2005

This is another photo from our trip in the Bay of Islands. A storm far out in the Pacific was sending swells 3 to 3.5 metres high into exposed areas that we had to pass through. ("Swell" means large but widely spaced ocean waves.) We were basically fine and in no significant danger, but it's disturbing to sit in the trough with your head below the crest of the next oncoming wave, unable to see past the rising mass of water in front of you. This photo shows the top of the wave above the horizon, but I guess it's one of those things that's hard to capture in a static frame.

A couple of weeks ago two big tropical cyclones (what hurricanes are called in this part of the world) passed over islands in the South Pacific. Six fishermen were caught in the open; their boat was smashed, and they spent 24 hours clinging to wreckage, in waves estimated at up to fifteen metres high, before they were rescued. It boggles my mind. Once again it's humbling to consider what other people have endured.

Optimizing the compile/edit/debug cycle for speed is very important, especially if, like me, you do a lot of debugging by inserting print statements. Working on Mozilla's layout engine means that almost every change I make requires relinking libgklayout.so, which is a mammoth shared library: with debug information it's about 93MB in my current builds. Link times of over a minute are normal on lower-end machines. Apart from the delay itself, I hate staring at the screen while I wait, so I often wander off to surf the web and lose my engagement with what I'm working on. This is a real problem, so I thought it would be worthwhile to invest some time in making gklayout link faster.

First I built ld from the latest CVS sources and wrote a benchmark script that links gklayout in a repeatable way and times how long it takes. I actually run it a few times and take the last number, so that all files are in memory and no I/O is required, since that's usually the situation when I'm in a debugging session. My first finding was a real shock: the ld I built from source linked gklayout in 15 seconds, whereas the system linker on my machine took 65 seconds. I don't fully understand why, but I've heard that certain recent ld versions had bugs that made them perform poorly on this workload; the problem showed up in Fedora Core 3 and was fixed by Red Hat in their very latest version. Anyway, the good news is that I was looking at a 4x speedup without having done anything. I could have stopped there, but I thought I might as well keep digging for another day or two to see what else could be done.
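The benchmark script itself isn't shown. As a rough illustration of the "run it a few times and keep the last number" approach, here is a minimal C sketch; the function names and the placeholder workload are hypothetical, and in real use the timed step would be the gklayout link command rather than an in-process loop.

```c
#include <time.h>

/* Hypothetical stand-in for the step being benchmarked (in the post, one
   link of libgklayout.so); here it just burns a little CPU. */
static void example_workload(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 1000000UL; i++)
        sink += i;
}

/* Run `work` `runs` times and return the wall-clock duration, in seconds,
   of the final run only. In the intended use, the earlier runs warm the
   page cache, so the last one approximates the "everything already in
   memory" state of a debugging session. Returns -1.0 if runs < 1 or the
   clock fails. timespec_get (C11) reads the realtime clock, so a clock
   adjustment during a run would skew that sample. */
double time_last_run(void (*work)(void), int runs)
{
    double last = -1.0;
    for (int i = 0; i < runs; i++) {
        struct timespec start, end;
        if (timespec_get(&start, TIME_UTC) != TIME_UTC)
            return -1.0;
        work();
        if (timespec_get(&end, TIME_UTC) != TIME_UTC)
            return -1.0;
        last = (double)(end.tv_sec - start.tv_sec)
             + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    }
    return last;
}
```

Discarding the earlier runs, rather than averaging them, deliberately measures the warm-cache case and ignores first-run I/O.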

Next I set up oprofile. Oprofile is great, although some of its more esoteric features have yet to be ported to the new Xeons in my machine, so I could only use it to measure time; that was enough for this project. I looked at where time was being spent in the ld binary, and "opreport -l ~/bin/ld --symbols" showed that we were spending about 35% of the time stalled in the function walk_wild_section. (BTW I'm going to gloss over all the time I spent going down blind alleys and figuring out how ld actually works :-).) Basically the problem is that there's a loop over a list of "wildcard specs"; some of the specs are just plain strings, others contain wildcard characters like '*', and the two kinds get handled differently. According to "opannotate --source --assembly ~/bin/ld", the conditional branch that decides which kind of spec we're looking at seemed to be very poorly predicted, so we spent a lot of time waiting for the pipeline to refill. And this is all inside a huge loop over every section of every object file in the link, which is where ld seems to spend most of its time.
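To make the branch problem concrete, here is a simplified, hypothetical model of that inner loop in C. It is not ld's actual code: struct wild_spec and count_matches_generic are invented for illustration, and the sketch assumes wildcard specs are matched with POSIX fnmatch() and plain names with strcmp(). The is_wild test is the data-dependent branch of the kind the profile showed being mispredicted.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for one entry in a wildcard list. */
struct wild_spec {
    const char *name; /* e.g. ".text" (plain) or ".text.*" (wild) */
    bool is_wild;     /* true if name contains wildcard characters */
};

/* Count (section, spec) matches the generic way: every section is tested
   against every spec, and each test first branches on is_wild. When plain
   and wild specs are interleaved, that branch is hard to predict. */
size_t count_matches_generic(const char *const *sections, size_t nsections,
                             const struct wild_spec *specs, size_t nspecs)
{
    size_t matched = 0;
    for (size_t s = 0; s < nsections; s++) {
        for (size_t w = 0; w < nspecs; w++) {
            bool hit;
            if (specs[w].is_wild) /* the poorly predicted branch */
                hit = fnmatch(specs[w].name, sections[s], 0) == 0;
            else
                hit = strcmp(specs[w].name, sections[s]) == 0;
            if (hit)
                matched++;
        }
    }
    return matched;
}
```

Where ld would hand each match to a callback, the sketch simply counts them.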

My first change was to add special handling for the case where the list contains only one wildcard spec and it isn't really wild, just a simple string. In that case we can find the relevant section in each file, if there is one, by looking it up in a hash table instead of iterating through all the sections of the file. So not only do we eliminate the mispredicted branch, we also speed things up at a higher level. This sped the link up from 15 seconds to 12 seconds, a 20% improvement.
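A minimal sketch of the single-simple-string fast path, assuming each object file's section names have been indexed in a small open-addressing hash table. The table layout, the djb2 hash and the names section_index, index_add and index_find are illustrative assumptions, not BFD's actual section hash. ELF object files can contain several sections with the same name; this sketch returns only the first, whereas a real linker has to visit all of them.

```c
#include <stddef.h>
#include <string.h>

#define SECTION_TABLE_SIZE 64 /* power of two; keep well above the section count */

/* Hypothetical per-object-file index from section name to section number.
   Names are borrowed pointers, not copies. Zero-initialize before use. */
struct section_index {
    const char *names[SECTION_TABLE_SIZE];
    int ids[SECTION_TABLE_SIZE];
};

/* djb2 string hash (h = h * 33 + c). */
static unsigned long hash_name(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Insert using linear probing. Returns 0 on success, -1 if the table is full. */
int index_add(struct section_index *ix, const char *name, int id)
{
    size_t start = hash_name(name) & (SECTION_TABLE_SIZE - 1);
    for (size_t i = 0; i < SECTION_TABLE_SIZE; i++) {
        size_t slot = (start + i) & (SECTION_TABLE_SIZE - 1);
        if (ix->names[slot] == NULL) {
            ix->names[slot] = name;
            ix->ids[slot] = id;
            return 0;
        }
    }
    return -1;
}

/* Fast path for a wildcard list holding one plain name: a short probe
   sequence (expected O(1)) replaces a scan over every section in the file.
   Returns the id of the first section inserted with that name, or -1. */
int index_find(const struct section_index *ix, const char *name)
{
    size_t start = hash_name(name) & (SECTION_TABLE_SIZE - 1);
    for (size_t i = 0; i < SECTION_TABLE_SIZE; i++) {
        size_t slot = (start + i) & (SECTION_TABLE_SIZE - 1);
        if (ix->names[slot] == NULL)
            return -1; /* empty slot ends the probe: name absent */
        if (strcmp(ix->names[slot], name) == 0)
            return ix->ids[slot];
    }
    return -1;
}
```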

Further analysis showed that we were still stalling on the same instruction as before. I realized that the wildcard lists used by the default linker configuration come in only a few different forms: one simple string; one wildcard string; one simple string and one wildcard string; one simple string and two wildcard strings; and two simple strings and two wildcard strings. I also noticed that no two elements of a wildcard list can ever match the same section name, which we can also exploit to speed things up. So I wrote a specialized walk_wild_section routine for each of those five cases. These routines don't need to test dynamically whether a wildcard spec is a simple string; that's hard-coded. So we eliminate all the conditional branch stalls that were hurting us before. Sadly, that only produced an additional 5% or so improvement (down to about 11.3 seconds). That's why profiling is so important: real-world performance often confounds your expectations, especially when you're working this close to a complex microarchitecture.
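Here is a hypothetical sketch of one such specialization, for a list holding exactly one plain name followed by one wildcard pattern (for example ".text" and ".text.*"). The function name is invented and, as before, it counts matches where ld would invoke a callback; fnmatch() is again an assumed stand-in for the wildcard matcher. Because each spec's kind is fixed when the routine is written, there is no per-spec is_wild branch, and because no two specs can match the same name, a section that equals the plain name never needs the wildcard test.

```c
#include <fnmatch.h>
#include <stddef.h>
#include <string.h>

/* Specialized walk for the shape [plain, wild]. Assumes, per the post's
   observation, that no section name can match both specs. */
size_t count_matches_simple_then_wild(const char *const *sections, size_t n,
                                      const char *plain, const char *wild)
{
    size_t matched = 0;
    for (size_t s = 0; s < n; s++) {
        /* Spec kinds are fixed: spec 0 always uses strcmp, spec 1 always
           uses fnmatch, so no is_wild test is needed. */
        if (strcmp(sections[s], plain) == 0)
            matched++; /* cannot also match `wild`, so skip that test */
        else if (fnmatch(wild, sections[s], 0) == 0)
            matched++;
    }
    return matched;
}
```

Sections are still visited in file order, so the order in which matches are reported is the same as in the generic loop.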

Now it looked like most of the time was in bfd_hash_lookup, but that's called from many places and I couldn't work out who the problematic caller(s) were. (Oprofile can collect call graphs on x86, but that code hasn't been ported to x86-64 yet. A project for another day, perhaps.) So I did "poor man's profiling": running ld in gdb, hitting Ctrl-C periodically and looking at the stack backtrace. I was surprised to find that most of the time was going into UTF-8 character conversion in glibc. Sure enough, oprofile confirmed that we were now spending a lot more time in glibc than in ld itself. The UTF-8 conversion happens because glibc's wildcard matching routines handle Unicode strings, which is complete overkill for ld section names! Furthermore, almost all the wildcard patterns in the default linker configuration are just a simple string followed by a single '*', which are trivial to match. So I specialized my special-case walk_wild_section functions even further to work on just those patterns, and hand-wrote a small inlineable match function that does no character set conversion (but will still work if section names and wildcard specs happen to be UTF-8). The results are astounding: link time is down to 5 seconds, a 3x improvement since I started my work, and about 13x faster than the 65 seconds I'm used to!
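A minimal sketch of such a matcher, assuming patterns of the form "literal prefix plus one trailing '*'". The names is_prefix_star and match_prefix_star are hypothetical, and the prefix length is meant to be computed once per pattern, outside the per-section loop. Because it compares raw bytes, it never consults the locale or decodes multibyte characters, yet it still gives correct answers for valid UTF-8 strings, since a character prefix of a UTF-8 string is also a byte prefix of its encoding.

```c
#include <stdbool.h>
#include <string.h>

/* True if `pattern` is a literal prefix followed by exactly one trailing
   '*', with no other special characters ('*', '?', '[' or '\\'). */
bool is_prefix_star(const char *pattern)
{
    size_t len = strlen(pattern);
    if (len == 0 || pattern[len - 1] != '*')
        return false;
    return strcspn(pattern, "*?[\\") == len - 1;
}

/* Match `name` against a prefix-star pattern. `prefix_len` is
   strlen(pattern) - 1, computed once per pattern by the caller.
   Plain byte comparison: no locale lookup, no multibyte decoding. */
static inline bool match_prefix_star(const char *pattern, size_t prefix_len,
                                     const char *name)
{
    return strncmp(pattern, name, prefix_len) == 0;
}
```

For example, ".gnu.linkonce.t.foo" matches ".gnu.linkonce.t.*" with a prefix length of 16, while ".text" does not match ".text.*", because the '*' only follows the dot.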

At the beginning I was thinking I might have to do something really clever to parallelize ld across multiple processors, but now that just isn't worth it, not for this workload anyway.

I'll try to get these patches fed upstream so all ld users can take advantage of these optimizations.

Tuesday, 15 February 2005

People often ask me what's different between the USA and New Zealand. I usually tell them that they're quite similar --- wealthy Western English-speaking democracies, so the broad patterns of life are the same, except for one thing: Americans know they are the global superpower, New Zealanders know their country is a minnow, and such knowledge colours everything.

Of course there are many differences in detail. I think New Zealand is more user-friendly in small ways. One example I notice often: at restaurants in the USA, when you finish your meal, sales tax is added to your bill and then you have to calculate a tip on top of that. In New Zealand, tax is usually included in the menu prices and no tip is expected. It's a bit more "what you see is what you get", with a bit less fussing around at the end of each meal.

Monday, 14 February 2005

I've just set up a swank new machine: dual Xeon Nocona 3.4 GHz (x86-64, hyperthreading) with 2GB RAM. To get the fastest possible Mozilla builds on a machine like this you use parallel make. The question is, how many simultaneous processes should make run to maximise throughput? Too few, and the system is underutilized (some processors will sit idle waiting for I/O or simply have nothing to do); too many, and the system will thrash on memory or I/O access.

So I did the virtuous thing and benchmarked it over the weekend. I tested setting the process limit between 1 and 8. For each test I did "rm -rf *; ../configure; time make -jN" three times and took the fastest run as definitive. I was building Firefox with "--enable-extensions=all" (a few don't actually get built for various reasons), optimized at -O3, no debug, for the x86-64 architecture. Then I plotted the results as "builds per hour" for each setting. The results are below.

So it looks simple: throughput basically tops out at 4 processes, which makes sense because I have two CPUs, each supporting two hardware execution contexts simultaneously. I thought there might be a measurable improvement at 5 or more processes, since an extra process could run while one of the hardware contexts is stalled on I/O, but there isn't. Perhaps there's enough memory that there aren't many I/O stalls, and the ones that do happen block all processes. Interestingly, there's no significant dropoff even at 8 processes, although there must be eventually as the overhead of juggling lots of simultaneous processes grows.

Anyway the bottom line is that I can build Firefox from scratch in 13 minutes 45 seconds :-).
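The builds-per-hour figures come from simple arithmetic: take the fastest of the runs at a given -jN setting and divide an hour by it. A small C helper (hypothetical name) makes that explicit; for the 13 minute 45 second (825 s) build it gives 3600 / 825, roughly 4.4 builds per hour.

```c
#include <stddef.h>

/* Hypothetical helper: given the wall-clock times (in seconds) of repeated
   builds at one -jN setting, keep the fastest run, as in the benchmark, and
   convert it to builds per hour. Non-positive times are ignored; returns
   0.0 if no valid run is supplied. */
double builds_per_hour_best(const double *run_seconds, size_t nruns)
{
    double best = 0.0;
    for (size_t i = 0; i < nruns; i++) {
        if (run_seconds[i] > 0.0 && (best == 0.0 || run_seconds[i] < best))
            best = run_seconds[i];
    }
    return best > 0.0 ? 3600.0 / best : 0.0;
}
```

Taking the minimum rather than the mean filters out runs slowed by unrelated background activity, which suits a "how fast can it go" question.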

Over the weekend we went looking for Chinese shops in Howick and Pakuranga, where lots of Asian immigrants live. Mostly by luck we found the Somerville shopping centre, which has about 50 restaurants and other vendors, most of them Chinese. Very cool, with lots to explore. We ended up at Peninsula Cafe, a Hong Kong-style cafe, where we had a delicious meal of ox brisket on fried noodles, and pork schnitzel. The milkshake with crushed pineapple was new to me, and I recommend it! Too bad it's all about 30 minutes' drive from home.

On Saturday we had yumcha lunch at Sunshine with family. I continue to be impressed and I'm quite confident it's better than anything in New York.

I forgot to mention that last weekend we had lunch at Happy Valley in Newmarket, yet another Hong Kong-style cafe. It was Janet's favourite place when she lived here ten years ago, and it's still in good form. Recommended.

My parents' house has a lot of books that I wouldn't normally buy or read; many are not to my taste, but there are also a lot of real gems. Over the last month I read THE PAST IS MYSELF, Christabel Bielenberg's memoir of life in Nazi Germany in the years leading up to and during World War II, as a "normal person" with some contact with the German opposition to Hitler. It's a well-written, moving, intriguing, thought-provoking book, well worth the time. It would be an excellent book to study in schools.

The book forces me to consider the questions of what I *should* and *would* do if I found myself in a similar situation. Nazi opponents had a spectrum of options but no clear idea which would prove moral, prudent or effective. At one end, direct and open opposition would be noble but probably short-lived and ineffective. At the other end, a passive struggle for the survival of oneself and loved ones could easily become tacit cooperation with the regime. But my great fear is that my choices would become clear and out of cowardice I would take the safe and venal path, surely the greatest sin of all.

The book also troubles me because I imagine what the Nazis could have achieved with the emerging technology of the 21st century. I'm forced to conclude it would have suited them very well. With their software running on every device, in tamperproof hardware, sporting diverse sensors and chattering wirelessly (and with computers to sort and mine the data), it's unlikely resistance could have remained organized and secret nearly as long as it did. The new science of mind-reading with MRI and related sensing will likely transcend the fantasies of the Gestapo and even Orwell. Are we who look forward to universal democracy like people climbing a ridge in fog, blissfully unaware of steep cliffs on either side? Hmm. May God have mercy on us.

Friday, 11 February 2005

Over the last few weeks I've mainly been catching up on bugmail and resolving as many of the bugs as I can, especially layout and crashing regressions. I've also been in touch with other Novell people who have Mozilla issues and I've been doing some work on those bugs. And I've done some work on security fixes. I thought I was about done with that early this week, and ready to move on to more interesting stuff, but then my new hardware arrived.

My primary box will be a dual Xeon. To my pleasant surprise it turned out to be x86-64 capable, so I just spent hours tracking down, burning and installing a SUSE 9.2 x86-64 image (actually it's still going...) I do wonder whether x86-64 is a wise move for a primary box, at this stage, but I just can't resist that sexy instruction set ... sixteen delicious general purpose registers. Mmmm mmm. Anyway I guess this means I'll be in charge of tracking down whatever 64-bit and x86-64 specific bugs are in Mozilla.

Anyway, after I've got this thing running --- and the other server that I haven't even unboxed yet, which I plan to dedicate to continuous performance and regression tests --- I'll be focusing on fixing some serious issues with floats breaking across column boundaries, with the goal of a really solid columns implementation for Gecko 1.8/FF 1.1. Then I hope to be digging into some really interesting stuff: Cairo graphics, and Mono integration. I'll blog more about these later.

Thursday, 10 February 2005

I've had a chance to visit a few more lunch offerings around downtown Auckland. Here's how they stack up...

Sunshine continues to excel. Really really good stuff. I have no words.

Grand Harbour was good as expected but I didn't like it quite as much as Sunshine. That may be because we only had two people at Grand Harbour, or maybe our luck with the carts was down that day.

BBQ King is an interesting place just off the middle of Queen St. My friend Stephen highly recommends it, and the wonton soup I had was pretty good, but the place was a little crowded and chaotic. Not that that's a bad thing, really.

The food court in the downtown shopping centre looks good on the surface, but every time I go there and try to get beef brisket, they've run out. So I'm not impressed.

Janet and I went to the food court in Atrium on Elliott today, up behind the old Civic Theatre. WOW. I think it wins the battle of the food courts by a gallop. There's a wide range of Asian places (Malaysian, Thai, Indonesian, sushi, Chinese, Indian), but there's also a dim sum outlet (attached to the full-service dim sum restaurant next door) and non-Asian options such as Turkish, a roast-meal place, and a hot dog stand. There's also Uncle Jack's, a Taiwanese drink-and-snack bar of the sort that offers "Mango Slush with Pearls and Pudding" (recommended!) and "Deep Fried Taiwanese Pork Chop" (unsure). This food court has spectacular variety, immaculate presentation, and reasonable prices. Highly recommended.

I still haven't gotten used to the ultrahip Chinese kids behind the counters rattling away in Mandarin and then speaking to me with a broad Kiwi accent. But I like it.

I need to keep exploring the food courts, and there are also a number of independent restaurants to try --- including our old yumcha haunt Empress Garden, and dim sum at the Dragon restaurant at the Atrium. Mmmm mmm. It's a good thing I walk to work.

I've been back for four weeks now. Have I woken up from the dream and seen that my visions of living in New Zealand were cruel illusions, that I can't go home again? No! There are little annoyances, but the broad strokes of life are as expected. Life here is very good and I am thankful.