Syndication

The text below pretty much speaks for itself. Bold highlighting and
numbered footnotes in [square brackets] are mine; all the rest is as I
received it. Some irregularities of spacing and punctuation, visible in the
original email, aren't obvious in the HTML. Names of the students are
redacted because (after finding several more copies on the Web) I imagine
the students are relatively innocent victims of bad advice. Name of the
institution not redacted because I hope others who receive such letters and
look for them on the Web will be able to easily find this posting.

I just got back from a trip to multiple conferences in Ontario, and that
makes it a good time to update my publications page. Most people interested
in my academic work are likely to find out about it from
other sources, but I'm going to post some brief and relatively non-technical
introductions here as well for my general Web site readers. The official
versions of these papers are mostly behind paywalls, but there are also
unofficial "preprint" versions available to all.

The Firefox GUI becomes more annoying with each "upgrade." I don't know
if they're taking bribes from Chrome, or if they took advice from the same
"professional" UI designer who broke GIMP, or what, but it's really become a
problem. For those who haven't given up on Firefox yet, however, and for my
own future reference, here's something useful I managed to figure out after
a lot of hair-tearing.

You start typing a partial URL into the location bar, and the drop-down
list of suggestions appears. But there's a URL on that list that should
not be there. Maybe it's something embarrassing you don't want other
users of your browser to see; maybe it's merely a site other than the one
you want to be the match for the few characters you typed, and yet for some
reason it keeps coming up as the preferred suggestion.

When I was preparing the Tsukurimashou 0.7 release, I
had to build the entire package several times from scratch, to verify that
all the necessary pieces really were included in what I was preparing to
ship. When I run the build on my development machine it normally re-uses a
lot of previously-built components, only updating the parts I have recently
changed. That kind of incremental compilation is one of the main functions
of GNU Make. But if I'm shipping a package for others to use, it has to
work on their systems which don't have a previous history of successful
builds; so I need to verify that it will actually build successfully in such
an environment, and verifying that means copying the release-candidate
package into a fresh empty directory on my own system and checking that the
entire package (including all optional features) can build there.

Tsukurimashou is a big, complicated package. It's roughly 92,000 lines
of code, which may not sound like so much. For comparison, the current
Linux kernel is about 15,000,000. Tsukurimashou's volume of code is roughly
equivalent to an 0.99 version of Linux (not clear which one - I couldn't
find numbers I trusted on the Web just now, and am not motivated to go
downloading old kernel sources just to count the lines). However, as
detailed in one of my earlier
articles, Tsukurimashou as a font meta-family is structured much
differently from an orthodox software package. Things in the Tsukurimashou
build tend to multiply rather than add; and one practical consequence is
that building from these 92,000 lines of code, when all the optional
features are enabled, produces as many output and intermediate files and
takes as much computation as we might expect of a much larger package. A
full build of Tsukurimashou maxes out my quad-core computer for six or eight
hours, and fills about 4G of disk space.

So after a few days of building over and over, it occurred to me that I'd
really like to know where all the time was going. I had a pretty good
understanding of what the build process was doing, because I
created it myself; but I had no quantitative data on the relative resource
consumption of the different components, I had no basis to make even
plausible guesses about that, and quantitative data would be really useful.
In software development we often study this sort of thing on the tiny scale,
nanoseconds to milliseconds, using profiling tools that measure the time
consumption of different parts of a program. What I really wanted for my
build system was a coarse-grained profiler: something that could analyse the
eight-hour run of the full build and give me stats at the level of processes
and Makefile recipes.
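
To make the idea of coarse-grained profiling concrete, here is a minimal
sketch in Python. It is not the tool used for Tsukurimashou; the function
names, the "label" field, and the record layout are all hypothetical. It
simply runs a command as a child process, records how long it took on the
wall clock, and totals the time per label (a label might stand for one
Makefile recipe).

```python
import subprocess
import sys
import time
from collections import defaultdict


def time_command(label, argv):
    """Run argv as a child process and record its wall-clock time.

    `label` is a hypothetical tag (for example a recipe name) used
    only for grouping the measurements afterwards.
    """
    start = time.perf_counter()
    completed = subprocess.run(argv)
    elapsed = time.perf_counter() - start
    return {
        "label": label,
        "argv": list(argv),
        "returncode": completed.returncode,
        "seconds": elapsed,
    }


def summarise(records):
    """Total the seconds per label, largest consumer first."""
    totals = defaultdict(float)
    for record in records:
        totals[record["label"]] += record["seconds"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Time a trivial child process as a stand-in for a build step.
    records = [time_command("noop", [sys.executable, "-c", "pass"])]
    for label, seconds in summarise(records):
        print(f"{label}: {seconds:.3f} s")
```

A real build profiler would also need CPU time, memory, and a way to hook
into every recipe Make runs; this sketch covers only the bookkeeping.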

Here are the slides (PDF) and an audio recording (MP3, 25 megabytes, 54
minutes) from a talk I gave today about one of my research projects.
You'll get more out of it if you have some computer science background, but
I hope it'll also be accessible and interesting to those of my readers who don't.
I managed to work in Curious George, Sesame Street, electronics, XKCD, the
meaning of "truth," and a piece of software called ECCHI.
I plan to distribute the "Enhanced Cycle Counter and
Hamiltonian Integrator" publicly at some point in the future. Maybe not
until after the rewrite, though.

Abstract for the talk:

It is a #P-complete problem to find the number of subgraphs
of a given labelled graph that are cycles. Practical work on this
problem splits into two streams: there are applications for counting
cycles in large numbers of small graphs (for instance, all 12.3
million graphs with up to ten vertices) and software to serve that
need; and there are applications for counting the cycles in just a few
large graphs (for instance, hypercubes). Existing automated techniques
work very well on small graphs. In this talk I review my own and
others' work on large graphs, where the existing results have until
now required a large amount of human participation, and I discuss an
automated system for solving the problem in large graphs.
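
For readers who want to see what "counting the subgraphs that are cycles"
means in practice, here is a brute-force sketch in Python. It is only an
illustration of the problem, not the method from the talk and not how ECCHI
works; the function name and the graph representation (vertices numbered
from 0, edges as pairs) are assumptions made for the example. It takes
exponential time, so it is only usable on small graphs.

```python
def count_cycles(n, edges):
    """Count the cycle subgraphs of a simple undirected graph.

    Vertices are 0..n-1 and edges is an iterable of (u, v) pairs.
    Each cycle is found by walking from its smallest vertex, once in
    each direction, so the raw count is halved at the end.
    """
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    total = 0

    def extend(start, current, visited):
        nonlocal total
        for nxt in adj[current]:
            if nxt == start and len(visited) >= 3:
                # Closed a walk of at least three vertices: a cycle.
                total += 1
            elif nxt > start and nxt not in visited:
                # Only visit vertices larger than the start, so each
                # cycle is counted from its smallest vertex only.
                visited.add(nxt)
                extend(start, nxt, visited)
                visited.remove(nxt)

    for s in range(n):
        extend(s, s, {s})
    return total // 2


if __name__ == "__main__":
    # The 3-dimensional hypercube: vertices are 3-bit numbers, and two
    # vertices are adjacent when they differ in exactly one bit.
    cube = [(u, u ^ (1 << b)) for u in range(8) for b in range(3)
            if u < u ^ (1 << b)]
    print(count_cycles(8, cube))  # prints 28
```

The 28 cycles of the cube are its 6 square faces, 16 six-vertex cycles, and
6 Hamiltonian cycles. The same approach becomes hopeless well before the
graph sizes the abstract calls "large."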

It's a very common pattern in the Han writing system that a character
will be made of two parts that are themselves characters, or at least
elements resembling characters, placed one above the other or one next to
the other. For instance, 音 (sound) can be split into 立 (stand up) above
日 (day); and 村 (village) can be split into 木 (tree) next to 寸 (inch).
This kind of structure can be nested, as in 語 (language).
One can do a sort of gematria with the meanings (what exactly
is the deep significance of "village = tree + inch"?), but that's not the
direction I'm interested in going today.
Here's the thing: in the Tsukurimashou
project, these two ways of constructing characters each correspond to a
piece of code that's invoked many times throughout the system, and I thought
it would be interesting to look at how often the different parameter values
are used.

We see a sloppily-parked car and we think "what a terrible
driver," not "he must have been in a real hurry." Someone keeps bumping into
you at a concert and you think "what a jerk," not "poor guy, people must
keep bumping into him." A policeman beats up a protestor and we think "what
an awful person," not "what terrible training." The mistake is so common
that in 1977 Lee Ross decided to name it the "fundamental attribution
error": we attribute people’s behavior to their personality, not their
situation.

I haven't had very good luck with computer hardware or operating systems in
the last few months. I lost a hard drive in my main desktop computer at
home, and had to replace that (no data loss because it was RAIDed); the
latest Arch Linux "upgrade" made my computer unbootable because the
maintainers decided they had to move everything from /lib into /usr/lib and
the documented procedure for doing the upgrade safely didn't cover oddball
configuration cases like having GCC installed (because who would have
that?); and now my LCD monitor is dying.