fxn: I haven't gotten the chance to look at Brian Ford's paper, but it sure is intriguing.

If indeed ZF comes crashing down, that would really validate my decision to make proofs in Ghilbert truly portable with respect to axioms, rather than trying to pick a single axiom system that is powerful enough for the work at hand, yet not so powerful as to raise questions about soundness or problems porting the proofs over to other systems.

That said, the timing could have been a little better for me; as a demonstration of Ghilbert's portability, I picked a construction of HOL logic in ZF[C] set theory, and it's only a few proofs away from being done (of the core HOL axioms, only the law of the excluded middle (BOOL_CASES) and a theorem to introduce the iota operator remain). Oh well.

Busy, busy

It has been a busy month since the last time I blogged. Among other things, we had an Artifex staff meeting, and also tor stayed for a couple of weeks. We had a great time talking serious 2D graphics, color, and so on, plus hanging out with the family.

Kids

I'm proud of a couple photos I took on Alan's 8th birthday; the portrait of him, and another of the brothers posing with a penguin prop. Both were taken with my archaic Rapid Omega 100 on slow B/W film (6x7cm), then scanned on my brand new Epson 4870 scanner (highly recommended, btw).

Alan's photo page has unbelievably high Google-juice - looking at the referer logs, the most popular images are the ones that currently show up as #1 and #2 for a Google image search on happy. Even more exciting, a photo from last year of him playing his guitar is now on the cover of a book recently published by Wesleyan University Press.

Max, who just turned four last week, is developing quite a subtle sense of humor. A few weeks ago, when I was reading him Eating the alphabet, he asked if I would get him a Xigua melon. I answered that I didn't remember ever seeing one in a store, but that if I did see one, I'd get it for him, then we'd be able to tell if it's as good as a watermelon, or better (watermelon is one of Max's absolute favorites right now). Max replied that he knew that Xiguas are way better than watermelon. I asked him how he knew that, and he said: I can see inside people's minds. Even fruit. And that Xigua melon is thinking, "I'm way better than a watermelon".

More later

"More later" is one of Josh Marshall's favorite catchphrases, and people rib him for it. I've been doing quite a bit of deep thinking lately about the place of free software in the world, among other things, and I do want to write about it, but it will have to wait until I have a bit more time.

I turned '"' (ASCII 34) today. I can't think of any particular meaning to that
number. Perhaps it is interesting precisely because it is the first
uninteresting number.

Letter quality displays

Robb
Beal asked me a couple of weeks ago how to get support for letter
quality displays in Linux. It's a good question, and it deserves an
in-depth answer.

I want to go over the highlights here, though. There are really three parts
to the question. First, what can users do to push things along? Second,
what are the "configuration science" problems that need to be solved?
Third, what are the imaging problems that need to be solved to get
high resolution bitmaps to the display hardware?

As a user, probably the most important thing is to let your friendly
local developers know that you're interested in having letter quality
displays supported. There are a bunch of ways to do so. If I expand
this into a full post, I'll have two sample letters. One says,
briefly, "you suck, free software developers are commies, by the way
why doesn't your software support LQ?" The other says, "I love your
software and would love it even more if it had LQ support. By the way,
do you have a paypal address or maybe an Amazon wishlist so I can give
you a small token of my appreciation?"

Indeed, funding is a large part of the problem, not least because the
prototype displays are still rather expensive. Of course, after
Microsoft rolls out their LQ stuff, then the hardware will be
available at commodity prices, and then as the gear filters into the
hands of free software developers, there will naturally be more
motivation to support it. It's a shame that we, as a community, suck
so hard at organized fund-raising, or else we'd have already put a few
of these displays in the hands of the most important developers, as of
course Microsoft did years ago.

Yes, that's a whine. On to the configuration science part of the
question. The primary configuration parameter that must be discovered
by applications and negotiated between application and display
subsystem is the "pixels per px" ratio (which I'll abbreviate here as
pppx). The px is the fundamental unit for constructing user interfaces,
and has no preset size. On current display hardware, the pppx ratio is
locked in at 1. If you like, "LQ support" could well be defined as the
support of pppx ratios other than 1.

Many people believe (misguidedly, in my humble opinion) that the
"correct" configuration parameter is the display dpi. Indeed, it's
fairly easy to come by values for this - the EDID standard provides a
way for the display to tell it to the video card, and most OS's now
provide at least some support to access the info (in X, DisplayWidth
and DisplayWidthMM; in Windows, similar information can be obtained
from GetDeviceCaps; in Mac OS X, CGDisplayScreenSize).
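
For concreteness, here's a minimal sketch of that calculation in Python, with the pixel and millimetre sizes assumed to have already been read from the display subsystem (say, via DisplayWidth and DisplayWidthMM on X); the sample numbers are roughly those of a T221-class panel, not real EDID data:

    # Minimal sketch: derive dpi, and a candidate pppx, from values reported
    # by the display subsystem. The sample numbers below are made up for
    # illustration (roughly a T221-class panel), not read from real hardware.
    def display_dpi(width_px, width_mm):
        return width_px / (width_mm / 25.4)   # 25.4 mm per inch

    def nearest_pppx(dpi, baseline_dpi=100.0):
        # Assume UIs were designed for roughly 100 dpi "px" units, and round
        # to a whole scale factor so legacy apps get clean integer scaling.
        return max(1, round(dpi / baseline_dpi))

    dpi = display_dpi(3840, 478)          # 3840 pixels across, 478 mm wide
    print(round(dpi), nearest_pppx(dpi))  # -> 204 2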

UI's are built in px units, which, because of currently
available display technologies, are identified with pixels. They could
have been built out of some physical unit, for example points, so that
getting a larger DPI value out of the EDID would cause applications to
automatically draw their UI's with more pixels. (Indeed, for quite
some time a display resolution of 72 dpi was so common that many people
identified pixels with points.) If the technology had evolved
that way, then LQ support wouldn't have been especially
difficult. However, technology didn't evolve that way. I don't
see this as entirely a bad thing - there are good things about a
px-based coordinate system, which I think I'll detail if this becomes
a full-length article.

So, the negotiation has to happen something like this. The OS knows
that a pppx value of, say, 2, is available. If the app is not
LQ-aware, then as it asks for a display surface, the OS knows to give
it one with 100 dpi "pixels", so that each pixel that the app
draws shows up on the hardware as a 2x2 square of pixels (one can
imagine other choices here, for example, drawing text at the full device
resolution, but it will be a compromise in any case, and the Real
Solution is always to make the app LQ-aware. The chunky pixels might
not look that good, but at least they won't break anything or
introduce bizarre new artifacts).

But if the app asks for the range of available pppx values, the OS
will tell it that 2 is available, and then the app can ask for a
drawing surface that matches the physical display resolution.
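
A minimal sketch of what that negotiation might look like from the application's side; every name here is hypothetical, invented for illustration rather than taken from any real windowing API:

    # Hypothetical window-system API, for illustration only.
    def open_drawing_surface(window_system, lq_aware):
        available = window_system.available_pppx()   # e.g. [1, 2]
        if lq_aware and max(available) > 1:
            # LQ-aware app: ask for a surface at the physical display
            # resolution and take responsibility for drawing denser pixels.
            return window_system.create_surface(pppx=max(available))
        # Legacy app: gets a 1-pppx surface; the OS replicates each app
        # pixel into a pppx-by-pppx square of hardware pixels.
        return window_system.create_surface(pppx=1)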

Once we get to that point, then the app has to actually come up with
the higher resolution pixels. But this is a relatively straightforward
technical problem, and different applications can solve it each in
their own best way. Certainly, having a good 2D graphics library under
the hood would be nice, but there are a few around (even if one of
the more well-known was written by a "monkey with a PhD in low level
coding" :), and if none are satisfactory, you can write your own.

By contrast, configuration science problems have a way of not being
readily solved. My favorite running example is what happens when you
press the backspace key. After many years, Linux systems have pretty
much settled down, but if I login from my MacOS X laptop to my Linux
box and open a forwarded X11 terminal, I still find that
pressing backspace sometimes prints '^H' rather than deleting the
character to the left of the cursor. Let us pray that we can do a
better job with LQ configuration.

I think Microsoft is on to something here: find smart people who know
what they're talking about, give them money and toys, and listen to
them. It seems to work well for them, but there's no reason for them,
or even evil proprietary software companies in general, to have a
monopoly on this business method. Who
knows, maybe we ingenious free software types can someday find a way
to adapt it to our universe.

Not that I'm personally complaining, mind you. I really enjoy my job
working on Ghostscript, and get to play with lots of cool graphics
toys - I don't have a 200 dpi monitor in my studio yet, but Miles has offered to pay half for
one, so it's tempting. I have a gut feeling that a letter quality
desktop panel will be available within a few months at consumer
pricing, at which point I'll jump on it.

If some kind benefactor wanted to set the cause of the free
desktop forward a few months or more, one of the most effective things
they could do is spread a few of those panels around to the people
in the free software community who can make the best use of them: X,
Gnome, KDE, Mozilla, etc. (I'd happily accept such a donation but
would be equally happy to see it go to others for whom it's a more
pressing need).

Indeed, the imminent arrival of letter quality displays will present a
very crisp test of the health of the various organizations responsible
for generating the relevant software. It's a pretty safe prediction
that Microsoft will not only get the software mostly right, but also
nearly singlehandedly create the mass market for these displays. Mac
OS faces a choice - Apple can either lead as they did with FireWire,
802.11 and DVD burners, or they can drag behind and let the Wintel
world kick their butt for a while, as they did with raw CPU power up
until their shipment of the IBM 970. Sun will probably manage to screw
up Java support royally - during the transition, I'm sure you'll see
teeny fonts, chunky pixels, and related artifacts for quite some
time. Mozilla won't even start to get its act together until there's
reasonably good support for letter quality on platforms other than
Internet Explorer (although the W3C's sensible definition of the px
will help them get to the goal once they really get started). And of
course, it's reasonable to predict that the free software community
will eventually get it right, but that it will take quite some time.

I think there are two other potential winners from the disruption to
be caused by this technology: PDF and Flash. The win for PDF is pretty
obvious; today's dot-matrix screens are just not quite good enough to
display 8.5 x 11 inch pages with reasonable quality, in much the same
way as early-'80s dot matrix printers were not quite good enough to
render such pages on paper. The win for Flash is not quite as obvious,
but I think just about as compelling. Once you get past Flash's
reputation as the blink tag of the dot-com boom times, the underlying
technology is actually pretty impressive. Existing Flash applications
will immediately start looking good on letter quality displays once
the client supports the devices, and Flash will continue to be one of
the most painless ways to deliver such applications. Among other
things, it's pretty darned cross-platform already, and that will
probably just get better.

Interesting times ahead, that's for sure.

Advogato's DNS

I had DNS for advogato.org rather poorly configured. A power outage
here this evening took out the primary, and none of the
secondaries thought they were authoritative, so oops. I also had the
timeouts set quite short because of the recent server move, so double
oops. Everything should be a lot more robust now.

I've posted the choice
thread on ghilbert.org. For those of you who are slavishly
following the development of Ghilbert, or fans of the Axiom of Choice,
it should offer a glimmer of enlightenment.

The New York Times reaches about 1.5 million people. This posting is
possibly of interest to two dozen. But the difference between my blog
and the NYT is that my post will reach those two dozen :)

Version control

As NTK says, "No
self-respecting Thinker Of Hard Thoughts these days is without their
own Deep Theory Of How To Do Version Control." It's not surprising to
see so much activity in this sphere now. CVS has been broken for a
long time, and it's now clear that Subversion only solves some
of the problems of CVS.

I haven't actually played with Codeville yet, but I look forward to
it. When Bram puts his mind to something, it often turns out well. I
was also very interested to see Ken Schalk's CodeCon presentation on
Vesta, a project I've actually been following since its inception
about a decade ago.

The bottom line is that I think Vesta gets a few things very right,
but some of the design decisions are going to hold it back from
hitting the big time. Vesta is a source repository, a configuration
manager, and a build tool. If you buy in to the Vesta way of doing
things, all these pieces interact in a very nice way. For example,
because you keep not only your source files but also the tools needed
for building in the repository, you can always go back to a
specific build, bit for bit. It uses some neat tricks to work - the
files in the repository are exported through NFS, and, not so
coincidentally, that's how the build knows what the dependencies
are. If a file is accessed during the build, it's a dependency;
otherwise it's not.

The biggest downside, I think, is that it's quite Unix-specific. It's
not impossible to run NFS on Windows or Mac, but it's not exactly
convenient either.

I think it is possible to take the best ideas from Vesta and put them
in a portable framework. Rather than a build being a script which runs
random commands and litters directories with temporary and result
files, it should be a functional program from input to output. All
intermediate results should be considered a cache. Indeed, I see no
reason why you shouldn't be able to take a source package, run a
simple command, and have it spit out a .deb for Debian, .rpm's of the
various flavors for the Red Hat-based distros (including some
intelligent analysis of how many variants are actually needed), a .pkg
or .dmg or whatever the Mac people decided is the preferred way to
distribute OS X apps, an InstallShield-like installer for Windows, and
a .pdb for Palms. Throw in a couple flags, and the Unix build is
instrumented to support debugging and profiling, or maybe gcc bounds
checking. Better yet, have it run in an interpreter such as eic, so that you can debug
runtime violations at a source level.
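
Here's a minimal sketch of that functional flavor, using nothing beyond the Python standard library: each build step is keyed by a hash of its command and inputs, so every intermediate result is literally just a cache entry (Vesta gets the input list for free by watching NFS accesses; here the inputs are simply declared):

    import hashlib, os, subprocess

    CACHE = "/tmp/buildcache"   # all intermediate results live here; safe to wipe

    def build_step(tool_cmd, input_paths):
        """Run tool_cmd as a pure function from inputs to one output file.

        The cache key covers the command and the content of every input, so
        changing either forces a rebuild; otherwise the previously built
        output is reused bit for bit. Assumes the tool takes a gcc-style
        "-o <output>" flag.
        """
        h = hashlib.sha1()
        h.update(" ".join(tool_cmd).encode())
        for path in sorted(input_paths):
            with open(path, "rb") as f:
                h.update(f.read())
        out = os.path.join(CACHE, h.hexdigest())
        if not os.path.exists(out):
            os.makedirs(CACHE, exist_ok=True)
            subprocess.run(tool_cmd + ["-o", out] + sorted(input_paths), check=True)
        return out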

What exactly is standing in the way of such a thing? My guess is that
the main thing is inertia.

At least I think I voted. There's no way of knowing for sure, because
it was on one of those fancy new Diebold machines.

The problem is, of course, familiar to those experienced in computer
security. Because it's impossible to see security flaws, and
very difficult, even for experts, to understand them, people get
very complacent.
Judges, for example, are wont to dismiss
knowledge of such vulnerabilities as "speculative", rather than an
"actual threat".

What changes the status quo is invariably an actual exploit. If people
can see a voting machine being hacked, then they'll believe
it. Fortunately, in this case it ought to be relatively easy.

I don't think it's time yet for large-scale civil
disobedience about hackable voting. The most important thing, I
think, is to spread the word about the
dangers. As Avi Rubin
points out, many election judges are elderly folk without a deep
understanding of security flaws. I spoke with my election judges
briefly. I told them that I was a computer scientist, and that maybe I
knew the secret to cracking the smart card they gave me. When I gave
it back to them and informed them that I didn't, in fact, mess with
it, they were visibly relieved.

I also liked Avi's idea of volunteering as a judge. That's probably
the single best way to get the word out in a positive way.

In any case, after I voted I got a little sticker that says "I voted -
touchscreen" with an American flag on it. Maybe for the election this
November, we should hand out stickers with the slogan "I think I voted
- touchscreen", and a flag with the 50 stars replaced by a BSOD.

Letter quality LCD's

I think this is going to be the year they go mainstream. They're
showing up in more and more devices. It's also
interesting that Tiger
Direct has the IBM T221 for four
grand, "while supplies last". It's possible, I think, that a
newer model is in the pipeline.

Logic update

I got a very gratifying response from my last post,
resulting in a very enlightening email correspondence with Michael
Norrish, Bob Solovay, and Norm Megill. I'll post it soon, for those
of you out there just dying to find out whether there really is a
closed form expression for HOL's epsilon in ZFC.

CodeCon was great fun, and I'm really glad I went. The highlight was
the Google party. How good was that? Let's put it this way: I was so
absorbed with intense conversations that I completely missed Knuth
crashing it.

Patents

elanthis:
yes, IMHO of course. I've
got one more in the pipeline, and very possibly some more coming.

Logic

Ghilbert was cited in a recent
preprint by
Carlos T. Simpson. One of the central themes of this paper is a
preference for untyped, set-theory flavored math as opposed to the
more type-flavored variety you see in systems such as HOL, NuPRL, and
Coq. He's also not a big fan of the constructive flavor of the latter
two systems (meaning that x || !x is not an axiom).

The main thing I'm doing in Ghilbert now is constructing HOL in set
theory. It's slow and steady progress, and very satisfying. One issue
I'm running into, though, is HOL's epsilon
operator. Essentially, if f is a function from some type T to bool,
then epsilon(f) chooses a value (of type T) for which f is true. This
construction is very useful, but does pose some challenges.

In the special case where there is only one value for which f is
true, there's no problem (in this case, the similar "iota" operator
suffices). But it's not hard to come up with situations in which f
might hold for infinitely many values, and choosing one out of the
many is a problem. If you accept the Axiom
of Choice, it tells you that it can be done, but it doesn't tell
you which value to choose. So it doesn't provide much help for
actually constructing HOL in set theory.
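
To state the contrast concretely (my paraphrase, with f a predicate on some type T; HOL's own select axiom is phrased a little differently):

    \[
      (\exists!\,x{:}T.\ f\,x) \;\Rightarrow\; f\,(\iota\,f)
      \qquad\qquad
      (\exists\,x{:}T.\ f\,x) \;\Rightarrow\; f\,(\varepsilon\,f)
    \]

The iota form only ever commits you to a uniquely determined witness, which is easy to construct in set theory; the epsilon form demands a uniform choice from every nonempty subset of T, which is exactly where the axiom of choice comes knocking.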

The HOL documentation adds further twists to the story. The HOL
description document contains a paragraph (at the end of section
1.1) suggesting that the universe of HOL terms is actually a pretty
small set (by the standards of set theory, anyway - it's still quite
infinite), and that it should be possible to choose an element from
any subset quite straightforwardly, even without assuming the axiom of
choice. I am beginning to get a vague sense of what they're talking
about, but don't really understand it. If anyone out there reading this
could explain it to me, I'd really appreciate it. If not, I think I
know who to ask.

I'll be at CodeCon this Friday and Saturday. I'm really looking forward to meeting up with my tribe. As an extra bonus, Heather and the kids are coming to pick me up Saturday afternoon, so it'll be a good opportunity for people in my tribe to meet my family.

Advogato maintenance

lkcl is absolutely right that I need to hand over the regular maintenance of the site. I'll start up a recruitment thread on virgule-dev after I get back from CodeCon.

The spammers are winning

When Paul Graham first published his Plan for Spam, I thought it fairly likely that his basic idea was sound, and that Bayesian-style classification would, if not eliminate spam, then at least make it manageable. Now I'm sure it won't work.

The problem is that there are two attacks spammers can mount, both of which are damned hard to do anything about. First, they can include bits of legitimate, high-quality text in their messages [for example, I now see Wikipedia text in Google-spam sites]. Second, by running as a virus zombie, they can take over whatever authentication tokens are available for real mail. Note that this includes hashcash.
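
A toy illustration of why the first attack is so effective; the per-word probabilities here are invented, and this is only the naive-Bayes flavor of the idea, not Graham's actual algorithm:

    # Toy naive-Bayes-style score, with invented per-word spam probabilities.
    SPAMMINESS = {"viagra": 0.99, "mortgage": 0.95,
                  "wikipedia": 0.05, "encyclopedia": 0.05, "article": 0.10}

    def spam_score(words):
        p_spam = p_ham = 1.0
        for w in words:
            p = SPAMMINESS.get(w, 0.4)   # unknown words treated as mildly hammy
            p_spam *= p
            p_ham *= 1.0 - p
        return p_spam / (p_spam + p_ham)

    print(spam_score(["viagra", "mortgage"]))           # ~0.999: caught
    print(spam_score(["viagra", "mortgage", "wikipedia",
                      "encyclopedia", "article"]))      # ~0.37: padded past the filter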

I still believe that a trust metric can be a part of a healthy balanced breakfast to end spam. But Google, the highest-profile deployment of a trust metric, seems to be experiencing a marked decline in quality, to the point where some people are questioning whether PageRank is such a good idea after all.

I don't know how to fix PageRank, but my current thinking, reinforced by my experiences here, is that negative recommendations are needed too. I feel quite empowered by the trust that Google places in me to rank a page up, but at the same time helpless to tell Google, "this site is pure spam".

Negative recs are not easy. For one, they don't really fit well in the PageRank model (the biggest difference between PageRank and the eigenvector-based tmetric used to power the diary ratings). Second, any simplistic approach (such as having an unauthenticated (or underauthenticated) "report this as spam" link) would be very vulnerable to DoS-type attacks, likely rendering the whole negative rec process unusable. There's a good technical reason to prefer monotonic trust metrics, and when I started my thesis project I concentrated on those, but I now think that their inability to use negative recs is crippling.

And, Zaitcev, I did notice that Orkut's trust model seemed quite primitive. I didn't kick its tires carefully, but I didn't see any obvious differences from Friendster's, which is simply based on the existence of a path of length no more than four. If Orkut is at CodeCon, I'm going to pin him down and see what he has to say for himself.

Wow, it's been a long time since my last post. I've been kinda in hermit-mode, spending more time with the family, and not feeling as productive as I should be with work. I think I'm emerging now. Things are going fairly well, and I have quite a few things I want to write about here.

OT

I also have a few things I want to write about which are more political and less related to free software. I finally did a bit of a brain-dump on my other blog. If you've been following the updates on my boys, then by all means don't miss the bit of improv radio theater we recorded a few days ago.

It's been a busy few weeks - my mom, then my brother and his
girlfriend came to visit, and somewhere in the middle of all that we
had an Artifex staff meeting.

Things are quieting down now. We had a nice family evening, playing
video games and doing a little papercraft. I
tried the tiger,
just on cheap paper and b/w laser printing, and it came out ok. Of course, Max
then wanted to do one of the motorcycles, but I convinced him that we
would some other day.

BitTorrent and RSS

There's a thread
going around the net on the benefits of combining RSS with
BitTorrent. I agree there's something there, but want to make a
distinction between the "easy" combination which is quite feasible
right now, and one that requires a bit more rocket science (actually,
Internet protocol design, but from what I know of both, the latter is
more difficult to do well).

In the "easy" combination, you have your whole RSS infrastructure
exactly as it is now, but use BitTorrent to distribute the
"attachments". People have been experimenting with RSS enclosures (for
speech, music, video, and whatnot) for a while, but they're not hugely
popular yet. One of the reasons is the difficulty and expense of
providing download bandwidth for the large files that people will
typically want to enclose. BitTorrent can solve that.
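
The nice thing about the "easy" combination is that it needs nothing new on the RSS side; an item just carries an enclosure whose URL happens to point at a torrent. A quick sketch with the Python standard library (the URL and sizes are placeholders):

    import xml.etree.ElementTree as ET

    # One RSS <item> whose enclosure points at a .torrent rather than the
    # media file itself; URL and length are placeholders.
    item = ET.Element("item")
    ET.SubElement(item, "title").text = "Interview, big audio file"
    ET.SubElement(item, "enclosure", {
        "url": "http://example.org/media/interview.mp3.torrent",
        "length": "31337",
        "type": "application/x-bittorrent",
    })
    print(ET.tostring(item).decode())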

In fact, BitTorrent's strengths seem to mesh well with RSS. BT shines
when lots of people want to download the same largish file at the same
time - it's weaker at providing access to diverse archives with more
random patterns of temporal access. Also, BT scales nicely with the
number of concurrent downloaders - you get about equally good
performance with a dozen or ten thousand. So if someone shoots a
really cool digital video, posts it to their blog, then gets
Slashdotted, it all still flows.

Integrating BT with a daemon that retrieves RSS feeds in the
background has other advantages, as well. If the person opens the file
a while after the download begins (which might be as soon as the RSS
is updated), most or all of the latency of downloading that file can
be hidden. Further, since the BT implementation is released under a
near-public domain license, it should be relatively easy for people to
integrate it into their blog-browsing applications.

An example of a blog that would work superbly with BT is Chris Lydon's series of
interviews.

But Steve Gillmor's article isn't primarily about enclosures - it
suggests that we can use BT to manage the RSS feed itself. I think
there's something to the idea, but the existing protocol and
implementation isn't exactly what's needed. BT is best at downloading
large static files. You start with a "torrent" file, which is
essentially a Merkle hash tree of the file packaged up with a URL
where the "tracker" can be reached. All peers uploading and
downloading the file register with the tracker, and get a list of
other peers to connect with. Then, peers exchange blocks of the file
with each other, using very clever techniques to optimize the overall
throughput. After each block is transferred, its hash is checked
against what's in the torrent file, and the block is discarded if it
doesn't match.
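
A minimal sketch of that integrity check, assuming the classic torrent layout in which the "pieces" field is a flat concatenation of 20-byte SHA-1 hashes (a Merkle-tree variant would verify a path to the root instead):

    import hashlib

    def piece_ok(pieces, index, data):
        """Check one downloaded piece against the hashes from the torrent file.

        pieces: the raw "pieces" value, 20 bytes of SHA-1 digest per piece.
        index:  which piece this is.
        data:   the bytes a peer just sent; on a mismatch the piece is
                discarded and fetched again from another peer.
        """
        expected = pieces[20 * index : 20 * (index + 1)]
        return hashlib.sha1(data).digest() == expected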

But RSS files themselves are relatively small, so it's unlikely that
all that much bandwidth would be saved sending torrent files and
running a tracker, as opposed to simply sending the RSS file itself.
Further, the big performance problem with RSS is the tradeoff between
polling the RSS feed infrequently, resulting in large latencies
between the time the feed is updated and viewers get to see it, or
polling it frequently and chewing up tons of bandwidth from the
server. BT doesn't do much to help with this - you'd be polling the
torrent file exactly as frequently as you're polling the RSS file now.

I believe, however, that the BitTorrent protocol could be adapted into
one that solves the problem of change notification. The protocol is
very smart, and already has much of the infrastructure that's needed.
In particular, peers already do notify each other when they receive
new blocks. That's not change notification because the contents of the
blocks are immutable (and that's enforced by checking the hash), but
it's not too hard to see how it could be adapted. At heart, you'd
replace the static hash tree of the existing torrent file format with
a digital signature. The "publisher" node would then send new
digitally signed blocks into the network, where they'd be propagated
by the peers. There'd be essentially no network activity in between
updates, and, as in the existing BitTorrent protocol, the load on the
publisher node would be about the same whether it was feeding a dozen
or ten thousand listeners. I'd also expect latency to scale very
nicely as well (probably as the log of the number of peers, and with
fast propagation along the low latency "backbone" of the peer
network).
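
A minimal sketch of what those signed update blocks might look like; sign() and verify() stand in for a real public-key signature scheme, and the block layout is my own invention for illustration:

    import json, time

    def make_update_block(sign, payload, seq):
        """Publisher side: wrap one feed update in a signed, self-describing block.

        sign: placeholder for a real public-key signing primitive (bytes -> sig).
        seq:  monotonically increasing, so peers can ignore stale or replayed blocks.
        """
        body = json.dumps({"seq": seq, "time": time.time(), "payload": payload})
        return {"body": body, "sig": sign(body.encode())}

    def accept_block(verify, latest_seq, block):
        """Peer side: keep and propagate a block only if it verifies and is new."""
        if not verify(block["body"].encode(), block["sig"]):
            return False
        return json.loads(block["body"])["seq"] > latest_seq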

I'd hate to see such a beautiful work of engineering restricted to
just providing RSS feeds - ideally, it would be general enough to
handle all sorts of different applications which require change
notification. One such is the propagation of RPM or Debian package
updates, which obviously has strong requirements for both scaling and
robustness. The main thing that's keeping it from happening, I think,
is the dearth of people who really understand the BitTorrent protocol.

Proof systems

I've been hacking a bit on my toy proof language. Aside from
slowly bringing the verifier up to the point where it checks
everything that should be checked, I'm also hacking up an implementation
of the HOL inference rules constructed in ZF set theory.

It's immensely satisfying to construct proofs that are
correct with high assurance, which is such a contrast from
hacking code - any time you write nontrivial code, you know it's got
lots of bugs in it, many of which no doubt can be exploited to create
security vulnerabilities.

Oooh. Turns out the FAT file system is patented,
and Microsoft has a commercially reasonable licensing plan for it.
Of course, given the way the patent system is structured, this kind of
thing is absolutely inevitable.

As I've written
before, I think Paul Krugman got it wrong when he argued that
Microsoft's monopoly position is not much of a threat. On the other
hand, kudos to him for publicizing the
rotten state of the voting machine industry. Those in the know have
been aware of serious problems for some time, but the media so far has
been doing a good job of keeping the unwashed masses blissfully
unaware. Love Krugman or hate him, it'll be harder to keep this issue
quiet now.

Color profiling

A followup to my recent post
calling into question Argyll's multidimensional interpolation
algorithm. I had some email back and forth with Graeme, and it's now
clear that the multidimensional interpolation itself is quite
good. The "wiggliness" in my profiles seems to have been caused by an
overzealous attempt to optimize the per-channel curves. With those
disabled, I'm now getting some really, really good profiles.

I still think that my idea of a diffusion equation based interpolation
algorithm might have merit, but considering the good results I'm
seeing now from Argyll, it pretty much goes on the back burner along
with two or three dozen others. It might become interesting if reasonably
priced handheld spectrocolorimeters with a USB port ever make it
to market - typical professional profiles are made from 3000 or more
patches, but it might be possible to get good results from a much
smaller sample, if done right.

More proof links

Claus Dahl sent in a link to yet another
proof system/language. In the meantime, I've been corresponding with
Norm Megill of Metamath fame, Freek Wiedijk, and Michael Meyling of Hilbert II. There's something
addictive about formal proofs. How else to explain why there are so
many projects to work on them?