We went to the Farmers Market in Benicia, as we do most
Thursdays. This was the first week of corn season, so we got
some. It
was amazingly good. I like farmers markets; they're a good
way to make
a connection to the people who actually grow your food.

Fitz

Antony
Courtney
called me up today, and we had a nice chat. He is interested in
working on Fitz, largely so he can use it for his thesis work on
functional reactive user interfaces.

His work is done in Haskell, so Haskell bindings for Fitz
would be
part of the deal. I think these two things could work well
together.
Fitz has a somewhat functional design. The rendering of a
tree is a
function of the tree. Caching is strongly related to
memoization in
functional languages, and minimal updating is a form of
incremental
computation.
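The memoization analogy can be made concrete. Here's a minimal Python sketch; the Node class and render function are hypothetical stand-ins, not the actual Fitz API:

```python
from functools import lru_cache

# Hypothetical immutable tree node; not the real Fitz representation.
class Node:
    def __init__(self, shape, children=()):
        self.shape = shape
        self.children = tuple(children)

@lru_cache(maxsize=None)
def render(node):
    # "Rendering is a function of the tree": same node -> cached result.
    return (node.shape, tuple(render(c) for c in node.children))

tree = Node("group", [Node("bezier"), Node("rect")])
first = render(tree)
assert render(tree) is first  # second call hits the cache
```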

Language bindings are to be an integral part of Fitz. We'll
use the
Python bindings extensively for testing. It's also, I think,
the best
language for experimenting. But I don't mean to exclude Perl
or Ruby
programmers, either. I just wish it was easier to write high
quality
cross-language wrappers.

What I'd really like is something like Pyrex,
only with the ability to generate code for many different
runtimes.
Pyrex itself seems to be coming along quite nicely. As of
0.3, you can
define classes in C, which seems quite useful. At some
point, I might
be motivated to try an experiment of making Pyrex generate,
say, a
Ruby extension.

Picoservers

David McCusker writes briefly of picoservers.
These sound like fun. Basically, the problem boils down to:
how do you
best express asynchronous behavior in a programming
language? Threads
are one way, but they have lots of pitfalls, including
performance and
scaling issues. Event-based programming is more lightweight,
but has a
reputation for being very tedious and low-level. Also,
event-based
programming by itself can't take advantage of multiple
processors.

I haven't looked at SEDA carefully yet (it's the thesis work
of Matt
Welsh), but it looks interesting.

People have been thinking about asynchrony for a long time.
One of the
more elegant approaches is Hoare's Communicating Sequential
Processes
(CSP). I'm a bit surprised that CSP hasn't gone further. It
seems like
a nice higher level abstraction compared with event-driven
programming, but without all the nasty problems with race
conditions
and lock contention that threads bring you.

There is actually a CSP
implementation in C. It seems like more typing than
languages that
have CSP baked-in. Occam is the most famous of these
languages, but I
think Limbo
might be a more useful incarnation. Occam tends to be fairly
static,
but Limbo lets you create "threads" and channels very
dynamically.
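To make the comparison concrete, here's a rough sketch of CSP-style channels in Python, with a thread and a Queue standing in for Limbo's spawn and chan (my illustration, not real Limbo or libcsp code):

```python
import queue, threading

def producer(ch):
    for i in range(3):
        ch.put(i)        # send on the channel
    ch.put(None)         # sentinel: no more values

def consumer(ch, out):
    while True:
        v = ch.get()     # receive blocks until a value arrives
        if v is None:
            break
        out.append(v)

ch = queue.Queue()
out = []
t = threading.Thread(target=producer, args=(ch,))
t.start()
consumer(ch, out)
t.join()
assert out == [0, 1, 2]
```

It works, but it's a lot of ceremony compared with a language where channels are baked in.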

Python's generators are already sorta like coroutines. David
Mertz
talks about using them to implement what he calls weightless
threads.
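Here's a minimal sketch of the weightless-threads idea, round-robin scheduling a set of generators (my own toy version, not Mertz's code):

```python
# Each "thread" is a generator; yield voluntarily gives up control.
def worker(name, steps, log):
    for i in range(steps):
        log.append((name, i))
        yield

def run(tasks):
    # Round-robin over the generator tasks until all are exhausted.
    while tasks:
        task = tasks.pop(0)
        try:
            next(task)
            tasks.append(task)
        except StopIteration:
            pass

log = []
run([worker("a", 2, log), worker("b", 2, log)])
assert log == [("a", 0), ("b", 0), ("a", 1), ("b", 1)]
```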

This will be another fairly technical entry on trees. It's
been in the
queue for a few days. The focus is on the Fitz display tree,
for which
memory efficiency is a major factor. Tonight we look at caching.

If you have a display tree, there are a lot of things you
might want to
cache: intermediate renderings, bounding boxes and other
geometry
information, etc. For each Bezier curve, for example, you
might want
to cache a decomposition into triangles. The memory
footprint for
these cached objects might be significant in size compared
to the
original tree. This is why we want a cache rather than simply
annotating the tree nodes with the extra info.

Mutating the tree can invalidate cached data, as well. In
some cases,
the relationship between the mutation and the cache is
nontrivial. For
example, if you change the color of a Bezier shape, you
invalidate an
intermediate RGBA rendering, but the triangle decomposition
remains
valid.

We're not going to pin down the exact representation of the
tree. One
in-memory object per node is the simplest way, and should work.
Another approach that should work is storing a serialization
of the
tree in a btree-like structure. In this case, our node id is
effectively a file offset to the beginning of the
serialization. Thus,
the id of a node can change as the tree is mutated.

Dealing with "sliding" node id's is probably too hard for
clients of
the tree, so we have an additional concept of "node
reference", which
is an in-memory object that essentially wraps a node id.
When a node
id moves, the tree implementation updates the corresponding node
reference. This way, clients holding node references don't
have to
worry about them moving around.
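A sketch of the node-reference mechanism, with hypothetical names (this is not actual Fitz code):

```python
# A NodeRef wraps a node id; when the tree moves the node, the tree
# updates the ref in place, so clients never see the id slide around.
class NodeRef:
    def __init__(self, node_id):
        self.node_id = node_id

class Tree:
    def __init__(self):
        self.refs = {}  # node_id -> NodeRef

    def make_ref(self, node_id):
        ref = NodeRef(node_id)
        self.refs[node_id] = ref
        return ref

    def move_node(self, old_id, new_id):
        # A mutation slid the serialization; fix up the wrapper in place.
        ref = self.refs.pop(old_id)
        ref.node_id = new_id
        self.refs[new_id] = ref

tree = Tree()
ref = tree.make_ref(100)
tree.move_node(100, 164)
assert ref.node_id == 164  # the client's handle followed the move
```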

Node references might take dozens of bytes of RAM each, but
node id's
are essentially weightless. We hope that tree clients hold a
relatively small number of node references, even as the size
of the
tree scales up.

Now we get to tonight's central design question: what should
the key
of our various caches be? A node id? A node reference?
Something else?
Here, we consider some alternatives.

Persistent id

A very common pattern in databases is to add a persistent id
to each
node. The value is somewhat arbitrary, but must be unique
for each
node in the tree. If we had persistent id's, then it would
make good
sense to use them as the cache keys. The problem is the
extra storage
cost. We're trying to pare that to the bone.

Node id

It's tempting to use node id's (ie file offsets in the btree
case) as
cache keys. The problem is that if the node id moves, the
cache key
needs to be updated. Keeping the inverse map from node id to
cache
keys has nontrivial storage costs, also.

A more subtle, but important, argument against node ids is that
updates to them
may be very rare in some usage scenarios. Thus, there is a
risk that
the update machinery won't be adequately tested.

Node reference

The cache key could simply be a pointer to a node reference
object.
If the tree moves the corresponding node id, it updates the
internals
of the node reference, but the pointer remains constant.

Cache in tree

A rather different approach is to insert the cached values
into the
tree. The advantage is that the RAM costs can be very low
(near-zero
if the tree is stored on disk as a btree). The disadvantage
is that
computing and evicting cached values now requires traffic
with the
tree, with attendant performance and fragility problems.

Also, if there are multiple caches, then they'll need to be properly
multiplexed so values from the caches don't interfere with
each other.

Cache in node reference

This is something of a hybrid of the above three approaches.
Instead
of the cache being represented as a hash table to the side
of the
tree, the cache entry is an extra field in the node
reference. If
there is a single cache (or small, bounded number of
caches), then
this approach is appealing. Otherwise, you have to do
multiplexing
as above.

I think you see most of these approaches in real systems.
For one
example, the Gnome Canvas includes a bounding box in all
nodes, and
also an SVP (sorted vector path) in all Bezier shape nodes
(thus, I
claim, it represents "cache in tree"). Unfortunately, it never
evicts any elements from the cache, so the memory
requirements can
become quite painful.

For Fitz, I now think I have an answer. Most caches will use
node
references as keys. However, I may treat bounding boxes
specially, and
store them in node references. Saving a hashtable lookup may
be a
significant win, and it also helps that the value is of
small constant
size.
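A sketch of that final design, again with hypothetical names: general caches keyed by the node-reference object itself, with the bounding box stored inline in the reference:

```python
# Hypothetical node reference; not actual Fitz code.
class NodeRef:
    __slots__ = ("node_id", "bbox")  # bbox: small, constant-size value
    def __init__(self, node_id):
        self.node_id = node_id
        self.bbox = None

triangle_cache = {}  # NodeRef -> triangle decomposition

ref = NodeRef(7)
ref.bbox = (0, 0, 100, 50)              # cached inline: no hash lookup
triangle_cache[ref] = ["tri1", "tri2"]  # keyed by the ref's identity

ref.node_id = 9  # the id slides, but the key (the ref object) is stable
assert triangle_cache[ref] == ["tri1", "tri2"]
```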

Also, note that navigational links (parent, first child, next
sibling), which are internal to the tree implementation, can
similarly
be cached in the node reference. The basic rendering
traversal is:
given a bounding box, find all child nodes intersecting that
bbox. If
the nodes are in-cache, it should be possible to do this
very quickly.

I'm happy with this, but still don't feel I've come to terms
with
change notification. I'll blog that in the next few days,
and probably
lose my remaining two readers.

Kids

Alan started his Korean Royal Court Martial Arts (Koong
Joong Mu Sool)
classes this week. It seems to be a perfect match for him -
he's in
much better shape.

David McCusker has a beautiful short piece
on overcomplex technology. We should be asking ourselves,
"is there a simpler way?" far more often.

In the case of trust metrics, there definitely was "goal
drift". I started out trying to make a better PKI, and came
up with the tmetric ideas as part of that. Since then, I've
come to consider PKI too hard, and in need of a fundamental
rethink. The tmetric ideas still seem sound to me, though.

davidw points out that a hardware metadata site
without the trust metric would still be useful. Of course. I
think a good rule of thumb is whether you want to automate
important decisions. If your query is: "what kind of
reliability can I expect from a JXQ-11?", then it's
reasonable to wade through the trolls, spammers, etc. (if any),
and you probably win it all back in overall simplicity. But
part of my idea is that you'd use such data to automatically
figure out orders to place electronically. Then, you really
do start to care about people putting false information into
the system.

I'm a little behind on sleep, so tonight's entry will be
short. Today
was a nice family day.

A friend gave Alan a crystal radio set, so we put up a 100'
antenna
and tried it out. We were able to get a faint signal on one
station,
so it was a cool demo of radio waves, but could have been
cooler.

I've been thinking more about trees over the weekend,
particularly
caching and change notification. Unfortunately, it gets
complicated,
and I worry that most readers don't have much context. At
some point,
I'll put up real infrastructure for Fitz, and the display
tree will
be part of the design docs. In the meantime, I like the blog
form.
I'll post more tomorrow.

Dave Winer has a big
thread on
blogs, journalism, and integrity. I'm not that moved by
arguments of
integrity. My feeling is that journalism is governed by
Sturgeon's Law
just like everything else. I fear that tech journalism is
particularly
affected, though. Most tech stories in the mainstream press
have
serious factual errors, and show lack of understanding on
the part of
the writer. I don't really care why tech journalism is
so
bad. It's likely to have something to do with the highly
centralized
structure of the media business, but I haven't thought much
about the
exact pathways. Dave asks: "Dumb-it-down or deliberate
manipulation?"
I'm not sure it matters much.

Blogs are also subject to Sturgeon's Law, of course. The
vast majority
are not worth reading. But there's real diversity out here
in
blog-land, no doubt related to the fact that blogs are not
owned by a
tiny number of megacorps. Can you imagine what a mainstream
story on
tree access protocols would look like? Yet, if you're one of
the few
people who cares about this, you're reading my blog, and I'm
probably
reading yours, and we're both engaging the subject very
deeply.

Dave points out interviews as particularly bad in the
mainstream. He's
right. The process is fundamentally broken. The ideal of
objectivity,
while it might be important in other contexts, is somewhat
pointless
in an interview. It's the interviewee's point of view you
care
about. Why filter and distort it through a journalist who
doesn't
understand the topic and is a bad writer to boot? A blog
lets you say
what you meant, and if people misinterpret you, you can
answer them.

Btw, I'm notoriously bad at checking my telephone answering
machine.
It's one way of doing flow control, I suppose. But I'll
check it
tomorrow. Now it's time to catch up on sleep.

Yesterday was Alan's last day at school. He was very
emotional about
saying goodbye to his friends and teacher. But that evening,
we went
to sushi to celebrate his graduation from kindergarten, and
ran into
one of his classmates. He worries a lot about making
friends, but he's
actually very social, much more so than either Heather or me
at his age.

Max is going through another great leap forward in language
development. As I blogged recently, he's working on
irregular
verbs. A few days ago, he said, "I broked it. I broked it. I
broke
it." You could read it on his face - "not quite right. Nope.
Aah,
nailed it!". This evening, he said "I dropped my bottle. Put
it over
my legs." And, touching the scotch tape I used to repair our
copy of
Goodnight Moon, "something sticky." It's only been a couple
of months
or so since most of his utterances were single words.

He's also very advanced physically. He can now kick a soccer
ball
well, in the direction he wants and with some force.
Also, he
blew me away by announcing "circle", then folding
his collapsible sunshield into precisely that shape.

Alan had a similar language burst at almost the same age
(25.5mo).
Actually, we have to be very careful about marvelling at
Max, because
it makes Alan feel jealous. We reassure him about how smart
he is, and
how proud we are of him, but he still expresses a lot of
doubts.

Keys

Wes briefly forgot
his ThinkPad's BIOS password. This kind of thing happens all
the time
to real people. I commented on the need for far more
sophisticated
rituals for guarding keys, with both social and technical
aspects.
It's a hard problem, and it clearly can be done in both
peer-to-peer
and centralized flavors. Governments and evil corporations have a lot
of
motivation to pursue the latter. I'd like to see more
thinking on
decentralized approaches.

Of course, at the heart of the problem is the fact that it's
all but
impossible to securely store a key on a general purpose PC.
Ry4an Brase pointed to a really
neat toy. This
particular
model is a bit limited (in particular, if you lose or break
it, you're
hosed), but I think more specialized hardware like this will
play an
important role.

Cheap parts

A few people have expressed interest in getting a dual
Athlon system
similar to spectre.
One question that came up: is generic RAM actually any less
stable or
reliable than the "name brand" variant? I really have no
idea. If you
were going to get a gig or more, the price difference could
be
significant. I chose not to take the risk, but I have a
feeling that
it's probably mostly a marketing strategy on behalf of the
"name
brands." For example, I know that Apple SDRAM, at $150
for a
256M PC133 SODIMM, is no different than Crucial's at $69.99
(including
shipping). The question is whether it's any more reliable
than the $38
part from Pricewatch. (side question: why the hell did Apple
put a
SODIMM socket in the iMac?)

Again, I think this would be a killer app for a trustworthy
metadata
system. What if almost all generic parts were good, but
there were a
few suppliers that weren't? Wouldn't it be cool to actually
know
that? Also, if people had a good place to report stuff like
drive
failures, I think information about lemon products would
disseminate
much faster.

The machine arrived today. It seems pretty sweet.
Ghostscript compile
is down to 53s. Fitz rendering of the tiger is 130ms, but
some of that
is debug overhead. I think I'm going to like this machine.

The only serious hitch so far seems to be on-board Ethernet,
which
gets stuck. Popping in a Tulip PCI board fixes that.

150 dpi

The Matrox G550 can handily drive the monitor at 2048x1536,
or around
150 dpi. It looks surprisingly good, thanks in large part
to the
quality of the Matrox card. I think I might stay with this
resolution
a while. Obviously, most default fonts are too tiny, but I
can easily
configure the ones I really care about.

testrgb speed is 30.5 Mpix/s in 24-bit. This is pretty good,
but I was
hoping for better. I'm not sure I have everything tuned yet.
For one,
this is XFree86 4.1.0, and 4.2.0 is now out. Incidentally,
setting AGP
to 4x makes no noticeable difference. I'm not surprised, but
in theory
it should. testrgb is very bandwidth-intensive, which is
what AGP is
all about.

802.11b audio

rkrishnan: you're right of course, that real Internet
telephony is nontrivial. But in talking about D/A's, my main
point was
that the same basic platform could also do CD-quality audio,
which
would make it much more interesting to "enthusiasts", as
opposed to
corporate customers.

I think it's inevitable that all these kinds of products
will come out
over the coming months.

A little Mac OSX hack called Silk made the
rounds of
the Mac blogs this week. I'm not sure exactly what it
does, but
it claims to turn on Quartz font rendering in Carbon apps.

However, it's not quite the same. In all the screenshots I
saw, the
Silk-rendered text did not have subpixel antialiasing.
Native Mac OSX
apps do. Check out Chimera screenshots 123,
for example.

At low resolutions, I think the choice between hinted aa
and unhinted
aa (with subpixel positioning) is a matter of taste. Many people
complain that the latter is blurry, but there's also a case
that it's
more aesthetic. Reports from the field are mixed. Doc Searls
is bothered
by the blurring, but many OSX proponents love
unhinted rendering. See this thread
for more opinions, both pro and con.

I also think that by choosing unhinted rendering as the
"new, cool"
look, Apple has influenced the taste of Mac OSX users
greatly. OS9 (or
"Classic") apps look old, even dated, although their font
rendering is
actually sharper. I've been interested in font rendering for a
long time, and I did not anticipate the effect that
marketing
can have.

In any case, as resolutions go up, the win for unhinted aa
becomes
clearer. The irony is that not only are Apple-brand displays
stuck at
'90s-era resolutions, but Aqua isn't scalable, so as resolution
increases, fonts get tinier too. Why they decided this is
utterly
beyond me, especially as the underlying Quartz technology is
quite
scalable, as was the Display PostScript that preceded it.

Linux UI's aren't scalable either, in the sense of having a knob
you can turn to scale everything (good luck getting
everybody to
agree on the knob!), but they are overly configurable, so
you can
fiddle with the fonts and sorta get good results.

Few Linux UI's do unhinted rendering, either, but it's not
especially
difficult. In particular, there's no reason why a
Gecko-based browser
can't match Chimera almost pixel-for-pixel. All it would take is
turning off the hinting, and implementing subpixel positioning.

Future Ghostscripts will do unhinted aa, with subpixel
positioning, by
default. Hopefully, I'll have time to polish my patch, and
maybe get
it into HEAD, next week.

zooko posed a fun problem
on his blog (and in
#p2p-hackers
irc). It goes like this: Alice chooses a bit. She then sends
a message
M1 to Bob. However, at this point Bob should not be able to
figure out
the value of the bit. Bob encrypts a message using a key
derived from
M1, and sends this message (M2) to Alice. If the bit was 1,
then Alice
decrypts the message. However, if the bit was 0, then Alice
should be
unable to decrypt the message. Finally, in the last phase, Alice
reveals the bit, along with a proof that this was the same
value as
chosen when M1 was sent.

Zooko has a colorful motivating example, written in terms of
centaurs
and ogres. You might enjoy thinking about the puzzle,
especially if
you know some crypto. The answer will be revealed soon.
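For the curious, the commit/reveal phases resemble an ordinary hash commitment. Here's a Python sketch of just that piece; it does not solve the conditional-decryption part, which is the heart of the puzzle:

```python
import hashlib, os

def commit(bit):
    # Publish the digest now; keep the nonce secret until reveal time.
    nonce = os.urandom(16)
    digest = hashlib.sha256(bytes([bit]) + nonce).hexdigest()
    return digest, nonce

def verify(digest, bit, nonce):
    # Anyone can check that the revealed bit matches the commitment.
    return hashlib.sha256(bytes([bit]) + nonce).hexdigest() == digest

digest, nonce = commit(1)
assert verify(digest, 1, nonce)      # honest reveal checks out
assert not verify(digest, 0, nonce)  # Alice can't claim the other bit
```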

(PS to zooko: if you had a permalink, I would have happily
linked it)

The trust metric

Well, restricting recentlog to certified entries would
have been
ineffective, in addition to the various other downsides. So
much for
that idea, at least for now.

Out of curiosity, I ran PageRank on the Advogato trust
graph, treating
all links the same (one interesting twist would be to weight
blue and
purple links more than green ones). The node "bytesplit"
comes in at
rank 2444 out of 3209. If Advogato were to be based on
PageRank, I'm
not sure whether it would be better to set a threshold above
this or
below it. Another way of looking at the graph is that there
are two
independent certification paths from the seeds. That's not too
shabby.
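For reference, here's a toy power-iteration PageRank over an adjacency dict. It shows the shape of the computation, though the actual run used the full Advogato graph, not this made-up one:

```python
def pagerank(links, damping=0.85, iters=50):
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, outs in links.items():
            share = rank[n] / len(outs) if outs else 0.0
            for m in outs:
                new[m] += damping * share
        rank = new
    return rank

# Made-up graph: "d" certifies others but nobody certifies "d".
graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
ranks = pagerank(graph)
assert ranks["a"] > ranks["d"]  # no inbound links, so "d" ranks lowest
```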

So, a reasonable conclusion is that the trust metric is not
the best
way to address this conflict. I never really thought it was.

Overzealous spider

Whoever's behind 24.163.74.39, your spider is broken.
Requests like these are making up about 80% of all
traffic right now, and it's impacting responsiveness.

Ralph and I had been stuck for a long time on a bug that
caused the
symbol dict decoder to fail partway through all real
streams (but of
course succeed on the test stream in the jbig2 spec). We
wrote the UBC
people asking
for help (trace data would have been enormously helpful),
but were
completely ignored. I'd like to give them the benefit of the
doubt and
believe that they were simply too busy (lord knows I suck at
answering
my own email) rather than having sold their soul to the
corporate
devil, but it's impossible to be sure.

In any case, William Rucklidge, primary editor of the spec,
dropped by
the mailing list recently, and was able to provide spec
clarification
and trace data, and we fixed the bug handily. Thanks!

Voice on 802.11b

Wes pointed to a
number of
interesting 802.11b voice products on the horizon. Vocera seems to have a
particularly
appealing product - a 46g, 10x3.5cm "badge" that has a
speaker and mic
built in, and can take Plantronics headsets. Also, Broadcom
and Agere
are
working on "VoIP phone on a chip" chips. There's a good
chance that
low-cost phones will be available soon. Current VoIP phones are
dramatically overpriced.

All this stuff seems designed for large corporations. I
don't see any
signs that anyone is trying to sell to actual people. Dual
16-bit,
44100Hz D/A's would be a good start (the chip is about $6).

Valgrind

I spent a lot more time with Valgrind today, and caught
quite a few
Ghostscript bugs. The tool rocks. movement expresses
amazement that people are still finding out about the tool.
I think
it's because you don't see how good Valgrind is until
you use it
yourself. There are a lot of crappy proof-of-concept free
software projects out there (and SourceForge hosts most of
them!). It's quite rare that a project delivers this level
of quality
so quickly, coming out of nowhere.

Mozilla

Today I wore the "Welcome Mozilla" t-shirt I got on the
launch of the
project over four years ago. I've also pretty much switched
over. Congratulations!

Petty personal fighting

chip86: I can certainly see why you're frustrated. In
general, I don't like to intervene in these cases. In my
experience,
ignoring the unpleasantness works better than anything else.

Even so, one simple change I'm considering is to only include
certified entries in recentlog. I thought about that last
time around,
but the problem went away before I got around to
implementing the
change. That change would be a little bit unfriendly to new
people
joining, which also makes me somewhat reluctant.

bytesplit: I don't understand why you and
Christian are
fighting, and I'm not sure I want to. In any case, I don't
recommend
Advogato as a forum to air your grievances.

I'm hesitant to post even this much, because I don't want to get
involved. I hope this doesn't escalate, but if it does, it
will be a
good testbed for the trust metric ideas.

Btw, George Monbiot has
some
writings that will be particularly interesting for those who
like
controversy.

I spent some time today using Valgrind to
track down
uninitialized memory reads and similar problems in
Ghostscript. So
far, 4 out of 5 have been legitimate problems, and at least
one was
probably capable of real mischief. The one false positive
was due to a
compiler optimization, and became clear on viewing the assembler
(roughly, ashort[0] == 42 && ashort[1] == 69 being
collapsed
into a single comparison).

Peter Deutsch reports that his stabs using Purify and
Insure++ were
very frustrating, because about 90-95% of the reports were false
positives.

The lack of a good free memory debugger has long been a
source of
disappointment to me. Valgrind has singlehandedly done a lot
to renew
my faith in free software.

New monitor

The new monitor for spectre appeared
today. It's a ViewSonic P95f.
Overall,
it seems pretty good, but not sensational. I wanted a 19" to
save
space and power, figuring that the next monitor I buy will be a
high-res LCD such as the IBM T221 (drool).

The monitor goes up to 1920x1440 (about 140 dpi), but
running test
patterns shows that the horizontal resolution is limited by the
aperture grill, which is about 100 stripes per inch. As far as
vertical resolution goes, you can actually see individual
scanlines
even at that resolution.

So the "real" resolution is somewhere around 1600x1200
(about 115
dpi). I wonder how shadow mask monitors such as the Hitachi CM715
stack up by comparison. That display has a dot pitch of
about .22mm,
which should match the 1600x1200 well. It's cheaper and
draws less
power, too.

I haven't finished playing with font rendering, but so far
it looks
like my original suspicion is true: at 140 dpi, antialiased
unhinted
text is nearly as contrasty as monochrome. Driving the
monitor at more
than its "real" resolution might be a good deal, if it means
you can
use unhinted aa fonts without compromise.

Internet telephones

Wes posted a link
on ENUM
recently. Basically, it's a mapping from telephone numbers
to domain
names, so you can use DNS to resolve them to IP addresses.
You reverse
the digits, join with '.', and put .e164.arpa at the end. So
my phone
number is 8.9.3.6.6.4.7.7.0.7.1.e164.arpa. This should
resolve to
something like 63.196.208.218, but of course doesn't. The phone
companies prefer it this way. There's no good technological
reason to
have a phone any more, but as long as it's too hard for
near-PhD's,
the phone companies don't have much to worry about.
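The mapping itself is trivial to code. A sketch:

```python
# ENUM as described above: strip to digits, reverse them,
# dot-join, append .e164.arpa.
def enum_domain(number):
    digits = [c for c in number if c.isdigit()]
    return ".".join(reversed(digits)) + ".e164.arpa"

assert enum_domain("+1 707 746 6398") == "8.9.3.6.6.4.7.7.0.7.1.e164.arpa"
```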

I tried setting up gnomemeeting with blanu (who is
experimenting with streaming media), but we didn't get it to
work. On
my side, it was probably the laptop's sound driver, for
which I've
never really bothered to get the microphone to work.

Part of the problem is trying to use general purpose
hardware like a
PC with a sound card for a specific task. However, there's
no reason
why you couldn't build internet phone hardware. In fact, I
think it's
a great idea.

For my setup, I'd want two pieces of hardware. One would be
exactly
the same hardware as an Apple AirPort, but the phone port
would be for
voice calls, not for the modem. The AirPort is a 386 PC
anyway, so I
wouldn't be surprised if you could use it for this.

The other piece of hardware is simply A/D and D/A converters, an
802.11b card, and a battery. The cost of the 802.11b chipset
is around
$22, and cards are now retailing for $35
after rebate. There's no reason why this part couldn't be
retailed for
$100. Of course, what makes this product really
appealing is
playing Ogg files from your home media server. Before the
iPod was
released, some people thought that's what it would be.

I think whichever manufacturer figures this out is going to
sell a
hell of a lot of them. Making the (Windows) software nice
and easy to
use is nontrivial, but I don't care. I'd just run it off my
Linux
boxen anyway.

Google and the dangers of centralization

Google is amazing. In online conversations, I routinely ask
Google for the answers to questions that come up, and I
almost always
get the answers. Not only that, but Google is fast. In
fact,
it's quite competitive with DNS.

It's going to be very, very hard for anyone else to compete with
Google, in part because of their PageRank algorithm (the
rest of it is
just kick-ass good implementation). As a result, Google is
in great
in great danger of becoming a central point of failure for the whole useful
Internet. People don't seem to have started worrying about
this yet,
probably because they're so damn good at what they do. By
contrast,
VeriSign (formerly Network Solutions, formerly the InterNic) got
people worried fairly early on, because they suck so hard.

But Google is not a public service. In fact, it will
probably become a
publicly traded corporation soon, with a fiduciary obligation to
shareholders. How much money might they be able to extract
from their
position? Think about that when you read The Google AdWords
Happening.

SourceForge

I note with some Schadenfreude that SourceForge no longer
seems to be
hosting their own downloads, using mirrors at Telia,
ibiblio, and
Belnet instead.

Why Schadenfreude? In large part because they don't listen
when I have
things to say. Other, healthier organizations and people
don't seem to
have this problem. Oh well.