Recent blog entries by slamb

One of my most productive days was throwing away 1000 lines
of code.
--
Ken Thompson

Interesting. One of my most productive days was throwing
away 15000 lines of code.

A consequence of the increased scale of systems? Maybe;
probably also apples and oranges. These 15000 lines of code
were written by a poorly supervised contractor, and Ken
Thompson's 1000 lines were probably his own work.

I mentioned before that Thunderbird and Mail.app have slightly different flags
for indicating that a message is ham rather than spam. Well, their interaction
seemed to be even weirder than that alone would explain - if a message was
marked as not junk in Mail.app, no attempt to mark it as junk in Thunderbird
would stick. Look for NonJunk and you'll find this (reformatted to fit your
television):

On startup, Thunderbird says that a message is not junk if Mail.app said it
was NotJunk. When marking a message as Junk, it doesn't clear Mail.app's
NotJunk flags. Brilliant! How could this plan possibly fail?
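
A toy model makes the failure easy to see. This is not Thunderbird's code: the function names are made up for illustration, and the idea that marking a message as junk clears Thunderbird's own NonJunk keyword is an assumption. Only the two behaviors described above (NotJunk wins on startup, and NotJunk is never cleared) come from the post.

```python
def is_junk_on_startup(keywords):
    """Startup rule as described: a NotJunk (or NonJunk) keyword means 'not junk'."""
    if "NotJunk" in keywords or "NonJunk" in keywords:
        return False
    return "Junk" in keywords


def mark_junk(keywords):
    """Marking as junk: adds Junk, clears NonJunk (assumed), leaves NotJunk behind."""
    return (set(keywords) - {"NonJunk"}) | {"Junk"}


def mark_junk_clearing_both(keywords):
    """Hypothetical fix: clear both clients' 'not junk' keywords."""
    return (set(keywords) - {"NonJunk", "NotJunk"}) | {"Junk"}


from_mail_app = {"NotJunk"}  # message marked as not junk in Mail.app
after_marking = mark_junk(from_mail_app)
print(sorted(after_marking))                  # ['Junk', 'NotJunk']
print(is_junk_on_startup(after_marking))      # False: the junk marking is lost
print(is_junk_on_startup(mark_junk_clearing_both(from_mail_app)))  # True
```

With both keywords left on the message, the startup rule always sides with NotJunk, which matches the "won't stay across restarts" symptom.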

What annoys me is that Thunderbird added this feature after Mail.app but
made a subtle change that broke interoperability. Then they realized their
parsing sucked and they were interpreting Mail.app's NotJunk as saying Junk.
They fixed it with this hack job and the bug popped up elsewhere - now
Thunderbird's attempt to change the marking to junk won't stay across
restarts. A little forethought and there wouldn't have been this mess.

Last night I worked on an unobtrusive way to train SpamAssassin's Bayesian
database. (Autotraining sure spam and ham as it's delivered is nice, but you
at least need a way of correcting its mistakes or it will keep making them.)
The sa-learn utility is quite easy to use, but how do you specify which
messages to feed to it? I haven't seen any good glue for this. You want to
feed it messages which have been examined and categorized, and ideally you
want to feed it each message exactly once. (sa-learn does realize
that it's seen a message before, but it still takes some processing time to do
even that.)

I decided to harness the power of RFC 2060. My trainer
connects via IMAP4rev1, executes a SEARCH command for
candidates (letting the server do the work of an arbitrarily complex query),
downloads the messages and pipes them through sa-learn, flags
them as learned (so the next search will skip them), and disconnects. I
implemented it using imapfilter, and so far it works quite
well. This approach would even work well if the SpamAssassin machine were
separate from the mail store machine.
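
The same loop can be sketched in Python with the standard imaplib module instead of imapfilter. This is a rough sketch under stated assumptions, not the trainer described above: the "Learned" keyword, the SEARCH query, and the function names are invented for illustration, and it assumes sa-learn reads a single message from standard input when no file is named.

```python
import imaplib
import subprocess

LEARNED = "Learned"  # hypothetical keyword for "already fed to sa-learn"


def search_criteria(is_spam, learned=LEARNED):
    """SEARCH query for read messages of one class that have not been learned yet."""
    junk = "KEYWORD Junk" if is_spam else "UNKEYWORD Junk"
    return f"SEEN {junk} UNKEYWORD {learned}"


def run_sa_learn(raw_message, is_spam):
    """Pipe one raw message (bytes) to sa-learn on standard input."""
    mode = "--spam" if is_spam else "--ham"
    subprocess.run(["sa-learn", mode], input=raw_message, check=True)


def train(conn, mailbox, is_spam, learn=run_sa_learn, learned=LEARNED):
    """Search, fetch, learn, then flag each candidate. Returns the number learned."""
    conn.select(mailbox)
    _, data = conn.uid("SEARCH", None, search_criteria(is_spam, learned))
    uids = data[0].split() if data and data[0] else []
    for uid in uids:
        # BODY.PEEK[] downloads the whole message without setting \Seen.
        _, parts = conn.uid("FETCH", uid, "(BODY.PEEK[])")
        learn(parts[0][1], is_spam)
        # Flag only after learning succeeded, so a failure is retried next run.
        conn.uid("STORE", uid, "+FLAGS", f"({learned})")
    return len(uids)


def connect(host, user, password):
    """Open an authenticated IMAP-over-TLS session."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    return conn
```

A run would look like connect(host, user, password), then train(conn, "INBOX", is_spam=False) and train(conn, "Junk", is_spam=True), then conn.logout(). Because the server evaluates the query and remembers the keyword, the trainer itself keeps no state between runs.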

In the process, I noticed that Thunderbird updates spam status on the IMAP
server in the Junk and NonJunk keywords. Mail.app does
the same, in the Junk and NotJunk keywords (plus a few
others). Did you see it? One uses NonJunk, the other
NotJunk. How hard would it have been to get these guys
in a room to fight this one out? Grr. They have a weird interaction because
they just didn't put any thought into it.

I also tried out Lua for the first time, as it's imapfilter's extension
language. Turns out I hate it. I really wanted to like it. I had been thinking
of using it all over an embedded product for rapid development with few
resources. It's minimalist, fast, and so on. But it's just unpleasant to use.
Maybe it's too minimalist. I would have liked a separate array type (rather than
just "tables" / associative arrays), and I hate "high-level" languages without
exceptions. imapfilter's library is also a bit limiting - its
fetch_message and pipe_to do everything in memory.
That makes me more irritated that Lua doesn't just have an array slice syntax
I can use to pass message lists to fetch_message. And it means I
have to spawn sa-learn a bunch of times for reasonable memory
consumption, and starting a Perl process heavy with modules takes a long
time.

C. You are connected to a site pretending to be
www.url.com …
Something evil could be going on! Someone might be trying to trick you!
Though odds are this isn’t true, it’s likely that guilt or the legal
department required us to put this dialog up just for this case.

No, no, no, no, no! This text is the entire purpose of SSL. If it's really
unlikely, then thousands of people wouldn't have created an entire
ecosystem around validating identities. You have to realize that a private
conversation is totally worthless if you don't know who you are talking to, and
if nothing warns you when that validation fails, why would you have validation
at all? This text wasn't added by lawyers; it was added by people who just
spent man-centuries creating cryptosystems which would be absolutely
worthless if this text were not displayed.
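
The same split shows up in client libraries. As a small illustration using Python's standard ssl module (the two function names are made up for this sketch), the default client context checks both the certificate chain and the hostname, while the second context is roughly what clicking through the warning amounts to: the link is still encrypted, but nothing establishes who is on the other end.

```python
import ssl


def verifying_context():
    """Default client context: certificate chain and hostname are both checked."""
    return ssl.create_default_context()


def click_through_context():
    """Encryption only: no certificate validation, no hostname check."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be turned off before CERT_NONE is allowed
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


strict = verifying_context()
print(strict.verify_mode == ssl.CERT_REQUIRED, strict.check_hostname)  # True True
```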

This dialog box shouldn't say "don't worry, this is probably
something wrong with their setup. Just go on, send them your credit card
number like always." That would defeat the purpose of the system so badly
I'm having trouble coming up with an analogy. It's sort of like a policeman
seeing someone trying to pick a lock and opening it for them, then
standing by, smiling, as they walk off with all the valuables the lock was
protecting. If you downplay the security concerns of sending important
information over this link, you're basically telling the lock "sometimes keys
screw up, just let him in." (I warned you the analogy sucked.)

It should be alarming! It needs to be alarming
enough that if someone goes to their bank's website and sees this dialog
box, they won't enter their password. Instead, they'll call their bank on the
telephone and tell them that they've spotted fraud. This is the correct action -
it's either true or it will get the correct people angry at the security people
who screwed up the configuration. It's very rare for a major bank to totally
botch their security setup like this.

On the other hand, it shouldn't be so alarming that it will prevent people from
browsing some random untrusted website which they have no intention of
sending important information to. It's not uncommon for people to require
SSL on a site, not bother paying to have its certificate signed by a
widely-trusted CA, and have instructions for people with particularly sensitive
passwords to import the certificate into their browser. That's not a site
configuration problem, either - it's a "you haven't given the computer a way
to verify their identity" problem.

I agree that examining a certificate and finding the problem is unrealistic for
most people. Maybe the details of the certificate should be in an "Advanced"
pull-out or something.

I'm not convinced there's a problem with the status quo. For the 90% of people
you describe, the SSL certificate dialog box comes down to this:

Your connection to www.bigbank.com is insecure. It's
likely that people are trying to steal your money.

Give them my money | Cancel

My parents don't understand X.509 PKI, but they do understand that they care
whether a connection is secure if and only if they plan to send financial credentials over
it. They know - and the computer doesn't - what information they are planning
to send. Thus, they are capable of responding to this dialog correctly 100% of
the time. Choosing either option for them would be right less than 100% of the
time. A complicated voting scheme would be right less than 100% of the time.