parsing

I think I'm just going to remove the lex/yacc parser. Anything I want
to do with syntax in ramblings files is going to be completely ad-hoc:
data mining based on existing text files instead of any kind of sane
structure. So let's just use a direct tokenizer.
The current lexer produces a list of strings and XHTML
elements in xexpr form.
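For reference, a Racket xexpr writes an element as `(tag ((attr "value") ...) child ...)`, with plain strings as text. The sketch below is a rough Python analogue, not the actual lexer code. It assumes tuples `(tag, attrs, *children)` stand in for xexpr lists, and `to_xhtml` is a hypothetical renderer showing what such a token list means as XHTML:

```python
from html import escape

# Hypothetical Python stand-in for a Racket xexpr: a plain str is text,
# a tuple is (tag, attrs, *children) where attrs is a list of (name, value).
def to_xhtml(x):
    """Render a string or xexpr-like tuple to an XHTML string."""
    if isinstance(x, str):
        return escape(x, quote=False)  # text: escape &, <, >
    tag, attrs, *children = x
    attr_text = "".join(f' {name}="{escape(value)}"' for name, value in attrs)
    if not children:
        return f"<{tag}{attr_text}/>"  # XHTML self-closing empty element
    inner = "".join(to_xhtml(c) for c in children)
    return f"<{tag}{attr_text}>{inner}</{tag}>"

# Lexer output in this shape: a flat list of strings and elements.
tokens = ["see", ("a", [("href", "http://example.com")], "here"), "&", ("br", [])]
print(" ".join(to_xhtml(t) for t in tokens))
# prints: see <a href="http://example.com">here</a> &amp; <br/>
```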
OK: replaced it with a whitespace/word tokenizer state machine plus an
individual word matcher.
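A minimal sketch of that split, assuming a two-state machine (inside whitespace vs. inside a word) and a word matcher applied to each word token. What the real matcher recognizes isn't stated here; turning bare URLs into link elements is a hypothetical example, and `tokenize`, `match_word` and `lex` are illustrative names:

```python
from enum import Enum, auto

class State(Enum):
    SPACE = auto()  # currently inside a run of whitespace
    WORD = auto()   # currently inside a run of non-whitespace

def tokenize(text):
    """Split text into alternating whitespace and word runs with a
    two-state machine. Returns a list of (State, string) pairs."""
    tokens = []
    state = None
    start = 0
    for i, ch in enumerate(text):
        new_state = State.SPACE if ch.isspace() else State.WORD
        if new_state is not state:
            if state is not None:
                tokens.append((state, text[start:i]))  # close previous run
            state = new_state
            start = i
    if state is not None:
        tokens.append((state, text[start:]))  # flush the final run
    return tokens

def match_word(word):
    """Hypothetical word matcher: turn a bare URL into an xexpr-like
    link tuple (tag, attrs, *children); leave other words as strings."""
    if word.startswith(("http://", "https://")):
        return ("a", [("href", word)], word)
    return word

def lex(text):
    """Tokenize, then match each word; whitespace passes through as-is."""
    return [match_word(s) if kind is State.WORD else s
            for kind, s in tokenize(text)]

print(lex("see http://example.com now"))
```

Keeping whitespace runs as plain string tokens means the output is still a list of strings and elements, and the original spacing is preserved. Each new kind of ad-hoc syntax then only needs another case in the word matcher, with no change to the state machine.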