I looked into this question at one time and tried to build a parser that
operated directly on the input character stream instead of tokens.

One problem I ran into was lookahead. Typical parsers (those generated
by YACC, for instance) have only one token of lookahead. If your input
tokens are characters instead of keywords, this leads to greater
ambiguity. For example, one production might be possible on a string of
characters if the next token is the `FOR' keyword. That same string of
characters might also work if the next token is an identifier. A
traditional parser will not have a problem with that, but one based on
characters will, because given the single lookahead character 'F' it
cannot determine whether the lookahead is a keyword or an identifier.
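
To make the 'F' case concrete, here is a rough sketch in Python. It is
not the parser I built; the names KEYWORDS, next_token_kind and
candidates_from_one_char are made up for illustration. It compares what
a separate lexer can report about the next token with what a
character-level parser can conclude from a single character of
lookahead.

```python
import re

# Hypothetical keyword set, just enough to show the problem.
KEYWORDS = {"FOR"}
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def next_token_kind(text, pos):
    """Classify the token starting at pos, the way a separate lexer would.

    The lexer reads the whole word before deciding, so the parser
    receives one unambiguous token of lookahead.
    """
    m = WORD.match(text, pos)
    if m is None:
        return "OTHER"
    return "KEYWORD" if m.group() in KEYWORDS else "IDENT"

def candidates_from_one_char(ch):
    """What a character-level parser can conclude from one lookahead char."""
    kinds = set()
    if ch.isalpha() or ch == "_":
        kinds.add("IDENT")          # any letter may start an identifier
    if any(kw.startswith(ch) for kw in KEYWORDS):
        kinds.add("KEYWORD")        # ... and may also start a keyword
    return kinds
```

With a lexer in front, "FOR I" and "FOO = 1" are told apart before the
parser ever sees them. With one character of lookahead, both start with
'F', and both possibilities are still open.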

Another problem is dealing with comments, because they can occur
ANYWHERE, so a character-level grammar has to allow for them between
every pair of symbols. This can be solved by simply restricting where
comments can go (which may not be a bad idea in any case).
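
A small sketch of what "anywhere" costs, again in Python and again with
made-up pieces: the {...} comment syntax, the `name = name' statement
and the helper names are assumptions for illustration only. Without a
lexer to discard comments, the recognizer has to call skip_trivia at
every gap between symbols.

```python
def skip_trivia(text, pos):
    """Skip whitespace and {...} comments starting at pos; return new pos."""
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
        elif text[pos] == "{":
            end = text.find("}", pos)
            if end == -1:
                raise SyntaxError("unterminated comment")
            pos = end + 1
        else:
            break
    return pos

def read_name(text, pos):
    """Read a run of letters, digits and underscores; return (name, new pos)."""
    start = pos
    while pos < len(text) and (text[pos].isalnum() or text[pos] == "_"):
        pos += 1
    if pos == start:
        raise SyntaxError(f"name expected at position {start}")
    return text[start:pos], pos

def parse_assignment(text):
    """Recognize `name = name' directly from characters."""
    pos = skip_trivia(text, 0)        # a comment may precede the statement
    target, pos = read_name(text, pos)
    pos = skip_trivia(text, pos)      # ... or sit before the '='
    if pos >= len(text) or text[pos] != "=":
        raise SyntaxError(f"'=' expected at position {pos}")
    pos = skip_trivia(text, pos + 1)  # ... or after it
    source, pos = read_name(text, pos)
    pos = skip_trivia(text, pos)      # ... or trail the statement
    if pos != len(text):
        raise SyntaxError(f"unexpected input at position {pos}")
    return target, source
```

Four skip_trivia calls for a three-symbol statement. Restricting
comments to, say, statement boundaries would leave only the first and
last of them.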