Subject: Re: READ-DELIMITED-FORM
From: Erik Naggum <erik@naggum.no>
Date: 05 Sep 2002 12:43:22 +0000
Newsgroups: comp.lang.lisp
Message-ID: <3240218602163684@naggum.no>
* Tim Bradshaw
| Can you explain why?
Because the reader algorithm is defined in terms of tokens that are examined
before they are turned into integers, floating-point numbers, or symbols.
The tokens ., .., and ... must all be interpreted (or cause errors) prior to
being turned into symbols, and if you expect to be able to look at them
after `read´ has already returned, the original information is lost and you
will have insurmountable problems reconstructing the original characters
that made up the token, just like you cannot recover the case information
from a token that turned into an integer or symbol. The hard-wired nature
of ) likewise has to be determined prior to processing it as a terminating
macro character.
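To illustrate the loss of information, here is a toy sketch in Python. It is not the Common Lisp reader algorithm: `interpret_token` is a hypothetical helper that assumes plain upcasing and decimal integers only. It shows that distinct spellings collapse into the same object, so the original characters cannot be recovered afterwards.

```python
def interpret_token(token: str):
    """Toy token interpretation (hypothetical, for illustration only).

    A token made of optional sign plus ASCII digits becomes an integer;
    anything else becomes an upcased symbol name. The real reader
    algorithm has many more cases than this.
    """
    digits = token[1:] if token[:1] in ("+", "-") else token
    if digits.isascii() and digits.isdigit():
        return int(token)
    return ("SYMBOL", token.upper())


# Different character sequences, identical results: the spelling is gone.
same_symbol = interpret_token("FooBar") == interpret_token("foobar")
same_integer = interpret_token("007") == interpret_token("+7")
```

Anything that depends on the exact characters of a token therefore has to be decided before this step, not after it.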
The usual way to implement the tokenization phase of the reader is to work
with a special buffer-related substring or mirrored buffer that characters
are copied into and then to use special knowledge of this buffer in the token
interpretation phase. The way I implement tokenizers and scanners is with
an offset from the current stream head to peek multiple characters into the
stream. When the terminating condition has been found, I know how many
characters to copy, if needed, and I am relatively well-informed of what I
have just scanned. When the token has been completed, I let the stream head
jump forward to the point where I want the next call to start. This may be
several characters short of how far I scanned ahead, naturally. I invented this
technique to parse SGML, which would otherwise have required multiple-
character read-ahead or some buffer on the side and much overhead.
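The offset-peeking idea can be sketched roughly as follows. This is a minimal illustration in Python over an in-memory string, not the code described above: `PeekStream`, `peek`, `copy`, `advance`, and `scan_until` are all hypothetical names, and a real stream would need a buffer that can be indexed ahead of the head.

```python
class PeekStream:
    """In-memory character stream with a head position (hypothetical)."""

    def __init__(self, text: str):
        self.text = text
        self.head = 0

    def peek(self, offset: int = 0) -> str:
        """Return the character `offset` positions past the head, '' at end."""
        i = self.head + offset
        return self.text[i] if i < len(self.text) else ""

    def copy(self, count: int) -> str:
        """Copy `count` characters starting at the head without consuming."""
        return self.text[self.head:self.head + count]

    def advance(self, count: int) -> None:
        """Let the head jump forward by `count` characters."""
        self.head += count


def scan_until(stream: PeekStream, delimiter: str) -> str:
    """Scan ahead by offset until `delimiter` (or end of input) is seen.

    Recognizing the delimiter requires peeking len(delimiter) characters
    past the token, but the head only advances to the token's end, so
    the delimiter is left for the next call.
    """
    offset = 0
    while stream.peek(offset) != "":
        if all(stream.peek(offset + k) == c for k, c in enumerate(delimiter)):
            break
        offset += 1
    token = stream.copy(offset)   # now we know how many characters to copy
    stream.advance(offset)        # shorter than the distance scanned ahead
    return token


stream = PeekStream("a]]b]]>rest")
token = scan_until(stream, "]]>")
```

Here the scan looks at all of `]]>` to find the terminating condition, yet the head stops just before it, so the next call starts at the delimiter itself.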
--
Erik Naggum, Oslo, Norway
Act from reason, and failure makes you rethink and study harder.
Act from faith, and failure makes you blame someone and push harder.