Subject: Re: parsing lisp expression
From: Erik Naggum <clerik@naggum.no>
Date: 1997/12/21
Newsgroups: comp.lang.lisp
Message-ID: <3091670152628940@naggum.no>

* Tim Bradshaw
| Doing the reverse shouldn't be much harder I think (you might have to
| hack readtables), so I see no need for perl.

this is not quite as easy as it seems.  first, whether an object just read
is the first element of a list or stands for itself is determined only by
the _following_ character, which is at odds with how the Lisp reader works.
second, the end of the input must be defined externally to the syntax for
lists: once we have read a token, we cannot return until the final token is
either a non-list (maybe a semicolon or a period?) or the end of the input
(maybe end of line?).  third, while the singleton list is x[], there is no
concept of an empty list in that syntax, unless such is served by some other
special value (such as `nil').  all this means that reading that kind of
list is quite a different process from the normal `read-delimited-list'.
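
to make the three points concrete, it may help to look at the opposite
direction first.  what follows is only a sketch, assuming the syntax writes
the list (a b c) as a[b c]; `print-botched' is a made-up name, and proper
lists are assumed throughout.

```lisp
;; sketch only: PRINT-BOTCHED is a hypothetical helper for illustration.
;; assumes proper lists and that (a b c) is written a[b c].
(defun print-botched (object &optional (stream *standard-output*))
  (cond ((atom object)
         ;; atoms print as themselves; NIL stands in for the empty list,
         ;; which has no bracket form of its own
         (prin1 object stream))
        (t
         ;; the head comes first, then the remaining elements in brackets
         (print-botched (first object) stream)
         (write-char #\[ stream)
         (loop for (element . more) on (rest object)
               do (print-botched element stream)
                  (when more (write-char #\Space stream)))
         (write-char #\] stream)))
  object)
```

under these assumptions (print-botched '(1 2 (3 4))) writes 1[2 3[4]], and
(print-botched '(1)) writes 1[].  the head is always written before it is
known whether a bracket follows, which is exactly what a reader has to undo.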

so I threw together this to demonstrate that it is possible to write a
moderately compact parser for a "botched syntax", using the Lisp reader for
the real work:

(defun read-botched-syntax (&optional stream (eof-error-p t) (eof-value nil))
  (labels ((read-botched-list ()
             (loop initially (read-char stream)          ;discard #\[
                   ;; characters are compared with EQL; EQ is not reliable
                   until (eql (peek-char t stream) #\])
                   collect (read-botched-internal)
                   finally (read-char stream)))          ;discard #\]
           (read-botched-internal ()
             (loop with first = (read stream)
                   for look-ahead = (peek-char t stream nil nil)
                   while look-ahead
                   while (eql look-ahead #\[)
                   do (setq first (cons first (read-botched-list)))
                   finally (return first))))
    (let ((*readtable* (copy-readtable)))
      ;; make #\[ and #\] terminate tokens.  #'identity is never called
      ;; on well-formed input.
      (set-macro-character #\[ #'identity nil)
      (set-macro-character #\] #'identity nil)
      (if (null (peek-char t stream eof-error-p nil))
          eof-value
          (read-botched-internal)))))
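
the effect of making #\[ and #\] terminating macro characters can be seen
in isolation.  this is a sketch, not part of the parser above:
`bracket-tokens' and `return-char' are hypothetical names, and the macro
function here returns the character itself only so that the result shows
where the reader stopped.

```lisp
;; sketch only: BRACKET-TOKENS is a hypothetical helper for illustration.
(defun bracket-tokens (string)
  "Read every object in STRING with #\[ and #\] as terminating macro characters."
  (let ((*readtable* (copy-readtable nil)))
    (flet ((return-char (stream char)
             (declare (ignore stream))
             char))
      ;; a NIL third argument makes the character terminating,
      ;; so it ends any token that precedes it
      (set-macro-character #\[ #'return-char nil)
      (set-macro-character #\] #'return-char nil))
    (with-input-from-string (in string)
      ;; the stream itself serves as a unique end-of-file marker
      (loop for object = (read in nil in)
            until (eq object in)
            collect object))))

(bracket-tokens "x[y z]")               ; => (X #\[ Y Z #\])
```

without the two `set-macro-character' calls, x[y would be read as a single
symbol, and the look-ahead in the parser would never see the bracket.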

this will do lots of uninspiring things if much more than simple tokens are
present in the input, so a possible replacement that tries to limit itself
to tokens would go like this:

           (read-botched-internal ()
             (let* ((char (peek-char t stream))
                    (macro-function (get-macro-character char)))
               ;; heuristically determine whether this will be read as a token
               ;; this works in CMUCL and Allegro CL for Unix, not in CLISP
               (if (or (null macro-function)            ;this handles #\\
                       (eq macro-function (get-macro-character #\A)))
                   (loop with first = (read stream)
                         for look-ahead = (peek-char t stream nil nil)
                         while look-ahead
                         while (eql look-ahead #\[)
                         do (setq first (cons first (read-botched-list)))
                         finally (return first))
                   ;; only :stream is a standard initarg for reader-error;
                   ;; the format initargs are implementation extensions
                   (error 'reader-error
                          :stream stream
                          :format-control "~@<Syntax error in ~S (character ~@C).~:@>"
                          :format-arguments (list stream char))))))
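
the heuristic can be pulled out and tried on its own.  this is a sketch
under the same assumption as above, namely that a constituent character
either has no macro function or shares one with #\A;
`probably-starts-token-p' is a made-up name.

```lisp
;; sketch only: PROBABLY-STARTS-TOKEN-P is a hypothetical helper.
;; the answer is implementation-dependent, as noted for CLISP above.
(defun probably-starts-token-p (char &optional (readtable *readtable*))
  "Guess whether CHAR would begin an ordinary token in READTABLE."
  (let ((macro-function (get-macro-character char readtable)))
    (or (null macro-function)
        ;; some implementations give every constituent the same function
        (eq macro-function (get-macro-character #\A readtable)))))
```

where the heuristic applies, #\x passes, while #\( and #\" do not, because
the standard readtable defines them as macro characters.  whitespace also
passes, which is harmless here only because `peek-char' with a first
argument of t has already skipped it.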

a more complete approach would be replacing the reader macro function for
all (relevant) characters with one's own token-reader, but that's just too
much work for now.

#\Erik
--
If you think this year is number 97, | Help fight MULE in GNU Emacs 20!
_you_ are not "Year 2000 Compliant". | http://sourcery.naggum.no/emacs/