Chris Lattner wrote:
> Are there any well known techniques that are useful to provide buffered input
> for a lexer, without imposing arbitrary restrictions on token size?

Well, there is dynamic buffer resizing: when a token outgrows the buffer, allocate a bigger one, copy what you have, and keep scanning.
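A minimal sketch of that idea in C, for illustration only. The names TokBuf, tokbuf_push and next_token are hypothetical, tokens are assumed to be whitespace-delimited, and stdio is assumed to take care of block-level input buffering. The token buffer starts small and doubles whenever a token fills it, so token length is bounded only by available memory.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>

/* Hypothetical growable buffer holding the token currently being scanned. */
typedef struct {
    char  *data;
    size_t len;   /* characters currently stored */
    size_t cap;   /* bytes allocated */
} TokBuf;

/* Append one character, doubling the allocation when it is full.
   Keeps the contents NUL-terminated. Returns -1 if realloc fails. */
static int tokbuf_push(TokBuf *b, char c) {
    if (b->len + 1 >= b->cap) {               /* leave room for '\0' */
        size_t ncap = b->cap ? b->cap * 2 : 16;
        char *p = realloc(b->data, ncap);
        if (p == NULL) return -1;
        b->data = p;
        b->cap = ncap;
    }
    b->data[b->len++] = c;
    b->data[b->len] = '\0';
    return 0;
}

/* Read one whitespace-delimited token from fp into b (reusing b's storage).
   Returns 1 if a token was read, 0 at EOF, -1 on allocation failure. */
static int next_token(FILE *fp, TokBuf *b) {
    int c;
    b->len = 0;
    while ((c = getc(fp)) != EOF && isspace(c))
        ;                                     /* skip leading whitespace */
    if (c == EOF) return 0;
    do {
        if (tokbuf_push(b, (char)c) < 0) return -1;
    } while ((c = getc(fp)) != EOF && !isspace(c));
    return 1;
}
```

Doubling keeps the total copying cost linear in the token length; the price is that one huge token leaves a buffer of that size allocated, since this sketch never shrinks it.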

> [Flex uses a pair of large input buffers, 16K by default, with each token
> having to be smaller than a buffer. For anything vaguely resembling a
> programming language, I'd think a 16K token limit wouldn't be a problem....]

Just as a pestiferous hypothetical counterexample, what if there were
a language that allowed literal image, animation, and sound values as
immediate inline constants? You would need some kind of "aware"
editor that didn't attempt to display them as text, but conceptually,
writing an image of how a particular dialog box is supposed to look in
the source code is no different from writing the number 3. They are
just constants. One gets a one-byte representation, and the other
requires more than one byte.

There are no extant languages so cavalier with multi-kilobyte constants
yet, but I expect to see one within the next five years. Making them
immediate, instead of keeping a resource library around and referring
to them by number or whatever, would simply make them easier to work
with.

Meanwhile, and as others here have pointed out, C++ name manglers, in
the presence of macros that expand into class definitions, can
blithely create useless 10-kilobyte identifiers.

Bear
[If there were such a language, I think I'd write my lexer with some
special-case code to suck up the big bags o' bits. There are plenty
of ways to read such stuff in pieces and then glue them together. -John]
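One hypothetical way to do the piece-and-glue approach, sketched in C. The routine read_big_literal, the PIECE_SIZE constant, and the convention that a literal ends at a single delimiter byte that cannot occur inside it are all assumptions made for illustration. Bytes are collected into a linked list of fixed-size pieces, and only once the total length is known are they copied into one contiguous block.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical piece size; any positive value works. */
#define PIECE_SIZE 4096

/* One fixed-size piece of a large literal being read. */
typedef struct Piece {
    struct Piece *next;
    size_t used;                 /* bytes filled so far */
    char bytes[PIECE_SIZE];
} Piece;

static void free_pieces(Piece *p) {
    while (p) {
        Piece *next = p->next;
        free(p);
        p = next;
    }
}

/* Read bytes from fp up to (not including) the delimiter byte `end`,
   or up to EOF. Bytes go into a linked list of pieces, which are then
   glued into one malloc'd block. The delimiter is consumed. On success
   returns the block (caller frees) and stores its length in *out_len;
   returns NULL if an allocation fails. */
static char *read_big_literal(FILE *fp, int end, size_t *out_len) {
    Piece *head = NULL, *tail = NULL;
    size_t total = 0;
    int c;

    while ((c = getc(fp)) != EOF && c != end) {
        if (tail == NULL || tail->used == PIECE_SIZE) {
            Piece *p = malloc(sizeof *p);
            if (p == NULL) {
                free_pieces(head);
                return NULL;
            }
            p->next = NULL;
            p->used = 0;
            if (tail) tail->next = p; else head = p;
            tail = p;
        }
        tail->bytes[tail->used++] = (char)c;
        total++;
    }

    /* Glue: one allocation of the exact total, one copy per piece. */
    char *out = malloc(total ? total : 1);
    if (out == NULL) {
        free_pieces(head);
        return NULL;
    }
    size_t off = 0;
    for (Piece *p = head; p != NULL; p = p->next) {
        memcpy(out + off, p->bytes, p->used);
        off += p->used;
    }
    free_pieces(head);
    *out_len = total;
    return out;
}
```

Compared with repeatedly doubling one buffer, each byte is copied exactly once during the final glue step, at the cost of briefly holding two copies of the literal.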