Eric O'Dell (eric@gadgetguru.com) wrote:: I'm a C programmer with no real background in parsing or compiler: design, and I'm slowly slogging through Holub's _Compiler Design in C_: and the documentation for Flex and Bison and other tools (notably: PRECCX). I'm working on some ideas for a language which will be: compiled to bytecode to be interpreted by a virtual machine. The: language is C-ish in syntax, but I'd like to give the user the ability: to overload existing operators and define new operators. I pretty much: understand how to handle the overloading of existing binary and unary: operators. The problem lies with defining new operators.

: For new operators, the user will be able to choose characters from a: special operator character set (the usual C operator characters, plus: some extras) or alternatively give them names that follow the same: rules as any other variable or function identifier. I'd also like to: be able to define operators that consist of more than one token, like: the trinary "? :"

Interesting problem. Have a look at Algol-68 to see how they did it
for hints on how to do the language syntax end of defining
user-defined operators. The scanning/parsing issue is much
harder. Most scanners hardwire the punctuation-based symbols into
their automata and you can probably forget all about extending them at
run time. The only option would be to have pseudo-tokens for each
individual punctuation symbol and then try and reconstruct them at
parser level. This will certainly break LALR or LL(1) constraints for
common languages, so basically it just gets harder and harder.

However, all is not lost. My RDP parser generator has a scanner that
was specifically designed to support this kind of thing. The way that
it works is that all keywords, including punctuation based ones are
held in a run-time loaded symbol table, and the scanner simply does
longest match tokenisation on the basis of this table. It is not as
fast as a DFA but it is totally flexible. In addition, RDP is
specifically aimed at neophyte users, and many people have found it an
easy introduction to the subject of translator design. It comes with
masses of documentation, including a large case study manual that
takes you through the design of a range of translators including two
compilers, an assembler and a raft of interpreters.

CAVEATS:

1. The run time extendible scanner has not often been used for
run-time extendible languages and you'll need to understand some of
the internals to make it work properly. This caveat doesn't apply to
`standard' use of the scanner using non-dynamic tokens because RDP
handles all that automatically.

2. RDP is LL(1) only although we have more powerful parser generators under
development.