I find myself in the position of needing to know more about compilers
than I thought I would ever want to know.

Plenty of books, plenty of documentation out there, but they all
pretty much start the same way and they all leave me with the same
queasy feeling. Sometimes, those queasy feelings go away as you get
into a subject because the mysteries underlying them gradually reveal
themselves. Sometimes, those queasy feelings go away through mere
familiarity, and long, bitter experience has taught me that
familiarity of that kind breeds disaster.

>In order to get started on compiler design and implementation, I've been>reading various sources (including the FAQs for this newsgroup) and WEB>references. I keep running across a small set of terms that, evidently>everyone knows except me. [:-)

And me.

>For example, in the JavaCC distribution there is a very simple grammar>called Simple1. In the description of this grammar (and the others, also),>the words "terminals" and "non-terminals" are used to name what, to my naive>eye, look like, well, simple rules.

This brave question eventually elicited a knowledgeable and thoughtful
response from one Robert Rayhawk, whose response leaves me feeling
that my queasy feelings are entirely justified.

>The sense of the word 'terminal' in compiler discussions, is that the>thing is at the 'end'.

Pretty easy to see that the terminals always wind up as the terminal
leaf on a tree...at least the way the trees are usually drawn. Redraw
the tree without rearranging the structure and the "terminals" could
just as well wind up at the top.

Don't know the terminology of trees. Terminals never connect
anything, they only lead to something else, no matter how you walk the
tree. Still not a good enough definition of a terminal, since it is
easy to imagine non-terminals that sometimes only lead to one other
thing. Or am I missing something here?

>Permit me to give a quick explanation, and then a slightly longer one.>>There is a hierarchy of peices to a source file, like the branches of>a tree. Out at the end of the branches are the terminals (the>end-points) of the tree-hierarchy.>>The theory of compiler tools is that a source file in which we might>be interested>> 1) has a collection of pieces (since we start our conversation here,>it seems funny to call these end-points, or terminals ... but>eventually you learn that you can climb up trees and down trees>fluidly and calling something start or finish is just a name, as long>as we agree on names we could give it any name).>> 2) the basic theory goes on to assume that the pieces in the source>file are arranged in some kind of a structure. This structure could>just be a sequence, or it could actually be hierarchical like a>corporate structure.>> 3) the basic theory then makes the single most important assumption>of all. It is assumed that if we can detect the structure (or the>mere pattern) of the collection of peices in the source, from that we>can discern the meaning of the source file (we can determine the>semantics). As you progress in your studies and master the vocabulary>of this technology, you will have occassion to look back at this>assumption with some curiosity.>
Hull breach. Damage report. Condition red.

Translation: if this all sounds like BS to you, it's because it is.

>Some tools break this set of assumptions into two separate programs>. The first part tries to find the pieces. This discovery phase is>usually called Syntax analysis. As one other poster pointed out the>tools use a set of rules to define the pieces that should be found one>by one in the source.(These are the syntax rules).>>These discovered pieces are called terminals, this will seem a little>more clear in a moment.>>A second phase gathers the discovered pieces and tries to see if they>match specified patterns of pieces. Again as another poster pointed>out, that phase gets rules from you that are a separate type of>rule.(These are the grammar rules) In some tools this is handled by a>program that is distinct from the front-end, and this phase is called>parsing, or Semantic analysis.>>Some tools make your job easy by merging the tool interfaces so that>you think you are only talking to one analysis program. It is easy to>get started with these tools, but as you are finding it can be a>little confusing.>>The parser that is driven by the sequence of tokens from the>front-end, arranges these into a hierarchy. This is really the first>place that a tree shows up in the discussion. The parser records>pieces in a tree structure, and we say that the original source had>pieces 'in a certain hierarchical arrangement' that we>perceive. Acually all the source file has is characters>>We impose the hierarchy with our minds and tools. The end-points of>this project tree structure (psychological projection of programmatic>recordation) are simply called 'terminals'. Those terminals can only>be found by the front-end syntax analysis that decides certain>anticipated character sequences consitute a token to be passed to the>parser.>>Any rule given to the parser that asserts that one or more tokens>constitue a pattern, defines a non-terminal.>>non-terminal-example: RAW-TOKEN>>Of course, non-terminals can have two or more tokens.>>non-term-1: TOKEN1 TOKEN2>>These are definitions (inside the parser control file) of things which>the syntax engine is not smart enough to find: syntax (basically) can>only find one token at a time.>>The simplistic non-terminals illustrated so far are at the first level>in from raw tokens (terminals) in a hierarchical structure. Any rule>(controlling the parser) that defines things at that level, or further>in towards the trunk of the tree, are non-terminals.>>In most practical tools, the whole set of rules must be gathered>(hierarchically) into one final non-terminal. This grammar rule can>be called, END, or BEGIN, or if you like FRED-SHMED. It is just a>name.>>In a certain sense, a tree has two end-point types. The trunk and the>numerous ends of the branches.>>Downstream processes 'walk' the tree recorded by the parser,>efficient systems tend to walk in one direction, but you eventually>encounter complex requirements that force algorithms to either look>back in to other levels, or to have type data passed to routines as>they go further out on the limbs.>>Once you see that it is a tree, and you can call the trunck begin or>end, you see that the multitude of end-points on the branches are as>well named 'terminal' as any other name.>>It is important to see that the terminals are the entire list of>vocabulary shared between the syntax analysis and the semantic>analysis.

This sounds like a hard definition, except that it merely transforms
one muddy concept (terminal) into even muddier concepts: the
distinction between syntax analysis and semantic analysis.

>The grammar can not resolve any detail more precise than the>list of tokens defined in the interface to syntax at the front-end.