Hi, I am reading The TeXbook by Donald E. Knuth. In my view it is very didactic, which suits people who want to learn TeX, and most pages contain exercises. But many details of how a TeX implementation parses TeX code seem to be left untouched. For instance, I was wondering why \input MS reads the whole argument 'MS' and not only 'M', as I expected: in $\sqrt ab$, only a is under the square root and b is outside it.

In which order does a TeX parser process characters of each of the 16 categories, which are listed in Chapter 7?

Does anyone know a complete reference or book that explains exactly how a TeX parser works and lists all 900 built-in control sequences of TeX?

Other than The TeXbook (which does cover all of this technical detail in the later chapters), TeX by Topic is normally well regarded (and free). In both cases, you'll need to look at the bits about formal grammar, with <balanced text>, <general text>, etc. as key concepts. You might also consider the sources to TeX: they are after all definitive. I'm also not sure what you mean by 'modern': TeX was finalised in 1990 and there hasn't been a lot to say about its parsing model since! (I've posted as a comment as I'm not sure this is really what you are after.)
– Joseph Wright♦ May 13 '15 at 10:05

@JosephWright Yes, it was \input, not \import. What do you mean by 'bits about formal grammar'? Thanks for the hint about TeX by Topic.
– Matthias May 13 '15 at 10:10

Take \input as an example. In the index of The TeXbook, \input is defined (underlined index entry) on page 214 (spiral-bound edition), where I find \input<file name>. The formal definition of <file name> is on page 278, again found via the index. Chapters 24-26 contain a lot of the formal grammar (summaries of vertical, horizontal and math modes). The formal syntax is all about those < ... > descriptors.
– Joseph Wright♦ May 13 '15 at 10:15

Is every control sequence, or at least every primitive, covered by The TeXbook?
– Matthias May 13 '15 at 10:21

@Matthias: Yes. And there are not 900, but 200+. :-)
– Martin Schröder May 13 '15 at 10:40

2 Answers

The syntax for 〈file name〉 is not standard in TeX, because different operating systems have different conventions. You should ask your local system wizards for details on just how they have decided to implement file names. However, the following principles should hold universally: A 〈file name〉 should consist of 〈optional spaces〉 followed by explicit character tokens (after expansion). A sequence of six or fewer ordinary letters and/or digits followed by a space should be a file name that works in essentially the same way on all installations of TeX. Uppercase letters are not considered equivalent to their lowercase counterparts in file names; for example, if you refer to fonts cmr10 and CMR10, TeX will not notice any similarity between them, although it might input the same font metric file for both fonts.

In short, \input expands tokens and skips spaces until it finds a nonspace unexpandable token, which should be a character token (or a control sequence \let to a character token). It then keeps expanding and collecting character tokens; the file name ends at the first space token or noncharacter token.

The .tex extension will usually be appended if not found in the specified file name (but this is implementation dependent; for example, Textures allowed file names without extension).

So bizarre code such as

\let\aletter=a
\input }{\aletter x

will try to input a file called }{ax.tex. A space token after the file name will be ignored.
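
A less exotic sketch of the same rule: because \input keeps expanding while it scans, a macro can supply the file name. The names \chapterfile and chap1 are hypothetical, chosen only for illustration, and the file chap1.tex would have to exist for this to run.

```tex
% Hypothetical names: \chapterfile and the file chap1.tex are assumptions.
\def\chapterfile{chap1}
% \input expands \chapterfile into the characters c h a p 1.
% \relax is unexpandable and not a character token, so it ends
% the file name; it is then executed and does nothing.
\input \chapterfile\relax
```

Ending the name with \relax rather than a space is a common precaution when the next token might otherwise expand into more characters.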

Particular implementations can provide slightly different conventions. For instance, TeX Live allows the file name to be quoted with ": a leading " makes TeX look for a matching one (always doing expansion), and the file name consists of every character token (including space tokens) up to the trailing ". The two double-quote characters are stripped off.
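
As a minimal sketch of the quoting convention, assuming a TeX Live engine and a hypothetical file named my notes.tex in the current directory:

```tex
% Assumption: TeX Live engine; the file name "my notes" is hypothetical.
% The space between the quotes becomes part of the file name, and
% the quotes themselves are removed before the file is looked up.
\input "my notes"
```

Without the quotes, the file name would end at the space, and TeX would try to input a file called my, then typeset the word notes.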

XeTeX and LuaTeX apply yet other conventions. For instance, LuaTeX allows \input {file} and will input file.tex; by contrast, pdfTeX will try to input {file}.tex.

Note that the same rules apply whenever TeX is looking for a file name, that is, for \openin, \openout and \font. However, XeTeX uses special conventions for the file name in \font that aren't applicable to other cases in which a 〈file name〉 is being looked for.
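
The following plain TeX sketch shows the same scanning for \font and \openout. The names \bigrm, \notefile and the file name notes are hypothetical, introduced only for this illustration.

```tex
% Sketch for plain TeX; \bigrm, \notefile and "notes" are assumed names.
\font\bigrm=cmr10 scaled 2000 % file name is cmr10; the space ends it
\newwrite\notefile
\immediate\openout\notefile=notes % file name is notes (usually notes.tex)
\immediate\write\notefile{Hello}
\immediate\closeout\notefile
\bye
```

In the \font line, the keyword scaled is read only after the space has ended the file name.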

There are several other cases in which arguments to primitives are not braced: for instance

\dimen 123 = 1234pt

will not take just 1 as the register, but will go on (with macro expansion) until finding something that's not a digit (in this case the space).
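
A small sketch of that expansion, using a hypothetical macro \reg to supply part of the register number:

```tex
% \reg is a hypothetical macro used only for this illustration.
\def\reg{23}
% TeX reads the digit 1, expands \reg to get 2 and 3, and stops at =,
% so the register being set is \dimen123.
\dimen1\reg=10pt
\message{\the\dimen123}
```

The \message line writes 10.0pt to the terminal and the log file.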

In essentially all of these situations, a trailing space token will be swallowed, being considered a delimiter of the required tokens.
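
The delimiting space matters in macro definitions. This plain TeX sketch uses hypothetical names (\demo, \setbad, \setgood) to show a number that keeps scanning past the end of the macro:

```tex
% \demo, \setbad and \setgood are hypothetical names for illustration.
\newcount\demo
\def\setbad{\demo=1}   % nothing ends the number inside the macro
\def\setgood{\demo=1 } % the space ends the number and is swallowed
\setbad 2  \message{\the\demo} % the 2 joins the number: \demo is 12
\setgood 2 \message{\the\demo} % the 2 is typeset as text: \demo is 1
```

Writing \relax instead of the space in \setgood would end the number as well, without relying on a space token.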