Well, today I finally achieved my ambition to build LuaTeX using Visual Studio. It took me about 25 hours of my evenings to do it but at long last I can now step through the code with a nice visual debugger to begin to understand more about this marvellous TeX engine. It wasn't trivial but neither was it quite as complex as I'd feared. Simply Happy Days! Here's a screenshot of it in action.

Well, it feels like it has taken a lot of writing to get to this, the final part of "A minimal LuaTeX setup on Windows". At the end of Part 5 we discussed creating a minimal TDS-compliant directory structure to contain the file types we need to install for a minimal plain TeX setup:

TeX Font Metric files (extension .tfm)

Font encoding files (extension .enc)

Glyph data files (extension .pfb, on Windows)

The plain TeX format file (extension .fmt)

The plain TeX source files (plain.tex and hyphen.tex)

.map files (specifically for pdfTeX)

We decided on a minimal TDS-compliant directory structure rooted at c:\luatexblog\texmf. The fonts will be located in subdirectories of c:\luatexblog\texmf, and we'll create a set of directories which follow the structure:

c:\luatexblog\texmf\fonts\[type]\[supplier]\[typeface]

Where [type] will be

tfm: for .tfm files (TeX font metrics)

type1: for .pfb files (Printer Font Binary)

Where [supplier] will be public (i.e., for free fonts) and [typeface] will simply be cm (for Computer Modern). In addition, under c:\luatexblog\texmf\fonts\ we'll need to create directories for

map: for .map files (pdfTeX and LuaTeX font mapping files)

enc: for .enc files (font encoding)

Finally, we need directories to contain

plain TeX source files (plain.tex and hyphen.tex)

the plain TeX .fmt file

the texmf.cnf file that we'll write for Kpathsea

Going back to the Kpathsea documentation which gives a nice example of a skeleton TDS, you should create a directory structure that looks something like this:

Note that if you add more .pfb files under the directory c:\luatexblog\texmf\fonts\type1, it is best practice to create a new subdirectory whose name reflects the supplier; for example, I have added "adobe". Under each supplier, add a subdirectory named for the typeface, e.g., utopia, and that's where you would put the .pfb files:

c:\luatexblog\texmf\fonts\type1\adobe\utopia\*.pfb

Here is where we will put the various files we need.

File type: TDS file path

.tfm: c:\luatexblog\texmf\fonts\tfm\public\cm\

.pfb: c:\luatexblog\texmf\fonts\type1\public\cm\

.enc: c:\luatexblog\texmf\fonts\enc\

.map: c:\luatexblog\texmf\fonts\map\

plain.tex: c:\luatexblog\texmf\tex\plain\base\

hyphen.tex: c:\luatexblog\texmf\tex\generic\hyphen\

texmf.cnf: c:\luatexblog\texmf\web2c\

plain.fmt: c:\luatexblog\texmf\web2c\
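If you'd rather not create these directories by hand, here is a small Python sketch that builds the same skeleton. It assumes you have Python 3 installed (nothing else in this series needs it), and the function name make_tds is my own invention; the subdirectories simply mirror the file paths listed above.

```python
from pathlib import Path
from typing import List

# Directories from the list above, relative to the TDS root (...\texmf).
TDS_DIRS = [
    "fonts/tfm/public/cm",     # .tfm
    "fonts/type1/public/cm",   # .pfb
    "fonts/enc",               # .enc
    "fonts/map",               # .map
    "tex/plain/base",          # plain.tex
    "tex/generic/hyphen",      # hyphen.tex
    "web2c",                   # texmf.cnf and plain.fmt
]

def make_tds(root: str) -> List[Path]:
    """Create the minimal TDS skeleton under root; return the directories."""
    created = []
    for rel in TDS_DIRS:
        path = Path(root).joinpath(*rel.split("/"))
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

For example, make_tds(r"c:\luatexblog\texmf") would create the tree on my PC; running it a second time does no harm because existing directories are left alone.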

Filling these directories with files

Clearly, we will be generating plain.fmt and you saw in Part 5 where to get plain.tex and hyphen.tex. We will shortly be writing texmf.cnf by hand, so that leaves the following files to be obtained from somewhere:

TeX Font Metric files (.tfm)

Font encoding files (.enc)

Glyph data files (.pfb, on Windows)

.map files (specifically for pdfTeX)

But firstly, a note of caution

TeX Font Metric files (for text fonts) need to be obtained and used with a little caution because they are tied to a specific font encoding. In general, and particularly with plain TeX (which assumes a certain encoding), you cannot just use them without knowing which encoding was used when they were created. For example, the afm2tfm.exe utility available from TeX Live, which converts Adobe's AFM files to TeX's .tfm format, can be given an encoding vector on its command line. LaTeX certainly has far more flexibility with encodings, but plain TeX is rather less versatile.

Obtaining the .tfm files for the Computer Modern fonts

Using the method of browsing TeX Live, you can access the Computer Modern .tfm files here:

svn://tug.org/texlive/trunk/Master/texmf-dist/fonts/tfm/public/cm

One oddity: manfnt.tfm
This .tfm is required to build the plain TeX format and you can get it here.

Obtaining the .pfb files for the Computer Modern fonts

The American Mathematical Society provides the Computer Modern fonts in Adobe Type 1 format, which can be downloaded as part of their AMSFonts collection.

Obtaining the .enc files

What about luatex.exe?

Note: edit your PATH
Don't forget that you will need to add c:\luatexblog to your computer's PATH environment variable; otherwise your PC won't be able to find luatex.exe when you try to run it!

Kpathsea and texmf.cnf

We are nearly finished! All we now need to do is tell Kpathsea where to locate the various files in our minimal TDS tree and we do this through a texmf.cnf file that we must save to c:\luatexblog\texmf\web2c\texmf.cnf.

If you look at the texmf.cnf file supplied with TeX Live, it looks quite daunting because Kpathsea's powerful searching algorithms let you construct complex expressions to describe paths and directory structures. Kpathsea allows you to create sophisticated TeX installations, with multiple TDS trees being used for different purposes. We will touch only a tiny fraction of Kpathsea's power and flexibility.

Describing the features of Kpathsea in detail is far beyond the scope of this post; perhaps that is one for another day. Interested readers should refer to the Kpathsea documentation and to the texmf.cnf file available in the TeX Live repository, which contains many helpful comments. For those who are comfortable reading C, there is a lot of additional information in the comments scattered throughout the Kpathsea source code. Happy reading!

Final steps

Set an environment variable called TEXMFCNF which tells Kpathsea where to start looking for your configuration files (texmf.cnf). For our installation it should be set to

TEXMFCNF=c:\luatexblog\texmf\web2c\

Set the Kpathsea debugging environment variables:

KPATHSEA_DEBUG_OUTPUT=c:/kspsluatex.log

KPATHSEA_DEBUG=-1

Put the following into a text file called texmf.cnf and save it to c:\luatexblog\texmf\web2c

Summary of these texmf.cnf variables

$SELFAUTOLOC: An environment variable set by Kpathsea (when it starts running) which gives the location of the executable; i.e., c:\luatexblog for luatex.exe as that is where we put it.

TEXINPUTS: This variable specifies where pdfTeX (and LuaTeX) finds its input files. Image files are considered input files and searched for along this path.

TEXFONTMAPS: Search path for font map (.map) files.

TFMFONTS: Search path for font metric (.tfm) files.

TEXFORMATS: Search path for format (.fmt) files.

T1FONTS: Search path for Type 1 font files (.pfa and .pfb).

ENCFONTS: Search path for encoding (.enc) files.
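To make those variables concrete, here is a hedged Python sketch that writes a minimal texmf.cnf for our tree. The values are my assumptions based on the directory layout (illustrative, not a copy of any official file), and make_texmf_cnf and write_texmf_cnf are hypothetical helper names. Each path starts with "." so the current directory is searched first, and the trailing "//" asks Kpathsea to search subdirectories recursively.

```python
from pathlib import Path

# Kpathsea variable -> subdirectory of the TDS root (assumed values;
# "//" requests a recursive search of that directory).
CNF_PATHS = {
    "TEXINPUTS": "tex//",
    "TEXFORMATS": "web2c",
    "TFMFONTS": "fonts/tfm//",
    "T1FONTS": "fonts/type1//",
    "ENCFONTS": "fonts/enc//",
    "TEXFONTMAPS": "fonts/map//",
}

def make_texmf_cnf(root: str = "c:/luatexblog/texmf") -> str:
    """Return texmf.cnf text; every path starts with "." (current directory)."""
    root = root.replace("\\", "/").rstrip("/")  # Kpathsea paths use forward slashes
    lines = ["% Minimal texmf.cnf (hypothetical sketch)"]
    for var, sub in CNF_PATHS.items():
        lines.append(f"{var} = .;{root}/{sub}")  # ";" separates path elements
    return "\n".join(lines) + "\n"

def write_texmf_cnf(root: str = "c:/luatexblog/texmf") -> Path:
    """Write texmf.cnf into root/web2c and return its path."""
    target = Path(root) / "web2c" / "texmf.cnf"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(make_texmf_cnf(root), encoding="ascii")
    return target

print(make_texmf_cnf())
```

With the default root, the TEXINPUTS line comes out as ".;c:/luatexblog/texmf/tex//", matching the example discussed in the notes that follow.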

General notes

Kpathsea uses forward slashes "/"

In our example, TEXINPUTS starts with "." and has a second path "c:/luatexblog/texmf/tex//" (separated by ";") which ends in "//".

".": this means "the current directory".

The "//" means search recursively into the directory

";" is a separator for Kpathsea "path elements"

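To see what those three rules mean in practice, here is a deliberately tiny Python model of path expansion and file lookup. It is only a sketch: real Kpathsea supports much more (variable expansion, brace expansion, ls-R databases and so on), and expand_path and find_file are hypothetical names of my own.

```python
import os
from typing import List

def expand_path(path_spec: str, cwd: str = ".") -> List[str]:
    """Toy expansion of a Kpathsea-style search path into directories.

    Models only three rules: ";" separates path elements, "." is the
    current directory, and a trailing "//" adds every subdirectory too.
    """
    dirs: List[str] = []
    for element in path_spec.split(";"):
        if not element:
            continue  # real Kpathsea gives empty elements a special meaning
        recursive = element.endswith("//")
        base = element[:-2] if recursive else element
        if base == ".":
            base = cwd
        if not os.path.isdir(base):
            continue
        dirs.append(base)
        if recursive:
            for parent, subdirs, _files in os.walk(base):
                subdirs.sort()  # search subdirectories in a predictable order
                dirs.extend(os.path.join(parent, s) for s in subdirs)
    return dirs

def find_file(name: str, path_spec: str, cwd: str = ".") -> str:
    """Return the first match for name along path_spec, or "" if none."""
    for directory in expand_path(path_spec, cwd):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return ""
```

With our TEXINPUTS value, find_file("plain.tex", ".;c:/luatexblog/texmf/tex//") would first look in the current directory and then work down through c:/luatexblog/texmf/tex and all of its subdirectories until it reaches tex/plain/base.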
Something I have ignored: ls-R databases
Kpathsea can use an externally-built filename database file named ls-R that maps files to directories, thus avoiding the need to exhaustively search the disk. See the Kpathsea documentation for more detail.
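The idea behind ls-R can be sketched in a few lines of Python: walk the tree once, record which directories contain each filename, and answer later lookups from that table instead of the disk. This illustrates the concept only; it does not read or write the real ls-R file format, and build_index is a hypothetical name.

```python
import os
from typing import Dict, List

def build_index(root: str) -> Dict[str, List[str]]:
    """Map each filename under root to the directories that contain it."""
    index: Dict[str, List[str]] = {}
    for parent, subdirs, files in os.walk(root):
        subdirs.sort()  # visit directories in a predictable order
        for name in sorted(files):
            index.setdefault(name, []).append(parent)
    return index
```

For example, build_index(r"c:\luatexblog\texmf").get("cmr10.tfm") would list the directory holding cmr10.tfm; every later lookup is a simple dictionary access with no further directory walking.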

Running luatex

If you type luatex --help at the DOS prompt you'll see a number of command-line options, but only four of them are of immediate interest:

--fmt=FORMAT

load the format file FORMAT

--ini

be iniluatex, for dumping formats

--output-directory=DIR

use DIR as the directory to write files to

--output-format=FORMAT

use FORMAT for job output; FORMAT is 'dvi' or 'pdf'

Note that when you run LuaTeX, it will write the PDF file (and maybe others) to the current working directory unless you specify another directory using --output-directory=DIR.

Generating plain.fmt

We'll use the fact that LuaTeX will write to the current directory unless told otherwise. Start a DOS prompt and change the directory to where we want the plain.fmt file to be located:

c:\luatexblog\texmf\web2c

Run the command line:

c:\luatexblog\texmf\web2c> luatex --ini plain.tex \dump

You should see something like the following, and a file called plain.fmt will be created in c:\luatexblog\texmf\web2c
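If you prefer to script this step, the following Python sketch sets the environment variables described earlier and runs the same command in the same directory. It assumes the paths used throughout this post and a Python 3 installation; ini_run is a hypothetical helper name, and the script only runs LuaTeX if it can actually find luatex.exe.

```python
import os
import shutil
import subprocess
from typing import Dict, List, Tuple

def ini_run(root: str = r"c:\luatexblog") -> Tuple[List[str], str, Dict[str, str]]:
    """Return (command, working directory, environment) for dumping plain.fmt."""
    web2c = os.path.join(root, "texmf", "web2c")
    env = dict(os.environ)
    env["PATH"] = root + os.pathsep + env.get("PATH", "")  # so luatex.exe is found
    env["TEXMFCNF"] = web2c                      # where Kpathsea looks for texmf.cnf
    env["KPATHSEA_DEBUG_OUTPUT"] = "c:/kspsluatex.log"
    env["KPATHSEA_DEBUG"] = "-1"                 # log everything (lots of output!)
    cmd = ["luatex", "--ini", "plain.tex", r"\dump"]
    return cmd, web2c, env

if __name__ == "__main__":
    cmd, cwd, env = ini_run()
    exe = shutil.which("luatex", path=env["PATH"])
    if exe and os.path.isdir(cwd):
        # No --output-directory, so plain.fmt is written to cwd (the web2c dir).
        subprocess.run([exe] + cmd[1:], cwd=cwd, env=env, check=True)
    else:
        print("luatex or", cwd, "not found; nothing was run")
```

Once everything works, you can drop the two KPATHSEA_DEBUG lines, since a value of -1 produces a lot of output.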

Conclusion

Over these 6 tutorials I have tried to cover, in general terms, some of the "TeX landscape" and to equip you with enough information to begin building your own LuaTeX test environment, should you wish to. Of course, I have omitted vast amounts of technical detail in the interest of simplicity and building "a conceptual framework" for your own investigations. I hope that I have not made any serious errors but if anyone spots some, do please let me know so that I can release corrected and updated posts.

I hope that somewhere "out on the web" someone has enjoyed these and found them to be useful.

A summary of what we are going to do next

This is going to be a pretty long post, and I've been working on it for days! I've been trying to get the right flow of ideas and the level of technical detail and concepts "just right". We'll cover a lot of ground, skimming over some deep details, but hopefully end up with something that is useful. As always, you are the main audience and judges of whether this content is a useful addition to "the blogosphere": if you want to comment, please do.

In this part of the tutorial we'll work towards a minimal LuaTeX installation using one of the simplest TeX formats: Donald Knuth's plain TeX format, as described in The TeXbook. To do this we will be taking the following steps in the process:

Identify all the various files and resources that LuaTeX will need to process a document written in the plain TeX format.

Work out the best way to organise these various file types on your hard drive.

Work out how we will tell Kpathsea where to find these files:

using environment variables;

using a minimal hand-written texmf.cnf file.

Build the plain TeX format file (plain.fmt).

What we are going to leave for later: staying simple
We are going to ignore the (wonderful world of) OpenType fonts (for now) and stay with the far simpler Adobe Type 1 PostScript fonts. In addition, we are not going to explore any LuaTeX-specific features such as \directlua{...}, purely to keep the discussion as simple as possible at this point.

The plain TeX format

Knuth's plain TeX format is described in great detail in The TeXbook, and has the advantage that it requires just two files to build the format file: plain.tex and hyphen.tex.

Step 1: What other files and resources does LuaTeX need for plain TeX?

When Donald Knuth wrote the original TeX engine, the output of "TeXing a document" was something called a DeVice Independent file (referred to from now on as a DVI file). Describing the DVI file format in detail is beyond the scope of this post and there are plenty of resources on the web which you can access for more detail; for example, the UK TUG FAQ is a good starting point, as is the Wikipedia entry.

Newer TeX engines have, of course, been developed to output PDF files in addition to DVI files, starting most notably with pdfTeX. LuaTeX can be seen as an extension of pdfTeX and also outputs PDF directly.

Input, typesetting and output (DVI vs PDF)

To help with understanding the following sections, it will be useful to consider the "typesetting process" as built up of three fundamental activities: reading the input, typesetting it, and writing the typeset result to a file.

For current purposes, it is the process of "writing the typeset result to a file" which we need to discuss. Compared to writing DVI files, TeX engines writing PDF files need access to additional resources, and that is going to affect the resources we need to make available through our setup and installation. Of course, both DVI and PDF files contain a representation of the typesetting work done by the TeX engine; however, they differ in one very important way: TeX engines that output PDF directly embed (i.e., write into) the PDF file the actual data required to display fonts, whereas DVI files do not contain that data.

You can think of the DVI file format as an "intermediate file format" which provides a description of the typeset results; to visualise those results, DVI files have to be processed by external applications. It is the job of these external applications, often called "drivers", to make sure that they have access to the data required to display fonts: whether on a screen, on a desktop printer or any other device. The philosophy behind the design of the DVI file format was to create a representation of the typeset result which could then be output on any device through the appropriate "device driver software", leaving the messy device-dependent details to external applications. Hence the name DeVice Independent file. It also explains (in part!) why DVI files are tiny compared to their PDF counterparts: DVI files do not contain font (glyph) data (or images etc.), whereas those resources are embedded into PDF files produced by TeX engines.

In summary, the most important point for us is that for TeX to output a DVI file it does not need access to the actual data required to display the fonts used in your document. Now, if you are new to TeX this may seem very strange and almost a contradiction: a typesetting program that does not need access to fonts? To explain this, we need to be very clear on precisely what conventional or original TeX engines actually understand by "a font".

Of metrics, characters, glyphs and encodings

The subject of fonts is a huge topic, one I intend to write about in future posts, but for now I need to introduce four key concepts at this point in the story: metrics, characters, glyphs and encodings.

Characters and glyphs

I thought long and hard about how to explain the difference between characters and glyphs but I think the Unicode standard does it as well as anything I've read, so I'd like to quote from the Unicode standard (version 6.0) which says:

Characters are the abstract representations of the smallest components of written language that have semantic value. They represent primarily, but not exclusively, the letters, punctuation, and other signs that constitute natural language text and technical notation.

Glyphs represent the shapes that characters can have when they are rendered or displayed. In contrast to characters, glyphs appear on the screen or paper as particular representations of one or more characters.

So, you can think of a character as being the name of a fundamental building block of a language (e.g., the letter 'capital A') and a glyph as a character expressed in a specific visual form. For example, the following SVG graphic shows four glyphs representing the character 'capital A':

Metrics

Firstly, I must stress again that I am not discussing OpenType font technology but restricting the discussion to the older world of Adobe Type 1 fonts for use with plain TeX. As far as TeX engines are concerned, to do their job of typesetting they treat glyphs as simple boxes and all they want to know is three simple values for each glyph you want to typeset: width, height and depth. These numbers are called metrics.

Now, I have deliberately used the term glyph, not character, because a glyph is the visual representation of a character and, clearly, it is the 'size' of the glyph boxes that TeX wants to know. You can easily see this for yourself: type a row of characters, say 'capital A', in Microsoft Word and apply a different typeface to each one. It should be clear that the width of each glyph depends on the typeface you have applied, i.e., on the specific visual representation: the glyph. So, when you use a particular "font" with TeX, all that TeX is worried about are the metrics which provide numeric information about the glyphs. The typesetting algorithms inside the TeX engine do not care about the specifics of what the glyph looks like; they just want the metrics so that they can calculate line breaks, compute the layout of a formula or decide where to end the page.
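Here is a small Python sketch of that "glyphs as boxes" view. When boxes are set side by side (ignoring kerns, glue and shifts), TeX adds up their widths and takes the largest height and depth, never less than zero. The metric values below are made up for illustration; they are not taken from any real .tfm file, and Box and hbox are hypothetical names.

```python
from typing import List, NamedTuple

class Box(NamedTuple):
    """A glyph as TeX sees it: three dimensions (points, in this sketch)."""
    width: float
    height: float
    depth: float

def hbox(glyphs: List[Box]) -> Box:
    """Dimensions of glyph boxes set side by side, with no kerns or glue.

    Widths add up; height and depth are the largest among the glyphs,
    starting from zero, which mirrors how TeX packs a simple horizontal box.
    """
    return Box(
        width=sum((g.width for g in glyphs), 0.0),
        height=max([0.0] + [g.height for g in glyphs]),
        depth=max([0.0] + [g.depth for g in glyphs]),
    )

# Hypothetical metrics for "A" and "g" (made-up numbers, not from cmr10.tfm).
row = hbox([Box(7.5, 6.83, 0.0), Box(5.0, 4.31, 1.94)])
print(row)
```

Notice that nothing here says what "A" or "g" look like: the three numbers per glyph are all the engine needs to work out how much room the row takes up.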

TeX font metric files

When the TeX engine is typesetting your document, breaking paragraphs into lines, constructing mathematical formulae, all it wants to know are some numeric values about the glyphs in the fonts you are using. It does not care about what the individual glyphs actually look like.

Metrics in reality: more than just width, height, depth
I have simplified the discussion somewhat. The actual metrics used by TeX engines include a range of additional data values which I won't discuss here, but I just want to note that real metrics contain more data than just the width, height and depth of glyphs. In fact, there are two classes of metrics that TeX engines require: metrics for text fonts and metrics for math fonts. To typeset mathematics, TeX engines need some additional numbers (metrics) which they use to control the processes that construct the typeset formula.

Encodings

Font encoding is a messy topic, one which is impossible to cover thoroughly in a few lines. So, my apologies in advance to any experts reading this, but I'm aiming for "minimal simplicity" at this point. Although it is a pretty obvious thing to say, what we need to realise is that when software is storing or processing text data, it is actually working with numbers: numbers which represent characters. When it comes to displaying the text (which internally is being stored as numbers) there has to be a process to decide which characters are actually being represented by that set of numbers. We need some form of "mapping" from those numbers to the characters they are expected to represent. That mapping is called... the encoding. An encoding is simply a set of numbers which are allocated to a specific range of characters.
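A quick way to see that an encoding is just a mapping from numbers to characters is to decode one and the same number with two different encodings. This Python sketch uses two of Python's built-in text codecs as stand-ins; they are not TeX font encodings (such as those described by .enc files), but the principle is the same. decode_with is a hypothetical name.

```python
import unicodedata
from typing import Dict, Iterable

def decode_with(code: int, encodings: Iterable[str]) -> Dict[str, str]:
    """Decode a single byte value under each named encoding."""
    raw = bytes([code])
    return {enc: raw.decode(enc) for enc in encodings}

for enc, char in decode_with(0xE9, ["latin-1", "cp437"]).items():
    # Print the code point and Unicode name so the output stays plain ASCII.
    print(f"0xE9 in {enc}: U+{ord(char):04X} {unicodedata.name(char)}")
```

Latin-1 maps 0xE9 to "é" (LATIN SMALL LETTER E WITH ACUTE), while code page 437, the original IBM PC character set, maps the very same number to the Greek capital theta "Θ".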

Introduction to Adobe Type 1 PostScript fonts (ignoring OpenType)

To assist with the discussions, we should think of a "font" as consisting of two files:

the font metrics: a file containing numeric data describing the width, height, depth of the glyphs in the font. Again, I stress this is a simplification because real metric files contain a range of additional data.

the font glyphs: this is the data which describes how to draw the glyphs themselves, i.e., the lines and curves from which glyph shapes are built.

Within the world of Adobe Type 1 PostScript fonts, the font metric files are called Adobe Font Metrics or AFM files (.afm) and the font glyphs (on Windows) are stored in a separate file called Printer Font Binary (PFB or .pfb files). AFM files are a simple text file format whereas PFB files are a compact binary format.
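To give a feel for the "compact binary format" of PFB files, here is a Python sketch that lists the segments of a .pfb file. It follows the commonly documented PFB layout (a 0x80 marker byte, a segment type, a 4-byte little-endian length, then the data); pfb_segments is a hypothetical name, and it only reports the segment structure rather than decoding the font itself.

```python
import struct
from typing import List, Tuple

def pfb_segments(data: bytes) -> List[Tuple[str, int]]:
    """List (kind, length) for each segment in the bytes of a PFB file.

    Each segment starts with 0x80 and a type byte (1 = ASCII, 2 = binary,
    3 = end of file); types 1 and 2 are followed by a 4-byte little-endian
    length and then that many bytes of data.
    """
    kinds = {1: "ascii", 2: "binary"}
    segments = []
    pos = 0
    while pos < len(data):
        if data[pos] != 0x80:
            raise ValueError(f"bad segment marker at offset {pos}")
        seg_type = data[pos + 1]
        if seg_type == 3:
            break  # end-of-file marker: no length, no data
        if seg_type not in kinds:
            raise ValueError(f"unknown segment type {seg_type}")
        (length,) = struct.unpack_from("<I", data, pos + 2)
        segments.append((kinds[seg_type], length))
        pos += 6 + length  # skip the 6-byte header and the segment data
    return segments
```

Pointing it at a real file, e.g. pfb_segments(open("cmr10.pfb", "rb").read()), typically shows an ASCII segment, a larger binary segment holding the encrypted glyph programs, and a short final ASCII segment.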

Do TeX engines use Adobe Font Metrics files?

No. TeX engines expect their metric files in a very specific format called the TeX Font Metric (TFM, .tfm) format. This is a highly compact binary format, unlike Adobe's AFM format. Of course, there are utilities to convert from AFM files to TFM files for use with TeX; for example, afm2tfm.exe shipped with TeX Live.
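As an illustration of the kind of information a converter such as afm2tfm.exe starts from, here is a Python sketch that reads one CharMetrics line from an AFM file and derives TeX-style width, height and depth. It is a simplification: real AFM-to-TFM conversion also deals with encodings, kerning, ligatures and conversion to the TFM's own units, and afm_char_metrics is a hypothetical name. AFM values are usually given in thousandths of an em.

```python
from typing import Dict, Union

def afm_char_metrics(line: str) -> Dict[str, Union[int, float, str]]:
    """Parse one AFM CharMetrics line into TeX-style width/height/depth.

    Fields are separated by ";" and each starts with a key: C = character
    code, WX = width, N = glyph name, B = bounding box (llx lly urx ury).
    """
    fields = {}
    for item in line.split(";"):
        parts = item.split()
        if parts:
            fields[parts[0]] = parts[1:]
    _llx, lly, _urx, ury = (float(v) for v in fields["B"])
    return {
        "code": int(fields["C"][0]),
        "name": fields["N"][0],
        "width": float(fields["WX"][0]),
        "height": max(0.0, ury),   # how far the glyph rises above the baseline
        "depth": max(0.0, -lly),   # how far it descends below the baseline
    }

sample = "C 103 ; WX 500 ; N g ; B 28 -206 485 453 ;"  # made-up values
print(afm_char_metrics(sample))
```

The human-readable text of the AFM line is exactly what the TFM format then packs into compact binary form.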

Answer to: What other files and resources does LuaTeX need for plain TeX?

Finally, we are in a position to answer this question. For LuaTeX to successfully output a PDF file containing the typeset results it needs access to:

TeX Font Metric files (extension .tfm)

Font encoding files (extension .enc)

Glyph data files (extension .pfb, on Windows)

The plain TeX format file (extension .fmt)

The file(s) containing your document (extension .tex etc)

And one we have not explained: .map files (specifically for pdfTeX)

Plus, of course, any graphics you want to include, but we'll leave graphics for when we look at the LaTeX format.

pdfTeX and .map files: a primer
We have noted that TeX engines such as pdfTeX and LuaTeX can output directly to PDF, but to do so they need access to the actual font data files which contain the descriptions of what the glyphs look like (.pfb files on Windows) so that they can embed this data into the PDF. We have seen that, for pure typesetting purposes, TeX engines only need access to TeX font metrics. The magic ingredient which connects the two is called a font map file and is a specific requirement for pdfTeX-related TeX engines. Quoting a slightly edited extract from the pdfTeX manual:

"Font map files provide the connection between TeX tfm font files and the outline font file names (.pfb files). They contain also information about re-encoding arrays, partial font embedding (“subsetting”), and character transformation parameters (like SlantFont and ExtendFont). Those map files were first created for dvi postprocessors. But, as pdfTeX in pdf output mode includes all pdf processing steps, it also needs to know about font mapping, and therefore reads in one or more map files. Map files are not read in when pdfTeX is in dvi mode. By default, pdfTeX reads the map file pdftex.map. In Web2c, map files are searched for using the TEXFONTMAPS config file value and environment variable."

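To make the map file idea concrete, here is a simplified Python sketch that splits a map line into the kinds of fields described in that quote: the TFM name, the PostScript font name, an optional quoted special (such as a re-encoding instruction), and the font and encoding files introduced by "<". It ignores the optional numeric flags field and the finer points of the syntax, for which the pdfTeX manual is the authority. parse_map_line is a hypothetical name, and the sample lines are illustrative.

```python
import re
from typing import Dict, Optional

# A token is either a "quoted special" or a run of non-space characters.
TOKEN = re.compile(r'"[^"]*"|\S+')

def parse_map_line(line: str) -> Optional[Dict[str, str]]:
    """Split a (simplified) pdfTeX map line into its main fields.

    Returns None for blank lines and % comments. Tokens starting with "<"
    name files: .enc files are encodings, anything else is the font file.
    """
    line = line.strip()
    if not line or line.startswith("%"):
        return None
    tokens = TOKEN.findall(line)
    entry = {"tfm": tokens[0]}                 # first field: the TFM name
    for tok in tokens[1:]:
        if tok.startswith('"'):
            entry["special"] = tok.strip('"')  # e.g. re-encoding instructions
        elif tok.startswith("<"):
            name = tok.lstrip("<[")            # drop "<", "<<" or "<[" prefixes
            key = "encoding" if name.endswith(".enc") else "fontfile"
            entry[key] = name
        elif "psname" not in entry:
            entry["psname"] = tok              # PostScript font name
    return entry

print(parse_map_line("cmr10 CMR10 <cmr10.pfb"))
```

The cmr10 line is the whole story for a Computer Modern font: "when the document uses cmr10.tfm, embed the glyphs from cmr10.pfb, whose PostScript name is CMR10".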
Step 2: Work out the best way to organise these various file types on your hard drive

Now that we have identified the resources that LuaTeX needs, we need to think about how we should organise these files in the most appropriate way, i.e., a suitable directory structure. Readers who have been following this set of tutorials may already have seen the posting on the TeX Directory Structure (TDS), and that's what I'll use to guide the remainder of this tutorial.

Reminder: Kpathsea!
Don't forget that the Kpathsea path-searching library (built into LuaTeX) is the vital component that will be searching through the TDS directory structure to locate the files and resources that LuaTeX will be looking for. Consequently, it is a good idea to make sure that your directory structure is optimised to make the best use of Kpathsea, so that LuaTeX can find files as fast as possible.

You can have multiple TDS trees: thanks to Kpathsea!
The Kpathsea library supports the use of multiple TDS trees so that you can split a big TeX installation into multiple directory structures, providing flexibility to manage your TeX installation. A paper by Michael J. Downes, Managing Multiple TDS Trees, covers this in some detail.

"In this document, we shall designate the root TDS directory by "texmf" (for “TEX and METAFONT”). We recommend using that name where possible, but the actual name of the directory is up to the installer. On PC networks, for example, this could map to a logical drive specification such as T:. Similarly, the location of this directory on the system is site-dependent. It may be at the root of the file system; on Unix systems, /usr/local/share, /usr/local, /usr/local/lib, and /opt are common choices."

So, our first task is to create a directory structure which is rooted in a directory called "texmf". As the TDS specification says, "the location of this directory on the system is site-dependent", so we can put it wherever we choose. On my PC I'm going to use

c:\luatexblog\texmf

Further, section 3 ("Top-level directories") of the TDS specification says that "the directories under the texmf root identify the major components of a TeX system". However, the TDS specification also notes (section A, "Unspecified pieces") that the location of certain file types is not covered by the recommendations:

The location of executable programs is too site-dependent to recommend a location. A site may place executables outside the texmf tree altogether, in a platform-dependent directory within texmf, or elsewhere.

The location of implementation-specific files (e.g., TeX .fmt files): by their nature, these must be left to the implementor or TeX maintainer.

So, where we put the luatex.exe file and the plain TeX .fmt file is up to us. The Kpathsea documentation (currently for version 6, July 2010) gives a nice example of a skeleton TDS which I'll use for our minimal install for plain TeX. Based on the reasoning above, we need to define directories which contain:

TeX Font Metric files (extension .tfm)

Font encoding files (extension .enc)

Glyph data files (extension .pfb, on Windows)

The plain TeX format file (extension .fmt)

The plain TeX source files (plain.tex and hyphen.tex)

.map files (specifically for pdfTeX)

These will be subdirectories of c:\luatexblog\texmf. Starting with the font-related directories, we'll create a set of directories which follow the structure:

c:\luatexblog\texmf\fonts\[type]\[supplier]\[typeface]

Where [type] will be

tfm: for .tfm files (TeX font metrics)

type1: for .pfb files (Printer Font Binary)

Where [supplier] will be public (i.e., for free fonts) and [typeface] will simply be cm (for Computer Modern). You can see that this is the directory structure used on TeX Live:

In addition, under c:\luatexblog\texmf\fonts\ we'll need to create directories for

map: for .map files (pdfTeX and LuaTeX font mapping files)

enc: for .enc files (font encoding)

Now we just need directories to contain

plain TeX source files (plain.tex and hyphen.tex)

the plain TeX .fmt file

the texmf.cnf file that we'll write for Kpathsea

As discussed in previous posts, Kpathsea uses a mixture of environment variables and configuration files (called texmf.cnf) to perform its path-searching magic. Actually, it uses a fairly complex interplay between environment variables and variables named in configuration files. I'm not going to explore this because it is described, in detail, in the Kpathsea documentation.

Giving Kpathsea a starting point

Clearly, when you start LuaTeX (and hence Kpathsea) there has got to be some way for the Kpathsea library to "hook into" your computer setup, a kind of "entry point" if you like so that it knows where to find your texmf.cnf file(s). You do this by setting an environment variable called TEXMFCNF which tells Kpathsea where to start looking for your configuration files (texmf.cnf).

Debugging Kpathsea searches

Most of us have, at one time or another, experienced situations where the TeX engine cannot locate a particular file or class of files. This can be rather frustrating so it is well worth setting a couple of environment variables to switch on Kpathsea's debugging (creating a log file): telling you where Kpathsea is looking for a particular file, or type of file. This can be extremely helpful to diagnose "can't find file" errors. The environment variables you need to set are KPATHSEA_DEBUG and KPATHSEA_DEBUG_OUTPUT.

KPATHSEA_DEBUG_OUTPUT: this is the path and name of the log file to record the debug output (for example, KPATHSEA_DEBUG_OUTPUT=c:/kspsluatex.log).

KPATHSEA_DEBUG: this takes a numeric value which controls the type of debugging output to generate. If you set it to -1 then Kpathsea will log everything into the file pointed to by KPATHSEA_DEBUG_OUTPUT. However, note that setting KPATHSEA_DEBUG=-1 will create a lot of output. Other values for KPATHSEA_DEBUG are documented here.
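KPATHSEA_DEBUG is a bit mask: each bit switches on one kind of report, which is why -1 (all bits set) logs everything. The Python sketch below decodes a value into those reports. The bit meanings are as I understand them from the "Debugging" section of the Kpathsea manual, so do check them against the documentation for your version; decode_debug is a hypothetical name.

```python
from typing import List

# Bit values for KPATHSEA_DEBUG as listed in the Kpathsea manual's
# "Debugging" section (verify against the manual for your version).
DEBUG_FLAGS = [
    (1, "stat calls"),
    (2, "hash table lookups"),
    (4, "file openings and closings"),
    (8, "path information"),
    (16, "directory list"),
    (32, "each file search"),
    (64, "variable values"),
]

def decode_debug(value: int) -> List[str]:
    """Names of the reports switched on by a KPATHSEA_DEBUG value.

    -1 has every bit set (two's complement), so it enables all of them.
    """
    return [name for bit, name in DEBUG_FLAGS if value & bit]

print(decode_debug(40))
```

For example, decode_debug(40) gives ["path information", "each file search"] (8 + 32), a combination that is often enough to diagnose a missing file with far less output than -1.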

OK, I think that's enough for one post. I'll (hopefully) complete this series in the next tutorial. Until then, stay tuned and let me know if you spot errors in the above. I'll fix a few tiredness-induced typos in this post, too! (update: ahem, few typos fixed... was rather late to be writing this...)

Time for a summary and review

My objective in writing this set of tutorials is to equip you with the basic information you need to create a truly minimal LuaTeX installation (running under Windows) for the purposes of experimenting with this amazing TeX engine. So, why might you want to do that? A good question, and here's my personal viewpoint. LuaTeX is still rapidly evolving: it is in beta status (0.65 as I write this on 17 January, 2011) and a stable production version is likely to be at least a year away. Despite the brilliant team behind it, LuaTeX, like all beta software, is going to have bugs that need to be fixed. In addition, there are internals to the LuaTeX executable program that may change (APIs, libraries) as it evolves and matures. Running LuaTeX with an absolutely minimal setup allows you to create a controlled environment, of minimal complexity, which can help you save time when tracking down any problems you encounter during experiments and testing. There is, as always in life, a flip side to the story.

You will not have access to all the wonderful tools, packages, fonts etc that distributions such as TeX Live or MiKTeX provide, ready-installed for you. At least initially, be prepared to encounter lots of instances where LuaTeX cannot locate particular packages or other resources, simply because you have not installed them into your minimal environment.

When approaching some of the mailing lists for help or advice, you'll need to be aware that your setup will be different to the ones being used by the wider community. On occasion, this may make it a bit more difficult to get advice or guidance but, to date, in my experience it has not been a serious issue.

However, you'll soon build up the collection of packages, fonts etc. that you need for your work or experiments. In addition, you'll learn a great deal about how LuaTeX operates and, through that, develop problem-solving techniques which will always be useful.

Where are we and where are we going?

To summarise:

We have introduced the Kpathsea library that LuaTeX uses to locate files.

We have seen that Kpathsea uses a mixture of environment variables and configuration files (texmf.cnf files) to locate files and resources on your computer and so tell LuaTeX where to find things.

We have not yet covered, in any detail, the very important topic of "format files" (.fmt files) which are the last topic we need to address before moving on to the details of installation.

Format files, what are they?

In an earlier part of this tutorial series we discussed the difference between a "TeX engine" (e.g., LuaTeX, as in luatex.exe) and "macro packages" such as LaTeX. For anyone new to the TeX world, these differences can, initially, be quite confusing, so I'll recap here to help clarify them. When you download and run luatex.exe without any setup or installation you are running luatex.exe in its most "primitive form": it understands hundreds of very "low-level" commands which are built into the executable file itself. These low-level commands (so-called "primitives") are the fundamental building blocks from which tools and commands to typeset documents are written and constructed: commands written by "TeX programmers" for everyday use by people who want to typeset beautiful documents.

In essence, the raw luatex.exe file understands a very powerful, but low-level, "typesetting programming language", one which is not intended to be used (entirely) directly "as is". Instead, it is the job of "TeX programmers" to use this programming language to write "higher level commands" designed for "users" to write their documents. You will soon get to know these "higher level commands" as macros, for that is how they are referred to in the mainstream TeX literature. The ability to program with these hundreds of low-level commands is what gives TeX engines such power and flexibility to typeset almost anything you can think of. The "typesetting programming language", which I'll refer to as "TeX", can be used to write anything from a quick one-line, time-saving macro through to complete document preparation systems containing tens of thousands of lines of TeX programming code. In reality, the TeX typesetting (programming) language is, for many (including me :-(), an extraordinarily difficult language to program with, and I have the most profound admiration and respect (and some envy ;-)) for those who have mastered it.

So, the TeX language gives you the ability to write (i.e., program) a comprehensive set of commands that you can use to write and prepare your documents: i.e., to format them using your preferred "document preparation system" (i.e., a large collection of TeX macros). And finally we get to the point: LaTeX is one such "document preparation system", written in the TeX language to format your document according to the commands provided through LaTeX. So, we can speak of the "LaTeX format".

Of course, this is not the end of the story because, although LaTeX provides a great many commands, it also provides a framework through which it can be extended even further. TeX programmers can, and do, write "add-on packages" for LaTeX which extend the core LaTeX program to provide specialist applications, additional features and so forth: so-called "LaTeX packages". Think of there being a "set of core LaTeX commands" plus "add-on LaTeX packages". The "set of core LaTeX commands" is intended to be stable so that the commands and tools it provides can be relied upon by LaTeX package developers, without fear that their code will suddenly break (or become incompatible) due to changes in the "LaTeX core".

Now, I hope it is clear that if you want to typeset any document prepared with the LaTeX format then, each time you run your "TeX engine" to typeset your LaTeX-based document, your TeX engine has to process the thousands of lines of code from which LaTeX is written.

Hmmm, sounds like a painfully slow process? Executing all that raw LaTeX code each time I run my TeX engine? Well, no, because TeX offers a way round this: format files.

What TeX can do is process all that raw LaTeX code (i.e., just plain text) once and dump the result into a binary file which it can reload far faster than it could reprocess the original plain-text source. This "binary brain dump" that TeX spits out is called a format file and has the extension .fmt. Loading a .fmt file is much quicker than reading and executing the raw source code again, and that's why TeX engines use them.

So, a "format file" is your TeX engine's "binary brain dump" equivalent of a large collection of plain text macros which collectively provide document formatting capabilities. LaTeX is one example format; others include the venerable plain TeX (written by Donald Knuth) and the amazingly powerful ConTeXt format written by Hans Hagen.
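If the "parse once, dump, reload" idea still feels abstract, here is a loose Python analogy. It is emphatically not how TeX builds a .fmt file internally: the toy parser, the pickle-based dump and the names parse_macros, dump_format and load_format are all my own hypothetical stand-ins. The point is only the shape of the workflow: a slow step that reads plain-text source once, writes a binary file, and a fast step that reloads that binary file without re-reading the source.

```python
import pickle
from pathlib import Path

def parse_macros(source: str) -> dict:
    """The 'slow' step: a toy parser for lines like \\def\\name{body}."""
    macros = {}
    for line in source.splitlines():
        line = line.strip()
        if line.startswith("\\def\\") and line.endswith("}") and "{" in line:
            name, _, body = line[len("\\def\\"):].partition("{")
            macros[name] = body[:-1]  # drop the closing brace
    return macros

def dump_format(source_file: Path, fmt_file: Path) -> None:
    """Rough analogue of 'luatex --ini': parse once, write a binary dump."""
    macros = parse_macros(source_file.read_text())
    fmt_file.write_bytes(pickle.dumps(macros))

def load_format(fmt_file: Path) -> dict:
    """Rough analogue of 'luatex --fmt=...': reload the dump, no re-parsing."""
    return pickle.loads(fmt_file.read_bytes())
```

As with a real .fmt file, nothing here notices if the source changes after the dump: you have to run the "dump" step again yourself.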

Making TeX engines perform a binary brain dump

OK, so how do I convince TeX engines to do "binary brain dumps"? You just have to ask nicely, speaking gently via the command line. If you have a copy of the raw luatex.exe file and you type luatex --help at the DOS prompt you will see a summary of the numerous command line options for LuaTeX. Two of these are of interest:

--fmt=FORMAT

load the format file FORMAT

--ini

be iniluatex, for dumping formats

--fmt=FORMAT: a primer
Suppose you have a document (written using the LaTeX format) called thesis.tex and you want to typeset it with the LuaTeX engine. Assuming you have already generated the latex.fmt file, you would type something like this at the command line:

luatex --fmt=latex thesis.tex

All being well, LuaTeX would output thesis.pdf containing your beautifully typeset masterpiece. Clearly, thesis.tex will have to be in a location where LuaTeX (i.e., Kpathsea) can locate it.
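If you end up typesetting from a script rather than typing at the DOS prompt, here is a minimal Python sketch that builds the same command line and runs it with subprocess. It assumes luatex.exe is on your PATH and that Kpathsea can find latex.fmt; the function names luatex_command and typeset are my own hypothetical helpers, not part of any TeX tool.

```python
import subprocess
from pathlib import Path

def luatex_command(fmt: str, tex_file: str, exe: str = "luatex") -> list[str]:
    """Argument list equivalent to typing: luatex --fmt=FORMAT file.tex"""
    return [exe, f"--fmt={fmt}", tex_file]

def typeset(fmt: str, tex_file: str, workdir: Path) -> subprocess.CompletedProcess:
    """Run LuaTeX in workdir and capture its console output.

    Assumes luatex is on PATH and that FORMAT.fmt can be found by Kpathsea.
    """
    return subprocess.run(
        luatex_command(fmt, tex_file),
        cwd=workdir,
        capture_output=True,
        text=True,
    )
```

For example, typeset("latex", "thesis.tex", Path(r"c:\luatexblog\work")) (a hypothetical working directory) would run the command above there; check result.returncode and result.stdout to see how it went. Passing a list of arguments, rather than one long string, sidesteps quoting problems with spaces in paths.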

What does this --ini option do?

In short, the --ini command line option tells the TeX engine to go into its special "brain dump mode" to generate a .fmt file. Clearly, the TeX engine needs a bit more than just the --ini option: it needs the source code that it is expected to process and dump out to a .fmt file.

Typically, to generate a particular format, e.g., the LuaLaTeX format (more on that in future posts), you need to acquire something called an .ini file named, unsurprisingly, "formatname.ini", as in lualatex.ini for the LuaLaTeX format.
These .ini files contain all the magic and code required to generate the .fmt file and are usually available on CTAN or found by browsing TeX Live. Successfully processing .ini files will require additional files which the .ini file "inputs". For example, here is the lualatex.ini file sitting on my PC.
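Because an .ini file pulls in other files with \input, it can be handy to list those dependencies before trying to build a format, so you know what Kpathsea will have to find. The Python sketch below is a rough heuristic, not a TeX parser: it strips % comments, then looks for "\input name" and "\input{name}". The function name inputs_of and the sample text in the test are made up for illustration; they are not the contents of any real lualatex.ini.

```python
import re

# An unescaped % starts a TeX comment that runs to the end of the line.
COMMENT = re.compile(r"(?<!\\)%.*")
# "\input name" or "\input{name}", but not longer names such as \inputlineno.
INPUT = re.compile(r"\\input(?![A-Za-z@])\s*\{?\s*([^\s{}\\%]+)")

def inputs_of(ini_text: str) -> list[str]:
    """Return the file names an .ini file appears to \\input, in order."""
    names = []
    for line in ini_text.splitlines():
        code = COMMENT.sub("", line)
        names.extend(INPUT.findall(code))
    return names
```

Anything more exotic (conditionals, \csname tricks, files loaded from Lua) will slip past a regex like this, so treat its output as a starting checklist rather than the definitive list.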

Did you spot the missing link?

This is all very well, but before I can use LuaTeX I have to generate .fmt files, so how does LuaTeX find these .ini and .fmt files in the first place? You guessed it: we need texmf.cnf before we can do anything like this. Stay tuned.

In Parts 1, 2 and 2a I made reference to such things as "environment variables" and a file called texmf.cnf. In this section of the tutorial I will try to bring these concepts together and get down to some practical details. But first we need to set the scene with a very important topic: Kpathsea.

Kpathsea, what's that?

The name Kpathsea originates from Karl's pathsearching, named after the author of the library, Karl Berry. As discussed on the Kpathsea web pages, it was written to provide a number of programs with a unified method for path-searching: finding the files and resources those programs need during the course of execution. TeX and related programs are designed to run on very many different platforms (Windows, Linux, Mac...) and one of the keys to doing this successfully is to ensure that the underlying source code from which they are built is as platform-independent as possible.

So, in essence, Kpathsea is a path-searching library which is built into the TeX executable file ("TeX engine") to provide unified methods for path-searching (finding files) that will work on many different computer platforms. When looking for font metric files, .tex files, .sty files, graphics and many other file resources, the "TeX engine" (e.g., LuaTeX) is using the Kpathsea library to find what it needs to typeset your document.
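To make "path-searching" a little more concrete, here is a much-simplified Python sketch of the idea: a search path is a list of directories separated by ; on Windows (: elsewhere), and in Kpathsea's path syntax a trailing // on a directory means "search this directory and all of its subdirectories". Real Kpathsea does a great deal more (ls-R databases, caching, !! and brace expansion, per-program variables), so treat find_file and expand_element as hypothetical illustrations rather than how the library is actually implemented.

```python
import os
from pathlib import Path
from typing import Optional

def expand_element(element: str) -> list[Path]:
    """Expand one path element; a trailing '//' means 'and all subdirectories'."""
    if element.endswith("//"):
        root = Path(element.rstrip("/"))
        if not root.is_dir():
            return []
        # The directory itself first, then every subdirectory (sorted for repeatability).
        return [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    return [Path(element)]

def find_file(name: str, search_path: str, sep: str = os.pathsep) -> Optional[Path]:
    """Return the first match for name along search_path, or None."""
    for element in filter(None, search_path.split(sep)):
        for directory in expand_element(element):
            candidate = directory / name
            if candidate.is_file():
                return candidate
    return None
```

With the directory layout from Part 5, a search path such as c:/luatexblog/texmf/fonts/tfm// would let a call like find_file("cmr10.tfm", ...) find the font metrics under public\cm without you having to list every subdirectory by hand.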

LuaTeX offers an alternative: a primer
For LuaTeX this is just one part of the story. With LuaTeX you can implement your own Lua code to replace the Kpathsea file searching methods via LuaTeX's "callback" methods. That's for another day; here we'll stick to using Kpathsea. If you are interested in reading more about the powerful LuaTeX callbacks, consult The LuaTeX Reference Manual.

Kpathsea, texmf.cnf and environment variables

So, at long last you may say :-), we've finally reached the point where the various components start to come together.

texmf.cnf is the name given to the runtime configuration files used by the Kpathsea path-searching library to locate resources (find files) on your computer. In effect, they tell the "TeX engine", including LuaTeX, where to look for the file(s) it needs in order to typeset your document.

A TeX installation may use multiple texmf.cnf files to configure the installation, as discussed in the main Kpathsea documentation.

In addition to using texmf.cnf files, Kpathsea also uses environment variables to find resources. In essence, Kpathsea locates resources using a mixture of environment variables and values stored in texmf.cnf files; where both define the same variable, the environment variable takes precedence.
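Here is a toy Python sketch of that "environment first, then texmf.cnf" lookup, including the $VAR references you will see throughout texmf.cnf files. It is deliberately simplified: parse_cnf and lookup are hypothetical helpers, the parser only understands NAME = value lines with % comments, and it ignores brace expansion, per-program NAME.progname entries and the rules for combining several texmf.cnf files.

```python
import os
import re
from typing import Mapping, Optional

# Matches $NAME or ${NAME}.
VAR_REF = re.compile(r"\$(\w+)|\$\{(\w+)\}")

def parse_cnf(text: str) -> dict[str, str]:
    """Parse 'NAME = value' lines; this toy treats '%' as starting a comment."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("%", 1)[0].strip()
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        values.setdefault(name, value)  # this sketch keeps the first definition
    return values

def lookup(name: str, cnf: Mapping[str, str],
           env: Mapping[str, str] = os.environ, _depth: int = 0) -> Optional[str]:
    """Environment variable first, then texmf.cnf; expand $VAR references."""
    raw = env.get(name, cnf.get(name))
    if raw is None or _depth > 20:  # guard against circular definitions
        return raw

    def expand(match: re.Match) -> str:
        ref = match.group(1) or match.group(2)
        return lookup(ref, cnf, env, _depth + 1) or ""

    return VAR_REF.sub(expand, raw)
```

The design choice worth noticing is that the environment is consulted at every level of expansion, so overriding one "root" variable (here TEXMFMAIN, a name used purely for illustration) moves every path defined in terms of it.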

Kpathsea also sets environment variables

In addition to reading environment variables from your computer, Kpathsea also sets some environment variables of its own: SELFAUTOLOC, SELFAUTODIR and SELFAUTOPARENT, which are set to the directory containing the executable (LuaTeX in our case), its parent directory and its grandparent directory respectively.
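The relationship between the three variables is easiest to see with a concrete path. This short Python sketch computes them from an executable's location; the example path c:\luatexblog\bin\win32\luatex.exe is hypothetical, the helper name selfauto_vars is my own, and writing the results with forward slashes is a choice made for this sketch (it matches how paths are usually written in texmf.cnf).

```python
from pathlib import PureWindowsPath

def selfauto_vars(exe_path: str) -> dict[str, str]:
    """Directory of the executable, its parent and its grandparent."""
    loc = PureWindowsPath(exe_path).parent
    return {
        "SELFAUTOLOC": loc.as_posix(),                  # where luatex.exe lives
        "SELFAUTODIR": loc.parent.as_posix(),           # one level up
        "SELFAUTOPARENT": loc.parent.parent.as_posix(), # two levels up
    }
```

For the hypothetical path above this gives c:/luatexblog/bin/win32, c:/luatexblog/bin and c:/luatexblog. Because these values follow the executable around, a texmf.cnf can describe its directories relative to wherever luatex.exe happens to be installed.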

A simple example

If you are reading this and already have a working LuaTeX installation, the following LuaTeX example will typeset the values of SELFAUTOLOC, SELFAUTODIR and SELFAUTOPARENT from your LuaTeX configuration.

TeX Live is a comprehensive TeX distribution which includes a huge collection of TeX packages, fonts, tools, utilities and binaries for Unix, GNU/Linux and Windows. You can browse the master TeX Live repository online using TortoiseSVN. Note that TortoiseSVN is a Windows shell extension, which means that it is designed to work through Windows Explorer.