This document describes the requirements, design, and configuration of the
LLVM compiler driver, llvmc. The compiler driver knows about LLVM's
tool set and can be configured to know about a variety of compilers for
source languages. It uses this knowledge to execute the tools necessary
to accomplish general compilation, optimization, and linking tasks. The main
purpose of llvmc is to provide a simple and consistent interface to
all compilation tasks. This reduces the burden on the end user, who can just
learn to use llvmc instead of the entire LLVM tool set and all the
source language compilers compatible with LLVM.

The llvmc tool is a configurable compiler
driver. As such, it isn't a compiler, optimizer,
or a linker itself, but it drives (invokes) other software that performs those
tasks. If you are familiar with the GNU Compiler Collection's gcc
tool, llvmc is very similar.

The following introductory sections will help you understand why this tool
is necessary and what it does.

At a high level, llvmc operation is very simple. The basic action
taken by llvmc is to invoke some tool or set of tools to fulfill
the user's request for compilation. Every execution of llvmc takes the
following sequence of steps:

Collect Command Line Options

The command line options provide the marching orders to llvmc
on what actions it should perform. This is the request the user is making
of llvmc and it is interpreted first. See the llvmc manual page for details on the
options.

Read Configuration Files

Based on the options and the suffixes of the filenames presented, a set
of configuration files is read to configure the actions llvmc will
take. Configuration files are provided by either LLVM or the
compiler tools that llvmc invokes. These files determine what
actions llvmc will take in response to the user's request. See
the section on configuration for more details.

Determine Phases To Execute

Based on the command line options and configuration files,
llvmc determines the compilation phases that
must be executed to fulfill the user's request. This is the primary work of
llvmc.

Determine Actions To Execute

Each phase to be executed can result in the
invocation of one or more actions. An action is
either a whole program or a function in a dynamically linked shared library.
In this step, llvmc determines the sequence of actions that must be
executed. Actions will always be executed in a deterministic order.

Execute Actions

The actions necessary to support the user's
original request are executed sequentially and deterministically. All
actions result in either the invocation of a whole program to perform the
action or the loading of a dynamically linkable shared library and invocation
of a standard interface function within that library.

Termination

If any action fails (returns a non-zero result code), llvmc
also fails and returns the result code from the failing action. If
everything succeeds, llvmc will return a zero result code.
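
The execution and termination steps can be sketched in Python as follows. This is only an illustration of the behavior described here, not llvmc's implementation. It assumes every action is a whole program (the shared library case is left out), and the function name run_actions is hypothetical.

```python
import subprocess
import sys


def run_actions(actions):
    """Run each action (an argv list) in order and stop at the first failure.

    Returns 0 if every action succeeds, otherwise the result code of the
    first action that fails.
    """
    for argv in actions:
        result = subprocess.run(argv)
        if result.returncode != 0:
            # Later actions are not run once one action has failed.
            return result.returncode
    return 0


if __name__ == "__main__":
    # Hypothetical actions: the second one fails with result code 3,
    # so the third one is never started.
    ok = [sys.executable, "-c", "pass"]
    bad = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_actions([ok, bad, ok]))
```

Running the actions one at a time keeps the order deterministic, and returning the failing action's own result code lets the caller see which kind of failure occurred.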

llvmc's operation must be simple, regular and predictable.
Developers need to be able to rely on it to take a consistent approach to
compilation. For example, the invocation:

To accomplish this, llvmc uses a very simple goal-oriented
procedure to do its work. The overall goal is to produce a functioning
executable. To that end, llvmc always attempts to execute a
series of compilation phases in the same sequence.
However, the user's options to llvmc can cause the sequence of phases
to start in the middle or finish early.

Phases

llvmc breaks every compilation task into the following
distinct phases:

Preprocessing

Not all languages support preprocessing,
but for those that do, this phase can be invoked. This phase is for
languages that provide a means of combining, filtering, or otherwise altering the
source language input before the translator parses it. Although C and C++
are the most common users of this phase, other languages may provide their
own preprocessor (whether it's the C pre-processor or not).

Translation

The translation phase converts the source
language input into something that LLVM can interpret and use for
downstream phases. The translation is essentially from "non-LLVM form" to
"LLVM form".

Optimization

Once an LLVM Module has been obtained from
the translation phase, the program enters the optimization phase. This phase
attempts to optimize all of the input provided on the command line according
to the options provided.

Linking

The inputs are combined to form a complete
program.
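
As a rough sketch, the way the phase sequence can start in the middle or finish early might look like this in Python. The -E and -c options are the ones this document describes as stopping compilation after preprocessing and after translation. The function name phases_to_run, the first parameter, and the rule that the earliest stopping point wins when several such options are given are all assumptions made for illustration.

```python
# The phases in the fixed order in which llvmc attempts them.
PHASES = ["preprocessing", "translation", "optimization", "linking"]

# Options that make the sequence finish early.
FINAL_PHASE_FOR_OPTION = {"-E": "preprocessing", "-c": "translation"}


def phases_to_run(options, first="preprocessing"):
    """Return the phases to execute, in order.

    `options` is a list of command line options. `first` names the phase
    to start from (hypothetical parameter: how llvmc decides where to
    start is not modeled here).
    """
    stop = PHASES.index("linking")
    for option in options:
        if option in FINAL_PHASE_FOR_OPTION:
            # Assumption: with several such options, the earliest stop wins.
            stop = min(stop, PHASES.index(FINAL_PHASE_FOR_OPTION[option]))
    start = PHASES.index(first)
    return PHASES[start:stop + 1]


if __name__ == "__main__":
    print(phases_to_run([]))
    print(phases_to_run(["-c"]))
    print(phases_to_run([], first="optimization"))
```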

The following table shows the inputs, outputs, and command line options
applicable to each phase.

Phase: Preprocessing
Inputs: Source Language File
Outputs: Source Language File
Options:
  -E  Stops the compilation after preprocessing.

Phase: Translation
Inputs: Source Language File
Outputs: LLVM Assembly, LLVM Bytecode, LLVM C++ IR
Options:
  -c  Stops the compilation after translation so that optimization and
      linking are not done.
  -S  Stops the compilation before object code is written so that only
      assembly code remains.

Phase: Optimization
Inputs: LLVM Assembly, LLVM Bytecode
Outputs: LLVM Bytecode
Options:
  -Ox  This group of options controls the amount of optimization
       performed.

Phase: Linking
Inputs: LLVM Bytecode, Native Object Code, LLVM Library, Native Library
Outputs: LLVM Bytecode Executable, Native Executable
Options:
  -L  Specifies a path for library search.
  -l  Specifies a library to link in.

Actions

An action, with regard to llvmc, is a basic operation that it takes
in order to fulfill the user's request. Each phase of compilation will invoke
zero or more actions in order to accomplish that phase.

This section of the document describes the configuration files used by
llvmc. Configuration information is relatively static for a
given release of LLVM and a compiler tool. However, the details may
change from release to release of either. Users are encouraged to simply use
the various options of the llvmc command and ignore the configuration
of the tool. These configuration files are for compiler writers and LLVM
developers. Those wishing to simply use llvmc don't need to understand
this section but it may be instructive on how the tool works.

Overview

llvmc is highly configurable both on the command line and in
configuration files. The options it understands are generic, consistent and
simple by design. Furthermore, the llvmc options apply to the
compilation of any LLVM-enabled programming language. For a compiler to be
enabled as a supported source language compiler, its writer must provide a
configuration file that tells llvmc how to invoke the compiler
and what its capabilities are. The purpose of the configuration files then
is to allow compiler writers to specify to llvmc how the compiler
should be invoked. Users may alter the compiler's llvmc configuration,
but are not advised to.

Because llvmc just invokes other programs, it must deal with the
available command line options for those programs regardless of whether they
were written for LLVM or not. Furthermore, not all compiler tools will
have the same capabilities. Some compiler tools will simply generate LLVM assembly
code, others will be able to generate fully optimized byte code. In general,
llvmc doesn't make any assumptions about the capabilities or command
line options of a sub-tool. It simply uses the details found in the
configuration files and leaves it to the compiler writer to specify the
configuration correctly.

This approach means that new compiler tools can be up and working very
quickly. As a first cut, a tool can simply compile its source to raw
(unoptimized) bytecode or LLVM assembly and llvmc can be configured
to pick up the slack (translate LLVM assembly to bytecode, optimize the
bytecode, generate native assembly, link, etc.). In fact, a compiler tool
need not use any LLVM libraries, and it could be written in any language
(instead of C++). The configuration data will allow the full range of
optimization, assembly, and linking capabilities that LLVM provides to be added
to these kinds of tools. Enabling the rapid development of front-ends is one
of the primary goals of llvmc.

As a compiler tool matures, it may utilize the LLVM libraries and tools
to more efficiently produce optimized bytecode directly in a single compilation
and optimization program. In these cases, multiple tools would not be needed
and the configuration data for the compiler would change.

Configuring llvmc to the needs and capabilities of a source language
compiler is relatively straightforward. A compiler writer must provide a
definition of what to do for each of the five compilation phases for each of
the optimization levels. The specification consists simply of prototypical
command lines into which llvmc can substitute command line
arguments and file names. Note that any given phase can be completely blank if
the source language's compiler combines multiple phases into a single program.
For example, quite often pre-processing, translation, and optimization are
combined into a single program. The specification for such a compiler would have
blank entries for pre-processing and translation but a full command line for
optimization.
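
The boolean items translator.preprocesses, optimizer.preprocesses, and optimizer.translates, defined in the table of configuration items, express this kind of combination. A minimal Python sketch of how they might decide which phases to skip follows. It is written from the item descriptions only; the function name phases_to_skip and the use of a plain dictionary for the configuration are assumptions.

```python
PHASES = ["preprocessing", "translation", "optimization", "linking"]


def phases_to_skip(config, final_phase):
    """Return the set of phases that need no separate tool invocation.

    `config` maps configuration item names to already parsed boolean
    values; items that are missing count as false. `final_phase` is the
    last phase the user asked for.
    """
    final_index = PHASES.index(final_phase)
    optimizing = final_index >= PHASES.index("optimization")
    skip = set()
    # The translator preprocesses: skip the separate preprocessing step
    # unless preprocessing itself is the final phase.
    if config.get("translator.preprocesses", False) and final_phase != "preprocessing":
        skip.add("preprocessing")
    # The optimizer preprocesses or translates: skip those steps when the
    # final phase is optimization or later.
    if config.get("optimizer.preprocesses", False) and optimizing:
        skip.add("preprocessing")
    if config.get("optimizer.translates", False) and optimizing:
        skip.add("translation")
    return skip


if __name__ == "__main__":
    combined = {"optimizer.preprocesses": True, "optimizer.translates": True}
    print(sorted(phases_to_skip(combined, "linking")))
```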

Each configuration file provides the details for a single source language
that is to be compiled. This configuration information tells llvmc
how to invoke the language's pre-processor, translator, optimizer, assembler
and linker. Note that a given source language needn't provide all these tools,
as many of them already exist in LLVM.

In the directories searched, each configuration file is given a specific
name to foster faster lookup (so llvmc doesn't have to do directory searches).
The name of a given language specific configuration file is simply the same
as the suffix used to identify files containing source in that language.
For example, a configuration file for C++ source might be named
cpp, C, or cxx. For languages that support multiple
file suffixes, multiple (probably identical) files (or symbolic links) will
need to be provided.

Which configuration files are read depends on the command line options and
the suffixes of the file names provided on llvmc's command line. Note
that the -x LANGUAGE option alters the language that llvmc
uses for the subsequent files on the command line. Only the configuration
files actually needed to complete llvmc's task are read. Other
language specific files will be ignored.
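
A small Python sketch of this selection is shown here. It derives the configuration file names from the file suffixes and honors -x LANGUAGE for the files that follow it. It is a simplification: it assumes the value given to -x is used directly as the configuration file name, it ignores every other option, and the function name needed_config_names is hypothetical.

```python
import os


def needed_config_names(args):
    """Return the configuration file names needed for `args`, in order.

    `args` is a list of command line words. Each name is returned once,
    so only the files actually needed would be read.
    """
    names = []
    language = None  # set by -x, applies to the files that follow it
    words = iter(args)
    for word in words:
        if word == "-x":
            language = next(words, None)
            continue
        if word.startswith("-"):
            continue  # other options do not select a configuration file
        if language is not None:
            name = language
        else:
            # The configuration file is named after the file suffix.
            name = os.path.splitext(word)[1].lstrip(".")
        if name and name not in names:
            names.append(name)
    return names


if __name__ == "__main__":
    print(needed_config_names(["a.cpp", "b.cpp", "-x", "C", "c.txt"]))
```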

Syntax

The syntax of the configuration files is very simple and somewhat
compatible with Java's property files. Here are the syntax rules:

The file encoding is ASCII.

The file is line oriented. There should be one configuration definition
per line. Lines are terminated by the newline (0x0A) and/or carriage return
(0x0D) characters.

A backslash (\) before a newline causes the newline to be
ignored. This is useful for line continuation of long definitions. A
backslash anywhere else is recognized as a backslash.

A configuration item consists of a name, an = and a value.

A name consists of a sequence of identifiers separated by periods.

An identifier consists of specific keywords made up of only lower case
and upper case letters (e.g. lang.name).

Values come in four flavors: booleans, integers, commands and
strings.

Valid "false" boolean values are false, False, FALSE, no, No, NO,
off, Off, and OFF.

Commands start with a program name and are followed by a sequence of
words that are passed to that program as command line arguments. Program
arguments that begin and end with the % sign will have their value
substituted. Program names beginning with / are considered to be
absolute. Otherwise the PATH will be applied to find the program to
execute.

Strings are composed of multiple sequences of characters from the
character class [-A-Za-z0-9_:%+/\\|,] separated by white
space.

White space on a line is folded. Multiple blanks or tabs will be
reduced to a single blank.

White space before the configuration item's name is ignored.

White space on either side of the = is ignored.

White space in a string value is used to separate the individual
components of the string value but otherwise ignored.

Comments are introduced by the # character. Everything after a
# and before the end of line is ignored.
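
The rules above can be sketched as a small Python parser. This is an illustrative reading of the rules, not llvmc's own parser. The function name parse_config is hypothetical, values are returned as plain strings without type conversion, and digits are accepted in identifiers because item names such as lang.opt1 contain them.

```python
import re

# A name is a sequence of identifiers separated by periods.
NAME_RE = re.compile(r"[A-Za-z0-9]+(\.[A-Za-z0-9]+)*")


def parse_config(text):
    """Parse configuration text into a dict of lower-cased name -> value."""
    # A backslash directly before a line terminator continues the line.
    for terminator in ("\r\n", "\n", "\r"):
        text = text.replace("\\" + terminator, "")
    items = {}
    for line in re.split(r"[\r\n]+", text):
        line = line.split("#", 1)[0]   # drop the comment, if any
        line = " ".join(line.split())  # fold white space and trim the ends
        if not line:
            continue
        name, equals, value = line.partition("=")
        name = name.strip()
        if not equals or not NAME_RE.fullmatch(name):
            raise ValueError("malformed configuration line: " + line)
        # Identifiers may be written in several cases, so normalize them.
        items[name.lower()] = value.strip()
    return items


if __name__ == "__main__":
    # "mycc" is a made-up program name used only for this example.
    sample = (
        "# sample configuration\n"
        "  lang.name = C++\n"
        "translator.command = mycc %in% \\\n"
        "    -o   %out%   # continued line\n"
        "LANG.OPT1=-simplifycfg   -instcombine\n"
    )
    for key, value in parse_config(sample).items():
        print(key, "=", value)
```

Joining continued lines before splitting keeps the per-line logic simple, and folding white space first means the name and value need no further clean-up.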

The table below provides definitions of the allowed configuration items
that may appear in a configuration file. Every item has a default value and
does not need to appear in the configuration file. Missing items will have the
default value. Each identifier may appear as all lower case, first letter
capitalized or all upper case.

LLVMC ITEMS

Name: version
Value Type: string
Description: Provides the version string for the contents of this
configuration file. What is accepted as a legal configuration file
will change over time and this item tells llvmc which version
should be expected.
Default: b

LANG ITEMS

Name: lang.name
Value Type: string
Description: Provides the common name for a language definition.
For example "C++", "Pascal", "FORTRAN", etc.
Default: blank

Name: lang.opt1
Value Type: string
Description: Specifies the parameters to give the optimizer when
-O1 is specified on the llvmc command line.
Default: -simplifycfg -instcombine -mem2reg

Name: lang.opt2
Value Type: string
Description: Specifies the parameters to give the optimizer when
-O2 is specified on the llvmc command line.
Default: TBD

Name: lang.opt3
Value Type: string
Description: Specifies the parameters to give the optimizer when
-O3 is specified on the llvmc command line.
Default: TBD

Name: lang.opt4
Value Type: string
Description: Specifies the parameters to give the optimizer when
-O4 is specified on the llvmc command line.
Default: TBD

Name: lang.opt5
Value Type: string
Description: Specifies the parameters to give the optimizer when
-O5 is specified on the llvmc command line.
Default: TBD

PREPROCESSOR ITEMS

Name: preprocessor.command
Value Type: command
Description: This provides the command prototype that will be used
to run the preprocessor. This is generally only used with the
-E option.
Default: <blank>

Name: preprocessor.required
Value Type: boolean
Description: This item specifies whether the pre-processing phase
is required by the language. If the value is true, then the
preprocessor.command value must not be blank. With this option,
llvmc will always run the preprocessor as it assumes that the
translation and optimization phases don't know how to pre-process their
input.
Default: false

TRANSLATOR ITEMS

Name: translator.command
Value Type: command
Description: This provides the command prototype that will be used
to run the translator. Valid substitutions are %in% for the
input file and %out% for the output file.
Default: <blank>

Name: translator.output
Value Type: bytecode or assembly
Description: This item specifies the kind of output the language's
translator generates.
Default: bytecode

Name: translator.preprocesses
Value Type: boolean
Description: Indicates that the translator also preprocesses. If
this is true, then llvmc will skip the pre-processing phase
whenever the final phase is not pre-processing.
Default: false

OPTIMIZER ITEMS

Name: optimizer.command
Value Type: command
Description: This provides the command prototype that will be used
to run the optimizer. Valid substitutions are %in% for the
input file and %out% for the output file.
Default: <blank>

Name: optimizer.output
Value Type: bytecode or assembly
Description: This item specifies the kind of output the language's
optimizer generates. Valid values are "assembly" and "bytecode".
Default: bytecode

Name: optimizer.preprocesses
Value Type: boolean
Description: Indicates that the optimizer also preprocesses. If
this is true, then llvmc will skip the pre-processing phase
whenever the final phase is optimization or later.
Default: false

Name: optimizer.translates
Value Type: boolean
Description: Indicates that the optimizer also translates. If
this is true, then llvmc will skip the translation phase
whenever the final phase is optimization or later.
Default: false

ASSEMBLER ITEMS

Name: assembler.command
Value Type: command
Description: This provides the command prototype that will be used
to run the assembler. Valid substitutions are %in% for the
input file and %out% for the output file.

On any configuration item that ends in command, you must
specify substitution tokens. Substitution tokens begin and end with a percent
sign (%) and are replaced by the corresponding text. Any substitution
token may be given on any command line but some are more useful than
others. In particular each command should have both an %in%
and an %out% substitution. The table below provides definitions of
each of the allowed substitution tokens.

Substitution Token

Replacement Description

%args%

Replaced with all the tool-specific arguments given
to llvmc via the -T set of options. This just allows
you to place these arguments in the correct place on the command line.
If the %args% option does not appear on your command line,
then you are explicitly disallowing the -T option for your
tool.

%force%

Replaced with the -f option if it was
specified on the llvmc command line. This is intended to tell
the compiler tool to force the overwrite of output files.

%in%

Replaced with the full path of the input file. You
needn't worry about the cascading of file names. llvmc will
create temporary files and ensure that the output of one phase is the
input to the next phase.

%opt%

Replaced with the optimization options for the
tool. If the tool understands the -O options then that will
be passed. Otherwise, the lang.optN series of configuration
items will specify which arguments are to be given.

%out%

Replaced with the full path of the output file.
Note that this is not necessarily the output file specified with the
-o option on llvmc's command line. It might be a
temporary file that will be passed to a subsequent phase's input.

%stats%

If your command accepts the -stats option,
use this substitution token. If the user requested -stats
from the llvmc command line then this token will be replaced
with -stats, otherwise it will be ignored.

%target%

Replaced with the name of the target "machine" for
which code should be generated. The value used here is taken from the
llvmc option -march.

%time%

If your command accepts the -time-passes
option, use this substitution token. If the user requested
-time-passes from the llvmc command line then this
token will be replaced with -time-passes, otherwise it will
be ignored.
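
Token substitution in a command prototype can be sketched in Python as follows. This is an illustration under stated assumptions rather than llvmc's implementation. The function name expand_command is hypothetical, replacement values are supplied by the caller as lists of words, a token with no supplied value expands to nothing, and an unrecognized token is treated as an error.

```python
import re

# The substitution tokens defined in the table above, without the % signs.
KNOWN_TOKENS = {"args", "force", "in", "out", "opt", "stats", "target", "time"}
TOKEN_RE = re.compile(r"%([a-z]+)%")


def expand_command(prototype, values):
    """Expand a command prototype into an argument list.

    `values` maps a token name to the list of words that replaces it,
    for example {"force": ["-f"]} when -f was given to llvmc.
    """
    argv = []
    for word in prototype.split():
        match = TOKEN_RE.fullmatch(word)
        if match is None:
            argv.append(word)  # an ordinary word is passed through as is
            continue
        token = match.group(1)
        if token not in KNOWN_TOKENS:
            raise ValueError("unknown substitution token: " + word)
        # A token the user did not trigger contributes no words.
        argv.extend(values.get(token, []))
    return argv


if __name__ == "__main__":
    # "mycc" and the file paths are made up for this example.
    prototype = "mycc %force% %opt% %in% -o %out% %stats%"
    values = {
        "force": ["-f"],
        "opt": ["-simplifycfg", "-instcombine"],
        "in": ["/tmp/x.c"],
        "out": ["/tmp/x.bc"],
    }
    print(expand_command(prototype, values))
```

Representing each replacement as a list of words lets %args% and %opt% expand to several arguments, while %stats% and %time% simply disappear when the user did not request them.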

Any common programming language (e.g. C, C++, Java, Stacker, ML,
FORTRAN). These languages are distinguished from any of the lower level
languages (such as LLVM or native assembly) by the fact that a
translation phase is required before LLVM can be applied.