Identifiers start with a letter or underscore, and may additionally contain letters, digits, and underscores. Identifiers currently have no length limit, but a sane-but-generous limit may be imposed in the future (perhaps 256 or 1024 characters). The following examples are all valid identifiers.

a
_a
A42

Opcode names are not reserved words in PIR, and may be used as variable names. For example, you can define a local variable named print. Note that currently, by using an opcode name as a local variable name, the variable will hide the opcode name, effectively making the opcode unusable. In the future this will be resolved.

The PIR language is designed to have as few reserved keywords as possible. Currently, in contrast to opcode names, PIR keywords are reserved, and cannot be used as identifiers. Some opcode names are, in fact, PIR keywords, which therefore cannot be used as identifiers. This, too, will be resolved in a future re-implementation of the PIR compiler.

The following are PIR keywords, and cannot currently be used as identifiers:

A label declaration consists of a label name followed by a colon. A label name conforms to the standard requirements for identifiers. A label declaration may occur at the start of a statement, or stand alone on a line, but always within a subroutine.

A reference to a label consists of only the label name, and is generally used as an argument to an instruction or directive.

A PIR label is accessible only in the subroutine where it's defined. A label name must be unique within a subroutine, but it can be reused in other subroutines.
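As a minimal sketch of label declarations and references (the sub name and the label names loop and done are chosen here purely for illustration), a counting loop might look like this:

```pir
.sub 'main' :main
    .local int i
    i = 0
  loop:                       # label declaration, alone on its line
    if i >= 3 goto done       # label reference as an instruction argument
    print i
    print "\n"
    inc i
    goto loop
  done:
    say "finished"
.end
```

Because labels are scoped to the subroutine, another .sub in the same file could declare its own loop and done labels without conflict.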

There are two ways of referencing Parrot's registers. The first is through named local variables declared with .local.

The type of a named variable can be int, num, string or pmc, corresponding to the types of registers. No other types are used.

The second way of referencing a register is through a register variable $In, $Sn, $Nn, or $Pn. The capital letter indicates the type of the register (integer, string, number, or PMC). n consists of one or more digits, and there is no limit on its size. There is no direct correspondence between the value of n and the position of the register in the register set; for example, $P42 may be stored in the zeroth PMC register if it is the only PMC register in the subroutine.
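The two styles can be mixed freely. The following sketch, which uses hypothetical variable names, declares named locals with .local and also uses register variables of each type:

```pir
.sub 'main' :main
    .local int count          # named integer variable
    .local string name        # named string variable
    count = 42
    name  = "parrot"

    $I0 = count               # integer register variable
    $S0 = name                # string register variable
    $N0 = 3.14                # number register variable
    $P0 = new 'Integer'       # PMC register variable
    $P0 = count

    say $S0
    say $P0
.end
```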

String constants are delimited by double-quotes ("). A " inside a string must be escaped by \. The default format for a double-quoted string constant is 7-bit ASCII; other character sets and encodings must be marked explicitly using a format flag.

Heredocs work like single- or double-quoted strings. All lines up to the terminating delimiter are slurped into the string. The delimiter has to be on its own line, at the beginning of the line and with no trailing whitespace.

Assignment of a heredoc:
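A minimal sketch of such an assignment, assuming a string register $S0 and a delimiter named EOS (both chosen for illustration):

```pir
.sub 'main' :main
    $S0 = <<"EOS"
Hello,
heredoc world.
EOS
    print $S0
.end
```

Note that the terminating EOS starts in the first column and has nothing after it, as required above.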

A heredoc as an argument:
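A sketch of a heredoc passed as the first argument of a call; the sub 'show' and the delimiter END_OF_TEXT are hypothetical names used only for this illustration:

```pir
.sub 'main' :main
    'show'(<<"END_OF_TEXT", "second argument")
first argument,
spanning two lines
END_OF_TEXT
.end

.sub 'show'
    .param string text
    .param string other
    print text
    say other
.end
```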

Although currently not possible, a future implementation of the PIR language will allow you to use multiple heredocs within a single statement or directive:

Define a constant named <identifier> of type <type> and assign the value <const> to it. The type must be int, num, string, or a string constant indicating the PMC type. This allows you to create PMC constants representing subroutines; the value of the constant in that case is the name of the subroutine. If the referred subroutine has an :immediate modifier and it returns a value, then that value is stored instead of the subroutine.

.const declarations representing subroutines can only be written within a .sub. The constant is stored in the constant table of the current bytecode file.
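The following sketch shows constants of the basic types alongside a constant representing a subroutine. The names answer, greeting, helper_sub and 'helper' are assumptions made for the example:

```pir
.sub 'main' :main
    .const int    answer     = 42
    .const string greeting   = "hello"
    .const 'Sub'  helper_sub = 'helper'   # PMC type first, then the sub's name

    say answer
    say greeting

    $P0 = helper_sub          # copy the Sub constant into a PMC register
    $P0()                     # and invoke it
.end

.sub 'helper'
    say "in helper"
.end
```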

Define a subroutine. All code in a PIR source file must be defined in a subroutine. See the section "Subroutine modifiers" for available modifiers. Optional modifiers are a list separated by spaces.

The name of the sub may be either a bare identifier or a quoted string constant. Bare identifiers must be valid PIR identifiers (see Identifiers above), but string sub names can contain any characters, including characters from different character sets (see Constants above).

Defines the namespace from this point onwards. By default the program is not in any namespace. If you specify more than one name, separated by semicolons, nested namespaces are created by storing the inner namespace object in the outer namespace's global pad.
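As an illustration, the sketch below places one sub in a nested namespace and looks it up from the root namespace. The names Animal, Dog and 'speak' are hypothetical:

```pir
.namespace ['Animal'; 'Dog']      # nested namespace Animal -> Dog

.sub 'speak'
    say "Woof"
.end

.namespace []                     # back to the root namespace

.sub 'main' :main
    $P0 = get_hll_global ['Animal'; 'Dog'], 'speak'
    $P0()
.end
```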

Set the current PIR line number to the value specified. This is useful in case the PIR code is generated from some source PIR files, and error messages should print the source file's line number, not the line number of the generated file. Note that line numbers increment per line of PIR; if you are trying to store High Level Language debug information, you should instead be using the .annotate directive.

Set the current PIR file name to the value specified. This is useful in case the PIR code is generated from some source PIR files, and error messages should print the source file's name, not the name of the generated file.

Makes an entry in the bytecode annotations table. This is used to store high level language debug information. Examples:
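A sketch of how a compiler might emit annotations; the file name 'example.hll' and the line number are made up for illustration:

```pir
.sub 'main' :main
    .annotate 'file', 'example.hll'
    .annotate 'line', 42
    say "code generated for line 42 of the HLL source"
.end
```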

An annotation stays in effect until the next annotation with the same key or the end of the current file (that is, if you use a tool such as pbc_merge to link multiple bytecode files, then annotations will not spill over from one mergee's bytecode to another).

One annotation covers many PIR instructions. If the result of compiling one line of HLL code is 15 lines of PIR, you only need to emit one annotation before the first of those 15 lines to set the line number.

The key must always be a quoted string. The value may be an integer, a number or a quoted string. Note that integer values are stored most compactly; should you instead quote the value, as in .annotate 'line', '42', then "42" is stored as a string, taking up more space in the resulting bytecode file.

Define "main" entry point to start execution. If multiple subroutines are marked as :main, the last marked subroutine is used. Only the first file loaded or compiled counts; subs marked as :main are ignored by the load_bytecode op. If no :main modifier is specified, execution starts at the first subroutine in the file.

Run this subroutine when loaded by the load_bytecode op (i.e. neither in the initial program file nor compiled from memory). This is complementary to what :init does (below); to get both behaviours, use :init :load. If multiple subs have the :load pragma, the subs are run in source code order.

Run the subroutine when the program is run directly (that is, not loaded as a module), including when it is compiled from memory. This is complementary to what :load does (above); to get both behaviours, use :init :load.
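To get both behaviours, the two modifiers are simply listed together. A minimal sketch, with hypothetical sub names:

```pir
# 'setup' runs both when this file is the program being run directly (:init)
# and when it is loaded via load_bytecode (:load).
.sub 'setup' :init :load
    say "setting up"
.end

.sub 'main' :main
    say "running main"
.end
```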

Execute this subroutine immediately after being compiled, which is analogous to BEGIN in Perl 5.

In addition, if the sub returns a PMC value, that value replaces the sub in the constant table of the bytecode file. This makes it possible to build constants at compile time, provided that (a) the generated constant can be computed at compile time (i.e. doesn't depend on the runtime environment), and (b) the constant value is of a PMC class that supports saving in a bytecode file.

{{ TODO: need a freeze/thaw reference }}.

For instance, after compilation of the sub 'init', that sub is executed immediately (hence the :immediate modifier). Instead of storing the sub 'init' in the constants table, the value returned by 'init' is stored, which in this example is a FixedIntegerArray.
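A sketch of what such a program might look like follows. The constant name table and the array contents are assumptions for the illustration; only the sub name 'init' and the FixedIntegerArray result come from the description above:

```pir
.sub 'main' :main
    .const 'Sub' table = 'init'   # ends up holding the value returned by 'init'
    $P0 = table
    $I0 = $P0[1]
    say $I0
.end

.sub 'init' :immediate
    .local pmc array
    array = new 'FixedIntegerArray'
    array = 3                     # set the array size to 3 elements
    array[0] = 10
    array[1] = 20
    array[2] = 30
    .return (array)
.end
```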

Execute immediately after being compiled, but only if the subroutine is in the initial file (i.e. not in PIR compiled as result of a load_bytecode instruction from another file).

As an example, suppose file main.pir contains:

and the file foo.pir contains:

Executing foo.pir will run both foo and bar. On the other hand, executing main.pir will run only foo. If foo.pir is compiled to bytecode, only foo will be run, and loading foo.pbc will not run either foo or bar.
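The two files discussed here could look like the following sketch. They are shown in one listing for brevity but are meant to be saved separately; the printed values are arbitrary:

```pir
# --- main.pir ---
.sub 'main'
    load_bytecode 'foo.pir'
.end

# --- foo.pir ---
.sub 'foo' :immediate
    print "42\n"
.end

.sub 'bar' :postcomp
    print "43\n"
.end
```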

The marked .sub overrides a vtable function, and is not stored in the namespace. By default, it overrides a vtable function with the same name as the .sub name. To override a different vtable function, use :vtable('...'). For example, to have a .sub named ToString also be the vtable function get_string, use :vtable('get_string').

When the :vtable modifier is set, the object PMC can be referred to with self, as with the :method modifier.
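A minimal sketch of a vtable override, assuming a hypothetical class named Greeter that is created at run time with newclass:

```pir
.namespace ['Greeter']

# Overrides get_string for Greeter objects; 'self' is available here.
.sub 'ToString' :vtable('get_string')
    .return ("I am a Greeter")
.end

.namespace []

.sub 'main' :main
    $P0 = newclass 'Greeter'
    $P1 = new 'Greeter'
    $S0 = $P1                 # string assignment calls the get_string vtable function
    say $S0
.end
```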

Specifies a unique string identifier for the subroutine. This is useful for referring to a particular subroutine with :outer, even though several subroutines in the file may have the same name (because they are multi, or in different namespaces).

The :instanceof pragma is an experimental pragma that creates a sub as a PMC type other than 'Sub'. However, as currently implemented it doesn't work well with :outer or existing PMC types such as Closure, Coroutine, etc.

Specify the name by which the subroutine is stored in the namespace. If this modifier is missing, the subroutine is stored under the name given after the .sub directive. This modifier lets you override that default.

Takes either two arguments (the sub and the return continuation) or the sub only. In the latter case an invokecc is emitted. Providing an explicit return continuation is more efficient if it is created outside of a loop and the call is made inside the loop.

At the top of a subroutine, declare a local variable, in the manner of .local, into which parameter(s) of the current subroutine should be stored. Available modifiers: :slurpy, :named, :optional, :opt_flag.
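The sketch below combines a required parameter with :optional, :opt_flag and :slurpy parameters. The sub 'greet' and its parameter names are hypothetical:

```pir
.sub 'main' :main
    'greet'("world")
    'greet'("world", "Goodbye", 1, 2)
.end

.sub 'greet'
    .param string name                       # required positional parameter
    .param string greeting     :optional     # may be omitted by the caller
    .param int    has_greeting :opt_flag     # true if greeting was passed
    .param pmc    extras       :slurpy       # collects any remaining arguments

    if has_greeting goto ready
    greeting = "Hello"
  ready:
    print greeting
    print ", "
    say name

    $I0 = elements extras                    # number of leftover arguments
    say $I0
.end
```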

Using the push_eh op you can install an exception handler. If an exception is thrown, Parrot will execute the installed exception handler. In order to retrieve the thrown exception, use the .get_results directive. This directive always takes one argument: an exception object.

This is syntactic sugar for the get_results op, but any modifiers set on the targets will be handled automatically by the PIR compiler. The .get_results directive must be the first instruction of the exception handler; only declarations (.lex, .local) may come first.

To resume execution after handling the exception, just invoke the continuation stored in the exception.

See PDD23 for accessing the various attributes of the exception object.
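A minimal sketch of installing a handler and retrieving the exception. The label name handler, the variable ex and the message text are chosen for illustration, and reading the message through string assignment is an assumption about the Exception PMC; see PDD23 for the authoritative attribute access:

```pir
.sub 'main' :main
    push_eh handler
    die "something went wrong"
    say "not reached"
    goto done

  handler:
    .local pmc ex             # declarations may precede .get_results
    .get_results (ex)         # first instruction of the handler
    $S0 = ex                  # assumed: get_string yields the message
    say $S0

  done:
    say "finished"
.end
```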

This is equivalent to <var1> = <var1> <op> <var2>, where <op> is called an assignment operator and can be any of the following binary operators described earlier: +, -, *, /, %, ., &, |, ~, <<, >> or >>>.
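For example, a few of these operators applied to integer and string registers (the values are arbitrary):

```pir
.sub 'main' :main
    $I0 = 10
    $I0 += 5                  # same as $I0 = $I0 + 5, giving 15
    $I0 *= 2                  # 30
    $I0 %= 7                  # 30 % 7 = 2
    say $I0

    $S0 = "foo"
    $S0 .= "bar"              # string concatenation, giving "foobar"
    say $S0
.end
```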

A simple assignment of the form <var1> = <var2> directly corresponds to the set opcode. So, two low-level arguments (int, num, or string registers, variables, or constants) are a direct C assignment, or a C-level conversion (int cast, float cast, a string copy, or a call to one of the conversion functions like string_to_num).

Assigning a PMC argument to a low-level argument calls the get_integer, get_number, or get_string vtable function on the PMC. Assigning a low-level argument to a PMC argument calls the set_integer_native, set_number_native, or set_string_native vtable function on the PMC (assign to value semantics). Two PMC arguments are a direct C assignment (assign to container semantics).

For assign to value semantics for two PMC arguments use assign, which calls the assign_pmc vtable function.
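The difference between the two semantics can be seen in a short sketch using Integer PMCs (the particular registers and values are arbitrary):

```pir
.sub 'main' :main
    $P0 = new 'Integer'
    $P0 = 5                   # low-level to PMC: calls set_integer_native
    $I0 = $P0                 # PMC to low-level: calls get_integer

    $P1 = $P0                 # container semantics: $P1 now refers to the same PMC
    $P1 = 6
    say $P0                   # the shared PMC now holds 6

    $P2 = new 'Integer'
    assign $P2, $P0           # value semantics: copies the value into $P2's PMC
    $P0 = 7
    say $P2                   # still 6; $P2 is a separate PMC
.end
```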

This section describes the macro layer of the PIR language. The macro layer of the PIR compiler handles the following directives:

.include '<filename>'

The .include directive takes a string argument that contains the name of the PIR file that is included. The contents of the included file are inserted as if they were written at the point where the .include directive occurs.

The include file is searched for in the current directory and in runtime/parrot/include, in that order. The first file of that name to be found is included.

The .include directive's search order is subject to change.

.macro <identifier> [<parameters>]

The .macro directive starts a macro definition named by the specified identifier. The optional parameter list is a comma-separated list of identifiers, enclosed in parentheses. See .endm for ending the macro definition.

.endm

Closes a macro definition.

.macro_const <identifier> (<literal>|<reg>)

The .macro_const directive is a special type of macro; it allows the user to use a symbolic name for a constant value. Like .macro, the substitution occurs at compile time. It takes two arguments (not comma separated): the first is an identifier, the second a constant value or a register.
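A short sketch of .macro_const in use; the constant names answer and greeting are hypothetical:

```pir
.macro_const answer   42
.macro_const greeting "hello"

.sub 'main' :main
    $I0 = .answer             # expands to: $I0 = 42
    say $I0
    say .greeting             # expands to: say "hello"
.end
```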

The macro layer is completely implemented in the lexical analysis phase. The parser does not know anything about what happens in the lexical analysis phase.

When the .include directive is encountered, the specified file is opened and the following tokens that are requested by the parser are read from that file.

A macro expansion is a dot-prefixed identifier. For instance, if a macro was defined as shown below:

this macro can be expanded by writing .foo(42). The body of the macro will be inserted at the point where the macro expansion is written.
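A sketch of such a definition and its expansion follows. Only the macro name foo and the call .foo(42) come from the text; the parameter name bar and the macro body are assumptions:

```pir
.macro foo(bar)
    print .bar                # the parameter is referenced as .bar in the body
    print "\n"
.endm

.sub 'main' :main
    .foo(42)                  # the macro body is inserted here with bar = 42
.end
```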

A .macro_const expansion is more or less the same as a .macro expansion, except that a constant expansion cannot take any arguments, and the substitution of a .macro_const contains no newlines, so it can be used within a line of code.

The parameter list for a macro is specified in parentheses after the name of the macro. Macro parameters are not typed.

The number of arguments in the call to a macro must match the number of parameters in the macro's parameter list. Macros do not perform multidispatch, so you can't have two macros with the same name but different parameters. Calling a macro with the wrong number of arguments gives the user an error.

If a macro defines no parameter list, parentheses are optional on both the definition and the call. This means that a macro defined as .macro foo can be expanded by writing either .foo or .foo(), and a macro defined as .macro foo() can likewise be expanded by writing either .foo or .foo().

Note: IMCC requires you to write parentheses if the macro was declared with (empty) parentheses. Likewise, when no parentheses were written (implying an empty parameter list), no parentheses may be used in the expansion.

Heredoc arguments

Heredoc arguments are not allowed when expanding a macro. This means that, currently, when using IMCC, the following is not allowed:

Using braces, { }, allows you to span multiple lines for an argument. See runtime/parrot/include/hllmacros.pir for examples and possible usage. A simple example is this:

This will expand the macro foo, after which the input to the PIR parser is:
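A sketch of a brace-delimited call and the code it is expected to produce. The parameter names a and b and the printed values are chosen for illustration, and the expansion shown in the comments is approximate:

```pir
.macro foo(a, b)
    .a
    .b
.endm

.sub 'main' :main
    .foo({ print "1"
           print "2"
         }, {
           print "3"
           print "4"
         })
.end

# After expansion, the parser is expected to see roughly:
#
#   .sub 'main' :main
#       print "1"
#       print "2"
#       print "3"
#       print "4"
#   .end
```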

Within the macro body, the user can declare a local variable with a unique name.

The .macro_local directive declares a local variable with a unique name in the macro. When the macro .foo() is called, the resulting code that is given to the parser will read as follows:

The user can also declare a local variable with a unique name set to the symbolic value of one of the macro parameters.

So, the special $ character indicates whether the symbol is interpreted as just the value of the parameter, or that the variable by that name is meant. Obviously, the value of b should be a string.

The automatic name munging on .macro_local variables allows for using multiple macros, like so:

This will result in code for the parser as follows:

Each expansion is associated with a unique number. For labels declared with .macro_label and locals declared with .macro_local, this means that multiple expansions of a macro will not result in conflicting label or local names.

Defining a non-unique variable can still be done, using the normal syntax:

When invoking the macro foo as follows:

there will be two variables: b and x. When the macro is invoked twice:

the resulting code that is given to the parser will read as follows:

Obviously, this will result in an error, as the variable b is defined twice. If you intend the macro to create unique variable names, use .macro_local instead of .local to take advantage of the name munging.