The -run-pass option in llc allows you to create MIR tests that invoke just
a single code generation pass. When this option is used, llc will parse an
input MIR file, run the specified code generation pass(es), and output the
resulting MIR code.

You can generate an input MIR file for the test by using the -stop-after or
-stop-before option in llc. For example, if you would like to write a test
for the post register allocation pseudo instruction expansion pass, you can
specify the machine copy propagation pass in the -stop-after option, as it
runs just before the pass that we are trying to test:

llc -stop-after=machine-cp bug-trigger.ll > test.mir

After generating the input MIR file, you’ll have to add a RUN line to it that
uses the -run-pass option. To test the post register allocation pseudo
instruction expansion pass on X86-64, you can use a RUN line like the one
shown below:

# RUN: llc -o - %s -mtriple=x86_64-- -run-pass=postrapseudos | FileCheck %s
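If you generate many such tests, it can help to build both commands programmatically. The sketch below is only an illustration: the helper names stop_after_command and run_line are hypothetical, and it simply assembles the same llc invocations shown above (the -stop-after command for producing test.mir and the lit RUN line that runs a single pass).

```python
import shlex


def stop_after_command(pass_name: str, input_ll: str) -> list[str]:
    """argv for 'llc -stop-after=<pass> <input>'; stdout is meant to be redirected to a .mir file."""
    return ["llc", f"-stop-after={pass_name}", input_ll]


def run_line(triple: str, pass_name: str) -> str:
    """lit RUN line that parses the test file (%s), runs one pass and pipes the MIR into FileCheck."""
    return f"# RUN: llc -o - %s -mtriple={triple} -run-pass={pass_name} | FileCheck %s"


if __name__ == "__main__":
    # Shell command that produces the input MIR file.
    print(shlex.join(stop_after_command("machine-cp", "bug-trigger.ll")) + " > test.mir")
    # RUN line to paste at the top of test.mir.
    print(run_line("x86_64--", "postrapseudos"))
```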

The MIR files are target dependent, so they have to be placed in the target
specific test directories (test/CodeGen/TARGETNAME). They also need to
specify a target triple or a target architecture either in the run line or in
the embedded LLVM IR module.

The MIR code coming out of -stop-after/-stop-before is very verbose.
Tests are more accessible and future-proof when simplified:

Use the -simplify-mir option with llc.

Many machine function attributes either already have their default values or
can be reset to them without affecting the test. Typical candidates for this
are: alignment:, exposesReturnsTwice, legalized, regBankSelected, selected.
The whole frameInfo section is often unnecessary if there is no special
frame usage in the function. tracksRegLiveness, on the other hand, is often
necessary for passes that care about block livein lists.

The (global) liveins: list is typically only interesting for early
instruction selection passes and can be removed when testing later passes.
The per-block liveins: lists, on the other hand, are necessary if
tracksRegLiveness is true.
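The attribute and liveins simplifications above amount to deleting a few top-level keys from the machine function's YAML document. The sketch below assumes the indentation style llc emits (keys at column 0, their children, including sequence items, indented); drop_top_level_keys is a hypothetical helper, not an LLVM tool. Because the per-block liveins live inside the indented body block literal, they are left untouched.

```python
import re


def drop_top_level_keys(mir_function_yaml: str, keys: set[str]) -> str:
    """Remove top-level keys (and their indented children) from one machine
    function YAML document. Indented keys, such as per-block liveins inside
    the body: block literal, are never touched."""
    out = []
    skipping = False
    for line in mir_function_yaml.splitlines():
        m = re.match(r"([A-Za-z_]\w*):", line)  # a key starting at column 0
        if m:
            skipping = m.group(1) in keys
        elif line and not line[0].isspace():
            skipping = False  # other column-0 lines such as '---' or '...'
        if not skipping:
            out.append(line)
    return "\n".join(out) + "\n"


EXAMPLE = """---
name:            foo
alignment:       16
exposesReturnsTwice: false
tracksRegLiveness: true
liveins:
  - { reg: '$edi', virtual-reg: '' }
frameInfo:
  maxAlignment:    1
body:             |
  bb.0:
    liveins: $edi

    $eax = COPY $edi
    RETQ $eax
...
"""

if __name__ == "__main__":
    print(drop_top_level_keys(EXAMPLE, {"alignment", "exposesReturnsTwice", "liveins", "frameInfo"}))
```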

Branch probability data in block successors: lists can be dropped if the
test doesn’t depend on it. Example:
successors: %bb.1(0x40000000), %bb.2(0x40000000) can be replaced with
successors: %bb.1, %bb.2.
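A minimal sketch of that rewrite, assuming probabilities are always printed as a hexadecimal value in parentheses directly after the block reference, as in the example above. The function name strip_branch_probabilities is hypothetical; it only touches lines that start with successors:.

```python
import re

# "%bb.N" or "%bb.N.irname" immediately followed by a "(0x...)" probability.
_PROB = re.compile(r"(%bb\.\d+(?:\.[-\w.$]+)?)\(0x[0-9A-Fa-f]+\)")


def strip_branch_probabilities(mir: str) -> str:
    """Drop branch probability annotations from successors: lists only."""
    out = []
    for line in mir.splitlines(keepends=True):
        if line.lstrip().startswith("successors:"):
            line = _PROB.sub(r"\1", line)
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    print(strip_branch_probabilities("  successors: %bb.1(0x40000000), %bb.2(0x40000000)\n"), end="")
```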

MIR code contains a whole IR module. This is necessary because MIR has no
equivalents for global variables, references to external functions, function
attributes, metadata, or debug info. Instead, some MIR data references these
IR constructs. You can often remove them if the test doesn’t depend on
them.

Alias analysis is performed on IR values, which are referenced by memory
operands in MIR. Example: :: (load 8 from %ir.foobar, !alias.scope !9).
If the test doesn’t depend on (good) alias analysis, the references can be
dropped: :: (load 8)
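The same edit can be applied mechanically. This is a sketch under narrow assumptions: it only recognizes the "(load N ...)" / "(store N ...)" memory operand spelling used in the example above, and it drops everything after the size, including any alignment information. Spellings that contain nested parentheses are left unchanged. The helper name drop_memoperand_ir_refs is hypothetical.

```python
import re

# "(load N ...)" or "(store N ...)" with no nested parentheses.
_MEMOP = re.compile(r"\((load|store) (\d+)[^()]*\)")


def drop_memoperand_ir_refs(line: str) -> str:
    """Keep only the access kind and size of each matching memory operand."""
    return _MEMOP.sub(r"(\1 \2)", line)


if __name__ == "__main__":
    print(drop_memoperand_ir_refs(
        "$rax = MOV64rm $rdi, 1, $noreg, 0, $noreg :: (load 8 from %ir.foobar, !alias.scope !9)"))
```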

MIR blocks can reference IR blocks for debug printing, profile information
or debug locations. Example: bb.42.myblock in MIR references the IR block
myblock. It is usually possible to drop the .myblock reference and simply
use bb.42.
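A rough sketch of dropping the IR block names, assuming they are unquoted and that "bb.<number>." only ever appears as a MIR block reference in the file being edited. drop_ir_block_names is a hypothetical helper; it rewrites both block definitions (bb.42.myblock:) and references (%bb.43.if.then).

```python
import re

# bb.<number>.<ir-block-name>  ->  bb.<number>   (unquoted IR names only)
_BLOCK_NAME = re.compile(r"\b(bb\.\d+)\.[-\w.$]+")


def drop_ir_block_names(mir: str) -> str:
    return _BLOCK_NAME.sub(r"\1", mir)


if __name__ == "__main__":
    print(drop_ir_block_names("bb.42.myblock:\n  successors: %bb.43.if.then\n"), end="")
```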

If there are no memory operands or blocks referencing the IR, then the
IR function can be replaced by a parameterless dummy function like
define void @func() { ret void }.

It is possible to drop the whole IR section of the MIR file if it only
contains dummy functions (see above). The .mir loader will create the
IR functions automatically in this case.
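To make those two steps concrete, here is a small sketch with hypothetical helper names. dummy_ir_module emits one dummy definition per function name, and ir_section_is_droppable is a heuristic check that every non-blank line of the IR section is such a one-line dummy. It does not parse LLVM IR, so any other formatting makes it return False.

```python
import re


def dummy_ir_module(function_names: list[str]) -> str:
    """One parameterless 'define void @name() { ret void }' per function."""
    return "".join(f"define void @{name}() {{ ret void }}\n" for name in function_names)


_DUMMY = re.compile(r"define void @[-a-zA-Z$._][-a-zA-Z$._0-9]*\(\) \{ ret void \}")


def ir_section_is_droppable(ir_module: str) -> bool:
    """True if every non-blank line is a one-line dummy definition (heuristic)."""
    return all(_DUMMY.fullmatch(line.strip())
               for line in ir_module.splitlines() if line.strip())


if __name__ == "__main__":
    ir = dummy_ir_module(["func", "helper"])
    print(ir, end="")
    print(ir_section_is_droppable(ir))
```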

Currently the MIR format has several limitations in terms of which state it
can serialize:

The target-specific state in the target-specific MachineFunctionInfo
subclasses isn’t serialized at the moment.

The target-specific MachineConstantPoolValue subclasses (in the ARM and
SystemZ backends) aren’t serialized at the moment.

The MCSymbol machine operands are only printed; they can’t be parsed.

A lot of the state in MachineModuleInfo isn’t serialized: only the CFI
instructions and the variable debug information from MMI are serialized right
now.

These limitations impose restrictions on what you can test with the MIR format.
For now, tests whose behaviour depends on the state of certain MCSymbol
operands or on the exception handling state in MMI can’t use the MIR format.
Likewise, tests whose behaviour depends on the state of the target specific
MachineFunctionInfo or MachineConstantPoolValue subclasses can’t use the MIR
format at the moment.

When the first YAML document contains a YAML block literal string, the MIR
parser will treat this string as an LLVM assembly language string that
represents an embedded LLVM IR module.
Here is an example of a YAML document that contains an LLVM module:
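The layout of such a document can be sketched as follows. embed_ir_module is a hypothetical helper: "--- |" opens a YAML document whose content is a block literal string, each IR line is indented by two spaces so it belongs to that literal, and "..." ends the document before the machine function documents that follow. The small @inc module is an illustrative example, not taken from any test.

```python
def embed_ir_module(ir: str) -> str:
    """Wrap LLVM IR as the first YAML document of a .mir file."""
    body = "".join(f"  {line}\n" if line else "\n" for line in ir.splitlines())
    return "--- |\n" + body + "...\n"


INC = """define i32 @inc(i32 %x) {
entry:
  %0 = add i32 %x, 1
  ret i32 %0
}"""

if __name__ == "__main__":
    print(embed_ir_module(INC), end="")
```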

The machine basic blocks and their instructions are represented using a custom,
human readable serialization language. This language is used in the
YAML block literal string that corresponds to the machine function’s body.

A source string that uses this language contains a list of machine basic
blocks, which are described in the section below.

The instruction’s name is usually specified before the operands. The example
below shows an instance of the X86 RETQ instruction with a single machine
operand:

RETQ $eax

However, if the machine instruction has one or more explicitly defined register
operands, the instruction’s name has to be specified after them. The example
below shows an instance of the AArch64 LDPXpost instruction with three
defined register operands:

$sp, $fp, $lr = LDPXpost $sp, 2
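The two orderings can be told apart by whether the line contains " = ". The sketch below is a simplified splitter with hypothetical names. It handles only the plain forms shown above and ignores instruction flags (such as frame-setup) and memory operands after "::". Commas nested inside parentheses are not treated as operand separators.

```python
def split_operands(text: str) -> list[str]:
    """Split on commas that are not nested inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def split_instruction(line: str) -> tuple[list[str], str, list[str]]:
    """Return (explicit defs, instruction name, remaining operands)."""
    defs_text, sep, rest = line.partition(" = ")
    if not sep:  # no explicitly defined registers: the name comes first
        defs_text, rest = "", line
    name, _, uses_text = rest.strip().partition(" ")
    return split_operands(defs_text), name, split_operands(uses_text)


if __name__ == "__main__":
    print(split_instruction("RETQ $eax"))
    print(split_instruction("$sp, $fp, $lr = LDPXpost $sp, 2"))
```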

The instruction names are serialized using the exact definitions from the
target’s *InstrInfo.td files, and they are case sensitive. This means that
similar instruction names like TSTri and tSTRi represent different
machine instructions.

The register primitive is used to represent the register machine operands.
The register operands can also have optional register flags, a subregister
index, and a reference to the tied register operand. The full syntax of a
register operand is shown below:

[<flags>] <register> [ .<subregister-idx-name> ] [ (tied-def <tied-op>) ]

The register machine operands can reference a portion of a register by using
the subregister indices. The example below shows an instance of the COPY
pseudo instruction that uses the X86 sub_8bit subregister index to copy the 8
lower bits of the 32-bit virtual register 0 into the 8-bit virtual register 1:

%1 = COPY %0.sub_8bit

The names of the subregister indices are target specific, and are typically
defined in the target’s *RegisterInfo.td file.
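A register operand can be decomposed into its flags, register, subregister index and tied operand index. This is a sketch, assuming the syntax printed by recent LLVM versions, where the subregister index follows the register after a "." (for example %0.sub_8bit) and a tied operand is written as (tied-def N). The flag list is a common subset, and the RegOperand type and parse_register_operand helper are hypothetical. Register class annotations such as %0:gr32 are not handled.

```python
import re
from typing import NamedTuple, Optional

FLAGS = ("implicit-def", "implicit", "def", "dead", "killed", "undef",
         "internal", "early-clobber", "debug-use", "renamable")

_REG_OPERAND = re.compile(
    r"(?P<flags>(?:(?:" + "|".join(map(re.escape, FLAGS)) + r")\s+)*)"
    r"(?P<reg>[$%][A-Za-z0-9_]+)"          # $physreg or %vreg
    r"(?:\.(?P<subreg>[A-Za-z0-9_]+))?"    # optional .subregister-index
    r"(?:\(tied-def (?P<tied>\d+)\))?"     # optional (tied-def N)
)


class RegOperand(NamedTuple):
    flags: tuple[str, ...]
    reg: str
    subreg: Optional[str]
    tied_def: Optional[int]


def parse_register_operand(text: str) -> RegOperand:
    m = _REG_OPERAND.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not a register operand: {text!r}")
    tied = m.group("tied")
    return RegOperand(tuple(m.group("flags").split()), m.group("reg"),
                      m.group("subreg"), int(tied) if tied is not None else None)


if __name__ == "__main__":
    print(parse_register_operand("%0.sub_8bit"))
    print(parse_register_operand("killed $eax(tied-def 0)"))
```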

The global value machine operands reference the global values from the
embedded LLVM IR module.
The example below shows an instance of the X86 MOV64rm instruction that has
a global value operand named G:

$rax = MOV64rm $rip, 1, _, @G, _

The named global values are represented using an identifier with the ‘@’ prefix.
If the identifier doesn’t match the regular expression
[-a-zA-Z$._][-a-zA-Z$._0-9]*, then this identifier must be quoted.

The unnamed global values are represented using an unsigned numeric value with
the ‘@’ prefix, like in the following examples: @0, @989.
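These rules can be summed up in a small formatter. This is a sketch with a hypothetical function name. The quoting decision uses the regular expression given above. For the quoted form, it assumes LLVM's usual escaping convention: printable ASCII characters other than '"' and '\' are kept, and every other byte is written as a backslash followed by two uppercase hex digits.

```python
import re

_UNQUOTED = re.compile(r"[-a-zA-Z$._][-a-zA-Z$._0-9]*")


def global_value_operand(name) -> str:
    """@<number> for unnamed globals, @name when the identifier regex matches,
    and @"..." (with hex escapes) otherwise."""
    if isinstance(name, int):
        return f"@{name}"
    if _UNQUOTED.fullmatch(name):
        return "@" + name
    escaped = "".join(
        chr(b) if 0x20 <= b <= 0x7E and b not in (0x22, 0x5C) else f"\\{b:02X}"
        for b in name.encode("utf-8")
    )
    return f'@"{escaped}"'


if __name__ == "__main__":
    for n in ("G", "foo bar", 989):
        print(global_value_operand(n))
```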

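Jump table entries are described in the machine function’s jumpTable:
section, which has a kind: <kind> field and a list of entries, each with an
id: <index> and a blocks: list. Here <kind> describes how the jump table is
represented and emitted (plain address, relocations, PIC, etc.), each <index>
is a 32-bit unsigned integer, and blocks contains a list of machine basic
block references.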
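The sketch below renders that section from a mapping of entry indices to block numbers. jump_table_yaml is a hypothetical helper, and "block-address" is used only as an example kind. It enforces the 32-bit unsigned range on each index. It also quotes every block reference, because a YAML plain scalar may not begin with '%' (the directive indicator).

```python
def jump_table_yaml(kind: str, entries: dict[int, list[int]]) -> str:
    """Render a jumpTable: section with one '- id:' entry per jump table."""
    lines = ["jumpTable:", f"  kind: {kind}", "  entries:"]
    for index, blocks in sorted(entries.items()):
        if not 0 <= index < 2**32:
            raise ValueError(f"jump table index {index} is not a 32-bit unsigned integer")
        refs = ", ".join(f"'%bb.{b}'" for b in blocks)
        lines.append(f"    - id: {index}")
        lines.append(f"      blocks: [ {refs} ]")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(jump_table_yaml("block-address", {0: [3, 4, 5, 6]}), end="")
```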

A CFI Index operand holds an index into a per-function side-table,
MachineFunction::getFrameInstructions(), which references all the frame
instructions in a MachineFunction. A CFI_INSTRUCTION may look like it
contains multiple operands, but the only operand it contains is the CFI Index;
the other operands are tracked by the MCCFIInstruction object.
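The indirection can be modelled in a few lines. This is a plain Python model, not LLVM's C++ API. CFIDirective stands in for MCCFIInstruction, MachineFunctionModel stands in for MachineFunction and its frame-instruction side-table, and CFIInstruction stands in for a CFI_INSTRUCTION that stores nothing but the index. For example, in a line such as CFI_INSTRUCTION def_cfa_offset 16, the "def_cfa_offset 16" part lives in the side-table entry.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CFIDirective:            # models MCCFIInstruction
    operation: str             # e.g. "def_cfa_offset"
    operands: tuple


@dataclass
class MachineFunctionModel:    # models MachineFunction's frame-instruction table
    frame_instructions: list = field(default_factory=list)

    def add_frame_inst(self, directive: CFIDirective) -> int:
        """Append to the side-table and return the new entry's index."""
        self.frame_instructions.append(directive)
        return len(self.frame_instructions) - 1


@dataclass(frozen=True)
class CFIInstruction:          # models a CFI_INSTRUCTION machine instruction
    cfi_index: int             # its only operand

    def directive(self, mf: MachineFunctionModel) -> CFIDirective:
        return mf.frame_instructions[self.cfi_index]


if __name__ == "__main__":
    mf = MachineFunctionModel()
    inst = CFIInstruction(mf.add_frame_inst(CFIDirective("def_cfa_offset", (16,))))
    print(inst.cfi_index, inst.directive(mf))
```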