Bit 0 - Turns the cache on (1) or off (0)
Bit 1 - Determines if user mode and non-user mode use the same address
mapping. 1 if they do, 0 if they do not. Should be 1 for use with MEMC.
Bit 2 - 0 for normal operation, 1 for special monitor mode (the processor
runs at memory speed and address/data are always put on the external
pins, even if the data is fetched from the cache, so that a logic
analyser can trace the program properly).
Other bits reserved.
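
As a rough illustration of the bit layout above, the following Python sketch builds a value for register 2. The constant and function names are made up for this example, and the sketch only models the arithmetic; it does not touch any hardware.

```python
# Bit positions taken from the list above; the names are illustrative only.
CACHE_ON = 1 << 0         # bit 0: cache on
SHARED_MAPPING = 1 << 1   # bit 1: user and non-user modes share one address mapping
MONITOR_MODE = 1 << 2     # bit 2: special monitor mode


def control_value(cache_on, shared_mapping=True, monitor=False):
    """Build a value for register 2, leaving the reserved bits at zero."""
    value = 0
    if cache_on:
        value |= CACHE_ON
    if shared_mapping:
        value |= SHARED_MAPPING
    if monitor:
        value |= MONITOR_MODE
    return value


# Cache on, shared mapping (as suggested for MEMC), normal operation.
print(hex(control_value(cache_on=True)))   # 0x3
```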

Register 3 - Which areas are cachable
Controls which areas of memory are cachable, in 2Mb chunks.

Bit 0 - 1 if virtual addresses &0000000-&01FFFFF are cachable, 0 if not
Bit 1 - 1 if virtual addresses &0200000-&03FFFFF are cachable, 0 if not
...
Bit 31 - 1 if virtual addresses &3E00000-&3FFFFFF are cachable, 0 if not
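
Registers 3, 4 and 5 all use this one-bit-per-chunk layout, so the value for any of them can be worked out from an address range. The sketch below is one way to do that sum in Python; the function name is hypothetical and nothing here is specific to any real toolchain.

```python
CHUNK_SIZE = 2 * 1024 * 1024   # each bit covers one chunk of &200000 bytes
NUM_CHUNKS = 32                # bits 0 to 31


def chunk_mask(first_addr, last_addr):
    """Return a mask with one bit set for every chunk touched by the inclusive range."""
    if not 0 <= first_addr <= last_addr < CHUNK_SIZE * NUM_CHUNKS:
        raise ValueError("range must lie within &0000000-&3FFFFFF")
    first_bit = first_addr // CHUNK_SIZE
    last_bit = last_addr // CHUNK_SIZE
    count = last_bit - first_bit + 1
    return ((1 << count) - 1) << first_bit


# The lowest 32 megabytes occupy bits 0 to 15.
print(hex(chunk_mask(0x0000000, 0x1FFFFFF)))   # 0xffff
```

Masks for several separate ranges can be combined with a bitwise OR before the result is written to the register.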

Register 4 - Which areas are updateable
Controls which areas of memory are updateable, in 2Mb chunks. Writes to non-updateable
memory go to the real memory, not the cache. This is suitable for things like ROMs, since
you don't want the cached data to be altered by attempted writes.

Bit 0 - 1 if virtual addresses &0000000-&01FFFFF are updateable, 0 if not
Bit 1 - 1 if virtual addresses &0200000-&03FFFFF are updateable, 0 if not
...
Bit 31 - 1 if virtual addresses &3E00000-&3FFFFFF are updateable, 0 if not

Register 5 - Which areas are disruptive
Controls which areas of memory are disruptive, in 2Mb chunks. Writes to disruptive
areas of memory cause the cache to be flushed. For example, on a MEMC system, a write to
physically addressed memory at &2000000-&2FFFFFF will usually also change virtually addressed
memory, and if that location was cached, an attempt to read it would read back the old contents.

Bit 0 - 1 if virtual addresses &0000000-&01FFFFF are disruptive, 0 if not
Bit 1 - 1 if virtual addresses &0200000-&03FFFFF are disruptive, 0 if not
...
Bit 31 - 1 if virtual addresses &3E00000-&3FFFFFF are disruptive, 0 if not

Register 2 is set to zero after power-up, and registers 3-5 are undefined. Registers 3-5
should be set up correctly before the cache is switched on. You should always check the
processor identity before setting up the registers, unless you are completely certain your code
will only ever be executed on an ARM3 processor.

ARM 610

Register 0 - Processor identification (read only)
The value returned for an ARM610 processor should be &4156061x.

Register 3 - Domain Access Control (write only)
This register holds the current access control for domains 0 to 15. Each domain has two
bits (domain 0 bits 0,1 ... domain 15 bits 30,31) which may be set as follows:

When writing to this register, any value written will cause the Translation Look-aside
Buffer to be flushed.
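
The two-bits-per-domain packing can be sketched in Python as below. The helper names are invented for illustration, and the meaning of each two-bit value is deliberately left out, since only the field positions are described here. As the register is write only, real code would presumably keep its own copy of the last value written and update that.

```python
def set_domain_access(dacr, domain, access):
    """Return dacr with the two-bit field for the given domain replaced."""
    if not 0 <= domain <= 15:
        raise ValueError("domain must be 0 to 15")
    if not 0 <= access <= 3:
        raise ValueError("access must fit in two bits")
    shift = domain * 2                      # domain 0 -> bits 0,1 ... domain 15 -> bits 30,31
    cleared = dacr & ~(0b11 << shift) & 0xFFFFFFFF
    return cleared | (access << shift)


def get_domain_access(dacr, domain):
    """Extract the two-bit field for the given domain from a saved copy."""
    return (dacr >> (domain * 2)) & 0b11


soft_copy = set_domain_access(0, 15, 0b11)
print(hex(soft_copy))   # 0xc0000000
```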

Register 6 - Data fault address / TLB purge
When reading this register, you can determine the virtual address of the last page fault.

When writing this register, the value given (in bits 14-31) is treated as an address. The
TLB will be searched for a corresponding address and if it is found, it is marked as
invalid. This is to allow the page table in main memory to be updated and the now-invalid
entries in the on-chip TLB to be purged without assuming the penalty of flushing the
entire TLB.

Register 7 - IDC flush (write only)
Any data written to this location will cause the IDC (Instruction/Data cache) to be
flushed.

Registers 8 to 15 - Reserved
Accessing these registers will cause the undefined instruction trap to be taken.

ARM 710

This is similar to the ARM610.

Register 0 - Processor identification (read only)
The value returned for an ARM710 processor should be &4104710x.

Register 3 - Domain Access Control (write only)
This register holds the current access control for domains 0 to 15. Each domain has two
bits (domain 0 bits 0,1 ... domain 15 bits 30,31) which may be set as follows:

When writing to this register, any value written will cause the Translation Look-aside
Buffer to be flushed.

Register 6 - Data fault address / TLB purge
When reading this register, you can determine the virtual address of the last page fault.

When writing this register, the value given (in bits 14-31) is treated as an address. The
TLB will be searched for a corresponding address and if it is found, it is marked as
invalid. This is to allow the page table in main memory to be updated and the now-invalid
entries in the on-chip TLB to be purged without assuming the penalty of flushing the
entire TLB.

Register 7 - IDC flush (write only)
Any data written to this location will cause the IDC (Instruction/Data cache) to be
flushed.

Registers 8 to 15 - Reserved
Accessing these registers will cause the undefined instruction trap to be taken.

ARM 7500

The registers are exactly the same as the ARM710, except the processor ID (register 0) will be
different. The datasheet did not specify what should be expected.

ARM 7500FE

The registers are exactly the same as the ARM710, except the processor ID (register 0) will be
different. The datasheet did not specify what should be expected, however interrogation of the
Bush set-top box reveals &41077100.

StrongARM SA110

Register 0 - Processor identification (read only)
The value returned for an SA110 processor should be &4401A10x.
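
The identification values quoted so far can be gathered into a small lookup. The Python sketch below assumes that the trailing "x" stands for the low four bits, which are ignored when matching; the table holds only the values given in this document, and the names are illustrative.

```python
# Values as quoted in this document, with the low nibble (the "x") cleared.
# The 7500FE entry is the single value read from one machine, so treating
# its low nibble as a "don't care" is an assumption.
KNOWN_IDS = {
    0x41560610: "ARM610",
    0x41047100: "ARM710",
    0x41077100: "ARM7500FE",
    0x4401A100: "StrongARM SA110",
}


def identify(id_value):
    """Map a register 0 value to a processor name, or 'unknown'."""
    return KNOWN_IDS.get(id_value & 0xFFFFFFF0, "unknown")


print(identify(0x4401A102))   # StrongARM SA110
```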

Register 3 - Domain Access Control (read/write)
This register holds the current access control for domains 0 to 15.
The document I have contains no further details, though I would assume it would be similar
to the ARM610/710/etc usage.

Register 4 - Reserved - do not attempt to access

Register 5 - Fault status (read/write)
When reading, this holds the status of the last data fault (not updated for pre-fetch
fault). Only the bottom byte is of significance.

Co-processors

There are between zero and three possible co-processors. Most desktop ARM systems do not have
logic for external co-processors, so we may either use that which is built into the ARM itself,
or an emulated co-processor.
CP15 is reserved on the ARM 3 and later processors for internal configuration, as described in
this document.
CP0 and CP1 are used by the floating point system. This may either be an external floating point
chip (as used with the ARM 3), hardware built into the processor (as in the ARM 7500FE), or a
totally software-based emulation (as with the FPEmulator that we all know).

When the ARM executes a co-processor instruction, or an undefined instruction, it will offer it
to any co-processors which may be presently attached. If hardware is available to process the
given instruction, then it is expected to do so. If it is busy at the time the instruction is
offered, the ARM will wait for it.
If there is no co-processor capable of executing the instruction, the ARM will take its
undefined instruction trap, in which case the following will happen:

The PSR and PC are both saved (the method differs for 26 bit and 32 bit ARMs)

SVC mode (26 bit) / UND mode (32 bit) is entered, and the I bit of the PSR is set

The instruction at address &00000004 is executed

This trap may be used to add instructions to the instruction set by emulation, or to implement a
software emulation of hardware that isn't fitted. The Floating Point
Emulator works by doing this.

To return, simply restore the saved PC and PSR (how you do this depends on 26/32 bit), for
example with MOVS PC, R14 on 26 bit systems. Execution will pick up with the
instruction following the one which caused the trap.

All of the co-processor instructions can be executed conditionally. Please note that the
conditionals relate to the status of the ARM processor, and not the status of any of the
co-processors. This is because the ARM always examines the instruction first, then offers it
around and maybe takes the undefined instruction trap, so the conditions are ARM related.
To make this clearer:
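
The order of events can be modelled with a few lines of Python. This is only a sketch of the decision described above: the function names and the returned strings are invented for the example, and the flags passed in are the ARM's own N, Z, C and V flags, never anything held by a co-processor.

```python
def condition_passes(cond, n, z, c, v):
    """Evaluate an ARM condition code against the ARM's N, Z, C, V flags."""
    tests = {
        "EQ": z, "NE": not z,
        "CS": c, "CC": not c,
        "MI": n, "PL": not n,
        "VS": v, "VC": not v,
        "HI": c and not z, "LS": (not c) or z,
        "GE": n == v, "LT": n != v,
        "GT": (not z) and n == v, "LE": z or n != v,
        "AL": True, "NV": False,
    }
    return bool(tests[cond])


def dispatch(cond, n, z, c, v, coprocessor_accepts):
    """Say what happens to a conditional co-processor instruction."""
    if not condition_passes(cond, n, z, c, v):
        return "skipped"                      # never offered to any co-processor
    if coprocessor_accepts:
        return "executed by co-processor"
    return "undefined instruction trap"


# An EQ-conditional instruction with Z clear is skipped, whatever the co-processor would do.
print(dispatch("EQ", n=False, z=False, c=False, v=False, coprocessor_accepts=True))
```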

It is worth pointing out that objasm specifies co-processor registers using the CR
notation (i.e. CR0 - CR15), which must first be defined with the CN directive. It does not
appear as if default co-processor instructions are defined in Nick Roberts' ASM, though I've only
looked in the instructions at the "defined symbols" section...
Darren Salt's ExtBASICasm provides the register names C0 - C15 to refer to the
co-processor registers. So if any of these examples fail when you try to assemble them, please
check which format your assembler expects for these instructions.

MRC

The instruction MRC transfers a co-processor register to an ARM register. It takes
the form:

MRC <co-pro>, <op>, <ARM reg>, <co-pro reg>, <co-pro reg2>, <op2>

The co-processor is denoted in most assemblers by CPx.
The register <co-pro reg> is written to <ARM reg>, using
operation <op>. This may, possibly, be further modified by
<co-pro reg2> and <op2>. For an idea of the sorts of times
when this might be necessary, consider instructions of the form LDR Ra, [Rb], #x.
The final <op2> may be omitted, as it is in the example, but the other parts
of the MRC instruction must be supplied.

MCR

The instruction MCR transfers an ARM register to a co-processor register. It takes
the form:

MCR <co-pro>, <op>, <ARM reg>, <co-pro reg>, <co-pro reg2>, <op2>

The co-processor is free to interpret the fields as it desires, but the standard interpretation
is that the contents of the ARM register are written to the co-processor register using the
operation code given, which may be further modified by the second co-processor register and/or
the second operation code.
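
To show how the six fields of MRC and MCR end up in the instruction word, here is a small Python encoder. The bit positions come from the standard ARM instruction encoding rather than from this document, and the function and parameter names are hypothetical; treat it as a sketch for checking hand-assembled words, not as a replacement for an assembler.

```python
def encode_transfer(to_arm, cp_num, op, arm_reg, cp_reg, cp_reg2=0, op2=0, cond=0xE):
    """Encode MRC (to_arm=True) or MCR (to_arm=False) as a 32-bit word.

    cond defaults to &E, the 'always' condition.
    """
    fields = [(cond, 16, "cond"), (cp_num, 16, "cp_num"), (op, 8, "op"),
              (arm_reg, 16, "arm_reg"), (cp_reg, 16, "cp_reg"),
              (cp_reg2, 16, "cp_reg2"), (op2, 8, "op2")]
    for value, limit, name in fields:
        if not 0 <= value < limit:
            raise ValueError(name + " out of range")
    return (cond << 28          # bits 28-31: condition
            | 0b1110 << 24      # bits 24-27: co-processor register transfer
            | op << 21          # bits 21-23: <op>
            | int(to_arm) << 20 # bit 20: 1 for MRC, 0 for MCR
            | cp_reg << 16      # bits 16-19: <co-pro reg>
            | arm_reg << 12     # bits 12-15: <ARM reg>
            | cp_num << 8       # bits 8-11: <co-pro> number
            | op2 << 5          # bits 5-7: <op2>
            | 1 << 4            # bit 4: always set for MRC/MCR
            | cp_reg2)          # bits 0-3: <co-pro reg2>


# Read register 0 of CP15 into R0, i.e. MRC CP15, 0, R0, CR0, CR0
print(hex(encode_transfer(True, cp_num=15, op=0, arm_reg=0, cp_reg=0)))   # 0xee100f10
```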

LDC and STC

The instruction LDC loads data from memory into the co-processor register, while
STC saves data from a co-processor register to memory.
The ARM should supply the address, the co-processor accepts the data and controls how much is
transferred.

If the 'L' flag is specified, a long transfer is performed. Otherwise a short transfer is
performed. The 'L' flag follows the condition code, like LDCEQL.
The address is an expression which results in an address being generated, examples of which
are:

[Rx]
[Rx, #x]!
[Rx], #x

These are like those used for the LDR instruction. However the offsets are
only eight bits wide and specify word offsets (the ARM types are 12 bit and byte offset).
The 8 bit unsigned offset is shifted left two bits and added to or subtracted from
the base register; this may be done before or after the base is used as the transfer address.
The new base value can be written back, or left unmodified.
The next difference is that post-indexed addressing requires explicit setting of the W bit of the
instruction (unlike LDR/STR which always does it when post-indexed).
You set the 'W' bit with the '!' flag, like STC CP0, CR1, [R2, #16]!.
The base register is used for the first transfer. If there are any further transfers, the base
will be incremented by one word for each of those additional transfers.
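
The address arithmetic just described can be checked with a short Python model. The function and parameter names are made up for this sketch; it covers only the first transfer address and the optional write back, not the data transfer itself.

```python
def coprocessor_address(base, offset8, up=True, pre_indexed=True, writeback=False):
    """Return (address of the first transfer, base register value afterwards)."""
    if not 0 <= offset8 <= 0xFF:
        raise ValueError("offset must fit in eight bits")
    byte_offset = offset8 << 2                      # word offset -> byte offset
    if up:
        adjusted = (base + byte_offset) & 0xFFFFFFFF
    else:
        adjusted = (base - byte_offset) & 0xFFFFFFFF
    address = adjusted if pre_indexed else base     # post-indexed uses the unmodified base
    new_base = adjusted if writeback else base      # W bit decides whether the base changes
    return address, new_base


# Pre-indexed with write back: a word offset of 4 is 16 bytes.
address, new_base = coprocessor_address(0x8000, 4, writeback=True)
print(hex(address), hex(new_base))   # 0x8010 0x8010
```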

CDP

The instruction CDP instructs the co-processor to do some processing. It takes the
form:

CDP <co-pro>, <co-pro reg1>, <co-pro reg2>, <co-pro reg3>, <op>

This tells the co-processor to do something. The ARM will not wait for it to finish, nor is any
sort of status sent back to the ARM. It is possible for a co-processor to maintain a queue of
instructions, allowing it and the ARM to process in parallel.
A variant of this may be obtained with the floating point hardware; while it does not (to my
knowledge) support a queue of instructions, it is true that the ARM will wait for the FPU to
finish an operation before providing the next. With careful coding, it would therefore be possible
to get the ARM to do some sort of processing (a few instructions) in between sending an
instruction to the FPU and reading its result back.
So instead of:

FLTE F0, R0
FLTE F1, R1
MUFE F2, F0, F1
FIX R0, F2
MOV R1, #0

you could save a small amount of time with:

FLTE F0, R0
FLTE F1, R1
MUFE F2, F0, F1
MOV R1, #0
FIX R0, F2

as the FPU could be finishing the MUF while you MOV. The hardware FPU (as in the 7500FE) runs
asynchronously - you can switch to synchronous by setting a bit in the FPSR. The software emulation
always runs synchronously, and as it uses the ARM in order to emulate the FP instructions, there
is no possible advantage to be gained.
Obviously the above example is somewhat contrived. However it is only an example. Real life code,
such as an MP3 decoder, could well benefit from careful arrangement of code.

There are no rules for the register types and/or the operation codes. These depend upon the
co-processor.