ARM processors support various performance monitoring registers, the most basic being a cycle count register. This is how to make use of it on the Raspberry Pi 3 with its ARM Cortex-A53 processor. The A53 implements the ARMv8 architecture which can operate in both 64- and 32-bit modes, the Pi 3 uses the 32-bit AArch32 mode, which is more or less backwards compatible with the ARMv7-A architecture, as implemented for example by the Cortex-A7 (used in the early Pi 2’s) and Cortex-A8. I hope I’ve got that right, all these names are confusing

The performance counters are made available through coprocessor registers and the mrc and mcr instructions, the precise registers used depending on the particular architecture.

By default, use of these instructions is only possible in “privileged” mode, ie. from the kernel, so the first thing we need to do is to enable register access from userspace. This can be done through a simple kernel module that can also set up the cycle counter parameters needed (we could do this from userspace after the kernel module has enabled access, but it’s simpler to do everything at once).

To compile a kernel module, you need a set of header files compatible with the kernel you are running. Fortunately, if you have installed a kernel with the raspberrypi-kernel package, the corresponding headers should be in raspberrypi-kernel-headers – if you have used rpi-update, you may need to do something else to get the right headers, and of course if you have built your own kernel, you should use the headers from there. So:

Looks like we can count a single cycle and since the Pi 3 has a 1.2GHz clock the loop time looks about right (the clock seems to be scaled if the processor is idle so we don’t necessarily get a full 1.2 billion cycles per second – for example, if we replace the loop above with a sleep).