The CPU executes "almost all the instructions in one clock cycle."

Question:

I measured the execution time by actually running the program.
However, the results differ from the times stated in the manual.
According to the manual, the CPU executes "almost all the instructions in one clock cycle."
Is this actually true?

Answer:

When an instruction is fetched from internal ROM and executed without pipeline hazards, almost all instructions execute in 1 clock.
However, when an instruction is fetched from external memory, at least 4 clocks are required to fetch a one-word (4-byte) instruction.

As a result, execution takes at least four times as long as when fetching from internal ROM. In addition, some instructions require 2 clocks instead of 1.

If the execution result of an instruction is referenced by the next instruction, extra execution clocks may be required (for example, when mov 3, r10 is immediately followed by st.b r10, 11[r29]).
Because these delay factors combine, the actual instruction execution time is longer than one clock per instruction.
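
As a rough illustration of how these factors add up, here is a minimal sketch in Python. It is not taken from the manual: the function name estimate_clocks, the assumption that fetch and execution overlap so that each instruction costs the larger of the two, and the stall counts are all hypothetical simplifications. The real pipeline is more complicated.

```python
def estimate_clocks(exec_clocks, fetch_clocks=1, stall_clocks=0):
    """Very rough clock estimate for a straight-line instruction sequence.

    exec_clocks  : list with the execution clocks of each instruction
                   (1 for most instructions, 2 for some).
    fetch_clocks : clocks needed to fetch one instruction
                   (1 from internal ROM, at least 4 from external memory).
    stall_clocks : total extra clocks caused by hazards, e.g. when an
                   instruction uses the result of the previous one
                   (hypothetical figure supplied by the caller).

    Assumption: fetch and execution overlap, so each instruction costs
    whichever of the two is larger. This is a simplification.
    """
    per_instruction = sum(max(fetch_clocks, c) for c in exec_clocks)
    return per_instruction + stall_clocks


# Hypothetical program: eight 1-clock and two 2-clock instructions.
program = [1] * 8 + [2] * 2

internal = estimate_clocks(program)                          # internal ROM, no stalls
internal_stalled = estimate_clocks(program, stall_clocks=3)  # internal ROM, 3 stall clocks
external = estimate_clocks(program, fetch_clocks=4)          # external memory

print(internal, internal_stalled, external)
```

Even in this simplified model, the same ten instructions take noticeably more than ten clocks once 2-clock instructions, stalls, or external fetches are involved.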
For details of these behaviors, refer to 5.4 "Number of Instruction Execution Clock Cycles" in the Architecture User's Manual.
Note that it is very difficult to work out the behavior of the pipeline in detail.
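
Because the pipeline is hard to predict, it can be easier to turn a measured time into an average number of clocks per instruction and compare that with the ideal value of 1. The following is only a sketch: the function name and the example figures (50 microseconds, 20 MHz, 400 instructions) are made up for illustration and do not describe any particular device.

```python
def average_clocks_per_instruction(elapsed_us, clock_mhz, n_instructions):
    """Convert a measured execution time into average clocks per instruction.

    elapsed_us     : measured execution time in microseconds
    clock_mhz      : CPU clock frequency in MHz
    n_instructions : number of instructions executed in the measured section

    microseconds * MHz gives the number of clock cycles directly.
    """
    total_clocks = elapsed_us * clock_mhz
    return total_clocks / n_instructions


# Hypothetical measurement: 400 instructions took 50 us at 20 MHz.
cpi = average_clocks_per_instruction(elapsed_us=50, clock_mhz=20, n_instructions=400)
print(cpi)
```

A result well above 1 suggests that fetches from external memory, 2-clock instructions, or pipeline stalls are contributing to the measured time.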