Within my lifetime, we went from CPUs that were simple enough that we can now simulate them at the logic-gate level in ____ing Javascript to CPUs complex and powerful enough to emulate yesterday's mainstream CPUs at the ____ing logic-gate level in ____ing Javascript. I would not even be that surprised if the result outperformed the original hardware.

For http://visual6502.org/sim/varm/armgl.html, it would be much nicer if dragging panned the view rather than "3D rotating" it. Panning with WASD is too slow and doesn't work with some keyboard layouts.

And of course, zooming around the mouse cursor rather than around the center of the screen would also make it easier to zoom towards the part you want.

The 3D rotation is gimmicky and not actually useful for seeing the gates, and the current UI just doesn't let me zoom to the gates I want without spending too much effort fighting the slow panning and the zoom target.

>"One very nice thing about the 32-bit instruction set is its pervasive conditional execution, which helps one avoid branching over code. For example, this sequence of instructions resets the register r0 to 0 if its value is equal to or less than zero, or forces its value to 1 if its value is greater than zero:

Without the conditional moves (MOVLE and MOVGT) after the compare (CMP), you'd have to branch after the compare, which is wasteful."

How are those two conditional moves after the CMP operation more efficient than branching? Aren't they kind of branches themselves? What would the alternative "branching" sequence look like then?

The big deal is the conditional branch (the bgt). If the processor mispredicts it, that's a pipeline flush. And in the best case you still have extra instructions for the branches. The conditional mov example has a fixed cost of a single "wasted" cycle, which matches the best case of the branching example (branch correctly predicted to mov r0,#1 and fall through). The worst case for the branching version is probably somewhere around 15 cycles depending on the uArch, but it is still 1 cycle for the conditional move.
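For what it's worth, here is a rough Python sketch of that comparison. The layout of the branching sequence in the comments is my guess at what such code would look like, the function names are made up, and the 15-cycle worst case is just the ballpark figure from above, not a measurement of any real core.

```python
# Assumed branching sequence (hypothetical layout, not taken from the article):
#         cmp   r0, #0
#         bgt   positive
#         mov   r0, #0
#         b     done
# positive:
#         mov   r0, #1
# done:
#
# Conditional-move sequence as described in the quoted article:
#         cmp   r0, #0
#         movle r0, #0
#         movgt r0, #1


def clamp_branching(r0: int) -> int:
    """Only one of the two moves is ever reached."""
    if r0 > 0:            # bgt positive
        return 1          # mov r0, #1
    return 0              # mov r0, #0 (then b done)


def clamp_conditional(r0: int) -> int:
    """Both moves are issued; the flags from CMP decide which one takes effect."""
    le = r0 <= 0          # flags are set once by cmp r0, #0
    gt = r0 > 0
    if le:                # movle r0, #0
        r0 = 0
    if gt:                # movgt r0, #1
        r0 = 1
    return r0


def expected_wasted_cycles_branching(p_mispredict: float,
                                     best_case: float = 1.0,
                                     worst_case: float = 15.0) -> float:
    """Average overhead if a fraction p_mispredict of branches is mispredicted."""
    return (1.0 - p_mispredict) * best_case + p_mispredict * worst_case


# The conditional-move version always wastes exactly one cycle in this model.
WASTED_CYCLES_CONDITIONAL = 1.0
```

In this toy model the branching version only ties with the conditional moves when the predictor is never wrong; at a 10% mispredict rate its average overhead is already more than double.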

All of that being said, the branching version tends to be nicer for OoO cores since there aren't data dependencies on the flag registers any more, hence why you see RISC ISAs designed for OoO cores removing conditional execution for most instructions (AArch64 and RISC-V stand out here).

In the ARM2 era (probably the same for ARM1?) a basic ALU instruction such as MOV took 1 cycle, and a branch took 4 (if taken) or 1 (if not). (There were also extra DRAM page cycles every 4 words.)

So for a simple if/else, it was usually both less code and faster to use a straight line of conditional instructions. In more complicated cases, if the programmer was feeling clever, it was possible to update the status flags to get three-way (or more!) conditionals in straight-line branchless code. Fun!
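To put numbers on the simple if/else case, here is a small Python sketch that counts cycles with the ARM2-era figures quoted above (1 cycle per ALU instruction, 4 for a taken branch, 1 for a branch not taken). The branching layout in the comments and the function names are my own assumptions, I am treating a skipped conditional instruction as costing 1 cycle, and the DRAM page cycles are ignored.

```python
ALU_CYCLES = 1               # CMP, MOV, and a skipped conditional MOV (assumed)
BRANCH_TAKEN_CYCLES = 4      # figure quoted above for a taken branch
BRANCH_NOT_TAKEN_CYCLES = 1  # figure quoted above for a branch not taken

# Straight-line version, 3 instructions:
#         cmp   r0, #0
#         movle r0, #0
#         movgt r0, #1
#
# Assumed branching version, 5 instructions:
#         cmp   r0, #0
#         bgt   positive
#         mov   r0, #0
#         b     done
# positive:
#         mov   r0, #1
# done:


def cycles_conditional() -> int:
    """CMP + MOVLE + MOVGT: every instruction is issued, whatever r0 holds."""
    return 3 * ALU_CYCLES


def cycles_branching(r0: int) -> int:
    """Cycle count along the path actually executed for a given r0."""
    cycles = ALU_CYCLES                      # cmp r0, #0
    if r0 > 0:
        cycles += BRANCH_TAKEN_CYCLES        # bgt positive (taken)
        cycles += ALU_CYCLES                 # mov r0, #1
    else:
        cycles += BRANCH_NOT_TAKEN_CYCLES    # bgt positive (not taken)
        cycles += ALU_CYCLES                 # mov r0, #0
        cycles += BRANCH_TAKEN_CYCLES        # b done (taken)
    return cycles
```

Under these assumptions the straight-line version costs 3 cycles on every path, while the branching version costs 6 or 7 and needs two more instructions.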

The ARM1 did not have any atomic operations. You only need those if you have more than one processor. It also lacked the multiply and multiply-accumulate instructions, as stated above. These took multiple cycles, which is not very RISC-like. That is also true of the load multiple and store multiple instructions of the ARM2 (I don't remember if the ARM1 had them). The ARM2 also added the coprocessor interface.