Chris King wrote:
>On 9/6/07, Tom <tom.primozic@gmail.com> wrote:
>
>
>>However, would it be possible to "emulate" cpu registers using software? By
>>keeping registers in the main memory, but accessing them often enough to
>>keep them in primary cache? That would be quite fast I believe...
>>
>>
>
>This makes me wonder... why have registers to begin with? I wonder
>how feasible a chip with a, say, 256-byte "register-level" cache would
>be.
>
>
Such chips exist. The Itanium is one example.
The problem is gate delays. The purpose of registers is to be faster
than L1 cache (which typically has a 2-3 clock delay associated with
it). But the more registers you have, the more gate delays you need to
read or write a register. A naive implementation takes O(log N) gate
delays to select one of N registers, and reality is more complicated than
that, but the rule "more registers = more gate delays" holds true. These
gate delays translate into a slower chip one way or another: you
have to lower your clock rate, add more pipeline stages, or both to
deal with the larger register file. Of course, more registers make
compilers happy, and lower the pressure on cache bandwidth (as the
compiler doesn't need to spill/refill registers quite so often). This
is why 64-bit x86 is generally faster than 32-bit x86: going
from 8 (6 in practice) to 16 (14 in practice) registers was a big step
up. The Itanium has a large enough register set that its performance
is probably getting hurt by it, but it's hard to tell with
everything else going on.
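
To put rough numbers on the naive O(log N) estimate above, here is a toy
model (my own sketch, not a description of any real register file). It
assumes a read port is built as a tree of small multiplexers and that each
level of the tree costs one "gate delay". The name mux_tree_depth and the
fan_in parameter are made up for illustration.

```python
def mux_tree_depth(num_registers: int, fan_in: int = 2) -> int:
    """Levels of fan_in-input multiplexers needed to select one of
    num_registers values (toy model: one level = one gate delay)."""
    if num_registers < 1:
        raise ValueError("need at least one register")
    if fan_in < 2:
        raise ValueError("a multiplexer needs at least two inputs")
    depth = 0
    reachable = 1  # how many registers a tree of this depth can select from
    while reachable < num_registers:
        reachable *= fan_in
        depth += 1
    return depth


if __name__ == "__main__":
    # Register counts mentioned in this thread: x86 (8), x86-64 (16),
    # the 16-64 "sweet spot", and larger sets.
    for n in (8, 16, 64, 128, 256):
        print(f"{n:4d} registers -> {mux_tree_depth(n)} mux levels")
```

With 2-input muxes this gives 3 levels for 8 registers, 4 for 16, 6 for 64
and 8 for 256. Each doubling of the register count adds one level, which is
the "more registers = more gate delays" rule in its simplest form. Real
designs have multiple ports, wire delay, and wider gates, so treat this as
the naive estimate only.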
The sweet spot for register sets seems to be in the 16-64 range: fewer
than that and you're being hurt by the increased memory pressure; more
than that and you're probably being hurt by the slower register addressing.
Brian