Stefan Reinauer <stepan at openbios.org> writes:
> * jason schildt <jschildt at lnxi.com> [050903 00:03]:
>> +/* We can reduce the size of code generated by romcc by
>> + * changing all of the fixed size types that live in registers
>> + * into simple unsigned variables. (ie s/uint8_t/unsigned/g)
>> + */
> Why is this? I would consider specifying an 8bit type to be more
> space-saving than using some generic untyped integer value. If not this
> should be fixed in romcc.
This is a fundamental limit, especially on 32bit x86 with its
non-symmetric registers. romcc allocates registers, and registers are
not 8 bits wide. It therefore takes an extra operation to mask the
register value down to 8 bits after each operation.
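
To make the extra mask step concrete, here is a minimal sketch in plain C.
It is not romcc output; the function names are made up for illustration.
It only shows that an 8-bit add is semantically "add, then mask", while a
register-width add is just the add.

```c
#include <stdint.h>

/* Narrow type: the result has to be truncated back to 8 bits. */
static uint8_t add_narrow(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);        /* same value as (a + b) & 0xff */
}

/* What a compiler that keeps values in full-width registers must do
 * to get the same answer: the add plus an explicit mask. */
static unsigned add_narrow_by_hand(unsigned a, unsigned b)
{
    return (a + b) & 0xffu;
}

/* Register-width type: the add is the whole operation. */
static unsigned add_wide(unsigned a, unsigned b)
{
    return a + b;
}
```

When the value never needs to wrap at 8 bits, the mask in the first two
functions is pure overhead, which is the point of the s/uint8_t/unsigned/g
comment in the patch.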
Theoretically it could help by allowing use of registers such as %ah, but
the problem is that you cannot perform a register-to-register move between
register combinations like %esi and %ah, and it gets even worse when you
include the mmx and sse registers. So %ah is essentially unusable.
Since using smaller values does not increase the number of registers
you can use, and using smaller registers requires an extra mask step,
using smaller values increases the code size.
The comment was added to document this fact so we can revisit this
later, if it becomes important. Hopefully this begins dispelling the
myth that sub word sized quantities are more efficient to use.
If you are really into register savings, bit-fields can help, especially
when you have more than 2 values in a register. You have to pack and
unpack the values, but if you don't have them all unpacked
simultaneously it can help.
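
A rough sketch of that packing idea, using explicit shifts and masks. The
field layout and helper names below are hypothetical, chosen only for
illustration. Three small values share one register-sized word, and only
the field currently needed is unpacked.

```c
/* Hypothetical layout of one 32-bit word:
 *   bits  0..7  : bus
 *   bits  8..12 : dev
 *   bits 13..15 : fn
 */
#define BUS_SHIFT 0
#define BUS_MASK  0xffu
#define DEV_SHIFT 8
#define DEV_MASK  0x1fu
#define FN_SHIFT  13
#define FN_MASK   0x7u

/* Pack three values into a single word (one register). */
static unsigned pack(unsigned bus, unsigned dev, unsigned fn)
{
    return ((bus & BUS_MASK) << BUS_SHIFT) |
           ((dev & DEV_MASK) << DEV_SHIFT) |
           ((fn  & FN_MASK)  << FN_SHIFT);
}

/* Unpack one field at a time, so only one extra register is live. */
static unsigned get_bus(unsigned p) { return (p >> BUS_SHIFT) & BUS_MASK; }
static unsigned get_dev(unsigned p) { return (p >> DEV_SHIFT) & DEV_MASK; }
static unsigned get_fn(unsigned p)  { return (p >> FN_SHIFT)  & FN_MASK; }

/* Replace one field in place without unpacking the others. */
static unsigned set_dev(unsigned p, unsigned dev)
{
    return (p & ~(DEV_MASK << DEV_SHIFT)) | ((dev & DEV_MASK) << DEV_SHIFT);
}
```

Every access costs a shift and a mask, so this only pays off when register
pressure, not instruction count, is the limiting factor.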
A sub word type is only slightly better than a bit-field in that you
can use the register directly. But it still requires maintenance work
to keep from having anything more than a sign bit in the register's
high bits.
>> /* AMD K8 Unsupported 1Ghz? */
>> if (id == (PCI_VENDOR_ID_AMD | (0x1100 << 16))) {
>> - if (is_cpu_pre_e0()) // CK804 support 1G?
>> + device_t dev_2 = PCI_DEV(0,0x18,2);
>> + if(pci_read_config32(dev_2,0x9c) < 0x20f00) {
> The function call looks a lot more readable here. How much is the gain
> of manual inlining here?
100%. The call actually works. cpuid requires 4 registers and we don't have
that many to spare at this point in the code. What this bit does is read a
cached copy of the cpu rev from a scratch register in pci configuration space.
Probably the clearest thing to have would be a set of functions that perform
this test. Something like is_cached_cpu_pre_e0(). Almost as good would be
a good comment.
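
A possible shape for such a helper, as a sketch only. The device, the 0x9c
offset and the 0x20f00 threshold are taken from the patch above. The macro
names, the PCI_DEV encoding, the device_t typedef and the
pci_read_config32() body are stand-ins so the fragment compiles on its own;
the real tree supplies its own versions.

```c
#include <stdint.h>

/* Stand-ins for the real definitions (assumptions, not the tree's code). */
typedef uint32_t device_t;
#define PCI_DEV(bus, dev, fn) \
    (((uint32_t)(bus) << 20) | ((uint32_t)(dev) << 15) | ((uint32_t)(fn) << 12))

static uint32_t fake_scratch;   /* test double for the scratch register */

static uint32_t pci_read_config32(device_t dev, unsigned where)
{
    (void)dev;
    (void)where;
    return fake_scratch;
}

/* Hypothetical names for the magic numbers used in the patch. */
#define CPU_REV_SCRATCH_REG 0x9c     /* holds the cached copy of the cpu rev */
#define CPU_REV_E0          0x20f00  /* revisions below this are pre-E0 */

/* Reads the cached cpu rev from pci configuration space instead of
 * executing cpuid, which needs 4 registers we cannot spare here. */
static int is_cached_cpu_pre_e0(void)
{
    device_t dev_2 = PCI_DEV(0, 0x18, 2);
    return pci_read_config32(dev_2, CPU_REV_SCRATCH_REG) < CPU_REV_E0;
}
```

The call site would then read if (is_cached_cpu_pre_e0()), which keeps the
readability of the original is_cpu_pre_e0() call.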
Eric