I know several people have built Athlon MP setups using Athlon XPs and some clever hardware hacking. To the best of my knowledge the chips are very similar, and since not that many MP chips are produced and they are built to the same specs, I assume the odds that a given set of XPs would work as MPs, with the correct bridges connected, are quite high. (I'm not encouraging this - it will void your warranty and can make your system go boom. If you break it, you get to keep both pieces.)

When I last reseated my heatsink, a couple of years back, it magically changed from being an XP (a 1900+, to be precise) to an MP.

It doesn't seem to have done anything beyond change the name my BIOS reports on boot, though.

Just thought I'd add to the confusion...

I don't understand why you're so concerned about space and memory, unless you're using an old box with very limited hard disk space and limited RAM. I use -O2 -finline-functions (the only extra flag in -O3 that's worth using on x86, as I understand it).
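For reference, on Gentoo that combination would go in /etc/make.conf along these lines (a sketch - the -march value is just an example for an Athlon XP, substitute your own CPU):

```shell
# /etc/make.conf (sketch) - assumes an Athlon XP; adjust -march for your CPU
CFLAGS="-O2 -finline-functions -march=athlon-xp -pipe"
CXXFLAGS="${CFLAGS}"
```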

I don't have any documentation handy, but I do read up on what knowledgeable users, both on this forum and elsewhere, say and recommend, and I follow up on their links, as well as the GCC documentation. I don't claim to be an expert, but following the advice of experts, and reading up on the subject whenever I come across it, has led me to use these flags, with good results. I have a stable and speedy system that I am content with.

The size of the executable isn't due to a concern with drive space (the difference is negligible at that level) or really with RAM. I use -Os on my modern SMP systems rather than -O2 because the only difference is that -Os doesn't use the alignment optimizations. Those alignment optimizations do things like inserting extra space before function entries, loop jump targets, etc. While that can result in a very small performance increase (basically due to simplified mathematical operations), it also spreads code out, and that spreading out of code increases the chance of cache misses.

On a uniprocessor system with a modern processor that has lots of on-chip cache, that's not a huge deal. However, on an SMP system with separate chips - and thus separate on-chip caches - the performance hit from cache misses can be pretty significant - significant enough to outweigh the minor alignment benefit. Having to fetch stuff from system memory is *way* slower than fetching from processor cache - it's like the difference between swapping and fitting into physical memory. This space concern is the main reason why it's a bad idea to build everything with -O3 (well, that and the fact that -funroll-loops results in slower code when the number of iterations isn't known in advance, which is the case in lots of loops that I write - and presumably in other coders' stuff).

I guess I'm technically worrying about memory after all, but I'm really worrying about L1 and L2 cache usage rather than system memory. And there's not much you can do to increase the size of the on-chip cache, which is typically on the order of 128-512K. My fairly current Athlon MP system, for example, has 1.5GB of RAM, but the chips only have 256K of cache each. When you're trying to fit as much code as possible into 256K, it's worthwhile to worry about space. Not to mention that compilation time is slightly improved over -O2, and significantly improved over -O3. Referring to it as optimizing for size is deceptive, though, since people usually think of system memory and drive space first - forgetting about the cache, which is arguably more important.
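If you want to check what you're actually working with, the kernel and glibc will both tell you (a Linux-specific sketch - the sysfs paths assume a 2.6+ kernel):

```shell
# Per-CPU cache sizes as reported by the kernel (Linux sysfs layout)
grep . /sys/devices/system/cpu/cpu0/cache/index*/size 2>/dev/null

# glibc's view of the same numbers (sizes reported in bytes)
getconf -a | grep -i CACHE
```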

That said, I was able to find more information on the SSE thing, which agreed - basically the SSE implementation on the Athlons isn't all that awesome; it's there more for compatibility. The Athlons do, however, have a kick-ass 387 unit (which is what's used in place of SSE).

Hi there,

What if you have a small loop that if correctly aligned will fit entirely in a cache line and in the fetch buffer?

You want hot loops aligned for performance. The key word is *hot*: if the compiler aligns everything, then yes, it increases code size and can increase cache misses.

I don't know much about the AMD architecture implementations, but aligning hot loops on a POWER4/POWER5 does make a significant difference in performance.

As for the kernel, I think it is probably safe to try - you will either get a kernel that boots or one that doesn't. Since it is basically a hamstrung Hammer core it should be safe, but don't say I advised you to do it.

I have a mobile Sempron 2800+ (s754) @ 1600MHz with a k8 kernel; works like a charm.
I have the following CFLAGS: -O2 -march=i686 -pipe -fomit-frame-pointer, which give generally good performance, but I am planning to experiment with them to get a faster system.

I thought I'd throw my 2 pennies into the -Os vs -O2 debate. I've been using -Os not so much because of a lack of memory, drive space, or even cache (primarily because I didn't think about the cache), but because for me the bottleneck isn't the speed of a program once it is running in RAM, but loading the program from the real bottleneck - storage. Once an application is loaded, unless it is a 3D game, I notice no difference between optimizations. Not that there are no differences, as benchmarks will show, but since the computer is already faster than I am, I can't notice them. What I do notice is the wait from when I click on an icon until the program actually opens. Consider behemoth OpenOffice.org, which can take some time to load. If compiled with -Os instead of -O2 (which, last I checked, required some fiddling with the ebuild), it should load faster, which is what interests me.

I suppose if I had an application where a few percent of speed mattered (like transcode), I'd be more conscious of optimizing for the CPU vs. size.

Anyone else consider this aspect of size vs loop optimization?

EK

Yes, I actually use it (I've never benchmarked it, but I've read that it works well). However, my understanding is that prelink reduces the overhead of dynamically linking libraries to specific memory addresses, while -Os reduces the overall file size, which reduces the amount of data flowing over the IDE cable (or SATA these days), thus speeding up application launch time. I would also think it would make Linux's RAM disk cache more effective (by allowing more files to be stored in the cache at one time).

In other words, I would think combining prelink and -Os would reduce application load times, which to me is more important than how fast my word processor or browser is once it is running (as long as it is faster than I am).