
Don't Look For Gentoo's CPU Optimization Options To Land In The Mainline Linux Kernel


The -march option isn't just about SIMD operations or newer instructions. It also tells the compiler about the specific characteristics of that μarch (with GCC, -march=X implies -mtune=X), so that known performance pitfalls can be avoided. If an instruction has a false dependency on some older μarchs (for example, on a destination register it doesn't actually read), the compiler will reorder the instructions or insert a dependency-breaking instruction, such as xor-zeroing that register, to improve performance on those μarchs. Or if an architecture suffers stalls from partial register updates, some workaround will be used. On the other hand, if information about the cache, μop buffer size, decoder properties and so on is known, the compiler can apply matching tactics, like aligning instructions or reordering branches.

Some famous examples on x86 are shr reg, imm8, adc reg, 0 and sbb reg, reg, which, depending on the μarch, have a fast special case, a dependency on the flags, a partial-register-update problem with 8/16-bit operations, or some combination of these. Or popcnt, which has a false dependency on its destination register on Sandy Bridge and Ivy Bridge. Or lea, whose performance depends on the addressing form and the μarch. Therefore a multiplication by a constant can be faster with lea or with imul, depending on which target you're compiling for.

Yep. In addition, and perhaps more importantly, -march can optimize for the specific CPU's cache layout and sizes.


Really, the biggest support issue comes in when someone has to replace the CPU with a less-capable one (or perhaps more likely, tries to move the disks to another system) and the instructions are not there anymore. This happens rarely, though.

Personally I use -march=native and disable everything I don't need (filesystem features like encryption or ACLs, memory cgroups, loadable module support, etc.), and I lower the maximum number of supported CPUs too. It can make a surprising difference on limited VMs: if not in raw CPU performance, then in kernel slab memory usage (which in turn can reduce cache pressure and I/O on a busy server with millions of files). It compiles faster as well! But I am a little crazy.


Quote: "...the instructions are not there anymore. This happens rarely, though."

This may be more common than you think, especially with Intel Atom processors, which have had the MOVBE instruction since the early models. Intel Core CPUs, however, only gained it with Haswell, and AMD's big cores only with Excavator (the low-power Jaguar cores had it earlier).


Quote: "...the biggest support issue comes in when someone has to replace the CPU with a less-capable one... and the instructions are not there anymore."

This is of course a real problem one may run into, but I don't see it as a valid argument against having a -march=native option built into the kernel, so that we don't have to apply that patch manually.

Distributions aren't going to compile kernels with -march=native unless someone like AMD decides to make a "Ryzen distribution" or something like that, where all the packages are compiled for that CPU architecture. That leaves people like you and me and others who compile the kernel themselves for some reason. If you know how to do that and choose to do it, you're going to be aware of the options you compiled your kernel with and deal with the consequences. Nobody's going to run to the bug tracker and file a bug saying "I compiled my kernel with -march=znver2 and now I can't use that kernel on a Pentium II, it's someone else's fault".


When kernel devs say "a couple of months" and it ends up meaning six years, it doesn't inspire confidence. Between this and the recent anti-ZFS API change, it seems like kernel devs don't care about freedom of choice. I thought part of the driving philosophy behind free (as in freedom) software was not being told how to use the software.


Quote: "...someone has to replace the CPU with a less-capable one (or perhaps more likely, tries to move the disks to another system) and the instructions are not there anymore."

Quite frankly, this is not a kernel issue though. If you compile your own kernel with hardware-specific features and then you change hardware it's your own problem.


The most important part of the reply is that "there was no measurable benefits": most of the kernel is handcrafted to fit in cache lines (and we're talking L1 cache here).

For the more specific things (SSE*, AVX*, NEON, etc.) there are handcrafted asm implementations that use them and handle all the side effects those instructions cause (lazy saving and restoring of FP state, etc.).

The kernel is a pretty special use case for a compiler. Avoiding unintended side effects is one issue; another is potential slowdowns due to, e.g., the inability to use lazy operations because the compiler thinks it knows better.