I piddled with all kinds of ways to get around calling wake_affine() entirely, and/or calling it with the affine candidate, to no avail. Best result was always to do the silly looking thing, namely test the current cpu for the wake affine decision, but slip in the shared cache cpu.
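
As a toy illustration of that "silly looking thing": the following is a userspace sketch, not the kernel code, and only one reading of the description above. Every toy_* name, the topology struct, and the load comparison standing in for the affine decision are made up for illustration; the real wake_affine() weighs considerably more than this.

```c
#include <stdbool.h>

#define NR_CPUS 8

/* Hypothetical toy topology: cpus with the same cache_id share a cache. */
struct toy_topology {
    int  cache_id[NR_CPUS];
    bool idle[NR_CPUS];
    int  load[NR_CPUS];
};

/* Stand-in for the affine decision. NOT the kernel's wake_affine(). */
static bool toy_wake_affine(const struct toy_topology *t,
                            int this_cpu, int prev_cpu)
{
    return t->load[this_cpu] <= t->load[prev_cpu];
}

/* Return an idle cpu sharing a cache with 'cpu', or -1 if there is none. */
static int toy_idle_cache_sibling(const struct toy_topology *t, int cpu)
{
    for (int i = 0; i < NR_CPUS; i++)
        if (i != cpu && t->idle[i] && t->cache_id[i] == t->cache_id[cpu])
            return i;
    return -1;
}

/*
 * Make the affine decision against the current cpu, but if it says
 * "go affine", hand back an idle shared cache cpu when one exists.
 */
static int toy_select_cpu(const struct toy_topology *t,
                          int this_cpu, int prev_cpu)
{
    if (!toy_wake_affine(t, this_cpu, prev_cpu))
        return prev_cpu;

    int sibling = toy_idle_cache_sibling(t, this_cpu);
    return sibling >= 0 ? sibling : this_cpu;
}
```

The point of the shape is that the decision and the placement are decoupled: the test is done once against the waker's cpu, and the idle cache sibling only gets substituted at the end.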

I bet the below helps, though there will still be cache misses, so there will still be pain for extreme switchers. Question is whether the ramp-up gain is worth it. I think yes, since it's up to 100%. Would be most excellent to find a way to know in advance when the cost will be too high, and then not go there. Same applies for doing the affinity decision every time for extreme switchers. It's expensive for those, especially so when they're pinned, but pays in the general case.

Anyway...

PREFER_SIBLING is set at the CPU domain level if you don't have power saving set, so you get to eat cache misses for each cpu, whether it's sharing a cache or not, as you traverse. Lots of CPUs, LOTS of pain.