MuQSS - The Multiple Queue Skiplist Scheduler v0.112

It's getting close now to the point where it can replace BFS in -ck releases. Thanks to the many people testing and reporting back, some other misbehaviours were discovered and their associated fixes have been committed.

In particular,
- Balancing across CPUs was not looking at higher and lower scheduling policies correctly (SCHED_ISO, SCHED_IDLEPRIO and realtime policies)
- A serious stall/hang could happen with tasks using sched_yield (such as f@h client and numerous GPU drivers)
- Some minor accounting issues on new tasks with affinity set were fixed
- Overhead was further decreased on task selection
- Spurious preemption on CPUs where the preempted task had already gone is now avoided
- Spurious wakeups of CPUs that were assumed idle but no longer are idle are now avoided
- A potential race in suspending to ram was fixed
- Old unused code from BFS was removed, along with unnecessary intermediate variables
- Cleanups
- Some work was done towards actually documenting MuQSS in Documentation/scheduler/sched-MuQSS.txt, though it is still incomplete

22 comments:

I've been trying various versions of MuQSS (while staying on BFS for production/work) and muqss-112 is the first one that passes my "wiggle test": running a niced kernel build on all 8 vcores, playing an HD movie in vlc and frantically jiggling a terminal window around no longer causes stalls or jerks; it's completely smooth all the time, as if idle. Well done! \o/

One suggestion: I've noticed that the global_rq contains various atomic_t counters. It might be a good idea to make them cacheline-aligned so that they don't incur false sharing, which can lead to pretty pathological stalls esp. with contended SMT threads. I can create a GH pull if you like.
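To illustrate the suggestion, here is a minimal userspace sketch in C11 of what cacheline-aligning such counters could look like. The struct name, the field names and the 64-byte line size are assumptions made for illustration and are not taken from the MuQSS source; in the kernel itself one would more likely reach for the existing ____cacheline_aligned_in_smp annotation than for alignas.

```c
#include <assert.h>     /* static_assert */
#include <stdalign.h>   /* alignas */
#include <stdatomic.h>  /* atomic_int, atomic_fetch_add_explicit */
#include <stddef.h>     /* offsetof */

/* Assumption: 64-byte cache lines, which is typical for x86. */
#define CACHELINE_BYTES 64

/*
 * Hypothetical stand-in for a global runqueue. Each counter is placed
 * on its own cache line so that a CPU updating one counter does not
 * invalidate the line holding another (false sharing).
 */
struct global_rq_sketch {
    alignas(CACHELINE_BYTES) atomic_int nr_running;
    alignas(CACHELINE_BYTES) atomic_int nr_uninterruptible;
    alignas(CACHELINE_BYTES) atomic_int qnr;
};

/* Compile-time check that neighbouring counters are a full line apart. */
static_assert(offsetof(struct global_rq_sketch, nr_uninterruptible) -
              offsetof(struct global_rq_sketch, nr_running) >= CACHELINE_BYTES,
              "counters must not share a cache line");

/* Illustrative helper: bump a counter and return its new value. */
static inline int grq_inc_running(struct global_rq_sketch *grq)
{
    return atomic_fetch_add_explicit(&grq->nr_running, 1,
                                     memory_order_relaxed) + 1;
}
```

The trade-off is memory: each counter now occupies a whole line, so this struct grows to three cache lines instead of fitting in one.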

I'm not certain whether it's only my imagination, but with .112 responsiveness with compiz is very, very good: no delays during app switching right now

Might run compiz with sched_yield later to see how that works out

Also, portage (Gentoo Linux's package manager) seems to work really quickly with it

Once a new Chromium version is out I'll do a compilation test (update) and see if I can run additional backup jobs to really stress the system to see if I can still do work (that would be close to the ultimate test, well - in the extreme probably adding a game to the mix - we'll see about that ;) )

Thanks, KoT. It's probably not your imagination, since there were quite significant scheduling logic issues that went unfixed until 112. It should now be equal to or better than BFS in every way, and as you see from the comments section here, you're not the only one who's noticed it.

Yes that's correct, thanks for testing it. I may be able to go back to the old way of yielding (like BFS) now that I've fixed other bugs in the code but your testing needs to confirm that's where the problem lies.

Running this new scheduler on a 2009 Mac Mini with an Intel Core Duo processor: how much of a slowdown would one experience compared to using BFS? How big is the overhead of this "it takes a thousand" CPUs scheduler?

The idea is that this scheduler is a drop-in replacement for BFS where you won't notice any difference at all; this is why it took me years to come up with a design that had the best of both worlds. It should be perfectly fine on an old Mac Mini.

@ck: The issues I reported last time for v0.111 have completely gone away with MuQSS v0.112 (without changes to the rest of the system software). Thank you for your great work! With this test run I've also been lucky to rediscover a tunable for my ancient TOI revision, named "no_flusher_thread", which defaults to 1 and, now set to 0, makes the whole combination (MuQSS, BFQ, WBT, TOI) work fine without failures or performance regression. I'm glad that I can report 10 successful hibernations, done from time to time, within 1 1/2 days of uptime atm. Maybe that tunable eases some race condition or timing issue that an effective MuQSS exposes in those old TOI algorithms. It's painful that I don't have enough programming knowledge to interpret it in depth.

Running MuQSS (by means of the Liquorix kernel), also see https://liquorix.net/atom

Just earlier, the combination of PulseAudio suspended via pasuspender (to run an application using ALSA) while alt-tabbing to Google Chrome to double-check something caused an unrecoverable stall; a clean shutdown was no longer viable (it would've probably taken hours, everything was incredibly slow).