This version adopts the same locking scheme that mainline uses when trying to wake tasks up, which is advantageous on workloads with many busy processes and on machines with many CPUs. This matters because the main reason for moving to multiple runqueues was to minimise contention on the global runqueue lock in BFS (as mentioned here numerous times before), and this wake-up scheme makes the most of the multiple discrete runqueue locks.
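Here is a rough illustration of why discrete runqueue locks help. It is a minimal userspace sketch in C and an analogy only, not MuQSS or mainline kernel code. `struct runqueue`, `rq_lock`, `rq_unlock`, `init_runqueues` and `wake_task_on` are hypothetical names, and a C11 `atomic_flag` spinlock stands in for the kernel's spinlocks. A wakeup takes only the lock of the runqueue it targets, so wakeups aimed at different CPUs never touch the same lock.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_CPUS 4

/* Hypothetical per-CPU runqueue: every runqueue carries its own lock. */
struct runqueue {
    atomic_flag lock;
    int nr_running;
    unsigned long lock_acquisitions; /* times this particular lock was taken */
};

static void rq_lock(struct runqueue *rq)
{
    /* Spin until the previous value was clear, i.e. we now own the lock. */
    while (atomic_flag_test_and_set_explicit(&rq->lock, memory_order_acquire))
        ;
    rq->lock_acquisitions++;
}

static void rq_unlock(struct runqueue *rq)
{
    atomic_flag_clear_explicit(&rq->lock, memory_order_release);
}

static void init_runqueues(struct runqueue rqs[NR_CPUS])
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        atomic_flag_clear(&rqs[cpu].lock);
        rqs[cpu].nr_running = 0;
        rqs[cpu].lock_acquisitions = 0;
    }
}

/*
 * Wake a task onto target_cpu. Only that CPU's runqueue lock is taken, so
 * wakeups aimed at different CPUs never contend with each other. With one
 * global lock (the BFS design), every wakeup would go through the same lock.
 */
static void wake_task_on(struct runqueue rqs[NR_CPUS], int target_cpu)
{
    assert(target_cpu >= 0 && target_cpu < NR_CPUS);
    struct runqueue *rq = &rqs[target_cpu];

    rq_lock(rq);
    rq->nr_running++;
    rq_unlock(rq);
}
```

Waking 12 tasks round-robin across 4 CPUs takes each runqueue lock only 3 times. A single global lock would have been taken all 12 times.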

Note that this change is much more significant than those in the last few releases, so new instability is possible. Please report any problems or stack traces!

When I started out, I used lockstat on one particular workload to get an idea of how much lock contention was going on and how long it lasted. With the first incarnations of MuQSS, a 14-second benchmark with thousands of tasks on a 12-CPU machine acquired locks 3 million times and had almost 300k contentions, the longest lasting 80us. Now the same workload grabs the lock just 5k times, with only 18 contentions in total, the longest lasting 1us.
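The lockstat figures above can be put in proportion by working out the contention rate, which is contentions per lock acquisition. The original counts are approximate ("almost 300k", "just 5k"), so treat these results as rough.

```latex
\begin{aligned}
\text{contention rate} &= \frac{\text{contentions}}{\text{lock acquisitions}} \\
\text{first MuQSS versions:}\quad & \frac{3 \times 10^{5}}{3 \times 10^{6}} \approx 0.1 = 10\% \\
\text{now:}\quad & \frac{18}{5 \times 10^{3}} \approx 0.0036 = 0.36\% \\
\text{drop in acquisitions:}\quad & \frac{3 \times 10^{6}}{5 \times 10^{3}} \approx 600\times
\end{aligned}
```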

This clearly demonstrates that the goal of avoiding lock contention has been achieved. It does not translate into performance improvements on ordinary hardware today, because you need ridiculous workloads on many CPUs to even begin deriving an advantage from it. However, now that even our phones have reached 8 logical CPUs, it is only a matter of time before 16 threads appear on commodity hardware. That same complaint was directed at BFS when it came out 7 years ago, yet such hardware still hasn't quite appeared. BFS was shown to be scalable for all workloads up to 16 CPUs, and beyond for certain workloads, but suffered dramatically for others. MuQSS now makes it possible for what was BFS to remain useful much further into the future.

Again, MuQSS is aimed primarily at desktop, laptop and mobile device users for the best possible interactivity and responsiveness. It is still very simple in its approach to balancing workloads across CPUs, so there are likely to be throughput workloads where mainline outperforms it, though there are almost certainly workloads where the opposite is true.
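The sketch below is a rough picture of how simple cross-CPU balancing can be. It is a simplified sketch and not the MuQSS source. It assumes a design in which a CPU compares the earliest-deadline task at the head of every runqueue and uses a trylock, so that a contended runqueue is skipped rather than waited on. `struct rq`, `best_deadline`, `rq_trylock`, `rq_unlock` and `pick_runqueue` are hypothetical names. A deadline of 0 is assumed to mean an empty runqueue. Real details such as skip lists, affinity, priorities and SMT siblings are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 4

/* Hypothetical runqueue: only the deadline of its first (best) task is modelled. */
struct rq {
    atomic_flag lock;
    uint64_t best_deadline; /* assumption: 0 means the runqueue is empty */
};

static bool rq_trylock(struct rq *rq)
{
    /* test_and_set returns the previous value: false means we got the lock. */
    return !atomic_flag_test_and_set_explicit(&rq->lock, memory_order_acquire);
}

static void rq_unlock(struct rq *rq)
{
    atomic_flag_clear_explicit(&rq->lock, memory_order_release);
}

/*
 * Return the CPU whose runqueue holds the earliest deadline, or -1 if every
 * runqueue we managed to lock is empty. Contended runqueues are skipped.
 * A real scheduler would keep the winning runqueue locked while dequeuing
 * its task; this sketch only reports the choice.
 */
static int pick_runqueue(struct rq rqs[NR_CPUS])
{
    int best_cpu = -1;
    uint64_t best = UINT64_MAX;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        struct rq *rq = &rqs[cpu];

        if (!rq_trylock(rq))
            continue; /* someone else holds it: don't spin, just skip it */
        if (rq->best_deadline != 0 && rq->best_deadline < best) {
            best = rq->best_deadline;
            best_cpu = cpu;
        }
        rq_unlock(rq);
    }
    return best_cpu;
}
```

Skipping a busy runqueue can miss a slightly better task for one scheduling decision. The trade-off is that no CPU ever waits on another CPU's lock just to look.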

I've now addressed all planned changes to MuQSS and, for a little while, hope to look only at bug reports rather than doing further development. In my eyes it is now stable enough to replace BFS in the next -ck release, barring some unexpected showstopper bug.

EDIT: If you blinked, you missed the 107 announcement, which was quickly superseded by 108.

72 comments:

Gah! I just finished building 107... :-P An update: after our previous discussions, I did some load testing and noticed that during a kernel build, CPU1 was always at 0%. So I switched back to vanilla and built 107 for x64 to see if it fixed the CPU issue. It appears that it did, as it's building now under MuQSS 107 with CPU{0,1} both at 100%, and Firefox couldn't care less. :) Will report back tomorrow, and hopefully get to my i686 cross-compile too. -jwh

Darn, that xfs problem keeps coming back. It's definitely a lingering bug and I need to figure out why. Even if everything runs fine afterwards, it's a sign that something could potentially go wrong. I know what the bug means (the worker is temporarily running on the wrong CPU), but I thought I'd fixed every way that might happen.
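The invariant that this kind of bug breaks can be illustrated from userspace: a task bound to a single CPU should always find itself running on that CPU. The sketch below is only an analogy for a per-CPU kernel worker and is not the workqueue code. `bind_to_first_allowed_cpu` and `running_on_bound_cpu` are hypothetical helpers built on the Linux-specific `sched_getaffinity`, `sched_setaffinity` and `sched_getcpu` calls. A scheduler bug of the kind described above would show up as `running_on_bound_cpu` returning false, even briefly.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>

/*
 * Restrict the calling thread to the first CPU in its current affinity mask.
 * Returns that CPU number, or -1 on failure.
 */
static int bind_to_first_allowed_cpu(void)
{
    cpu_set_t allowed, target;

    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, &allowed))
            continue;
        CPU_ZERO(&target);
        CPU_SET(cpu, &target);
        if (sched_setaffinity(0, sizeof(target), &target) != 0)
            return -1;
        return cpu;
    }
    return -1;
}

/* The check a bound worker relies on: am I on the CPU I was bound to? */
static bool running_on_bound_cpu(int bound_cpu)
{
    assert(bound_cpu >= 0);
    return sched_getcpu() == bound_cpu;
}
```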

@Con: I thought the warning trace posted above had nothing to do with xfs. I just found out that xfs is in my .config as a module, and I immediately set CONFIG_XFS_RT to n. I don't use xfs on my machine at all, btw.

Yet another round of tests: https://docs.google.com/spreadsheets/d/1ZfXUfcP2fBpQA6LLb-DP6xyDgPdFYZMwJdE0SQ6y3Xg/edit?usp=sharing

This time I used intel_pstate with the performance governor instead of acpi-cpufreq with performance. This allows longer uptime, since I don't need to reboot to switch frequency drivers. On CFS, both drivers with the performance governor give approximately the same results (see '4.7 bfs502').

Throughput is good on my 2c/4t CPU. However, I'd like to see some results with many cores to see where MuQSS really shines against BFS. No errors to report after 6 hours of uptime. Well done Con! Maybe now the flow of patches will dry up :)

On a side note, I noticed you posted a 'bfs512-fixes2_1.patch' for bfs512. Does this patch include all the fixes you gathered with MuQSS? I ask because 'bfs512 + bfs512-fixes.patch' was giving me errors with xfs.

Thanks for those results, as always, Pedro. Indeed it appears that the interactive=0 mode is quite redundant. I'm thinking of getting rid of it entirely unless I make some dramatic changes to it, which I'd basically prefer not to bother with. The BFS fixes2 patch still won't fix the xfs problem (which, it seems, affects both BFS and MuQSS).

Hey Manuel; FWIW, I don't get that workqueue.c warning. I also compared the v107 and v108 dmesg outputs (in case the warning showed up in the latter), and they are essentially identical. I'm on an old Athlon64 X2 system, though; you're on Intel, I recall? I also don't have TOI, and I didn't compile in BFQ; I'm using CFQ at the moment (rotational drive). All working well so far, Con! -jwh

@-jwh, @CK: Yes, it's an Intel dual core without HT. But it's still on kernel 4.7.6, and my .config has been unchanged since I began testing MuQSS. In the last step I went from 106 to 108. But this is absolutely NOT a hot issue; I haven't suffered any disadvantage vs. 106 since getting this warning. BR, Manuel Krause

Somehow "they" have messed up the 4.7.7 release today. The commits went out to the tree, but no patch is available. My TOI patch base is badly outdated... but the modified version kept working up to 4.7.6, OK, only without encryption. I adjust it from one kernel version to the next. It's annoying work, and for 4.8 I haven't got past all the obstacles so far. "We can win, if we want..." 8-D BR, Manuel Krause

Running two audio players at the same time (VLC + Audacious) against PulseAudio 9.0, while editing a PDF document in WINE via PDF-XChange Editor, occasionally browsing the web (e.g. to look up words) and compiling Firefox or webkit-gtk.

The desktop runs on the NVIDIA proprietary drivers, with 'threadirqs' appended to the kernel command line. To cut down latency, the priority of the relevant IRQs, WINE, Audacious, VLC, Compiz and X is raised.

All that leads to no disturbances in playback, or at most minimal ones. There have been only two so far. One was a "blip" of less than a second (about 1/3 or 1/4 of a second) during playback in Audacious. The other was a delayed sound segment (1-2 seconds) interspersed while regular playback continued in Audacious. These seemed to happen far more often in the past.

I might need to raise the audio quality in PulseAudio to see how well MuQSS keeps up.

webkit-gtk and Chromium are the worst offenders. In those extreme cases, with CFS and some BFS versions, latency and smoothness usually suffer badly, especially PDF annotating in WINE, audio playback and mouse movement.

The compilations are reniced, but under CFS that usually doesn't seem to help or have any effect.
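Renicing a build from inside a program comes down to `setpriority(2)`, which is also what `renice` uses for a given process. The sketch below uses hypothetical helpers (`renice_self` and `current_nice` are illustrative names) that set and read the calling process's nice value. Raising your own nice value never needs privileges, while lowering it usually does.

```c
#include <assert.h>
#include <errno.h>
#include <sys/resource.h>

/* Set the calling process's nice value; returns 0 on success, -errno on failure. */
static int renice_self(int nice_value)
{
    assert(nice_value >= -20 && nice_value <= 19);
    if (setpriority(PRIO_PROCESS, 0, nice_value) != 0)
        return -errno;
    return 0;
}

/*
 * getpriority() can legitimately return -1, so errno must be cleared first
 * and checked afterwards to tell an error apart from a nice value of -1.
 */
static int current_nice(int *out)
{
    errno = 0;
    int prio = getpriority(PRIO_PROCESS, 0);
    if (prio == -1 && errno != 0)
        return -errno;
    *out = prio;
    return 0;
}
```

From the shell, a build can simply be started niced in the first place, e.g. `nice -n 19 make -j4`.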

There is no regression in compilation time for Firefox or SQLite. Compilations are done in RAM (zram) on a Btrfs filesystem created fresh after every boot, which mostly rules out I/O limitations.

Note that several of the security-enhancing memory randomization features in 4.8 are enabled. These might also raise latency and increase load somewhat, but so far all looks well.

Once I've collected enough experience, I'll do another stage4 build (which I don't usually do that often, but let's see how it compares).

Sounds like a problem with a yield implementation rather than anything else. If your driver doesn't use sched_yield, it wouldn't be a problem. There are literally thousands of discussions regarding the (in)correct use of sched_yield, and the battle has raged for decades. I'll take a look at yield and see what it's doing. Meanwhile, you could try running your application as SCHED_ISO to see what that does (just as a data point):

schedtool -I -e mpv
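`schedtool -I` requests the SCHED_ISO policy for the program it launches. A program can ask for the same policy itself through `sched_setscheduler(2)`. The sketch below assumes the -ck (BFS/MuQSS) numbering `SCHED_ISO == 4`. Mainline headers reserve that number but do not implement it, so on a mainline kernel the call is expected to fail with EINVAL. `try_sched_iso` is a hypothetical helper and not part of any library.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Assumption: SCHED_ISO as numbered by the -ck (BFS/MuQSS) patches. */
#ifndef SCHED_ISO
#define SCHED_ISO 4
#endif

/*
 * Ask for SCHED_ISO for the calling process. Returns 0 on success or -errno;
 * a kernel without SCHED_ISO support is expected to return -EINVAL.
 */
static int try_sched_iso(void)
{
    /* Assumed to require priority 0, as SCHED_NORMAL does. */
    struct sched_param param = { .sched_priority = 0 };

    if (sched_setscheduler(0, SCHED_ISO, &param) != 0)
        return -errno;
    assert(sched_getscheduler(0) == SCHED_ISO);
    return 0;
}
```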

Ah, in that case try NOT using SCHED_ISO. It IS a realtime policy, so if the application is written loosely enough that its threads can cause priority inversion, it can make performance dramatically worse.

I've added a 002 pending patch for 108 as well, which does something I forgot to do and improves performance a little more. (I originally posted this in the wrong MuQSS version announcement; even I'm losing track.)

Just wondering: wouldn't it be more convenient to have a single public git repo (GitHub?) where you can simply push your changes? You could also tag a commit to show which release it is, and branches would make it much easier to search for patches.

I ran simple Unigine tests on my machine, and version 108 seems to be the best. It's within the error margin, but still... https://docs.google.com/spreadsheets/d/1EayezAsGlJdXjZbS3b9m7YtvtRF-DJ3xrT3hYCvfymQ/edit?usp=sharing

Would it make sense to run the tests with something in the background, like the "stress" utility?

Can you please look at this crash: http://pastebin.com/5m22kAar. I was compiling another kernel in VirtualBox while watching a video on YouTube, and it crashed. It gradually stopped responding to anything I did, and I had to power off the laptop. Kernel: 4.7.6 + 108 + wbt + bfq.

I gave 'muqss108 + 2 pending patches' another shot with the runqlat utility. I had to revert 2 commits in the kernel to get runqlat working (see this bug report: https://github.com/iovisor/bcc/issues/728). As a reminder, the test consists of building ffmpeg with an increasing -j count while running runqlat. This time the results are in msecs, and I ran runqlat for a much longer duration (120s instead of 10s) in the hope of catching high latencies if there are any.

I put the results in the spreadsheet. I'd like to make better charts, but Google Sheets doesn't have many options. The latencies are much higher with MuQSS. I don't know whether these results are expected, but they are reproducible.

Yes, it's possible for now, though I doubt you mean milliseconds, since according to that the latencies were over 32 SECONDS at one stage. I have an idea what might be causing them and will be posting a new pending patch soon.
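runqlat reports latencies as a power-of-two histogram, so the unit matters a great deal when reading the bucket labels. The sketch below uses hypothetical helpers (`log2_bucket` and `bucket_bounds`) to show how a value maps to its bucket and which inclusive range that bucket covers. It illustrates the bucketing idea only and is not bcc's code. If the unit really were msecs, anything in the bucket starting at 32768 would mean a latency of more than 32 seconds.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the power-of-two bucket holding v: floor(log2(v)); 0 and 1 share bucket 0. */
static unsigned int log2_bucket(uint64_t v)
{
    unsigned int b = 0;

    while (v > 1) {
        v >>= 1;
        b++;
    }
    return b;
}

/* Inclusive range [lo, hi] covered by bucket b (bucket 0 also holds 0). */
static void bucket_bounds(unsigned int b, uint64_t *lo, uint64_t *hi)
{
    assert(b < 64);
    *lo = (b == 0) ? 0 : (uint64_t)1 << b;
    *hi = (b == 63) ? UINT64_MAX : ((uint64_t)1 << (b + 1)) - 1;
}
```

For example, 40000 falls in bucket 15, which covers 32768 to 65535. Read as msecs, that is between 32.768 and 65.535 seconds.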

Manuel, could you please try without BFQ first, and then without WBT? I know you all love BFQ, but there are just too many patches adding too many variables to this testing for me to know what's going on, especially with BFQ.

Stopping testing yesterday evening turned out to be a good decision. A few hours ago I added all 7 accumulated 108 pending patches (+muqss106-009) on top of my otherwise unchanged (sorry ;-) ) 4.7.7 setup. Although uptime is still quite short, I wanted to let you know early that everything seems to be working well again. I'm going to test it longer (and, if errors occur, start re-testing with the other patches removed).

@ck+: For normal usage everything went very well; only my TOI failed on resume from the 3rd hibernation. I'm currently cross-testing this newer kernel, with the same setup and .config, against Alfred's very last 4.7 patch. The aim is to see whether the 4.7.5 to 4.7.7 transition introduced malfunctions before I remove TOI/BFQ/WBT. Unfortunately, I'm rather attached to this formerly well-working combination. :-/

Does anyone know whether in-kernel suspend-to-disk has got better or faster over the last year?

I'm not doing any formal testing or benchmarking, but I did build MuQSS 108 for the 4.7.7 kernel on PCLinuxOS. It is the most responsive desktop I've seen so far. Great work! (I'm also using BFQ.)

Hey Con; on my i686 build, I tried to disable SMT/SMP/HT, but the build failed. Enabling SMT fixed it. I'm wondering whether something equivalent to bfs497-fix_smt_nonice.patch might be needed?

Also, when I re-enabled SMT, the kernel config asked me to choose the CPU governor. Does it make any difference to MuQSS which one is selected here (e.g. ondemand vs schedutil)?

FYI, on the x64 dual-core (it's through patch 3 at the moment), I forgot to change its make config when I cross-compiled i686 (it was set to -j4, as the machine used to be a Phenom X4). The load average reached almost 6, and I didn't notice any sluggishness at all!

Thanks jwh7. I'll look into the build issue, but I'll worry more about fixing all build configurations once the remaining bugs are shaken out. Enabling SMT as a workaround for now won't cause any demonstrable detriment. It makes no difference which governor you choose on MuQSS. Glad to hear you're benefiting from the interactivity too.