A development blog about what Con Kolivas is doing with code at the moment, with an emphasis on the Linux kernel, MuQSS, BFS and -ck.

Monday, 24 October 2016

linux-4.8-ck4, MuQSS CPU scheduler v0.116

Yet another bugfix release for MuQSS and the -ck patchset, with one of the most substantial latency fixes yet. Everyone on a previous 4.8 patchset of mine should upgrade. Sorry about the frequency of these releases, but I just can't allow a known buggy release to remain the latest version.

I'm hoping this release will let me hold off on any more -ck versions until 4.9 is released, since it addresses all the remaining issues I know about.

A lingering bug that had been troubling me for some time was causing occasional massive latencies. Thanks to some detective work by Serge Belyshev, I was able to narrow it down to a single-line fix which dramatically improves measured worst-case latency, while throughput is virtually unchanged. The bug also had flow-on effects elsewhere, sometimes leaving CPU cycles unused and causing weird stalls on some workloads.

The sched_yield implementation has been reverted to the old BFS mechanism, which GPU drivers prefer; it previously didn't work on MuQSS because of the bug described above. The difference is now substantial: drivers (such as the proprietary nvidia driver) and apps that use sched_yield heavily (such as the Folding@home client) behave much better.
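To get a rough feel for how a yield-heavy application interacts with the scheduler, a tight loop of sched_yield() calls can be timed. This is a minimal sketch assuming a Linux/POSIX system; avg_yield_ns is a hypothetical helper, not anything from MuQSS, interbench or the nvidia driver, and it measures only the average wall-clock time per call, which includes any time other runnable tasks were given the CPU.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <sched.h>
#include <time.h>

/* Average wall-clock cost of one sched_yield() call, in nanoseconds.
 * Hypothetical helper for illustration only; it does not reproduce how any
 * GPU driver or Folding@home actually calls sched_yield(). */
double avg_yield_ns(long iterations)
{
    struct timespec start, end;

    assert(iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iterations; i++)
        sched_yield();   /* give up the CPU; returns at once if nothing else is runnable */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Comparing avg_yield_ns(100000) on an idle machine with the same call made while a make -j build is running gives a crude indication of how readily the scheduler hands the CPU to other tasks on each yield.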

The late-introduced bugs that got into ck3/muqss115 have been reverted.

Results now look quite good with interbench (my latency-under-load benchmark), which I have recently updated and which should now give sensible values:

If you're baffled by interbench results, the most important number is % deadlines met, which should be as close to 100% as possible, followed by max latency, which should be as low as possible for each section. I'll announce an official new release of interbench in the near future.
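The two headline numbers can be illustrated with a small summary function. This is not interbench's code: the latency_summary type and summarise function are hypothetical, and the rule that a sample meets its deadline when its latency is no greater than the deadline is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Summary of one interbench-style section (hypothetical type). */
struct latency_summary {
    double pct_deadlines_met;   /* 0..100, higher is better */
    double max_latency_ms;      /* worst case, lower is better */
};

/* Summarise per-sample latencies (ms) against a fixed deadline (ms).
 * A sample counts as "met" when its latency is <= deadline_ms. */
struct latency_summary summarise(const double *latency_ms, size_t n,
                                 double deadline_ms)
{
    struct latency_summary s = { 0.0, 0.0 };
    size_t met = 0;

    assert(latency_ms != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        if (latency_ms[i] <= deadline_ms)
            met++;
        if (latency_ms[i] > s.max_latency_ms)
            s.max_latency_ms = latency_ms[i];
    }
    s.pct_deadlines_met = 100.0 * (double)met / (double)n;
    return s;
}
```

For example, latencies of 1.0, 2.5, 20.0 and 0.5 ms against a 16.7 ms deadline (roughly one frame at 60 Hz) give 75% deadlines met and a max latency of 20 ms.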

Pedro, in the comments on a previous post, was also using runqlat from the bcc tools to test latencies, but after some investigation it became clear to me that the tool was buggy and also did not work properly with BFS/MuQSS, so I've provided a slightly updated version here which should work properly:

52 comments:

I have been an anonymous user of BFS for years now and I have been doing my own testing on MuQSS.

MuQSS had been working flawlessly on my laptop up to v0.111. When I compiled v0.115, I noticed a regression in the compiled kernel: under full load, the kernel with v0.115 starts to stall progressively and eventually freezes the laptop.

By full load I mean compiling a new kernel with 4 jobs while a web browser is running and glxgears is running under optirun (nothing unusual).

The laptop is an HP with an Intel(R) Core(TM) i7-4510U CPU @ 2.00GHz and 8GB of RAM, using p-state with the powersave governor. It has a hybrid GPU (Intel+NVidia).

I traced the problem back to a commit between v0.111 and v0.112, specifically commit fdd879d37e6ca088410511e9f1146c328700e92a.

I'm writing this message from a kernel with v0.111 plus all the patches up to v0.112 except the one in commit fdd879d37e6ca088410511e9f1146c328700e92a.

No. The behavior happens after a fresh boot. I normally never suspend/resume the machine, so now I'm even more puzzled about what could be happening. I even compiled a kernel from source, based on the default Debian kernel, with no patches except MuQSS. The results are the same.

Where do you think I should look at this stage? On my machine, that commit is evidently doing something outside the suspend/resume functions.

It could be some issue with the hybrid GPU driver brought out by some bizarre coincidence on MuQSS. Additionally, the only code in that patch that affects anything outside suspend is reverted in this one-liner I created for you: http://ck.kolivas.org/patches/muqss/Test/muqss116-test_task_not_wake_cpu.patch Anything's worth a shot.

That's great, thanks for testing it! It means your hardware/driver combination is somehow setting affinity on something repeatedly and running into problems. You can keep running with that patch; it should be safe. As for the BFQ warning, sorry, I know very little about BFQ, so you'd need to check with the author of the BFQ patch here: http://algo.ing.unimo.it/people/paolo/disk_sched/ . If it's a problem, you can also simply switch I/O schedulers on the fly.
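Switching I/O schedulers on the fly is done through sysfs: /sys/block/<dev>/queue/scheduler lists the available schedulers with the active one in brackets, and writing a scheduler's name to it (as root) switches that device. A minimal sketch, assuming a device named sda for the example path; active_scheduler and set_scheduler are hypothetical helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the active scheduler from the contents of
 * /sys/block/<dev>/queue/scheduler, e.g. "noop deadline [cfq]" -> "cfq".
 * Returns 0 on success, -1 if no bracketed entry is found or out is too small. */
int active_scheduler(const char *line, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL && outlen > 0);
    const char *lb = strchr(line, '[');
    const char *rb = lb ? strchr(lb, ']') : NULL;
    if (!lb || !rb)
        return -1;
    size_t len = (size_t)(rb - lb - 1);
    if (len >= outlen)
        return -1;
    memcpy(out, lb + 1, len);
    out[len] = '\0';
    return 0;
}

/* Switch scheduler by writing its name to the sysfs file (needs root).
 * Returns 0 on success, -1 on failure. */
int set_scheduler(const char *sysfs_path, const char *name)
{
    FILE *f = fopen(sysfs_path, "w");
    if (!f)
        return -1;
    int ok = fputs(name, f) >= 0;
    /* the buffered write only reaches the kernel at fclose(), where an
     * unknown scheduler name is reported as an error */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For instance, set_scheduler("/sys/block/sda/queue/scheduler", "cfq") would move sda off BFQ; the change does not persist across reboots.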

@ck: Yep, BFQ is next door. Paolo's door. I just glanced at the warning and, in my hurry, didn't think properly... My fault. In your opinion, what is the best approach to find out which hardware/driver combination is creating the problem?
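One speculative way to probe the theory that repeated affinity changes are the trigger is to run a small program that does nothing but move itself between the CPUs it is allowed on, alongside the usual workload, and see whether stalls appear without the hybrid GPU stack involved. This is a sketch under that assumption only; bounce_affinity is a hypothetical helper, and it is not known whether it mimics what the driver actually does.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Pin the calling thread to each CPU in its original affinity mask, one at a
 * time, for the given number of rounds, then restore the original mask.
 * Returns the number of successful single-CPU moves, or -1 on any failure. */
long bounce_affinity(long rounds)
{
    cpu_set_t orig, one;
    long moves = 0;
    int failed = 0;

    assert(rounds >= 0);
    if (sched_getaffinity(0, sizeof orig, &orig) != 0)
        return -1;

    for (long r = 0; r < rounds && !failed; r++) {
        for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
            if (!CPU_ISSET(cpu, &orig))
                continue;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            /* pid 0 means "the calling thread" */
            if (sched_setaffinity(0, sizeof one, &one) != 0) {
                failed = 1;
                break;
            }
            moves++;
        }
    }

    /* always try to put the original mask back */
    if (sched_setaffinity(0, sizeof orig, &orig) != 0)
        failed = 1;
    return failed ? -1 : moves;
}
```

Running bounce_affinity(1000000) while compiling a kernel would exercise the affinity-change path heavily; it returns the number of moves made, or -1 if a sched_setaffinity() call fails.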

@Eduardo: Thanks for the tip. Is it known whether the warning has consequences, or can it be disregarded?

I've put up new runqlat results for MuQSS 116 in both interactive and non-interactive mode.
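runqlat reports run queue latency (the time a task waits between becoming runnable and actually running) as a power-of-two histogram in microseconds. The sketch below shows that style of bucketing; it is not bcc's implementation, and log2_bucket, hist_add, hist_print and NBUCKETS are hypothetical names. Bucket 0 holds 0 to 1 µs and bucket k holds 2^k to 2^(k+1)-1 µs, similar to the ranges runqlat prints.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define NBUCKETS 32

/* Bucket index for a latency in microseconds: bucket 0 holds 0..1,
 * bucket k (k >= 1) holds 2^k .. 2^(k+1)-1. */
unsigned int log2_bucket(uint64_t usecs)
{
    unsigned int b = 0;
    while (usecs > 1) {
        usecs >>= 1;
        b++;
    }
    return b;
}

/* Add one sample to the histogram, clamping huge values into the last bucket. */
void hist_add(unsigned long hist[NBUCKETS], uint64_t usecs)
{
    unsigned int b = log2_bucket(usecs);
    hist[b < NBUCKETS ? b : NBUCKETS - 1]++;
}

/* Print non-empty buckets as "low -> high : count". */
void hist_print(const unsigned long hist[NBUCKETS])
{
    for (unsigned int b = 0; b < NBUCKETS; b++) {
        if (hist[b] == 0)
            continue;
        unsigned long long lo = b == 0 ? 0ULL : 1ULL << b;
        unsigned long long hi = (1ULL << (b + 1)) - 1;
        printf("%10llu -> %-10llu : %lu\n", lo, hi, hist[b]);
    }
}
```

Feeding it samples of 5 and 6 µs, for example, puts both in bucket 2 (the 4 -> 7 range).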

For the curious, I also tested throughput with different frequency governors on CFS. They make significant differences with make -j1, but at high load they seem roughly equivalent. Unfortunately I have no means of measuring the differences in power usage.

With v116 I'm now (again) seeing "NOHZ: local_softirq_pending 10" messages under load. I first saw those with early versions, but I'm sure they went away at some point (with or after v112, I think). Some kind of starvation?

I was being slightly facetious on that one. It's a warning from a CPU that has been told it's okay to go idle but then discovers there are still softirqs to service. Since rescheduling across CPUs is done locklessly on MuQSS, it may simply be a matter of disabling the warning, but I'll investigate to make sure nothing is being missed.
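The number in that message is a hexadecimal bitmask of the softirqs still pending (the kernel prints it with %02x), so "10" means bit 4 is set. A small decoder, assuming the softirq numbering of 4.8-era kernels (HI, TIMER, NET_TX, NET_RX, BLOCK, IRQ_POLL, TASKLET, SCHED, HRTIMER, RCU); decode_pending and softirq_names are hypothetical names, and other kernel versions may number the softirqs differently. Under that assumption, the "10" reported above is BLOCK_SOFTIRQ, i.e. pending block I/O completion work.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Softirq names in bit order, as in 4.8-era kernels (assumption: other
 * kernel versions may differ). */
static const char *const softirq_names[] = {
    "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
    "IRQ_POLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
};
#define NR_SOFTIRQ_NAMES (sizeof softirq_names / sizeof softirq_names[0])

/* Decode the hex value printed by "NOHZ: local_softirq_pending %02x",
 * print each pending softirq, and return how many known softirqs are set. */
int decode_pending(const char *hex)
{
    assert(hex != NULL);
    unsigned long mask = strtoul(hex, NULL, 16);
    int count = 0;

    for (unsigned int bit = 0; bit < NR_SOFTIRQ_NAMES; bit++) {
        if (mask & (1UL << bit)) {
            printf("bit %u: %s_SOFTIRQ\n", bit, softirq_names[bit]);
            count++;
        }
    }
    return count;
}
```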

I have noticed that the game ARK: Survival Evolved runs badly with the ck kernel when AMD Cool'n'Quiet is enabled. I'm getting very short FPS drops to 10 or 15 FPS every 1 or 2 seconds. With the stock kernel I get 20 FPS at worst, and only when moving around fast; otherwise it's around 30 FPS or more. Disabling AMD Cool'n'Quiet also fixes it. I have noticed that in ARK the CPU usage spikes a lot; maybe something isn't fast enough to clock the CPU higher? In game the CPU clocks between around 1400 MHz and 4200 MHz. I have an AMD FX-8350 and an NVIDIA GTX 1070 (I just got it and was a bit sad about those stutters).

I might have a very similar problem with VRQ (Alfred's kernel, https://cchalpha.blogspot.fi/), but in Diablo 3. When I start the game (via Wine, of course) using the VRQ kernel, the whole game stutters very badly and FPS is around 15 (normally about 80-90). When it stutters, everything freezes (even the mouse) for a second or so, then resumes, and so on. Maybe it's something specific that MuQSS has started doing as well. At the very least this really interests me, and I think it may interest Alfred as well.

What we have in common is the hardware manufacturer: I have a Phenom II 975 BE.

I should note that none of the previous kernels I used (Ubuntu, mainline, BFS and MuQSS < 115) except VRQ had this problem, but now I'll go and test the latest MuQSS as well. Yesterday LoL started stuttering badly too (after I installed MuQSS 116), but I thought this might be some other issue, like Mesa from git. I'll keep an eye on the system if this happens again.

I forgot to add that I play games using the "performance" governor, i.e. max frequency all the time. I have had Cool'n'Quiet enabled for ages and no other kernels show the issue except VRQ (I will test the latest MuQSS soon). The latest tests show that with the "ondemand" governor, VRQ does not show the problem. But I'll test again without C&Q on VRQ/MuQSS and post the results.

I think I have an answer to this. Cool'n'Quiet on AMD means dynamic frequency/voltage/etc. scaling. I just tried disabling it and the frequency stays at max speed at all times, which means it's about the same as running with the "performance" governor. In addition, when I run my gaming tests with the "ondemand" governor, the mainline kernel is always better by a large margin; BFS/MuQSS using "ondemand" are comparatively slow. Check https://docs.google.com/spreadsheets/d/1EayezAsGlJdXjZbS3b9m7YtvtRF-DJ3xrT3hYCvfymQ/edit?usp=sharing , page "Perf. (DRI2), OND GOV": you'll see BFS/MuQSS reaches about 30 FPS while the standard Ubuntu kernel reaches 42 FPS. This explains the gains and losses in Anonymous's tests.
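Because results differ so much between "ondemand" and "performance", it helps to confirm which governor each CPU is actually running before comparing kernels. A minimal sketch, assuming the standard cpufreq sysfs layout (/sys/devices/system/cpu/cpuN/cpufreq/scaling_governor); read_governor and cpu_governor are hypothetical helper names. If cpufreq is unavailable on a system, the file will not exist and cpu_governor returns -1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Read one line (e.g. the contents of a cpufreq scaling_governor file)
 * into out, stripping the trailing newline. Returns 0 on success, -1 on error. */
int read_governor(FILE *f, char *out, size_t outlen)
{
    assert(f != NULL && out != NULL && outlen > 0);
    if (!fgets(out, (int)outlen, f))
        return -1;
    out[strcspn(out, "\n")] = '\0';
    return 0;
}

/* Convenience wrapper for one CPU's sysfs file, e.g. cpu = 0 reads
 * /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor. */
int cpu_governor(int cpu, char *out, size_t outlen)
{
    char path[96];
    snprintf(path, sizeof path,
             "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_governor", cpu);
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int rc = read_governor(f, out, outlen);
    fclose(f);
    return rc;
}
```

For example, a successful cpu_governor(0, buf, sizeof buf) leaves the current governor name, such as "ondemand" or "performance", in buf.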

I'm not much of a Linux expert, but as far as I understand, Cool'n'Quiet is a BIOS function like Intel EIST: it underclocks the CPU when it's not in use. So I'm already using ondemand. Tomorrow I will try some other games, because ARK is a bad game to test with; I think I saw something similar in Saints Row 4. And yes, I tried disabling Cool'n'Quiet and it worked, but maybe it's just the extra FPS boost that did the trick.

I switched back to the ck kernel without the 4 patches and then again with the patches. Even though I said they don't change anything, they do make a small difference: especially when I have higher FPS in ARK, it works well and the stutters don't appear. So the only problem I have now is the relatively huge performance loss with ondemand. I will also try some Unigine benchmarks to see whether disabling Cool'n'Quiet, switching kernels or using the performance governor makes any difference.

My results (you have to download them to view them): https://www.dropbox.com/sh/dybposdl9t52u7o/AACT64VoaLQUoV3Yh6iAsS9ea?dl=0

You can see that the ck kernel with ondemand gives me around 200 fewer points in Unigine Heaven than the Arch Linux stock kernel, but using the performance governor has the same effect as disabling Cool'n'Quiet and gives both kernels equal performance again.

Yes, I'd like to second that proposal for people experiencing lag with plain v0.116: on my system, the 4 commits currently added on top cure a sluggish mouse pointer after it has been left unmoved for a while, as well as delays in updating window contents when a window is brought to the top in KDE (also during video playback, which formerly stuttered at that very moment). IMO these add-on commits are worth a try.

@ck: Also, after making heavy use of /dev/shm, backed by swap on a second disk, while Flash-streamed TV is playing in Firefox, video and other KDE windows recover almost immediately. This is a really remarkable advantage, as we all know about the swap bottleneck.

@ck: Out of curiosity I've also added the two newest commits on top of v0.116 (all six) and enabled the newly available settings:
CONFIG_TICK_CPU_ACCOUNTING=y
CONFIG_RCU_NOCB_CPU=y
CONFIG_RCU_NOCB_CPU_ALL=y
(I'm in doubt as to whether the latter two are useful.)

On resume I get the following warning, which you may want to have a look at: http://pastebin.com/FPAbxAv0