Thanks(?) to the massive changes to the mainline kernel, I've been forced to rewrite significant components of BFS to work properly with them, specifically with the cpu frequency governors. At the same time I've had quite a bit of energy and enthusiasm for working on BFS in a way I haven't had in a long time. As a result, this updated version not only addresses the remaining cgroup stub patch bug (mentioned in the previous announcement) but also implements further improvements, and cleanups to go with those improvements.

Alas I still have no explanation for the random lockups some people are seeing, but I have seen reports of it happening on mainline kernels as well now, so while I'm always suspicious of my own code, there is also the chance that BFS exacerbates an issue in mainline. Something that appears common is onboard Intel graphics with the Haswell chipset.

Additionally I had reports of people being unable to suspend with BFS from 4.7 but I haven't heard back from them on later versions.

The short summary of improvements in this version is lower overhead, higher throughput and lower latencies.

I've rewritten the skiplist implementation to not require a malloc/free on insertion/removal of a new node, which seemed to noticeably improve throughput at high loads.
Now that CPU frequency governors know what the scheduler is doing, the old BFS approach of knowing what the governor was doing and working around it is no longer helpful. I've removed the whole sticky task and offset for throttled CPUs, and throughput has actually improved as a result.
I've also added some micro-optimisations and cleanups.
I've added a minor change for offlining CPUs to prevent tasks trying to schedule to them.
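
To illustrate the skiplist change described above, here is a minimal sketch of an "embedded" (intrusive) skip list, where the node lives inside the object being queued so that insertion and removal only relink pointers and never call malloc/free. This is not the BFS code: the names (skip_node, skip_list, task, task_of), the fixed height of 8 levels and the xorshift level generator are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKIP_LEVELS 8   /* hypothetical fixed height, not taken from BFS */

/* Node embedded in its owner, so linking and unlinking never allocate. */
struct skip_node {
    uint64_t key;
    int level;                              /* highest level this node is linked at */
    struct skip_node *next[SKIP_LEVELS];
    struct skip_node *prev[SKIP_LEVELS];
};

/* Hypothetical stand-in for a task; only the embedded node matters here. */
struct task {
    int pid;
    struct skip_node node;
};

struct skip_list {
    struct skip_node head;                  /* sentinel; every level is circular */
    uint32_t seed;                          /* state for the level generator */
};

/* Recover the owning task from its embedded node. */
#define task_of(n) ((struct task *)((char *)(n) - offsetof(struct task, node)))

static void skip_init(struct skip_list *l)
{
    for (int i = 0; i < SKIP_LEVELS; i++)
        l->head.next[i] = l->head.prev[i] = &l->head;
    l->head.level = SKIP_LEVELS - 1;
    l->head.key = 0;
    l->seed = 1;
}

/* xorshift32 step; each trailing 1 bit promotes the node by one level. */
static int skip_random_level(struct skip_list *l)
{
    uint32_t x = l->seed;
    int level = 0;

    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    l->seed = x;
    while ((x & 1) && level < SKIP_LEVELS - 1) {
        level++;
        x >>= 1;
    }
    return level;
}

/* Link a caller-owned node in key order; equal keys stay in FIFO order. */
static void skip_insert(struct skip_list *l, struct skip_node *n, uint64_t key)
{
    struct skip_node *update[SKIP_LEVELS];
    struct skip_node *p = &l->head;

    assert(n != &l->head);
    /* Walk down from the top level, remembering the predecessor per level. */
    for (int i = SKIP_LEVELS - 1; i >= 0; i--) {
        while (p->next[i] != &l->head && p->next[i]->key <= key)
            p = p->next[i];
        update[i] = p;
    }
    n->key = key;
    n->level = skip_random_level(l);
    for (int i = 0; i <= n->level; i++) {
        n->next[i] = update[i]->next[i];
        n->prev[i] = update[i];
        update[i]->next[i]->prev[i] = n;
        update[i]->next[i] = n;
    }
}

/* Unlink without searching: the prev pointers make this O(levels). */
static void skip_remove(struct skip_node *n)
{
    for (int i = 0; i <= n->level; i++) {
        n->prev[i]->next[i] = n->next[i];
        n->next[i]->prev[i] = n->prev[i];
    }
}

/* Lowest-keyed node, or NULL if the list is empty. */
static struct skip_node *skip_first(struct skip_list *l)
{
    return l->head.next[0] != &l->head ? l->head.next[0] : NULL;
}
```

A caller would embed a skip_node in each queued object, call skip_insert(&list, &t->node, key) to queue it and skip_remove(&t->node) to dequeue it. The trade-off in this sketch is that every node carries pointer arrays for all levels whether it uses them or not, in exchange for never touching the allocator on the hot path.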

The set of patches in ck4 is the largest in the ck patchset since the early 2.6 patchset days. I've also included the patch from Alfred (thanks!) to fix the warning that happens with suspend which is mostly harmless.

Each patch included has a mini changelog at the top.

I'm also keen to get feedback from people on whether they see any noticeable interactivity/responsiveness regressions when disabling the interactive flag as follows:

38 comments:

Thanks Con. 4 bfs releases in 2 weeks, well done! I feel the need to run throughput tests again against bfs 497. The performance is the same as with bfs 490, and almost on par with bfs 467 (linux 4.4). https://docs.google.com/spreadsheets/d/1ZfXUfcP2fBpQA6LLb-DP6xyDgPdFYZMwJdE0SQ6y3Xg/edit?usp=sharing

I'll run the rq latencies test later. I think this will close the chapter of bfs on linux 4.7.

@ck It seems that the code changes have finally settled down. I'm also working on an embedded version of the skip list, which is similar to the skip list changes in 0497, along with some other skip list improvements. I'll release the code after cleaning up the commits in a day or two.

I have finished the code for my embedded version and new implementation of the skip list for BFS. In short, there is an improvement compared to the baseline version. If you are interested, please visit http://cchalpha.blogspot.com/2016/09/about-skip-list-in-bfs.html and http://cchalpha.blogspot.com/2016/09/new-implementaion-of-skip-list-for-bfs.html for details.

I've been using bfs 497 for half a day and the first lockup is here. It does not seem to be an issue directly with bfs, but most likely bfs triggers this error, which is some sort of graphics memory management problem (that's as far as I know). On previous BFS versions, as well as Alfred's VRQ versions or Ubuntu standard kernels, there are no issues like this. There were no system updates in between using 490 and 497 either.

I think I've heard of problems with rsync, and from another person with nvidia and Haswell too. I only know my own problem exactly: with the stock kernel and bfs 472 there was nothing, but the updates make my PC unusable due to freezes when gaming or browsing (I use hardware acceleration in Firefox Nightly and Chromium with VA-API support). That may exacerbate the hangs, but the problem is there in some way. (I couldn't try the new version yet, but some people on the AUR page have posted about freeze problems, and more in the Arch forum. Tell us if you need testing or something.)

I've been on the 4.7.3-5 ck kernel (Haswell) without the patch for approximately half an hour, with no freezes so far. Before, the freeze occurred within minutes when gaming and usually within half an hour when browsing, but for now it appears stable for me. I'll post the results.

The version without the patch is working perfectly on my machine, thanks for your work Con. I couldn't test the patch because it isn't in the repo, but I have no need for it because bfs 497 is perfectly stable and reliable, temperatures included (but it has been colder in the city these days too xD).

I am the one from Arch forum who had problems using rsync to backup ~22G while running linux-ck-piledriver 4.7.3. I am now running rsync tests with Graysky's latest release, 4.7.3-5, from linux-ck repo, and so far after hours I haven't encountered freezes.

While using this, I had an issue where Discord (a VOIP client) just started lagging out on my Intel Core m5 with it getting insufficient CPU, which didn't happen on the stock scheduler or on previous versions of BFS.

Interesting and no doubt related to the cpufreq changes. Which governor are you using? pstates? I assume you have hyperthreading enabled on your machine as well? If you know how to, try briefly setting the pstate governor to performance instead of powersave. Additionally try disabling hyperthreading in your bios temporarily. Neither of these is a solution; they are only to help me diagnose what the issue might be.

Using p-states. Unfortunately, this is the Core m5 suffering the issues, so: A) Setting it to performance mostly just cripples the machine to the point where Discord works about as badly anyway. B) The BIOS is crap enough that I can't disable hyperthreading. (It also crashes regularly on boot due to ACPI issues caused by a rampantly out of spec ACPI table. I hate this BIOS so much.)

Sure. The other problem with testing the Discord thing upon reflection is that Discord worked great for about 40 minutes (sorry, I was in a hurry putting down the initial post so I didn't think to mention that) and then just completely broke due to the CPU starving it until I restarted the system into the regular setup.

Then are you sure you're actually blaming the right thing? Could be unrelated screwage with Discord. You could also try disabling interactive mode in the interim which won't require a new kernel.

After that, add the two patches in the testing directory. They add a tunable that will allow the cpufreq load estimator to test a variety of different mechanisms. You can set values from 0-5 in /proc/sys/kernel/smt_load. 3 is how ck4 is set. Try 0 and 5.

Yeah, fair enough. I'll try the interactive mode thing the next time I've got a good long shot at using Discord. Sorry about the replies here, I just kind of jumped to blaming this given how the earlier 4.7 revision had turned out for me.

Con, the kernel build fails with bfs497 and all the patches in pending applied when SMT_NICE is disabled (SMT_NICE=y builds fine). I managed to build successfully with the following modification. However I'm not sure it is correct.

On Haswell E here. I'm running with the performance governor turned on. Though not sure if it's getting initialized properly. In the boot logs I receive "ENERGY_PERF_BIAS: Set to 'normal', was 'performance'." Though, cpupower still shows the governor as performance.

The only thing I noticed with the freeze is that it happens if I have a long running process taking up all of the cores, and start up another process that also requires heavy cpu usage.

This is how it always happens for me. CLion is loading a particularly large cmake file. (Unreal just uses one giant cmake file) Then in Virtualbox I start up a Windows 10 VM. After a few seconds I freeze. It also happens if Discord is running during heavy cpu use.