This version exhibits better throughput, better latencies, better behaviour with scaling cpu frequency governors (e.g. ondemand), and better use of turbo modes on newer CPUs. It also addresses a long-standing bug that caused fluctuating performance and latencies; the bug affected all configurations but was only demonstrable on lower Hz configurations (i.e. 100Hz), so mobile configurations (e.g. Android at 100Hz) also perform better. The default round robin interval on all hardware is now set to 6ms (i.e. tuned primarily for latency). This can easily be modified with the rr_interval sysctl in BFS for special configurations (e.g. increase it to 300 for encoding / folding machines).
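For reference, on a BFS kernel the interval can be inspected and changed on the fly — a quick sketch (requires root, and assumes the rr_interval sysctl sits in its usual place under kernel/):

```shell
# Show the current round robin interval in milliseconds (6 by default):
cat /proc/sys/kernel/rr_interval

# Trade latency for throughput on a dedicated encoding / folding box:
echo 300 > /proc/sys/kernel/rr_interval

# To persist it across reboots, add to /etc/sysctl.conf:
#   kernel.rr_interval = 300
```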

Performance of BFS has been tested on hardware ranging from low power single core machines through SMP hardware of various configurations, both threaded and multicore, up to a 24x AMD machine. The 24x machine exhibited better throughput on optimally loaded kbuild performance (from make -j1 up to make -j24); performance beyond this level of load did not match mainline. On folding benchmarks at 24x, BFS was consistently faster for the unbound (no cpu affinity in use) multi-threaded version. On 6x hardware, performance at all levels of load in kbuild and x264 encoding benchmarks was better than mainline in both throughput and latency in the presence of the workloads.

This is not by any means a comprehensive performance analysis, nor is it meant to claim that BFS is better under all workloads and hardware than mainline. These are simply easily demonstrable advantages on some very common workloads on commodity hardware, and they constitute a regular part of my regression testing. Thanks to Serge Belyshev for the 6x results, statistical analysis and graphs.

Other changes in this patch release include an updated version of lru_cache_add_lru_tail as the previous version did not work entirely as planned, dropping the dirty ratio to the extreme value of 1 by default in decrease_default_dirty_ratio, and dropping of the cpufreq ondemand tweaks since BFS detects scaling CPUs internally now and works with them.

Same issue here, but I can't reproduce it reliably (sometimes it happens after 20 mins, sometimes after 6 hours). It happens when I am using chromium: suddenly the browser freezes and I can't kill the process; top and pstree -p get stuck, and ls /proc/* gets stuck too when it reaches the chromium pid. No messages in dmesg.

I've built the kernel with the most recent gcc 4.6 and the number of apps that actually run is minuscule. I get into xmonad just fine, but anything that apparently isn't urxvt or limited to the console just won't start – no errors whatsoever, they just immediately freeze (incl. non GTK / qt apps like dzen – eclipse's and libreoffice's start screens do show some progress, but then freeze as well). The standard (Arch Linux) kernel works just fine.

Thanks for your effort. 2.6.39 with the ck patches runs fine here. I've added the BFQ disk scheduler too. As always, load is over 1 most of the time (but this could be the result of the >80 open tabs in firefox and the nvidia driver ;) ). Most of the time app switching works perfectly without delay, but from time to time there is a delay of 5 sec. and more under heavy load >5. No clue where this comes from, but it could be an IO bottleneck from my laptop hdd. No matter, I'll live with that ;)

@Anonymous, no problem with VirtualBox 4.0.8 here with XP as guest, but I switched the IO APIC off some time ago, because users on the VirtualBox forums had mentioned performance drawbacks with it. There is a tool which allows changing this without a BSOD in XP; it could be usable on WS2008 too.

Interesting. I'm unable to reproduce any of those problems here. There used to be a problem with ultra-low dirty ratio settings in the past but I thought they were fixed in newer kernels. Perhaps you're running into a variant of those? Try "echo 5 > /proc/sys/vm/dirty_ratio" and see if it helps the problem.

Thanks, interesting. I don't use virtualbox so I'm not sure what you're seeing there, but I guess if I can find the time I'll give it a go. I tried chrome but it works perfectly fine here. Perhaps it's a config option? Can any of you having the problem link or email me your configs?

$ gcc --version
gcc (GCC) 4.6.0 20110513 (prerelease)
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

It happened again (after 20h of uptime), a few seconds after I opened a new tab (process) in chromium. I don't know what the trigger could be (I tried to push chromium to the limit after a reboot but nothing happened), so I am currently rebuilding my kernel with BFS disabled to see if that patch is the problem (so I can rule out the rest of the ck patchset as a cause).

Same problem here with gcc version 4.6.1 20110521 (prerelease) (Debian 4.6.0-8). I'm starting to wonder whether some part of userland is the real cause here though, since reverting to kernels that I know for a fact worked previously doesn't clear the issue up any.

When flinging crap at the wall to see what sticks in an attempt to fix this, I did just notice my XFS /home had some corruption from one too many power outages. Whether the issue will actually go away now remains to be seen.

I'm pretty glad this is happening to other people. I was starting to think it was something I'd done myself when I recently upgraded from 32-bit to 64-bit userland in-place without using any chroots or debootstrap or rescue media. It was quite the hack so I'm always looking over my shoulder for issues to arise from it.

The main advantage of your scheduler is that it lacks the heavy tail in the distribution. It is not about HZ, which is boooooring and trivial. So make this point about the tail clear and only talk about this in your announcement. Focus!

The second figure is the only one that is needed in the announcement. Explain it well. You should not talk about -j24; it is too technical and does not belong in the announcement. Link it and focus on your main point.

404-test2 definitely hangs here as well, just in different ways than regular 2.6.39-ck1. With the test2 patch applied, I don't even get to a desktop without the processes that are supposed to actually spawn the various GNOME processes hanging. Stuff like simple package upgrades with apt hung too.

I use XFS with the delaylog mount option. When I applied test2, the application (deluge) hung within a few minutes of starting.

Then I typed dmesg to see what happened. Actually, the following call trace repeated twice in dmesg:

INFO: task flush-8:0:798 blocked for more than 120 seconds.
Call Trace:
[] ? submit_bio+0x48/0xd0
[] ? schedule_timeout+0x12d/0x1a0
[] ? bio_add_page+0x54/0x70
.....

Thus I saw 3 messages in total:

task flush-8:0:798 blocked for more than 120 seconds...
task flush-8:0:798 blocked for more than 120 seconds...
INFO: task /usr/bin/deluge:2492 blocked for more than 120 seconds.
....

It never crashes the machine. If the app cannot write data to disk, memory usage grows because the data queues up. I checked the value of dirty_ratio, and it is 20 already.

There is a little difference from before.

a. Using bfs 404, the warning message appears within a few minutes. Checking with iostat, the I/O rate is zero, and the memory usage of this application starts growing.

b. Using bfs 404+test2, it works well (I tested it overnight).

c. Using bfs 404+test3, I checked memory usage and the I/O rate. When the I/O rate drops to zero, memory usage starts growing. However, the warning message doesn't always appear; I only caught it twice.

Ah okay so test2 was good for you. Sorry it was another anonymous poster that had hung tasks with test2 and deluge. See I'm losing track of who has posted what. The warning does not actually imply failure, but that I/O is taking quite a while to commit. Interesting that it behaves differently to mainline at all.

I suspect there may be another bug somewhere in there. I can't see what else I can do for that plug flushing code. I need to review more of the changes going into 2.6.39-bfs. Is bfs (401 or 404) on earlier kernels okay for you?

Hi Con! I also experienced problems with chromium as already posted above. Then I tried the test2 patch; for a while it kept going, until the I/O issues appeared.

The interesting things:
1. Without the test2 patch, when chromium hanged, if I launched htop to find and kill chromium it would lock up also, showing nothing but a blank console.
2. With the test2 patch, when I noticed the applications hanging (specifically: gentoo's emerge command) I tried to issue a "sync" command, which just locked up as if it couldn't sync the data.

Trying to shut down or reboot the system with some applications hanged will always result in a locked system while trying to stop the syslog-ng daemon (at least for me). At that point, I have to use the magic keys, sync, remount ro and reboot.

Please note that I use "crazy" dirty ratio settings by default:

vm.dirty_background_ratio = 95
vm.dirty_ratio = 95
vm.dirty_writeback_centisecs = 15000

which is done to save my SSD's remaining life as much as possible. I had no problems with 2.6.38.6-ck3 and those settings. I tried to lower them while testing 2.6.39 and the posted patches, but not even the default values would work.

Thank you for the great work and effort.
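For anyone wanting to experiment with the same knobs, the values can be applied at runtime without a reboot — a sketch (requires root; the reverted "defaults" below are mainline's stock values, not the ck patchset's):

```shell
# Apply the poster's aggressive writeback settings at runtime:
sysctl -w vm.dirty_background_ratio=95
sysctl -w vm.dirty_ratio=95
sysctl -w vm.dirty_writeback_centisecs=15000

# Revert to stock mainline values if things misbehave:
sysctl -w vm.dirty_background_ratio=10
sysctl -w vm.dirty_ratio=20
sysctl -w vm.dirty_writeback_centisecs=500
```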

Thank you all for your testing and replies so far. Well this is all starting to seriously piss me off. As far as I can tell these problems started appearing after rc7, and I can't really isolate anything in particular, nor can I reproduce it locally. I may just have to pull 2.6.39-ck1 as a stable release and not support it till I can figure out wtf is wrong :(

Whoa, test5 already o.o. I just got another hang with chromium. I am not sure if this is useful as I am still using the test3 patch and SLUB, but here is the sysrq(-t|-p) log: http://pastebin.com/VPvM3PYf

I have to reboot now, so I will rebuild my kernel with SLAB + test5 to continue the tests.
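For others trying to capture the same kind of log, the sysrq task and register dumps can also be triggered from a root shell when the keyboard combo isn't usable (e.g. over ssh) — a sketch, assuming CONFIG_MAGIC_SYSRQ is enabled in the kernel:

```shell
# Enable sysrq, then dump task states (t) and registers (p);
# the output lands in the kernel ring buffer, readable via dmesg.
echo 1 > /proc/sys/kernel/sysrq
echo t > /proc/sysrq-trigger
echo p > /proc/sysrq-trigger
dmesg | tail -n 200 > sysrq-dump.txt
```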

Did some testing; test3 and test5 don't work. I still get the very same issues (except that with test5 the lockup appears sooner than with test2/3).

After testing test3 I was curious whether these issues would also appear with CFS. So I did some more testing and found out that sometimes my main ext4 FS exhibits some concurrency problems (leading to a BUG and machine lockup), and I'm quite sure that it can be tracked down to the changes done to the FS in the 2.6.39 cycle. A lot of work has been done to maximize parallelism and delayed allocation in filesystems in general.

What I did notice is that when I build my kernel with BFS, the lockups occur only on the XFS filesystem that contains some other data (my root and home are on ext4). I'm thinking that reverting either one or both of these two commits should partially fix or hide the issues (hopefully not breaking BFS code again):

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7eaceaccab5f40bbfda044629a6298616aeaed50
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=b7ed78f56575074f29ec99d8984f347f6c99c914

Basically, fsync and filesystem syncing have undergone substantial changes and rework, and maybe that is the root cause of all the trouble. While CFS behaves quite well with these new implementations, BFS struggles (and I'm unable to tell why). But judging from the problems the ext4 FS is having, I guess this is not at all just BFS' fault...

If you're wondering, I'm pulling all of my knowledge from here: http://kernelnewbies.org/LinuxChanges. I'm hoping for a very fast 2.6.39.1 as of now =) Hope to have helped a little. Keep up the good work ;)

Hey Neo2. Thanks for your comments. You are, of course, right in saying the new block plugging code is responsible for this breakage. Having a task go to sleep that still is plugged is the major problem here and I'm trying to find a safe way to unplug it. The fact that BFS rapidly and easily reschedules something on another CPU is what's biting me here, and the subtleties of how best to work with the unplugging code without causing deadlocks or dereferences are failing me. The fact that testX fixes one filesystem while testY fixes another and testZ fixes yet another workload, suggests I'm still not tackling this correctly. All of this is compounded by the fact that I've never been able to reproduce these problems myself.

As much as I hate to say this, I have to give up on 2.6.39 for now. I just don't have the time nor energy to fix this. I'm grateful for all your testing, but it's just going to have to go on hold and I'll have to support .38 kernels in the meantime until I have a revelation of some sort.

If you're having troubles reproducing - the only thing I've found which hangs is my build of Chromium. I've had days of uptime with BFS .404+BFQ so long as I don't run chrome. It hangs after a few minutes, unkillable even w/ kill -9. "pidof chrome" also hangs unkillable. Rest of the system appears unaffected until trying to shut down at which point everything hangs. No panics yet. I'll try debugging (having an unrelated issue building with symbols).

I wanted to ask whether anyone was building Ubuntu packages with the different test versions? I'm getting the strange hangs with 2.6.39-ck1, running on hardware. I'd love to test the different patches, but I have too much on my mind right now to rig up a build environment ...

This one could tentatively be a winner. test1 and test2 left me with the same Chromium problems initially reported and test3-test7 didn't even allow me to get to my desktop. With test8, it's been 15 minutes so far without any noticeable fuckups. Hopefully this is the one and you can put this frustrating mess behind you once and for all soon.

Well that's a VERY reassuring sign, thanks! At least I have a postulated mechanism for what's going on now, but this needs more testing. The final version will be a little cheaper too, but I'll just wait till I get more people testing first. I must have jinxed myself by saying 2.6.39 seemed pretty good :s