The bugs fixed in this version were the broken uniprocessor builds, the poor interactivity caused by a completely miscalculated deadline, and the suspend/resume regression. Please keep on testing and hopefully I can declare this one a stable release!

EDIT: I have re-tested and confirmed, as reported by others, that 411 is indeed slower than 410 and actually slower than the stable 406. This is really disappointing, as 410 was only faster by virtue of a bug that would favour throughput (and cause lousy interactivity). I've tried further optimising the code, but alas it appears that for our desktop workloads, skiplists are not the way forward.

I'm getting horrible hiccups with this version. It reminds me of the days of a computer with an ISA bus, where more than one piece of hardware wanted to use the same IRQ but wouldn't be very nice about sharing it.

Some examples: When I click on an input box in Firefox, it can take a full second or more for the focus to get there, and scrolling up and down on this page is not at all smooth. When I start gnome-alsamixer, the application is not usable for five seconds, but the menu bar is responsive right away.

But maybe I didn't apply the correct patch. Is ck1-bfs406-411.patch meant to be applied to 3.0.4-ck1?

CK - there are some serious regressions introduced in v0.411 compared to v0.410. The data below are for just my quad core machine. The benchmark is simply compiling linux v3.0.4 (bzImage + modules) and timing the process. I have run 5 replicates for each version of BFS, as you can see in the box plots. I have also repeated this experiment on a C2D and a dual quad machine (16 cores) and the trends are consistent: v0.410 > v0.406 > v0.411 > stock kernel.

Do you have any thoughts as to which patch you introduced that causes this regression?

Note: I will compile v0.410 + the deadline patch and run it now. This way you can potentially eliminate the deadline patch as the cause of the regression.

Hrm, interesting. Thanks. The deadline fix patch would indeed lower throughput. However, it is also a correct fix for a bug in the design which would inappropriately favour throughput. Gotta keep watching...

So skiplists and the offset fix are mutually exclusive? What are my other options? In other words, what configure switch would need to be tweaked to avoid that "choppy" behavior with skiplists? ...or is this a foregone conclusion and was your last post a serious statement? :/

Then again, it's possible that I simply haven't optimised the code paths enough in the skiplists implementation. I'll try and shave every bit off before giving them up, because they really SHOULD be better.

Glad to hear it, CK! I will post my full analysis later today (probably 15-16 h from now, after I have time to work up and review the data). I'd also like to see someone else repeat my experiments to verify reproducibility. I have provided the needed file and script to do so; read on.

What I did: 1) Configured a linux-3.0.4 source tree with the default Arch Linux .config (on x86_64) and used this as the test code set.

2) Booted a system into runlevel 3 with a minimal set of daemons enabled and ran a little bash script that times how long it takes to do a "make -jX bzImage modules", repeating the process 5 times in total and writing the results to ~/data.txt.
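For anyone who wants to reproduce this, a minimal sketch of such a timing loop might look like the following. This is a hypothetical reconstruction, not the exact script used; the function name, source tree path, and job count are all assumptions:

```shell
#!/bin/sh
# run_bench: time N repetitions of a command, appending wall-clock
# seconds for each run to an output file (hypothetical helper name).
run_bench() {
    cmd=$1; runs=$2; out=$3
    i=1
    while [ "$i" -le "$runs" ]; do
        start=$(date +%s)
        sh -c "$cmd" >/dev/null 2>&1
        end=$(date +%s)
        echo "run $i: $((end - start)) s" >> "$out"
        i=$((i + 1))
    done
}

# Example invocation (path and -j value are assumptions; match -j to
# your core count):
# run_bench "make -C $HOME/linux-3.0.4 -j4 bzImage modules" 5 "$HOME/data.txt"
```

Remember to `make clean` between runs (or build from a fresh tree) so each replicate does the same amount of work.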

It seems the problem I reported with the hiccups might be a false alarm. I upgraded to Xorg 1.11 at the same time that I upgraded to BFS 0.411. After running for a while on the official prepackaged Debian kernel, I have been experiencing these same hiccups. It is just not as severe on the Debian kernel. I'm going to try to see if I can still get the previous Xorg 1.10 packages that were in Debian testing and let you know how BFS 0.411 works with it.

I got Xorg reverted back to 1.10 and everything seems to be working just like it did with BFS 0.406. I did some reading and it looks like the issue I was having was with the nvidia binary blob. I read that the latest driver that is supposed to be compatible with Xorg 1.11 has some "drawing issues." Next time I'll try to confirm a bug on Debian stable before I send a bug report.

The original O(n) search was heavily optimised and only includes tasks queued but not running (i.e. not those already on CPU). The thing about O(n) is that it tells you the complexity of the algorithm, but not really how fast it is overall. A fast O(n) algorithm can be faster than an O(1) algorithm across all workloads if the constant cost of the O(1) algorithm is very high. My guess with these skiplists is that the overhead of the extra data structures and the O(log n) insertion is costing much more than any benefit at search time. Both insertion and removal are more expensive with skiplists; only lookup is faster.

I use the make -jx benchmark because it's easy to demonstrate and reproduce. Surprisingly, it's also a very good make-or-break test. Every other more complicated benchmark I could think of throwing at the skiplists has also failed to find a single case where it's better. I really wanted it to be, but it's not.