Tomek, your problem most likely isn't CK-patch related. I use this patch on linux 4.0 on Arch (with some other patches) with SDDM and it works fine. It's possible that you are building against some linux 3.19-era files, and, perhaps most importantly, that your proprietary NVIDIA driver isn't built for linux 4.0. Try building a CK-patched kernel against the configs from Arch's linux 4.0 (in testing) and use it with nouveau, or try building nvidia-ck with the patches for linux 4.0 (also in testing; there is a thread about it on Arch's BBS as well).

Hi, I also got kernel panics using archlinux [3] with linux 4.0 + CK1. I was unable to retrieve any useful message from the system logs, but I was able to take some pictures when the kernel panic happened at boot time [1] [2].

With linux-ck-haswell (4.0-1) on Arch Linux, I see SATA-bus-related errors (failed command: WRITE FPDMA QUEUED) and SATA link resets which cause all mounts to stall. The kernel does not panic, though; I can escape with Ctrl+Alt+Del (no SysRq needed).

I have posted the relevant syslog part on the Arch Linux forums: https://bbs.archlinux.org/viewtopic.php?pid=1523519#p1523519.

A few days ago, I installed a system-monitor tool and noticed that CPU usage looks odd under BFS. Core 1 is used much less than the others; core 2 is somewhat better, but most of the work is done by cores 3 and 4. As an example, here is the CPU utilization while compressing a large file (left is with ck1, right is with cfs):

That is interesting. Are you using nice levels at all anywhere in your environment or are they used automatically at all by your applications in question? Alternatively is there anything that might be setting CPU affinity for your applications?

But now that you mention it, I remember that I played with /sys/block/sd*/queue/rq_affinity a while ago. And ***, it is still set to 2; I forgot to reset the value. I'll test again with the default value.
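For anyone who wants to check this on their own box, here is a minimal shell sketch for inspecting the rq_affinity value mentioned above (the sd* glob is an assumption; adjust it to your disk names):

```shell
#!/bin/sh
# Inspect the rq_affinity setting on each SATA/SCSI disk queue.
# 0 = any CPU may complete I/O, 1 = same CPU group as the submitter
# (the kernel default), 2 = force completion on the exact submitting CPU.
for q in /sys/block/sd*/queue/rq_affinity; do
    [ -e "$q" ] || continue   # glob did not match: no sd* disks on this machine
    printf '%s: %s\n' "$q" "$(cat "$q")"
done

# To restore the default on all disks (needs root):
#   echo 1 | tee /sys/block/sd*/queue/rq_affinity
```

Note the setting does not survive a reboot unless something (a udev rule, a boot script) re-applies it, which is how a stale "2" can linger unnoticed.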

Also, I have irqbalance installed; I'll test without it. If that doesn't work, I will build a new kernel without any patches other than BFS.

CK - A growing body of evidence seems to point to disabling NUMA as a cause of the panics under linux 4.0.x with ck1. I will report back once additional folks have a chance to test. At least 3 users have now reported no panics when NUMA was left enabled. More to come.

OK. Several users (five as I type this) have reported that when NUMA is disabled and they are running linux-4.0.1 + CK1 + bfs462-1st-change.patch, the kernel panics are back: discussion thread with details.

Ok, I must amend my earlier post. It only seemed that these kernels with BFS were stable; under heavy I/O (including network I/O), all my machines with BFS are crashing. I first saw it on my server, where writing data over the network to the RAID5 NFS share leads to a crash. It also occurs on my desktop machine, but more rarely. Enabling/disabling NUMA doesn't help; only disabling BFS works. Tested with the zen kernel 4.0.0 .. 4.0.3. So my last working kernel with BFS (on my server) is 3.17.x. The crashes started with 3.18. After your 3.18 patch to resolve it, the problems were apparently gone (as I already wrote on your site), or perhaps I just didn't stress-test it enough.

Maybe the problem in the current kernel stems from those old 3.18 changes, but that is only my guess.

PS: I know the 3.17 kernel line is out of support, but for now I prefer an old kernel with BFS over a current one with CFS ;)
PPS: BFQ is enabled in zen too, but it doesn't affect the crashes (already tested).

Interesting. I was having some crashes recently while experimenting with btrfs commands on an external drive (mainly scrub or btrfs-convert). I was starting to think it could be due to bfs bugs and you have the exact same problem. Do you know if the rtmn patch fixes this bug?

Thanks for the patches. The users are reporting no effect with these two patches (ie still kernel panics) when NUMA is disabled. Just like before, if we enable NUMA, no one has reported a panic. I don't know if the NUMA status + CK1 is to blame for the panics or if it merely catalyzes them. We stand by to test any other patches you can offer up.

Thanks as always. As I've been trying to say, NUMA is just papering over the issue, as I don't expect anyone should have to enable NUMA for an ordinary kernel to work. However, I don't actually know what the issue is, so enabling NUMA is a decent workaround till I happen to find whatever it is. There is no NUMA-specific code in the latest kernel, so it's sheer coincidence, and so far the circumstantial evidence points to the assembly changes in do_fork for x86 being responsible somehow. What exactly, I don't know, and finding time to go through this with a fine-toothed comb is hard.

@graysky
From your thread, I have noticed that the -3 test kernel with my -gc patch set but NUMA disabled seems to work? Right? If this is true, would you please try the -gc patches on top of v4.0.2? If it is still confirmed true, please narrow the patch set down to this commit.

4.0.1-3-ck has NUMA enabled and uses your patches --> no panics
4.0.1-4-ck has NUMA enabled and does not use your patches --> no panics
4.0.1-5-ck has NUMA disabled and uses CK's attempted patches --> panics

This trend was also confirmed in 4.0.2...
4.0.2-1-ck has NUMA disabled and uses CK's attempted patches --> panics
4.0.2-2-ck has NUMA enabled and uses CK's attempted patches --> no panics

So the only common thread I am seeing is NUMA disabled = panics :/
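In kernel .config terms, the workaround that has held up across all the test kernels above amounts to a single option (leaving NUMA enabled until the underlying bug in the non-NUMA build is found):

```
CONFIG_NUMA=y
```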

Do you still feel that using that commit + CK1 + NUMA disabled would be a worthwhile experiment?

Hey ck; for your reference: on my uniprocessor setup, which happens to use graysky's Arch AUR package, I tested with NUMA enabled to see whether it also resolved my boot panic the way enabling SMP does, in case the two might be related. But no... I'm running 4.0.3 now: NUMA off, SMP/HT on.

OK, this issue is solved for me now, and it's most probably not related to BFS/CK (though they may trigger it more often). I was able to get rid of the unreliability of TuxOnIce (resume often hanging at "Doing atomic copy/restore") by changing the .config options related to my graphics driver. The only working combination is to compile DRM into the kernel and i915 as a module (not both into the kernel, and not both as modules). Tested on 4.0.4+BFS, 4.0.4 with the -gc branch, and 3.19.8 with -gc.
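For reference, the working combination described above corresponds to this .config fragment (using the standard symbol names for the DRM core and the Intel driver):

```
CONFIG_DRM=y        # DRM core built into the kernel
CONFIG_DRM_I915=m   # i915 built as a module
```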

Thank you very much! It works well for me on top of Alfred's current -gc.

BTW, can someone check whether this commit is also worth applying to BFS/CK? "sched: always use blk_schedule_flush_plug in io_schedule_out": https://github.com/torvalds/linux/commit/22f546a33bac11aea8af5e570f296234ecdd60d4

Any progress on fixing the issues with the patch? I haven't had any problems with the patches because I've always had NUMA enabled (it's enabled by default in Ubuntu's configs), but I would like to see this issue resolved so we can move on to kernel 4.1.

Thanks for your reply. I have experimented a bit with git, but I don't use it frequently, so I tend to forget. ;) I build some rpm packages for PCLinuxOS, so it is much more convenient to have a versioned patch file that can be added to the src.rpm. Thanks for all of your work. I will likely just wait for the next official -ck release.

Sure. Nothing's happened for 4.1, and no progress has been made on fixing the non-NUMA build bug. As usual, once I start working on it I will finish shortly afterwards, but so far I've been too busy to do anything.

Send me a machine with 6 cores and I would be glad to test it for you :) The benchmarks I published nearly 3 years ago include a dual quad-core machine with and without HT enabled. If you are so inclined, you may use the underlying bash scripts to benchmark your 6-core machine and post the data here. I am happy to plot the results for you.