Unfortunately, this dump would be quite hard to make heads or tails of.

Can you replicate the bug with a kernel that has CONFIG_KALLSYMS turned on? Otherwise you will need to run ksymoops on this data with your System.map...
_________________
Intel Core i7 2700K@ 4.1GHz/HD3000 graphics/8GB DDR3/180GB SSD
What am I supposed watching?

Initially, I thought it might be caused by CONFIG_PGTABLE_MAPPING. So I unset it and recompiled this kernel. I guess that was not the cause, as I got it again.

Unfortunately, I don't know what causes this bug and when it happens. I've had it a few times, but not everyday or every time I do something. I don't know how to replicate it.

CONFIG_KALLSYMS is not turned on in this kernel. I need to recompile this kernel with it enabled, and then wait until I'm able to replicate this BUG.
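A minimal sketch of how one might check a kernel config for CONFIG_KALLSYMS before rebuilding. The function name has_kallsyms and the default path /usr/src/linux/.config are assumptions for illustration; /proc/config.gz can be passed instead, but it only exists when the running kernel was built with CONFIG_IKCONFIG_PROC.

```shell
# Succeeds (exit status 0) if the given kernel config enables CONFIG_KALLSYMS.
# $1: path to a kernel config, plain or gzip-compressed (hypothetical default below).
has_kallsyms() {
    cfg="${1:-/usr/src/linux/.config}"
    case "$cfg" in
        *.gz) gzip -dc "$cfg" ;;   # e.g. /proc/config.gz
        *)    cat "$cfg" ;;
    esac | grep -q '^CONFIG_KALLSYMS=y$'
}

# Usage:
#   has_kallsyms /proc/config.gz && echo "symbols built in" || echo "no KALLSYMS"
```

The anchored pattern deliberately ignores CONFIG_KALLSYMS_ALL and the "# CONFIG_KALLSYMS is not set" comment form.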

I don't know how to use ksymoops. But I found a redhat man page. Not sure I can get my head around it.

And I can't find ksymoops in gentoo repos. But I found this:

Code:

$ cat /usr/src/linux/scripts/ksymoops/README
ksymoops has been removed from the kernel. It was always meant to be a
free standing utility, not linked to any particular kernel version.
The latest version can be found in
https://www.kernel.org/pub/linux/utils/kernel/ksymoops together with patches to
other utilities in order to give more accurate Oops debugging.

Keith Owens <kaos@ocs.com.au> Sat Jun 19 10:30:34 EST 1999

I see .rpm and .tar.gz packages in there. Do I just emerge that with --usepkg?
_________________
"Growth for the sake of growth is the ideology of the cancer cell." Edward Abbey

Hi NeddySeagon, I could try 4.18, but I don't know how to make this BUG happen. For example, I have been using this laptop since I last had that oops and posted this topic. I have rebooted a few times since. But it hasn't happened again yet.

I don't know if this coincided with me testing nftables. I removed those modules from my kernel config, as I like to keep it clean and minimal. I don't know whether this matters either.

Last edited by josephg on Thu Sep 27, 2018 5:25 pm; edited 1 time in total

At this point, since the problem isn't easily repeatable and ksymoops may not be readily accessible, you might as well just build a new kernel with the debug symbols built in, ready for the next event.

should i rebuild my kernel with debug symbols for normal runtime all the time, or just for catching bugs?

this bug hasn't happened all day today either. so i'm not really sure when it will happen again. i know when it happens though, because this laptop kinda hangs for a brief while and then comes back to life again like everything's ok.

I haven't used ksymoops for ages, got tired of keeping track of System.map -- all my kernels are built with the symbols -- makes oops/panics much easier to debug, and never have to deal with raw oops especially if it's not logged (e.g., "Not Syncing" or if failure happens before syslogd/journald starts).
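As a rough illustration of "built with the symbols", a kernel .config fragment along these lines is one possibility. Which options the poster actually enables is not stated in the thread, so this selection is an assumption; all three are standard kernel options.

```
# Assumed selection, not taken from the poster's config.
# Keeps the symbol table in the kernel image so oops traces show function names.
CONFIG_KALLSYMS=y
# Also includes data symbols, not only functions.
CONFIG_KALLSYMS_ALL=y
# Adds file and line information to BUG() reports.
CONFIG_DEBUG_BUGVERBOSE=y
```

These mainly make the kernel image somewhat larger, which is presumably why they can be left on for everyday kernels.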

Looking back, I think I got them after I started playing with nftables. I didn't get any further kernel bugs in my dmesg, after I removed all nftables from kernel and my system. I wonder if these could be linked. Perhaps not? So I enabled the nftables modules again in my kernel. And I caught this bug again yesterday, and the following one today

Interesting. This would indicate some problem with the memory manager / control groups and has no hints of network related issues. Whether it's a side effect of what you're using in your kernel has yet to be determined. The fact it's repeatable indicates your machine is building the kernel the same way.

Now I'd agree with Neddy and suggest trying a newer kernel, might be a bug in the particular kernel you're using that's being triggered by testing with nftables.

I wonder if it has something to do with the backpatch of PTI... :-( I have not been able to trigger this oops yet though I see some fairly similar oops on the goog.

eccerr0r wrote:

Interesting. This would indicate some problem with the memory manager / control groups and has no hints of network related issues. Whether it's a side effect of what you're using in your kernel has yet to be determined.

Shall I enable/disable either or both?

Code:

CONFIG_MEMCG=y
# CONFIG_MEMCG_SWAP is not set

eccerr0r wrote:

The fact it's repeatable indicates your machine is building the kernel the same way.

It's repeatable, but I am not able to replicate this behaviour with something particular that I might be doing.

eccerr0r wrote:

indicates your machine is building the kernel the same way.

Hmm I am using ccache.. suspect? On a slow machine, it is a boon. I'll try rebuilding this kernel without ccache, and see if it makes a difference.

eccerr0r wrote:

Now I'd agree with Neddy and suggest trying a newer kernel, might be a bug in the particular kernel you're using that's being triggered by testing with nftables.

I'm on btrfs, and suffered major filesystem corruption when I moved up from 4.9 and briefly tried 4.12, which was labelled stable and then relabelled unstable shortly thereafter. I had debian, ubuntu, arch, slackware, and a few other distros on different subvolumes and I lost all of them. I wasn't too bothered as I was primarily using gentoo by then. I was able to recover most of my gentoo files (not the filesystem) and could resurrect it on a fresh new filesystem. And I lost most of my hair at that time. Now I'm reluctant to jump kernels without evaluating the kernel changelog for btrfs. It seems there are major changes between 4.14 and 4.18, and if I jump that far ahead I might not be able to backtrack to 4.9, currently my most stable kernel.

Yes, I have 4.9 as my rock stable backup kernel, with no issues at all. I started using 4.14 recently, after 4.18 was announced as the next LTS. If I continue to have problems with 4.14, I'd move back to 4.9 rather than jump forward to 4.18. This comes from having been burnt many times over on the bleeding edge. I wouldn't mind if this were a test system, but I use this one as my daily driver.

eccerr0r wrote:

I wonder if it has something to do with the backpatch of PTI... I have not been able to trigger this oops yet though I see some fairly similar oops on the goog.

Is there something I could try to trigger this oops? I got another one last night, but none yet all day today. But then, I haven't started nftables yet.

Then you may be okay to disable... thought that both openrc and systemd used cgroups to control resources during startup, maybe not.
It also will affect containers and if you're using it for portage, apparently. But it's not been needed (with consequences thereof) for years.

Well, the thing about the memory...how much memory is it using when the oops occurs, perhaps that is harder to gauge. Perhaps filling cache/buffers up can trigger the issue faster...
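One way to fill the page cache on purpose is simply to read a lot of files. The sketch below is only an illustration of that idea: the function name fill_page_cache and the default directory /usr are assumptions, and whether this actually provokes the oops is untested.

```shell
# Read every regular file under a directory so its contents land in the page cache.
# Prints the total number of bytes read.
# $1: directory to read (hypothetical default: /usr)
fill_page_cache() {
    dir="${1:-/usr}"
    # -xdev keeps find on one filesystem; unreadable files are skipped silently.
    find "$dir" -xdev -type f -exec cat {} + 2>/dev/null | wc -c
}

# Usage:
#   fill_page_cache /usr
```

The cache can only grow to roughly the size of free RAM, so a tree larger than installed memory is needed to keep it under pressure.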

eccerr0r wrote:

Then you may be okay to disable... thought that both openrc and systemd used cgroups to control resources during startup, maybe not.
It also will affect containers and if you're using it for portage, apparently. But it's not been needed (with consequences thereof) for years.

ok, i've been testing with MEMCG on and off, and i think this problem happens more frequently with it on. i ran without memcg for a few days, and no problems. today i turned it on, and kernel oops.

i think i can safely rule out nftables causing this, as i've been on nftables for the past few days without any kernel bugs popping up. my testing might not be very scientific, and i still can't make it happen on demand.

eccerr0r wrote:

Well, the thing about the memory...how much memory is it using when the oops occurs, perhaps that is harder to gauge. Perhaps filling cache/buffers up can trigger the issue faster...

well, i have no idea when it's gonna oops up again. i ran dstat for a while, but it never happened at that time. i can run free after i notice it. but by the time i look at it, there's always loadsa memory free. i don't know how to catch it while it oops.
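Since the oops cannot be predicted, one option is to log memory figures continuously and look back at the entries just before the oops timestamp in dmesg. This is a minimal sketch under assumptions: the function name snapshot_meminfo, the log path /var/tmp/meminfo.log and the 10 second interval are all made up for illustration.

```shell
# Append one timestamped snapshot of selected /proc/meminfo fields to a log file.
# $1: meminfo source (normally /proc/meminfo)
# $2: log file (hypothetical default path)
snapshot_meminfo() {
    src="${1:-/proc/meminfo}"
    log="${2:-/var/tmp/meminfo.log}"
    {
        date '+%Y-%m-%d %H:%M:%S'
        grep -E '^(MemFree|MemAvailable|Buffers|Cached|Dirty):' "$src"
    } >> "$log"
}

# Usage: take a snapshot every 10 seconds until interrupted.
#   while :; do snapshot_meminfo; sleep 10; done
```

The log grows without bound, so it would need rotating or deleting on a long run.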

btw i have had none of these problems on the 4.9 series. i've gone back to my backup kernel 4.9.122 and no issues. i've thrown everything at it, including nftables, android-studio, sdk, etc. and all together too. no hiccups. it had iptables which i've removed now. maybe i'll stay with 4.9 a bit longer. it's not like i have the latest hardware, and since last month i feel like i have been road testing bleeding-edge and troubleshooting all the time. i actually wondered if something was wrong with my hardware.

same config for 4.9 and 4.14 kernels from gentoo-sources. no extra use-flags either.