At random intervals, but never later than after 10 hours of work, the laptop freezes completely. Knowing that not all hardware errors are reported back to the user, I tried netconsole. Unfortunately, netconsole didn't output anything at the time of the freeze either.

The only correlation I've found is that, on the next power-up after a forced power-off, the laptop usually consumes 10 W more power. But I suspect it might be a coincidence, because this behaviour is not limited to boots that follow a freeze. After about 1.5 reboots on average, the power consumption is back to normal.

The problem persists across every kernel I have tried, from 3.5.x to 3.7.1.

The problem persists with WiFi turned off.

The laptop works just fine with Windows XP (I've never tried Windows 7 on it).

I never tried 32-bit Linux on this machine.

I use both VirtualBox and VMware. The hangs also happen when no virtual machine is running, but I know that both programs load some kernel modules.

I use btrfs, dm-crypt, a Huawei E220 modem, Bluetooth and a ton of other stuff typical for a notebook.

...

I will paste any log or configuration file that you deem necessary.

What is the next course of action for troubleshooting this freezing problem?

Since I know exactly nothing about the causes of the problem, there is an almost infinite number of combinations to try. But maybe some of you are more experienced with debugging hardware and can suggest some usual suspects?

UPDATE:

Suspecting that the non-standard Ubuntu mainline kernel was the culprit, I reinstalled the whole system, this time with Mint 14, which is based on Ubuntu 12.10, which in turn uses the 3.5.x kernel family. Unfortunately, the problem is the same :-(

UPDATE 2:

The distribution of hangs seems to be non-Poisson (i.e. they cluster: sometimes they are more frequent, sometimes less), but so far I don't know how to correlate them with any type of event. It happens both when the laptop is used interactively and when it is not. It happens both when memory is heavily used (and swap is in use, zram in my case) and when memory is only 30% used.
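One rough way to put a number on the "non-Poisson" impression: for a Poisson process the times between events are exponentially distributed, so their coefficient of variation (standard deviation divided by mean) is close to 1, while clustered events push it above 1. The sketch below is only an illustration. The uptime values are made up, not measurements from this laptop, and with a handful of samples the result is noisy.

```python
import statistics


def coefficient_of_variation(intervals):
    """Return sample stdev / mean of the intervals between freezes."""
    mean = statistics.mean(intervals)
    return statistics.stdev(intervals) / mean


# Hypothetical uptimes (hours) before each freeze; NOT real measurements.
uptimes_hours = [0.5, 9.0, 0.3, 8.5, 0.4, 0.2, 9.5]

cv = coefficient_of_variation(uptimes_hours)
# Near 1: consistent with Poisson. Well above 1: clustered freezes.
print(f"coefficient of variation: {cv:.2f}")
```

A value well above 1 over many recorded freezes would support the idea that some changing condition is switching the failure rate on and off.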

Does it freeze while you are working, or at a time when there is nothing to do after work? Did you enable SysRq in /etc/sysctl.conf?
– Nils, Dec 8 '12 at 20:58

@Nils no, but I will. Thank you for the hint. (BTW the «BUSIER» combination doesn't work when the system hangs)
– Adam Ryczkowski, Dec 12 '12 at 7:38

So it keeps freezing and SysRq b does not reboot the system. Just as a wild guess: try noirqdebug as a kernel option. This will keep the system from disabling IRQs that it thinks are not in use. I had this funny effect on some servers, where a CDROM and a NIC were on the same IRQ and the system shut down the IRQ handler after the CDROM was idle...
– Nils, Dec 15 '12 at 22:23

@Nils the distribution of hangs seems to be non-Poisson (i.e. they cluster: sometimes they are more frequent, sometimes less), but so far I don't know how to correlate them with any type of event. It happens both when the laptop is used interactively and when it is not. It happens both when memory is heavily used (and swap is in use, zram in my case) and when memory is only 30% used.
– Adam Ryczkowski, Dec 30 '12 at 18:34

@Nils I've just had another freeze WITH noirqdebug... on the brand-new 3.7.1 kernel.
– Adam Ryczkowski, Dec 31 '12 at 12:47

1 Answer

I finally found something. I'm not 100% sure, but it seems to be a nasty malfunction in the Intel GMA HD3000 integrated graphics adapter. The problem can be triggered by using the 3D capabilities for a long time.

The non-Poisson failure rate is explained by the fact that sometimes I was using compositing and sometimes not. I just failed to correlate compositing with the hangs.
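In hindsight, a simple session log would probably have exposed the correlation earlier. The sketch below is a hypothetical illustration, not something I actually ran: it assumes each session was noted down as (uptime in hours, whether compositing was on, whether the session ended in a freeze), and the sample records are invented.

```python
from collections import defaultdict


def freeze_rate_by_group(sessions):
    """Return freezes per hour of uptime, keyed by the compositing flag."""
    hours = defaultdict(float)
    freezes = defaultdict(int)
    for uptime_hours, compositing_on, froze in sessions:
        hours[compositing_on] += uptime_hours
        freezes[compositing_on] += int(froze)
    # Skip groups with no recorded uptime to avoid dividing by zero.
    return {flag: freezes[flag] / hours[flag] for flag in hours if hours[flag] > 0}


# Invented example records: (uptime_hours, compositing_on, froze).
sessions = [
    (3.0, True, True),
    (5.0, True, True),
    (10.0, False, False),
    (8.0, False, False),
]

for flag, rate in freeze_rate_by_group(sessions).items():
    print(f"compositing={flag}: {rate:.2f} freezes/hour")
```

A clearly higher rate in one group only points at a suspect; it does not prove the cause.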

The reason why I didn't get the problems under Windows XP is now obvious: Windows XP doesn't use compositing, and I didn't play games on it, so no 3D was in use. Once I started a game, the graphics adapter crashed after a few hours. Fortunately Windows XP (unlike Linux) was able to handle the problem more or less gracefully: it fell back to minimal settings (16 colors, 640x480 resolution) and informed me about the condition.

Now the puzzle pieces fit together, and in retrospect I think I can confirm that the problems in Linux occurred only when compositing was turned on.

I will post a separate question on how to diagnose graphics card errors under Linux.