I followed the Migrating to Modular X Howto but I can't get past the xorg-server-1.1.0-r1 emerge, even though all the packages before it emerged fine. I start the emerge and everything goes fine until the system becomes unresponsive: it stops answering ssh and I have to reboot it manually. This has happened 5-6 times now, and I don't want to keep hard-locking the system. The same thing happened once before while emerging kdepim-3.5.2-r2, but that package got "pushed down" when Modular X came out, so it is not an issue at the moment.

I'm pretty sure overflowing /var (the default Portage work directory lives there) is not the problem, since after a reboot /var is only 12% full, with 870 MB available.
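For reference, this is how I'm checking it (adjust the mount point if your PORTAGE_TMPDIR lives elsewhere):

```shell
# Check space and inode usage on the partition holding the Portage
# work directory (/var/tmp/portage by default)
df -h /var
df -i /var   # a full inode table can break builds even when space looks fine
```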

It *could* be flaky hardware, but I don't think that's the problem, since only 2 packages have ever given me trouble, namely xorg-x11 and kdepim. Here is what /var/log/messages looks like before a crash: there's a correctable ECC error and a kernel debug message (which I believe should be harmless). The ECC message is uncommon, while the debug message pops up more frequently; stressful tasks like compiling make both crop up more often.
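To catch the very last messages before the lock-up, I've been streaming the syslog to a second machine so the tail end survives the hard-lock (`buildbox` is just a placeholder for this machine's hostname):

```shell
# Watch the syslog from another box; the last lines printed before the
# hang are preserved in that terminal even after the forced reboot
ssh root@buildbox tail -f /var/log/messages
```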

Does anyone have any suggestions for how to go about troubleshooting this problem? For example ways I can get more information about what is going wrong? Or is it possible to emerge xorg with less flags, making the compile less strenuous? Anything is appreciated! Thanks.
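On the "less strenuous compile" idea: one sketch (these are assumptions about typical make.conf settings; revert them once the build succeeds) is to dial the build down in /etc/make.conf:

```shell
# /etc/make.conf -- temporary settings to lighten the compile load
MAKEOPTS="-j1"        # one compiler process at a time instead of -jN
CFLAGS="-O1 -pipe"    # less aggressive optimization than the usual -O2
CXXFLAGS="${CFLAGS}"
```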

Last edited by rmrfslashstar on Fri Aug 18, 2006 9:04 pm; edited 1 time in total

I don't know for sure what your problem is, but I had very similar symptoms on one of my Alphas when one of my SCSI drives started going bad... bad blocks on the disk. Could your swap partition be developing bad blocks? That would explain why this only happens when emerging large packages... the bad blocks on your swap partition don't get touched until you really start paging. Like I said, I'm not sure this is your problem, but it is possible.

Thanks! Ok, so I have 3 swap partitions, /dev/sda1, /dev/sdb1, /dev/sdc1. I did

Code:

swapoff /dev/sda1         # swap must be disabled before the destructive test
badblocks -wvv /dev/sda1  # -w: destructive write test (wipes the partition), -vv: extra verbose

Passed with 0 bad blocks. Then to remake my swap I did

Code:

mkswap -c /dev/sda1    # -c: scan for bad blocks while rebuilding the swap area
swapon -p 1 /dev/sda1  # -p 1: same priority on all disks so paging is striped

Rinsed & repeated for /dev/sdb1 and /dev/sdc1: passed each time with 0 bad blocks.

I also ran reiserfsck --check on /dev/sda5 (my /var partition, after umounting it in single-user mode) and it passed with no errors. This is the partition that gave me trouble last December, but a --rebuild-tree appears to have fixed that problem...

Any other ideas? In the meantime I will try pointing PORTAGE_TMPDIR at a directory in /usr/local and see if that helps.
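For the record, the override can be done per-command without touching make.conf (the directory name is just my choice; any partition with enough free space should do):

```shell
# Put the build tree on a different partition for this one emerge
mkdir -p /usr/local/portage_tmpdir
PORTAGE_TMPDIR=/usr/local/portage_tmpdir emerge xorg-server
```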

Ok, I tried setting PORTAGE_TMPDIR=/usr/local/portage_tmpdir, but the emerge still failed. Afterwards I ran reiserfsck --check on the /usr/local partition as well and it passed with no problems, which leads me to believe it's not a hard-disk problem... Are there any ways I can force emerge to give me more info on what's going wrong? The hard-lock occurs somewhere in the compile, but I'm not sure if it's the same place every time...
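One way to get more of that information is to log the build to disk as it runs, so the last lines survive the reboot (a sketch; the log path is arbitrary):

```shell
# tee writes the build output to disk as it scrolls by, so after the
# reboot the tail of the log shows roughly where the compile died
emerge xorg-server 2>&1 | tee /root/xorg-build.log
# ...after the crash and reboot:
tail -n 20 /root/xorg-build.log
```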

I couldn't figure out how to get the SRM memtest to run, but I downloaded memtester 4.0.3 and ran memtester 900 3; it passed all the tests on all 3 iterations (using 900 MB out of 1024 MB total). It looks like I can push it to 950 MB, which I'll try now, but above that memtester crashes.

If anyone has experience with SRM memtest I'd be willing to give it another go, but for the life of me I couldn't get past the "invalid zone" error.

Update: it also passed memtester 950 3.

Edit:

It appears the problem was that the system was overheating. When I took off the case the emerge succeeded. Thanks for the replies - it looks like this issue is resolved for now.