I ran 2.6.11-gentoo-r7 with 4k stacks disabled for 58 hours before xorg crashed again. I am now going to try running xorg with DRI disabled in my xorg.conf file -- still using ati-drivers though.

This has to be the most irritating experience with Linux that I have ever had ... If I could just jump to a console, be able to actually kill xorg, and then restart my desktop, as opposed to doing a hard reboot, it would be a lot more tolerable.
_________________
Vim has excellent syntax highlighting for configuration files: emerge gentoo-syntax
Learn how to use Vim: vimtutor

When I was running Gentoo with kernel 2.6.11-gentoo-r6 and the latest xorg (April 30th), I had no lockups.
The interesting thing here is that I wasn't able to install Qt or GTK+. However, I was using the nvidia driver. Because I wasn't able to install GTK+, I couldn't install all kinds of things like firefox, gimp, ... So I was only using mc, links, nano, aterm, ... text-based stuff.

Because of my problems with GTK+ and Qt, I switched to Debian sarge (testing).
On this installation, I'm using a default kernel (2.6.8-2-386). Debian still uses XFree86. I also installed the nvidia drivers here. I can tell you that I had lockups with XFree86 too. I then installed xorg-x11 (Ubuntu hoary binaries), but the lockups didn't go away. The lockups usually happen with firefox, but it has also happened with konqueror. So I think it's NOT xorg-related.
Another thing I found interesting happened when I used k3b and tried to erase a CD-RW: as soon as I hit the start button, I had a lockup; however, k3b (actually probably the backend program) just started erasing the CD-RW and finished without any problems.

I'm thinking that it might have something to do with accessing the hardware.
I think this because k3b is accessing the hardware.
On the Ubuntu forum, someone mentioned removing hdparm. Again, this has something to do with access to the hardware.
The next point also makes me think in this direction:
If I enable Option "RenderAccel" in xorg.conf, I get a lockup as soon as the KDE splash image appears.
If I disable this option, I get fewer lockups, but they still happen.
If I switch to the nv driver instead of the nvidia driver, it also seems to work better for me, but I can't confirm yet whether the lockups went away; I'll have to test it a bit more. On the Ubuntu thread (which was mentioned in part 1 of this thread, on one of the last pages) a few other xorg.conf options were also mentioned. One was switching AGP to a lower rate (1x instead of 2x, 2x instead of 4x, ...); again, hardware related.
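For anyone who wants to flip the RenderAccel option on and off repeatedly while testing, here is a rough Python sketch that rewrites an existing Option line in the text of an xorg.conf. The function name `set_option`, the sample Device section and the identifier "Card0" are made up for illustration; it only touches Option lines that are already present and not commented out, and it does not write anything to disk.

```python
import re


def set_option(conf_text, name, value):
    """Rewrite every existing, uncommented `Option "<name>" ...` line.

    Lines starting with '#' are left alone, and no new line is added
    if the option is not present in the text.
    """
    pattern = re.compile(
        r'^([ \t]*)Option[ \t]+"' + re.escape(name) + r'".*$',
        re.IGNORECASE | re.MULTILINE,
    )
    # Keep the original indentation (group 1) and replace the rest of the line.
    return pattern.sub(
        lambda m: '{}Option "{}" "{}"'.format(m.group(1), name, value),
        conf_text,
    )


# Hypothetical Device section, only for demonstration.
SAMPLE = '''Section "Device"
    Identifier "Card0"
    Driver     "nvidia"
    Option     "RenderAccel" "true"
EndSection
'''

if __name__ == "__main__":
    print(set_option(SAMPLE, "RenderAccel", "false"))
```

To use it on a real system you would read /etc/X11/xorg.conf into a string, pass it through `set_option`, and write the result back after keeping a backup copy.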

My guess is that somewhere a link between the software and the hardware isn't behaving properly, which is causing problems in all sorts of programs.
A note in response to earlier questions about gkrellm: I don't have that installed, but I do have torsmo.
Then again, gkrellm also displays hardware info! (But torsmo does too, and it also freezes during a lockup.)
For framebuffer, I have vesafb.

I hope my thoughts are of some help in solving this big mystery.
Once I've posted this message, I'll take a look at my log files to see if there is anything interesting in them. (I just don't want to risk a lockup and having to type all this again.)

I don't know if this has come up yet, but I noticed the other day that this bug only seems to happen when I'm highlighting text.
i.e. I'm reading a long un-paragraphed message and highlight it with the mouse so I can easily see what part I've read up to.

Does this apply to anyone else, or just me?

No, I haven't had that. What I can't work out is why some people get this frequently and others less so. I've been going for a good while now without a lockup; the last one was whilst viewing a DVD using Xine.
_________________
Like the Roman, I seem to see "the River Tiber foaming with much blood"

As for the fact that some people have more lockups than others: maybe these people tweaked their computers more, and are thus making more use of their hardware capabilities (DMA, 3D acceleration, more specific drivers). If it is indeed related to what I stated above, this could more or less explain the difference.
Another thing I noticed: it always seems to be related to graphics and drives, but never sound. Or did I miss something on that...

So far, I have been able to erase a CD-RW, switch tabs, scroll pages and select text without any problems. I hope I'm released from the bugs. (I think in my case the solution was switching back to the nv driver, without all the 3D acceleration tweaking.)
[edit]
Another lockup, this time with firefox and konqueror in the background and a maximized aterm; it happened while doing an apt-get update...
[/edit]
Johan
_________________
http://counter.li.org - Registered Linux user #295636

Yet another thing... for me at least, I tend to be away on work quite a lot, and when I am I never get lockups...

I bring this up because I'm only online when I'm at home, though I still use all the same apps (including firefox)
when I'm at work, so for me at least, these lockups only happen in X while I'm online.
And possibly only when I'm highlighting text. I have stopped doing that since I noticed, though; I'll post back whether it's
a lasting solution or just luck...

Well, I'm back home and figured I would update on what has occurred. I started all over from scratch on the AMD machine - stage 1 tarball, minimal drivers, and USE= X kde alsa dvd cdr. After all was compiled and running, I emerged Xorg-X11 6.8.2-r1. This morning, after all was complete with no errors, I configured X using xorgconfig with the "nv" driver and did startx. Same problems as I mentioned previously! (Mind you, NO high end window managers, i.e. KDE.)

So I decided to fiddle with the settings in xorg.conf. I started by commenting out the nvidia related device and using the generic vga device. (I also changed the driver under the screen to standard vga.) Of course that failed - only because it doesn't support a DefaultDepth of 24. I changed the default depth to 8 and lo and behold, no problems. Obviously the resolution wasn't very good but it was a start. So then I tried uncommenting the nvidia stuff, commenting out the vga stuff (possible conflict?) and did startx. Same problems as before.

So I thought, well, I will try installing the "official" nvidia driver. That didn't work initially either. Now it gets a little foggy. I set nvidia to autoload and according to dmesg it did, but now I'm not certain if it did. After rebooting, I did startx and it returned an error saying that the nvidia driver wasn't found. So I played around a little bit and kept getting the same error. I then thought, let's see what other nvidia stuff is available. So I emerged nvidia-settings-*. Tried to run that but got an error that it couldn't start. I figured that had to do with the fact that X wasn't running.

So I did startx and the nvidia splash page popped up, then xdm popped up. I tried shutdown->immediately and by gosh it worked! So then I tried nvidia-settings, and IT worked! I exited out of there, tried nano /etc/rc.conf in xterm and it worked flawlessly! I tried changing resolutions (ctrl-alt-+) - perfect. Finally I mashed shutdown->immediately again and once again X shut down. So the foggy part is that I don't know whether the nvidia driver actually started working while I was playing around trying to get it to work, or whether something that was installed while emerging nvidia-settings-* made it work. (I got sidetracked - had to take the wife shopping.)

So here's where I'm at. Xorg.conf has all the vga related device stuff commented out, I'm using the official nvidia driver with the appropriate settings, and as I write this that machine is emerging kdebase for further testing. Seems like too simple a solution... I'll post the results for KDE when it is done.

Out of frustration, I just backed up my stuff and reformatted the entire hard drive. Then I installed Gentoo again. Turns out I still got the random freeze even after all that. So I just want to advise people: if you think reinstalling the distro from scratch will fix your problem, you're sadly mistaken. Instead of doing that, just try to figure out and resolve the problem.

Just an idea.
To help new people encountering this problem, is anyone interested in making a 'little' summary of all the suggestions that have been submitted, and of the things that were shown not to be the cause of the problem?
Just so that not everyone has to read those 30 pages of posts...

I don't feel up to re-reading all those pages, but here's what I recall from memory.

- The problem appears normally in connection with a quick succession of redraw events. In many cases user interaction is required in the form of the system responding to keyboard or mouse clicks, but freezes can also be caused by video playback or in the most serious cases nothing more than the GUI loading.

- It's not a KDE, Gnome, Firefox, etc. problem, but comes from a deeper level common to all.

- Both xorg and xfree are affected. Downgrading has proven unsuccessful.

- It's not an ATI or nVidia problem. nv is affected to a lesser degree than the closed-source nvidia. In some cases nv may be perfectly stable where nvidia is not. vesa is the only driver not affected, apparently because it is so primitive. The problem either comes from a deeper level or results from a common thing every advanced graphics driver does.

- It's not about overly aggressive compiler flags in xorg. It's not even about -fomit-frame-pointer. Xorg compiled with -march=something -O2 is enough to trigger lockups.

- Experiments with different versions of gcc have been fruitless.

- Downgrading glibc hasn't helped.

- Recompiling/reinstalling various programs or the entire system has had no effect.

- The problem isn't related to some specific kernel option being on or off and appears in multiple kernel versions. Downgrading to a formerly stable kernel (with no changes to the config) didn't make the system stable for the one person who tried it.

- The potential fault of Gentoo's newer profiles or baselayouts hasn't been properly researched. However it isn't just a Gentoo problem.

- Slightly incompatible/flaky hardware may have something to do with it. However the problem is not limited to one type of motherboard or graphics card.

- Various people have reported success with miscellaneous things like setting noapic or nolapic on the kernel's command line, adjusting the AGP value, uninstalling hdparm... Other people have reported that these things were not successful for them. More research is needed to sort through the complexity.
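Since reports about noapic and nolapic are mixed, it may help if everyone states which of these flags were actually active when a lockup happened. A small Python sketch for that, assuming a Linux system where the running kernel's command line can be read from /proc/cmdline; the function names are just illustrative.

```python
def active_flags(cmdline, flags=("noapic", "nolapic")):
    """Return the flags from `flags` that appear as whole words in `cmdline`."""
    words = cmdline.split()
    return [flag for flag in flags if flag in words]


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read()
```

Calling `active_flags(read_cmdline())` on the affected machine returns a list such as `['noapic']`, or an empty list if neither flag was passed at boot, which can then be pasted into a report alongside the driver and kernel versions.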

I tried setting my AGP aperture in the BIOS to 32 MB. It was 64 MB. I couldn't crash the same Gentoo system running simultaneous large glxgears windows and banging around Firefox in a way that reliably crashed things before. Of course nothing is proven until I live with it for a week or so. Unfortunately I was a naughty debugger and changed two things simultaneously, because I also reduced the AGP speed to 2x (it was 4x).

I'm only contributing this post because I hadn't seen the AGP aperture mentioned. General advice is that too large an aperture can contribute to instability. Of course something new in our software is also contributing. Perhaps Xorg is more aggressively using video memory and accessing the system RAM through the aperture? Just a wild guess, but maybe it's worth a try. Is there a way to monitor how video RAM is used? I'm running a P4 system with an SIS chipset, an NVidia GeForce GTS w/32 MB, and Gentoo 2004.3 with the latest stable xorg and kernel. Sorry, I'm on Windoze right now and can't supply the exact versions.

Cheers,
Steve

I can confirm this helped; I've already been running 2 days without a crash. I have a P4 with a VIA chipset, an Nvidia GF4MX w/ 32MB, nvidia's driver, X.org 6.8.2 and the 2.6.11-ck7 kernel. The AGP aperture was 64MB; after setting it to 32MB it's rock solid.
_________________
Veritas, Aequitas

First, setting the AGP aperture to 128 or 32 (64 is the default) didn't help at all when running KDE; there is still a lockup some time later, sorry :(
I have a new idea: I'm using the nvidia driver without RenderAccel and it is rock stable; KDE starts, no lockups in Gnome with Firefox, and so on. But today I found a new problem. I had the perl rebuilder running in a terminal; after some time I switched from X back to the terminal, and it had gone into some power-saving mode (console blanking) or something like that, from which it could not wake up. I could go back to X with ctrl+alt+f7, but I no longer have any terminals :( not one...
Earlier I found out that with the RenderAccel option I could start KDE from the terminal if I switched back to the terminal right after the nvidia logo (no, I don't have any splash screens, nor fb compiled in).
This leads me to think that the problem is somewhere in console blanking, agetty or init; that is, in the kernel, util-linux or sysvinit respectively.
Could someone test whether the lockups persist when using qingy (for example) for all terminals? (I don't have time right now.)
_________________
"I knew when an angel whispered into my ear,
You gotta get him away, yeah
Hey little bitch!
Be glad you finally walked away or you may have not lived another day."
Godsmack

Last but not least, verify that your power supply and your motherboard's voltage settings for RAM, AGP and CPU are working well. Otherwise, try increasing them. In my case, I do have to increase the RAM voltage to make my motherboard more stable. The downside is that you could damage those components and ... perhaps your motherboard too (for the price of a new motherboard, that certainly does not matter).

Emerging lm_sensors could help you find out whether the voltages and temperatures exceed the motherboard and CPU specifications.

If you cannot isolate the problem, I really don't have any idea what is going on.

Here's my experience. I install Gentoo, emerge xorg, and configure X with the nv driver. Then I can run X without noticeable lockups. Then I emerge blackbox and startx (using the blackbox wm); when I launch xterm, the system locks up but the mouse still works (I can't switch to other consoles though). I hard restart. Now whenever I launch X with the default wm (I don't remember what it's called) the system freezes. So for me it has something to do with emerging blackbox. That's just my computer though...

I have been running without a crash/lockup for over 81 hours now -- just under 3.5 days. I am using ati-drivers-8.12.0, xorg-6.8.2-r1, and gentoo-2.6.11-r7. I am using the same kernel config as I have always used, including adding back the 4K stacks that I reported removing in my last post.

Unfortunately, the "fix" (so far) has been to comment out the Load "dri" line in my xorg.conf file. This seems to me a better solution (if it actually is one and you absolutely don't need DRI) than changing to 2X, reducing the AGP aperture size, etc ... I suppose time will tell if I continue without problems.
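For anyone who wants to try the same thing, here is a rough Python sketch of commenting out a Load line in the text of an xorg.conf. The function name `comment_out_load` and the sample Module section are made up for illustration; the code works on a string and leaves it to you to back up and rewrite the real file.

```python
import re


def comment_out_load(conf_text, module):
    """Prefix every uncommented `Load "<module>"` line with '# '.

    Lines that are already commented out do not match, so running the
    function twice gives the same result as running it once.
    """
    pattern = re.compile(
        r'^([ \t]*)(Load[ \t]+"' + re.escape(module) + r'")',
        re.IGNORECASE | re.MULTILINE,
    )
    # Group 1 is the indentation, group 2 the Load statement itself.
    return pattern.sub(lambda m: m.group(1) + "# " + m.group(2), conf_text)


# Hypothetical Module section, only for demonstration.
SAMPLE = '''Section "Module"
    Load "glx"
    Load "dri"
EndSection
'''

if __name__ == "__main__":
    print(comment_out_load(SAMPLE, "dri"))
```

Only the matching module is touched, so other Load lines such as "glx" stay active.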

I have had this kind of lockup for weeks now. When using the nvidia driver, xorg starts normally. But after login, when the KDE splash screen appears, X crashes. This happens always, not randomly. If I use the nv driver there is no problem, no crash at all.
As reported elsewhere, I can't use the keyboard and mouse, but I can ssh in and kill X.
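If ssh still works during a lockup, something like the following Python sketch could be run from the remote shell to find and terminate the X server instead of doing a hard reboot. It assumes the server process is named X, Xorg or XFree86 (that list is a guess and may need adjusting), that a standard `ps` is available, and that you have permission to signal the process; the function names are illustrative.

```python
import os
import signal
import subprocess

# Assumed process names for the X server; adjust for your setup.
X_NAMES = ("X", "Xorg", "XFree86")


def find_x_pids(ps_output):
    """Extract X server PIDs from header-less `ps` output (pid, command name)."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[1].strip() in X_NAMES:
            pids.append(int(parts[0]))
    return pids


def kill_x(sig=signal.SIGTERM):
    """Send `sig` to every X server process found; return the PIDs signalled."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    pids = find_x_pids(result.stdout)
    for pid in pids:
        os.kill(pid, sig)
    return pids
```

Calling `kill_x()` tries a polite SIGTERM first; if the server ignores it, `kill_x(signal.SIGKILL)` is the blunt fallback, though a hung driver may leave the display unusable either way.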