/proc/mtrr and incorrect GPU memory size

I've been trying to debug some random lockups that I think are related to an Nvidia GeForce 9500GT, and found a guide (http://www.gentoo.org/doc/en/nvidia-guide.xml) that talks about uncacheable registers reported in /proc/mtrr.

This is (apparently) "not good" because there are uncacheable regions and no write-combining regions. However, I am completely in over my head here and have no idea what any of this means. Elsewhere online I found posts about ATI cards having trouble if all registers are write-back/uncacheable. The suggested fix in the Gentoo guide (change BIOS settings) doesn't work for me because my BIOS doesn't have the appropriate option. It also doesn't let me assign an IRQ to VGA, meaning my nvidia card is sharing an IRQ with the USB controllers. That could possibly be the cause of my crashes, but when I blacklisted ohci_hcd (the module servicing the interrupts) I still got a crash. I'm still not ruling this out, though. The mobo is a Gigabyte MA770T-UD3P, in case you want to avoid it (I certainly will in future).

Anyway, this led me to check which registers the GeForce is using. The 9500GT has 512MB of RAM. This is confirmed by the nvidia splash screen during boot. However, lspci -v gives me the following:

I understand this to be reporting that the card has 16+256+32M of RAM. I am wondering whether this could be related to my lockups? Are there any other instances of lspci reporting incorrect memory sizes? It seems like the GPU memory isn't using the uncacheable register, but is this the whole story? Is the lack of write-combining registers causing problems?
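A note on those lspci numbers: the "Memory at ... [size=...]" lines are PCI BARs, i.e. address windows the card exposes to the CPU, not a count of on-board VRAM, so they need not add up to 512MB. The prefetchable window is usually the framebuffer aperture. Here is a minimal Python sketch that pulls the sizes out of lspci -v text. The sample addresses are made up for illustration; only the sizes (16M, 256M, 32M) come from the post above, and parse_bars is just a name I picked.

```python
import re

# Illustrative sample only: the addresses are hypothetical,
# the sizes (16M, 256M, 32M) match the ones quoted in the post.
SAMPLE = """\
Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
Memory at d0000000 (64-bit, prefetchable) [size=256M]
Memory at f8000000 (64-bit, non-prefetchable) [size=32M]
"""

UNITS = {"": 1, "K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
BAR_RE = re.compile(
    r"Memory at ([0-9a-f]+) \(([^)]*)\) \[size=(\d+)([KMG]?)\]"
)


def parse_bars(text):
    """Return (address, flags, size_in_bytes) for each memory BAR line."""
    bars = []
    for m in BAR_RE.finditer(text):
        addr = int(m.group(1), 16)
        size = int(m.group(3)) * UNITS[m.group(4)]
        bars.append((addr, m.group(2), size))
    return bars


if __name__ == "__main__":
    for addr, flags, size in parse_bars(SAMPLE):
        print(f"BAR at 0x{addr:08x}: {size >> 20} MB ({flags})")
```

To try it on real data, pipe the output of lspci -v into the script and pass that text to parse_bars instead of SAMPLE.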

As I said, I am in over my head at this point. Any help at all, be it links or personal experience, would be very much appreciated. I've had the ultimatum "fix your crashes or you're moving to Windows" from my boss, which needless to say is not somewhere I want to go!

Maybe the problem is something else, or maybe someone has a clue about that problem. You may also want to try a new PSU; I've seen and experienced many strange lockups that stopped happening after changing the PSU.

Re: /proc/mtrr and incorrect GPU memory size

Argh, still no luck. Had a lockup, with ati-agp unloaded, when loading a VirtualBox image. Nothing unusual in logs, had to turn off with the power button.

Edit: VirtualBox will crash <5 minutes after starting if the vboxnetadp or vboxnetflt modules are loaded. I can only guess that my kernel is configured wrong for my hardware and certain modules trigger this in some way. Removing ati-agp seems to have stopped the lockups related to the GPU (i.e. when a graphics-heavy screensaver like pixelcity or substrate is running), but the problem still comes up in other ways.

Re: /proc/mtrr and incorrect GPU memory size

Try recompiling the VirtualBox modules and check if it still crashes. A kernel update might have changed something that makes vbox crash the system. It might also have something to do with (kernel?) performance counters. I don't see anything in dmesg now, but with vbox 3.1.2, loading vboxdrv would issue a warning to disable them, otherwise it might cause trouble.

reg00 is way too enormous to be anything useful and my video card doesn't show up at all. I get a warning at boot along the lines of no write-combining, graphics may suffer, and they definitely do. I know the intel driver itself isn't super hot, but KDE is downright clunky and DVD playback is grainy, out of sync and tears.

I've read a few posts regarding rewriting your own mtrr table but you have to be careful because you get a lot of hard locks.

Re: /proc/mtrr and incorrect GPU memory size

I left the computer running overnight with various programs running (including VirtualBox). It reset itself 4 hours after I left and the login prompt was waiting for me when I arrived this morning. There was nothing in the logs relating to the lockup; I only know when it crashed because of the usual startup logs. I experienced a hard lockup shortly after logging in.

I have followed brebs' instructions and nvidia is using MSI now. I will see how it performs throughout the day. Fingers crossed.

Re: /proc/mtrr and incorrect GPU memory size

Still getting hard lockups. In fact, I think I am getting more now than I was previously. Perhaps the MSI/PageAttributeTable change is responsible. Off to read some more about exactly what these options do.

Add the following to the end of your kernel command line in /boot/grub/menu.lst:

enable_mtrr_cleanup

I use this on my Fujitsu laptop and it corrects the entire MTRR table automatically at boot.

Which seems to be just removing the last 2 registers. Still no write-combining sections, and it appears to be missing 768MB of RAM. free confirms that around 768MB is not being found. If it fixes my stability problems I will be happy to make the sacrifice though.
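In case it helps compare the table before and after enable_mtrr_cleanup: a rough Python sketch that totals the regions in /proc/mtrr by type and checks whether any write-combining entry exists. The sample table is made up for illustration (it is not the output from this thread), and the names parse_mtrr and total_by_type are my own. It only sums sizes per type; it does not work out how overlapping regions interact.

```python
import re

# Illustrative sample in /proc/mtrr format; NOT the table from this thread.
SAMPLE = """\
reg00: base=0x000000000 (    0MB), size= 2048MB, count=1: write-back
reg01: base=0x080000000 ( 2048MB), size= 1024MB, count=1: write-back
reg02: base=0x0c0000000 ( 3072MB), size=  256MB, count=1: uncachable
"""

UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

# The type and count fields appear in either order depending on kernel
# version, so the pattern only looks for base, size and a known type name.
LINE_RE = re.compile(
    r"reg(\d+): base=0x([0-9a-fA-F]+) .*?size=\s*(\d+)([KMG])B.*?"
    r"(uncachable|write-combining|write-through|write-protect|write-back)"
)


def parse_mtrr(text):
    """Parse /proc/mtrr text into a list of dicts, one per register."""
    regs = []
    for line in text.splitlines():
        m = LINE_RE.search(line)
        if m:
            regs.append({
                "reg": int(m.group(1)),
                "base": int(m.group(2), 16),
                "size": int(m.group(3)) * UNITS[m.group(4)],
                "type": m.group(5),
            })
    return regs


def total_by_type(regs):
    """Sum region sizes (in bytes) for each memory type."""
    totals = {}
    for r in regs:
        totals[r["type"]] = totals.get(r["type"], 0) + r["size"]
    return totals


if __name__ == "__main__":
    try:
        with open("/proc/mtrr") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the illustrative table
    totals = total_by_type(parse_mtrr(text))
    for mtype, size in sorted(totals.items()):
        print(f"{mtype}: {size >> 20} MB")
    if "write-combining" not in totals:
        print("no write-combining regions found")
```

Comparing the write-back total against what free reports should show roughly how much RAM is left outside the cached ranges.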

I am curious why it's happened. I've read in several places that it occurs frequently on computers made by Dell. I've also read that MTRR isn't used anymore and PAT replaces it, but if I touch my mtrr table my system tells me it's using it.

Re: /proc/mtrr and incorrect GPU memory size

I believe I've tried that as well. I'll have to take another look at it, though. I've looked at a bunch of stuff regarding this and I really think the only way to do it is to manually rewrite the mtrr table. I would like to find out WHY this happens, and I have no idea what my /proc/mtrr should look like. I'm also wondering if it counts as a bug...
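On rewriting the table by hand: the kernel accepts text commands written to /proc/mtrr, of the form "base=... size=... type=..." to add a region and "disable=N" to remove register N. Since a bad entry can hard-lock the machine, here is a small Python sketch that only builds and sanity-checks those command strings and does not write anything. It assumes the usual constraints for variable MTRRs (size is a power of two, base is aligned to size). The function names and the example address are hypothetical; take the real base and size from your own lspci output.

```python
VALID_TYPES = {
    "uncachable",
    "write-combining",
    "write-through",
    "write-protect",
    "write-back",
}


def is_power_of_two(n):
    """True if n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def mtrr_add_line(base, size, mtype):
    """Build the text that would be written to /proc/mtrr to add a region.

    Raises ValueError if the request breaks the assumed constraints.
    """
    if mtype not in VALID_TYPES:
        raise ValueError(f"unknown MTRR type: {mtype}")
    if not is_power_of_two(size):
        raise ValueError("size must be a power of two")
    if base % size != 0:
        raise ValueError("base must be aligned to size")
    return f"base=0x{base:x} size=0x{size:x} type={mtype}"


def mtrr_disable_line(reg):
    """Build the text that would be written to /proc/mtrr to drop a register."""
    if reg < 0:
        raise ValueError("register number must be non-negative")
    return f"disable={reg}"


if __name__ == "__main__":
    # Hypothetical example: a 256 MB aperture at 0xd0000000.
    print(mtrr_add_line(0xD0000000, 256 << 20, "write-combining"))
    print(mtrr_disable_line(2))
```

The printed lines are what you would echo into /proc/mtrr as root, one at a time, once you are sure of the addresses.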