OK, enabling CONFIG_DEBUG_SPINLOCK_SLEEP=y seems to at least suppress error messages. Will there be any performance penalty?

zander

01-24-04 12:53 PM

The warning messages will have an impact on overall system performance if they occur frequently; in any case, the current (as of 01-09-2004) 1.0-5328 patch shouldn't trigger the condition.

caffeine

01-25-04 03:29 AM

My crashes have been getting more frequent. My last crash left the graphics card in a state where my system wouldn't POST.

So, I started unplugging monitors, and discovered that the left monitor's plug (the one with the flickering) will actually spark when it touches the PC case. I discovered this by accidentally holding the plug and touching the grounded case and getting quite a shock. Is this normal? Should the monitor plug be carrying enough voltage to throw a spark? Maybe that's been screwing with my card? Well, I've unplugged that monitor so I'll find out I guess.

[edit]I just discovered the fan on my north bridge wasn't running! Perhaps this is my problem?[/edit]

In my log files now:

Code:

Badness in pci_find_subsys at drivers/pci/search.c:132

caffeine

01-25-04 03:43 PM

Still no stability. I'm trying the open source "nv" driver to see if I can narrow it down to the nvidia stuff.

zander

01-25-04 04:17 PM

The pci_find_subsys warning messages are after-the-fact symptoms of severe error conditions. There are numerous possible error sources, most common among them broken AGP configurations (hardware, software or both), ACPI and APIC bugs (again in hardware, software or both), fbdev drivers (vesafb, rivafb), insufficient cooling and defective RAM; unfortunately, there are many more. The best I can suggest is that you try to identify the root cause of the failures you're experiencing via experimentation, i.e. check whether disabling AGP helps, whether disabling ACPI helps, and so on.
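When running experiments like these, it can help to count how often the warning shows up in the log after each change. A minimal Python sketch, assuming the log lines look like the "Badness in ... at file:line" message quoted earlier in this thread; the function name count_badness is hypothetical and not part of any driver or kernel tool:

```python
import re

# Matches the warning format quoted in this thread, e.g.
# "Badness in pci_find_subsys at drivers/pci/search.c:132"
BADNESS_RE = re.compile(r"Badness in (\w+) at (\S+):(\d+)")


def count_badness(log_text):
    """Return a dict mapping (function, source file, line) to occurrence count."""
    counts = {}
    for match in BADNESS_RE.finditer(log_text):
        key = (match.group(1), match.group(2), int(match.group(3)))
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Feed it the text of the kernel log (the log file location varies by distribution) once per experiment and compare the counts between runs.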

caffeine

01-26-04 12:35 AM

Hi Zander. I appreciate your help.

Using the nv module has been stable overnight. My latest theory is that it's something to do with 2.6.1 / 2.6.2 + nvidia module, as 2.6.0 had been quite stable. (I'm currently on 2.6.1 btw)

I've tried setting NvAGP to 0 and 1 to no effect, I've also tried enabling/disabling Side band addressing and Fastwrites using

I've seen two methods of killing ACPI, "pci=noacpi" & "acpi=off", although I'm not sure which is the correct method. I don't have any ACPI or APIC compiled into my kernel.

For what it's worth, I've got an ABit KT7A-RAID.

zander

01-26-04 04:47 AM

The XFree86 nv driver interacts very differently with both the installed graphics hardware and the operating system; if you find it to be stable and don't require features it doesn't offer, there's no reason why you shouldn't use it (aside, possibly, from performance considerations), but the fact that it works (or doesn't work) allows no conclusions to be drawn with respect to the nvidia driver.

Both AGP SBA and FW frequently cause stability problems; you will want to make sure they are disabled (check /proc/driver/nvidia/agp/status), which they are by default. Setting NvAgp to "0" prohibits use of the AGP port; you will want to do this for now. The mem=nopentium kernel parameter was an interim workaround for a Linux kernel bug and is no longer required or even desirable with recent kernels; don't pass the parameter to Linux 2.6 kernels. pci=noacpi disables ACPI PCI IRQ routing, acpi=off disables ACPI support altogether; if your kernel wasn't configured to feature ACPI support, neither option will modify system behavior. Check /proc/interrupts to determine if an APIC is used; if so, the noapic kernel parameter will disable it.
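The kernel-parameter and APIC checks above can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the APIC test assumes that /proc/interrupts names the interrupt controller on each IRQ line (for example "IO-APIC-edge" versus "XT-PIC"), which may differ between kernel versions.

```python
def apic_in_use(interrupts_text):
    """True if any line of /proc/interrupts output mentions an APIC controller.

    Assumption: APIC-routed IRQs carry a controller name containing "APIC".
    """
    return any("APIC" in line for line in interrupts_text.splitlines())


def review_boot_config(cmdline, interrupts_text):
    """Return a list of notes about the parameters discussed in this thread."""
    tokens = set(cmdline.split())
    notes = []
    if "mem=nopentium" in tokens:
        notes.append("mem=nopentium is set; don't pass it to 2.6 kernels")
    if "acpi=off" in tokens:
        notes.append("acpi=off is set; ACPI support is disabled altogether")
    if "pci=noacpi" in tokens:
        notes.append("pci=noacpi is set; ACPI PCI IRQ routing is disabled")
    if apic_in_use(interrupts_text):
        notes.append("an APIC appears to be in use; noapic would disable it")
    return notes
```

Pass it the contents of /proc/cmdline and /proc/interrupts, each read as a string. An empty result only means none of these particular conditions were detected.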

These are only some of the possible error sources, however, and while I named a few others (fbdev, other hardware problems, ...), you should browse the archives of this and similar forums to get additional suggestions.