Bug Description

On the IBM T20 laptop using the Savage drivers, when gdm starts X, the screen stays black with a "_". The sound is initialized (I can hear the startup sound) and I can log in through SSH. The screen gets initialized about once every 3 to 4 startups, but I can't discern a pattern. This has always worked in previous versions of Ubuntu.

Workaround: Add this to the Device section of xorg.conf:
Option "BusType" "PCI"

The problem of lockups or video corruption as a result of 3D rendering is [not] well-known/understood. There appears to be a bug in AGP on Thinkpads interacting with Savage graphics cards. For reference, there was a recent discussion of this problem on the Linux-Thinkpad Mailing List. If you are experiencing lockups related to 3D rendering (using DRI) and you are using an XOrg version more recent than 6.9/7.0, you may need to add one or both of the following lines to the Device section of your xorg.conf file:

I have the same problem, using a Thinkpad T22. It will predictably freeze if I start Ubuntu after a reboot. However, if I turn off the laptop and remove the power plug for a moment before booting it always works fine. Probably this is related to some PCI configuration issue.

I do notice two related issues that will (also) ruin one's experience trying to get gdm to load at boot:
-X.org socket cruft in /tmp doesn't appear to get cleaned by the boot process (possibly fixed already?), and
-When /tmp is missing, it may get recreated with the wrong permissions. (I'm pretty sure I did -not- do that manually, but, again, haven't been paying enough attention, don't have enough time to devote to the problem right now.)
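
For anyone who wants to check the two /tmp issues above without poking around by hand, here is a minimal sketch in Python. It assumes the conventional mode for /tmp (1777, world-writable with the sticky bit) and the usual X lock/socket locations (/tmp/.X0-lock and /tmp/.X11-unix/X0). The function name check_tmp is just made up for this example, and the script only reports; it doesn't delete or chmod anything.

```python
import stat
from pathlib import Path


def check_tmp(tmp_dir="/tmp"):
    """Return a list of human-readable problems found under tmp_dir."""
    tmp = Path(tmp_dir)
    if not tmp.is_dir():
        return [f"{tmp} is missing"]

    problems = []

    # /tmp is conventionally world-writable with the sticky bit set (mode 1777).
    mode = stat.S_IMODE(tmp.stat().st_mode)
    if mode != 0o1777:
        problems.append(f"{tmp} has mode {mode:o}, expected 1777")

    # X lock files and sockets. These are only stale if no X server is
    # currently running, which this sketch does not try to detect.
    leftovers = sorted(tmp.glob(".X*-lock"))
    socket_dir = tmp / ".X11-unix"
    if socket_dir.is_dir():
        leftovers += sorted(socket_dir.glob("X*"))
    for path in leftovers:
        problems.append(f"possible stale X file: {path}")
    return problems


if __name__ == "__main__":
    for problem in check_tmp():
        print(problem)
```

Run it from a text console or over SSH while X is not running; anything it lists under "possible stale X file" is a candidate for cleanup.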

After correcting those issues, I somehow managed to get X.org and gdm to come up on my last boot; for a while, it was equally toast even while undocked. I'm not sure if this has cured things or bought me a ticket in the one-time-in-four type crapshoot.

I can confirm this on my own laptop, an IBM T20. I fixed this issue by reconfiguring X.org several times with dpkg-reconfigure. The bug is still present in the Beta live CD, but only when you select the first option ("Start Ubuntu"); it works fine if you select "Start ubuntu in safe mode graphics".

Note: This report was sent in using reportbug by martin.ferrari at gmail.com

Hi, a week ago I installed Dapper, and when it finished, I stared at a
blank screen. Later I found out that X was crashing the video adapter.
Not even soft reboot would recover it, only power cycling. I learned
this is a rare problem, solved with the following option in xorg.conf:

Option "BusType" "PCI"

Now it works, but I still have Oopses in the savage kernel module, and
the occasional freeze (as it's a total freeze, I cannot know what
causes it, but I'd bet on the savage). Maybe those symptoms are related.

Here's a summary of the different workarounds for this problem that people have suggested in this thread. Which of these is my best bet if I'm willing to reinstall Dapper on my laptop and try to work around this problem?

- Use vesa module instead of savage. How do you do this? Just replace savage with vesa in xorg.conf?

- reconf xserver-xorg using savage but deselect DRI => DRI seems to be the problem!

- Add Option "BusType" "PCI" to xorg.conf

- Add

Option "BusType" "PCI"
Option "DmaMode" "None"

to the "Device" section in xorg.conf

- reconfiguring X.org several times with dpkg-reconfigure
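
For the BusType/DmaMode workaround in the list above, here is a rough Python sketch of what "add these to the Device section" means in practice. The function name add_device_options and the sample Identifier are invented for illustration; the two option names and values are the ones quoted in this thread. It only transforms text, so you would still need to back up and rewrite /etc/X11/xorg.conf yourself.

```python
import re

# Sample config for illustration only; the Identifier value is hypothetical.
SAMPLE = '''Section "Device"
    Identifier "S3 Savage"
    Driver "savage"
EndSection
'''


def add_device_options(conf_text, options):
    """Insert Option lines before EndSection of the first "Device" section.

    options maps option name -> value. Options already present in the
    section are left alone, so running this twice changes nothing.
    """
    out = []
    in_device = False
    done = False
    seen = set()
    for line in conf_text.splitlines():
        stripped = line.strip()
        if re.match(r'(?i)Section\s+"Device"', stripped):
            in_device = not done
        elif in_device:
            option = re.match(r'(?i)Option\s+"([^"]+)"', stripped)
            if option:
                seen.add(option.group(1).lower())
            elif re.match(r'(?i)EndSection\b', stripped):
                # Add the missing options just before the section closes.
                for name, value in options.items():
                    if name.lower() not in seen:
                        out.append(f'\tOption\t"{name}"\t"{value}"')
                in_device = False
                done = True
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(add_device_options(SAMPLE, {"BusType": "PCI", "DmaMode": "None"}))
```

Passing only {"BusType": "PCI"} gives the simpler variant of the workaround.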

I think when I get round to it, I will probably attempt to just use the vesa module, that sounds simplest.

I re-installed Dapper from CD on my laptop that was experiencing this bug. The bug has now disappeared. In my case at least ubuntu seems to have special-cased my laptop all on its own. I installed from a boot of the live session in safe graphics mode, as normal mode failed to boot due to this very bug. After installation, it has set my video driver in xorg.conf to "vesa" rather than to "savage" as the previous install did. I'm pretty sure the previous install was an install from CD, not an upgrade to breezy. So I don't know what caused the change in driver choice, unless maybe it was the safe graphics live session, but to be honest, I think I was probably using that last time too. It's a mystery.

Anyway, I can confirm that switching Driver "savage" to Driver "vesa" in xorg.conf appears to fix the problem. At least, I haven't seen it in the two days since I re-installed.
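
To answer the earlier "how do you do this?" question about vesa: it really is just the Driver line in the Device section. A minimal Python sketch of that edit follows; set_driver and the sample config (including its Identifier values) are hypothetical names made up for this example. It deliberately matches only Section "Device", since InputDevice sections have their own Driver lines that must stay as they are.

```python
import re

# Sample config for illustration only; Identifier values are hypothetical.
SAMPLE = '''Section "InputDevice"
    Identifier "Generic Keyboard"
    Driver "kbd"
EndSection

Section "Device"
    Identifier "S3 Savage"
    Driver "savage"
EndSection
'''


def set_driver(conf_text, driver):
    """Rewrite the Driver line inside every "Device" section."""
    out = []
    in_device = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if re.match(r'(?i)Section\s+"Device"', stripped):
            in_device = True
        elif re.match(r'(?i)EndSection\b', stripped):
            in_device = False
        elif in_device and re.match(r'(?i)Driver\s+"', stripped):
            # Keep the original indentation, swap only the driver name.
            indent = line[: len(line) - len(line.lstrip())]
            line = f'{indent}Driver "{driver}"'
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(set_driver(SAMPLE, "vesa"))
```

Calling set_driver(text, "savage") on the result switches back if you want to retest the savage driver later.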

Using "vesa" works but gives unacceptable graphics performance. It's easy to see if you scroll a website in your browser.

I found an interesting experiment that takes no work: download and start the newest version of Knoppix (5), which uses XOrg 7 and the savage driver.
The website scrolling there looks much better and you also have full hardware 3D-GL support. I already tried to use the xorg.conf from Knoppix in Ubuntu 6.06, without any success.

NO, I have to correct myself: Knoppix also sometimes shows the same behavior and does not start the X server.
Incredible that the savage driver sometimes works and sometimes doesn't.
And I also tried out Edgy Eft 6.10 without any improvement.

I have seen some anecdotal reports that the vga16fb module and the x.org Savage driver may be incompatible.

Unfortunately, I'm not well-versed in the Linux boot process. Please demonstrate the appropriate incantation to ensure the vga16fb module is never touched (and either vesafb or a dumber console driver is) and I will test this theory on my T21.

I can confirm that this bug is still present in the Edgy beta. I tried out the savage driver again, graphics performance is **much** better than with vesa, but it has trouble initialising X.

The vesa driver graphics performance seems to have deteriorated a lot between dapper and edgy. In dapper, I didn't much notice the difference between savage and vesa, in edgy, vesa's rendering is almost unbearably slow.

I am impressed and happy to know that there are still Thinkpad users working on resolving this famous S3 Savage problem. I have owned a T22 for about 3 years and have used Linux on it for 1 year. I have tried 3-4 different distros. However, with each distro, I always had problems with DRI enabled.

I tried many ways to make DRI fully functional and at max performance if possible. Because of the 8 MB of video RAM, I run at 16-bit color to get 3D. I used Fedora for about 8 months and always got it working perfectly with DRI enabled, but only with the "BusType" "PCI" and "DmaMode" "None" options in xorg.conf. Great, everything worked, but with limitations: slow performance (~200 fps in glxgears), and some 3D applications didn't render textures or any graphics at all. I installed openSUSE 10.3 a few weeks ago. What a distro!! Anyway, I had the same problems, but sometimes I had more luck. By default, on installation, openSUSE doesn't detect 3D acceleration capability on the Savage (I learned that this may be a detection bug). However, to enable 3D acceleration, I only had to add

Load "dri"

to the Module section of xorg.conf. That was the only change I made, and I rebooted. Wow, everything booted fine and DRI was enabled. glxgears and many other 3D applications worked with no lockups. I powered off the laptop, turned it back on and damn, the desktop wasn't initialized, with just a "_" in the corner. I heard the start-up sounds, everything, but no image. Suddenly I heard the hard disk stop; the computer was totally locked up. I powered it off by holding the power button for 4 seconds. I rebooted in failsafe mode, modified xorg.conf to add the "BusType" "PCI" and "DmaMode" "None" options, rebooted, and everything was fine, desktop back. However, I still had the same slow DRI performance as with Fedora.

But I was curious why it fully worked during one session. I searched a lot on Google and found this thread. I had already tried all the xorg.conf modifications with no success. However, someone (taylorjh) noticed a "cold" versus "warm" boot problem. I was curious about this because I remembered having such problems when I used Fedora. I remembered installing Fedora one night and getting a black screen after installation and after 2-3 reboots. I just powered it off and went to sleep. Next morning, I powered it on and, voilà, everything showed up and worked fine! I could then edit xorg.conf to set the bus type to PCI and everything else needed to make it work with DRI, but I never thought that it could be associated with booting after a night's rest. Now, today, I decided to do a little experiment with my Thinkpad T22. Here are all my steps and what happened (to avoid shortening battery life, I always run my laptop with no battery installed, on the AC adaptor only):

1- made a fresh openSUSE 10.3 install with KDE
2- first boot into KDE, added the Load "dri" line to xorg.conf
3- rebooted, xorg shows a black screen, I hear the start-up sounds
4- powered off the laptop, unplugged everything, let it cool while I watched 4 Simpsons episodes :P
5- replugged power, powered it on, desktop booted fine! ran glxgears at 497 fps with no lockup for 10 minutes, made a copy of Xorg.0.log and xorg.conf
6- safely shut down the laptop via KDE for 2 minutes
...

From this experiment, we could think that there is an overheating problem or some persistent state in a chipset memory (AGP, BIOS, video...) that is "cleaned" with time or with a complete power-down. Maybe I am completely wrong... or maybe not.

Hmmm, you are still trying to force the driver to use PCI if it is in combination with the T22 Intel AGP bridge? I thought someone could try to fix the source of the problem and try to get back AGP. Forcing the driver to be with PCI, that would be an ugly patch. Is there someone here who is trying to get deep in the AGP problem?

I'm not looking to get in deep with this driver; it sounded like incorporating the PCI enforcement would be quick and easy to do. Since it seems some are not in consensus with doing that, I'll cease my efforts. If someone else comes up with a better patch, I'd be happy to review/upload though.

Where is the consensus being questioned? AGP is often so broken on savage cards (especially if you use hibernation and sleep) so forcing all savage cards to PCI would be a good thing IMHO. Even better, it should default to PCI and let people experiment with AGP if they want and know how.

I have been fighting and reporting AGP trouble for like 4 years now (ever since DRI was available) and given the age of these cards and the developer interest I don't have high hopes for seeing it fixed. I would not be surprised if there's a hardware bug in them, also XP was unstable the short time I ran it on my laptop (with Savage TwisterK).

Tormod Volden wrote:
> Where is the consensus being questioned? AGP is often so broken on
> savage cards (especially if you use hibernation and sleep) so forcing
> all savage cards to PCI would be a good thing IMHO. Even better, it
> should default to PCI and let people experiment with AGP if they want
> and know how.
>
> I have been fighting and reporting AGP trouble for like 4 years now
> (ever since DRI was available) and given the age of these cards and the
> developer interest I don't have high hopes for seeing it fixed. I would
> not be surprised if there's a hardware bug in them, also XP was unstable
> the short time I ran it on my laptop (with Savage TwisterK).

I agree with this approach. T20's are now at an age where they are dying
natural deaths (mine already died) and there will never be more
produced. While it's too bad this doesn't work well, the number of
developers and users with these models will only continue to decline.

Ok, I agree with you. This card is now too old to support. Maybe the PCI enforcement should be implemented then. However, like Tormod said, there should be an option to enable AGP and let some people test it.

Unassigning myself, I have no further plans to put effort into this driver (#98 points out the hw is long dead anyway).

Here's my final take at this patch. Last time only Tom Shaw tested my patch and it didn't work, so I'm probably wasting my time. If it happens that the patch works well and there's a stronger consensus favoring adding it let me know and I can upload it. Otherwise, maybe it'll give someone else some ideas for a better patch. Good luck everyone.

I'm about to attempt an install of Intrepid on a Thinkpad T22 and a T23 for some friends, so I disagree with the "this card is now too old to support" statement. What are you doing to Linux's reputation of running on old computers better than Windows XP? Nothing indicates these computers are dying, and my friends may be using them for the next 5 years. Will there be 10 more versions of Ubuntu with this bug?

In conclusion, since apparently we are so close to a fix (or at least an automatic workaround), I strongly suggest that this patch and all alternative solutions be reviewed and implemented in an update to Intrepid. For instance, in time for the inevitable 8.10.1.

Actually, now that I've thought about it a bit more, all of these laptops have a 1400x1050 screen, which means that at 24-bit depth DRI is not used, so that's why they are stable. If 16-bit is used, 3D programs tend to hang the machine. I haven't tried forcing PCI, but I'll take your word for it and support the idea of forcing it by default.

> I'm about to attempt an install of Intrepid on a Thinkpad T22 and a T23
> for some friends, so I disagree with the "this card is now too old to
> support" statement. What are you doing to Linux's reputation of running
> on old computers better than Windows XP? Nothing indicates these
> computers are dying, and my friends may be using them for the next 5
> years. Will there be 10 more versions of Ubuntu with this bug?

Gabriel,

I expressed those sentiments because the laptops are gradually disappearing,
especially among developers with time and interest to volunteer to fix the bug.
I expect Ubuntu and the broader Linux community would welcome you to fix the bug
yourself or hire someone else to fix it for you.

Demanding that someone else volunteer to fix it is unlikely to help, especially
when the potential volunteers no longer have access to the laptops.

I love the ThinkPad series. I currently use a T23, my wife uses a T22 and a
friend clings to a T21, now run with an external screen because the built-in
display died. Of course I'd like to see the support improved for them and
extended as well. I see it as a reality of the volunteer-driven lifecycle of
open source software. You are welcome to alter that cycle with your own time
or financial resources.

Bryce proposed a patch in April, but since none of you have bothered to test it, don't expect much to happen. When developers don't have access to the hardware, they are depending on the affected users' participation.

@Mark: If I were a Linux system programmer, I'd already be fixing the bug. If I had money to spare, I would probably have already donated a Thinkpad T22 to a developer or paid him to fix the bug (or bought my friend a new computer with a preinstalled OS, if I didn't care about the environment). But frankly it's rather insulting to get the canned answer "do it yourself or pay someone to do it or shut up": how does saying that to someone help move things forward?

You should rather test the patch since you have two of the computers at home.

@ Tormod: What do we have to do to test the patch, anyways? It seems an earlier version was provided in .deb format, but I assume for this one we have to get the src deb, patch, then hopefully a debian command can auto-compile and package the whole thing? If you give simple instructions I can follow in 15min, I'll try and test it when I do my test installs on Saturday. But as much as I'd like to, I can't spend hours digging through wiki pages.

I'm concerned about the number of manual fixes that might be required to get Intrepid working on this T22. Will I be able to tell its user to apply critical updates when they come up, or will I constantly fear that the updates overwrite my fixes?

I have a T20 which is still in use and going strong. I came across this during a fresh install and have tried the patch out, but there were 2 problems:

1. The VENDOR_ID/DEVICE_ID calls need to be compared against.. well.. the vendor & device IDs not the subsystem IDs

2. drmAgpVendorId and drmAgpDeviceId calls both return 0 because the AGP fd has not yet been initialised.

I was able to fix 1 easily, but I'm not sure whether there is a good way to get the Agp device details before it has been initialised.

I've attached an updated patch with the Agp bits commented out and the IDs corrected. With these changes it works for me, but I guess it needs the Agp bridge test again to make sure it doesn't break any working hardware.
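
To illustrate problem 1 for anyone following along, here is a small Python sketch of the difference between matching on the chip's own vendor/device IDs and matching on the subsystem IDs. This is not the patch (the real driver is C), and it does not touch problem 2 (the AGP bridge IDs). The vendor IDs 0x5333 (S3) and 0x1014 (IBM) are the registered PCI vendor IDs; the device and subsystem device values are assumed example values, and all function and class names are made up for this sketch.

```python
from dataclasses import dataclass

S3_VENDOR_ID = 0x5333   # PCI vendor ID registered to S3
IBM_VENDOR_ID = 0x1014  # PCI vendor ID registered to IBM

# Assumed example value for a Savage chip; check lspci -nn on real hardware.
EXAMPLE_SAVAGE_DEVICE_ID = 0x8C12


@dataclass
class PciIds:
    vendor: int
    device: int
    subsys_vendor: int
    subsys_device: int


def matches_by_chip(ids, affected_devices):
    """Intended check: compare the chip's own vendor and device IDs."""
    return ids.vendor == S3_VENDOR_ID and ids.device in affected_devices


def matches_by_subsystem(ids, affected_devices):
    """The mistake described above: comparing against the subsystem IDs.

    On a laptop the subsystem IDs normally identify the laptop maker,
    so an S3 vendor/device pair is not expected to match them.
    """
    return (ids.subsys_vendor == S3_VENDOR_ID
            and ids.subsys_device in affected_devices)


if __name__ == "__main__":
    # Hypothetical ThinkPad: S3 chip, IBM subsystem (subsystem device assumed).
    laptop = PciIds(S3_VENDOR_ID, EXAMPLE_SAVAGE_DEVICE_ID, IBM_VENDOR_ID, 0x017F)
    affected = {EXAMPLE_SAVAGE_DEVICE_ID}
    print("chip ID match:", matches_by_chip(laptop, affected))
    print("subsystem ID match:", matches_by_subsystem(laptop, affected))
```

With the subsystem comparison the PCI default is never triggered on such a machine, which would be consistent with an earlier version of the patch having no effect.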

@Gabriel M:
> @ Tormod: What do we have to do to test the patch, anyways? It seems an earlier version was provided in .deb format, but I assume for this one we have to get the src deb, patch, then hopefully a debian command can auto-compile and package the whole thing? If you give simple instructions I can follow in 15min, I'll try and test it when I do my test installs on Saturday. But as much as I'd like to, I can't spend hours digging through wiki pages.

How to build a driver is summarized on https://wiki.ubuntu.com/XorgOnTheEdge . I appreciate you don't have time to read wiki pages, but I won't repeat here what I already wrote in the wiki page.

> I'm concerned about the number of manual fixes that might be required to get Intrepid working on this T22. Will I be able to tell its user to apply critical updates when they come up, or will I constantly fear that the updates overwrite my fixes?

If you have customized your xorg.conf, it won't be overwritten by a package update.

@Chris:
Thanks for trying the patch. IMO we should just default to PCI for the savage driver, and let people enable AGP in their xorg.conf if they want.

I second the motion to default to PCI and allow people to switch to AGP if they want.

I just spent 3 evenings after work trying to get Ubuntu/Xubuntu installed onto my old T-20. Once I found this bug it only took 30 minutes to get Xorg working.

These old thinkpads are great; I got a replacement motherboard for $25 to bring mine back to life. Before the motherboard died, I installed one of the older versions of Xubuntu, back when 6.x something was the current release. I wasn't expecting this much trouble because Ubuntu had worked in the past.

Control-X to save the file, and then type "exit" at the shell prompt to return to the recovery console menu. You should now be able to select "resume" and have the issue solved for good (as long as the xorg.conf file isn't changed).

I would appreciate it if you would test with a live CD. Boot with the "text" kernel parameter so X will not be started automatically. Then log in and update the driver, either from a copy already downloaded to an accessible drive*, or by doing the following: 1) configure xorg.conf to use the vesa driver, 2) start X using startx, 3) download the driver and log out again, 4) remove vesa from xorg.conf again. Then run startx.

Once beta is out, it will be much more difficult to get changes like this pulled in.

*) If your "live CD" is a USB stick, the easiest way is to copy the driver there; it will be accessible under /cdrom.

Ok, I had given up on this a week ago because nothing would stop X from locking up, but then I discovered bug #319210 and realized it was something else causing the lockup.

Anyway, after I got the network up I updated the system and tested with the savage snapshot package above and a default xorg.conf with no driver or special settings specified, and X starts up just fine on my T20. It's defaulting to BusType PCI, PCI DMA and a default depth of 16, so 3D is enabled; these are the expected best settings for this hardware.

Please make sure to forward this patch upstream so they're aware of it. I suspect this works around the problem for many, but may cause regressions for anyone for whom AGP does happen to work properly.

Perhaps a better solution would be to implement a quirking system for those cases where AGP is failing (I did something similar for -ati a while back). But upstream will know best how to handle it.

I already discussed this patch on #xorg-devel (although Alex did not respond). The responses ranged from "who are using savage cards nowadays" to "fix AGP yourself". I am not sure upstream would like to disable a feature they have invested in just for the sake of stability and usability. I will post it on the list anyway, if only for other distributions to cherry-pick.

For quirking, you already proposed some patches earlier in this report, but the response was not conclusive. Anyway, the "regression" in terms of performance is hardly detectable IMO, for those things you can do with this card.

I can add, I will make a quirk system if I get proof that anyone having DRI enabled and working can resume correctly with AGP (I haven't heard of any) and if they can measure a substantial performance drop going from AGP to PCI :)