Running drm, mesa, and xf86-video-intel from git master causes the entire screen image to periodically jump to the side for a fraction of a second and then return. This can happen several times a minute. Eventually, the screen itself will blank. This condition cannot be cleared by restarting X or running xrandr and thus requires a full reboot.
The problem is discussed further on the list in http://lists.freedesktop.org/archives/xorg/2007-November/030147.html.

Created attachment 12704
Log file from crashed X
Just now the machine froze again, this time with a blank orange screen. I was able to switch to a VT, but the machine apparently locked up when I attempted to switch back to X. This might be another issue, but then again it might be related. Attached is the log file from the crashed X session.

To quote my original mail on the xorg mailing list regarding this issue:
----------------------------------------------------------------------
I tried the git version ab2055ebb20aa6de121fa377e488ce91913035ae of
intel_drv.so with an Xorg server version 1.4. The computer is a Mac
mini Core Duo with i945 graphics. A 1680x1050 TFT is connected via
DVI-D.
Every few seconds, the whole screen jittered for some tenth of a
second. After a few minutes of normal work, the whole picture suddenly
became black. Restarting the Xorg server or rebooting via kexec didn't
help. I had to do a hard reboot to get a working Xserver again.
Both of the above bugs already happened with a git version from a few
months ago. I can't tell what exact version that was, but the date
could be the 12th of September.
----------------------------------------------------------------------
This was with EXA disabled.

Ah ok, thanks for the clarification. I was suspicious of FBC since it rescans the framebuffer every 15 seconds or so, but if you had EXA off, it should have been disabled (you would see messages in your log about it if it was still on for some reason).
Can you build the intel_reg_dumper tool (src/reg_dumper in the git tree) and try to get register dumps while things are working and then again after the screen blanks? If we're lucky, something will have changed and that might give us a clue. Otherwise it might just be a bad mode that the LCD rejects after a while?
Adding Hong to the cc list since he's been looking at LCD mode programming lately.
Hong, any ideas?
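For anyone collecting the two dumps requested above, here is a minimal sketch for comparing them. It assumes each register appears on its own line as a name, a colon, and a hex value; the actual intel_reg_dumper output may be laid out differently, so the pattern is a guess that may need adjusting. The function names are made up for illustration.

```python
import re

# Assumed line shape: register name, colon, hex value.
# The real intel_reg_dumper output may differ; adjust the pattern to match it.
REG_LINE = re.compile(r"([A-Za-z0-9_]+):\s*(0x[0-9a-fA-F]+)")


def parse_dump(text):
    """Map register name -> integer value for every line that matches."""
    regs = {}
    for line in text.splitlines():
        match = REG_LINE.search(line)
        if match:
            regs[match.group(1)] = int(match.group(2), 16)
    return regs


def diff_dumps(working, blanked):
    """Return {name: (before, after)} for registers whose value changed."""
    before, after = parse_dump(working), parse_dump(blanked)
    return {
        name: (before.get(name), after.get(name))
        for name in sorted(set(before) | set(after))
        if before.get(name) != after.get(name)
    }
```

Read both dump files into strings and pass them to diff_dumps(); a register that is present in only one dump shows up with None on the other side.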

I use a modeline in my xorg.conf to get a usable mode at all with version 2.1 (fixed meanwhile in version 2.2, which I would use if this bug weren't present). So I guess that 2.1 and 2.2 use the same mode.
Another detail: a reboot via kexec won't bring back a usable output; the screen stays black. I have to trigger a real reboot without kexec to get working graphics back.

(In reply to comment #14)
> I use a modeline in my xorg.conf to get a usable mode at all with version 2.1
> (fixed meanwhile in version 2.2, which I would use if this bug weren't
> present). So I guess that 2.1 and 2.2 use the same mode.
>
So which modeline is working with your card?
And would you please attach your xorg log with "modedebug" turned on?

(In reply to comment #7)
> Any input on this bug? Even advice as how to approach debugging the bug myself
> would be welcome. I'm fairly competent at debugging driver issues but without
> an error I don't even know where to start.
>
Hi, Ben
Would you please turn on the "modedebug" option and attach your xorg.log?
I am not sure whether your 1920x1200 (requires 162M pixel clock) is OK for our LVDS mode programming?
Thanks,
Hong

I just attached a log from Xorg starting with the ModeDebug option enabled. Note that I too run the kernel in-tree DRM modules (presently from 2.6.23). Additionally, it appears that framebuffer compression is enabled in my case.

Created attachment 12921
blacklight fixes
Hi, I sent this patch to Jesse Barnes and Lukas Hejtmanek this weekend. They are the authors; I just compiled what I think is the best.
The patch solves (almost) all my problems. Could you give it a try and send feedback, please?

(In reply to comment #21)
> Created an attachment (id=12921) [details]
> blacklight fixes
>
> Hi, I sent this patch to Jesse Barnes and Lukas Hejtmanek this weekend. They
> are the authors; I just compiled what I think is the best.
> The patch solves (almost) all my problems. Could you give it a try and send
> feedback, please?
>
I have yet to try the patch you provided but unfortunately I doubt this will fix my problem. I'm fairly certain that the backlight on my machine (a Dell Latitude D820) is not controlled by the graphics controller. I'll give it a try anyways though.

Yeah, sounds like the backlight code isn't the problem. Unfortunately, I've been having difficulty reproducing the problem lately. I don't believe it has happened in the last few days. Has anyone else found that the incidents have stopped with recent git revisions, or have I just had a lucky streak? I never was able to figure out a way to reliably produce the blanking, so maybe it's just been luck that it has stopped. Has anyone found a way to force the problem to occur?

Created attachment 12983
Register dump during blank screen
I finally managed to get the bug to recur. This time the screen turned straight white. This time I managed to SSH in to the machine and grab a register dump as well. Moreover, I can confirm that suspending and resuming the machine fixes the issue. Additionally I know for a fact that backlight control is not the issue:
1) When I allowed the machine to sit idle, I could see gnome-power-manager eventually dim the display
2) My machine does not rely on the GPU for backlight control.
I hope this helps. If there's anything else necessary for debugging, just ask.
- Ben

(In reply to comment #29)
> I just tried the latest git (d9df93578b74785c08ba860b4c9aa23b0c89c91c). The
> jittering is still there. However, no black screen so far.
>
Would you please try adding some new modes that toggle the polarity of HSYNC and VSYNC, to see whether any of them works?
Thanks,
Hong
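One way to produce the modes requested above is to generate all four HSYNC/VSYNC polarity combinations from an existing modeline. The sketch below is only an illustration: the function name and the "_pp"/"_pn"/"_np"/"_nn" name suffixes are made up here, and the input is assumed to follow the usual xorg.conf form (Modeline keyword, quoted name, clock, eight timings, optional sync flags).

```python
from itertools import product

SYNC_FLAGS = ("+hsync", "-hsync", "+vsync", "-vsync")


def polarity_variants(modeline):
    """Return four copies of a modeline, one per hsync/vsync polarity pair."""
    keyword, name, *rest = modeline.split()
    # Drop existing sync flags, keeping the clock and the eight timings.
    timings = " ".join(t for t in rest if t.lower() not in SYNC_FLAGS)
    bare_name = name.strip('"')
    variants = []
    for h, v in product("+-", repeat=2):
        # Suffix letters: p = positive, n = negative (hsync first, then vsync).
        suffix = ("p" if h == "+" else "n") + ("p" if v == "+" else "n")
        variants.append(
            f'{keyword} "{bare_name}_{suffix}" {timings} {h}hsync {v}vsync'
        )
    return variants
```

Each variant gets its own name so that all four can sit side by side in the Monitor section and be selected individually with xrandr.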

(In reply to comment #30)
> (In reply to comment #29)
> > I just tried the latest git (d9df93578b74785c08ba860b4c9aa23b0c89c91c). The
> > jittering is still there. However, no black screen so far.
> >
>
> Would you please try to add some new modes toggling the polarity of HSYNC VSYNC
> to see if there is a mode working?
>
> Thanks,
> Hong
>
Have there been any changesets that might fix the issue? I didn't see anything too promising in gitweb. Unfortunately, I won't be able to do any testing for a few weeks. Yesterday I dropped my laptop, triggering all manner of hardware failures. I'll definitely report back when I have my machine back, though.
P.S. Be careful about marking this as fixed. My screen behaved fine for nearly 72 hours once before blacking out four times in a matter of 10 minutes. It's a tricky one...

I already use a modeline (because I won't get a correct output without a modeline before 2.2.0).
Modeline "1680x1050_60.00" 146.25 1680 1784 1960 2240 1050 1053 1059 1089 -hsync +vsync
This modeline works fine with 2.1.1, so I think that it should be no problem with 2.2.0, too.
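As a sanity check on the modeline above, the horizontal and vertical rates follow directly from the pixel clock and the total timings: hsync = clock / htotal, and refresh = clock / (htotal * vtotal). A small sketch (the function name is hypothetical; the field positions assume the standard Modeline layout):

```python
def modeline_rates(modeline):
    """Return (horizontal rate in kHz, vertical refresh in Hz) for a modeline."""
    fields = modeline.split()
    # Layout: Modeline "name" clock_MHz hdisp hss hse htotal vdisp vss vse vtotal [flags]
    clock_hz = float(fields[2]) * 1e6
    htotal = int(fields[6])
    vtotal = int(fields[10])
    hsync_khz = clock_hz / htotal / 1e3
    vrefresh_hz = clock_hz / (htotal * vtotal)
    return hsync_khz, vrefresh_hz
```

For the 146.25 MHz modeline with totals 2240 x 1089, this works out to roughly 65.29 kHz horizontal and 59.95 Hz vertical, i.e. an ordinary 60 Hz mode.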

(In reply to comment #32)
> I already use a modeline (because I won't get a correct output without a
> modeline before 2.2.0).
>
> Modeline "1680x1050_60.00" 146.25 1680 1784 1960 2240 1050 1053 1059 1089
> -hsync +vsync
>
> This modeline works fine with 2.1.1, so I think that it should be no problem
> with 2.2.0, too.
>
Hi, Tino
Since you can work around your problem with your specific modeline, maybe there is something wrong with your monitor's EDID data. Would you please attach your xorg.log with modedebug turned on?
For Ben's problem, I don't have any clue now :(
Thanks,
Hong

(In reply to comment #33)
Well, I should have a new computer by Friday (albeit with an i965) so hopefully I'll be able to reproduce it at that point.
- Ben
> (In reply to comment #32)
> > I already use a modeline (because I won't get a correct output without a
> > modeline before 2.2.0).
> >
> > Modeline "1680x1050_60.00" 146.25 1680 1784 1960 2240 1050 1053 1059 1089
> > -hsync +vsync
> >
> > This modeline works fine with 2.1.1, so I think that it should be no problem
> > with 2.2.0, too.
> >
> Hi, Tino
> Since you can work around your problem with your specific modeline, maybe there
> is something wrong with your monitor's EDID data. Would you please attach your
> xorg.log with modedebug turned on?
>
> For Ben's problem, I don't have any clue now :(
>
> Thanks,
> Hong
>

Hi, I've also been experiencing this problem (jittery external DVI display, occasional external display blanking). Note that the laptop (Dell D620) doesn't hang, and the laptop display (which was still enabled through xrandr) keeps working. DVI is the only pipe that causes the problem, by the way (DVI is exposed through a port replicator too; that could be important).
I would love to get this issue fixed. The 2.1 driver didn't suffer from this problem in my experience; 2.2 definitely does. I run the Debian packages from sid.
Christian

(In reply to comment #36)
> Hi, I've also been experiencing this problem (jittery external DVI display,
> occasional external display blanking). Note that the laptop (Dell D620) doesn't
> hang, and the Laptop display (which was still enabled through xrandr) is still
> working. DVI is the only pipe that causes the problem btw (DVI is exposed
> through a port replicator too- that could be important).
>
Hi, Christian
Your problem is not the same as Ben's. Would you please open a new bug and attach the Xorg.log with modedebug turned on?
Thanks,
Hong

Created attachment 13717
enable ssc for lvds dual channel
Hi, Ben
Would you please try this patch? And please attach the xorg log with modedebug turned on (with this patch applied) if the problem is still there.
Thanks,
Hong

I used a slightly adapted version for the current git (origin/xvmc branch), that did not solve the problem, although the log says:
(WW) intel(0): enable SSC clock for LVDS
(WW) intel(0): enabling SSC clock for LVDS, setting dpll_b

I think I'm seeing this too, xserver-xorg 1.4.1, -intel 2.2.0, on a Thinkpad X60.
When I work on the LVDS, the display is fine. When I switch to VGA and disable the LVDS, I get occasional jitters of the screen and sometimes (once every few days) the VGA output turns itself off. If I remove the VGA cable and blindly run xrandr --auto, the LVDS turns back on but I can't enable VGA again.
My monitor appears to emit broken EDID, or the intel driver doesn't read it properly, so I need a custom modeline:
Modeline "1680x1050" 149.00 1680 1760 1944 2280 1050 1050 1052 1089
Is this the same issue, or should I file another bug? Can I contribute anything useful?

I planned to try a git bisect to hunt this bug down. But this week I tested the 2.2 git branch and first thought that the bug is gone, because I didn't notice the jittering and got no blank screen after an hour or so. However, I left the computer running and when I came back from work I had a blank screen. Then I rebooted and after a while I got the jittering again.
So it looks to be hard to reproduce under certain conditions. I saw no procedure to trigger the bug or make it occur faster. Or did I miss something?

I wonder if disabling and re-enabling the display is the culprit? If you leave it for a while, the DPMS callbacks should disable both the outputs and the actual pipes. If you do that a few times, can you reproduce the problem?
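Rather than waiting for the idle timeout, the off/on cycle can be forced with xset, which should go through the same DPMS path. The following is a rough sketch, not a tested reproducer: the function names, cycle count, and pause length are arbitrary choices, and it has to run inside the affected X session (with DISPLAY set).

```python
import subprocess
import time


def dpms_commands(cycles):
    """Build the xset invocations for `cycles` off/on round trips."""
    commands = []
    for _ in range(cycles):
        commands.append(["xset", "dpms", "force", "off"])
        commands.append(["xset", "dpms", "force", "on"])
    return commands


def run_dpms_cycles(cycles=5, pause_seconds=10.0, runner=subprocess.run):
    """Run each command in turn, pausing so the pipes have time to settle."""
    for argv in dpms_commands(cycles):
        runner(argv, check=True)
        time.sleep(pause_seconds)


if __name__ == "__main__":
    run_dpms_cycles()
```

The runner parameter exists only so the loop can be dry-run without touching a real display.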

I am experiencing this bug (I believe) on my laptop (i945 chipset). With framebuffer compression enabled, on average every twelve hours (but with a wide variance - sometimes it happens within an hour of rebooting, sometimes over a day), one or both displays is 'blanked' (actually not blanked, but set to displaying all pixels of a single color, usually black but occasionally grey or white or even another color).
For a long time I was running release 2.2.0 in clone mode on the internal display (LVDS) and an external display (VGA). The external display would be blanked, but never the internal display. I recently switched to git as of Sunday, and a side-by-side configuration (VGA right of LVDS). The two times the issue happened in this configuration before I disabled framebuffer compression, *both* displays were blanked.
As others have stated, neither restarting X nor fiddling with xrandr, disconnecting and reconnecting the affected head, etc. can fix it; only a system restart does.
Of note is that I did not see any display 'jitters', that I recall.
I will update to today's git, re-enable framebuffer compression, and see if the problem continues to occur...unless this is fixed in the driver, I'm recommending our X maintainer to disable framebuffer compression by default for the next Mandriva release (2008 Spring, due in April).

(In reply to comment #53)
> hmm, per comment 36 / 37, mine may not be the same bug. If not, can someone
> please direct me to the appropriate bug? I can't find it.
>
Adam, if you can confirm that disabling FBC relieves your pain, it might be the same bug. We will wait for your test results with the latest git tree. Thanks.

I tested the xf86-video-intel-2.2-branch branch as of commit 2c8f87be99957e0e18d8bcda46bd8706ab374253 on my Mac mini using LVDS connected via DVI. I still get the jitter sometimes, and yesterday I also got a blank screen. So the bug is still present for me.

I'm running the packaged intel driver in Debian Sid (2.2.0.90 with commit 2c8f87be99957e0e18d8bcda46bd8706ab374253 merged) on a ThinkPad X60 (945GM/GMS) and with frame buffer compression on I get screen jitter on the VGA output (not on LVDS).

Created attachment 14432
Change FBC idle mode
Can you give this patch a try? It should (hopefully) change the idle mode for FBC to be a little more bus friendly. It may also be that FBC_CONTROL2 can't be written at all on pre-i965 chips.

I'm seeing the jitter too. I've been seeing this on Fedora rawhide (F9 beta). The jitter occurs every 5-10 minutes on the internal LCD (1680x1050), but during a normal 8-10 hour day I don't normally get a lockup, although I have seen one when running for extended periods (24+ hours). Interestingly, I don't think this used to happen on Fedora 8 on the same machine. But I had seen it on F7 with the later releases of the driver on a Dell D620.
The fedora rpm is xorg-x11-drv-i810-2.2.1-15.fc9.x86_64
Device is a HP Compaq nx7400
lspci
00:02.0 VGA compatible controller: Intel Corporation Mobile 945GM/GMS, 943/940GML Express Integrated Graphics Controller (rev 03)

Just an update. It's looking a lot better with FBC turned off. I don't think it's perfect, but the random flickers seem to have gone, and, touch wood, I have yet to see a lockup after 3-4 hours like I was seeing previously.

Created attachment 15590
/var/log/Xorg.0.log
I am not sure whether this is relevant, but even though I have FramebufferCompression set to "no", there is periodic blinking (I am not sure whether it is exactly regular, but it feels like every minute or so): the whole display blinks for a fraction of a second, as when the visual bell is on (but there is no reason for the bell to be set off).

I seem to have this same problem, using Ubuntu hardy package xserver-xorg-video-intel 2:2.2.1-1ubuntu12.
I have not tried changing the FramebufferCompression option yet.
One thing I have noticed is that the problem does not start for me until *after* the first time the screen is blanked by the power management. I don't know if that is useful information, but I haven't seen it mentioned before. Once it starts, it happens pretty regularly every 2 minutes or so (but I don't believe it's a fixed pattern).
Actually, I only see lines referring to 'fbc enabled' after the first power management screen blanking, so maybe that's the connection. Sorry, but I'm not really familiar with what the framebuffer compression is supposed to be doing.
I'm seeing this on an external Samsung SyncMaster 220WM 22" LCD at 1680x1050 connected to a Dell Inspiron e1405 with Intel 945GM graphics.
Thanks...

Can I get the following questions answered by the people who have been seeing this problem?
Does it occur with FBC disabled?
Does it occur on the builtin LCD panel?
Does it occur when only one display is attached (builtin panel or otherwise)?
At this point, I suspect the memory arbitration or FIFO settings may be incorrect in some cases, and FBC exacerbates the problem (but it still might happen in other, non-FBC configs).
I updated the register dumper in git master so that it will dump the FIFO watermark regs; I'd appreciate register dumps from problem configurations.
In the meantime, I'm trying to reproduce locally by abusing my memory arbitration & FIFO watermark regs...

In my experience,
-- No, it does not occur with FBC disabled.
-- No, it does not occur on the built-in panel.
-- Not sure if the built-in panel is meant to be considered permanently "attached", but the problem does happen when I am using only the external monitor and have turned off the built-in panel with 'xrandr --output LVDS --off' (in fact this is the normal case where I see the problem).

Created attachment 16394
Add FIFO debug code & default values
Success (maybe). I was able to reproduce the flickering as described by allocating less FIFO RAM to plane A, and eventually the display blanked. The debug code included in this patch caught the underrun status and printed an error in my log, and once the display blanked I needed to reboot for things to work again.
The patch also includes basic FIFO settings, which will override the BIOS values and may prevent the problem, but I'm still interested in getting answers to the questions I posed earlier.
Thanks,
Jesse

(In reply to comment #81)
> Does it occur with FBC disabled?
Have not seen it in the month+ since I disabled FBC.
> Does it occur on the builtin LCD panel?
That's the only place I saw it as...
> Does it occur when only one display is attached (builtin panel or otherwise)?
... I generally only run with the LVDS active.

- I don't get jittering with FBC disabled. I do get occasional screen blanking still.
- I only see jittering on VGA output to an LCD panel, not LVDS
- When I see jittering the LVDS is disabled, so there is only one output
I just noticed that my previous xorg log after a screen black/reboot contains this:
(WW) intel(0): PRB0_CTL (0x0001f001) indicates ring buffer enabled
(WW) intel(0): PRB0_HEAD (0xb0613474) and PRB0_TAIL (0x000135c8) indicate ring buffer not flushed
(WW) intel(0): Existing errors found in hardware state.
That said, I'll update to git drivers shortly.

So no one wants to test the last patch? Another good default to try in case the stock patch doesn't work would be to set DSPARB to 48, that should split the FIFO RAM equally between planes A & B, which are the only ones the current driver uses.

To answer the questions:
- Jitter only happens with FBC.
- It happens only on the VGA display. (But that is because the LVDS is always on plane B, due to my DRM being the one from kernel 2.6.25 and not from git master.)
I have tried the patch. It did not fix the issue at all (though it may have reduced how often it occurs). Nevertheless, you seem to be on the right track: Whenever a flicker occurs, an "underrun on pipe A" line appears in my log.
I will now play with the DSPARB register. When you said 48, did you mean that the register should be set to 0x2FB0? (Just to be sure I understand the documentation correctly.) Should I care about display C / overlay?

With DSPARB set to 0x2FB0, it is still as bad as before. But with 0x2FC0, the VGA display is finally stable!
Once I have a fixed DRM, I will try again with 0x2FB0. (Since the DRM will set the LVDS on plane A hopefully, the clock rate will be lowered on the fifo of plane A.)

(In reply to comment #88)
> So no one wants to test the last patch? Another good default to try in case
> the stock patch doesn't work would be to set DSPARB to 48, that should split
> the FIFO RAM equally between planes A & B, which are the only ones the current
> driver uses.
>
I am testing the patch...
I tried it with 2.3.1 from Debian unstable: the flickering and coloured blanking are gone, BUT I now get a totally black screen with a huge number of "(EE) intel(0): underrun on pipe A!" messages.
After that I've tried 2.2.1 from Ubuntu/Hardy and it's still flickering and probably will get blank very soon.
Do you want me to supply any additional info or perform test?
P.S.
I am using GM945 on Mac Mini with LCD connected to DVI-D.

Excellent, thanks a lot for testing. One other thing I'm hoping you can try: set DSPARB to (95 << 7) | (48)? That should remove most of the FIFO entries for display plane C, and hopefully keep things stable...
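For reference, the expression above evaluates to 0x2FB0, the same value asked about earlier in the thread, and 0x2FC0 corresponds to (95 << 7) | 64. The sketch below packs and unpacks the two fields. The field layout is an assumption inferred only from that expression (low 7 bits: entry where plane B starts; next 7 bits: entry where plane C starts), and the function names are made up; check the hardware documentation before relying on it.

```python
# Assumed layout, inferred from the (95 << 7) | 48 expression in this thread:
# bits 0-6  = FIFO entry where plane B starts (so plane A gets that many entries)
# bits 7-13 = FIFO entry where plane C starts
FIELD_MASK = 0x7F
CSTART_SHIFT = 7


def encode_dsparb(b_start, c_start):
    """Pack the plane B and plane C start entries into one register value."""
    return ((c_start & FIELD_MASK) << CSTART_SHIFT) | (b_start & FIELD_MASK)


def decode_dsparb(value):
    """Unpack a register value into (b_start, c_start)."""
    return value & FIELD_MASK, (value >> CSTART_SHIFT) & FIELD_MASK
```

Under this reading, 48/95 gives plane A 48 entries and plane B 95 - 48 = 47, which fits the "split equally between planes A & B" description, while 64/95 shifts 16 more entries to plane A.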

Guillaume, yeah, you shouldn't have to care about display plane C; you can allocate 1 or 0 FIFO entries for it since we don't use it. Thanks a lot for trying this out, we really need to set DSPARB better. An even split between pipe B and A might not be appropriate in some cases, e.g. running 1920x1200 on pipe A and 640x480 on pipe B, so we'll have to do some bandwidth calculations and set it...
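To make the bandwidth idea concrete, here is one very simplified way such a calculation could look: divide the FIFO entries between planes A and B in proportion to each mode's scanout bandwidth (width x height x bytes per pixel x refresh). This is only an illustration, not what the driver does. The 95-entry total is taken from the splits discussed in this thread, blanking intervals are ignored, and the function name is hypothetical.

```python
def fifo_split(mode_a, mode_b, total_entries=95, bytes_per_pixel=4):
    """Split FIFO entries between planes A and B by scanout bandwidth.

    Each mode is a (width, height, refresh_hz) tuple. Blanking is ignored,
    so this is a rough proportional estimate only.
    """
    def bandwidth(mode):
        width, height, refresh_hz = mode
        return width * height * bytes_per_pixel * refresh_hz

    bw_a, bw_b = bandwidth(mode_a), bandwidth(mode_b)
    entries_a = round(total_entries * bw_a / (bw_a + bw_b))
    # Keep at least one entry for each active plane.
    entries_a = max(1, min(total_entries - 1, entries_a))
    return entries_a, total_entries - entries_a
```

For 1920x1200 on A and 640x480 on B, both at 60 Hz, the ratio is 7.5:1, so plane A ends up with 84 of the 95 entries and plane B with 11.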

Bandwidth calculations may not be needed. Since the underrun seems to happen only with FBC, and since FBC is only enabled when only plane A is on, a solution would be: when enabling FBC, set DSPARB to 95/95; when disabling FBC, set DSPARB back to the default value 28/59.
Moreover, the FBC could perhaps be disabled for very high resolutions. (But is the chip fast enough to drive them?) At least in my case, 64/95 is fine for 1600x1200x4 at 60Hz. So I guess 1920x1200 should be fine with 95/95.

I don't know why it worked yesterday. (I'm sure it worked, since I verified with reg_dumper that the FBC was on.) But today, it does not work anymore: The underruns are back while I'm still using the 64/95 split. There must be something missing.

The current situation is really surprising: Switching VTs was enough to stop the underrun flood (I didn't even have to restart it). I have now been running X for several hours without a single flicker, while the FBC is still enabled. So I'm not even sure my 64/95 split makes a difference, perhaps the 48/95 would have been sufficient. Could there be some kind of race condition when setting the fifo or FBC that would confuse the chipset?

Maybe you're running into the DSPARB programming limitations? It's only supposed to be written when either all pipes are off or only one pipe is active with all planes directed at it... I think you could move the write of DSPARB into EnterVT after the output & crtc "dpms off" calls.

Created attachment 16667
Fixup FIFO underrun on modeset
Oops, please try this patch instead. It fixes up the FIFO underrun status bit during mode setting as well, to avoid false positives (FIFO underrun is normal during mode setting, but in other cases indicates a bug).

Hi,
I tried git bd137a19dc29dd466eac030e040f729ed0807e3f from the master branch and still get jittering and a blank display after some time. This build includes commit 2e1425246ccc75216247b0c2fa6fce2635db472b which should fix this issue, right? At least for me, it doesn't.
In the Xorg log file, I still see those "(EE) intel(0): underrun on pipe A!" messages.
This is a Mac mini Core Duo with Intel i945, DVI, frame buffer compression enabled. I'll now disable it again.
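When comparing builds or DSPARB settings, it can help to count the underrun messages per pipe instead of eyeballing the log. A minimal sketch, matching only the exact "(EE) intel(0): underrun on pipe A!" wording quoted in this thread (other screen numbers or message variants would need a looser pattern; the function name is made up):

```python
import re
from collections import Counter

# Matches the message text exactly as quoted in this thread.
UNDERRUN = re.compile(r"\(EE\) intel\(0\): underrun on pipe ([A-Z])!")


def count_underruns(log_text):
    """Return a Counter mapping pipe letter -> number of underrun messages."""
    return Counter(UNDERRUN.findall(log_text))
```

Feed it the contents of /var/log/Xorg.0.log before and after a change; a count that stays at zero over a long session is a better sign than the absence of visible flicker.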