When I run a game (in this case Paintball2, a 3D first-person shooter
based on the Quake2 engine) and go to a place that is very hard to render
(very low fps), the game quits with the error
"Error: Rage 128 timed out... exiting".
I found that this error comes from the r128_ioctl.c file, in the function:
void r128WaitForIdleLocked( r128ContextPtr rmesa )
So I made some changes to see what is wrong and edited the code a little:
do {
i = 0;
do {
ret = drmCommandNone( fd, DRM_R128_CCE_IDLE );
+ fprintf( stderr, "ret=%d, i=%d, to=%d\n" , ret, i, to);
+ if (ret) DO_USLEEP( 100 );
- } while ( ret && errno == EBUSY && i++ < R128_IDLE_RETRY );
+ } while ( ret && i++ < R128_IDLE_RETRY );
} while ( ( ret == -EBUSY ) && ( to++ < R128_TIMEOUT ) );
if ( ret < 0 ) {
+ fprintf( stderr, "(ret=%d, i=%d, to=%d, errno=%d)\n" , ret, i, to, errno);
drmCommandNone( fd, DRM_R128_CCE_RESET);
When I run the game and walk around the level looking at a wall, everything
is OK and it prints out:
ret=0, i=0, to=0
(with every frame, I think)
So the IDLE call succeeds instantly (in the first iteration).
But when I look at some "hard-to-render" place (many polygons or something),
the game hangs and it prints out:
ret=-22, i=0, to=0
ret=-22, i=1, to=0
ret=-22, i=2, to=0
...
ret=-22, i=32, to=0
(ret=-22, i=33, to=0, errno=22)
Error: Rage 128 timed out... exiting
It seems that something prevents the IDLE call from ever succeeding again.
I also tried adding a line with DO_USLEEP(500000), but it didn't help.
Error number 22 is EINVAL (invalid argument) and is returned when ioctl's
"request" or "argp" arguments are not valid.
It should return error 16 (EBUSY) at most...
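For reference, here is a minimal sketch of how the return value could be logged with its symbolic name, so that 22 and 16 do not have to be looked up by hand. The helper names idle_errno_name and log_idle_result are hypothetical and not part of the driver; the sketch only assumes that the call returns 0 or a negative errno value, as the output above suggests.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: name the two error codes discussed above. */
static const char *idle_errno_name(int ret)
{
    if (ret == 0)
        return "OK";
    if (ret == -EBUSY)
        return "EBUSY";    /* engine still busy, retrying makes sense */
    if (ret == -EINVAL)
        return "EINVAL";   /* invalid request, retrying will not help */
    return "other";
}

/* Print the return value with its symbolic name and description. */
static void log_idle_result(int ret)
{
    /* Assumption: the ioctl wrapper returns 0 or a negative errno value. */
    assert(ret <= 0);
    fprintf(stderr, "ret=%d (%s: %s)\n", ret, idle_errno_name(ret),
            ret < 0 ? strerror(-ret) : "no error");
}
```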
I'm not expert enough in Mesa's DRI and DRM code to find the problem.
(PS: I have had this problem with some other games and with Blender (3D modeling),
while some games, e.g. Quake3Arena, don't trigger it.
Maybe it occurs only when the rendered scene is very full of polygons...
The win32 version of Paintball2 doesn't hang in these scenes but has a high spf
(seconds per frame).)
Thank you for helping.

Created attachment 8627
patch to disable span locking
I found a similar problem in the R300 driver and I also found a patch which should solve it (but I don't know if it's a real solution).
So I rewrote this patch for r128 with one small change: span locking is disabled by default.
Set R128_ENABLE_SPAN_LOCKING if you want to temporarily disable this patch.

I'm afraid the patch is not acceptable in this form. If anything, it should default to the span locking being enabled, with the possibility of disabling it via an environment variable, as is done in the r300 driver. However, AFAIK the latter hasn't actually needed it in a while (certainly not in my experience), so it would be even better to identify and fix the actual cause of the hangs. It sounds like other parts of the driver may not handle the locking correctly.
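To illustrate the default suggested here, a minimal sketch of an opt-out check follows. The variable name R128_SPAN_DISABLE_LOCKING and both function names are hypothetical, chosen only for this example; they are not taken from the r300 or r128 sources.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical variable name, used for illustration only. */
#define SPAN_LOCKING_DISABLE_ENV "R128_SPAN_DISABLE_LOCKING"

/* Decide from the raw variable value: locking stays enabled unless the
 * variable is set to a non-empty string. */
static bool span_locking_enabled_from(const char *value)
{
    return value == NULL || value[0] == '\0';
}

/* Read the environment once, e.g. at context creation. */
static bool span_locking_enabled(void)
{
    return span_locking_enabled_from(getenv(SPAN_LOCKING_DISABLE_ENV));
}
```

With this shape, an unset variable keeps the locking on, and only an explicit setting turns it off.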

> does mesa commit 25f21b5331d27225b1f6b7aaf2c9bf3f32764d91 (see bug 7994) help?
Unfortunately no, it doesn't.
armagetron still crashes. In System Settings->Display Settings->Detail Settings, if I set "object textures" to "off", it runs fine when I start a game. In game, if I hit escape and change the "object textures" setting, the program crashes right away with stderr of:
Error: Rage 128 timed out... exiting
kernel log:
[drm:r128_cce_idle] *ERROR* r128_cce_idle called without lock held, held 0 owner ffff810052cc0680 ffff810052cc0680
Another problem that's still present (and was mentioned in this bug re: kpovmodeler) is the lockup of the whole server until the client program is killed. This is https://bugs.freedesktop.org/show_bug.cgi?id=3493. If you have access to any r128 hardware, progs/demos/engine is the best test case for it. It locks up, with 100% repeatability, before it can even draw its window the first time (the previous contents of that screen area are still visible inside the window frame). And killall engine restores everything to normal. The behaviour of the "engine" lockup is completely unchanged by that commit. It still says:
[drm:drm_lock_take] *ERROR* 3 holds heavyweight lock

(Sorry for the double post, I'm a first-time bugzilla user.)
re: [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
I have attached some heavily formatted log output, which shows drm debug messages at the time of the hang (the messages come from the kernel log, obtained using "modprobe drm debug=1"). You can see that after the cce_idle ioctl call, context 3 tries to lock again without unlocking first. I recompiled the Mesa DRI driver with the DEBUG_LOCKING flag set in r128_lock.h, and found out that this double locking behavior occurs in the depth buffer calls in r128_span.c. Basically
/* These functions require locking */
+/*
#undef HW_LOCK
#undef HW_UNLOCK
#define HW_LOCK() LOCK_HARDWARE(R128_CONTEXT(ctx));
#define HW_UNLOCK() UNLOCK_HARDWARE(R128_CONTEXT(ctx));
+*/
/* 16-bit depth buffer functions
*/
fixed the problem for me. Apparently, the lock is taken before these functions get called (AFAICR, r128SpanRenderStart() does the job, but I'm not sure).
re: *ERROR* r128_cce_idle called without lock held
drmP.h has a LOCK_TEST_WITH_RETURN macro, which returns -EINVAL along with this error message when the lock isn't held. Because of this, the retry loop in r128WaitForIdleLocked exits with ret == -EINVAL, so the actual error message is quite misleading in this case. So, the real error is that r128WaitForIdleLocked got called without a lock held.
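A rough sketch of a wait loop that keeps the two cases apart is below. It is not the driver's code: the constants, the idle_fn callback type and the two mock functions are made up for the example, and the sleeps between retries are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

#define IDLE_RETRY   32   /* illustrative values, not the driver's constants */
#define IDLE_TIMEOUT 4

/* Stand-in for the idle ioctl: returns 0 or a negative errno value. */
typedef int (*idle_fn)(void);

static int mock_calls;   /* counts calls made to the mocks below */

static int mock_always_busy(void)   { mock_calls++; return -EBUSY; }
static int mock_lock_not_held(void) { mock_calls++; return -EINVAL; }

/* Returns 0 when idle, -EBUSY on a real timeout, or the first other error. */
static int wait_for_idle(idle_fn try_idle)
{
    int ret = 0;

    assert(try_idle != NULL);
    for (int to = 0; to < IDLE_TIMEOUT; to++) {
        for (int i = 0; i < IDLE_RETRY; i++) {
            ret = try_idle();
            if (ret != -EBUSY)
                return ret;   /* success, or an error retrying cannot fix */
        }
    }
    return ret;               /* still -EBUSY after every retry */
}

/* Only a genuine -EBUSY is reported as a timeout. */
static void report_idle_failure(int ret)
{
    if (ret == -EBUSY)
        fprintf(stderr, "Error: Rage 128 timed out... exiting\n");
    else if (ret == -EINVAL)
        fprintf(stderr, "Error: idle ioctl rejected (lock not held?)\n");
    else if (ret < 0)
        fprintf(stderr, "Error: idle ioctl failed, ret=%d\n", ret);
}
```

In this sketch a -EINVAL result ends the wait after a single call instead of burning through all retries, and the message no longer claims a timeout.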
I modified DEBUG_LOCK in r128_lock.h to print all lock operations to the standard error output, and modified the code to call DEBUG_LOCK on unlocks too. r128WaitForIdleLocked was also replaced with a macro to show where it's called from. It turned out that sometimes r128SpanRenderFinish() is called without calling r128SpanRenderStart() first -- so no lock is held during the call to r128WaitForIdleLocked. I think this may be a problem with the software fallback. Armagetron exhibits this behavior, while gl-117 calls RenderStart and RenderFinish in pairs, and its lock operations are paired as well.
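The kind of tracing described above can be sketched roughly like this. It is a simplified stand-alone model, not the actual DEBUG_LOCK change: struct lock_trace and the TRACE_* macros are hypothetical names, and the real driver keeps its lock state elsewhere.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical bookkeeping for one context. */
struct lock_trace {
    int held;         /* 1 while the (simulated) hardware lock is held */
    int violations;   /* number of mismatched operations seen so far */
};

static void trace_lock(struct lock_trace *t, const char *file, int line)
{
    assert(t != NULL);
    if (t->held) {
        t->violations++;
        fprintf(stderr, "%s:%d: lock taken while already held\n", file, line);
    }
    t->held = 1;
}

static void trace_unlock(struct lock_trace *t, const char *file, int line)
{
    assert(t != NULL);
    if (!t->held) {
        t->violations++;
        fprintf(stderr, "%s:%d: unlock without lock\n", file, line);
    }
    t->held = 0;
}

static void trace_wait_for_idle(struct lock_trace *t, const char *file, int line)
{
    assert(t != NULL);
    if (!t->held) {
        t->violations++;
        fprintf(stderr, "%s:%d: wait for idle without lock\n", file, line);
    }
}

/* The macros record the call site, which is what makes the log useful. */
#define TRACE_LOCK(t)          trace_lock((t), __FILE__, __LINE__)
#define TRACE_UNLOCK(t)        trace_unlock((t), __FILE__, __LINE__)
#define TRACE_WAIT_FOR_IDLE(t) trace_wait_for_idle((t), __FILE__, __LINE__)
```

A render finish that runs without a matching start then shows up as a "wait for idle without lock" line carrying the file and line of the caller.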
Using kernel 2.6.22 with drm and mesa from the freedesktop git.

Created attachment 13855
patch to fix various span locking issues
Here goes my proposed patch for fixing some locking issues with the r128.
I think that most of the problems are caused by some primitives which ended up in the software rasterizer without setting up the software part first. The defined HW_LOCK() and HW_UNLOCK() macros mentioned in comment #11 caused "double locking" ("*ERROR* 3 holds heavyweight lock"), if the correct fallback path was taken, and r128SpanRenderStart was called before reading/writing buffers. Removing these #defines as in the radeon source - leaving the job solely to ...SpanRenderStart and ...SpanRenderFinish - in turn left the software rasterizer without *any* locking for some primitives, which resulted in early application termination (because now eg. r128WaitForIdleLocked could be called without acquiring the locks -- hence the "*ERROR* r128_cce_idle called without lock held").
I added two new fallback flags based on error logs - after removing the #defines, blender crashed shortly after enabling GL_LINE_STIPPLE, and armagetron had some problems with GL_POLYGON_SMOOTH. (armagetron also has GL_LINE_SMOOTH, but it didn't crash when enabled, so I didn't add a separate case for it.)
Another issue is caused by projective textures (seen in armagetron). In r128RenderStart, when the routine determines from the vertex format (texture coordinate size) that software rendering should be enabled, it is already "too late" - some of the function pointer changes that _swsetup_Wakeup makes (particularly tnl->Driver.Render.Start, which is changed to _swsetup_RenderStart in _swsetup_Wakeup, which in turn would call r128SpanRenderStart to acquire proper locking) will only take effect (if at all) at the beginning of the next frame. So, once again, no locking would take place for that iteration. I added a check to see if R128_FALLBACK_PROJTEX was actually set in FALLBACK, and a call to run tnl->Driver.Render.Start if needed. This part is commented out, because I'm not sure if this is the right way to handle this; so, for now, projective textures are disabled by this patch.
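The "too late" effect can be shown with a toy model. Everything in it is hypothetical (struct context, the hook names, the rerun_start flag) and it is not Mesa code; lock_held only models the lock that the software start hook would take.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct context;
typedef void (*render_start_fn)(struct context *);

struct context {
    render_start_fn render_start;  /* stands in for tnl->Driver.Render.Start */
    bool projective;               /* vertex format needs the software path */
    bool rerun_start;              /* apply the workaround described above */
    bool lock_held;
};

/* Software start hook: the only place that takes the lock in this model. */
static void sw_render_start(struct context *c)
{
    c->lock_held = true;
}

/* Hardware start hook: notices the fallback only once it is already running. */
static void hw_render_start(struct context *c)
{
    if (!c->projective)
        return;
    c->render_start = sw_render_start;  /* what the wakeup step would do */
    if (c->rerun_start)
        c->render_start(c);             /* call the new hook right away */
}

/* One render pass: the start hook is fetched and called exactly once. */
static bool begin_pass(struct context *c)
{
    assert(c != NULL && c->render_start != NULL);
    c->lock_held = false;
    c->render_start(c);
    return c->lock_held;
}
```

Without rerun_start, the first pass runs with no lock and only the following pass picks up the software hook; with it, the lock is already taken in the pass that detected the fallback.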

Comment on attachment 13855
patch to fix various span locking issues
I tried to run more games after applying the patch, and some of them terminated with a segmentation fault and/or put multiple "r128_cce_* called without lock held" messages into the kernel log. So it's not working properly.

Created attachment 24949
span locking patch
The previous patch left three calls to the software rasterizer without locking (these are called when both hardware accelerated and software rendered primitives are drawn); this one wraps them correctly.
Texture projection is still commented out, because not too many OpenGL applications have an option to disable it (if I remember correctly, armagetronad recently had this setting removed), and rendering this kind of primitive in software hits the frame rate quite hard. Since only the first and second texture coordinate components are left in place, leaving it this way (=unhandled) can cause rendering artifacts (some texcoord transformation magic could solve this, but it could only occur at the vertex level, and the texture would still be linearly interpolated pixel-wise).
The attached patch applies to git commit dc1153ce83041a397b1d1815db4133ce8c53eaa1 and mesa_7.4-0ubuntu3 source package from the ubuntu jaunty repository, but was tested only with the latter; to test, start a free flight in gl-117, leave the camera in first person mode and fly your plane into the Sun :)

Created attachment 27576
patch to add support for projective textures
Enables hardware-accelerated rendering of primitives that have vertices with 'q' texture coordinates used in projective texture mapping. It must be applied after attachment 24949.
The patch is generally based on the "ptex hack" found in the S3 Savage sources (which uses the standard vertex format, then shuffles values around to their correct position just before sending the vertices to the DMA buffer), modified to support two sets of texture coordinates; the vertex layout used by the r128, along with the additional vertex format flag needed, were obtained from earlier Mesa/DRI sources (eg. [1][2]). The performed calculations are essentially the same in both cases (savage & r128).
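As a rough illustration of the calculation involved, the sketch below folds q into the remaining components for a single texture unit. It reflects my understanding of the general idea behind this kind of trick, not the code in the patch: the struct layout and the name fold_projective_q are hypothetical, and the second set of texture coordinates is not covered.

```c
#include <assert.h>

/* Hypothetical vertex subset: reciprocal homogeneous w plus one texcoord set. */
struct tex_vertex {
    float rhw;
    float s, t, q;
};

/* Fold q into rhw, s and t so that q itself no longer has to be emitted.
 * The product s * rhw (and t * rhw) is unchanged, while rhw picks up the
 * factor q, so perspective-correct interpolation yields s/q and t/q. */
static struct tex_vertex fold_projective_q(struct tex_vertex in)
{
    struct tex_vertex out;

    assert(in.q != 0.0f);   /* the caller must reject degenerate q */
    out.rhw = in.rhw * in.q;
    out.s = in.s / in.q;
    out.t = in.t / in.q;
    out.q = 1.0f;           /* q has been absorbed */
    return out;
}
```

For example, a vertex with rhw = 0.5, s = 2, t = 4 and q = 2 becomes rhw = 1, s = 1, t = 2.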
Both patches apply against the mesa_7.4-0ubuntu3.1 source package and git commit eb33c0ab8b3594f0b1d58534a13a26e3fb050cff, however only the former has been tested.
(ps. I'm unsure which vertex size the SAREA structure should store, but I also couldn't find any piece of code where that member gets referenced, so it may not matter either way.)
[1] http://www.koders.com/c/fid0D7829772F2499BEA37AC17768976CD64AD3459C.aspx?s=rhw2#L62
[2] http://www.koders.com/c/fidDF4ACA3257B4F85D79E4E734EA089EC50D1DBB7F.aspx#L79

Comment on attachment 27576
patch to add support for projective textures
Well, apparently it *does* matter, or something else is amiss, because after further testing I managed to hardlock the machine and got a few related segfaults in the process. I will try to look into this as time permits.

Guys, thank you for all your work here. I also had a good time hacking on the r128 drivers.
Unfortunately, I gave the video card away four years ago, so I can't participate in this bug anymore.
Closing. *** drying nostalgic tear ***