xserver:
More Xati hacking. Fixed the driver to accelerate copies only between pixmaps of matching bpp, instead of whenever the destination pixmap matched the screen bpp (which broke copies when source bpp != dest bpp). Fixed r128's alignment of pixmaps, which had caused some corruption. Now I can't produce any corruption on Xati, except for a very brief glitch when opening menus in Mozilla with xcompmgr running. Xvesa does the same thing, so it may be something that's already fixed in trunk.
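A minimal sketch of the difference between the old and new copy checks. The `pixmap_info` struct and both function names are hypothetical stand-ins, not Xati's actual code. The 32bpp-pixmap-onto-24bpp-screen case used in the comments is only one plausible way a bpp mismatch could arise.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the server's pixmap structure. */
struct pixmap_info {
    int bits_per_pixel;
};

/* The old, buggy test: only the destination was compared against the
 * screen bpp, so e.g. a 32bpp pixmap copied onto a 24bpp screen was still
 * sent to the blitter, which copies raw pixels without converting them. */
bool can_accel_copy_old(const struct pixmap_info *src,
                        const struct pixmap_info *dst, int screen_bpp)
{
    (void)src; /* the source bpp was never looked at */
    return dst->bits_per_pixel == screen_bpp;
}

/* The fixed test: accelerate only when both pixmaps share a bpp;
 * everything else takes the software fallback path. */
bool can_accel_copy(const struct pixmap_info *src,
                    const struct pixmap_info *dst)
{
    return src->bits_per_pixel == dst->bits_per_pixel;
}
```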

I've also been working on accelerating the solid-fill case of Composite using the driver's existing solid-fill hook. XFree86 code helped quite a bit here, since I'm relatively unfamiliar with the internals of this stuff. It appears to work, but I think I'll hold off on committing until I can get some review or more thorough testing. Then I'd like to do cleanups and merge my DRI branch, where I've been doing this work, into trunk. While in the area, I caught a bug I had introduced into kaa when starting on Xati, which wasted 0 to 75% of offscreen memory depending on bpp.
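Roughly how a Composite request can be recognized as a plain solid fill and handed to the existing solid-fill hook. This is an assumption about the shape of the check, not a transcription of the XFree86 code. All names are hypothetical, and the sketch assumes an a8r8g8b8 destination so that the source pixel can be used as the fill color unchanged. A real driver would convert it to the destination format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of a Render Composite request. */
enum composite_op { OP_SRC, OP_OVER, OP_OTHER };

struct composite_req {
    enum composite_op op;
    int src_width, src_height;
    bool src_repeat;
    bool has_mask;
    uint32_t src_pixel;   /* a8r8g8b8 value of the (1x1) source */
};

/* Returns true and stores the fill color if this Composite is really a
 * solid fill that a solid-fill hook could handle. */
bool composite_as_solid_fill(const struct composite_req *req, uint32_t *fill)
{
    if (req->has_mask)
        return false;
    /* A 1x1 repeating source covers the whole destination with one color. */
    if (!(req->src_repeat && req->src_width == 1 && req->src_height == 1))
        return false;
    /* SRC always replaces the destination; OVER only does so when the
     * source is fully opaque (alpha == 0xff). */
    if (req->op == OP_SRC ||
        (req->op == OP_OVER && (req->src_pixel >> 24) == 0xff)) {
        *fill = req->src_pixel;
        return true;
    }
    return false;
}
```

A translucent source with OVER is rejected because the result would depend on the destination, which a plain fill can't express.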

I've also spent more time the last few days on SiS and Radeon Composite acceleration using 3d, but to no avail so far.

xserver:
Committed r128 Composite acceleration support for blending with no mask, no transform, and no repeat. The diff bounced back and forth between andersca and me, and andersca wrote some of the most important bits. Unfortunately, this doesn't cover any of what xcompmgr wants to do.
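The restrictions on the committed path amount to a simple accept/reject test that runs before any hardware setup. This is a hypothetical sketch, and the struct and function names are made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of picture state relevant to this r128 path. */
struct picture_desc {
    bool has_transform;
    bool repeat;
};

/* Accept only the cases the committed code covers: no mask at all, and a
 * source with no transform and no repeat. Anything else falls back. */
bool r128_check_composite(const struct picture_desc *src,
                          const struct picture_desc *mask)
{
    if (mask != NULL)
        return false;
    if (src->has_transform || src->repeat)
        return false;
    return true;
}
```

A masked request such as xcompmgr's "(ARGB8888 IN A8) OVER screen" case is rejected by the very first test.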

Took a look at whether Xvideo could be done using the scaler, the way the Composite acceleration is. Although the scaling parts should be fine, it looks like the formats the hardware supports don't match the formats Xv wants. Or I'm failing to understand YUV formats, which is quite likely.

Spent a solid chunk of the day working on Radeon Render acceleration. I think I do have enough information in the end to do Composite cleanly on Radeon in Xati. The code is written, but nothing's being rendered yet. Not sure what's going on.

xserver:
Spent the night hacking on Render acceleration. Helped andersca get his diff for a specific Render acceleration case on r128 working. I've been trying all night to get his diff working and cleaned up on my r128, running the same test, with no luck at all. I'm compositing something vaguely resembling the image onto the screen, except that the pitch looks badly wrong. Or something. It's definitely being blended, because I can see the root window under the scribbling outside the test's window.

Moved back home from school for the break. It's great to be at home with family, eating good food and doing family activities. And in theory I've got more time to dedicate to hacking. However, today I had to drive my vehicle-less friend around as he tried to scrounge up enough money to get his car back from the shop. I ended up loaning him $200. I hope this turns out well, and not like all the cases in those "judge shows" that everyone else in the world seems to love watching. I trust him; it's just what everyone I've mentioned the loan to has brought up. Finished off the day watching my sister's baton performance, though, which is always fun.

xserver:
Managed to get a bit of hacking on GLX done. The XFree86 code is in place in the tree, but now I need to figure out how this mess is all supposed to tie into a DRI driver (well, I can actually answer that: it was never supposed to. I just want it to at this point).

xserver:
Continued work on Render acceleration. It turns out I don't have quite enough information to complete acceleration for Radeon: I either need more information on the CP, or I need to make the accel code significantly more complicated and probably slower. So I tried to see what I could do with SiS 300-series hardware. The initial attempt was a 100% 3d driver that would use the 3d hardware to implement Render acceleration as well as the standard solid fills and copies. Well, I wasn't getting anything rendered at all, and I realized one of the common ops (copies within a single pixmap) probably wasn't doable in 3d, so I tried the 2d hardware. That worked for copies and solid fills (for the most part; I think there's something wrong with synchronization), so I committed it. Now I need to go back to the 3d code and find what's wrong. Hopefully in the meantime keithp will fix whatever he found wrong in kaa's Composite handling that prevented pixmaps from migrating onscreen, and thus from being accelerated.
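Copies within one pixmap (scrolling, for example) can overlap, and a 2d blitter handles that by choosing which corner to start from so that no source pixel is overwritten before it is read. Texturing from a surface while rendering into it offers no such control, which may be part of why the 3d route looked unworkable; that is an interpretation, not a documented limit of the SiS hardware. The software loop below mimics a blitter's direction bits on an 8bpp byte buffer, and `copy_within` is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/* Copy a w x h rectangle of bytes from (sx, sy) to (dx, dy) inside one
 * buffer with the given pitch, choosing the traversal direction the way
 * a 2D blitter's direction bits would, so overlapping copies come out
 * correctly. */
void copy_within(unsigned char *buf, int pitch,
                 int sx, int sy, int dx, int dy, int w, int h)
{
    bool rev_x = dx > sx;   /* moving right: walk each row right to left */
    bool rev_y = dy > sy;   /* moving down: walk rows bottom to top */

    for (int j = 0; j < h; j++) {
        int row = rev_y ? h - 1 - j : j;
        for (int i = 0; i < w; i++) {
            int col = rev_x ? w - 1 - i : i;
            buf[(dy + row) * pitch + dx + col] =
                buf[(sy + row) * pitch + sx + col];
        }
    }
}
```

For example, copying the first four bytes of "ABCDEFGH" two bytes to the right yields "ABABCDGH"; walking left to right instead would have smeared "AB" across the row.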

kdrive:
Spent the day working on Render acceleration for Radeons in the Xati driver. Quite a bit of time went into just figuring out exactly what Render's Composite operations do (not to be confused with the Composite extension). I think I've got it figured out, and one block of code should be able to cover the most important cases (the operations xcompmgr uses and the ones used for subpixel antialiasing of fonts, among many others). The question is what hooks the driver should get. Do we add a collection of hooks for specific operations to accelerate, or just hand the PicturePtrs from Composite's arguments to the driver once we've managed to push the pixmaps into offscreen memory, and fall back if the driver doesn't handle the request? At least for Radeon, the second option means much less code and many more operations accelerated, at the cost of higher overhead for fallbacks. But I guess fallbacks are slow enough anyway that the extra overhead shouldn't matter much.
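A sketch of the second option: generic code hands the whole request to a single driver hook and falls back to software when the hook declines or when the pixmaps aren't all in card memory. `struct picture` stands in for PicturePtr, and every name here (`driver_ops`, `do_composite`, `toy_driver_composite`) is hypothetical rather than kaa's real interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified picture handle standing in for PicturePtr. */
struct picture { int format; bool offscreen; };

/* The driver gets the whole request and returns false for anything it
 * cannot do. */
struct driver_ops {
    bool (*composite)(int op, struct picture *src, struct picture *mask,
                      struct picture *dst);
};

int fallback_count;

void software_composite(int op, struct picture *src, struct picture *mask,
                        struct picture *dst)
{
    (void)op; (void)src; (void)mask; (void)dst;
    fallback_count++;   /* a real server would run the software path here */
}

/* Toy driver for illustration: refuses any request that has a mask. */
bool toy_driver_composite(int op, struct picture *src, struct picture *mask,
                          struct picture *dst)
{
    (void)op; (void)src; (void)dst;
    return mask == NULL;
}

void do_composite(const struct driver_ops *drv, int op, struct picture *src,
                  struct picture *mask, struct picture *dst)
{
    /* Only try the hardware if every picture made it into card memory. */
    bool all_offscreen = src->offscreen && dst->offscreen &&
                         (mask == NULL || mask->offscreen);
    if (all_offscreen && drv->composite != NULL &&
        drv->composite(op, src, mask, dst))
        return;
    software_composite(op, src, mask, dst);
}
```

The generic side stays tiny, and the driver decides for itself which operations it can accelerate. The price is that every fallback pays for the migration check and the rejected hook call first.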

Worked on my hook for the "(ARGB8888 IN A8) OVER screen" composite, one of the common ones in xcompmgr. As on most hardware, it will be implemented with the 3d hardware, treating the pixmaps as textures. What most operations (including xcompmgr's) appear to require is non-power-of-two (NPOT) textures, wrapping for at least power-of-two (POT) textures (for 1x1 textures), two texture units, and the standard GL_BLEND-style alpha blending. I suspect more ops could be accelerated by using more complicated texture blending instead of GL_BLEND, and I would bet the NPOT texture requirement could be avoided by using scissoring. Anyway, most of the 3d setup is done; I just need to set one more register and then write the code to actually emit vertices. I hope. :-)
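Here is the per-pixel math that the hardware has to reproduce for "(ARGB8888 IN A8) OVER screen", assuming Render's premultiplied-alpha pixels. IN scales every source channel by the mask, and OVER then computes result = src + dst * (1 - src_alpha). On the 3d path this would presumably map to one texture unit sampling the source, a second modulating by the mask's alpha, and a (ONE, ONE_MINUS_SRC_ALPHA) blend, but that mapping is an assumption. The following is a software reference for a single pixel, with hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255, rounding to nearest. */
static uint8_t mul8(uint8_t a, uint8_t b)
{
    return (uint8_t)(((unsigned)a * b + 127) / 255);
}

/* One pixel of (src IN mask) OVER dst, with premultiplied a8r8g8b8 pixels
 * and an a8 mask value. */
uint32_t in_over_pixel(uint32_t src, uint8_t mask, uint32_t dst)
{
    uint32_t out = 0;
    /* IN: the source alpha is scaled by the mask like every other channel. */
    uint8_t sa = mul8((uint8_t)(src >> 24), mask);

    for (int shift = 0; shift < 32; shift += 8) {
        uint8_t s = mul8((uint8_t)(src >> shift), mask);
        uint8_t d = (uint8_t)(dst >> shift);
        /* OVER: s + d * (1 - sa). Premultiplied input guarantees s <= sa,
         * so the sum never exceeds 255. */
        uint8_t r = (uint8_t)(s + mul8(d, (uint8_t)(255 - sa)));
        out |= (uint32_t)r << shift;
    }
    return out;
}
```

For example, a white source at half coverage (mask 128) over opaque black gives mid-grey, 0xff808080.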

Note that this doesn't cover trapezoid acceleration, which many consumers of Render (cairo, for example) will use. It's just meant to get some of the most common current uses (AA text and xcompmgr) accelerated.

kdrive:
Spent a lot of time with the ati driver today. Got the DRM-using ATI driver working in both DMA (CCE/CP) and MMIO modes. Haven't done any benchmarks. It's just a basic conversion of the r128 and radeon drivers from XFree86 to xserver. It's still lacking in that the kernel doesn't sleep while waiting for the accelerator to idle, and DMA isn't used for image reads and writes between card memory and system memory. There are also rough edges: AGP isn't detected yet, and it probably doesn't work on R200s.

Applied the fbdev backend patch for Xati from Michel Daenzer that I had worked on. It's a little unstable for me when I have fbdev loaded, but I've heard rumors that there are issues with radeonfb on 2.6.0-test kernels.

kdrive:
Continued hacking on the atidrm driver for kdrive today. After some very helpful input from keithp, it successfully initializes the DRI on the Rage 128 (haven't tested the Radeon yet). 2d acceleration is broken after that, but it isn't using the DMA engine yet anyway, so there's still work to do. Of the 886 lines of code in ati_dri.c, I'd say around 100 are device-specific (the radeon/r128 split), at least so far.
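One way to keep the device-specific part down to roughly those 100 lines is to use a small table of per-chip hooks called from otherwise shared code. This is a hypothetical sketch. The struct, enum, and function names are invented, and only the DRM driver names "r128" and "radeon" correspond to the real kernel modules.

```c
#include <assert.h>

/* Which chip-specific command-processor hook ran (for illustration). */
enum cp_engine { CP_NONE, CP_R128_CCE, CP_RADEON_CP };

/* Hypothetical per-screen state shared by both chips. */
struct ati_dri_state {
    const char *drm_name;      /* which kernel DRM driver we "opened" */
    enum cp_engine cp_started;
};

/* The small chip-specific part, reached through a table of hooks. */
struct ati_dri_chip_ops {
    const char *drm_name;
    int (*init_cp)(struct ati_dri_state *st);
};

static int r128_init_cp(struct ati_dri_state *st)
{
    st->cp_started = CP_R128_CCE;   /* real code would start the CCE here */
    return 0;
}

static int radeon_init_cp(struct ati_dri_state *st)
{
    st->cp_started = CP_RADEON_CP;  /* real code would start the CP here */
    return 0;
}

const struct ati_dri_chip_ops r128_dri_ops   = { "r128",   r128_init_cp };
const struct ati_dri_chip_ops radeon_dri_ops = { "radeon", radeon_init_cp };

/* Shared path: the bulk of the code would live here and call through the
 * ops table only for the chip-specific steps. */
int ati_dri_screen_init(const struct ati_dri_chip_ops *ops,
                        struct ati_dri_state *st)
{
    st->drm_name = ops->drm_name;  /* stand-in for opening the DRM device */
    /* ...setup common to both chips would go here... */
    return ops->init_cp(st);
}
```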

I'm stuck on the FreeBSD port of the kdrive server. I'm getting a panic in vm86 mode when using VM86_INTCALL, which I had hoped would be the easy solution. I don't understand how to use the normal VM86 mode in this situation, either.

kdrive:
More work on Xati. Added the Rage 128 PCI IDs. Fixed 24-bit acceleration using a cute hack: the cards don't support acceleration at 24bpp, so you put the accelerator in 8-bit mode and multiply the horizontal coordinates and widths by 3. Got offscreen pixmaps working, which improves performance. There seems to be a bug with Composite, though, which I've reproduced with Xvesa. I'm not sure if it's xcompmgr's fault or the server's, but in some cases things don't get redrawn when a window is dragged back onto the screen from past its edge. I've seen some resizing issues, too. However, dragging xterms over a Mozilla window is beautiful: no flicker while the window underneath refreshes. I can't wait to see what things are like when we get Render acceleration.
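Here is the fake-24bpp trick in code form. Each 24bpp pixel is three bytes, so in 8-bit mode the horizontal coordinate and width are scaled by 3 while y and height stay the same. The `blit` struct and helper names are hypothetical. The second function reflects an inference rather than something stated in the entry: an 8-bit solid fill repeats a single byte, so only colors whose three bytes are equal (black, white, greys) can be filled this way without extra tricks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical blit parameters as handed to the accelerator. */
struct blit { int x, y, w, h; };

/* Treat each 24bpp pixel as three 8bpp pixels: scale x and width by 3,
 * leave rows alone. (A pitch given in pixels would need scaling too.) */
struct blit to_fake_8bpp(struct blit b)
{
    b.x *= 3;
    b.w *= 3;
    return b;
}

/* An 8bpp fill writes one repeated byte, so it reproduces a 24-bit color
 * only when the red, green, and blue bytes are all equal. */
bool fake_24bpp_fill_ok(uint32_t color)
{
    uint8_t r = (color >> 16) & 0xff;
    uint8_t g = (color >> 8) & 0xff;
    uint8_t bl = color & 0xff;
    return r == g && g == bl;
}
```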

Issues that I know of with it, hmm. Radeon with acceleration at 1400x1050 doesn't work right because of pitch limitations; we need to do our own mode setup (hopefully we can figure out a clean way to do that) or use radeonfb for that mode. Lacks Render acceleration (I need help from keithp with the infrastructure). Lacks Xv (needs to be done using the 3d hardware instead of the overlay). Hardware cursor isn't supported (insignificant, at least with standard X cursors). Doesn't use the DRM. The DRM is my next task: fixing up that poor r128drm code I had written so it can be used with Xati. I have this hope, and I think it's pretty realistic, that a lot of code can be shared between r128_dri.c and radeon_dri.c, so that I can have a single ati_dri.c.
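To illustrate the pitch problem, assume (hypothetically; this is not a documented Radeon value) that the accelerator needs the pitch to be a multiple of 64 bytes. A mode with tightly packed scanlines then fails at 1400 pixels wide, and doing our own mode setup would mean rounding the pitch up. The macro and function names are made up for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed pitch alignment for the accelerator (illustrative value only). */
#define PITCH_ALIGN_BYTES 64

/* Bytes per scanline if the framebuffer is packed with no padding. */
int packed_pitch(int width, int bpp)
{
    return width * (bpp / 8);
}

/* Can the accelerator use a framebuffer with this pitch as-is? */
bool pitch_usable(int pitch_bytes)
{
    return pitch_bytes % PITCH_ALIGN_BYTES == 0;
}

/* The pitch our own mode setup would pick: packed pitch rounded up. */
int aligned_pitch(int width, int bpp)
{
    int p = packed_pitch(width, bpp);
    return (p + PITCH_ALIGN_BYTES - 1) / PITCH_ALIGN_BYTES * PITCH_ALIGN_BYTES;
}
```

At 32bpp a packed 1400-pixel line is 5600 bytes, which is not a multiple of 64. Rounding up gives 5632, while a 1024-pixel line (4096 bytes) needs no padding.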