The irregular Nouveau-Development companion

Issue for July 7th

Intro

Hello again, this is issue number 23 of our TiNDC. Once again we want to give you an insight into our work.

Before we begin, a little correction / clarification on what we wrote in the last issue:

"NVidia cards offer a PCI memory object and a AGP memory object. [...] PCI object and AGP object differ in one major point: the PCI object points to a list of allocated memory pages, while the AGP points to an address and has a size."
This is not 100% correct. A better phrasing would have been:

PCI and AGP objects differ in only one major point: they set bits which say "please do a PCI DMA transfer" or "do an AGP DMA transfer". Both can have page lists (even the video RAM object can have one). You could even use a pointer to a contiguous memory chunk in the PCI object. However, it is not easy to create such a chunk for PCI, and using a page list with an AGP object would rather defeat the advantage of using the AGP aperture... (Thanks to Darktama for pointing this out.)
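
To make the correction concrete, here is a minimal sketch in C of a DMA object where the transfer target and the addressing mode are independent of each other. All names (`dma_object`, `dma_target`, `dma_object_translate`, `DMA_PAGE_SIZE`) are hypothetical and chosen for illustration only; they do not reflect the real hardware object layout or the nouveau DRM code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size, for illustration only. */
#define DMA_PAGE_SIZE 4096u

/* Hypothetical transfer targets; the hardware encodes these as bits. */
enum dma_target { DMA_TARGET_VRAM, DMA_TARGET_PCI, DMA_TARGET_AGP };

/* Hypothetical descriptor: any target may use either addressing mode. */
struct dma_object {
    enum dma_target target;
    int uses_page_list;     /* 1: page list, 0: linear base + size */
    uint64_t base;          /* linear mode: start address */
    const uint64_t *pages;  /* page-list mode: one bus address per page */
    size_t size;            /* total size in bytes */
};

/* Translate an offset inside the object into a bus address.
 * Returns 0 on success, -1 if the offset is out of range. */
int dma_object_translate(const struct dma_object *obj, size_t offset,
                         uint64_t *out)
{
    assert(obj != NULL && out != NULL);
    if (offset >= obj->size)
        return -1;
    if (obj->uses_page_list)
        *out = obj->pages[offset / DMA_PAGE_SIZE] + offset % DMA_PAGE_SIZE;
    else
        *out = obj->base + offset;
    return 0;
}
```

In this model a PCI object would typically be set up with a page list and an AGP object with a linear range inside the aperture, but nothing in the structure forces that pairing.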
After some time, Phoronix.com came up with a "status of Nouveau" article, describing the current status of our project and running some tests on Fedora Core with different cards (mobile chipsets, 6600GT and 8500 GT): http://www.phoronix.com/scan.php?page=article&item=765&num=1

In the forum, airlied answered a question about RandR 1.2 support: work is currently suspended for various reasons, and he expects to get back into action in September.

Please note that the article itself has two problems:

It tests on Fedora Core with the supplied package, which is very old according to ajax, the packager.

It claims working glxgears and shows a screen shot. Unfortunately, the screen shot proves that the card isn't accelerated but is using software rendering (MESA).
As there is nothing else noteworthy to say in this section this time, let's get going with ...

... The Current status

After Darktama's updates to the DRM, pmdata noticed lockups. Both tried to find the cause, and shortly after the last issue was published Darktama noticed some faults in the NV1x code and issued a patch for pmdata to test. Pmdata reported back one day later that it worked.

With this problem eliminated, Darktama pushed all his changes to the repository on Friday the 29th, finally breaking the API, and much fun was had.

pmdata still saw PGRAPH errors when running glxgears on his NV15, but this seemed to be related to incorrect initialisation of the card's context. After analyzing an MMio dump he realized that the driver must wait for the current operation to finish before a context switch can be done. He created a small patch to try this out and reported success (http://people.freedesktop.org/~pmandin). The patch should probably be cleaned up a bit before inclusion (it could theoretically lock up), but context switches now work.

a) eliminate some of the older EXA bugs we already covered in earlier issues.

b) give the whole thing a speed boost.
Technically, we saw some clean-ups like using notifiers for (the end of) DMA transfers and a slightly better UploadToScreen function (EXA).

EXA will work for all other cards too, but will mostly use software fallbacks and no hardware-based acceleration. For NV3x there should not be much to do to get EXA accelerated, though. Earlier cards, however, will need their own implementation.

The downside of these patches is that even 2D rendering may now not work on NV41, NV42, NV44, NV45, NV47, NV48 and NV4C because the init voodoo is missing. In this case please report back in #nouveau and be prepared to do an MMioTrace.

Just when we thought that EXA was yet another subsystem we had checked in bug free (ahem), AndrewR came in with bug report 11425 (http://bugs.freedesktop.org/show_bug.cgi?id=11425). Other reports trickled in too and led Darktama to a bunch of further fixes which addressed all then-known display issues. Initial feedback on those changes has been good so far.

Ahuillet's first patch got Xv working for AGP with DMA, and most people who tested it reported success. In the best case the CPU consumption seems to drop from 80% (Xorg and mplayer combined) to 20%. This case was extreme, but some improvement was seen in every test case.

Coming back after his tooth surgery, ahuillet was poised to get PCI DMA working. This drive was aided by airlied, Darktama and marcheu, who gave ahuillet the information he needed to understand the task at hand and proceed. First tries and patches led to crashes on his card, though.

First tries were greeted with DRM errors due to bad parameters for the DMA object allocation. Later on, X crashed or hung although the system itself kept going. After some diagnosis work with marcheu it turned out that the kernel space addresses for PCI were not correctly mapped to user space addresses, so user space and kernel space were using different addresses, leading to confusion. After his fixes, PCI DMA started working, as we can see in these screen shots:

[[!img http://www.ping.de/sites/koala/TiNDC_23/pci_dma_Xv_1.png]] [[!img http://www.ping.de/sites/koala/TiNDC_23/pci_dma_Xv_2.png]]
On the left is the first running version, on the right the state a few hours later. This version uses the blitter (otherwise we would see only a blue rectangle instead of the movie). Compared to the performance of Xv on nv, nouveau seems to be much slower.

So next up on ahuillet's todo list was a performance comparison of nv and nouveau regarding PCI / AGP DMA plus blitter / overlay.

This investigation wasn't done until July 5th, and it showed that the much slower part was the DMA transfers themselves! AGP and PCI DMA made Xv much slower than CPU copies. Interleaving blitting with DMA (for the next frame) didn't gain us much, but getting rid of the temporary buffer allocation gained a little. A more complete overview can be found here: http://people.freedesktop.org/~ahuillet/xv_benchmarks.pdf Still, even the best case is clearly slower than the nv version. Ahuillet expressed dismay but started further investigations.

Darktama's patches to the DRM and DDX (nv50-branch) were tested on another G84. They worked, albeit very slowly, in both default and "MigrationHeuristic greedy" mode. According to Darktama that was to be expected, because no acceleration is used to speed up rendering on G8x yet.

The createdump.sh utility now checks whether libSDL exists on the target system. However, pq rightly complained both about the test using locate (whose database may be out of date, or which may not be available at all) and about what the script checked for (libSDL.so). The script was changed to check for sdl-config instead.
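
The difference between the two checks is that a search of the executable path looks at the live file system, while locate consults a database that may be stale or missing. createdump.sh itself is a shell script and its actual check is not reproduced here; the following C function is only a sketch of the same idea, with the hypothetical name `found_in_path`, using the POSIX `access()` call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return 1 if a file called `name` with execute permission exists in
 * one of the colon-separated directories in `path`, else 0.
 * Simplification: empty path entries are skipped rather than being
 * treated as the current directory. */
int found_in_path(const char *name, const char *path)
{
    assert(name != NULL && path != NULL);
    char candidate[4096];
    const char *dir = path;

    while (*dir != '\0') {
        size_t len = strcspn(dir, ":");   /* length of this entry */
        if (len > 0) {
            int n = snprintf(candidate, sizeof candidate, "%.*s/%s",
                             (int)len, dir, name);
            if (n > 0 && (size_t)n < sizeof candidate &&
                access(candidate, X_OK) == 0)
                return 1;
        }
        dir += len;
        if (*dir == ':')
            dir++;                        /* step over the separator */
    }
    return 0;
}
```

Called with "sdl-config" and the contents of the PATH environment variable, this answers the question the script needs answered: can the SDL development tools actually be run on this machine right now?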

pq decided it was time to get his NV2x card working with glxgears. Based on the patch by pmdata he traced MMioTrace results and accidentally fixed some bugs by fiddling with reading the PTIMER registers. You can find the current patches here: http://jumi.lut.fi/~paalanen/scratch/text_n_timer/

Getting brave and trying to get a second glxgears running, pq got a hard system lockup (no serial IO, no SYSRQ, no ssh, simply dead). As this was already working in January (with static garbage in the window), there is still a lot of tracking and debugging to do. Next on pq's agenda was decoding the error messages from the DRM into a human-readable form. A day later, pq reported success.

No news about the Software Freedom Conservancy admission yet.

Help needed

As noted above, we need MMioTraces for NV41, NV42, NV44, NV45, NV47, NV48 and NV4C. Please make yourself known in our channel if you can help us out.

If you don't mind, please do test ahuillet's patches at git://people.freedesktop.org/~ahuillet/xf86-video-nouveau and give him feedback. However, be prepared for problems, misfeatures and crashes, as this is definitely a work in progress!