
Comment

- what performance-related features are enabled in r600g relative to r300g at this time?
- what design decisions differ between r300g and r600g?
- are there differences in hardware architecture between the pre-r600 and r600+ generations which could require different design decisions, and were those decisions made?

Well, that's what I meant (sort of)... maybe I should dress in black to see if my questions would come out sharper.

I hope you didn't take my post as an offense of any kind, but I'm really interested in these low-level subjects; I just don't have the skills to help.

Rest assured, I'm thankful for your work and your insightful posts on this forum.

There are some tools, but they don't generally help as much as you would expect. GPU driver programming involves very long pipelines with many things going on at once, so things that run slow are frequently caused by code that ran some time earlier. Performance work usually ends up more like:

- stare at all the things you mentioned
- get an idea
- rewrite a bunch of code and see what happens
- repeat until you have to work on something else

That said, I believe the main work right now is finishing the enablement of "known" performance-related features, i.e. the ones which are enabled in r300g but not in r600g. In most cases I think the code exists and works on many configurations, but not on enough of them to enable by default yet.

Comment

@Bridgman,
Ah, I remember that 1:1 compiler post on Phoronix a long time ago that made the r600 actually work. But that was an ugly hack, right? And this hasn't been fixed?! Isn't this top-priority work? =o

Comment

There are some tools, but they don't generally help as much as you would expect. GPU driver programming involves very long pipelines with many things going on at once, so things that run slow are frequently caused by code that ran some time earlier. Performance work usually ends up more like:

- stare at all the things you mentioned
- get an idea
- rewrite a bunch of code and see what happens
- repeat until you have to work on something else

That said, I believe the main work right now is finishing the enablement of "known" performance-related features, i.e. the ones which are enabled in r300g but not in r600g. In most cases I think the code exists and works on many configurations, but not on enough of them to enable by default yet. [emphasis mine]

You mean like HyperZ? What about rendering using both GPUs of a dual-GPU card? Crossfire? These things would probably help a lot, but only if we eliminate the most severe of the existing CPU bottlenecks. The biggest issue is that the pipeline stalls for a very, very long time; it's not that the GPU has any problem handling the requests it does get.

Another problem I see is that certain workloads (some 3D apps) constantly throw errors inside the DMAR/DRHD subsystem -- several times per frame. The kernel is protecting itself from data corruption and potentially system-crashing issues by detecting DMA remapping faults, so that part of the kernel is doing its job. But clearly DRM is not doing its job, or the faults wouldn't occur in the first place.

While the fault prevention of DMAR/DRHD is great, the downside is that each fault is very expensive. It ends up generating an interrupt each time. If this is happening dozens of times per second, then no wonder we're getting crap FPS.

I wouldn't be surprised if several of the programs Michael tested in this article have brought out this behavior. It seems to occur only with Mesa 7.11-dev, which he's using. Revert to stable Mesa and, although you lose a lot of features, Mesa no longer uses libdrm in a way that triggers these constant faults; the stack spends far less time handling a constant stream of invalid DMAR requests, and FPS wins. This appears to be closely related to the IOMMU.

In fact I remember FPS being much more competitive in past articles Michael has written pitting r600g against Catalyst. I would be surprised if this particular problem isn't to blame for several of the tests Michael used.

Edit: (this editing thing is cool!) -- Then again, maybe Michael doesn't have the same problem. You can clearly see whether you have the problem by looking for this in dmesg while rendering (the numbers are irrelevant as this is just an example):
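Reconstructed from memory, a fault of this kind looks roughly like the following (the exact device address, fault address, and reason code will differ, and the wording varies by kernel version):

```
DRHD: handling fault status reg 2
DMAR:[DMA Write] Request device [01:00.0] fault addr 7fa2d000
DMAR:[fault reason 05] PTE Write access is not set
```

If lines like these are scrolling past several times per frame while a 3D app runs, you're hitting the problem I described.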

From my (limited) understanding here, the IOMMU actually resides on the motherboard chipset, so it is not directly controlled by either the CPU or GPU manufacturer. Therefore maybe this is an isolated problem that is only buggering up on my specific motherboard chipset. That's entirely possible; I have an early first-generation Intel X58 chipset (ASUS P6T Deluxe v1 is the specific make/model). It was the first enthusiast / desktop Nehalem Architecture motherboard to market. With an Intel CPU and an AMD GPU, who knows if I've just got bad luck and the IOMMU hardware on the mobo doesn't perform to spec?
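One way to isolate hardware from software here (an assumption on my part -- these are standard kernel parameters, and I haven't confirmed they suppress this exact fault) is to boot with DMA remapping disabled and see whether the faults and the FPS drop disappear together:

```
# Append to the kernel command line in the bootloader, then reboot:
intel_iommu=off

# Or keep the IOMMU enabled but use passthrough, skipping translation:
intel_iommu=on iommu=pt
```

If the faults vanish with remapping off, the IOMMU path is implicated; if they persist, the problem is elsewhere in the stack.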

Regardless of whether or not that's true, I must insist that this is an issue that can be handled in software. Otherwise they would probably recall the motherboard, and the Catalyst drivers wouldn't work properly for me on Windows or Linux. As it stands, I can play all the big AAA titles on Windows just fine, so maybe the Catalyst team already discovered and squashed this bug.

Or I'm way off the mark and the hardware is fine but there's just a software bug in DRM. Sorry for over-speculating.

Comment

- stare at all the things you mentioned
- get an idea
- rewrite a bunch of code and see what happens
- repeat until you have to work on something else

That said, I believe the main work right now is finishing the enablement of "known" performance-related features, ie the ones which are enabled in r300g but not enabled in r600g. In most cases I think code exists and works on many configurations but not enough to enable by default yet.

I just wish for the driver to deliver some usable performance. :/
It seems I am the only person in the universe who wants performant open-source drivers on a 2 GB 5870 or an equivalent AMD card. :/
Even though I switched to a GTX 260 SP216, every time I see awesome AMD hardware I get heartache. :/
When I see how slow the open-source driver is, I nearly have an infarction. ://

Comment

Wow, the HD 4670 Gallium3D performance has regressed hard since the last batch of tests! The Catalyst performance on this ASIC has stayed roughly the same. Could this be due to what allquixotic was talking about in comment #16?

Comment

Again, a useless comparison. My card is way faster than in the test. Vsync, color tiling, swap buffers and so on are disabled. It does not matter what the current defaults are; what matters is what's actually possible with the cards and the driver stack. Double or triple the r600 bars and you get real results.

I too found an HD 4670 to be way faster a few weeks ago than what I see here.

So here is what to benchmark next on Phoronix: how much of a speed improvement the mentioned features bring (color tiling, page flipping, SwapbuffersWait off, etc.), both individually and combined.
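For anyone wanting to try these toggles themselves, they map (to the best of my knowledge -- check the radeon(4) man page for your driver version) to options on the radeon DDX, set in the Device section of xorg.conf:

```
Section "Device"
    Identifier "Radeon"
    Driver     "radeon"
    Option     "ColorTiling"     "on"   # tiled framebuffer layout
    Option     "EnablePageFlip"  "on"   # page flipping instead of blits
    Option     "SwapbuffersWait" "off"  # don't wait for vblank on swap
EndSection
```

Benchmarking with each option flipped individually would show where the current defaults leave performance on the table.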

Comment

Similar studies have been done before and the conclusion is always the same. Having a higher percentage of the developers dress in black leads to better drivers.

snip!

- the shader compiler for r600+ started off as more of a 1:1 IR-to-hardware translator than a real compiler, although recent work may have improved that a lot (haven't had time to look)

About two years ago I bought a Sapphire 4770; it looked really good on paper (still does), but the stock Fedora software doesn't particularly impress me. Can you say what the state of the open drivers for this card is?