"A VMA cache appears unavoidable thanks to compiz and an excruciatingly slow GTT pagefault, though it does look like it will be ineffectual during everyday usage. Compiz (and presumably other compositing managers) appears to be undoing all the pagefault minimisation as demonstrated on gen5 with large XPutImage. It also appears the CPU to memory bandwidth ratio plays a crucial role in determining whether going straight to GTT or through the CPU cache is a win - so no trivial heuristic."

I'm wondering what Chris means and implies here. Is he saying that the compositing managers are eating up all the CPU cycle gains because they are simply not being benchmarked and refactored often enough to minimise their overall impact?

No, it is a limitation in how the rendering is split between X and the DRI compositor. For the compositor to see all rendering performed by X, the ddx must flush its queues before the damage is broadcast to clients. The ddx only knows when X is about to reply to a client, not whether that reply is a damage report, so it has to assume the worst and flush the rendering before every reply to any client. This means that when a compositor is in use, or more generally whenever we have exported GEM buffers to other DRI applications (e.g. games), the ddx can only batch small amounts of rendering, so throughput suffers and CPU overhead increases.

In this particular instance, PutImage is buffered onto a system-memory copy of the pixmap and normally flushed in time for vblank. With a DRI compositor, however, we end up flushing the pixmap after each call to PutImage, causing many small uploads rather than one big one. Prior to the commit, the GPU buffer would be mmapped on each upload. The commit introduces a caching scheme so that those mappings (which are themselves a precious resource and have costs associated with keeping them open) are preserved between uploads.
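To make the two effects concrete, here is a rough toy model in Python. It is only a sketch, not the driver's code: `count_uploads`, `MappingCache` and the event names are made up for illustration, and a plain `bytearray` stands in for an expensive mmap of a GPU buffer. It counts how many uploads a stream of PutImage calls produces with and without the flush-before-every-reply rule, and how many fresh mappings are needed when mappings are cached between uploads.

```python
from collections import OrderedDict


def count_uploads(events, flush_on_reply):
    """Count pixmap uploads for a stream of 'put', 'reply' and 'vblank' events."""
    uploads = 0
    dirty = False
    for event in events:
        if event == "put":
            dirty = True  # PutImage lands in the system-memory copy
        elif event == "vblank" or (event == "reply" and flush_on_reply):
            if dirty:
                uploads += 1  # flush the buffered pixels to the GPU buffer
                dirty = False
    return uploads


class MappingCache:
    """Toy LRU cache of buffer mappings (hypothetical, for illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.mappings = OrderedDict()  # buffer id -> mapping
        self.map_calls = 0  # number of times a fresh mapping was created

    def _map_buffer(self, buffer_id):
        # Stand-in for the costly step of mmapping a GPU buffer.
        self.map_calls += 1
        return bytearray(16)

    def get(self, buffer_id):
        if buffer_id in self.mappings:
            self.mappings.move_to_end(buffer_id)  # mark as most recently used
            return self.mappings[buffer_id]
        mapping = self._map_buffer(buffer_id)
        self.mappings[buffer_id] = mapping
        if len(self.mappings) > self.capacity:
            self.mappings.popitem(last=False)  # evict the least recently used
        return mapping


# Eight PutImage calls, each followed by a reply, then one vblank.
events = ["put", "reply"] * 8 + ["vblank"]
batched = count_uploads(events, flush_on_reply=False)
composited = count_uploads(events, flush_on_reply=True)

cache = MappingCache(capacity=4)
for _ in range(composited):
    cache.get("pixmap")  # every upload reuses the cached mapping

print(batched, composited, cache.map_calls)
```

Without a compositor the eight PutImage calls collapse into a single upload at vblank; with the flush-on-reply rule each one becomes its own upload, and without a cache each of those uploads would pay for a new mapping. The cache is bounded because, as noted above, open mappings are themselves a limited resource.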

This is also one of the major changes inherent in the design of Wayland: the clients push the damage to the compositor without any unnecessary round-trips, and updates are always atomic, fast and performed only when required.