Cairo 1.12.4 Brings Worthwhile Changes

10-05-2012, 03:50 PM

Phoronix: Cairo 1.12.4 Brings Worthwhile Changes

Taking a break from his crazy activity on the Intel driver and SNA acceleration architecture, Chris Wilson released Cairo 1.12.4 today. The release brings some worthwhile changes and new features that make it worth the upgrade...

Comment

The xlib backend rasterizes on the CPU, but uses the GPU (via XRender and EXA) for filling, copying, and compositing (which tend to be more frequent operations). I think it may also use XRender for path rendering after tessellating to trapezoids (which isn't hardware-accelerated by any driver I know of now, but could be in theory).

Comment

That sounds right, but what is the difference with the image backend?

Comment

cairo-xlib tessellates the high-level paths from the user into trapezoids and sends those to the Xserver. The ddx then rasterises the trapezoids into a mask and composites that onto the destination. Both Nvidia and glamor use trapezoid shaders to avoid rasterising with the CPU, SNA uses the same high-speed scanline rasteriser as cairo-image (both try to eliminate the intermediate mask), and EXA uses the slow pixman trapezoid rasterisation routines and the extra compositing step. (For -intel, the CPU is faster at generating the RLE opacity mask and sending it as geometry to the GPU than the current GPUs are at executing the branch-heavy trapezoid shader. The ultimate question is whether we can tolerate using MSAA and have GPUs that are sufficiently fast...)
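
To make the two-pass pipeline above concrete, here is a minimal Python sketch (not cairo's, pixman's, or any driver's actual code). It rasterises one trapezoid into a coverage mask and then composites a solid colour through that mask. The function names, the supersampling approach, and the single-channel destination are all assumptions chosen for illustration.

```python
def rasterise_trapezoid(width, height, top, bottom, left, right, samples=4):
    """Pass 1: return a height x width mask of approximate coverage in [0, 1].

    `left` and `right` are ((x1, y1), (x2, y2)) lines bounding the trapezoid
    between the horizontal lines y=top and y=bottom. Coverage is estimated by
    point-sampling a samples x samples grid inside each pixel (an assumption;
    real rasterisers compute coverage more cleverly).
    """
    def x_at(line, y):
        # x coordinate where a non-horizontal edge crosses height y
        (x1, y1), (x2, y2) = line
        return x1 + (x2 - x1) * (y - y1) / (y2 - y1)

    mask = [[0.0] * width for _ in range(height)]
    for py in range(height):
        for px in range(width):
            hits = 0
            for sy in range(samples):
                y = py + (sy + 0.5) / samples
                if not (top <= y < bottom):
                    continue
                xl, xr = x_at(left, y), x_at(right, y)
                for sx in range(samples):
                    x = px + (sx + 0.5) / samples
                    if xl <= x < xr:
                        hits += 1
            mask[py][px] = hits / (samples * samples)
    return mask


def composite_over(dst, mask, colour):
    """Pass 2: blend a solid grey level onto dst through the mask, in place."""
    for row_d, row_m in zip(dst, mask):
        for i, a in enumerate(row_m):
            row_d[i] = colour * a + row_d[i] * (1.0 - a)
    return dst


# Usage: a 1-pixel-wide strip straddling two pixels covers each by half.
mask = rasterise_trapezoid(2, 1, 0, 1, ((0.5, 0), (0.5, 1)), ((1.5, 0), (1.5, 1)))
dst = composite_over([[0.0, 0.0]], mask, 1.0)
```

The intermediate `mask` is the extra buffer (and the extra pass) that the text says SNA and cairo-image try to eliminate.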

cairo-image rasterises directly from the general complex polygon computed for the path (convert the curves into straight lines, convolve with a pen, etc.). This essentially folds the two passes performed by cairo-xlib into one and eliminates the very computationally expensive Bentley-Ottmann routine for tessellating trapezoids. On the downside, cairo-image only uses a single core (and no GPU offload) for its rasterisation. Also, more work can be done for cairo-image to process the path without requiring an intermediate polygonisation (e.g. walk splines within the scanline rasteriser, use a hairline renderer for thin pens, compute offset curves, etc.).
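
As a rough illustration of the single-pass idea, the sketch below flattens a cubic Bezier into straight lines and fills the resulting polygon one scanline at a time, with no trapezoids in between. It is a toy built on assumptions, not cairo-image's rasteriser: the names are hypothetical, it skips the pen convolution, uses the even-odd rule, and samples only pixel centres (so no antialiasing).

```python
def flatten_cubic(p0, p1, p2, p3, segments=16):
    """Approximate a cubic Bezier by `segments` straight lines (uniform in t)."""
    pts = []
    for i in range(segments + 1):
        t = i / segments
        u = 1.0 - t
        # Bernstein form of the cubic Bezier, evaluated per coordinate
        x = u*u*u*p0[0] + 3*u*u*t*p1[0] + 3*u*t*t*p2[0] + t*t*t*p3[0]
        y = u*u*u*p0[1] + 3*u*u*t*p1[1] + 3*u*t*t*p2[1] + t*t*t*p3[1]
        pts.append((x, y))
    return pts


def scanline_fill(polygon, width, height):
    """Fill a closed polygon straight from its edges, one scanline at a time.

    Even-odd rule, sampled at pixel centres. Returns rows of 0/1 values.
    """
    rows = [[0] * width for _ in range(height)]
    n = len(polygon)
    for py in range(height):
        y = py + 0.5
        xs = []
        for i in range(n):
            (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
            # Half-open test skips horizontal edges and avoids counting a
            # shared vertex twice.
            if (y1 <= y < y2) or (y2 <= y < y1):
                xs.append(x1 + (x2 - x1) * (y - y1) / (y2 - y1))
        xs.sort()
        # Crossings pair up into inside spans under the even-odd rule.
        for xl, xr in zip(xs[0::2], xs[1::2]):
            for px in range(width):
                if xl <= px + 0.5 < xr:
                    rows[py][px] = 1
    return rows


# Usage: a shape whose top edge is a curve, closed implicitly along y = 6.
outline = flatten_cubic((1, 6), (1, 0), (7, 0), (7, 6))
pixels = scanline_fill(outline, 8, 8)
```

Because the edge crossings are found per scanline, self-intersecting outlines need no Bentley-Ottmann pass here; the cost is that everything runs on one core, as the post notes.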

The next step to speed up cairo-xlib would be to eliminate the trapezoids and send paths directly to X: fix the protocol to be more useful for cairo, which coincidentally would also enable separate render threads within cairo. Nvidia could then couple up their driver to use their existing NV_path acceleration, and I would do something similar for SNA (as usual, look at the early experiments in cairo-drm) if the GPU were not the bottleneck.

Comment

Thanks so much for the clear and detailed explanation!
Do you happen to know how Microsoft has managed to accelerate 2D operations so effectively with the GPU? As you point out, the branch-heavy code seems as if it would be a problem for them as well (I'm assuming they don't use the CPU for that).