SharpDevelop Community

WPF text rendering performance

We spent a day during #d^3 trying to make the SharpDevelop text editor (AvalonEdit) faster. Give the latest version a try; it should be much faster in most cases.

As you may know, WPF renders using DirectX. The managed part of WPF prepares a retained visual tree, which is then rendered by a native engine using DirectX. We, of course, do not have access to the source code, and since that code is native, ILSpy cannot help us either. However, we can use PIX to find out a bit about what is going on under the covers. PIX captures all DirectX calls made by an application and lets the user analyse them. It is pretty much a debugger for the graphics card.

Here you can see the capture right in the middle of rendering.

First, note that the rendering is not complete yet: PIX shows the state of the frame buffer at the selected point in time. The next thing to be rendered (with the DrawPrimitive command) is "00". You can see the black-and-white "00" stored in the temporary surface (texture), as well as the quad that will be used to draw it. Let's go through the commands that follow. Once "00" is rendered, the colour for the next piece of text (",") is set (SetRenderState, SetPixelShaderConstantF). Then the surface, which lives in system memory, is locked, filled with the bitmap for ",", and unlocked. After that, the surface is copied to the GPU (UpdateSurface) and finally rendered (DrawPrimitive). Then we move on to the next segment, and so on.

The rendering is nice and simple, but it is definitely not what I would have expected. It just seems too slow.

First, WPF does not really seem to render the text in hardware. A black-and-white bitmap is created in system memory for each text segment and then copied to the GPU, where it is copied again to the appropriate position in the frame buffer. The GPU uses a pixel shader, so it does a little additional processing, but the bulk of the work seems to be done on the CPU.

DirectX calls are expensive because they need to be passed down to the driver in the kernel, so it is important to make as few calls as possible. Game engine programmers (that is my day job) go to great lengths to minimize the number of such calls. And yet WPF uses 8 calls just to render a single text run. The total number of calls for this particular page is 22,493; by batching draw calls together, it should be possible to get that down to a couple of dozen. State changes (SetRenderState, SetPixelShaderConstantF) are equally evil. (Direct2D and DirectWrite do some batching.)
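To make the batching argument concrete, here is a minimal Python sketch that counts calls under two simplified models. The 8 calls per run come from the capture above; the batched renderer, with its single up-front upload and "one colour change plus one draw per colour" cost, is a hypothetical design assumed for illustration, not something WPF or Direct2D is known to do exactly this way.

```python
from collections import defaultdict
from dataclasses import dataclass

# Calls per text run, as counted in the PIX capture above.
CALLS_PER_RUN = 8


@dataclass(frozen=True)
class TextRun:
    text: str
    color: str


def naive_call_count(runs, calls_per_run=CALLS_PER_RUN):
    """Observed behaviour: every run pays for its own colour change, upload and draw."""
    return len(runs) * calls_per_run


def build_batches(runs):
    """Group run texts by colour so that each colour can be drawn in one batch."""
    batches = defaultdict(list)
    for run in runs:
        batches[run.color].append(run.text)
    return dict(batches)


def batched_call_count(runs, setup_calls=1, calls_per_batch=2):
    """Hypothetical batched renderer: one glyph upload up front (setup_calls),
    then one colour change plus one draw call per colour (calls_per_batch)."""
    return setup_calls + len(build_batches(runs)) * calls_per_batch


# A page of 100 lines, each split into the same three coloured runs.
page = [TextRun("int", "blue"), TextRun(" x ", "black"), TextRun("=", "darkgreen")] * 100
print(naive_call_count(page))    # 2400
print(batched_call_count(page))  # 7
```

The batched count depends on the number of distinct colours rather than on the number of runs, which is why a full page could plausibly be drawn with a few dozen calls. Drawing runs out of their original order is only safe here because text runs do not overlap.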

I am really worried about the surface Lock/Unlock. The GPU usually lags behind the CPU, sometimes by several frames. The code modifies the surface in system memory, issues a copy command from CPU to GPU, issues a draw command, and then tries to lock the same surface again to put the next piece of text into it. However, it cannot overwrite the surface until the previous commands have finished, so the CPU may have to wait, potentially for a long time. I do not know whether DirectX does some tricks to avoid this.
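The stall can be illustrated with a toy timing model in Python. It assumes (hypothetically) that the GPU executes submitted work strictly in order with a fixed delay, and that Lock() blocks until the GPU has consumed the surface's previous contents. It also compares the single staging surface seen in the capture with a round-robin ring of several staging surfaces. That ring is a common general technique swapped in here for comparison, not something observed in WPF, and whether DirectX already does something similar internally is unknown.

```python
def total_stall(num_segments, num_surfaces, cpu_fill=2, gpu_work=2, gpu_latency=0):
    """Toy model of CPU time spent blocked in Lock(), in arbitrary time units.

    Segment i is written into staging surface i % num_surfaces. The CPU may only
    lock that surface once the GPU has finished the copy and draw that last used it.
    The GPU processes segments in order and starts each one no earlier than
    gpu_latency after it was submitted (modelling the GPU lagging behind).
    """
    cpu_time = 0
    gpu_free = 0
    finished = []  # finished[i]: time at which the GPU finished segment i
    stall = 0
    for i in range(num_segments):
        start = cpu_time
        if i >= num_surfaces:
            # Lock() blocks until the previous use of this surface is done.
            start = max(start, finished[i - num_surfaces])
        stall += start - cpu_time
        cpu_time = start + cpu_fill  # Lock, fill the bitmap, Unlock
        # UpdateSurface + DrawPrimitive are submitted now and run later on the GPU.
        gpu_start = max(gpu_free, cpu_time + gpu_latency)
        gpu_free = gpu_start + gpu_work
        finished.append(gpu_free)
    return stall


print(total_stall(4, num_surfaces=1))  # 6: the CPU waits before every lock after the first
print(total_stall(4, num_surfaces=2))  # 0: the CPU fills one surface while the GPU reads the other
```

With a single surface the CPU and GPU can never overlap, so every segment after the first waits for the previous one; the larger gpu_latency is, the longer each wait becomes.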

There is exactly one memory copy from CPU to GPU for each text segment. It might make more sense to copy several segments at a time into one big surface. It would also make sense to keep some of the rendered text around as a cache, so that it does not have to be copied again next time. The natural approach would be to copy the whole alphabet into a texture once and render from that.
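The "whole alphabet in a texture" idea is usually called a glyph atlas. Below is a minimal Python sketch, assuming fixed-size cells (a monospaced font) and an atlas large enough that it never fills up. GlyphAtlas and its methods are hypothetical names, and the uploads counter stands in for rasterising a glyph and copying it to the GPU.

```python
class GlyphAtlas:
    """Hypothetical glyph cache: each distinct character is rasterised and
    uploaded once, then every later occurrence reuses its atlas cell."""

    def __init__(self, cell_width, cell_height, columns):
        self.cell_width = cell_width
        self.cell_height = cell_height
        self.columns = columns
        self.slots = {}   # character -> (x, y) pixel position inside the atlas
        self.uploads = 0  # number of simulated CPU -> GPU copies

    def lookup(self, ch):
        if ch not in self.slots:
            index = len(self.slots)
            x = (index % self.columns) * self.cell_width
            y = (index // self.columns) * self.cell_height
            self.slots[ch] = (x, y)
            self.uploads += 1  # stand-in for rasterising + uploading this glyph
        return self.slots[ch]

    def preload(self, chars):
        """Upload a known alphabet up front, e.g. printable ASCII."""
        for ch in chars:
            self.lookup(ch)

    def quads_for(self, text, origin_x, origin_y):
        """Return (screen_x, screen_y, atlas_x, atlas_y) for each character."""
        quads = []
        for i, ch in enumerate(text):
            atlas_x, atlas_y = self.lookup(ch)
            quads.append((origin_x + i * self.cell_width, origin_y, atlas_x, atlas_y))
        return quads


atlas = GlyphAtlas(cell_width=8, cell_height=16, columns=16)
for frame in range(2):
    for segment in ["int x = 0;", "int y = 1;", "00", ","]:
        atlas.quads_for(segment, origin_x=0, origin_y=0)
print(atlas.uploads)  # 11 distinct characters, uploaded during the first frame only
```

Per-segment uploading would copy four bitmaps in every frame here, whereas the atlas stops copying once every character it needs has been seen. All quads then sample from the same texture, which is what makes batching them into a few draw calls possible.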

So how does this help us make AvalonEdit faster? The most important observation is that WPF issues a lot of calls even though we prepare every line as a WPF TextLine and invalidate it only when the line changes. We knew that creating the line was expensive, but it turns out that merely rendering it is quite expensive as well. There does not seem to be any caching.

WPF does not repaint the whole window; it only repaints the visuals that have been modified. In our case, that visual was the whole AvalonEdit text layer, so all of the text was repainted. To fix this, we now use one DrawingVisual per line and invalidate it only when that line changes, so lines that were not changed are not repainted. Obvious, but not so simple to achieve in WPF. The performance improvement depends on the actual text in the editor. In our (very demanding) test case we saw a 20-fold improvement; in the average case it will probably be less impressive.
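The per-line idea can be sketched independently of WPF. The Python model below is hypothetical: it treats each line's cached drawing as its own "visual", invalidates only the line that was edited, and re-renders only invalidated lines on the next paint. In AvalonEdit itself this is done with WPF DrawingVisual objects in C#; TextLayer, set_line and paint are illustrative names only.

```python
class TextLayer:
    """Hypothetical model of 'one visual per line' with per-line invalidation."""

    def __init__(self, lines, render):
        self.lines = list(lines)
        self.render = render                      # callable: line text -> drawing
        self.visuals = [None] * len(self.lines)   # one cached drawing per line
        self.dirty = set(range(len(self.lines)))  # every line needs a first render
        self.render_count = 0

    def set_line(self, index, text):
        self.lines[index] = text
        self.dirty.add(index)  # invalidate only this line's visual

    def paint(self):
        for index in sorted(self.dirty):
            self.visuals[index] = self.render(self.lines[index])
            self.render_count += 1
        self.dirty.clear()
        return self.visuals


layer = TextLayer([f"line {i}" for i in range(1000)], render=str.upper)
layer.paint()               # initial paint renders all 1000 lines
layer.set_line(5, "changed")
layer.paint()               # only line 5 is rendered again
print(layer.render_count)   # 1001
```

With a single visual for the whole layer, the same one-line edit would cost another 1000 renders instead of one. How much that matters in practice depends on the text, which matches the caveat above about the average case.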

Here is an example of just a single line being rendered (right after the initial Clear call).

Comments

That's one of the things I don't like about WPF. It's a very powerful framework, but too complex and, as this example shows, not well optimized. I guess the best approach would be to use a good old window class that paints onto a DirectX surface, and use managed wrappers for Direct2D/DirectWrite to do all the painting yourself. That's a lot of work; WPF makes it easy for you, but the price is control and performance. (I don't have much experience with graphics, so I could be wrong.)

...ahhh, forget DirectX, do it in OpenGL! :D Then it will be portable.

Still, even though we can't fix WPF, this analysis gave us some new ideas for improving rendering performance.

In SharpDevelop 4.1.0.7907, we added DrawingVisual, which makes WPF cache unchanged text lines better (even when they move and need re-rendering, e.g. while scrolling). We still aren't sure why this helps so much, as we already cached the TextLine before.

In SD 4.1.0.7908, we made sure that unchanged text lines don't get re-rendered at all (when not scrolling); this is what the separation into multiple visuals was intended for. This gave us another performance boost, though not as big as the unexpected boost from DrawingVisual.

Then, in SD 4.1.0.7913, I changed the color of C# punctuation from dark green to black. This reduces the number of color changes in code such as "int x = a.b.c;": with colored punctuation there are 9 TextRuns, with black punctuation only 2. As WPF issues separate DirectX commands for every TextRun, this also boosts performance in cases where the caching described above doesn't help (e.g. fast scrolling or rectangular selection).
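The 9-versus-2 count can be reproduced with a small Python sketch that merges adjacent tokens of the same colour into one run. It assumes a simplified highlighter: "int" is the only keyword, identifiers and whitespace use the default colour, and operators such as "=" count as punctuation. The colour names are placeholders, and real WPF TextRuns can also split on other formatting changes.

```python
import re

# Assumed highlighting rules for this example only.
KEYWORDS = {"int"}

COLORED_PUNCTUATION = {"keyword": "blue", "default": "black", "punctuation": "darkgreen"}
BLACK_PUNCTUATION = {"keyword": "blue", "default": "black", "punctuation": "black"}


def classify(token):
    if token in KEYWORDS:
        return "keyword"
    # Identifiers, numbers and whitespace are drawn in the default colour.
    if token.isspace() or re.fullmatch(r"\w+", token):
        return "default"
    # Everything else: operators and punctuation such as '=', '.', ';'.
    return "punctuation"


def count_text_runs(code, colors):
    """Count the runs left after merging adjacent tokens of the same colour."""
    tokens = re.findall(r"\w+|\s+|[^\w\s]", code)
    runs = 0
    previous_color = None
    for token in tokens:
        color = colors[classify(token)]
        if color != previous_color:
            runs += 1
            previous_color = color
    return runs


print(count_text_runs("int x = a.b.c;", COLORED_PUNCTUATION))  # 9
print(count_text_runs("int x = a.b.c;", BLACK_PUNCTUATION))    # 2
```

Once punctuation shares the default colour, everything after the keyword collapses into a single run, so member-access-heavy code benefits most.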

Together, these changes make the SD 4.1 text editor feel much faster than 4.0 or 4.1 Beta.

It's still not as fast as non-WPF editors, but the difference should

DirectWrite would be nice: it has almost the same feature set as WPF (e.g. we would get support for bidirectional text almost for free), but it is much, much faster. Unfortunately, it's currently not an option, as 40% of our users are still on Windows XP.

Interesting, and nice to see some effort in this direction. Are there any plans to speed up the docking library? It is just too slow and buggy; it is still close to impossible to resize an undocked auto-hide window by its splitter on a 120 dpi screen.