Week 29: Performance and bugs

This week I continued to work on fixing Krita. I started with outlines. When you fix something, you break something other sometimes. It happened this time. Duplicate brush known as clone tool in Gimp is somehow special regarding the outlines. It is showing outline even if you are not in outline mode. It is also the only brush engine, that needs to know the position of the input device (mouse/tablet) and my new API with QPainterPath did not included this information. I fixed it, changed the API and also use the position of the device for other brush engines. That resulted in more simple code in the freehand tool. Also I simplified the way you code outlines. Before you had to specify the area of the outline in separate function so that it is erased correctly when redrawing the canvas. Now I use bounding box of the QPainterPath. One bonus of my work is this one: When you wanted to duplicate some area, first you have to setup origin point with CTRL+click. This was outlined by cross. Now it is outlined by the same outline as your brush. You can more clearly see what you are duplicating.

You can see the duplicate area instead of just cross

Then I had to fix outlines for predefined brushes. That is brush which consists of brush mask saved in format like gimp brush or adobe brush and is stamped as you paint.When you scaled with the brush in the preference, the outline was wrong. Now it is fixed.

Then I spent one day just with profiling the predefined brushes. Various problems occured. The benchmark with QtTestLib did not work correctly. First it ignored the command line argument -iterations n. I needed that one because I wanted to profile just one iteration. QBENCHMARK_ONCE was not solution in that case. The benchmark time was not real. Then Sven Langkamp helped. He was asking if I check if the resource has been loaded (grb/gih brush in my case). I did not, I thought that the resource manager will not continue until the brushes are loaded. And that was the mistake. The code continued. So I added 3 seconds sleep to the benchmark so that I wait for the manager to load the brushes. Then I was able to valgrind my benchmark and provide the log.

We use QImage in Format_Indexed8 as storage for gray-scale masks. We need to scale them and interpolate between them on-the-fly as you paint your stroke. And that was slow.
I started with simple fix of the performance for the case when the size of the brush is constant. Our sensors had curve which was sampled with wrong count of samples, and then some rounding errors have showed. This just avoided a little scaling and improved the performance a little.

Cyrille Berger then pointed out that scaling code is slow due to calls to QImage::width, QImage::height and QImage::scanline. I noticed that too in the log but I did not pay attention as I thought QImage::width would just return m_width or something. I was wrong. It is doing also one check. I cached it and it was success! The performance has improved, the speedup was 1.33 (The benchmark dropped from 16 seconds to 12 seconds).

Regarding scanlines, QImage::scanline calls detach due to implicit sharing of data. We had some code to access single pixel like

quint8 valueAt(){
return m_qimage.scanLine(y)[x];
}

in scaling and interpolation code and that is not very good for performance. First I though that I will cache the pointer to the last accessed line and line below. In scaling code we access 4 pixels, 2 are on line and 2 are on line below. It was not bad, but still the result was not good. Then I spotted QImage::bits(). I decided, let’s try to cache the pointer and access the data in old way dataPtr+(y*width+x). We know the format, so it is safe. It did not work. I was surprised, the image was somehow distorted. Solution was QImage::bytesPerLine(). I did some little test and the conclusion was that QImage::bytesPerLine() != QImage::width() for QImage in Format_Indexed8 (one pixel – one byte). Caching the pointer to the data and accessing the pixel like dataPtr+(y*bytesPerLine+x) is great performance solution. The speedup was ~2.2. Test dropped from 16 seconds to 7.3 seconds.

I managed to fill the target of the week, speed up predefined brushes. I decided that one day I will spent on drawing lines routines. I thought that I will focus on optimizing the way how we draw the QPainterPath, but then I got source code of scanline algorithm for drawing lines with nice endpoints from my cousin, who is great graphics coder. The code was written in Delphi long time ago. I ported it to Krita API with his permission. The direct port was little slow, it drown 100 lines on 4096×4096 in 4.5 seconds. I optimized it and my time was 2.5 seconds for 100 lines. It is still slower then KisPainter::drawThickLine() which has ugly endpoints, this routine is good for drawing 1-2 pixel wide lines as it is fast. If you need some wider lines, you can use the new routine. Drawing lines with QPainterPath is still the slowest way. I plan to try to optimize it when some time will left. Lines can be used e.g. in hatching or sketch brush. Thanks to my cousin for his code 😉

I noticed that our Autobrush is somehow slower compared to what we had when it was freshly optimized. I profiled and spotted that feature called randomize is not used but computed. Random generator is used and it was expensive. I added one special case condition and now the brush is 1.5x faster. It is used in various brush engines, so everybody can benefit.

In my hobby time with Krita I worked on sketch brush engine. I still has to prepare some presets, add few options and then I will blog about it more. It is new exciting brush engine and was already used for creating art!