Arthur's dev blog

How I optimized HTML Renderer and fell in love with VS Profiler

Managed HTML rendering has been a pain in the ass for a really long time. The best solution I was able to find is HTML Renderer, hosted on CodePlex, though the project seems to be dead and I'm not sure how I can contribute… Anyway, it's 100% managed code and has nice coverage of the HTML 4 and CSS level 2 specs.
So I decided to check it out, and after a little playing around it felt a bit heavy, so I looked into its performance using the built-in Visual Studio profiler. To spoil the ending: I managed to reduce the average render time from 282 msec to 24 msec (91%).
The final code, with fixes for most of the issues raised here and here, can be found here; you can use it under the same license as the original code. Also, if everything goes well, I might add features to the project, including text selection for copy-paste support.

The setting:

The source comes with a nice demo project that has 12 HTML samples showcasing the abilities of the renderer, so I used those, as they are complex enough and should exercise everything the renderer has to offer. The test is a simple loop of 12 iterations, each rendering the 12 different HTML samples (144 renders in total), timed with 'Stopwatch' and targeting .NET 3.5.
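A minimal sketch of such a test loop. This is not the actual test code from the project: the `RenderBenchmark` class and the `render` delegate are made up for illustration, and `render` stands in for whatever call renders one HTML string.

```csharp
using System;
using System.Diagnostics;

public static class RenderBenchmark
{
    // Renders every sample 'iterations' times and returns the total elapsed
    // milliseconds. 'render' is a hypothetical stand-in for the real render call.
    public static long Run(string[] samples, int iterations, Action<string> render)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            foreach (string html in samples)
            {
                render(html);
            }
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Average time of a single render: total / (samples * iterations).
    public static double SingleAverage(long totalMs, int sampleCount, int iterations)
    {
        return (double)totalMs / (sampleCount * iterations);
    }
}
```

With 12 samples and 12 iterations the delegate is called 144 times, and `SingleAverage` is simply the total divided by 144.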

Baseline

Without changing anything in the downloaded source, run the test:
Total: 40,679 mSec (the total time it took to render 144 htmls)
SingleAvg: 282.49 mSec (total / 144, the average time it took to render a single html)

The project is compiled and executed so the main form of HTML Renderer appears, while in the background the profiler samples the running application.
Hit the test button, wait for it to finish and close the form (or you can make it run automatically).
Once VS detects that the app has closed, it finishes profiling, does some calculations and shows the "Sample Profiling Report".
The profiling report's data visualization is quite powerful, with 10 different views, range filtering, comparison and more. I won't go into everything it has, but will focus on what I used.

In the "Summary" view there are two super useful visualizations, "Hot Path" and "Functions Doing Most Individual Work":

Clicking on any method navigates to the "Function Details" split view, showing the calling and called functions with their respective costs. At the bottom of the view there is the actual code, with the hot path highlighted in red and other expensive code highlighted in yellow:

Those two views were all I needed to know which code needed optimization.

Optimization iterations

Basically each optimization iteration is:
1. Run the profiler
2. Find slow code
3. Improve it
4. Run the test to get the improved time
5. Back to 1

Here we go: the iterations I did, with a little explanation of each optimization:

Cached regexes (66% improvement)

Total: 13,885 mSec
SingleAvg: 96.42 mSec
HTML Renderer uses regexes to parse the CSS and HTML strings into "CSS blocks"; there are about 14 different regexes used for this. But a new 'Regex' instance was created for each use, and 2/3 of the time the CPU was busy in the 'Regex' ctor. A simple cache that creates each regex only once is responsible for this huge saving.
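A minimal sketch of what such a cache can look like. It is not the code from the project: the `RegexCache` name is made up for illustration, and I'm assuming the patterns are fixed strings so the pattern itself can serve as the cache key (regex options are left out for brevity).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexCache
{
    private static readonly Dictionary<string, Regex> _cache = new Dictionary<string, Regex>();
    private static readonly object _sync = new object();

    // Returns the one shared Regex instance for the pattern, constructing it
    // only on first use. The lock keeps the dictionary safe across threads.
    public static Regex Get(string pattern)
    {
        lock (_sync)
        {
            Regex regex;
            if (!_cache.TryGetValue(pattern, out regex))
            {
                regex = new Regex(pattern);
                _cache[pattern] = regex;
            }
            return regex;
        }
    }
}
```

A call site then changes from `new Regex(pattern).Match(text)` to `RegexCache.Get(pattern).Match(text)`, so the constructor cost is paid once per pattern instead of once per use.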

Removed reflection default value init (72% improvement)

Total: 11,345 mSec
SingleAvg: 78.78 mSec
HTML Renderer supports 72 different CSS style properties! To save coding, the developer used attributes to declare the default value of each property and then reflection code to set the values at runtime. Replacing the reflection with 72 plain 'set' lines of code did the job.
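To illustrate the difference, here is a sketch with three properties instead of 72. Everything in it is hypothetical (the `CssDefaultAttribute` and `BoxStyle` names, the chosen properties); it only shows the general shape of the before and after, not the project's real code.

```csharp
using System;
using System.Reflection;

// Hypothetical attribute carrying a property's default value.
[AttributeUsage(AttributeTargets.Property)]
public sealed class CssDefaultAttribute : Attribute
{
    public readonly string Value;
    public CssDefaultAttribute(string value) { Value = value; }
}

public class BoxStyle
{
    [CssDefault("inline")] public string Display { get; set; }
    [CssDefault("normal")] public string FontStyle { get; set; }
    [CssDefault("visible")] public string Visibility { get; set; }

    // Before: every new box walks all properties and reads their attributes.
    public void InitDefaultsByReflection()
    {
        foreach (PropertyInfo prop in typeof(BoxStyle).GetProperties())
        {
            object[] attrs = prop.GetCustomAttributes(typeof(CssDefaultAttribute), false);
            if (attrs.Length > 0)
            {
                prop.SetValue(this, ((CssDefaultAttribute)attrs[0]).Value, null);
            }
        }
    }

    // After: plain assignments, one line per property.
    public void InitDefaultsDirect()
    {
        Display = "inline";
        FontStyle = "normal";
        Visibility = "visible";
    }
}
```

Both methods leave the object in the same state; the second one just does it without any reflection calls, at the price of keeping the defaults in sync by hand.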

Font cache and Regex (75% improvement)

Total: 10,109 mSec
SingleAvg: 70.20 mSec
Replaced one of the regexes with simple 'IndexOf' style code and added a font cache so that a new font object isn't created for each CSS box. Replacing the regex made the code a bit messier, though.
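As an example of the 'IndexOf' style, here is a sketch that splits a "name: value" CSS declaration without a regex. I'm not claiming this is the regex that was replaced in the project; the `CssDeclaration.TrySplit` helper is made up to show the idea.

```csharp
using System;

public static class CssDeclaration
{
    // Splits "name: value" at the first ':' and trims both parts.
    // Returns false (and null outputs) when either part is missing.
    public static bool TrySplit(string declaration, out string name, out string value)
    {
        name = null;
        value = null;
        if (declaration == null) return false;

        int colon = declaration.IndexOf(':');
        if (colon < 0) return false;

        string n = declaration.Substring(0, colon).Trim();
        string v = declaration.Substring(colon + 1).Trim();
        if (n.Length == 0 || v.Length == 0) return false;

        name = n;
        value = v;
        return true;
    }
}
```

It is more lines than a one-line regex match, which is the "messier" part, but there is no pattern to construct or interpret.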

Font size cache (83% improvement)

Total: 6,799 mSec
SingleAvg: 47.21 mSec
Apparently, calling the 'GetHeight' method of the 'Font' class is quite expensive. I added a cache that maps each font object to its height; this cache and the font cache work nicely together.
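Both caches follow the same "create on first request, reuse afterwards" pattern, so here is a generic sketch of it. The `LazyCache` class is my own illustration, not the project's code, and it is not thread-safe as written.

```csharp
using System;
using System.Collections.Generic;

// Generic "compute once per key" cache.
public sealed class LazyCache<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> _items = new Dictionary<TKey, TValue>();
    private readonly Func<TKey, TValue> _create;

    public LazyCache(Func<TKey, TValue> create)
    {
        _create = create;
    }

    // Returns the cached value for the key, calling the factory only
    // the first time the key is seen.
    public TValue Get(TKey key)
    {
        TValue value;
        if (!_items.TryGetValue(key, out value))
        {
            value = _create(key);
            _items.Add(key, value);
        }
        return value;
    }
}
```

With System.Drawing, the height cache could then be something like `new LazyCache<Font, float>(f => f.GetHeight())`. Because the font cache hands out the same 'Font' instances again and again, the height lookup hits the cache for nearly every box.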

Remove more reflection (86% improvement)

Total: 5,566 mSec
SingleAvg: 38.65 mSec
There was more CSS property reflection code, handling inheritance and merging between different CSS boxes. Replacing the reflection code with long switches made the code a bit less pleasing to the eye, but the performance gain is worth it.
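A sketch of the switch approach, again cut down to three properties and with made-up names (`CssBoxStyle`, `SetByName`, `InheritFrom`); the real code has one case per supported CSS property.

```csharp
using System;

public class CssBoxStyle
{
    public string Color;
    public string FontFamily;
    public string TextAlign;

    // Sets a property by its CSS name with a switch instead of looking the
    // member up through reflection. Returns false for unknown names.
    public bool SetByName(string cssName, string value)
    {
        switch (cssName)
        {
            case "color": Color = value; return true;
            case "font-family": FontFamily = value; return true;
            case "text-align": TextAlign = value; return true;
            default: return false;
        }
    }

    // Copies the inherited properties from the parent box, one plain
    // assignment per property.
    public void InheritFrom(CssBoxStyle parent)
    {
        Color = parent.Color;
        FontFamily = parent.FontFamily;
        TextAlign = parent.TextAlign;
    }
}
```

The downside is exactly what it looks like: every new property has to be added to the switch and to the copy method by hand.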

Remove more regex (87.5% improvement)

Remove empty paint background (91% improvement)

Total: 3,625 mSec
SingleAvg: 25.17 mSec
The final significant code optimization: the paint background code was inefficient in that it executed 'FillRectangle' even if the background color was empty. Skipping those calls did the trick.
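The check itself is tiny; a sketch of it follows. The `BackgroundPainter` class and the `fill` delegate are hypothetical, with `fill` standing in for the real drawing call (e.g. `Graphics.FillRectangle`). Treating a fully transparent color the same as an empty one is my assumption, not something the original code is known to do.

```csharp
using System;
using System.Drawing;

public static class BackgroundPainter
{
    // True only when filling would actually change pixels.
    // Assumption: fully transparent colors are skipped as well as Color.Empty.
    public static bool NeedsFill(Color background)
    {
        return !background.IsEmpty && background.A != 0;
    }

    // 'fill' is a placeholder for the real drawing call.
    public static void Paint(Color background, Rectangle bounds, Action<Color, Rectangle> fill)
    {
        if (!NeedsFill(background))
        {
            return; // nothing visible to draw, skip the expensive call
        }
        fill(background, bounds);
    }
}
```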

Build in release (91.6% improvement)

Total: 3,395 mSec
SingleAvg: 23.57 mSec
Not my optimization, but it's nice to see that the compiler can pitch in.

.NET 4.0 (92.5% improvement)

Total: 3,026 mSec
SingleAvg: 21.01 mSec
Just for fun, I wanted to see if CLR 4.0 has something nice in store: a 10% improvement over the release build just by changing the target framework, very nice.

It's not real unless it's in a chart:

Know when to stop

Obviously I didn't have any specific performance goal I wanted to reach, and I don't really believe a developer can have those. I stopped when the performance felt right for what the code does and the profiler showed only code that was either hard or impossible to optimize.
I do believe HTML Renderer can be optimized further, as its algorithm breaks the HTML into words, computes the layout and draws each word separately. Merging words into blocks by style would probably improve performance, but it would require going much deeper into the code, and gaining 10 msec for that is just not worth it.

Summary

The Visual Studio built-in profiler is really nice. It's not as powerful as PerfView, but the simplicity and code integration are a real treat. Additionally, it's the only profiler (except PerfView) that didn't crash or hang when I tried to profile an Outlook add-in.

Simple code changes can have a huge performance impact. We all know that regex and reflection are expensive, so we should think about that when we use them.

There is no alternative to running a profiler to find bottlenecks. It should be part of the regular development cycle, and with the nice VS profiler it's easy.

Improving the average HTML render time from 282 msec to 24 msec (91%) is really nice, and it was fun doing it.

2 comments on “How I optimized HTML Renderer and fell in love with VS Profiler”

[…] they were bugging me, also I was hoping it will lead to some performance improvements as from my last optimization session text draw/measure were the most time consuming operations. The resulted improvement exceeded my […]