OpenSCAP XSLT performance improvements for faster SSG builds

As I contributed more and more patches to SCAP Security Guide, I got increasingly frustrated with the build speeds. A full SSG build with make -j 4 took 2m21.061s, and that’s without any XML validation taking place. I explored a couple of options for cutting this time significantly. I started by profiling the Makefile and found that a massive amount of time was spent on two things.

Generating HTML guides

We generate a lot of HTML guides as part of SSG builds, and we do that over and over for each profile of each product. That’s a lot of HTML guides in total. Generating one HTML guide (namely the RHEL7 PCI-DSS profile from the datastream) took over 3 seconds on my machine. While not a huge number, this adds up to a long time with all the guides we are generating. Optimizing HTML guides was the first thing I focused on.

I found that we were often selecting huge nodesets over and over instead of reusing them. Fixing this brought the times down by roughly 30%. I found a couple of other inefficiencies and was able to save an additional 5-10% there. Overall, HTML guide generation is roughly 35-40% faster in common cases.

During the optimization I accidentally fixed a pretty jarring bug regarding refine-value and value selectors. We used to select a big nodeset of all cdf:Value elements in the entire document, then select all the cdf:value elements inside them and choose the last one based on the selector. This is clearly wrong because we need to select the cdf:Value with the right ID and then look only at its selectors. Fixing that made the transformation faster as well because the right cdf:Value was already pre-selected.
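To make the difference concrete, here is a minimal Python sketch of the two lookup strategies. It is not the XSLT from OpenSCAP; the sample document, the shortened IDs, and the function names buggy_lookup and correct_lookup are made up for illustration, and the old behaviour is reconstructed only from the description above.

```python
import xml.etree.ElementTree as ET

NS = {"cdf": "http://checklists.nist.gov/xccdf/1.2"}

# Hypothetical, heavily simplified benchmark; real XCCDF 1.2 IDs are longer.
SAMPLE = """\
<Benchmark xmlns="http://checklists.nist.gov/xccdf/1.2">
  <Profile id="example_profile">
    <refine-value idref="var_password_minlen" selector="strict"/>
  </Profile>
  <Value id="var_password_minlen">
    <value>8</value>
    <value selector="strict">14</value>
  </Value>
  <Value id="var_umask">
    <value>022</value>
    <value selector="strict">077</value>
  </Value>
</Benchmark>
"""


def buggy_lookup(root, selector):
    """Scan every cdf:value of every cdf:Value and take the last selector match."""
    matches = [
        v for v in root.iterfind(".//cdf:Value/cdf:value", NS)
        if v.get("selector") == selector
    ]
    return matches[-1].text if matches else None


def correct_lookup(root, value_id, selector):
    """Find the cdf:Value with the right ID first, then look only at its selectors."""
    for value_el in root.iterfind(".//cdf:Value", NS):
        if value_el.get("id") != value_id:
            continue
        for v in value_el.iterfind("cdf:value", NS):
            if v.get("selector") == selector:
                return v.text
    return None


root = ET.fromstring(SAMPLE)
refine = root.find(".//cdf:refine-value", NS)

# The buggy variant ignores the idref and returns a value from the wrong cdf:Value.
print(buggy_lookup(root, refine.get("selector")))
print(correct_lookup(root, refine.get("idref"), refine.get("selector")))
```

In this sample the buggy variant picks up the umask value even though the profile refines the password length, which is the kind of mix-up described above.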

EDIT: I found more optimization opportunities, latest data as of 2016-08-10:

real 0m3.399s
user 0m2.986s
sys 0m0.409s

I won’t be redoing the entire test suite and all the graphs, but the final savings are much better than the graph shows. Generating all RHEL7 SDS guides takes less than 2 seconds on my machine after the optimizations.

Transforming XCCDF 1.1 to 1.2

It took 30 seconds on my machine to transform RHEL6 XCCDF 1.1 to 1.2. That is just way too much for a simple operation like that. Clearly something was wrong with the XSLT transformation. As soon as I profiled the XSLT using xsltproc --profile, I found that we selected the entire DOM over and over for every @idref in the tree. That is just silly. I fixed that by using xsl:key, so that a single @idref-to-element mapping is built once and reused for all lookups. This saved a lot of time.
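The idea behind the xsl:key fix can be sketched in Python: build an ID-to-element index once and reuse it, instead of walking the whole tree for every @idref. This is only an analogy, not the actual stylesheet change; the sample document and the names resolve_slow and build_id_index are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified document with a few elements and references.
SAMPLE = """\
<Benchmark>
  <Group id="group_a">
    <Rule id="rule_1"/>
    <Rule id="rule_2"/>
  </Group>
  <Profile id="profile_x">
    <select idref="rule_2"/>
    <select idref="rule_1"/>
    <select idref="missing"/>
  </Profile>
</Benchmark>
"""


def resolve_slow(root, idref):
    """Walk the entire tree for a single lookup (the pattern the profiler flagged)."""
    for el in root.iter():
        if el.get("id") == idref:
            return el
    return None


def build_id_index(root):
    """Walk the tree once and remember the first element seen for each id."""
    index = {}
    for el in root.iter():
        el_id = el.get("id")
        if el_id is not None:
            index.setdefault(el_id, el)
    return index


root = ET.fromstring(SAMPLE)
index = build_id_index(root)

# Every idref is now resolved with a dictionary lookup instead of a tree walk.
idrefs = [el.get("idref") for el in root.iter() if el.get("idref") is not None]
resolved = {idref: index.get(idref) for idref in idrefs}
for idref, target in resolved.items():
    print(idref, "->", None if target is None else target.tag)
```

With n references and m elements, the slow variant does on the order of n times m comparisons, while the indexed variant does one pass over the tree plus n dictionary lookups.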

The numbers were similar for the RHEL7 XCCDF 1.1 to 1.2 transformation.

Final results for the SSG build

I started with 2m21.061s and my goal was to cut that by 50%. The final time on my machine after the optimizations with make -j 4 is 1m4.217s. That is a saving of roughly 55%. Most of those savings are in the XCCDF 1.1 to 1.2 transformation that we do for every product.
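For reference, the percentage follows directly from the two wall-clock times quoted above. A small Python sketch of the arithmetic (the helper name to_seconds is just for illustration):

```python
def to_seconds(minutes, seconds):
    """Convert a 'XmY.YYYs' style timing into plain seconds."""
    return minutes * 60 + seconds


before = to_seconds(2, 21.061)  # 2m21.061s
after = to_seconds(1, 4.217)    # 1m4.217s

saving = (before - after) / before
print(f"saved {before - after:.3f}s of {before:.3f}s ({saving:.1%})")
```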

The savings are great on my beefy work laptop (i7-5600U) but we should benefit even more from them on our Jenkins slaves that aren’t as powerful. I have yet to test how much they would help there but I estimate it will be 10 minutes for each build.

Correctness

When I suggested deploying these improvements on our Jenkins slaves, Jan Lieskovsky brought up an important point about correctness. We decided to diff old and new guides and old and new XCCDF 1.2s to be sure we aren’t changing behavior. Please see the attached ZIP file for a test case I created to verify that we haven’t changed behavior. While creating this test case I discovered that I had accidentally fixed the bug mentioned above 🙂 To silence the diffs I reintroduced exactly this bug into the new XSLTs I used for the comparison. This made the performance slightly worse, so keep that in mind when looking at the numbers.
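The comparison itself is simple in principle. The following Python sketch is not the test case from the ZIP file, just an illustration of the approach under the assumption that old and new outputs are plain text files; the function name diff_outputs and the file names are made up.

```python
import difflib
import tempfile
from pathlib import Path


def diff_outputs(old_path, new_path):
    """Return unified diff lines between two generated files (empty if identical)."""
    old_lines = Path(old_path).read_text(encoding="utf-8").splitlines(keepends=True)
    new_lines = Path(new_path).read_text(encoding="utf-8").splitlines(keepends=True)
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=str(old_path), tofile=str(new_path)
        )
    )


# Tiny stand-ins for a guide generated by the old and the new XSLT.
with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp, "guide-old.html")
    new_file = Path(tmp, "guide-new.html")

    old_file.write_text("<p>minimum length: 14</p>\n", encoding="utf-8")
    new_file.write_text("<p>minimum length: 14</p>\n", encoding="utf-8")
    identical = diff_outputs(old_file, new_file)

    new_file.write_text("<p>minimum length: 8</p>\n", encoding="utf-8")
    changed = diff_outputs(old_file, new_file)

print("behavior unchanged" if not identical else "unexpected diff")
print("".join(changed))
```

An empty diff for every guide and every XCCDF 1.2 file is the signal that the optimized transformations behave the same as the old ones.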

UPDATE: Jenkins build times (2016-08-12)

Here is a graph of Jenkins build times. You can see how the build times gradually went down as the optimizations reached the Jenkins slaves. There are occasional build time spikes caused by load when multiple pull requests were submitted at once, but overall the performance has improved.
