Speedier export of rich text cells

Background

It was brought to our attention that the performance of saving documents to ODF spreadsheet format had been degrading quite noticeably. This was especially true when the document contained lots of what we call rich text cells. Rich text cells are those cells that contain text with mixed format spans, or text that consists of multiple lines. These cells are handled differently from simple strings internally, and have slightly more overhead than the simple string counterparts. Because of this, saving a document full of such texts was always slower than saving one with just numbers and simple strings.

However, even with this unavoidable overhead, the performance of saving rich text cells was clearly going in the wrong direction. Therefore it was time to act.

Long story short, after many days of code reading and writing, I brought it to a state where I can share some numbers.

Measuring export performance

I measured the performance of exporting rich text cells in the following steps.

Create a new spreadsheet document.

In cell A1, type three lines of ‘libreoffice’. You can press Ctrl-Enter to move to the next line within the same cell.

Copy A1, select A1:N1000 and paste, to replicate the content of A1 to all cells in the range.

Save the document as an ODF spreadsheet, and measure how long the save takes.
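For readers who would rather not build the test document by hand, the steps above can be approximated with a short script. This is only a sketch, not how the numbers in this post were obtained: it writes a flat ODF spreadsheet in which every cell in A1:N1000 holds three paragraphs of ‘libreoffice’, which is how a multi-line cell is stored in ODF. The names build_test_fods and timed are hypothetical helpers made up for this illustration.

```python
import time
from xml.sax.saxutils import escape

# ODF namespaces needed for a minimal flat spreadsheet document.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def build_test_fods(rows=1000, cols=14, lines=("libreoffice",) * 3):
    """Return flat ODF XML with rows x cols multi-line text cells.

    Hypothetical helper: the defaults mirror the A1:N1000 range
    (N is the 14th column) and the three-line cell content.
    """
    # Each line inside a cell becomes its own <text:p> paragraph.
    paragraphs = "".join(f"<text:p>{escape(line)}</text:p>" for line in lines)
    cell = (
        '<table:table-cell office:value-type="string">'
        f"{paragraphs}</table:table-cell>"
    )
    row = f"<table:table-row>{cell * cols}</table:table-row>"
    xmlns = " ".join(f'xmlns:{prefix}="{uri}"' for prefix, uri in NS.items())
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        f'<office:document {xmlns} office:version="1.2" '
        'office:mimetype="application/vnd.oasis.opendocument.spreadsheet">'
        "<office:body><office:spreadsheet>"
        '<table:table table:name="Sheet1">'
        f"{row * rows}"
        "</table:table></office:spreadsheet></office:body>"
        "</office:document>\n"
    )


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

Writing the returned string to a file with a .fods extension gives a document that can be opened in Calc and then saved as ODF spreadsheet while timing the save by hand. Running soffice --headless --convert-to ods on that file under a wrapper like timed is another option, but that figure also includes startup and load time, so it is not directly comparable to a manual save.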

Results

I performed the above measurement with 3.5, 3.6, 4.0, 4.1, and the latest master (slated to become 4.2) builds, and these are the numbers.

It is clear from this chart that the performance started to suffer first in version 3.6, then gradually worsened over 4.0 and 4.1. The good news is that we have managed to bring the number back down in the master build, even lower than that of 3.5 which I used as the point of reference. Not just slightly lower, but much, much lower.

I ran your test on 4.0 and it took 74 seconds on my somewhat older PC (with no other tasks running at the same time; I checked Windows Task Manager). So you must have run this test on a fairly recent computer. Could you post the specifications of the PC you ran the tests on?

Yes, I know this… I just looked at the graph, where v4.0 is at 39 seconds, and compared it to my v4.0 result of 74 seconds. I am comparing the same version of the software, so I am wondering what PC the tests were run on.

Thank you for the great work. I just got bitten by this bug today and am happy I could download a nightly build. On OS X, there is a significant improvement with the nightly. Save times went down from around 10 minutes to only 2 minutes.

The first 85% of the progress bar is completed within a second, but then the last part still takes minutes.

If you are interested, I could share with you my data. It is semi-confidential and as it contains email addresses I do not want to disclose it on any public forum.

File size: ~300K
1 sheet, 1910 rows and 26 columns. No formulas. (Yes, I should have used a DB instead)

I also don’t like Times New Roman as the default. I have to change this font to Arial wherever possible. I especially don’t understand why this font is the default inside tables, since data is obviously difficult to read in it…

I would say both. The algorithm was not optimal, plus it was doing too many memory allocations and deallocations that were entirely unnecessary. The whole design of the code in question was not architected with performance in mind.

Windows 8.1 / Office 2013 user here. I keep a tab on open-source software because change may be coming. Also, I often install LO on family and friends’ computers. I read the release notes for version 4.2 and wanted to thank you for your many contributions. Thanks to you, and all the other volunteers, LO is making strides! Keep it up!