Signing existing PDF files in LibreOffice

TL;DR: see above. It’s now possible to sign existing PDF files and also to
verify those signatures in LibreOffice 5.3.

The problem

LibreOffice could already digitally sign PDF files as part of the PDF export:
if you had e.g. ODF documents and exported them to PDF, a single digital
signature could optionally be added during the export process. This is now
much improved. First, thanks to the Dutch Ministry of Defense, in cooperation
with Nou&Off, who made this work possible.

A user can already use another application to verify that signature or to sign
an already existing PDF file. The idea is to allow doing both from inside
LibreOffice, directly.

Results

As can be seen above, the Digital Signatures dialog now works not only for
ODF and OOXML files, but also for PDF files. If the file has been signed, the
dialog verifies that signature. Signatures are also verified when opening any
signed PDF file.

I’ve also extended the user interface a bit, so that signing an existing PDF
file is easy, similarly to how exporting to PDF is easier than exporting to a
random other file format. There is now a new File → Digital signatures →
Sign existing PDF menu item to open a PDF file for signing:

When that happens the infobar has a dedicated button to open the Digital
Signatures dialog, and also going into editing mode triggers a warning dialog,
as going read-write is not needed to be able to sign a document:

And that’s basically it: after you open a PDF file in Draw, you can do the
usual digital signature operations on the file, just like it already works for
the previously supported file formats.

Details

What follows is something you can probably skip if you’re a user — however if
you’re a developer and you want to understand how the above is implemented,
then read on. ;-)

PDF tokenizer

The signing feature for ODF/OOXML is implemented in xmlsecurity/ by working
directly on the ZIP storage. This means that in the PDF case it’s necessary to
work on the PDF file directly as well, except that we had no PDF tokenizer
ready to be used for that.

Code under xmlsecurity/source/pdfio/ now is such a tokenizer: it can
extract info from PDF files and can also add incremental updates at the end of
the file. This way we can make sure adding a signature to a file won’t lose
existing content in the file. This is fundamentally different from the usual
load-edit-save workflow, where we convert the file into a document model and
work on that.
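
To make the incremental update idea more concrete, here is a minimal Python sketch. It is not the LibreOffice code: the function name append_incremental_update and the toy input are made up for illustration. It leaves the original bytes untouched and appends one new object, an xref section describing just that object, and a trailer whose /Prev entry points back at the previous xref.

```python
import re

def append_incremental_update(pdf: bytes, obj_num: int, body: bytes,
                              size: int, root: str) -> bytes:
    """Append one new object as an incremental update (hypothetical helper)."""
    # The last startxref of the input says where the previous xref section is.
    prev_xref = int(re.findall(rb"startxref\s+(\d+)", pdf)[-1])
    out = bytearray(pdf)  # the original bytes are copied verbatim
    if not out.endswith(b"\n"):
        out += b"\n"
    obj_offset = len(out)
    out += b"%d 0 obj\n" % obj_num + body + b"\nendobj\n"
    xref_offset = len(out)
    # One subsection with one 20-byte entry: offset, generation, 'n' = in use.
    out += b"xref\n%d 1\n%010d 00000 n \n" % (obj_num, obj_offset)
    out += b"trailer\n<</Size %d/Root %s/Prev %d>>\n" % (size, root.encode(), prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)

# Toy input: not a complete PDF, just enough structure for the sketch.
original = (b"%PDF-1.4\n1 0 obj\n<</Type/Catalog>>\nendobj\n"
            b"trailer\n<</Size 2/Root 1 0 R>>\nstartxref\n42\n%%EOF\n")
updated = append_incremental_update(original, 2, b"<</Type/Sig>>", 3, "1 0 R")
```

The point of the design is that `updated` always starts with `original`, byte for byte, so whatever was in the file before the update is still there afterwards.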

Verification of signatures

Previously LO was only able to generate signatures, not verify them. I’ve
implemented PDF signature verification using both NSS and CryptoAPI, so
Windows, Linux and macOS are all covered. I have to admit that the initial
verification was much easier with CryptoAPI. Until I hit corner-cases, I could
use an API that’s well-documented and is higher level than NSS. (I don’t have
to support different hash types explicitly, for example.)

When I added support for non-detached signatures, that changed the situation a bit:

1 file changed, 15 insertions(+), 11 deletions(-)

was the NSS patch, and

1 file changed, 104 insertions(+), 8 deletions(-)

was the CryptoAPI patch.

Signing existing files

Signing an existing file means tokenizing a document, figuring out what an
incremental update should look like for that file, writing an incremental
update that has a placeholder for the actual signature (a PKCS#7 blob, where
the input is just the non-placeholder parts of the document as binary data), and
finally filling in the placeholder with the actual signature.
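
The placeholder trick can be sketched in a few lines of Python. This is only an illustration under loud assumptions: a bare SHA-256 hex digest stands in for the PKCS#7 blob (no certificate or key is involved), the helper names are made up, and the /ByteRange bookkeeping that a real PDF signature uses to record the signed ranges is left out.

```python
import hashlib

PLACEHOLDER_DIGITS = 64  # hex digits reserved up front; a real PKCS#7 blob needs far more

def _gap(doc: bytes) -> tuple:
    """Return (start, end) of the '<...>' placeholder after /Contents."""
    start = doc.index(b"/Contents <") + len(b"/Contents ")
    return start, start + PLACEHOLDER_DIGITS + 2  # +2 for '<' and '>'

def _digest(doc: bytes) -> bytes:
    start, end = _gap(doc)
    # The signed input is everything except the placeholder itself.
    # Stand-in for PKCS#7: a plain SHA-256 hex digest.
    return hashlib.sha256(doc[:start] + doc[end:]).hexdigest().encode()

def fill_signature(doc: bytes) -> bytes:
    """Overwrite the zero-filled placeholder in place; the file length stays the same."""
    start, end = _gap(doc)
    assert doc[start:end] == b"<" + b"0" * PLACEHOLDER_DIGITS + b">"
    return doc[:start] + b"<" + _digest(doc) + b">" + doc[end:]

def verify_signature(doc: bytes) -> bool:
    start, end = _gap(doc)
    return doc[start + 1:end - 1] == _digest(doc)

unsigned = (b"%PDF-1.4\n1 0 obj\n<</Type/Sig/Contents <"
            + b"0" * PLACEHOLDER_DIGITS + b">>>\nendobj\n%%EOF\n")
signed = fill_signature(unsigned)
```

Because the placeholder has a fixed size and is excluded from the signed input, filling it in afterwards changes neither the offsets in the file nor the data that was signed.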

For the last step, I could reuse code from the PDF export (modulo fixing bugs
like tdf#99327).
For the other steps, the tokenizer remembers the input offset / length for the
given token, so it’s relatively easy to create incremental updates. You
can add new objects or update existing objects in such an incremental update,
and this source tracking feature allows copying even the unchanged parts of
updated objects verbatim.
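
As a rough illustration of source tracking, the following Python sketch records the offset and length of each object’s dictionary while scanning, then produces an updated copy by reusing the original bytes and appending one new entry. The names ObjectToken, tokenize_objects and updated_copy are hypothetical, and the regular expression only copes with flat dictionaries (no nested << >>), which a real tokenizer cannot assume.

```python
import re
from dataclasses import dataclass

@dataclass
class ObjectToken:
    number: int
    dict_offset: int  # where '<<' starts in the input
    dict_length: int  # length up to and including the closing '>>'

def tokenize_objects(pdf: bytes) -> list:
    """Find 'N 0 obj << ... >>' and remember where each dictionary came from."""
    tokens = []
    for m in re.finditer(rb"(\d+) 0 obj\s*(<<.*?>>)", pdf, re.S):
        tokens.append(ObjectToken(int(m.group(1)), m.start(2), m.end(2) - m.start(2)))
    return tokens

def updated_copy(pdf: bytes, token: ObjectToken, extra_entry: bytes) -> bytes:
    """Build the updated object: old bytes verbatim, plus one new dictionary entry."""
    raw = pdf[token.dict_offset:token.dict_offset + token.dict_length]
    body = raw[:-2] + extra_entry + b">>"  # drop the old '>>', re-add it at the end
    return b"%d 0 obj\n" % token.number + body + b"\nendobj\n"

pdf = b"%PDF-1.4\n1 0 obj\n<</Type/Catalog/Pages 2 0 R>>\nendobj\n"
catalog = tokenize_objects(pdf)[0]
new_catalog = updated_copy(pdf, catalog, b"/AcroForm 3 0 R")
```

The unchanged keys are never parsed and re-serialized, so there is no risk of writing them back slightly differently than they were read.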

PDF 1.5+

Everything became a bit more complicated once I started to handle not only
LO-generated PDF-1.4, but also newer PDF versions. I think this is important,
as Adobe Acrobat creates PDF 1.6 by default today, which has a number of new
features (I think all of them were actually introduced in PDF-1.5) that
affect the tokenizer:

xref stream: instead of an ASCII xref table ("table of contents") at the end
of the file, it’s now possible to write the binary equivalent of this as an
xref stream. Because the binary version can describe more features, we must
also write an updated xref stream (and not an xref table) when the input
already had an xref stream.

object streams: it’s now possible to write multiple objects inside the
stream section of a single object in binary form. The tokenizer needs to be
able to read these objects, and roundtripping (source tracking) should work
not only with physical file offsets, but also inside such compressed streams,
where the offset is no longer just a number inside the input file. (It’s
still OK to write the updated objects outside object streams.)

stream predictors: this is a concept from the PNG format, but also used in
PDF when compressing the xref stream. See the spec for the gory details, but
in short it’s not enough that instead of plaintext you have to deal with
binary compressed data, you also have to filter the data before actually
parsing the file offsets, and the filter is defined not in terms of object IDs
and file offsets, but in terms of adjacent pixels, since it’s documented in
the PNG spec. :-) (To be close to the Adobe output, we also apply such
predictors when writing compressed xref streams.)

User Interface

In addition to the UI changes already mentioned above, one more improvement I
made is that the Digital Signatures dialog now has a new column to show the
signature type. This is either XML-DSig (for ODF/OOXML) or PDF.

Testing

I’ve added an integration test in the existing
CppunitTest_xmlsecurity_signing to have coverage for the small amount of new
code that calls into xmlsecurity/ from sfx2/ in case of PDF files. But
fortunately, because all the other code in xmlsecurity/ was new, I could do
unit testing in CppunitTest_xmlsecurity_pdfsigning for the rest of the features.

Needless to say, invoking the PDF tokenizer + signature creator/verifier
directly is much quicker than loading a full PDF file into Draw, just to see
the signature status. ;-)

Summary

If you want to try these out yourself, get a
daily build and play with it! This
work is part of both master and libreoffice-5-3, so those builds are of
interest. Happy testing! :-)