Document conversion with Collabora Online, JODConverter and unoconv

JODConverter (Java OpenDocument Converter) is a widely used tool that automates document conversions. unoconv is a Python tool with a similar purpose. This page explains why you should consider switching to JODConverter’s Collabora Online backend, or talking to Collabora Online directly.

These tools support OpenDocument, PDF, HTML, Microsoft Office formats (DOC/DOCX/RTF, XLS/XLSX, PPT/PPTX) and many others. They can be used as a Java/Python library, a command line tool, or a web application. Newer versions of JODConverter include a backend that uses Collabora Online instead of LibreOffice directly.

What are the benefits of using Collabora Online for document conversion?

- Improved performance compared to the startup-convert-shutdown approach
- The REST API is more reliable than starting LibreOffice in server mode and communicating via remote UNO
- Better security: the conversion happens in an isolated environment, and a layered approach protects your infrastructure (from outer to inner layers):
  - It is easy to run in a virtual machine or Docker container
  - Document data is isolated into per-document chroots
  - Seccomp-bpf: inside that chroot (almost) no system calls are allowed
  - Extremely sparse filesystem inside the chroot: no shell etc.

| Benefit | JODConverter | unoconv | Collabora Online |
| --- | --- | --- | --- |
| Many file formats | Yes | Yes | Yes |
| Single startup cost | No | No | Yes |
| Standard REST API | No | No | Yes |
| Easy isolation into VM / Docker | No | No | Yes |
| Document isolation | No | No | Yes |
| Syscall filter | No | No | Yes |
| Sparse filesystem | No | No | Yes |

This means you get both improved performance and better security when converting documents with Collabora Online.

Performance

The first chart shows how Collabora Online performs compared to JODConverter’s LibreOffice backend and unoconv as the number of threads varies, measured in documents converted per second.

You can see that Collabora Online not only performs better from the start, but also scales better as you use more threads. (We compare curl invocations for Collabora Online with Java command-line invocations of JODConverter and Python command-line invocations of unoconv.)
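The benchmark harness itself is not shown here, so the following is only a minimal sketch of how documents per second could be measured for a given thread count. The names measure_throughput and fake_convert are hypothetical, and fake_convert is a stand-in that sleeps instead of invoking curl, JODConverter or unoconv.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(convert_one, documents, threads):
    """Return documents converted per second using `threads` worker threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # list() forces every conversion to finish before the clock stops.
        list(pool.map(convert_one, documents))
    elapsed = time.perf_counter() - start
    return len(documents) / elapsed


def fake_convert(path):
    """Hypothetical stand-in for one real conversion; it only sleeps."""
    time.sleep(0.01)
    return path
```

To compare tools, one would replace fake_convert with a function that performs a single real conversion and call measure_throughput once per thread count.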

Building

If you want to try out JODConverter with its Collabora Online backend:

Running

The URL is your Collabora Online server URL, that is, the https:// value from the installation guide.

Using the Collabora Online REST API directly

If you are not already using JODConverter, you can use the REST API directly, for example:

    curl -F "data=@test.txt" https://localhost:9980/lool/convert-to/pdf > out.pdf
    curl -F "data=@test.txt" https://localhost:9980/lool/convert-to/png > out.png

Alternatively, you can specify the format as an HTML form field, for example:

    curl -F "data=@test.txt" -F "format=pdf" https://localhost:9980/lool/convert-to > out.pdf
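The same request can be made from a program instead of curl. The following is a rough sketch using only the Python standard library. It assumes the endpoint path and the "data" form field shown in the curl examples. The function names build_convert_request and convert are hypothetical, not part of any Collabora Online client library.

```python
import os
import urllib.request
import uuid


def build_convert_request(server_url, filename, data, target_format):
    """Build a multipart POST for <server_url>/lool/convert-to/<target_format>."""
    boundary = uuid.uuid4().hex
    name = os.path.basename(filename)
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # "data" is the form field name used by the curl examples.
        f'Content-Disposition: form-data; name="data"; filename="{name}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        data,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    url = f"{server_url.rstrip('/')}/lool/convert-to/{target_format}"
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def convert(server_url, path, target_format, timeout=60):
    """Upload the file at `path` and return the converted document as bytes."""
    with open(path, "rb") as f:
        request = build_convert_request(server_url, path, f.read(), target_format)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Example (needs a reachable server):
#   pdf_bytes = convert("https://localhost:9980", "test.txt", "pdf")
#   open("out.pdf", "wb").write(pdf_bytes)
```

If the server uses a self-signed certificate, urlopen will reject the connection unless it is given a suitable SSL context. That detail is left out of the sketch.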