Update January 2018

We're currently preparing performance tests of the PDF to book function. We should know more in early February.

Update September 2017

Our current PDF rendering service, the offline content generator (OCG), is no longer maintainable. Simply put, it's breaking down. The Reading team at the Wikimedia Foundation has been working towards replacing it for months. OCG has been running on outdated code which may introduce security vulnerabilities and other major issues in the future. Over the last three months, we’ve had banners on the PDF creation page asking for feedback on the prototype for our new renderer. The new renderer will have improved capabilities over OCG: it will be able to print tables and infoboxes and will contain styling focused on better readability. We've gathered a lot of good feedback on the prototype and are working on making the required updates to our new PDFs.

Later addendum: Turning PDF book rendering OFF for the short term

Unfortunately, major issues with our old renderer (OCG) will require us to remove it as a rendering option prior to completing the necessary updates for the books feature. This is earlier than we wanted. By the time we remove OCG, the work for rendering of single articles will be completed. However, the rendering of books will be paused while we evaluate and complete the necessary work. Our initial choice of renderer for the replacement, the Electron rendering service, is not capable of supporting PDFs of larger sizes and fails when attempting to render a book with multiple articles. We will be working to select a new rendering system for books which can handle the size of the files and support our requirements. This is not how we planned to do this. We never aimed to temporarily remove the book PDF functionality.

In addition to this page being updated, this will be communicated in a banner on the PDF creation page, in Tech News, and on some Wikimedia mailing lists.

Introduction

Our current PDF rendering service, the offline content generator (OCG), is no longer maintainable. Simply put, it's breaking down. Originally created by a third party, it currently runs on outdated code which may introduce security vulnerabilities and other major issues in the future. If we're to have the PDF functionality, we unfortunately have to replace it, or we might suddenly find ourselves in a situation where we'd have to take it down without having planned to do so.

Additionally, it does not support a number of rendering requests from the community, the main one being the ability to render tables. We have selected a new service, the Electron rendering service, as a suitable replacement. Our next step is to duplicate the functionality provided by OCG using the Electron rendering service. Below, we describe the main portions of the functionality we have identified as necessary. We would like to invite conversation around what is missing or what is superfluous in the provided list. We would also like to highlight our future plans for PDF rendering to gather initial feedback.

Userbase

The following table shows a sample of traffic to the Electron "Download as PDF" service over a 6-hour period. The traffic is broken down by operating system (OS), browser, and browser major version (e.g. Windows 7, Chrome v61.*).

Note that the majority of our traffic appears to come from Windows-based machines.

| OS | Browser | Browser Major Version | % of requests |
|---|---|---|---|
| Other | Other | - | 14.38 |
| Windows 7 | Chrome | 61 | 12.42 |
| Windows 10 | Chrome | 61 | 8.83 |
| Windows 7 | IE | 11 | 7.33 |
| Windows 7 | Firefox | 56 | 6.59 |
| Windows 10 | Firefox | 56 | 3.82 |
| Windows 10 | Edge | 15 | 3.24 |
| Windows 8.1 | Chrome | 61 | 3.07 |
| Windows XP | Chrome | 49 | 2.2 |
| Windows 10 | Chrome | 59 | 1.53 |
| Windows 10 | IE | 11 | 1.51 |
| Windows 8.1 | Firefox | 56 | 1.31 |
| Windows XP | Firefox | 52 | 1.22 |
| Windows 8 | Chrome | 61 | 1.15 |
| Windows 8.1 | IE | 11 | 1.15 |
| Mac OS X | Safari | 11 | 0.9 |
| Windows 7 | Firefox | 53 | 0.89 |
| Windows 7 | Firefox | 52 | 0.78 |
| Ubuntu | Firefox | 56 | 0.78 |
| Windows XP | IE | 6 | 0.7 |
| Windows 7 | Chrome | 55 | 0.68 |
| Windows 7 | Firefox | 55 | 0.62 |
| Mac OS X | Chrome | 61 | 0.62 |
| Android | UC Browser | 11 | 0.6 |
| Windows 10 | Edge | 14 | 0.59 |
| Windows 7 | Opera | 48 | 0.53 |
| Android | Chrome Mobile | 61 | 0.49 |
| Windows 10 | Opera | 48 | 0.44 |
| Windows 7 | Chrome | 60 | 0.4 |
| Windows Vista | Chrome | 49 | 0.39 |
| Windows 7 | Yandex Browser | 17 | 0.37 |
| Windows 10 | Firefox | 55 | 0.37 |
| Mac OS X | Safari | 10 | 0.36 |
| Windows 10 | Chrome | 50 | 0.34 |
| Android | Android | 4 | 0.33 |
| Mac OS X | Firefox | 56 | 0.33 |
| Windows 10 | Chrome | 60 | 0.32 |
| Windows 8.1 | Chrome | 43 | 0.3 |
| Android | Amazon Silk | 60 | 0.29 |
| Windows 7 | Sogou Explorer | 1 | 0.27 |
| Windows 8 | IE | 10 | 0.26 |
| Windows 7 | IE | 8 | 0.26 |
| Windows 7 | IE | 9 | 0.25 |
| Windows 8 | Opera | 12 | 0.25 |
| Linux | Firefox | 52 | 0.25 |
| Mac OS X | Firefox | 53 | 0.24 |
| Windows 7 | Firefox | 45 | 0.24 |
| Windows 10 | Firefox | 57 | 0.24 |
| Windows 7 | Firefox | 38 | 0.22 |
| Windows 10 | Firefox | 47 | 0.21 |
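
The observation that most traffic comes from Windows can be checked by grouping rows by OS family and summing their request shares. The following is a minimal sketch, not part of the original analysis: it uses only a handful of rows copied from the table, and the names `SAMPLE_ROWS`, `os_family`, and `share_by_family` are hypothetical helpers introduced for illustration.

```python
from collections import defaultdict

# A few rows copied from the table: (OS, browser, major version, % of requests).
# This is a subset for illustration, not the full data set.
SAMPLE_ROWS = [
    ("Windows 7", "Chrome", "61", 12.42),
    ("Windows 10", "Chrome", "61", 8.83),
    ("Windows 7", "IE", "11", 7.33),
    ("Windows 7", "Firefox", "56", 6.59),
    ("Mac OS X", "Safari", "11", 0.9),
    ("Ubuntu", "Firefox", "56", 0.78),
    ("Android", "UC Browser", "11", 0.6),
]


def os_family(os_name):
    """Collapse 'Windows 7', 'Windows 10', etc. into a single 'Windows' family."""
    return "Windows" if os_name.startswith("Windows") else os_name


def share_by_family(rows):
    """Sum the request percentages per OS family, rounded to two decimals."""
    totals = defaultdict(float)
    for os_name, _browser, _version, percent in rows:
        totals[os_family(os_name)] += percent
    return {family: round(total, 2) for family, total in totals.items()}


if __name__ == "__main__":
    shares = share_by_family(SAMPLE_ROWS)
    # Print families from largest to smallest share.
    for family, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{family}: {share}%")
```

Feeding every row of the table into `share_by_family` would give the per-family totals for the whole sample. Note that the "Other" row cannot be attributed to any OS family.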

Current Functionality Requirements

The following is a list of the current requirements for PDF rendering for single-article PDFs and for books. Requirements that differ from the current implementation are displayed in bold.

Multiple issues with OCG are identified, including complaints from the community around OCG's inability to render tables.

Rendering of tables ranks as number 9 on the German-speaking Community Technical Wishlist.

Wikimedia Deutschland begins working on a solution for rendering tables in PDFs, and introduces Electron. They plan to run it alongside OCG, not to replace it.

At the same time as Wikimedia Deutschland is working on the Electron service, the responsible maintainers of the OCG service at the Wikimedia Foundation come to the conclusion that OCG has to be replaced.

The WMF Reading Team takes over responsibility for the long-term maintenance of PDF rendering and begins planning to implement table rendering across all projects.

The Reading Infrastructure and Web teams begin scoping the work necessary to port OCG functionality over to the Electron service.

Update After Consultation

Proposed PDF and print styles based on feedback from consultation

We launched a consultation on the current implementation of the PDF renderer in early June, 2017. After reviewing the consultation responses, we have made the following observations:

A larger number of users preferred the single-column format over the double-column format

Users who preferred the double-column format highlighted that their preference was based on the styling and look and feel of double columns. Some users also expressed concerns about font size and wasted paper when printing PDFs in the single-column option

Based on the feedback, we have incorporated the following into our new print styles:

hyperlinks

article information

smaller font and book-like styling

The remainder of the requests above will be postponed until the second iteration of the PDF renderer, in which we plan to build a settings mode that will allow for customization of the available options.

Proposal

The following is a proposal for the scope of functionality necessary for PDF rendering:

Individual articles will be rendered to PDF using the "Download as PDF" link in the sidebar

Multiple articles will be rendered to PDF using the Book Creator tool

All articles will contain attribution for text and images

All PDFs rendered will be able to print tables

Users will be able to customize the layout of their PDF (optional)

Differences between current and future implementation

| Feature | OCG | New Service | Notes |
|---|---|---|---|
| Rendering individual articles | Yes | Yes | |
| Rendering multiple articles using the book creator | Yes | Yes | |
| Contains table of contents for multiple articles | Yes | Yes | |
| Renders tables | No | Yes | |
| Attribution | Yes | Yes | Open question: location of attribution within the new service |
| Styling | LaTeX | New styles | |
| N-column layout | Yes | No | |
| Default 2-column layout | Yes | Tentative | Default one-column or two-column layout will be chosen based on feedback and quantitative and/or qualitative testing |
| Output format | PDF, Plaintext | PDF only | |

Design

The new PDF styles will be designed for increased readability. Based on community feedback and qualitative or quantitative testing, support for a 2-column layout may be built for the book creator and/or for individual PDFs.

Examples of new PDF output - Styles will be updated based on feedback from the ongoing consultation

Development and Deployment Roadmap

The following is a rough outline of the development and deployment roadmap. It is subject to change.

April - May 2017:

The Reading team builds back-end support for functionality identified above

Communities are consulted on expanding or shrinking proposed functionality

Qualitative test performed for styling

June - July 2017:

New styles implemented

First iteration is launched along with OCG on all projects and performance is compared

Iterations based on consultations and identified edge cases

August - September 2017:

Additional changes made if necessary

October 2017:

Second iteration launched without OCG on all projects

Single Articles

A PDF for a single article will be created by selecting the "Download as PDF" link

Upon selecting "Download as PDF", the PDF file will be generated. To download the file, users will select the "Download the file" link

Each PDF file will contain the following:

Article title and text

Infobox (if any)

Tables (if any)

Single-column layout

Page number

All article images and captions

Links to pages linked from the article (blue links and external links)

Text and image sources, contributors, and licenses

Phabricator Tracking

All PDF-related changes, including sunsetting OCG, replacing the Electron PDF renderer, and any updates to books or the Collection extension, are tracked under the Phabricator project Proton. The project page displays recent updates for all tasks related to PDFs.

Books

Functionality available in October 2017

Note: no changes will be made to the current book creator workflow at this time

Users will launch the book creator by selecting "Create a book"

This will navigate to the current book creation page

To download a book, users will select the "download" link from the books page

Users may only download books in PDF format

Books will contain all elements from single article format as well as:

Book title page

The references for each article from the book will appear at the end of the article

Each article will begin on a new page

A single section for text and image sources, contributors, and licenses, that contains the collected contributions from all articles
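
The book structure listed above can be pictured as an ordered outline. The following is a minimal sketch under stated assumptions, not the renderer's actual implementation: the `Article` class, the `build_book_outline` function, and the section names are all hypothetical, and contributors are reduced to plain strings for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical stand-in for one article selected in the book creator."""
    title: str
    references: list = field(default_factory=list)
    contributors: list = field(default_factory=list)


def build_book_outline(book_title, articles):
    """Return the book as an ordered list of (section_kind, payload) tuples.

    Each "article" entry is assumed to start on a new page, and its
    references follow directly after it rather than at the end of the book.
    """
    outline = [("title_page", book_title)]
    for article in articles:
        outline.append(("article", article.title))
        if article.references:
            outline.append(("references", article.title))

    # One attribution section for the whole book: collect contributors
    # from all articles in first-seen order, without duplicates.
    collected = []
    for article in articles:
        for name in article.contributors:
            if name not in collected:
                collected.append(name)
    outline.append(("attribution", collected))
    return outline


if __name__ == "__main__":
    book = build_book_outline(
        "Sample book",
        [
            Article("First article", ["ref 1"], ["Alice", "Bob"]),
            Article("Second article", [], ["Bob", "Carol"]),
        ],
    )
    for kind, payload in book:
        print(kind, payload)
```

The point of the sketch is the ordering: references stay with their article, while sources, contributors, and licenses are gathered once at the end of the book.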

Functionality available in November - December 2017

Books will contain a table of contents with page numbers

Selecting a section from the table of contents will navigate the user to the corresponding section within the book