The PDF file format: A work in progress

With almost every sector of the economy facing a digital transformation, businesses must find new ways to get their information and data online. No longer does it make sense to have documents stored on paper. To keep up with the ever-changing times, more and more businesses are turning to the Portable Document Format (PDF).

“Due to proliferation of new platforms, devices, and technologies, providing a quality PDF solution is more challenging than ever before,” said Catherine Andersz, director of PDFTron. “So far the PDF format stood the test of time, but it’s facing challenges due to fragmentation and poor implementations of the standard as well as relevance in the new world of small devices.”

The biggest benefit of moving to PDF is that businesses can guarantee their documents will be accessible, viewable, and printable by everyone at any time, according to Gerald Holmann, founder and president of Qoppa Software.

Today, PDF viewers are available across browsers, operating systems and applications, making the format ubiquitous, according to Matt Kuznicki, CTO of Datalogics. However, as more users take interest in the technology, PDF will have to address the needs of a wider range of industries.

“The PDF format contains a huge set of features and functionality designed for different audiences, and understanding the needs and capabilities of different workflows is now more important than ever,” he said.

PDF 2.0…
The PDF file format was once a proprietary format owned by Adobe Systems. Today, it is an open standard maintained by the International Organization for Standardization (ISO). The last version of the PDF standard Adobe put out was version 1.7. As part of PDF 1.7, Adobe added supplements incorporating features that came out after the release.

Since the standard was handed over to ISO, the organization has been working to integrate those features into the upcoming main standard, PDF 2.0. Notable features include redaction annotations and how to burn them into documents, new encryption algorithms, and 256-bit encryption, according to Holmann.

“Since the ISO committee is about to publish PDF 2.0 next year, companies will need to think [about] how to ensure that their solutions and workflows can handle the new format correctly,” said Andersz. “Many PDF technology providers such as PDFTron supported key features that are part of the upcoming standard already for years. For this reason we expect that the transition will be relatively smooth. However, there were also many other changes in areas such as tagging, color management, signatures, and measurements, that are still not well supported but are expected to gain more attention in the coming years.”
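As a rough illustration of what handling the new format can start with, the sketch below reads the version a file declares in its header comment (for example `%PDF-1.7`). It assumes a PDF 2.0 file announces itself as `%PDF-2.0`, and it searches only the first 1024 bytes, a tolerance chosen here for illustration. The function names are hypothetical and do not come from any toolkit mentioned in this article.

```python
import re

# Header comment that opens a PDF file, e.g. b"%PDF-1.7".
_HEADER = re.compile(rb"%PDF-(\d)\.(\d)")


def declared_pdf_version(data: bytes):
    """Return (major, minor) from the PDF header, or None if none is found.

    Assumption: the header sits within the first 1024 bytes of the file.
    """
    match = _HEADER.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def needs_pdf2_support(data: bytes) -> bool:
    """True when the header declares version 2.0 or later."""
    version = declared_pdf_version(data)
    return version is not None and version >= (2, 0)
```

A header check like this is only a first filter. A document can also carry a version entry in its catalog that overrides the header, so a real workflow would rely on a full parser rather than on the first line alone.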

The upcoming release of the standard also puts significant focus on clarifying confusing areas and specifications of the standard. According to Kuznicki, the previous standard included technical language that was hard for users and implementers to understand.

“While the update has been many years in the making, in most respects it is an incremental update to the PDF specification rather than introducing significant or breaking changes,” said Kuznicki. He hopes the standard will be finalized and made available by early next year.

The 2.0 standard will also provide more flexibility to objects that are already in place, and add new features around encryption, digital signatures and new fonts, Holmann explained.

“The reason it is taking so long mostly is that it has become a distributed process,” he said. “Instead of having one company or a few people working on it, it is a bunch of companies doing different parts of the format. It in a way gives the format a lot more stability and a lot more reliability. The industry knows it is not going to be changed at the whims of a company or for a company’s benefit only. ISO has to take into account everyone that is using the format.”

…and beyond
Work in the industry and by the ISO will go beyond the PDF 2.0 standard. The software and technology industries are constantly changing, so the expectation is that the PDF format will continue to add features as new things come up.

For example, Holmann explained how fonts use Unicode, a standard for encoding and handling text. A new Unicode format will be coming out with more characters, and the PDF standard will have to incorporate it. In addition, there are always new font formats being released with new features, so as time goes on new versions of the PDF standard will have to adopt them too.

“Perhaps a victim of its own success, as PDF became more widely adopted, so did spotty and partial implementations of the standard,” said Andersz. “The trend is already eroding the core value proposition of the format and some people in the industry are worried. At the same time connected users have leverage over companies these days, and businesses have tougher time providing substandard or half-baked software. Any crashes and annoyances…quickly lead to unsatisfied users and negative ratings. So software quality and reliability will be a key factor in helping products stand out of the crowd.”

Kuznicki expects to see user experience improved with a focus on those with vision impairments; easier access to information within a PDF; machine learning and business intelligence capabilities where software automatically understands and acts upon information in a PDF; and advances on how traditional PDF readers access information.

Holmann adds there has already been discussion on how to make the PDF format more malleable. For instance, PDFs don’t flow like a Microsoft Word document. When you remove a word from a paragraph in a PDF, you can’t reflow the paragraph. You end up with a blank space in the middle of the paragraph, just like a paper document.

“The intent of the PDF format was not to be an editable document like Microsoft Word,” said Holmann. “The intent was more of a paper format, so in that sense it lives up to its intent. But of course when people are trying to use PDFs in their own applications and their own set of processes, they want it all.”

Choosing the right PDF solution
Different audiences and workflows have different needs, despite the versatility of PDF. When choosing a PDF tool for a particular audience, developers will have to choose wisely, Kuznicki stresses.

“Print and pre-press, archival, engineering, and business document workflows all have different expectations of PDF and different feature sets that deliver value for their situations,” he said. “That said, different PDF toolkits and SDKs vary in their abilities to meet the needs of these different audiences.”

“It is important for developers to communicate their needs, use cases and expectations so that the vendor can understand all requirements that may be applicable to the developer’s business and users,” according to PDFTron’s Andersz. “This will help ensure that there are no surprises down the road when it comes to meeting the developer’s needs.”

There are many PDF toolkits claiming to do everything, and many times budgetary or time pressures lead developers to quickly pick a solution based on those factors alone. However, it is still important to evaluate the toolkits to ensure the one they select is able to handle different types of PDF files and truly supports the features they are looking for, according to Andersz.

“Many times a toolkit will work fine with basic or certain types of files, but causes crashes, errors or inconsistent rendering and performance with more complex files,” she said.
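One way to act on that advice is to run a candidate toolkit over a varied set of sample files and record which ones fail. The sketch below is a minimal, toolkit-agnostic harness. `evaluate_toolkit` and the `process` callable are hypothetical names introduced here, and `process` stands in for whatever call the toolkit under evaluation actually exposes.

```python
from typing import Callable, Dict, Iterable, Tuple


def evaluate_toolkit(
    samples: Iterable[Tuple[str, bytes]],
    process: Callable[[bytes], object],
) -> Dict[str, str]:
    """Run `process` on every (name, data) sample; return {name: error}."""
    failures: Dict[str, str] = {}
    for name, data in samples:
        try:
            process(data)  # hypothetical: open, render, or convert the file
        except Exception as exc:
            # Record the failure and keep going so one bad file
            # does not hide problems with the rest of the corpus.
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def fake_process(data: bytes) -> None:
    """Stand-in for a real toolkit call; rejects input without a PDF header."""
    if not data.startswith(b"%PDF-"):
        raise ValueError("missing header")


report = evaluate_toolkit(
    [("simple.pdf", b"%PDF-1.7\n"), ("broken.pdf", b"garbage")],
    fake_process,
)
```

This only catches errors raised as exceptions. Hard crashes inside native code, rendering differences and slow performance would need separate checks, such as running each file in its own process and timing it.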

When picking a PDF tool, some things to keep in mind include:

A solution must meet developers’ needs without requiring them to become PDF experts and dive deep into the details. However, Kuznicki notes, a good tool allows developers to go behind the scenes if they need or want to.

Because the PDF format is so complicated, the vendor should also provide developers with prompt and expert responses if something goes wrong or is misunderstood, Holmann says.

Also, while immediate requirements are usually most important, the vendor’s long-term road map and its ability to support future needs may be just as important when picking a toolkit, according to Andersz. For example, a developer might be looking only for an iOS PDF toolkit at the moment, but may need to support PDF processing as part of a web app in the future. Rather than purchasing another toolkit from a different vendor down the road because the one initially licensed does not support a certain platform or feature, the developer could use a single toolkit, saving development resources and getting the same performance across every platform needed, she explained.

In addition, since the PDF specification is so complex, partnering with a vendor that has in-depth knowledge and understanding of the format and related technologies will be beneficial. That knowledge usually comes from having written the code being licensed. For example, vendors relying on open-source or proprietary tools for rendering will undoubtedly have a harder time building and supporting new features, or ensuring their technology stays conformant with new versions of the PDF specification.

“There are a lot of things developers need to be aware of nowadays that requires [them] to dig a little deeper, beyond just the feature list or price,” said Andersz. “With everyone claiming to be a ‘leader,’ developers need to ask more questions to better understand the differences between the vendors they are comparing and their respective technologies.”

A solution must be able to work with a wide variety of PDF files from all kinds of sources and creators. “We have found in our experience that there are many different ways, each correct but slightly different, to represent PDF file contents,” Kuznicki said. PDFs include different fonts, colors, images and more, so you want full feature coverage in a tool, according to Holmann.

“I always recommend that software developers look for PDF technology providers who are PDF experts and can provide expertise when needed, who have longevity and experience, and who can package their experience and expertise into solutions where developers do not need to become PDF experts,” Kuznicki added.

The importance of the PDF format
As the ISO puts enormous amounts of energy and effort into clarifying and evolving a format that has been around for more than two decades, we can’t help but wonder how PDF became the dominant electronic format over the years. To understand the landscape, SD Times asked some industry experts why there is so much interest in the PDF format.

Matt Kuznicki, CTO of Datalogics
Over the past 20+ years, the PDF format has proven to be the most reliable means to present information to readers in the way that the author of that information intended, and that can be reliably given to others and read later in time and in different locations. Other formats in common use either introduce ambiguity in how information is presented or have limited support across different readers.

The PDF format has evolved over time to encompass a wide variety of functionality required by different audiences, always with the goal of allowing the author to reliably convey content to readers in the manner and appearance of the author’s choice. No other format contains the variety of available features that PDF contains.

It’s always tempting to consider PDF as a settled technology, when in fact it has evolved constantly. The community of PDF technology providers has worked very hard to keep compatibility and interoperability between different solutions over time, so that the ultimate users of PDF technologies do not have to think about these changes and evolutions. A lot of expertise is required to keep up with this continued evolution.

Gerald Holmann, founder and president of Qoppa Software
It is not necessarily better than other formats; there are a few alternative formats that have similar features, but it covers a wider breadth of features than any other format that we are aware of. (The only exception might be Microsoft XPS format.)

The PDF format is not like any format where you just define pixels. It is a collection of objects or graphical elements that are put together in a certain way on a page. The way that it is designed, it can take just about any type of image, it can take any type of font, and it can take any type of text. It can work with different color spaces for the printing industry or for displaying on the screen. It can do digital signatures to lock down the documents. It provides encryption with passwords. It just has this wide breadth of features that can really replace paper for most document processing, document storage and document needs.
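To make the “collection of objects” idea concrete, here is a minimal sketch that assembles a one-page PDF by hand from five numbered objects: a catalog, a page tree, a page, a font reference and a content stream. The function name `build_minimal_pdf` is an illustrative choice, and the sketch assumes the text is plain ASCII with no parentheses or backslashes, since it does no escaping.

```python
def build_minimal_pdf(text: str = "Hello, PDF") -> bytes:
    """Assemble a one-page PDF from five numbered objects."""
    # Content stream: operators that draw text at a fixed page position.
    # Assumption: `text` is ASCII without parentheses or backslashes.
    content = f"BT /F1 24 Tf 72 720 Td ({text}) Tj ET".encode("ascii")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where this object starts
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one fixed-width 20-byte entry per object.
    xref_position = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_position
    return bytes(out)
```

Writing the returned bytes to a file should give a page with a single line of text. The text is positioned by explicit coordinates (`72 720 Td`) rather than flowed, which is why removing a word from a PDF paragraph leaves a gap instead of reflowing the line.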

At some point it became the de facto electronic format essentially in the world. Everybody knows PDF. Everybody uses it. Everybody has a PDF viewer on their desktop, laptop or mobile device, so it has become a universal format. If you are thinking about doing any electronic processing, PDF is going to be the first thing that comes to mind because you know that all your customers are going to support it, all your vendors are going to support it, and all your internal processes are going to support it.

Catherine Andersz, director for PDFTron
Today, as the most commonly used format, the PDF is the de facto standard for electronic documents, and is vital to business and organizations around the globe. Everyone needs to be able to access their PDFs and work with PDF content in various ways (whether filling out a form, signing documents, redacting information, etc.). The more the PDF format becomes ingrained, the higher the user expectation becomes for everything with PDFs to just work.

And, on top of that, the technology landscape is constantly shifting, introducing new requirements… We have been going mobile for quite some time, and the web is the next big thing. PDF technology providers will need to ensure that PDFs get reliably processed across various devices, and that users have a consistent experience, whether in a traditional desktop/server environment, or working with their PDF documents in a mobile or web app.

A guide to PDF-management tools
Datalogics: Datalogics provides best-of-breed PDF technologies for developers. The Adobe PDF Library is a multi-platform API offering a wide range of PDF manipulation and printing capabilities, with Adobe’s staple color and font accuracy. PDF Java Toolkit is a pure Java API with robust support for PDF forms and digital signatures.

PDFTron: PDFTron provides powerful cross-platform PDF APIs enabling app development for desktop/server, mobile and web apps, with consistent, high-quality output, as well as top-notch performance on even the most complex files. PDFNet SDK APIs can be accessed from any language/platform (Xamarin/C#, JavaScript, C++, Java, Objective-C, etc.), providing support for annotation, collaboration, forms, digital signing, editing, printing, file conversion, redaction, and more. PDFTron’s WebViewer technology enables viewing and embedding PDF, Office and other formats in any HTML5 app on any device. Also, announced at the last PDF Tech Conference in San Jose in 2015 as the first complete PDF toolkit for the web, PDFNetJS is the latest addition to PDFTron’s web-based technologies, enabling users to view, annotate and edit PDFs directly in any modern desktop browser.

Qoppa: Qoppa Software offers an extensive suite of PDF libraries and visual components that cover all PDF processing needs. PDF functions include creation and modification, assembly, conversion to images and HTML, automated printing, encryption and digital signatures, form fields, viewing and markup, optimization, and a lot more. Qoppa products provide the highest level of performance and reliability and are 100% Java, so they run on all servers and desktop operating systems.

Accusoft: PDFXpress is a full-featured PDF SDK that makes it fast and easy to enhance your application with a broad range of PDF features including file creation, editing, text and image extraction, and standard PDF security using easy-to-implement, concise code. Users are empowered to rapidly render large PDF images and files. Apply customizable compression settings, and perform lossless compression to reduce file size without sacrificing render quality.

ActivePDF: Over 14 years, ActivePDF has developed and refined a comprehensive collection of PDF automation tools that make development easy. ActivePDF helps avoid delays, downtime and headaches. More than 23,000 satisfied customers have chosen ActivePDF, from startups to Fortune 100 companies.

Adobe: A company defined by its market-leading PDF technology, Adobe offers Adobe Document Cloud for document management across mobile devices and PCs. The Document Cloud features the Adobe Acrobat DC PDF solution, which provides a touch interface for document management through native mobile apps.

Amyuni: Amyuni provides developers and system administrators with high-performance PDF conversion and processing tools. Certified for Windows desktops and servers, Amyuni PDF Converter enables developers to easily integrate powerful PDF and PDF/A functionality into their applications with just a few lines of code. Amyuni PDF Creator produces optimized PDF documents and is available for .NET, WinRT and ActiveX.

Aspose: Aspose creates file format APIs that help .NET and Java developers work with documents. Aspose.Pdf for .NET and Aspose.Pdf for Java are APIs for creating, editing and converting PDF files. They support a wide range of features, from simple PDF file creation, through layout and formatting changes, to more complex operations like managing PDF forms, security and signatures. In addition, the company also provides PDF solutions for Cloud, Android, SharePoint, Reporting Services and JasperReports.

CeTe: CeTe Software’s DynamicPDF product line, including Merger, Generator, Viewer, Rasterizer, PrintManager and Converter, provides developers access to a complete integrated PDF solution. Functionality includes PDF creation and manipulation, PDF conversion (to and from PDF), PDF printing, as well as an embeddable PDF Viewer. The DynamicPDF libraries and components have functionality for .NET (C# and VB.NET), Java and COM/ActiveX.

ComponentPro: Ultimate PDF for .NET is a 100%-managed PDF document component that helps you add PDF capabilities in .NET applications. With a few lines of code, developers can create a complex PDF document from scratch, or load an existing PDF file without using any third-party libraries or ActiveX controls. The Ultimate PDF component also offers many features, including drawing text, image, tables and other shapes; compression; hyperlinks; security; and custom fonts. PDF files created using the Ultimate PDF component are compatible with all versions of Adobe Acrobat, as well as the free version of Acrobat Viewer from Adobe.

Glyph & Cog: Glyph & Cog offers a full line of software components designed to help developers add PDF capabilities into their applications. Functionality includes PDF viewing (Qt and ActiveX), printing, text extraction, and more with cross-platform support for Windows, Mac and Linux. Glyph & Cog’s newest product is PDFdeconstruct, a tool that decomposes PDF content into an XML file.

GrapeCity: Within the ComponentOne Studio product, GrapeCity provides UI controls for application development. Its offering includes PDF controls for creating and viewing PDF documents in Windows, web, and Windows Store apps without requiring users to install Adobe Acrobat. With the ComponentOne Studio PDF control for WinForms, WPF, UWP, MVC, ASP.NET, and Silverlight, users may generate and view full-featured reports with encryption, compression, outlining, hyperlinking, attachments, and everything else PDF users need. The new FlexReport reporting engine exports to PDFs and also includes FlexViewer for Windows and web apps, and supports PDF viewing with full navigation features.

LEADTOOLS: LEADTOOLS’ Document Imaging toolkits include a full suite of PDF SDK technology for viewing, editing, creating and converting PDF and Office formats. The Document Viewer framework includes an advanced set of tools such as text searching, annotations, memory-efficient paging, inertial scrolling, and vector display. Developers can implement comprehensive PDF reading, writing and editing with support for the extraction of text, hyperlinks, bookmarks, digital signatures, PDF forms and metadata, as well as updating, splitting and merging pages from existing PDF documents.

Persits Software: Persits Software’s AspPDF and AspPDF.NET are feature-packed server components for managing Adobe PDF documents for ASP and .NET environments, respectively. Their simple and intuitive programming interface enables a Web application to perform many useful PDF-related functions, such as form fill-in, HTML-to-PDF, and PDF-to-image conversion, text extraction, stamping, digital signing, automatic printing, barcode generation, and many others, in just a few lines of script. Free fully functional 30-day evaluation versions are available.

ORPALIS: GdPicture.NET offers extended support of the PDF format for .NET (C# and VB.NET) and non-managed applications written in VB6, Delphi, Microsoft Access and more. Its numerous features include full Unicode support, PDF/A generation, digital signature support, PDF merging and splitting, PDF modification, PDF rasterization, and PDF creation with interactive form fields. With GdPicture.NET, you can also repair corrupted PDFs, add or extract fonts, and draw barcodes and annotations on documents.

TallComponents: TallComponents offers reliable and proven .NET class libraries for desktop, server, mobile and cloud to create, modify, convert, read, print and render PDF documents. The libraries are written entirely in C#, have no external dependencies such as Adobe Reader, and are characterized by an intuitive API combined with knowledgeable and fast support.

About Christina Cardoza

Christina Cardoza is the News Editor of SD Times. She is responsible for the oversight of the daily news published to the website as well as the company's weekly newsletter, News on Monday. She covers agile, DevOps, AI, machine learning, mixed reality and software security. She is an undeniable nerd who loves Marvel comics and Star Wars. Follow her on Twitter at @chriscatdoza!