GWAVA's Retain Archives and Indexes 5,000,000 PDF Documents Each Day with PDFxStream

GWAVA is successfully using PDFxStream to enable its Retain email archiving and retention solution
to archive and index millions of PDF email attachments each day. This pairing enables GWAVA to
deliver more comprehensive and capable email archiving processes to its customers, processes that
comply with corporate auditing requirements and government regulatory measures.

GWAVA is the leading provider of software solutions for Novell GroupWise, and its Retain email
archiving product is a key part of its offering in the GroupWise community. Retain integrates with
GroupWise's email handling infrastructure to archive and index emails passing through a GroupWise
instance according to rules set by system administrators. These archives and their attendant search
indexes are critical to many organizations' operations, as they are a key part of fulfilling many
auditing protocols and regulatory compliance measures.

The Challenge

"Snowtide is the perfect fit and solution for GWAVA users. Throughout our extensive testing,
PDFxStream proved itself to be far and away the best PDF content extraction solution available on
the market."
Charles Taite, CEO & Co-Founder, GWAVA

The days of plain text or simple HTML email passed long ago, and GWAVA needed Retain to archive and
index all of the various types of email attachments. Of course, PDF documents are one of the most
important and common types of email attachments, so it was clear that Retain needed to be able to
archive and index PDF documents. This was doubly important since the organizations that have some of
the most stringent regulatory and auditing requirements (such as law firms, government agencies, and
institutions of higher education) depend so heavily on emailing PDF documents as part of their
normal workflow.

With this in mind, GWAVA set out to find a software component that would allow Retain to fold the
textual content of PDF email attachments into its existing indexing and archiving processes. This
component would need to yield highly accurate content extraction results with the greatest possible
performance, and be easy to integrate and maintain within the Retain codebase.

The Solution

This search was led by Michael Bell, GWAVA’s Vice President of Research and Development. His team
built a PDF text extraction framework in order to thoroughly test potential solutions, and proceeded
to put a variety of PDF libraries through their paces. In the end, the GWAVA team chose
PDFxStream.

"PDFxStream was the only solution that clearly had the primary goal of quality text
extraction, rather than handling that as an afterthought."
Michael Bell, Vice President of R & D, GWAVA

This decision was fundamentally based upon PDFxStream's singular focus on PDF content extraction
and the benefits it provides because of that focus. “Each of the other products we tried had different
problems: many were slow and unreliable, most had poor international character set support,” says
Bell. “PDFxStream was the only solution that clearly had the primary goal of quality text
extraction, rather than handling that as an afterthought.”

Results

PDFxStream enabled GWAVA to add PDF file attachment archiving and indexing to its Retain product,
thereby helping its customers comply with critical auditing regulations. PDFxStream is now
distributed worldwide as part of the Retain solution. And according to Bell, “We don’t have exact
statistics, but it would be reasonable to estimate that PDFxStream is extracting content from
approximately 5 million PDF documents each day across our entire installed base.”

Integrating PDFxStream into Retain was simple, too. “Compared to other libraries we evaluated,
working with PDFxStream has been very easy,” says Bell. “It took us no more than ten minutes to
implement it.”