All posts for the month February, 2014

A paperless office is exactly as it sounds—an office without paper, relying instead upon digital documents in digital files.

While the concept has been around for a long time, the paperless office has been difficult, if not impossible, to achieve. Today, however, technological advances have made it not only possible but attractive to take your office paperless or, if not entirely paperless, at least far less reliant on the printed page.

Paperless Is Now an Even Better Option

Scanners are faster and can handle larger loads. OCR (optical character recognition) is better. Digital storage is cheaper. Services that will do everything from digitizing your paper documents to storing your electronic files abound. And, most importantly, a widely supported standard for digital documents exists in the form of PDF, which is published as an open ISO standard (ISO 32000).

The Portable Document Format is the best option for storing your digital documents because it’s:

Standard. You can save all kinds of files as PDF files, including Microsoft Word and Excel documents and the output of just about any other application. Once in PDF, your documents can easily be read by whoever needs access to them. All that’s required is a PDF reader such as the free Foxit Reader, and you’re good to go.

Secure. PDF software such as Foxit PhantomPDF gives you a whole spectrum of security options for PDF files, from no security at all to various levels of password protection, encryption, and even rights management.

Small. (Or at least, smaller.) PDF files are often smaller than files in other formats, which saves storage space and makes them easier to send online.

Searchable. That means no more digging through mountains of paper, hanging file folders, and the space underneath your boss’s desk trying to find one elusive sheet of paper or that much-needed paragraph.
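One reason PDF files can be compact, as the "Small" point notes, is compression. PDF stores page content, fonts, and images in streams, and a stream marked with the `/FlateDecode` filter is compressed in the zlib/deflate format that Python’s standard `zlib` module reads and writes. The minimal sketch below wraps some illustrative, repetitive page-drawing operators in such a stream. The helper name `flate_stream` and the sample content are hypothetical, and real PDF software picks a filter per stream (for example, `/DCTDecode`, which is JPEG, for photographic images).

```python
import zlib


def flate_stream(data: bytes) -> bytes:
    """Return a PDF stream body (dictionary + data) compressed with FlateDecode."""
    packed = zlib.compress(data, 9)  # zlib/deflate format, which FlateDecode expects
    header = f"<< /Filter /FlateDecode /Length {len(packed)} >>\nstream\n".encode("latin-1")
    # /Length counts only the compressed bytes between "stream\n" and "\nendstream".
    return header + packed + b"\nendstream"


# Illustrative page content: the same text-drawing operators repeated many times.
raw = b"BT /F1 10 Tf 72 700 Td (Quarterly report) Tj ET\n" * 200
body = flate_stream(raw)
print(len(raw), "bytes raw ->", len(body), "bytes as a Flate stream")
```

Highly repetitive content like this compresses very well; scanned images compress less predictably, which is why the exact savings depend on your documents.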

Indeed, when you’re moving to the less-paper/paperless office, the key benefit really comes down to the searchable PDF format. After all, you don’t want to save physical space at the risk of misplacing valuable information.

How OCR Makes PDF Searchable

Fortunately, many of today’s scanners come with OCR software that can convert your paper documents into searchable files.

Searchable PDF files are similar to normal PDF files, except that, in addition to the scanned bitmap image of your paper document, they include an invisible layer of OCR-generated text positioned over the image, and it is this text that gets searched. That’s true for any kind of searchable PDF document you create, whether it’s from published brochures, legal judgments, or handwritten notes your boss passed you on a paper napkin (although OCR is generally much less accurate on handwriting than on printed text). It’s also true whether you use PDF software or OCR software to create it.

This enables you to use your digital file system as a searchable database to find keywords, names and phrases that can help you locate the information you need.
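To make the invisible text layer described above concrete, here is a minimal, hand-built sketch in Python using only the standard library. It paints a placeholder "scan" (a single white pixel stretched over a US Letter page) and then writes OCR words on top in PDF text rendering mode 3, which the PDF specification defines as neither filling nor stroking the glyphs: the text is invisible but can still be selected and searched. The function name `build_searchable_page` and its `(text, x, y)` word list are hypothetical stand-ins for what an OCR engine would report. Real OCR software also embeds the full-resolution scan, compresses streams, sizes each word to match the image, and handles non-Latin text, none of which this sketch does.

```python
def _escape(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def _stream(dict_body: str, data: bytes) -> bytes:
    """Build a stream object body whose /Length matches the data."""
    head = f"<< {dict_body} /Length {len(data)} >>\nstream\n".encode("latin-1")
    return head + data + b"\nendstream"


def build_searchable_page(ocr_words, width=612, height=792) -> bytes:
    """Return a one-page PDF: a placeholder scanned image plus invisible OCR text.

    ocr_words is a list of (text, x, y) tuples in PDF points (hypothetical OCR
    output). Text must be Latin-1 encodable.
    """
    # 1. Paint the "scan": a 1x1 white grayscale image scaled to fill the page.
    ops = [f"q {width} 0 0 {height} 0 0 cm /Im0 Do Q"]
    # 2. Draw each OCR word in text rendering mode 3 (no fill, no stroke = invisible).
    ops += ["BT", "3 Tr", "/F1 10 Tf"]
    for text, x, y in ocr_words:
        ops.append(f"1 0 0 1 {x} {y} Tm ({_escape(text)}) Tj")
    ops.append("ET")
    content = "\n".join(ops).encode("latin-1")

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
         "/Resources << /Font << /F1 4 0 R >> /XObject << /Im0 5 0 R >> >> "
         "/Contents 6 0 R >>").encode("latin-1"),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        _stream("/Type /XObject /Subtype /Image /Width 1 /Height 1 "
                "/ColorSpace /DeviceGray /BitsPerComponent 8", b"\xff"),
        _stream("", content),
    ]

    # 3. Serialize the objects, recording each one's byte offset for the xref table.
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += f"{number} 0 obj\n".encode("latin-1") + body + b"\nendobj\n"

    xref_at = len(out)
    out += f"xref\n0 {len(objects) + 1}\n0000000000 65535 f \n".encode("latin-1")
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("latin-1")
    out += (f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
            f"startxref\n{xref_at}\n%%EOF\n").encode("latin-1")
    return bytes(out)


pdf_bytes = build_searchable_page([("Invoice", 72, 700), ("(draft)", 130, 700)])
```

Writing `pdf_bytes` to a `.pdf` file should give a blank-looking page in which a viewer’s search box can still find "Invoice", which is the same principle a scanned, OCR-processed page relies on.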

Don’t have a scanner capable of creating searchable PDF files? No worries. Many PDF applications, such as Foxit PhantomPDF, can turn a scanned image into a searchable PDF.

Foxit PDF IFilter – Server, which lets Microsoft search and indexing services such as SharePoint and Windows Search extract text from PDF files, was benchmarked by an independent software reviewer on MSDN Blogs. In the data set crawl-time test, Foxit PDF IFilter – Server finished in just 13 minutes: six times faster than TET PDF IFilter and 39 times faster than Adobe PDF IFilter.
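Crawl time matters because an indexer has to pull the text out of every PDF and fold it into a search index. As a simplified illustration of what happens after extraction (not how any particular IFilter or search product works internally), the Python sketch below builds an inverted index, a map from each word to the files that contain it, and answers queries that require every word to be present. The `docs` dictionary of already-extracted text is hypothetical; production search engines add stemming, ranking, phrase matching, and incremental updates.

```python
import re
from collections import defaultdict


def tokenize(text: str):
    """Lowercase the text and split it into runs of ASCII letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """docs maps a file name to its extracted text; return word -> set of file names."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return, sorted, the files that contain every word in the query (AND search)."""
    words = tokenize(query)
    if not words:
        return []
    return sorted(set.intersection(*(index.get(w, set()) for w in words)))


# Hypothetical text, as a PDF text extractor might hand it to the indexer.
docs = {
    "lease.pdf": "Lease agreement for 42 Main Street",
    "invoice.pdf": "Invoice: Main Street office supplies",
    "memo.pdf": "Staff memo about the new scanner",
}
index = build_index(docs)
print(search(index, "main street"))  # -> ['invoice.pdf', 'lease.pdf']
```

The expensive part is done once, at indexing time; each query then only looks up a few sets instead of rereading every PDF.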

Have a large volume of paper documents? Consider third-party conversion.

Of course, if you’ve got massive amounts of documents that require conversion, scanning them by hand, even with a high-speed scanner that creates searchable PDFs, could take too much time. In that case, consider using one of the many document conversion companies that use PDF software to convert your paper files into searchable PDFs for you. Most will send you a DVD, CD, hard drive or flash drive with your newly searchable PDF documents and will return your original paper documents to you, if you wish.

Many of these services also provide secure hosted storage solutions that can house your digital files for you. You can download your files from their servers or leave copies as a backup. One of the main benefits in this less-paper office scenario is that, if your power goes out or there’s an emergency, you can still access your PDF files on the document conversion company’s hosted server.

Digital Storage Space for Your PDF Documents

With computer hard drives so affordable these days, creating your own digital file cabinet is within reach even for small businesses with limited resources. You can simply buy a computer, install extra hard drives and create your own server to store all of your PDF files.

For companies with more resources, consider a dedicated server with enough hard drive capacity to store your PDF files. When you need more space, you can simply buy and install more drives.
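Before buying drives, it helps to do the arithmetic. The Python sketch below turns a page count into a rough capacity estimate and a drive count. Every number in it is a hypothetical placeholder: the 150 KB per page, the two full copies (for example, a mirrored pair), and the 80% fill limit are assumptions, so measure a sample of your own scanned PDFs to get a realistic per-page figure.

```python
import math


def estimate_storage_gb(pages: int, kb_per_page: float) -> float:
    """Disk space in decimal gigabytes (10**9 bytes), with 1 KB = 1000 bytes."""
    return pages * kb_per_page * 1000 / 1e9


def drives_needed(required_gb: float, drive_gb: float,
                  copies: int = 2, fill_ratio: float = 0.8) -> int:
    """Drives for `copies` full copies, each on its own drives filled to at most fill_ratio."""
    # fill_ratio leaves headroom for growth; 0.8 is an arbitrary placeholder.
    per_copy = math.ceil(required_gb / (drive_gb * fill_ratio))
    return copies * max(1, per_copy)


need = estimate_storage_gb(250_000, 150)  # hypothetical: 250,000 pages at 150 KB each
print(need, "GB needs", drives_needed(need, 1000), "x 1 TB drives")
```

With these placeholder numbers, 250,000 pages come to 37.5 GB, which fits on a mirrored pair of 1 TB drives (`drives_needed` returns 2).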

For PDF files kept for record-keeping and backup purposes, which don’t need to be accessed often, you can transfer them to DVD, an external hard drive, or a secure cloud storage service. Cloud services charge a monthly fee that varies with the amount of space you use, so you can have as much PDF file storage as you need.
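If you keep your live PDF files on your own server, moving rarely used ones to an archive drive can be scripted. The Python sketch below is a hypothetical helper, not part of any product: it moves PDFs whose last-modified time is older than a cutoff into an archive folder (for example, a mounted external drive) while preserving subfolders. It uses the modification time as a stand-in for "not accessed often" because many systems do not reliably update last-access times.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def archive_stale_pdfs(source: Path, archive: Path, max_age_days: float,
                       now: Optional[float] = None) -> List[Path]:
    """Move PDFs not modified in `max_age_days` from source to archive; return new paths."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86,400 seconds per day
    # Collect candidates first so moving files doesn't disturb the directory walk.
    candidates = sorted(p for p in source.rglob("*")
                        if p.is_file() and p.suffix.lower() == ".pdf")
    moved = []
    for pdf in candidates:
        if pdf.stat().st_mtime >= cutoff:
            continue  # modified recently: keep it on the live server
        target = archive / pdf.relative_to(source)  # preserve the subfolder layout
        if target.exists():
            continue  # never overwrite; leave name clashes for a person to resolve
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(pdf), str(target))
        moved.append(target)
    return moved
```

For example, `archive_stale_pdfs(Path("/srv/pdf"), Path("/mnt/archive"), max_age_days=365)` would move every PDF under the hypothetical `/srv/pdf` folder that has not been modified in a year. Trying it on a copy of your files first is a sensible precaution.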

All in all, searchable PDF and PDF software deliver on the promise of the less-paper or paperless office. No more digging through file cabinets. No more paper cuts. No more misfiled documents.

Extracting useful information from PDFs can be a challenge when you’re dealing with a huge number of PDF documents, which is why the Sunlight Foundation held the PDF Liberation Hackathon. Despite its name, the hackathon was not about breaking into anyone’s private database of PDF documents; rather, it was dedicated to improving tools for extracting data from PDFs.

Why the need? There are many organizations, including public interest groups, that want to search PDF documents en masse.

Everyday Examples of Extracting Data from PDFs

For example, one of the Sunlight Foundation’s challenges centered on the financial performance of the nation’s major cities. Most large US cities publish Comprehensive Annual Financial Reports (CAFRs) as PDFs. These documents contain a large set of audited financial statements with footnotes. The challenge was to extract a single statement, a ten-year history of revenues and expenditures, from the latest CAFR of each of the four cities participating in the hackathon (Chicago, New York, San Francisco, and Washington, DC). The results enable comparison of revenue sources and spending priorities across cities over a number of years, an obvious benefit to local, state, and national government agencies, not to mention taxpayers.
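Once a statement like the ten-year history of revenues and expenditures has been converted to layout-preserving plain text (for example with Poppler’s `pdftotext -layout`), much of the remaining work is parsing rows of figures. The Python sketch below is one simple, assumption-laden approach, not what any hackathon team built. It splits each line on runs of two or more spaces, keeps rows with exactly one amount per year, and reads accounting-style numbers, in which commas separate thousands, parentheses mark negative amounts, and a lone "-" means nil. The function names and sample lines are hypothetical and are not taken from any actual CAFR.

```python
import re

# One amount: 1,234 or 1,234.5, optionally with $, negatives in parentheses, or "-" for nil.
_AMOUNT = re.compile(r"\(\$?\d[\d,]*(?:\.\d+)?\)|\$?\d[\d,]*(?:\.\d+)?|-")


def parse_amount(token: str) -> float:
    """Convert one accounting-style amount to a float."""
    if token == "-":
        return 0.0
    negative = token.startswith("(") and token.endswith(")")
    value = float(token.strip("()$").replace(",", ""))
    return -value if negative else value


def parse_statement(lines, years):
    """Map each row label to {year: amount} for rows with one amount per year."""
    year_headers = [str(y) for y in years]
    rows = {}
    for line in lines:
        parts = re.split(r"\s{2,}", line.strip())
        if len(parts) != len(years) + 1:
            continue  # titles, footnotes, or rows that wrapped onto two lines
        label, amounts = parts[0], parts[1:]
        if amounts == year_headers:
            continue  # the column-header row
        if all(_AMOUNT.fullmatch(a) for a in amounts):
            rows[label] = {y: parse_amount(a) for y, a in zip(years, amounts)}
    return rows


# Hypothetical extracted lines with made-up figures.
sample = [
    "Revenues                      2012          2013",
    "Property taxes              1,200.5       1,310",
    "Transfers out               (250)         ($75)",
    "Grants                          -            40",
    "See accompanying notes.",
]
print(parse_statement(sample, [2012, 2013]))
```

Rows that wrap across lines are skipped rather than guessed at, so a real pipeline would need to rejoin them or flag them for review.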

As another example, Members of the House of Representatives file a yearly report on their personal finances. Although this report is often submitted electronically, it is made available only as a PDF on the Clerk of the House’s website. The challenge was to find a reliable and sustainable way to extract the information entered on the form, whose layout shifts from one filing to the next depending on its content.
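Because a form’s layout can move from one filing to the next, one tactic is to anchor on the printed field labels rather than on fixed page coordinates. The Python sketch below illustrates that idea on text already extracted from a PDF. The labels and sample text are hypothetical (the source does not describe the House form’s actual fields), and a real form would also need handling for values that wrap across lines, checkboxes, and repeated table sections.

```python
import re


def extract_fields(text: str, labels):
    """Find 'Label: value' pairs wherever they appear, not at fixed page positions."""
    found = {}
    for label in labels:
        # [ \t]* (not \s*) keeps each match on a single line of the extracted text.
        pattern = rf"^[ \t]*{re.escape(label)}[ \t]*:[ \t]*(.+?)[ \t]*$"
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            found[label] = match.group(1)
    return found


# Two hypothetical filings whose fields appear in a different order.
first = "FINANCIAL DISCLOSURE\n  Name:  Jane Doe \nFiling Year: 2013\n\nStatus: Member\n"
second = "Filing Year: 2014\nNotes: none\nName: John Roe\n"
for filing in (first, second):
    print(extract_fields(filing, ["Name", "Filing Year", "State"]))
```

Labels that are missing from a filing, such as "State" here, are simply left out of the result rather than raising an error.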

Sunlight’s PDF Liberation Hackathon aimed to tackle real-world PDF data extraction problems like these and to bring coders together to add features, extensions, and plugins to existing PDF extraction frameworks, making them more flexible, useful, and sustainable.

More Information on How to Extract Content from PDF

Developers interested in furthering the research may want to take a look at the Foxit Embedded PDF Software Development Kit (SDK).

The SDK is aimed at developers, device manufacturers, and telecom carriers building applications that use standards-compliant PDF technology to securely display, search, and annotate PDF documents and to fill PDF forms. Developers can use the SDK to search for specific text in PDF documents, extract the matching content, and then parse and save the extracted text.