Document Imaging

Document imaging is an information technology category for systems capable of replicating documents commonly used in business. Document imaging systems can take many forms, including microfilm, on-demand printers, facsimile machines, copiers, multifunction printers, document scanners, computer output microfilm (COM), and archive writers. Document imaging is a form of enterprise content management, built around the need to manage and secure the escalating volume of electronic documents (spreadsheets, word-processing documents, PDFs, e-mails) created in organizations.

In this tenth video of my Xpdf series, I discuss and demonstrate the PDFtoPS utility, which converts a PDF file to PostScript (PS). It also provides an option to create an Encapsulated PostScript (EPS) file. It performs its functions via a command line interface, making it suitable for use in programs, scripts, and batch files, or anywhere else a command line call can be made.
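Because PDFtoPS is driven entirely from the command line, calling it from a program mostly comes down to building the right argument list. The Python sketch below assumes the Xpdf 4 usage `pdftops [options] <PDF-file> [<PS-file>]` and the `-eps`, `-f`, and `-l` options as I understand them from pdftops.txt; please confirm them against the documentation for your version. The function names are hypothetical. Since an EPS file describes a single page, the sketch insists on one page number when `eps=True`.

```python
import subprocess
from typing import List, Optional


def build_pdftops_cmd(pdf: str, out: Optional[str] = None,
                      eps: bool = False, page: Optional[int] = None,
                      exe: str = "pdftops") -> List[str]:
    """Return the argument list for a pdftops call (nothing is executed here)."""
    cmd = [exe]
    if eps:
        # Assumption: EPS output holds one page, selected with -f/-l.
        if page is None:
            raise ValueError("EPS output needs a single page number")
        cmd += ["-eps", "-f", str(page), "-l", str(page)]
    cmd.append(pdf)
    if out is not None:
        cmd.append(out)  # if omitted, pdftops derives the name from the PDF
    return cmd


def run_pdftops(*args, **kwargs) -> int:
    """Run pdftops and return its exit code (0 means success)."""
    return subprocess.run(build_pdftops_cmd(*args, **kwargs)).returncode
```

For example, `run_pdftops("report.pdf", "page3.eps", eps=True, page=3)` would ask for page 3 as an EPS file.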

1. Download the software

You may have already downloaded the Xpdf tools while watching one of my earlier videos in the series, but there has since been an upgrade from Version 3 to Version 4 and there is a new download site:

In this ninth video of my Xpdf series, I discuss and demonstrate the PDFtoPPM tool, which converts a PDF file to color portable pixmap (PPM) format, grayscale portable graymap (PGM) format, or monochrome (black & white) portable bitmap (PBM) format. It creates a separate image file for each page of the PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any place where a command line call can be made.
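When PDFtoPPM is called from a script, it helps to know in advance which files it will produce. The sketch below assumes the Xpdf naming scheme `<PPM-root>-nnnnnn.<ext>` (six-digit page number), `-gray` for PGM, `-mono` for PBM, and `-r` for resolution, as I read pdftoppm.txt; other builds (for example, Poppler's pdftoppm) name files differently, so verify before relying on it. The function names are hypothetical.

```python
from typing import List

# Extension per output mode: color PPM by default, -gray gives PGM, -mono gives PBM.
EXTENSIONS = {"color": "ppm", "gray": "pgm", "mono": "pbm"}


def build_pdftoppm_cmd(pdf: str, root: str, mode: str = "color",
                       dpi: int = 150, exe: str = "pdftoppm") -> List[str]:
    """Build a pdftoppm argument list for the chosen color mode and resolution."""
    if mode not in EXTENSIONS:
        raise ValueError(f"unknown mode: {mode}")
    cmd = [exe, "-r", str(dpi)]
    if mode == "gray":
        cmd.append("-gray")
    elif mode == "mono":
        cmd.append("-mono")
    return cmd + [pdf, root]


def expected_outputs(root: str, pages: int, mode: str = "color") -> List[str]:
    """Assumed Xpdf naming: <root>-nnnnnn.<ext>, one file per page."""
    ext = EXTENSIONS[mode]
    return [f"{root}-{n:06d}.{ext}" for n in range(1, pages + 1)]
```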

1. Download the software

You may have already downloaded the Xpdf tools while watching one of my earlier videos in the series, but there has since been an upgrade from Version 3 to Version 4 and there is a new download site:

In this eighth video of my Xpdf series, I discuss and demonstrate the PDFtoHTML utility, which, exactly as its name says, converts a PDF file to HTML. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any place where a command line call can be made.
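A script that calls PDFtoHTML needs a target folder for the HTML output. My understanding from pdftohtml.txt is that Xpdf 4 expects `pdftohtml [options] <PDF-file> <html-dir>` and will not write into a folder that already exists; the sketch below is built on that assumption, so check the documentation for your release. It picks a fresh folder name and then builds the command. Both function names are hypothetical.

```python
import os
from typing import List


def next_free_dir(base: str) -> str:
    """Return base, or base-2, base-3, ... whichever does not exist yet."""
    candidate, n = base, 2
    while os.path.exists(candidate):
        candidate, n = f"{base}-{n}", n + 1
    return candidate


def build_pdftohtml_cmd(pdf: str, html_dir: str, exe: str = "pdftohtml") -> List[str]:
    """Build the pdftohtml call; fail early if the output folder already exists."""
    # Assumption: Xpdf 4 pdftohtml refuses an existing html-dir.
    if os.path.exists(html_dir):
        raise FileExistsError(f"{html_dir} already exists; choose a new folder")
    return [exe, pdf, html_dir]
```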

1. Download the software

You may have already downloaded the Xpdf tools while watching one of my earlier videos in the series, but there has since been an upgrade from Version 3 to Version 4 and there is a new download site:

This is the eleventh (and final) video of my Experts Exchange Micro Tutorials on the Xpdf utilities. The first video is an overview of the command line tools. The next nine videos are tutorials on all of them:

This last video in the series discusses xpdfrc, which is the single configuration file that Xpdf uses for all nine utilities. It provides an enormous number of options, allowing extensive control of the tools, such as character mapping, font configuration, PostScript control, rasterizer settings, text control, and much more.
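To see how an xpdfrc file is structured, it can help to read one programmatically: each non-blank line is a command followed by its arguments, and lines starting with `#` are comments. The sketch below is a simplified reader under those assumptions. It splits on whitespace only, so it does not handle quoted arguments or `include` directives. `parse_xpdfrc` is a hypothetical name, and `textEncoding` and `textEOL` are used only as sample commands.

```python
from typing import Dict, List


def parse_xpdfrc(text: str) -> Dict[str, List[List[str]]]:
    """Map each command name to the argument lists it was given.

    A list of lists is kept because some commands (e.g., font settings)
    may legitimately appear many times in one xpdfrc file.
    """
    settings: Dict[str, List[List[str]]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        command, *args = line.split()
        settings.setdefault(command, []).append(args)
    return settings
```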

1. Download the software and fonts

You may have already downloaded the Xpdf tools while watching one of my earlier videos in the series, but there has since been an upgrade from Version 3 to Version 4 and there is a new download site:

N.B.: As with any "free" software, there may be restrictions, which are always specified in the software's licensing agreement, typically known as the End-User License Agreement (EULA). I encourage you to read the entire EULA of this product to be certain that you are in license compliance.

This new video Micro Tutorial shows where to download the free Foxit Reader and explains how to use it to place a date-time stamp on a PDF file.

PDF-XChange Editor has many other features in its free version, but, unfortunately, it cannot do scanning — you must purchase one of its non-free versions to get scanning functionality. Fortunately, there's another excellent, free PDF product that can perform scanning — Foxit Reader. However, the free Foxit Reader cannot do OCR, so you'll want to keep the free PDF-XChange Editor for its OCR capability, and add Foxit Reader for its scanning capability. The combination of the two products will allow you to create searchable PDFs (aka PDF Searchable Image files) with your scanner, utilizing free software.

N.B.: As with any "free" software, there may be restrictions, which are always specified in the software's licensing agreement, typically known as the End-User License Agreement (EULA). I encourage you to read the entire EULA of these products to be certain that you are in license compliance.

In a question here at Experts Exchange, a member asked how to create a signature in Adobe Acrobat Reader DC (the free Reader product, not the paid, full Acrobat product). The member requested step-by-step instructions. This 5-minute Experts Exchange video Micro Tutorial provides detailed steps showing how to do it.

1. Open the PDF file and view the Tools

Open the PDF file with Adobe Acrobat Reader DC.

Click either:

the View>Tools>Fill & Sign>Open menu

or:

the sideways triangle on the right side to open the Tools panel.

2. Run the Fill & Sign tool

If you used the first method in Step 1, the Fill & Sign tool will be open.

If you used the second method in Step 1, click the Fill & Sign tool in the Tools panel to open it.

Either way, you'll have this:

3. Click the sign tool, which is the pen tip

Click the tip of the pen, which brings up the Add Signature and Add Initials choices.

Click Add Signature.

4. Select Type or Draw or Image

Click the Type or Draw or Image icon (default is Type).

Enter your signature, depending on the choice you made above.

5. Place your signature

Position the mouse on the page and left-click to place the signature.

Use the sizing handle in the lower right corner, if desired, to size the signature.

6. To edit/change signature, delete it and create new one

There is no way to edit/change the signature, so delete it and create a new one, if needed.

Click the minus sign to delete it, then start over at Step 3.

7. Save the file with the signature

After placing the signature, do a File>Save or Save As to save the file with your signature.

That's it! If you find this video to be helpful, please click the thumbs-up…

The problem discussed in that article reached epidemic proportions in July 2018. The solution proposed there is very likely to solve your problem, but if it doesn't, come back here to try the idea in this video.

Please read the paragraph below before following the instructions in the video — there are important caveats in the paragraph that I did not mention in the video.

If your PaperPort 12 or PaperPort 14 is failing to start, or crashing, or hanging, it may be because of corrupt metadata (likely) or corrupt data files, such as bad PDFs (much less likely, but possible). This video Micro Tutorial shows how to use a utility called CheckPPFolders that ships with all releases of PaperPort 12 and PaperPort 14. CheckPPFolders is able to remove all PaperPort metadata, as well as identify problem files that may be causing PaperPort to crash, hang, or fail to start. PaperPort will rebuild the metadata, but there are two caveats. First, Folder Color and Folder Notes are in the MaxDesk.ini files, so you will lose those — and there's no easy way to retain the colors and notes. Thus, if you make heavy use of Folder Color and …

This video Micro Tutorial shows how to password-protect PDF files with free software. Many software products can do this, such as Adobe Acrobat (but not Adobe Reader), Nuance PaperPort, and Nuance Power PDF, but they are not free products. This video explains how to do it with excellent, free software called PDF-XChange Editor from Tracker Software Products.

1. Download PDF-XChange Editor

Click the white-on-green Download button for either product. It doesn't matter if you download PDF-XChange Editor or PDF-XChange Editor Plus, since you'll be selecting the Free Version when you install.

2. Run downloaded installer

Run the downloaded installer and select Free Version (unless, of course, you want more features and decide to purchase the Pro or Plus Version).

3. Open a non-secured PDF file in PDF-XChange Editor

Run PDF-XChange Editor and open a PDF file that does not currently have password protection on it.

4. Open Security section of Document Properties

Click File menu.

Click Document Properties.

Click Security category.

5. Open Password Security Settings dialog

Click Security Method drop-down.

Click Password Security.

6. Fill in Password Security Settings dialog

In the Options section, select the Compatibility level from the drop-down and use the radio buttons to choose what you want encrypted.

In the Document Passwords section, enter the password required to open the PDF and the password required to change the permission settings.

In an interesting question here at Experts Exchange, a member asked how to split a single image into multiple images. The primary usage for this is to place many photographs on a flatbed scanner and scan all of them into a single image file, but then easily split the single image file into multiple image files, one for each photo. The photos will be placed on the flatbed scanner with ample separation so that there is enough "white space" for the splitting software to separate the images. Of course, the solution may be used on any image that contains multiple images in it, that is, not necessarily scanned photos, as long as there is enough of a separation between images for the splitting software to detect the individual images. The solution presented in this video Micro Tutorial uses the excellent (free!) GIMP software and a filter (plugin/script) called Divide Scanned Images. Kudos to both the GIMP developers and Rob Antonishen, who developed DivideScannedImages and BatchDivideScannedImages.

1. Update to the latest version of GIMP

At the time of this video, the latest version was 2.8.20. This solution will almost surely run on earlier releases (and, with some luck, later ones), but 2.8.20 is the only version I tested, and it is available for download here: https://www.gimp.org/
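The reason the photos need "white space" between them is that splitting software looks for blank rows and columns to separate one photo from the next. The Python sketch below illustrates that idea on a tiny grayscale image stored as a list of rows. It is not the algorithm used by Divide Scanned Images; it is a simplified stand-in that assumes photos sit in rows separated by fully white lines, and it does not trim white margins inside each box. The names and the `white` threshold of 240 are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]          # grayscale rows, 0 = black, 255 = white
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def _runs(flags: List[bool]) -> List[Tuple[int, int]]:
    """Return (start, end) spans where flags is True, end-exclusive."""
    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, f in enumerate(flags):
        if f and start is None:
            start = i
        elif not f and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(flags)))
    return runs


def split_on_whitespace(img: Image, white: int = 240) -> List[Box]:
    """Split on fully white rows first, then on fully white columns within each band."""
    if not img:
        return []
    boxes: List[Box] = []
    row_has_ink = [any(p < white for p in row) for row in img]
    for top, bottom in _runs(row_has_ink):
        band = img[top:bottom]
        col_has_ink = [any(row[c] < white for row in band) for c in range(len(img[0]))]
        for left, right in _runs(col_has_ink):
            boxes.append((top, left, bottom, right))
    return boxes
```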

In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. In addition to the name of the font, it shows the font type and whether or not the font is embedded in the PDF file (and, if embedded, whether or not it is a subset), along with other font information that is discussed in the documentation file. It does this via a command line interface, making it suitable for use in batch files, programs, and scripts — any place where a command line call can be made.

1. Download the software

You may have already downloaded and unzipped the Xpdf tools while watching the first video in the Xpdf series, but if you haven't, then visit the Xpdf website. Click the Download link and then click the pre-compiled Windows binary ZIP archive to download the utilities for Windows.

2. Locate the documentation folder for the Xpdf utilities

Go to the folder where you unzipped the downloaded ZIP file and find the doc folder.

3. Read the documentation for the PDFfonts tool

Go into the doc folder and find the plain text file called pdffonts.txt.

Open it with any text editor, such as Notepad, and read it. This is the documentation for the PDFfonts tool.
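Because PDFfonts prints a fixed-width table, a program that calls it usually needs to split each line into columns. The sketch below assumes the output has a header line followed by a line of dashes, with one run of dashes per column. It uses those dash runs as the column boundaries, so it does not hard-code the column names. The sample in the test is constructed for illustration and is not captured output; `parse_pdffonts` is a hypothetical name.

```python
import re
from typing import Dict, List, Optional, Tuple


def parse_pdffonts(output: str) -> List[Dict[str, str]]:
    """Turn a pdffonts-style table into one dict per font, keyed by header."""
    lines = output.splitlines()
    # Assumption: a line of dashes sits directly under the header line.
    sep_idx = next((i for i, l in enumerate(lines) if l.startswith("---")), None)
    if sep_idx is None or sep_idx == 0:
        raise ValueError("no header/separator lines found")
    spans: List[Tuple[int, Optional[int]]] = [
        (m.start(), m.end()) for m in re.finditer(r"-+", lines[sep_idx])
    ]
    spans[-1] = (spans[-1][0], None)  # let the last column run to end of line
    header = [lines[sep_idx - 1][s:e].strip() for s, e in spans]
    rows = []
    for line in lines[sep_idx + 1:]:
        if line.strip():
            rows.append({h: line[s:e].strip() for h, (s, e) in zip(header, spans)})
    return rows
```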

In this sixth video of the Xpdf series, we discuss and demonstrate the PDFtoPNG utility, which converts a multi-page PDF file to separate color, grayscale, or monochrome PNG files, creating one PNG file for each page in the PDF. It does this via a command line interface, making it suitable for use in batch files, programs, and scripts — any place where a command line call can be made.

1. Download the software

You may have already downloaded and unzipped the Xpdf tools while watching the first video in the Xpdf series, but if you haven't, then visit the Xpdf website. Click the Download link and then click the pre-compiled Windows binary ZIP archive to download the utilities for Windows.

2. Locate the documentation folder for the Xpdf utilities

Go to the folder where you unzipped the downloaded ZIP file and find the doc folder.

3. Read the documentation for the PDFtoPNG tool

Go into the doc folder and find the plain text file called pdftopng.txt.

Open it with any text editor, such as Notepad, and read it. This is the documentation for the PDFtoPNG tool.
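A common use of a command line tool like PDFtoPNG is converting every PDF in a folder in one pass. The sketch below only builds the commands; it assumes the usage `pdftopng [options] <PDF-file> <PNG-root>` and the `-r`, `-gray`, and `-mono` options as I understand them from pdftopng.txt. It uses each PDF's name (without `.pdf`) as the PNG root. The function name is hypothetical, and `*.pdf` matches only lowercase extensions on case-sensitive file systems.

```python
from pathlib import Path
from typing import List


def batch_pdftopng_cmds(folder: str, mode: str = "color", dpi: int = 150,
                        exe: str = "pdftopng") -> List[List[str]]:
    """Build one pdftopng command per PDF in folder (nothing is executed)."""
    flags = {"color": [], "gray": ["-gray"], "mono": ["-mono"]}[mode]
    cmds = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        root = str(pdf.with_suffix(""))  # PNG files are named from this root
        cmds.append([exe, "-r", str(dpi), *flags, str(pdf), root])
    return cmds
```

Each command can then be run with, for example, `subprocess.run(cmd, check=True)`.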

This video Micro Tutorial is the second in a two-part series that shows how to create and use custom scanning profiles in Nuance's PaperPort 14.5. But the ability to create custom scanning profiles also exists in PaperPort going back many years, so if you have an older version, such as PaperPort 11 or PaperPort 12, these videos will still be applicable for you. The first video tutorial shows how to create custom scanning profiles and reviews all the Scanner Enhancement Technology (SET) features, such as auto-straighten, delete blank pages, remove punch holes, etc. It also discusses scanning options, including Mode (B&W, Grayscale, Color), Resolution (100 DPI, 200 DPI, 300 DPI, etc.), and Size (Letter, Legal, A4, etc.). This second tutorial shows how to set the output file type for your scans, such as scanning directly to a PDF Searchable Image file, an Excel spreadsheet, or a Word document — all with text created by an automatic OCR process.

1. Run PaperPort and open the 'Output' tab of the scanning profile created in Part 1

Run PaperPort.

Click the Scan Settings button on the ribbon.

This will bring up the Scan or Get Photo pane.

Select the custom scanning profile that you created during Part 1 of this video tutorial series.

Click the Settings button.

Click the Output tab.

2. Test scanning to a PDF Image file

Click the drop-down arrow on the File type field.

Select PDF Image and click OK.

Put a document in your scanner and click the Scan button. You will now have a PDF Image…

This video Micro Tutorial is the first in a two-part series that shows how to create and use custom scanning profiles in Nuance's PaperPort 14.5. But the ability to create custom scanning profiles also exists in PaperPort going back many years, so if you have an older version, such as PaperPort 11 or PaperPort 12, these videos will still be applicable for you. This first video tutorial shows how to create (and name) custom scanning profiles (or edit existing ones) and reviews all of the Scanner Enhancement Technology (SET) features, such as auto-straighten, delete blank pages, remove punch holes, etc. It also discusses scanning options, including Mode (B&W, Grayscale, Color), Resolution (100 DPI, 200 DPI, 300 DPI, etc.), and Size (Letter, Legal, A4, etc.). The video takes a quick look at the output file type options, but that is discussed fully in Part 2 of the series.

1. Run PaperPort and bring up the 'Scan or Get Photo' pane

Run PaperPort.

Click the Scan Settings button on the ribbon.

This will bring up the Scan or Get Photo pane.

2. Create a new scanning profile or edit an existing one

To create a new scanning profile, click the New button.

To edit an existing scanning profile, click the profile you want to edit, then click the Settings button.

3. Name the new profile

Enter a name for the new profile.

If you want to copy settings from an existing profile, click the drop-down and select it.

We often encounter PDF files that are pure images, that is, they do not have text characters, but instead contain only raster graphics. The most common causes of this are document scanning software and faxing software/services that create image-only PDF files rather than PDF searchable image files, the latter having the scanned or faxed images and text created by Optical Character Recognition (OCR). The solution is to perform OCR on the image-only PDFs to create text. Many software products can do this, such as ABBYY FineReader, Adobe Acrobat (but not Adobe Reader) and Nuance's OmniPage, PaperPort, and Power PDF. Some can even do it in batch mode via a command line interface. But they are all non-free products, many quite expensive. This video Micro Tutorial shows how to OCR the pages of an image-only PDF, thereby creating searchable/copyable text, with excellent, free software called PDF-XChange Editor from Tracker Software Products.

1. Download the Free Version of PDF-XChange Editor

Visit the website for PDF-XChange Editor at Tracker Software Products:

Sometimes we receive PDF files that are in the wrong orientation. They may be sideways or even upside down. This most commonly happens with scanned or faxed documents. It is possible to rotate the view of these PDFs with the free Adobe Reader product, but it is not possible to save the PDF with the rotated pages using Adobe Reader — not even with the latest Document Cloud (DC) version (or any earlier version of Reader). To do this with an Adobe product requires the relatively expensive Adobe Acrobat (Standard or Professional). This video Micro Tutorial shows how to rotate the pages of a PDF, and save the rotated document, with excellent, free software called PDF-XChange Editor from Tracker Software Products.

Microsoft Office Picture Manager has a Picture Shortcuts pane that shows a list of the Recently Browsed folders. While creating my video Micro Tutorial here at Experts Exchange showing How to Install Microsoft Office Picture Manager in Office 2013, I discovered that Picture Manager itself does not provide the capability to delete items from the Recently Browsed folder list or to delete the list in its entirety. Fortunately, there's an easy way to do it outside of Picture Manager. This video Micro Tutorial explains the method.

1. Locate the OIScatalog.cag file

Open Windows/File Explorer or whatever file manager you use and navigate to this file:

c:\Users\<username>\AppData\Local\Microsoft\OIS\OIScatalog.cag

<username> is the user name, such as Joe in the screenshot below.

2. Exit Picture Manager and open the OIScatalog.cag file

Close all instances of Picture Manager that are running and then open the OIScatalog.cag file in Notepad or whatever text editor you use.

3. Delete lines

Delete the lines containing the folders that you want to be removed from the Recently Browsed folder list and Save the OIScatalog.cag file.

4. Run Picture Manager

Run Picture Manager to verify that the folders have been removed from the Recently Browsed list.

5. Optional test — delete entire list

Close all instances of Picture Manager that are running and then delete the OIScatalog.cag file. Run Picture Manager to verify that the entire Recently Browsed folder list has been removed.
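The manual edit in Step 3 can also be scripted. The sketch below assumes that OIScatalog.cag is plain text in an encoding Python can read (UTF-8 by default here); since the steps above open it in Notepad, that seems likely, but if the text looks garbled, adjust `encoding`. It saves a `.bak` copy first and matches folder names case-insensitively, so removing `C:\Photos\Trip` also removes subfolders such as `C:\Photos\Trip\Day2`. The function name is hypothetical. As in Step 2, close Picture Manager before running it.

```python
import shutil
from pathlib import Path
from typing import Iterable


def remove_recent_folders(catalog: str, folders: Iterable[str],
                          encoding: str = "utf-8") -> int:
    """Drop lines mentioning any of `folders`; return how many were removed."""
    path = Path(catalog)
    shutil.copy2(path, path.with_name(path.name + ".bak"))  # backup first
    targets = [f.lower() for f in folders]
    # newline="" keeps the file's original line endings (e.g., CRLF) intact.
    with open(path, encoding=encoding, newline="") as fh:
        lines = fh.readlines()
    kept = [l for l in lines if not any(t in l.lower() for t in targets)]
    with open(path, "w", encoding=encoding, newline="") as fh:
        fh.writelines(kept)
    return len(lines) - len(kept)
```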

That's it! If you find this video to be helpful, please click the thumbs-up icon below. Thank you for watching!

Microsoft Office Picture Manager is not included in Office 2013. This comes as quite a surprise to users upgrading from earlier versions of Office, such as 2007 and 2010, where Picture Manager was included as a standard application. This video explains how to correct this serious omission by the folks in Redmond and install (for free!) Microsoft Office Picture Manager 2010, which plays very nicely with Office 2013. This video Micro Tutorial is fully documented in my Experts Exchange article, How to Install Microsoft Office Picture Manager in Office 2013.

1. Determine the bit-level of your Office 2013.

Open any Word document (a new, blank one is fine).

To see if you have the 32-bit or 64-bit version of Office 2013, click the File menu, then Account, then About Word.

2. Download the Microsoft SharePoint Designer.

Download the matching bit-level for your Office 2013 from one of these links:

In this fifth video of the Xpdf series, we discuss and demonstrate the PDFdetach utility, which is able to list and, more importantly, extract attachments that are embedded in PDF files. It does this via a command line interface, making it suitable for use in batch files, programs, and scripts — any place where a command line call can be made.

1. Download the software.

You may have already downloaded and unzipped the Xpdf tools while watching the first video in the Xpdf series, but if you haven't, then visit the Xpdf website. Click the Download link and then click the pre-compiled Windows binary ZIP archive to download the utilities for Windows.

2. Locate the documentation folder for the Xpdf utilities.

Go to the folder where you unzipped the downloaded ZIP file and find the <doc> folder.

3. Read the documentation for the PDFdetach tool.

Go into the <doc> folder and find the plain text file called <pdfdetach.txt>.

Open it with any text editor, such as Notepad, and read it. This is the documentation for the PDFdetach tool.

4. Set up a test folder.

Copy a sample PDF file that has attachments into your test folder (in the video and the screenshots below, the file is called test.pdf, which is a PDF file created from my EE article, Windows 10 uses YOUR computer to help distribute itself, but with some attachments added to it).
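In a script, the usual pattern is to list the attachments first and then extract the one you need. The sketch below assumes that `pdfdetach -list` prints a count line followed by lines of the form `n: file name`, and that `-save n` with `-o path` extracts attachment n; this is my reading of pdfdetach.txt, so confirm it with your version. The sample in the test is made up, and both function names are hypothetical.

```python
import re
from typing import List, Tuple


def parse_pdfdetach_list(output: str) -> List[Tuple[int, str]]:
    """Turn 'pdfdetach -list' output into (number, file name) pairs."""
    entries = []
    for line in output.splitlines():
        m = re.match(r"\s*(\d+):\s(.*)$", line)  # e.g. "2: notes.txt"
        if m:
            entries.append((int(m.group(1)), m.group(2)))
    return entries


def save_cmd(pdf: str, number: int, out: str, exe: str = "pdfdetach") -> List[str]:
    """Build the command that extracts attachment `number` to `out`."""
    return [exe, "-save", str(number), "-o", out, pdf]
```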

In this fourth video of the Xpdf series, we discuss and demonstrate the PDFinfo utility, which retrieves the contents of a PDF file's Info Dictionary, as well as some other information (metadata), including the page count. We show how to isolate the page count in a plain text file, and the same method may be used to isolate other metadata fields, such as the Author and PDF Producer. PDFinfo provides a command line interface, making it suitable for use in batch files, programs, and scripts — any place where a command line call can be made.

1. Download the software.

You may have already downloaded and unzipped the Xpdf tools while watching the first video in the Xpdf series, but if you haven't, then visit the Xpdf website. Click the Download link and then click the pre-compiled Windows binary ZIP archive to download the utilities for Windows.

2. Locate the documentation folder for the Xpdf utilities.

Go to the folder where you unzipped the downloaded ZIP file and find the <doc> folder.

3. Read the documentation for the PDFinfo tool.

Go into the <doc> folder and find the plain text file called <pdfinfo.txt>.

Open it with any text editor, such as Notepad, and read it. This is the documentation for the PDFinfo tool.
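Isolating the page count comes down to finding the `Pages:` line in PDFinfo's output. The Python sketch below splits each line at its first colon, so values that contain colons themselves (such as dates) stay intact. It assumes the `Key:   value` layout that pdfinfo prints. The function names and the sample values in the test are illustrative, not taken from the video.

```python
import subprocess
from typing import Dict


def parse_pdfinfo(output: str) -> Dict[str, str]:
    """Split each 'Key:   value' line of pdfinfo output at the first colon."""
    info = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def page_count(pdf: str, exe: str = "pdfinfo") -> int:
    """Run pdfinfo on one file and return its page count."""
    out = subprocess.run([exe, pdf], capture_output=True, text=True, check=True).stdout
    return int(parse_pdfinfo(out)["Pages"])
```

The same dictionary also exposes other fields, such as `info["Author"]` or `info["Producer"]`.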

In this video, we show how to convert an image-only PDF file into a PDF Searchable Image file, that is, a file with both the image (typically from scanning) and text, which is created in an automated fashion with Optical Character Recognition (OCR) software. To do this, we will set up a Watched Folder, such that whenever an image-only PDF file arrives in the Watched Folder, it will automatically be converted to a PDF Searchable Image file. We will achieve this using Power PDF, the newest product from the Document Imaging division of Nuance Communications. There are two editions of Power PDF — Standard and Advanced. The Watched Folder feature is in the Advanced edition only.
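Conceptually, a Watched Folder means "check a folder, and hand every new PDF to a processing step." The Python sketch below shows one polling pass of that idea. It is not how Power PDF implements the feature, and the OCR step is represented by a caller-supplied `handle` function. A real watcher would call it in a loop with a pause, and it would also wait until each file has finished copying. All names are hypothetical.

```python
from pathlib import Path
from typing import Callable, Set


def poll_once(folder: str, seen: Set[str], handle: Callable[[Path], None]) -> int:
    """Call handle() for each PDF not seen before; return how many were new."""
    new = 0
    for pdf in sorted(Path(folder).glob("*.pdf")):
        if pdf.name not in seen:
            seen.add(pdf.name)
            handle(pdf)  # e.g., hand the file to an OCR step
            new += 1
    return new
```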

In this video, we show how to perform Bates Numbering/Stamping of PDF documents using Power PDF Advanced, the newest product from the Document Imaging division of Nuance Communications. There are two editions of Power PDF — Standard and Advanced. The Bates Numbering/Stamping feature is in the Advanced edition only.
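Bates numbering assigns each page a unique, sequential identifier, typically an optional prefix plus a zero-padded number. Numbering continues from one document to the next. The sketch below computes the first and last label for each document in a set. It is a generic illustration rather than Power PDF's own logic, and the prefix, the six-digit width, and the function name are assumptions chosen for the example.

```python
from typing import List, Tuple


def bates_ranges(page_counts: List[int], prefix: str = "", start: int = 1,
                 digits: int = 6) -> List[Tuple[str, str]]:
    """Return (first, last) Bates labels for each document, numbering continuously."""
    ranges, n = [], start
    for pages in page_counts:
        if pages < 1:
            raise ValueError("every document needs at least one page")
        first = f"{prefix}{n:0{digits}d}"
        last = f"{prefix}{n + pages - 1:0{digits}d}"
        ranges.append((first, last))
        n += pages  # the next document starts where this one ended
    return ranges
```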

Is there a way to have the FileName displayed in the header, but in such a way that it EXCLUDES the extension (i.e., the ".pdf")?

I have hundreds of scanned PDFs that I will first batch rename by assigning unique Exhibit numbers. Then I want to use a feature like Power PDF's Header & Footer Tool to display the FileName in the upper right corner, excluding the ".pdf", and the page number in the lower right corner. Below is a picture of what I want, and attached is a PDF of what I have been able to do so far. Any help is most welcome.

18-March-2016 Update:

Joe,

I also reached out to Nuance support and as yet they have not given me any useful feedback.

If it is useful, below is a link to my support ticket thread with Nuance:

I am able to add headers with a FileName and footers with a Page number.

But my problem is that I want the %FileName% header to display the name of the file in such a way that it EXCLUDES the ".PDF" extension. I want the headers to ONLY display "Exhibit 002", "Exhibit 003", "Exhibit 004", etc.

I realize that I could manually paste the file name into the header field, but since I have hundreds of PDFs to which I must assign "Exhibit #" file names, I want to automate the header process with a macro very much like Nuance's %FileName% macro, but with the appropriate code that STRIPS AWAY the ".PDF".

Currently, this can't be done. Two hours after reading your comment, I sent this email to my contacts at Nuance:

----- Begin message to Nuance -----
With Bates Numbering in Power PDF Advanced, inserting the macro for the file name creates the variable %FileName%. That variable contains the file name without the path but with the file extension (i.e., .PDF). Are there variables with other forms of the file name, such as the path, the file name without extension, etc. (the latter is the most important and the one I'm specifically looking for at this time)? If not, please consider the macros below for a future release. Thanks, Joe

%FileName%
File name without its path but with its extension. This is its current definition, so users already using this macro will see no change.

%FileNameNoExt%
The file name without its path, dot, and extension. As mentioned above, this is actually the main reason for this request. I have users who want the Bates stamp to contain the file name, but not the ".pdf". I included the two macros below for the sake of completeness, but right now I'd be happy with just this one new macro. Also, if there's a work-around, I'd love to hear it: can you think of any way to get the file name without the dot and extension onto each page?

%FilePath%
The file path, including drive letter with colon, but without the final backslash, even for root folders. Thus, %FilePath% followed by "\" followed by %FileName% will create the fully qualified file name.

%FileExtension%
The file extension without the dot. Presumably, this will always be PDF, unless Power PDF Advanced (PPA) in the future can do Bates Numbering on other file types.
----- End message to Nuance -----

I'll post back here if I receive a reply from them. Btw, I was unable to access your support ticket. After logging into support and clicking on the link, I received a "Permission Denied" message. Seems that ticket threads may be viewed only by Nuance and the submitter. Regards, Joe
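To make the proposed macro definitions concrete, the sketch below derives all four values from a full Windows path using Python's `ntpath` module, following the rules in the message above (no trailing backslash on the path, even for a root folder; no dot on the extension). The dictionary keys simply mirror the proposed macro names. This is not a Power PDF feature or API, and the function name is hypothetical.

```python
import ntpath
from typing import Dict


def filename_macros(full_path: str) -> Dict[str, str]:
    """Compute the values the proposed macros would hold for one file."""
    folder, name = ntpath.split(full_path)
    stem, ext = ntpath.splitext(name)
    return {
        "FileName": name,                  # current %FileName%: with extension
        "FileNameNoExt": stem,             # e.g., "Exhibit 002"
        "FilePath": folder.rstrip("\\"),   # no trailing backslash, even at root
        "FileExtension": ext.lstrip("."),  # extension without the dot
    }
```

With these rules, `FilePath + "\" + FileName` rebuilds the fully qualified name, as the proposal intends.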

In this third video of the Xpdf series, we discuss and demonstrate the PDFtoText utility, which converts PDF files into plain text files. It does this via a command line interface, making it suitable for use in batch files, programs, and scripts — any place where a command line call can be made.
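When PDFtoText is used inside a custom program, a small wrapper can build the command, run it, and read the resulting text file. The sketch below assumes the usage `pdftotext [options] <PDF-file> [<text-file>]`, along with the `-enc` and `-layout` options as I understand them from pdftotext.txt. When no text file is given, it names the output after the PDF with a `.txt` extension. Both function names are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_pdftotext_cmd(pdf: str, txt: Optional[str] = None,
                        layout: bool = True, exe: str = "pdftotext") -> List[str]:
    """Build a pdftotext call that writes UTF-8 text to txt (or <pdf>.txt)."""
    cmd = [exe, "-enc", "UTF-8"]
    if layout:
        cmd.append("-layout")  # try to keep the physical layout of each page
    out = txt if txt is not None else str(Path(pdf).with_suffix(".txt"))
    return cmd + [pdf, out]


def pdf_to_text(pdf: str, txt: Optional[str] = None, layout: bool = True) -> str:
    """Run pdftotext and return the extracted text."""
    cmd = build_pdftotext_cmd(pdf, txt, layout)
    subprocess.run(cmd, check=True)
    return Path(cmd[-1]).read_text(encoding="utf-8", errors="replace")
```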

1. Download and install the software.

You may have already downloaded and installed the Xpdf tools while watching the first or second video in the Xpdf series, but if you haven't, then visit the Xpdf website at:

In this second video of the Xpdf series, we discuss and demonstrate the PDFimages utility, which, in a single command, is able to extract all the images from a PDF file and save each one in a separate image file (PBM, PPM, or JPG). It does this via a command line interface, making it suitable for use in batch files, programs, and scripts — any place where a command line call can be made.
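From a script, it is convenient to run PDFimages and then collect whatever files it produced. The sketch below assumes the usage `pdfimages [options] <PDF-file> <image-root>`, with `-j` to save JPEG (DCT) images as `.jpg` and `-f`/`-l` to limit the page range, as I read pdfimages.txt. It also assumes the output files are named `<image-root>-<number>.<ext>`. The function names are hypothetical.

```python
from pathlib import Path
from typing import List, Optional


def build_pdfimages_cmd(pdf: str, image_root: str, jpeg: bool = True,
                        first: Optional[int] = None, last: Optional[int] = None,
                        exe: str = "pdfimages") -> List[str]:
    """Build a pdfimages call; nothing is executed here."""
    cmd = [exe]
    if jpeg:
        cmd.append("-j")  # write DCT (JPEG) images as .jpg instead of PPM
    if first is not None:
        cmd += ["-f", str(first)]
    if last is not None:
        cmd += ["-l", str(last)]
    return cmd + [pdf, image_root]


def extracted_files(image_root: str) -> List[Path]:
    """List files named <root>-<number>.<ext> next to the given root."""
    root = Path(image_root)
    return sorted(root.parent.glob(root.name + "-*"))
```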

1. Download and install the software.

You may have already downloaded and installed the Xpdf tools while watching the first video in the Xpdf series, but if you haven't, then visit the Xpdf website at: