What's The Best File Format For Scanned Documents?

I want to "digitize" some paper records, personal, financial, medical, etc. For the most part the records are standard 8½x11-inch white paper with black print.

My OS is Windows 7 Pro, 64-bit. For occasional scanning I have been using Windows Fax and Scan, which has four output file formats: .bmp, .jpg, .png, and .tif. (Windows Fax and Scan defaults to .jpg for "documents.")

What file format would be best for digitizing records? Would a format other than one of these four be better?

Lesle, "best" is a relative term; it depends on what you want to do with the scanned docs. If all you want is a non-editable picture, I'd go with Portable Document Format (.pdf), as it is IMHO the most likely to still be in use years down the line. Another advantage of PDF is that several programs can convert it into editable documents should the need arise. Of the four you mentioned I'd go with .jpg, for most of the same reasons as for .pdf, and because it has the smallest file sizes of those listed, TTBOMK.

The Following User Says Thank You to RetiredGeek For This Useful Post:

I agree with RetiredGeek, but would add that if you may need to run the scans through an OCR system, use .tif instead of .jpg to get the best and fastest results from most OCR software. The .tif format also supports multiple pages in one file.

Last edited by RussB; 2012-06-07 at 22:12.


The Following User Says Thank You to RussB For This Useful Post:

I've been digitizing my paper records for years, and have found that PDF is by far the most convenient format when it comes time to retrieve something.

With any of the other graphics formats, it's a hassle to just quickly look something up. Each page is typically a separate file, so it's awkward going from "page 1" to "page 2" of a document . . . they always seem to end up with varying resolutions, so when you look at them on screen you have to resize windows or zoom in or out . . . when you want to print something on paper it prints out too tiny, or too large with part of the page cut off . . . and it gradually becomes a nightmare managing a boatload of filenames so they're meaningful.

A PDF readily supports multiple pages, so you can have a single file of, say, "2011 Checking Statements.pdf". I digitize just about all my records, even if I keep some of them on paper. I refer back to them frequently, and it's just more convenient for me to open a PDF when I need it than to rummage through a bunch of papers stuffed in a filing cabinet.

You can digitize important records, put them on a flash drive, and drop it in your safe deposit box. Putting a bunch of paper records in your safe deposit box wouldn't be so easy. If anything should happen to you, your survivors are likely to be more familiar with PDFs and know what to do with them, whereas they might struggle with graphics files.

If lesle is committed to sticking with one of the four graphics formats mentioned, I like TIFF (.tif) best. JPEG (.jpg) is a lossy format, so it is not as suitable if you subsequently want to resize or otherwise manipulate the image. JPEG's best use is for full-color graphics such as photos. PNG is also a good color format, but may not be as broadly supported as the other three by graphics programs, especially freebie or "lite" versions of programs. Its best characteristic is that it supports a "transparent" color, so it is nice for things like web graphics where you might want to superimpose one graphic over another. BMP is a veteran, widely supported format, but in my experience graphics programs don't let you specify a dpi resolution for it the way they do for TIFF.

The TIFF header allows for a dpi spec, so your graphics program may let you change it. I'm not aware of such an option for BMP, PNG, or JPEG. If I save a TIFF image at 100 dpi, it will print on paper larger than the same image saved at 300 dpi. So if I scan at 300 dpi and save at 300 dpi, it will print on paper at the same size as the original. With the other three formats it's not as easy to control the print size.
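
To make the arithmetic concrete, here is a rough sketch in Python. It assumes the printing program simply divides the pixel dimensions by the dpi value stored in the file; the function name is made up for illustration and does not come from any scanning software.

```python
def print_size_inches(width_px, height_px, dpi):
    """Return the (width, height) in inches at which an image prints,
    assuming the printer honors the dpi value saved with the file."""
    return (width_px / dpi, height_px / dpi)


# A letter-size page (8.5 x 11 inches) scanned at 300 dpi
# comes out as 2550 x 3300 pixels.
scan_w = int(8.5 * 300)
scan_h = 11 * 300

# Saved at 300 dpi, it prints at the original size.
print(print_size_inches(scan_w, scan_h, 300))  # (8.5, 11.0)

# The same pixels saved at 100 dpi print three times as large.
print(print_size_inches(scan_w, scan_h, 100))  # (25.5, 33.0)
```

The pixels in the file are identical in both cases; only the stored dpi number changes, which is why getting it right matters more for printing than for viewing on screen.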

The Following User Says Thank You to Paul T For This Useful Post:

It's been a learning curve the last few days. I have an HP C4280 All-in-One. When I migrated from XP to Windows 7, I plugged the USB in, W7 did its thing, and thirty seconds later the printer just worked. I never bothered to install the HP software.

And the Scan in Windows Fax and Scan worked so easily I never thought about using anything else. Since its output is limited to .bmp, .jpg, .png, and .tif, that was my mindset.

Anyhow, I've installed the latest HP software, and a couple of MS xml updates that popped up afterward, and am now scanning to .PDF. The HP software includes I.R.I.S. OCR software, and I'm learning, editing, and saving.