How to Increase the JPEG Quality from your Scanner!

Many photographers still find themselves faced with a need to scan hundreds
of their old photos. While some scanner drivers output bitmaps (.BMP) or JPEGs,
not all utilities provide the ability to select the level of compression /
quality of the saved JPEG images. This article shows you how to increase the
JPEG quality output from your scanner!

Introduction

The task of scanning photos can be a laborious one. Many of us digital
photographers have stacks of old photo albums that we hope to get around
to scanning someday into our digital asset management (DAM) software. But,
just as you must decide what quality to use when ripping CDs to MP3, you
must decide what quality to use when saving the scanned photos.

Choosing a poor image quality means that your collection is only
preserved (through backups) at a relatively low quality. When it comes time
to reprint from your scans, you may wish you had saved with better quality
when you spent the time scanning!

On the other hand, most scanner drivers provide the option to save as
Windows Bitmaps (.bmp). This would be the highest quality, but the
file format does not use any compression. So you end up wasting nearly ten
times as much storage space as is really necessary.

Other Uses for this Technique

I have used the following technique to substantially improve the quality of JPEG output from my scanner. The same technique (combined with my Quantization Table listings) has also been used by others to increase the quality of images from their camera phones!

What output options does your scanner driver provide?

JPEG - For natural photos, JPEG is really the
best file format to use because of its extremely effective compression
techniques. However, with lossy compression comes a slight reduction
in quality compared with uncompressed / lossless techniques. As a photographer,
you'll have to make the tradeoff decision.

BMP / TIFF - Lacking an efficient lossy compression
scheme, these file formats will consume huge amounts of disk space. Therefore,
they are best used when the utmost quality must be extracted from the
scanner. For archiving large collections of snapshots, this is not usually
the best choice.

TWAIN Driver - The scanner utility also generally
offers the ability to import scanner data directly into the image editor
of your choice (such as Photoshop). For individual photo scans, this may
be the best choice (as you can work in Photoshop's 16-bit mode), but
it is largely unsuitable for batch scanning (i.e. scanning large collections).

Now that you've decided to use your built-in scanner toolbox / utility to
save out to the JPEG file format, you need to consider what quality options
are available.

An assortment of scanners

I have gone through five different scanner models from Canon, HP and other manufacturers over the years. I started out with a bulky flatbed, then went on
to a conveniently thin LiDE model.
After discovering some of the color limitations of LiDE models, I chose to
return to the bulkier CCD-based flatbed models. While I preferred the scanner control
panel flexibility provided by some other manufacturers, I prefer the actual scan quality from my current hardware choice.

The most useful feature for Bulk Scanning of Photos

One feature provided by many of these scanner control panels is
called Auto-Crop (or Multi-Crop). It allows you to place multiple photos
on the flatbed's platen, press a button, and the individual photos are
automatically identified and cropped out into their own files. With a standard-sized
scanner, I can get three 4x6 prints cropped reliably, and occasionally a fourth.

The time savings afforded with this feature outweigh the benefits I've
seen with other methods.

What, no JPEG quality settings??

As much as I liked my scanner, I discovered very quickly that the
automated methods provided no option to set the JPEG compression quality level!

This is almost unbelievable, especially on one of the top scanners
available from this manufacturer. I desire the time-savings of the Auto-Crop
automation, combined with reasonable image quality output, while not wasting
significant file space with inefficient file formats.

I contacted Technical Support and was told that there is no way to
control the JPEG compression quality used.

Warning! Warning!

The following article involves hacking your scanner utility software.
This is provided for entertainment purposes only. You will need to read the
license agreement of your scanner software to ensure that modifying it does not violate any
terms. In light of this, I am not providing filenames and file offset values in this tutorial.

That said, this is also quite a complicated process and should only be done by those who are relatively comfortable with the topics covered herein (JPEG compression, quantization tables / quality and hex editors).

How I modified the Scanner's JPEG Compression Quality!

Not happy hearing that there was no way to improve my wonderful
scanner's output, and armed with a reasonable understanding of
JPEG compression, I set out to dig a little deeper.

Easier Method Alert!

With the introduction of JPEGsnoop v1.0.0, you can now automatically locate most DQT tables with the Search Executable for DQT option! For a brief introduction to this option, please see the Interesting Uses page. The steps shown below were done prior to the release of this new time-saving feature.

Step 1 - What amount of compression is used?

Extract the Quantization Tables (listed under the DQT heading) for both Luminance and Chrominance.
Take special note of the Approximate Quality Factor. Is it the same
for both Luminance and Chrominance?

Look at the AnnexRatio section -- are the numbers nearly all the same?

In this case, I see a strong trend suggesting that the scanner driver is
saving JPEG images with a quantization table that is based on a linear
multiple of the JPEG Standard's Annex K values. This is very common
among software tools and even some digital cameras.

If we have confirmed that both the Luminance and Chrominance quantization
tables have a strong correlation with the JPEG Standard suggested tables
(as determined by both: consistent AnnexRatio values and similar Quality
Factor value for both tables), then we can presume that the actual
quantization table used in the software is probably calculated dynamically
(at run-time) from the suggested tables in the JPEG Standard Annex K.

Since the tables are calculated dynamically (as opposed to hard-coded),
it makes my work a fair bit harder.

If it turns out that your tables don't seem to match the Standard tables, then you might have a far easier time with the modification. Instead of trying to reverse engineer the table generation (steps 4 and 5), you can search directly for the output table and modify it accordingly (no use of the formula).
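
The AnnexRatio check in this step can be approximated by hand. The following is only a minimal sketch: it assumes the ratio is simply each table entry expressed as a percentage of the matching Annex K entry (JPEGsnoop's exact calculation may differ), and the function names and the tolerance are my own illustrative choices.

```python
# Annex K luminance table from the JPEG standard, in natural (row-major) order.
ANNEX_K_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def annex_ratios(table, basis=ANNEX_K_LUMA):
    """Return each table entry as a percentage of the matching basis entry."""
    return [100.0 * t / b for t, b in zip(table, basis)]


def looks_scaled(table, basis=ANNEX_K_LUMA, tolerance=10.0):
    """Heuristic (an assumption, not JPEGsnoop's rule): if every ratio lies
    within `tolerance` percentage points of the median ratio, the table is
    probably a linear multiple of the basis table."""
    ratios = sorted(annex_ratios(table, basis))
    median = ratios[len(ratios) // 2]
    return all(abs(r - median) <= tolerance for r in ratios)


# A table scaled to 60% of Annex K (rounded) passes; a flat table does not.
scaled = [(b * 60 + 50) // 100 for b in ANNEX_K_LUMA]
print(looks_scaled(scaled), looks_scaled([16] * 64))
```

Both tables must be in the same order before comparing. The Annex K values above are in natural row order, whereas a table read straight from a JPEG file's DQT segment is in zigzag order.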

Step 2 - Calculate the table in hexadecimal

Later steps will require us to search an executable for a give-away
hexadecimal string. In this step we will calculate the sequence.

Look at the JPEG standard's quantization table, and convert the values
to hex (if you're really stuck, you can use an online hex calculator).
The hex table values may be represented inside the software with either 1, 2 or 4
bytes per number. I decided to start my search with 2-byte values.

Size (bytes)    C Data Type       Example         Total DQT Table Size (bytes)
1               unsigned char     5C              64 bytes
2               unsigned short    5C 00           128 bytes
4               unsigned int      5C 00 00 00     256 bytes

NOTE: The above table assumes a little-endian notation, which is
the most likely arrangement for multi-byte numbers stored on Windows PCs.

The following conversion of the JPEG Standard's [Annex K] luminance table into hex assumes 2-byte values (I might have had to redo the same process with 1- or 4-byte integers instead if my 2-byte search came up empty).
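
As a sketch of how such a conversion could be generated rather than typed by hand, the snippet below packs the Annex K luminance table with Python's `struct` module. It assumes little-endian storage and natural (row-major) order inside the executable; `table_to_hex` is a hypothetical helper name, and a given program may store the table differently (for example in zigzag order).

```python
import struct

# Annex K luminance table from the JPEG standard, in natural (row-major) order.
ANNEX_K_LUMA = [
    16, 11, 10, 16, 24, 40, 51, 61,
    12, 12, 14, 19, 26, 58, 60, 55,
    14, 13, 16, 24, 40, 57, 69, 56,
    14, 17, 22, 29, 51, 87, 80, 62,
    18, 22, 37, 56, 68, 109, 103, 77,
    24, 35, 55, 64, 81, 104, 113, 92,
    49, 64, 78, 87, 103, 121, 120, 101,
    72, 92, 95, 98, 112, 100, 103, 99,
]


def table_to_hex(values, bytes_per_value=2):
    """Pack values as little-endian unsigned integers and return a hex string."""
    code = {1: "B", 2: "H", 4: "I"}[bytes_per_value]
    raw = struct.pack(f"<{len(values)}{code}", *values)
    return raw.hex(" ").upper()


# Print the table one row (8 values) per line.
for row in range(8):
    print(table_to_hex(ANNEX_K_LUMA[row * 8:(row + 1) * 8]))
# First line printed: 10 00 0B 00 0A 00 10 00 18 00 28 00 33 00 3D 00
```

Passing `bytes_per_value=1` or `bytes_per_value=4` gives the other two representations if the 2-byte search comes up empty.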

Now that I have a table to search for, I want to pick out a
representative string from it. For a variety of reasons (mainly to reduce susceptibility to small differences in tables), I decided to select a small range from the chrominance table. I picked a section at the start of the constant run of "63 00 63 00" values.

Sequence selected: 42 00 63 00 63 00 63 00

Step 3 - Look for the hardcoded quantization table

Open your favorite hex editor

Locate your scanner tool software (it should be evident from either watching
the Windows Task Manager, or an advanced tool such as FileMon). In my case
I found the .exe file within: C:\Program Files\<Manufacturer>\<Scanner Utility>\<Utility>.exe

Search for the representative hex string

If your search comes up empty, try these other searches:

Search for the 1-byte representation

Search for the 4-byte representation

Search for a different part of the standard table

Assuming that the program is using linear multiples of the JPEG standard
tables (herein called the basis tables), there is a high probability that the table will be stored within
the program somewhere. So it is very likely that you'll be able to find
it with this mechanism. If all of the above fail, there are other
techniques, but they are beyond the scope of this article.

In my case, I found the sequence quite easily with 2-byte unsigned shorts (the representation shown above).

Now, we need to work backwards to locate the start of the table(s). By examining the bytes near where the search result found a match, work backwards to find the file offsets of the start of the Luminance table and the start of the Chrominance table.
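
The search and the backwards step can also be scripted. This is a rough sketch only: it assumes the chrominance table is stored as 2-byte little-endian values in natural row order, and it runs against a made-up byte string rather than a real executable. `find_all` and `chroma_table_start` are hypothetical helper names.

```python
import struct

# Annex K chrominance table from the JPEG standard, natural (row-major) order.
ANNEX_K_CHROMA = [
    17, 18, 24, 47, 99, 99, 99, 99,
    18, 21, 26, 66, 99, 99, 99, 99,
    24, 26, 56, 99, 99, 99, 99, 99,
    47, 66, 99, 99, 99, 99, 99, 99,
] + [99] * 32

SIGNATURE = bytes.fromhex("42 00 63 00 63 00 63 00")
PACKED_CHROMA = struct.pack("<64H", *ANNEX_K_CHROMA)
# Byte position of the first signature match inside the packed table.
SIGNATURE_POS_IN_TABLE = PACKED_CHROMA.find(SIGNATURE)


def find_all(data, pattern):
    """Return every offset at which pattern occurs in data."""
    hits, start = [], 0
    while True:
        i = data.find(pattern, start)
        if i < 0:
            return hits
        hits.append(i)
        start = i + 1


def chroma_table_start(first_hit):
    """Work backwards from the first signature hit to the table's first byte."""
    return first_hit - SIGNATURE_POS_IN_TABLE


# Stand-in for the executable's contents: padding, the table, more padding.
fake_exe = b"\x90" * 100 + PACKED_CHROMA + b"\x90" * 20
hits = find_all(fake_exe, SIGNATURE)
print(hits, chroma_table_start(hits[0]))
```

The signature occurs twice within the chrominance table itself (rows 2 and 4 both contain 66 followed by 99s), so two hits a fixed distance apart are a good sign rather than a problem. On a real file it is still worth confirming in the hex editor that the bytes at the computed start match the first row of the table.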

Step 4 - Finding the Quality Factor

Now that we know that the output is based on the JPEG standard tables (in my case,
at an Approximate Quality Factor of 70), I have two choices:

Approach 1 - Search and Modify for the Quality Factor variable

Approach 2 - Modify the Basis Table

While it is possible to modify the quality factor variable (and change it from,
say, 70 to 95), the method to locate this variable is complicated and out of the
scope of this discussion.

Therefore, I'll choose to modify the basis tables instead.

Step 5 - Reverse Engineering the Table

If the program is using the standard tables (the basis tables) and then calculating a
new quantization table dynamically (i.e. from quality factor 70), then the process becomes a little more complicated.

We know that the program is internally using a formula to convert the basis DQT table to the output DQT tables. A formula that is very commonly used in the industry is the following (popularized by cjpeg and other utilities):
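
For reference, here is a sketch of the quality scaling as it is implemented in the Independent JPEG Group's library (which cjpeg is built on), to the best of my understanding. Whether a particular scanner utility uses exactly this arithmetic is an assumption to verify against its real output. `scale_factor` and `scale_table` are illustrative names.

```python
def scale_factor(quality):
    """Convert a 1..100 quality setting into a percentage scaling factor."""
    quality = max(1, min(100, quality))
    if quality < 50:
        return 5000 // quality
    return 200 - 2 * quality


def scale_table(basis, quality):
    """Scale a basis table to the output table, clamping each entry to 1..255."""
    s = scale_factor(quality)
    return [max(1, min(255, (b * s + 50) // 100)) for b in basis]


# Quality 50 leaves the basis table unchanged; quality 70 scales it to 60%.
print(scale_factor(50), scale_factor(70), scale_factor(97))
print(scale_table([16, 11, 99], 70))
```

At quality 70 the factor is 60, so a basis value of 16 becomes (16 * 60 + 50) / 100 = 10 after integer division.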

If I want the Scanner Utility to produce an image that uses a JPEG
compression Quality Factor of 97 (similar to decent current-day digital SLRs),
I calculate a DQT Basis table assuming a DQT Output table that represents
a Quality Factor of 97.

Taking my Canon 10D as an example of a high quality factor (~97), again
using JPEGsnoop, I extract the tables as:

Now, I must use the above formula in reverse to calculate what the new basis
DQT tables should be to get the desired output DQT tables. Passing
each value through the formula, and then converting to the 1, 2 or 4-byte
representation (as determined earlier), I get the values below. Note that most of the time you can simply round down any fractional result you get:
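
The reverse calculation can be sketched as follows, assuming the utility applies the common IJG-style scaling at a fixed quality of 70. The desired values in the example are purely illustrative placeholders, not the Canon 10D tables, and `basis_for` is a hypothetical helper name.

```python
def scale_factor(quality):
    """IJG-style percentage scaling factor for a 1..100 quality setting."""
    return 5000 // quality if quality < 50 else 200 - 2 * quality


def forward(basis_value, quality):
    """What the utility would output for one basis entry (assumed formula)."""
    return max(1, min(255, (basis_value * scale_factor(quality) + 50) // 100))


def basis_for(desired, quality=70):
    """Find a basis entry that the utility will scale to the desired value."""
    s = scale_factor(quality)
    if s <= 0:
        raise ValueError("cannot invert the scaling at quality 100")
    guess = max(1, desired * 100 // s)  # round down, as suggested above
    candidates = [c for c in (guess - 1, guess, guess + 1) if c >= 1]
    # Prefer the candidate whose forward result is closest to the target,
    # and prefer the rounded-down guess itself when there is a tie.
    return min(candidates,
               key=lambda c: (abs(forward(c, quality) - desired), c != guess))


# Illustrative desired output values (NOT taken from any real camera).
desired_row = [1, 1, 1, 2, 3, 3, 4, 5]
new_basis_row = [basis_for(d) for d in desired_row]
print(new_basis_row)
print([forward(b, 70) for b in new_basis_row])  # should reproduce desired_row
```

Large desired values can produce basis entries above 255, which would not fit if the executable stores the table as 1-byte values. For the small entries typical of a high-quality table this does not come up.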

Instead of calculating this all out for yourself, you are welcome
to skip ahead and simply use the above values, as they will provide very high-quality
output from your scanner utility. There is no need to match these
quantization tables exactly -- a rough approximation to these will likely
be far better than the built-in values provided with your utility.

Step 6 - Modify the Executable!

Now we can make some actual modifications and cross our fingers...

MAKE A BACKUP COPY OF YOUR EXECUTABLE FIRST!!!

With your hex editor utility, select the range of bytes that you
found earlier that contained the Luminance DQT table. Paste in the series
of hexadecimal bytes that you just calculated above. Make sure that you
are overwriting the exact same length as the original, and not inserting
bytes into the file! (otherwise your executable will fail upon launch).
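
If you would rather script the overwrite than paste into a hex editor, the sketch below shows the overwrite-not-insert idea on an in-memory copy of the file. It is an illustration only: `patch_table` is a hypothetical helper, the offset and table bytes are whatever you found in the earlier steps, and reading and writing the actual file (after making a backup) is left to you.

```python
def patch_table(image, offset, new_table, expected_old=None):
    """Return a copy of image with new_table overwriting bytes at offset.

    The result always has the same length as the input. If expected_old is
    given, the bytes being replaced must match it, which guards against
    patching the wrong offset.
    """
    end = offset + len(new_table)
    if offset < 0 or end > len(image):
        raise ValueError("patch would run outside the file")
    if expected_old is not None and image[offset:end] != expected_old:
        raise ValueError("bytes at offset do not match the expected table")
    patched = image[:offset] + new_table + image[end:]
    assert len(patched) == len(image)  # overwrite, never insert
    return patched


# Demonstration on stand-in data rather than a real executable.
original = bytes(range(16))
patched = patch_table(original, 4, b"\xAA\xBB", expected_old=b"\x04\x05")
print(patched.hex(" "))
```

Passing the original table bytes as `expected_old` is a cheap way to confirm the offset before anything is changed.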

The snapshots below show the before and after view of the luminance table modifications.

Before Modification

After Modification

Now, repeat the same overwrite for the chrominance table (which may follow
immediately after the luminance table). Save the executable file.

Step 7 - Test it Out!

Open your scanner utility

Save a scan as a JPEG

Open the JPEG in JPEGsnoop and check the quantization table section

If you did everything correctly, you should see that your scanner is
now saving with much higher JPEG image quality than before! Congratulations!
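
JPEGsnoop is the easiest way to check the result, but the quantization tables can also be read directly from the file. The following is a simplified sketch of a DQT reader: it walks the header segments up to the start of scan and assumes a well-formed file. `read_dqt_tables` is a hypothetical name, and the demonstration uses a tiny synthetic header rather than a real scan.

```python
def read_dqt_tables(jpeg):
    """Return {table_id: [64 values]} from a JPEG's DQT segments.

    Values are returned in the order stored in the file (zigzag order).
    """
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    tables, pos = {}, 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError("expected a marker at offset %d" % pos)
        marker = jpeg[pos + 1]
        if marker == 0xFF:          # fill byte before a marker
            pos += 1
            continue
        length = int.from_bytes(jpeg[pos + 2:pos + 4], "big")
        if marker == 0xDA:          # start of scan: headers are finished
            break
        if marker == 0xDB:          # DQT segment, may hold several tables
            seg = jpeg[pos + 4:pos + 2 + length]
            i = 0
            while i < len(seg):
                precision, table_id = seg[i] >> 4, seg[i] & 0x0F
                size = 2 if precision else 1
                i += 1
                if i + 64 * size > len(seg):
                    raise ValueError("truncated DQT segment")
                tables[table_id] = [
                    int.from_bytes(seg[i + k * size:i + (k + 1) * size], "big")
                    for k in range(64)
                ]
                i += 64 * size
        pos += 2 + length
    return tables


# Synthetic header: SOI, one 8-bit table (id 0), then a start-of-scan marker.
demo = (b"\xff\xd8" + b"\xff\xdb" + (67).to_bytes(2, "big") + b"\x00"
        + bytes([2] * 64) + b"\xff\xda\x00\x02")
print(read_dqt_tables(demo)[0][:8])
```

Table 0 is normally luminance and table 1 chrominance. Because the values come back in zigzag order, reorder them before comparing with tables written out in natural row order.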

If you succeeded in using this method with your scanner or other program,
please share your results below!

Reader's Comments:

2012-11-19

Bouke

I did hear/read some statements regarding the relative sensitivity of JPEGs to becoming corrupt over time. A few photographers I spoke to have told me this is the main reason why they store their data in RAW (the photos of my wedding are in .NEF format, for instance). I actually have noticed that some of my very old JPEGs did in fact become corrupt after sitting on the hard disk for multiple years. Supposedly, shifting a single bit in a JPEG, for whatever reason, causes the whole file to become corrupt?
Is there much truth to this?

As the11thplague also does, I also archive my scans in PNG or TIFF with LZW compression. Only when I badly need the space, I just convert them into (near)max quality jpegs. Depending on the purpose the native formats of the editor (paint.net's in my case) can also be very beneficial. I do not know enough about all the possible RAW formats however to actually use them to my advantage.

Interesting question... The short answer is that, yes, it is safer to store your images in RAW (e.g. DNG) format than JPEG.

Although JPEG files themselves are not more likely to get corrupted than any other file, the impact of a corrupted bit can be much greater than in other file types. This is due to the fact that highly efficient compression mechanisms eliminate the redundancy that one needs to recover from errors. Other file formats (especially uncompressed ones) would certainly be "safer" for image storage on a medium that is susceptible to occasional errors/deterioration. A single bit error can cause significant distortion and/or color artifacts in an image. If the bit error occurs in certain header bytes, then it can make the entire file unreadable by most decoders (though that is often the case with other file formats too).

2009-04-02

the11thplague

And what about the .png format? It is a lossless format, compressed like a zip. It's far smaller than .bmp, but it is just as good. Also, it has the transparency option which neither .jpg nor .bmp have. Plus, all new scanners provide the .png option, and our hard drives are so huge and cheap that there is no problem about space!

2007-07-29

mark cox

This is an interesting article. With your technique, there is a low chance of failure, because false positives are unlikely when searching for a whole table. I would like to try this on the firmware for my all-in-one, or digital camera.

For sure. The tables that are used as the basis for dynamic DQT generation are not always identical to the ones printed in the JPEG standard. So, it may take a little more effort to locate a match in the table (that's why I started with the chrominance table, which is easier to identify). If, instead, the full table is hardcoded (i.e. no dynamic generation), then it becomes far easier.

As for the firmware modification, this is not always easy or even possible. Two big issues I think you may encounter:

Many firmware loads use instruction code compression and unpack it on the device.

Because errors in firmware loads may cause a device to no longer function, there is a high probability that the load is protected by an MD5 or other checksum, that will prevent it from being installed.

On most decent digital cameras, the JPEG compression is likely done in a hardware accelerator ASIC and hence you will not have access to the tables. Software-based encoders are only likely to be found on very low performance digicams and multi-purpose devices (such as your all-in-one).