As organizations move towards managing more and more digital
documents, they must develop a document management system that
will best suit their needs. However, it is not enough to have a
document imaging system; it must also be maintained.

One of the tasks in the implementation and ongoing management of
a document imaging system is to provide estimates of the amount
of storage the organization will need for the short to medium
term.

For any system, especially a large one, this can seem like a
daunting task with many factors to consider. Where to begin?

Estimating

In the case of a document management system, planners must
determine how much storage will be required for the immediate
future.

This can be accomplished by first determining how much storage
space will be required for the individual documents in the
system.

The best way to do this is to use round estimates of the actual
figures. It is much easier for managers to make decisions based
on easily usable, round figures that can be readily multiplied.

For example, 50 thousand bytes (50 Kilobytes) can be used as the
industry-standard size for a bi-tonal (black and white) scanned
page because it is close to the average size of scanned pages
and also yields an estimate of exactly one million bytes (a
Megabyte) for 20 pages, exactly one billion bytes (a Gigabyte)
for 20 thousand pages, and exactly one trillion bytes (a
Terabyte) for 20 million pages. In business, most documents will
be bi-tonal.
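
The round-figure arithmetic above can be sketched in a few lines
of Python. This is only an illustration under the 50 Kilobyte
per page planning figure from the text; the function names are
hypothetical and the units are decimal (1 Megabyte = one million
bytes), as in the examples above.

```python
# Round planning figure from the text: 50 thousand bytes per bi-tonal page.
BYTES_PER_BITONAL_PAGE = 50_000


def estimate_storage_bytes(pages, bytes_per_page=BYTES_PER_BITONAL_PAGE):
    """Return the estimated storage, in bytes, for a number of scanned pages."""
    return pages * bytes_per_page


def format_decimal_bytes(n):
    """Format a byte count using decimal units (KB = 10**3, MB = 10**6, ...)."""
    units = [("TB", 10**12), ("GB", 10**9), ("MB", 10**6), ("KB", 10**3)]
    for name, size in units:
        if n >= size:
            return f"{n / size:g} {name}"
    return f"{n} bytes"


# The three page counts quoted in the text.
for pages in (20, 20_000, 20_000_000):
    print(pages, "pages:", format_decimal_bytes(estimate_storage_bytes(pages)))
```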

These estimates lean towards over-estimation rather than
under-estimation of storage requirements. With this type of
estimate, it is always safer to be conservative, especially
when all assumptions have been factored in.

It is especially important to use conservative estimates since
adding hardware in the future can be complicated. You may need
to purchase new platters or new servers, or reorganize existing
images. All of this can be very costly.

As well, if everyone uses the same estimates, it is easier to
discuss and compare document imaging systems. Because the
estimates are industry standard, less time can be spent
evaluating estimating methods, and more time can be spent
understanding how the system will be used and whether the system
design will accommodate the planned use.

Born Digital Documents (e.g. Microsoft Word documents)

When documents are imported directly in the form in which they
were created, they require much less storage space.

For example, a one-page Microsoft Word document requires only
about 25 Kilobytes of storage space, as opposed to 50 Kilobytes
for the same page when scanned. The reason is that scanned
images pick up a lot of extra information in the form of what is
called “digital noise”.
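
To see how the mix of scanned and born-digital pages changes an
estimate, the two per-page figures above can be combined as in
the following sketch. The 50 Kilobyte and 25 Kilobyte figures
come from the text; the function name and the page counts are
hypothetical.

```python
# Per-page planning figures quoted in the text (decimal bytes).
SCANNED_PAGE_BYTES = 50_000       # bi-tonal scanned page
BORN_DIGITAL_PAGE_BYTES = 25_000  # page imported in its original format


def mixed_estimate_bytes(scanned_pages, born_digital_pages):
    """Estimate total storage for a collection with both kinds of pages."""
    return (scanned_pages * SCANNED_PAGE_BYTES
            + born_digital_pages * BORN_DIGITAL_PAGE_BYTES)


# Hypothetical collection of 10,000 pages, first all scanned,
# then with half of the pages imported as born-digital files.
all_scanned = mixed_estimate_bytes(10_000, 0)
half_and_half = mixed_estimate_bytes(5_000, 5_000)
print(all_scanned / 10**6, "MB if every page is scanned")
print(half_and_half / 10**6, "MB if half the pages are born digital")
```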

Compression

All of the figures given for scanned images are for compressed
file sizes, unless otherwise noted. All imaging systems compress
their image files for storage and transmission. Compression
removes the redundancy and digital noise from the files, making
the files smaller. These compressed page files have an average
size of approximately 50 thousand bytes per page for bi-tonal
pages.

Different Resolutions

Image files created by scanning at 200, 300, and 400 dpi (dots
per inch) all have the same information content as the original
image.

The only difference is that higher resolutions increase the
redundancy in the image file.

Compression removes this redundancy. In general, higher
resolution scans of an image are slightly larger than lower
resolution scans of the same image because higher resolution
scans pick up more digital noise.

This variation between the compressed image sizes of different
resolutions is within the variation range of document image
sizes in general. In almost all cases, measuring the actual
sizes of the first one percent of scanned images will easily
adjust for this variation without requiring significant system
changes.

Industry Standard vs. Actual Size of Documents

Assuming that your documents are similar to industry-average
documents usually produces only very small variances.

Because the cost of storage is very low as a percentage of
overall system cost, and is constantly decreasing, an error of a
few percent in an estimate has very little effect on the overall
system cost. If round estimates speed up the understanding and
discussion process, the benefit of rounding far outweighs the
cost of the slight variances.

After one percent of the documents have been scanned into a
system, an actual average page image size can be calculated.

This actual average page image size will almost always provide
the small correction necessary to adjust previous estimates.
This is the system sizing method used in almost all system
implementations.
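
The one-percent correction described above could be computed
along the following lines. This is a minimal sketch, not the
procedure of any particular imaging system: the function names
are hypothetical, and the sample measurements are made-up values
standing in for the file sizes and page counts you would read
from your own first batch of scans.

```python
PLANNING_BYTES_PER_PAGE = 50_000  # round planning figure from the text


def average_page_bytes(sample):
    """Average bytes per page for a sample of (file_bytes, page_count) pairs."""
    total_bytes = sum(file_bytes for file_bytes, _ in sample)
    total_pages = sum(page_count for _, page_count in sample)
    if total_pages == 0:
        raise ValueError("sample contains no pages")
    return total_bytes / total_pages


def corrected_estimate_bytes(total_pages, sample):
    """Re-estimate total storage using the measured average page size."""
    return total_pages * average_page_bytes(sample)


# Hypothetical measurements: (file size in bytes, number of pages).
sample = [(52_000, 1), (98_000, 2), (156_000, 3)]

measured = average_page_bytes(sample)
correction = measured / PLANNING_BYTES_PER_PAGE
print("measured average:", measured, "bytes per page")
print("correction factor against the planning figure:", correction)
```

A correction factor close to 1 suggests the round figure was
adequate; a larger departure is a signal to revise the storage
plan while the system is still mostly empty.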

The following table compares the industry-standard figures with
the average sizes calculated from a representative sample of
documents:

| Document type | Advancement Average (scanned) | Industry Average (scanned) | Advancement Average: PDF | Advancement Average: DOC | Advancement Average: XLS | Advancement Average: JPG | Industry Average: PDF | Industry Average: DOC | Industry Average: XLS | Industry Average: JPG |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 Page Standard Typed | 34 KB (at 200 dpi) | 50 KB (at 300 dpi) | 62 KB | 25 KB | n/a | n/a | 50 KB | 25 KB | 25 KB | n/a |
| 2 Page Standard Typed | 69 KB (at 200 dpi) | 100 KB (at 300 dpi) | n/a | 31 KB | n/a | n/a | n/a | 100 KB | 30 KB | n/a |
| Batch Standard (14 pages, bi-tonal) | 2.8 MB (avg. of 200 KB/page at 200 dpi) | 700 KB (avg. of 50 KB/page at 300 dpi) | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a |
| 1 page colour standard | 320 KB (at 200 dpi) | 500 KB (at 150 dpi) | 87 KB | n/a | n/a | n/a | 1 MB | 25 KB | 25 KB | 1 MB |
| 2 page colour standard | 640 KB (at 200 dpi) | 1 MB (at 150 dpi) | n/a | n/a | n/a | n/a | 2 MB | 30 KB | 30 KB | 2 MB |
| 1 page greyscale standard | 262 KB (at 200 dpi) | 500 KB (at 150 dpi) | n/a | 55 KB | n/a | n/a | 50 KB | 25 KB | 25 KB | 50 KB |
| 2 page greyscale standard | 324 KB (at 200 dpi) | 1 MB (at 150 dpi) | n/a | 94 KB | n/a | n/a | 100 KB | 30 KB | 30 KB | 100 KB |
| Receipts standard (PDF) | n/a | 100 KB (at 300 dpi) | 62 KB | n/a | n/a | n/a | 100 KB | n/a | n/a | n/a |

Using these estimates and projecting a future growth rate by estimating
the number and size of future documents facilitates planning for future
storage requirements.
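
One way to turn such a projection into numbers is sketched below.
It assumes, purely for illustration, that the number of pages
added each year grows by a fixed percentage; the function name,
the starting page counts, and the 10 percent growth rate are all
hypothetical, and only the 50 Kilobyte per page figure comes from
the discussion above.

```python
def project_storage_bytes(existing_pages, pages_first_year, annual_growth,
                          years, bytes_per_page=50_000):
    """Return the estimated total storage at the end of each year.

    annual_growth is the assumed yearly increase in pages added,
    written as a fraction (0.10 means 10 percent more pages per year).
    """
    totals = []
    pages = existing_pages
    added = pages_first_year
    for _ in range(years):
        pages += added                      # pages in the system at year end
        totals.append(round(pages) * bytes_per_page)
        added *= 1 + annual_growth          # next year's intake grows
    return totals


# Hypothetical system: 1,000,000 pages today, 200,000 pages added in the
# first year, intake growing by 10 percent a year, projected over 3 years.
for year, total in enumerate(
        project_storage_bytes(1_000_000, 200_000, 0.10, 3), start=1):
    print(f"end of year {year}: {total / 10**9:g} GB")
```

Replacing the default bytes_per_page with an average measured
from your own scanned documents keeps the projection in line
with what the system actually stores.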

You can also apply the same estimation methods to your analysis of
future staffing requirements; the two sets of statistics dovetail
nicely in the planning process.

Thomas Tran has worked at the University of Toronto for 4 years.
During this time, he has assisted in the implementation and
maintenance of the document imaging system in the Division of
University Advancement.