Chapter 6: Adding Images to Your Site

After a chapter with sections urging publishers to eschew graphics, why
a chapter urging that publishers gear up to distribute thousands of
photographs? Independence Day.

The
book by Richard Ford, about a long uneventful
weekend in the life of Frank Bascombe, a divorced real estate salesman
in Haddam, New Jersey, won the Pulitzer Prize for Literature:

"Unmarried men in their forties, if we don't subside entirely into the
landscape, often lose important credibility and can even attract
unwholesome attention in a small, conservative community. And in
Haddam, in my new circumstances, I felt I was perhaps becoming the
personage I least wanted to be and, in the years since my divorce, had
feared being: the suspicious bachelor, the man whose life has no
mystery, the graying, slightly jowly, slightly too tanned and trim
middle-ager, driving around town in a cheesy '58 Chevy ragtop polished
to a squeak, always alone on balmy summer nights, wearing a faded yellow
polo shirt and green suntans, elbow over the window top, listening to
progressive jazz, while smiling and pretending to have everything under
control, when in fact there was nothing to control."

The script of Independence
Day, the
movie, in which aliens blow up the White House, wasn't
quite as well-crafted. Yet more people went to see the
movie than read the book.

People like pictures.

People like movies even better than pictures but, for most publishers,
they are unrealistically expensive to produce and, for most users even
in 2003, they are unrealistically slow to download.

This chapter documents a method for building an
image library and presenting it to each Web user in the best way for
that particular person.

Images on the Web Can Look Better than on a Magazine Page

Computer monitors don't have the resolution of photographic prints but
they can display a wide range of tones. If you take care with your
images, you may be able to present the user with the jewel-like
experience of viewing a photographic slide.

High-quality black-and-white photographic prints have a contrast range
of about 100 to 1. That is, the most reflective portion of a print
(whitest highlight) reflects 100 times more light than the darkest
portion (blackest shadow). The ideal surface for reflecting a lot of
light would be extremely glossy and smooth. The ideal surface for
reflecting very little light would be something like felt.

It is not currently possible to make one sheet of paper that has
amazingly reflective areas and amazingly absorptive areas. Magazine
pages are generally much worse than photographic paper in this
regard. Highlights are not very white and shadows are not very black,
resulting in a contrast ratio of about 20 to 1.

Slides have several times the contrast range of a print. That's because
they work by blocking the transmission of light. If the slide is very
clear, the light source comes through almost undimmed. In portions of
the slide that are nearly black, the light source is
obscured. Photographers always suffer heartbreak when they print their
slides onto paper because it seems that so much shadow and highlight
detail is lost. Ansel Adams devoted his whole career to refining the
Zone System, a careful way of mapping the brightness range in a
natural scene (as much as 10,000 to 1) into the 100 to 1 brightness range
available in photographic paper.

Computer monitors are closer to slides than prints in their ability to
represent contrast. Shadows correspond to portions of the monitor where
the phosphors are turned off. Consequently, any light that comes from
these areas is just room light reflected off the glass face of the
monitor. You would think that the contrast ratio from a computer monitor
would be 256 to 1 since display cards present 256 levels of intensity to
the software. However, monitors don't respond linearly to increased
voltage. Contrast ratios between 100 to 1 and 170 to 1 are therefore
more typical.

Most Web publishers never take advantage of the monitor's ability to
represent a wide range of tones. Rather than starting with the original
slide or negative, they slap a photo down on a flatbed scanner. A
photographic negative can only hold a tiny portion of the tonal range
present in the original scene. A proof print from that negative has even
less tonal range. So it is unsurprising that the digital image resulting
from this scanned print is flat and uninspiring.

Start by Thinking about Building an Image Library

Rather than throwing together digital images on a per-project basis, a
publisher is best off thinking in terms of building a digital image
library. A successful image library has the following properties:

You obtain high-quality digital representations of each image.

Once scanned, you never have to go back to the original negative or slide.

You can quickly find an image.

Obtaining a high-quality scan isn't so difficult. You must go back to
whatever was in the camera, that is, the original negative or slide. You
should not scan from a dupe slide. You should not scan from a print. The
scan should be made in a dust-free room or you will spend the rest of
your life retouching in PhotoShop. Though resolution is the most often
mentioned scanner specification, it is not nearly as important as the
maximum density that can be scanned. You want a scanner that can see
deep into shadows, especially with slides.

Never going back to the original slide or negative again means never. If
a print magazine requests a high-resolution image to promote your site,
you should be able to deliver a 2,000x3,000 pixel scan without having to
dig up the slide again. If there is a fire at your facility and all of
your slides burn up, you should have a digital backup stored elsewhere
that is sufficient for all your future needs.

Quick retrieval means that if a user says, "I like the picture of
the dog holding the rope in SQL for Web Nerds (http://philip.greenspun.com/sql/),"
it is easy to find the high-resolution digital file.

Using Kodak PhotoCD to Manage Your Image Library

Kodak PhotoCD is a scanning standard that kills three birds with one,
er, CD. Every image on a CD is available in five resolutions, the
highest of which is 2,000x3,000. Kodak will tell you that you can make
photographic-quality prints up to 20x30 inches in size from the
scans. Scans are made from original negatives or slides in
dust-controlled environments inside photo labs. The newest generation
(1996) of PhotoCD scanners can read reasonably deeply into slide
film shadows. If you want more shadow detail and an additional
4,000x6,000 scan, you can get a Pro PhotoCD made.

The full PhotoCD scans are in a proprietary format that browsers don't
understand but tools such as Adobe PhotoShop do. Each .pcd file is
about 5 MB in size and is therefore reasonably quick for a print
publisher to download.

Quickly locating an image on PhotoCD is facilitated by Kodak's provision
of index prints. You can work through a few hundred images per minute by
looking over the index prints attached to each CD-ROM.

What does all of this cost? About $1 to $3 for a standard scan from a
35mm original; about $15 for a Pro scan from a medium format or 4x5-inch
original. When you consider that you are getting 500MB of media with
each PhotoCD, the scanning per se is free (in other words, it is
cheaper to have a PhotoCD made than to buy a hard disk capable of
storing that much data).

Delivering your library to the Web

Users are not going to thank you if you present a 5MB .pcd Image Pac
file straight from the PhotoCD. For starters, it would take 12 minutes
to download with a 56K modem. And then when they were all done, they
wouldn't be able to view it because Web browsers don't understand Kodak
PhotoCD format. There are two ways to reduce the size, and therefore
downloading time, of digital image files. The first is to cut down the
resolution; rather than offer 2000x3000 pixels (many more pixels
than are displayable on current monitors) offer the 256x384 pixel size
from the PhotoCD. Each pixel requires three bytes (one for red, one for
green, and one for blue) and hence the file size reduction is from 18 MB
down to 294 KB. The JPEG image compression
system throws out information in a photographic picture that, in theory,
your eye won't miss. Typical scenes can be compressed 10 to 1 before
visually objectionable artifacts appear. That results in a file
size of 29 KB, downloadable by a 28.8 modem user in about 10 seconds.

Suppose that a publisher has a high-resolution digital image in a large
uncompressed file. How to make this image available to Web users?

What most commercial publishers do is manually produce a single JPEG
that will fit nicely on the same page as the text, on a small monitor.
This is usually about 180x250 pixels in size. The user gets a little
graphical relief from plain text but can't extract much information from
the photo unless it is a very simple image.

At right: an example of an unlinked, and hence frustrating, thumbnail.
It is a ruin in Canyon de Chelly and users who want to see
construction details of this Anasazi dwelling won't get any help from
this image.

A slightly more sophisticated approach is to produce two JPEGs. Place
the smaller one in-line on the page and make it a hyperlink to the
larger JPEG. This helps the curious user but note that there is no
effective way to caption the image and the user only gets to choose
between two sizes unless you uglify your pages by adding a second
hyperlink to a "huge" JPEG.

At right: a thumbnail linked to a larger, static JPEG. Note that
there is no convenient way to associate a caption with either image and
certainly the user won't get any context once he has requested the
larger JPEG. Note: this is, of course, the co-author Alex at the age
of three months, from http://philip.greenspun.com/dogs/alex

Think about the user interface of the "small linked to large" system
from the user's perspective. The publisher shows you a small picture.
If you want a bigger view, you click on it. This is fundamentally the
correct user interface, but it is a shame that clicking on an image
doesn't also result in the display of a caption, of technical details
behind the photograph, of options for yet larger sizes.

What readers really want is for a thumbnail to be a link to an HTML page
showing the caption, technical details, a large JPEG, and links to
still-larger JPEGs or perhaps print-quality versions. This will involve
a lot of work unless the publisher has automated production from initial
high-res scan through to the Web directories (more on that below), but
the users get the best possible interface... or do they?

What should a server infer from the user's action in clicking on a
thumbnail? That he wants to see a 512x768 pixel JPEG? What if he has a
really big monitor? A really small monitor? A really slow Internet
connection? A really fast Internet connection? What if he has
previously told me what he wants?

Most of the thumbnails on my site are now linked to computer programs.
Since I run AOLserver, these are written in the AOLserver Tcl API but
they could just as easily be Perl CGI scripts or Microsoft Active Server
Pages. The programs first check the Magic Cookie headers on requests
for larger images. If the user has previously said "I'm a nerd with a
1600x1200 pixel monitor and a fast connection, please always give me
1000x1500 pixel JPEGs by default" then the program sees the cookie and
generates an HTML page with a large JPEG on it, plus the caption and
tech details (if available). If there is no cookie, the program has a
reasonable default (in 2003, it is to serve an HTML page with a 512x768
pixel JPEG, captions, and links) plus a "click here to personalize this
site" option. Try it out with the image at right.

Creating JPEGs from PhotoCD Image Pacs

Before plunging into a description of scripts that convert 100 images at
a time, let's talk about what steps are necessary to convert an
individual image.

Using PhotoShop

Though it is time-consuming, the highest quality way to produce a JPEG
from a PhotoCD Image Pac is with Adobe PhotoShop:

Open the image pac file on the PhotoCD
(typically something like G:\Photo_cd\Images\Img0001.pcd )
from the File menu in Adobe PhotoShop ("Open" command in older versions,
"Import" in newer versions of PhotoShop)

Remove the "PhotoCD haze." Start with the Levels command
to see a histogram of the intensities in the image. Note that there is
probably nothing at either the full white or full black ends of the
spectrum. Try pushing the Auto Levels button. If the results are too
radical, undo it and manually slide the white point a bit to the left
and the black point a bit to the right.

Straight
off the PhotoCD. Note the warm purple sunset colors. What a
dedicated photographer I must have been to stay out with a tripod
until after sunset and then hike back to the car in the dark.

After Auto Levels: a snapshot taken at midday!
This is how I remember the scene and how the picture looks when printed
by a pro lab.

Edit and crop as desired. A WACOM drawing tablet makes it much
easier to do fine touch-up work or realize artistic ambitions with
PhotoShop.

If you want to add a black border, change the background color to
black and use the Canvas Size command to increase the canvas to 102
percent in width and height.

When you are satisfied with your image, save it as a PhotoShop
native format file using the Save As command.

Use the Unsharp Mask filter once with the following settings:

Image

base/16

base/4

base

base*4

base*16

Resolution

128x192

256x384

512x768

1024x1536

2kx3k

Amount

100

100

100

100

100

Radius

0.25

0.5

1.0

2.0

4.0

Threshold

2

2

2

2

2

Be prepared to undo and change the settings if the result isn't what you
want. Unsharp Mask finds edges between areas of color and increases the
contrast along the edge. The Amount setting controls the strength of the
filter. I usually stick to numbers between 60 percent and 140
percent. Radius controls how many pixels in from the color edge are
affected (so you need higher numbers for bigger images). Threshold
controls how different the colors on opposing sides of an edge have to
be before the filter goes to work. You need a higher threshold for noisy
images or ones with subtle color shifts that you want left unsharpened.

Use the Save a Copy command to write your image out as JPEG, medium
quality. I usually pick a descriptive filename, add the image number
from the PhotoCD, add a resolution number (1 for base/16 through 5 for
base*16) and then explicitly add a ".jpg" because I know that
I'll be FTP'ing this file to a Unix server, e.g.,
"hairy-dog-37.4.jpg" for a 1024x1536 pixel photo that came
from IMG0037.pcd off the disk.

Use the Revert command to reload the unsharpened PhotoShop format
file from disk.

Use the Image Size command to shrink the picture to a standard
PhotoCD size, then use Unsharp Mask again.

Save the small image as JPEG, low or medium quality. I use the same
filename as in Step 8, but with a different resolution number, e.g.,
"hairy-dog-37.2.jpg" if the image was resized to 256x384.

One of the rationales behind this approach is that you never sharpen
twice and that sharpening is always the very last image processing
step. That's why an intermediate PhotoShop native format file is
necessary.

Batch processing

Instead of working with one image at a time, it is worth thinking about
using a program that will take hundreds of images and, for each one,
produce four sizes of JPEG and a Web server script that will generate
HTML pages on-the-fly with captions. PhotoShop has some scripting
capabilities but they aren't quite powerful enough to do this. Instead
you must embark on a little programming odyssey. First, write a
flat-file for each PhotoCD containing captions for each image on the
disk. From this file, it is easy to write a program to automatically
produce all the necessary Web content to implement the "user choice"
scheme described above.

Here's the format for the description file:

line 1: spec for which resolutions to convert
line 2: on which resolution images to write copyright notice | notice
line 3: apply sharpening? (scanning blurs images)
line 4: add black borders (nice for negatives)
line 5: URL stub (where on the server will this image reside?)
lines 6-N: one line/image, formatted as the example below
<number on disk>|<target file name>|<which_resolutions>|<caption>|<tech details>|<alt>|<tutorial info>

The first line says "By default, for each image, convert the first, third,
and fourth resolutions from the PhotoCD file." The second line says, "On
the third, fourth, fifth, and sixth resolutions, add 'copyright 1993
philg@mit.edu' to the bottom right corner of the JPEG." The next few
lines specify that all images from
this disk will be sharpened and surrouded by black borders. The files
will reside in the /images/pcd1765/ directory from my server
root.

The first image marked for conversion is actually number 2 on the disk
(at right). The JPEG file is specified to contain the word "bearfight".
The next field is empty ("||") because this image should be converted to
the default resolutions (1,3, and 4). The caption comes next and,
finally, some technical information behind the photo (most publishers
wouldn't care about this but remember that photo.net is visited by a lot
of photographers who are improving their craft).

What does a Web page author get for all of this? Most valuably, /images/pcd1765/index-fpx.
Here's a fragment of the HTML from that undifferentiated list of all
the photos on the disk:

Note that this linked thumbnail can be cut and pasted into a document
anywhere else on the same Web server because the IMG and HREF are both
to absolute URLs (beginning with a forward slash; to use this chunk of
HTML on another server you'd have to add the "http://www.photo.net/"
hostname onto both the HREF= and SRC= values). In fact, if you've
organized your PhotoCD indices in the file system you need not cut and
paste. Twenty lines of Lisp code will program a good text editor (Emacs) with a new command so that one can
just type "1765" and then "2" and the editor finds the reference and
inserts in into a document. Once the captions are written, it takes me
only about an hour to add 100 images from a single PhotoCD to HTML
documents dispersed around the server.

There are a couple of things to note about the .tcl script to which the
thumbnail points. First, it runs inside the AOLserver process rather
than as a separately forked CGI script. This is at least a factor of 10
more efficient than starting up a Perl script or whatever. Since the
script executes only when a user clicks on a thumbnail, the efficiency
probably doesn't matter unless you have an unusually popular site on an
unusually wimpy server. Still, it is nice to conserve a server's power
for image processing and running a relational database management
system.

The second point is about software engineering and is hard to illustrate
without showing the source code for the Tcl script:

Yes, the whole thing is basically just a single procedure call to
philg_img_target, defined server-wide. Suppose Web
standards evolve so that there is a way for the server to tell a client
to what color space standards an image was calibrated. By changing this
single procedure, one could make sure that all of the thousands of
PCD-derived images on photo.net are sent out with a "I was scanned in
the Kodak PhotoCD color space" tag. The photos will then be rendered on
users' monitors with the correct tones (right now they come out way too
dark on Linux, a bit dark on Windows, and a bit too light on the average
Macintosh). All of the personalization code that sets and reads magic
cookies is also encapsulated in philg_img_target (the
source code for which is available at http://philip.greenspun.com/panda/philg_img_target.txt).

How does this all happen? The image conversions from PhotoCD to JPEG
are done with ImageMagick. You can find precompiled binaries for most
Unix variants and also Windows 95/NT at http://www.imagemagick.org. Here's
what a typical ImageMagick command (to make a thumbnail
JPEG) looks like:

Note that this is all a single shell command. There are a bunch of
optional image processing flags before the "from-file name."
To avoid the production of progressive JPEGs (see below), the default
from ImageMagick, the command invokes "-interlace NONE". Any
kind of scanning introduces some fuzziness into the image so it is best
to ask ImageMagick to sharpen a bit with "-sharpen 50". For a
two-pixel border all around, there is "border 2x2". One could
choose the color but the default will be black. Finally, ImageMagick
lets you add a comment to the JPEG file. Anyone who opens the JPEG with a
standard text editor like Emacs will be able to see that it is
"copyright Philip Greenspun" (conceivably other programs might
display this information, but PhotoShop doesn't seem to, on either the
Macintosh or Windows). See http://www.imagemagick.org/www/convert.html
for a complete list of options.

The from-file name, "/cdrom/PHOTO_CD/IMAGES/IMG0013.PCD;1[1]"
is mostly dictated by the directory structure on a PhotoCD. However, the
final "[1]" tells ImageMagick to grab the 128x192 pixel
thumbnail out of the Image Pac ("Base/16" size in Kodak
parlance). The to-file name, "bear-salmon-13.1.jpg" includes
the image number (13) from the original PhotoCD to make it easier
for a human to find the entire Image Pac if requested.

ImageMagick is convenient but typing 300 commands like the above for
each PhotoCD would get tedious. An girlfriend of the authors' took pity
and used what she had learned as an MIT undergraduate to write a Perl
script to automate the process: http://philip.greenspun.com/panda/pcd-to-jpg-and-fpx.txt.

Of course, no romance is one sunny
Perl-filled day after another. When she delivered the code, I
complained about the lack of data abstraction and told her that she
needed to reread Structure
and Interpretation of Computer Programs (Abelson and Sussman;
MIT Press 1997), the textbook for freshman computer science at MIT. I
also asked her to rename helper procedures that returned Boolean values
with "_p" ("predicate") suffixes. She replied "People will laugh at you
for being an old Lisp programmer clinging pathetically to the
1960s" and then dumped me.

Thus with a minimum of mechanical labor, we managed to get thousands of
images on-line, in three sizes of JPEG. What was our reward? Hundreds
of e-mail messages from people asking where they could find a picture of
a Golden Retriever or a waterfall. Once again MIT friends and Perl came
to the rescue. Jin Choi spent 15 minutes writing a Perl script to grab
up all the captions from the flat files into one big list. We then
stuffed the list into a relational database table:

"70-200"
(the images noted as taken with a Canon 70-200/2.8L zoom lens)

Is this a great system or what? Let's consider what happens if there is
a mistake in spelling a word in a filename. To correct the mistake, one
must change the description.text file. Then one must
either rerun the conversion script over the entire PhotoCD or manually
change the filenames of three JPEGs (plus all the HTML documents that
reference the thumbnail). Then one must rename the .tcl script that
corresponds to the image and also update a row in the relational
database table that sits behind the stock photo index system.

Perhaps it would be better to keep the image information in just one
place: the relational database (RDBMS). Then we can drive the image
conversion and .tcl file production from there. If we were willing to
make the serving of plain old images dependent on the RDBMS being up and
running, then we could just have HTML files reference photos according
to their ID in the database and dispense with the static .tcl files.

Once we're running this as a database service, wouldn't it make more
sense to let other people use it? If many photographers are using the
same RDBMS server, then a fifth grader doing a report on Paris could search
for "Eiffel Tower" and find all the photographers who had images of that
monument and were willing to share it. It could also be useful for
professional photographers who wanted to collaborate around images with
their clients.

Organizing JPEGs on Your Web Server

If you're not going to go the full RDBMS-backed route, you need to think
about where to put JPEG files on your Web server. At first it seems
reasonable to put image files into the same directories as the .html
files where they are used. This can make it tough to find and tough to
cross-reference images.

Here's the system that we used on photo.net for many years:

Keep a separate directory for each PhotoCD with a name like
"pcd1253", where the "1253" corresponds to the last
four digits of the Kodak serial number, printed prominently on the index
print that comes with the disk and less prominently around the CD-ROMs
spindle hole.

If you manipulate images from this disk, put them in a subdirectory
like "pcd1253/manipulated-images".

Make all of your references from HTML files absolute paths to the server root; for example,

<IMG SRC="/images/pcd1253/pink-lady-and-dogs-8.1.jpg">

Keep an index page for each PCD that contains absolute links
to each image, each one ready to be cut and pasted into another HTML
file and work properly.

Adding Images to Your Web Pages

An image-rich HTML document need not take any longer to load than an
image-free document. In 1994, Netscape extended the HTML element IMG
with WIDTH and HEIGHT attributes. If set, they tell modern browsers how
big an image will be once loaded. That way the browser is able to lay
out the complete text of an HTML document before any of the
images have loaded. Here's an example for a thumbnail:

Note that if the image isn't exactly 128x192, the browser will crudely
resize it to fit, potentially resulting in a fuzzy or blocky picture.
Publishers who don't use automated production techniques such as those
described above oftentimes find that they are lacking the desired size
of digital image. They think to themselves, "I know that I want it to
be 200x300 pixels so I'll just put in WIDTH=300 HEIGHT=200 tags".
Here's the result, an e-mail message from a friend and the author's
response:

I have just been through [a part of his site], which was completed six
months ago. It is beginning to erode. The images are fading. Something
is wrong with the server, or somewhere down the pipe. I am very
worried that everything I'm doing has the possibility of fading. The
images look weakened, diluted, and on the Mac, they look shamefully
weak, like there's some complete idiot behind the site. None of us are
idiots, and I may have been very inexperienced at the beginning of
this web site, but I've been working with digital images since 1990
and I've never known of this happening before.
Something has to be done soon about this.
I notice in that you're using WIDTH and HEIGHT tags in Netscape to
rescale your images, e.g., on
http://.. [*** URL omitted to preserve the friendship ***]
do a "Page Info" in Netscape and you'll see that it is "455x288 (scaled
from 504x319)".

Unless you're using my batch-conversion script or Web/db application, it
is probably best to leave the WIDTH and HEIGHT blank. When you're all
done with the page, run WWWis, a Perl program that grinds over an .html
file, grabs all of the GIF and JPEG files to see how big they are, then
writes out a new copy of the .html file with WIDTH and HEIGHT attributes
for each IMG. The program is free and available from http://bloodyeck.com/wwwis/.

Other attributes potentially worth adding to your HTML tags are the following:

"hspace=5 vspace=5" These will tell the browser to leave
five pixels of white space around the image.

"border=0" If the IMG is a hyperlink to a larger JPEG, it
will be surrounded by a blue border. If you think this blue border is
ugly then you can set border=0 to turn it off. Personally I leave the
border because it serves as a user interface cue.

"align=left" This lets text flow around the right edge of
the IMG.

Alternatives to the JPEG format

Standard JPEGs are not always the best format. If you are publishing
line drawings or other very simple graphics with large areas of a single
color, then a GIF can be sharper and smaller (see the comparison of GIF
and JPEG sizes at http://philip.greenspun.com/wtr/img-format/).

Some people like interlaced or "progressive" images, which load
gradually. The theory behind these formats is that the user can at
least look at a fuzzy full-size proxy for the image while all the bits
are loading. In practice, the user is forced to look at a fuzzy
full-size proxy for the image while all the bits are loading. Is it
done? Well, it looks kind of fuzzy. Oh wait, the top of the image seems
to be getting a little more detail. Maybe it is done now. It is still
kind of fuzzy, though. Maybe the photographer wasn't using a tripod. Oh
wait, it seems to be clearing up now ...

A standard GIF or JPEG is generally swept into its frame as the bits
load. The user can ignore the whole image until it has loaded in its
final form, then take a good close look. He never has to wonder whether
more bits are yet to come.

The most serious problem with JPEG files is that they aren't stored in
any kind of standard color space. So if you edit them to look good on a
Macintosh monitor (gamma-corrected), they might look horrible on
someone's Linux machine (typically not gamma-corrected). PhotoCD Image
Pacs are always produced in a calibrated color space, thus preserving
your investment in an image library against the day when the Web
standards finally enable a publisher to specify a gamma along with an
image.

Alternatives to PhotoCD

If you have a negative or slide larger than 4x5" in size and/or very
dense, you will want to get a drum scan at a service bureau. Drum
scanners use photomultiplier tubes that can read much higher densities
than the CCD scanners used by Kodak for the Picture Imaging
Workstation. Unfortunately, they are also expensive and slow, so it
costs at least $75 to have a high-resolution drum scan made.

If you're in a hurry to get a handful of images from slides or negatives
onto the Web, you may wish to use a desktop film scanner. The most
interesting recent practical innovations in desktop scanners are
systems that look at a piece of film from different angles and try to
figure out what is a dust particle or scratch and what is part of the
camera-formed image. An example of this technology is "Digital ICE",
incorporated in the Nikon Coolscans.

Photos from the Users

While a publisher may have a substantial library of images on film it is
safe to say that digital cameras are ubiquitous in the hands of readers.
Anywhere that users contribute text is an opportunity for you to let
them contribute pictures as well. Sadly you can't expect users to do
any resizing or thumbnail production so your server needs to be able to
invoke ImageMagick or a similar tool at upload time. Expect the user to
upload an enormous full-resolution image, e.g., 6 megapixels or
2000x3000 pixels. If you present this back unmodified to other readers
they won't be happy because (a) the image will be slow to load, and (b)
the image will be larger than their screen (unless resized by their
browser). You'll probably want to produce a thumbnail and an 800x1200
pixel version, discarding the original upload unless you believe that a
lot of your users will want to make prints.

Do be careful in reviewing user-submitted photos. In the United States,
it is illegal to possess child pornography and it is not a defense to
say that you did not know that you posessed it. You want to make sure
that the Federales don't seize your server because of something a user
uploaded. [One implication of this law is that an effective way to get
rid of annoying relatives is by visiting their house, slipping some
child pornography behind the sofa, then tipping off the police to the
perverts at 80 Elm Drive. Your in-laws will plead righteous ignorance of
what was behind their sofa all the way to Club Fed.]

Digital Watermarking

It is technologically feasible to hide a pattern of bits into an image
and subsequently test for the presence or absence of this pattern. This
process is called digital watermarking. It works by very slightly
changing the color or intensity of individual pixels. In theory the
presence of the watermark need not visibly degrade image quality.

Here are two common uses of watermarking:

tag an image with "This image was taken by Joe Photographer"

tag an image with "This image was taken by Joe Photographer and
downloaded by Jane User on August 6, 1998" (requires software to process
images on-the-fly and also that Jane User have identified herself to the
server before downloading the image)

In theory watermarking will make it more difficult for someone to
misappropriate an image and use it without permission or payment. In
practice, there are some minor problems:

Unless it is so strong as to be visually objectionable, the
watermark may be unrecoverable after a few JPEG decompressions and
recompressions, as might be typical if an unauthorized user were
adapting the image for some other use.

There is no single standard watermarking system. Thus an honest
person who wished to check whether or not an image was watermarked would
have no way to do so. They would have to download software from 20
different vendors and check a particular image with all those programs.

The most serious problem with watermarking is that it is unclear how it
helps an individual photographer. The watermark might help alert a
photographer to the theft of an image. However, if the user is in
another country it may not be practical to enforce ownership.

A Personal Approach to Copyright

For my first couple of years of publishing on the Web, I got very huffy
about copyright infringement and entertained dark thoughts of litigation
when one of my photos, without credit or payment, appeared on a site or
in a print magazine. I didn't spend days behind a tripod waiting for
the perfect moment so that people could use my work to promote their
ugly commercial ventures.

But realistically litigation in the United States isn't for middle
class people. I couldn't afford to hire a team of lawyers to chase down
the miscreants and I didn't want to spend my life filing lawsuits
myself. I decided to focus my energy on creating new works and not lose
sleep over piracy of the old ones.

Then one of my readers e-mailed me a Web page with an uncredited
usage of one of a bear photograph from Travels with Samantha:
". . . Yes, the IRS does try to intimidate and bully you. And, they
publicly crucify some public figure each year (Willie Nelson, Leona
Helmsley, Darryl Strawberry, Pete Rose, etc.) who has been caught with
allegedly "fudged" returns. But, as you will learn, there is
NO LAW, anywhere, that mandates you file a tax return! . . ." For
$88-1000, you could purchase assistance in "How To STOP Paying
Federal Income Tax - LEGALLY!"

This wasn't exactly what I had in mind when I set up my site, so I
sent the company some email requesting that they remove the picture. The
response was frustrating but I was open-mouthed in admiration of its
creativity:

Our programmer says that he has never seen your book, and that the picture
came from a site that listed lots of pix (both gif & jpg) which were
presented as available for download by anyone!
At this point we are not interested in getting into a tussle with you, but
the question is now open whether the picture is original with you or you
took the picture from the same source.
We'll need some coroberation to verify your claim to the picture before we
go any farther.
Dana Ewell, CEO
!SOLUTIONS! Group

I still didn't have enough money to hire a team of lawyers to "put
the genie back in the bottle" but why not use the following assets:
(1) philip.greenspun.com is more interesting than average; (2)
philip.greenspun.com is more stable and, because it is so old,
better-indexed than average. Every time another publisher used one of my
10,000 on-line images with a hyperlinked credit, I might earn a new
reader. Did their site suck? So much the better. The user wouldn't be
likely to press the Back button once he or she arrived at my site
through the "photo courtesy http://philip.greenspun.com"
link. Of course, my site is non-commercial so extra readers don't
translate into cash, but it is satisfying that my work becomes available
to people who might not find it otherwise. That took care of the
good-faith users. Could I use my site's stability and presence in Web
indices to deal with the bad-faith users?

It hit me all at once. An on-line Hall of Shame. I'd send
non-crediting users a single e-mail message. If they didn't mend their
ways, I'd put their names in my Hall of Shame for their grandchildren to
find. I figure that anyone reduced to stealing pictures is probably not
creative enough to build a high Net Profile. So a search for their name
won't result in too many documents, one of which will surely be my Hall
of Shame. What if the infringer were to retaliate by putting up a page
saying "Philip Greenspun beats his Samoyed"? Nobody would ever
find it because
a
Google search for "Philip Greenspun"
returns too many documents. Try it right now, then try
"Shawn Bonnough"
for comparison, making sure to include the string quotes
in both queries.

A practical-minded person might argue that my system doesn't get me any
cash or stop any infringement. A technology futurist might argue that
one of the micropayment schemes from the 1960s is going to be set up
Real Soon Now. Both of these people are right I guess, but I won't be
sorry if what seems to be the evolving custom on the Internet solidifies
into law: send e-mail asking for permission to republish, be scrupulous
about crediting authors, and prepare to be vilified if you flout these
rules.

Would that be the best possible system? Maybe not, but it has to be much
more efficient for society than a bunch of corporations hiring lawyers
to sling mud at each other in court. Under my system, we can enjoy
seeing our work (with credit) on other folks' sites, vent our spleens at
midnight by adding to a Web page of transgressors, and then move on to
new productive activities.

Note that other folks seem to be independently settling on this
philosophy. In 2002, for example, the Open Content initiative was
launched to develop a license that one can use to give away content in
exchange for credit when it is reused. See www.opencontent.org.

Summary

Here is what you might have learned in this chapter:

Photographs are valuable content, as distinct from graphics which
are generally a waste of your money and the user's downloading time.

When dealing with film-originated images, always scan from an
original negative or slide, never from a print.

Think seriously about the fact that you are building an image
library and whether the Kodak PhotoCD system can help you.

Build a structured database of information about your images that
you can use for automatic conversion to JPEG format, file naming, and
captioned HTML code generation.

Use WIDTH and HEIGHT tags in your IMGs so that the text on pages
with in-line images doesn't take longer to load.

look for opportunities to let users contribute photos

Stopping other people from using your creative work probably won't
increase your ability to sell that creative work and will cost a
tremendous amount of time and money; it is usually better to build a hall
of shame and devote yourself to creating new works.

More

For thinking about how traditional photography tackles the problem
of compressing scene contrast down to the capabilities of negatives,
slides, and prints, read
Ansel Adams's
The
Negative (Little Brown, reprinted 1995)
and
The Print
(Little Brown, reprinted 1995). Another great book about the technical
aspects of photography
is
Basic Photographic Materials and
Processes (Stroebel 2000; Focal Press).

Reader's Comments

Some of the information (only the technical, not related to aspects of photography) can be updated. JPEG images now have the capability to use headers (digital cameras now embed all the technical information inside the headers themselves using the EXIF format). Also it is possible to put comments inside the images, so you dont have to maintain data about image seperately from the image. All you need to do is to put the information in the image once and then program your webserver to pull that information out as and when required (full scale searching based on comments, exposure, camera model...).