PostScript file conversions

Introduction

The two main paths from LaTeX source to a PDF file require you to either
convert all your images and graphics to Encapsulated PostScript (EPS),
or convert them all from PostScript to something else.
You can't intermix the two kinds of files. This means you need to be able
to convert accurately between PS and non-PS graphics.

Another reason for wanting to convert back and forth is to use
PostScript's nice fonts to annotate raster images and photographs that
start out in some other graphics format. The idea is to convert the file
to PS; add the annotations; then convert back to whatever you need.
Once again, reliable conversions into and out of PostScript are necessary.

Although we think of PostScript mainly as a vector graphics
language, it does support rasterized images.
So these conversions are, in principle, quite possible. But in practice,
they turn out to be awfully tricky to execute cleanly.

Problems

These conversions aren't as easy as using the netpbm
package to convert among other graphics formats.
For example, the otherwise very versatile anytopnm script can't
handle PostScript. And, though there are programs called pnmtops
and pstopnm, they have incompatible defaults and options, so that
you can't just say

pnmtops image.pnm > image.ps

and then

pstopnm image.ps > image.pnm

and get the original image back again. (Indeed, that last
command line will produce an empty image.pnm file, because
pstopnm will write a file named image001.ppm.)
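
If a script needs to pick up the file that pstopnm actually wrote, it can work out the name rather than rely on redirection. Here's a minimal sketch, assuming the naming pattern seen later in this article (ps3.ps gives ps3001.ppm): the input name without .ps, a three-digit page number, and a format suffix. The helper name pstopnm_outname is made up for illustration.

```sh
# Predict the file name pstopnm writes for a given input and page.
# Assumes the pattern observed in this article: image.ps -> image001.ppm.
pstopnm_outname() {
    input=$1
    page=${2:-1}       # page number, counted from 1
    suffix=${3:-ppm}   # ppm unless another PNM type was requested
    base=${input%.ps}  # strip a trailing .ps, if present
    printf '%s%03d.%s\n' "$base" "$page" "$suffix"
}
```

So pstopnm_outname image.ps prints image001.ppm, which is the file to look for after the conversion.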

A major conflict between these two commands is that pnmtops
generates a PostScript file that uses the setpagedevice
command, which implicitly invokes initgraphics and so defeats the
attempts of pstopnm to center and scale the image correctly.
(This situation is mentioned, though rather indirectly,
at the end of the pstopnm man page.)
So you must ensure that the ignored transformation would actually have
done nothing, if it had been executed.

Instead of using this troublesome pair of commands,
you might suppose that you could use convert or the gimp
to read one format and write the other. You can; but you usually find
your image resized, and (often) mangled in the process.
These programs actually use gs to interpret PostScript;
so problems come from the arcane behavior of gs, and the
difficulty of controlling it indirectly.

Conceptual difficulties

Which way is up?

One obvious source of confusion between PostScript and the PBM family of
raster images is that the origin of coordinates is at the top of
a PNM image, but the origin of a PostScript raster image (or page) is at
its bottom.

However, this turns out not to be a serious problem. On
the one hand, the conversion commands handle this directional
problem automatically; on the other, even raw PostScript code has a simple
way to invert an image (by changing the sign of one or more elements in
the transformation matrix).
Unless you delve into the actual PostScript code of a page, this reversal
of the positive y direction isn't apparent.

Pixels and points

There is another, less obvious, difficulty in such conversions.
The image pixels in raster images are usually addressed by (dimensionless)
row and column numbers. But in PostScript, the coordinates in raster
images, like everything else, are expressed in points, which have
dimensions: 1 pt = 1/72 inch.
That means you have to keep track of units in converting to or from
PostScript.

Another way of looking at the problem is this: although PostScript is
device-independent as long as it's setting type and rendering vector
graphics, it becomes device-dependent (or at least, resolution-dependent)
when dealing with raster images.
This fact is emphasized in section 6.1.1 of the PostScript Language
Reference manual, which says

The distinction between document generation and document rendering is
essential ….

From this point of view, PostScript's natural resolution is 72 dpi.
But the “device resolution” of an X-Window screen is 100 dpi.
Most laser printers are 300, 600, or 1200 dpi.
These numbers are incompatible: none of them is a whole-number multiple
of 72, so a 72-dpi pixel never covers an integral number of device pixels.
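
The mismatch is easy to check with a little arithmetic. This sketch just divides each device resolution mentioned above by PostScript's 72 dpi; the function name pixels_per_point is my own, and awk is used only because plain shell arithmetic is integer-only.

```sh
# Device pixels that correspond to one 72-dpi PostScript point
# at a given device resolution (dots per inch).
pixels_per_point() {
    awk -v dpi="$1" 'BEGIN { printf "%.3f\n", dpi / 72 }'
}

# None of these ratios comes out as a whole number.
for dpi in 100 300 600 1200; do
    printf '%4d dpi: %s device pixels per point\n' \
        "$dpi" "$(pixels_per_point "$dpi")"
done
```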

Ghostscript complications

Worse yet, Ghostscript, which is invoked by the gs command, wants
to render color and grayscale images at 60 image pixels per inch,
using Floyd-Steinberg dithering.
(That's because early color inkjet printers used a dot spacing of 360 dots
per inch; this choice gave a dithered halftone spot of 6×6 or
36 ink drops.)

Furthermore, the various programs and scripts that use gs to do
the actual work on PostScript files have different default resolutions.
The Gimp, for example, uses 100 dpi on the X-Window
system. The default is 300 dpi in pnmtops.
And gs defaults to the 72 dpi natural to PostScript (i.e., one
pixel per point).

You'd think this could be handled by setting a convenient resolution (or
other options) in the GS_OPTIONS environment variable. But some of the
scripts and programs that invoke gs override this on the gs
command line.

Pixels vs. paper

Another consequence of PostScript's physical units is that the default
PostScript output medium is a page of paper, either American
“letter” size or European A4.
And many programs that produce PostScript output are used to send output
to a PostScript-compatible laser or inkjet printer.
So most of the programs that write PostScript want to center your raster
image on a sheet of paper.

This centering can add unwanted white margins to an image converted
from PS to rows and columns of pixels, or trim off parts of the image that
fall outside the imaginary sheet of paper.
There are ways of coaxing most of the gs-using programs and
scripts to do what you want. But it usually isn't their default behavior.

An Example: from graphics to PostScript

Here's the composite image of the upper green rim of the low Sun at
several different altitudes near the horizon. This image is the final
result, annotated with altitudes (at the left) and a scale bar in the
upper right. These details were added to the PostScript version of an
original PNM file, which is shown below.

This is the original figure. Though converted to a PNG file for
compactness, it shows exactly what the image looked like before
conversion to PostScript. (Remember that PNG uses lossless compression;
it preserves all the image detail.)

Note the smoothness of the upper limb in each sub-panel. Also, notice
that the bottom edge of each sub-image is a sharp discontinuity.
These details may not be obvious to the eye at normal screen-viewing
distances, so I've enlarged a small part of the image.

In the enlargement below,
you can see the individual pixels. It's just the upper left corner of the
image to the left, magnified 4 times.
Apart from the finite resolution, the image structure is quite smooth and
regular.


Now, suppose you take that original PPM image and convert it to PostScript
with the command line

pnmtops start.pnm > ps1.ps

Then display it with gimp, accepting its default
resolution of 100 dpi. Here's what you'll see:

The first thing you notice is that a large white area surrounds the image.
That's because gs, by default, places it on a full page that
corresponds to the default paper size. The white area at the bottom and
left side is part of this page. (The image is truncated because I let the
Gimp use the image's BoundingBox; evidently, it misplaced it. The image
was scaled down to 50% of full size by the Gimp to fit on the screen.)

A second problem is that the image has been rescaled. If you use the
ImageMagick utility identify to show the number of rows and
columns in each version, you'll find that, while the original was
547×606, the PostScript file ps1.ps created by pnmtops has an
image only 526×582, which is just 0.96 as large.

This reduction is a side effect of the scaling assumed by pnmtops.
This program assumes the input image has 300 pixels per inch; and it
produces PostScript output at exactly 72 dpi (the standard PS scaling).
However, if you don't tell it how to scale things, it assumes a fictitious
output device with a scale of (300/72 = 4.166666 …) — but
then rounds this value to an integer, namely, 4. Now, 1/4 of 300
is 75; so the output ends up being scaled to a size 72/75 = 0.96 of the
input size.

So if you accept the Gimp's default resolution of 100 dpi, the image
is re-scaled to 100/72 of the original. As (100/72) × (72/75) =
100/75 = 4/3, what the Gimp displays is 4/3 the size of the original
image: instead of 547×606, it's 730×808 pixels. (See the
scales on the Gimp's window frame.)
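
The two scale factors can be reproduced with integer shell arithmetic. This is only a sketch of the reasoning above, not of what pnmtops or the Gimp do internally; in particular, rounding fractional pixels up is my assumption, chosen because it matches the sizes reported here. The function names are made up.

```sh
# Ceiling of (pixels * num / den), using integer arithmetic only.
scale_dim() {
    echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# pnmtops default: 300/72 = 4.17 is rounded to 4, so the effective input
# resolution is 300/4 = 75 dpi and the image shrinks by 72/75 = 0.96.
pnmtops_default_dim() {
    scale_dim "$1" 72 75
}

# Opening that file in the Gimp at 100 dpi multiplies by a further 100/72,
# for a net factor of 100/75 = 4/3 relative to the original.
gimp_100dpi_dim() {
    scale_dim "$1" 4 3
}

echo "PostScript: $(pnmtops_default_dim 547) x $(pnmtops_default_dim 606)"
echo "Gimp:       $(gimp_100dpi_dim 547) x $(gimp_100dpi_dim 606)"
```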

You can get the Gimp to display ps1.ps at the right scale,
and without distortions,
if you specify 75 dpi; but the top and right edges of the image are still
truncated (as shown at the left here) if you allow it to use the BoundingBox.
The scales show that we have an area 547×606 pixels, all right; but
it's not centered on the original image.

Though it isn't obvious, the top of the figure is missing. (Count the
sub-panels: there were 6 in the original image, but the top one is missing
here.)
Also, the right edge has been truncated; this is most obvious in
the lower right corner of the image.

The truncation and white border are caused by the Gimp's attempt to use
the BoundingBox. Because you told it to use 75 dpi instead of 100, it
shifted the displayed area only 75/100 as far up from
the lower left corner as it should have, instead of
all the way to the center of the page. But if you tell Gimp to
ignore the BoundingBox, you get lots of surplus white space at the top
and right edges, as shown below.
At the right is the image you get if you tell the Gimp to open ps1.ps
at 75 dpi, but not to use the bounding box.
Though the image is still displaced, you get to see all of it.
But it's still surrounded by a large white area; in fact, the white-page
background is now much larger than the size of the image, so Gimp
shrinks the page to fit it on the screen.

You might think that these problems are just due to some bug in the Gimp,
so that using gs or gv to display the image would work OK.

Here you can see the image properly centered on the page.
I've added a border around the full image to
distinguish the white background of the gs display
from that of the Web page.

Unfortunately, there are now little spiky artifacts projecting from the
upper limb at regular intervals. Clearly, something is wrong.

The centering is correct, but there are again errors in the
display. These appear as subtle irregularities
in the solar limb. At first glance, they appear to be missing lines,
which might be attributed to the incorrect scaling.

But the problem is more complicated than that.
Let's use xmag to blow up an example (shown below):
Here you can see that there's a step in the limb, all right; but there's
also a vertical artifact, below and to the left of the jog in the limb.
(Look directly below the word “new” in the xmag headings.)
This vertical feature certainly can't be due to a missing line.

So, while the differences between the gs, gv, and
gimp displays of the same image obviously are in the
displaying process, rather than the PostScript image itself,
the nature of the problem is not immediately obvious.

In any case, we'd like to get rid of the centering problem that gimp
makes evident, and the resulting unwanted white border that is displayed
by gs and gimp.

To get rid of the white area, we need to use the -nocenter
option to pnmtops. That doesn't in itself fix the problem; but
it does at least get the image into the corner of the page.

To prevent the mis-scaling, you have to use the -equalpixels
option of pnmtops, as well as the -nocenter option.
This will produce an output
that's at exactly 72 pixels per inch on the PostScript page.
So the result of

pnmtops -nocenter -equalpixels start.pnm > ps2.ps

can be displayed correctly by gimp, but only if you tell it to
use the correct dpi setting.

You might have expected that to be 72, but it's really
300 — the value assumed by pnmtops at its input side.
As a result, trying to display ps2.ps with either gs or
gv, which assume 72 dpi, produces an image scaled down by 72/300 =
6/25 = 0.24, which is even farther from what you want. You just get a
little postage-stamp image like the one shown at the left.

So let's try to correct for this by imposing 72 dpi on the input side,
by specifying -dpi 72 as well:

pnmtops -nocenter -equalpixels -dpi 72 start.pnm > ps3.ps

Now the Gimp displays the image correctly at 72 dpi. And so does gv;
though gs by itself, with no special options, still produces a garbled
image. And we finally have an output image that identify says is
547×606, the same as the original version.

The curious thing is that all three PostScript versions
have the same image data; it's only the transformation matrix and the
centering of that image that change. In fact, the actual pixels are
correctly converted to PostScript in every case, as you can confirm by
printing any of these images on a good PostScript printer.

That means the problem is entirely in gs, which the Gimp and
gv both use to display PostScript files.

Variations on this theme

There are a couple of additional points that need attention. First, if
your input image is in “landscape” format, pnmtops will try to
rotate it to fit on a standard page. To prevent this and preserve the
proper orientation, you must add -noturn to the list of
pnmtops options. (It doesn't hurt to do this, regardless of the
original orientation.)
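
Since these options always travel together, it may be handy to wrap them in a small shell function. This is just a sketch: the name pnm_to_ps and the PNMTOPS override variable are my own inventions (the override lets you substitute echo to see the command without running it); the options are the ones discussed above.

```sh
# Convert a PNM file to PostScript pixel for pixel, with no centering,
# no rescaling and no rotation.
# PNMTOPS may be overridden (e.g. with echo for a dry run);
# it defaults to the real pnmtops.
pnm_to_ps() {
    "${PNMTOPS:-pnmtops}" -nocenter -equalpixels -dpi 72 -noturn "$1" > "$2"
}
```

With that defined, pnm_to_ps start.pnm ps3.ps does the same job as the long command line used for ps3.ps, plus -noturn.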

Second, the default is to generate uncompressed PostScript. The file can
be made much smaller by adding -rle to the pnmtops
options. However, the price of a smaller file is much more time spent
in rasterizing the image. For example, a page with two large images on it
took up 313 kB with the -rle option, instead of 7.6 MB without;
but the compressed version took 2 hours to print, while the
uncompressed one printed in 13 minutes (on an old, slow, HP LaserJet III).
That's because the image had to be decompressed by the PostScript
interpreter in the slow printer.

That means the compression saved a factor of 24 in disk space, but cost a
factor of about 9 in printing time. But modern printers are much faster;
the printing time of the compressed version was about a minute on a LJ 4100N.
Transmitting all those uncompressed bytes over the parallel port kept the
CPU load of my 1400 MHz Athlon near 70% for those 13 minutes, too.
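
For the record, here is the arithmetic behind those two factors, as a small sketch. It assumes 7.6 MB means 7600 kB and that "2 hours" means 120 minutes; the helper name ratio is made up.

```sh
# Ratio of two quantities, printed to one decimal place.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'
}

echo "disk space saved: factor of $(ratio 7600 313)"   # 7.6 MB vs 313 kB
echo "print time cost:  factor of $(ratio 120 13)"     # 2 hours vs 13 minutes
```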

The Example Continued: from PostScript to PNM

It turns out that we also find gs being used to convert
back from PostScript to PNM (or any other) graphics: Ghostscript is the
PS interpreter used by not only the pstopnm program, but also
convert, as well as the Gimp.

So it's necessary to understand the options to gs to get these
conversions done cleanly.
It's a little easier to use pstopnm (rather than gs) to do the
conversions, as it needs fewer options on its command line. But you still
need to understand what it's telling gs to do.

The simple command

pstopnm ps3.ps

writes a file named ps3001.ppm, which looks like this:

You'll notice the unwanted white border is still here, though at least the
image is now in the lower left corner of the “page”. The problem is
that pstopnm, like pnmtops, wants to have a full page
for its output image. [Not a bug, but a feature, right?]

Furthermore, this image has those nasty glitches along the solar limb;
here's a magnified view of them:
So we still have that problem to contend with. And displaying the image
with the Gimp shows that, once again, the image has been re-scaled.

The man page for pstopnm indicates that we should be
able to get rid of the unwanted white borders by adding -xborder=0
and -yborder=0 to the options.

Adding those two options produced the image shown at the right. This clearly got rid of the
unwanted borders; but there are still artifacts along the limb.
And identify ps3.ps shows that this image is 607×673
instead of 547×606, so there's still unwanted scaling.

Once again, it turns out that the scaling is an unwanted feature of
pstopnm: it assumes you want to fit the image into a standard
page of paper, and enlarges it to just fit on the page.

The way around this is to tell the program what size you really want, by
adding -xsize=547 and -ysize=606 to the
options.

So we actually need the command line

pstopnm -xborder=0 -yborder=0 -xsize=547 -ysize=606 ps3.ps

to get what we want.

Actually, you should probably use -xmax and
-ymax instead of -xsize and -ysize: if you don't,
and the image turns out to be bigger than the number of points available on a
standard sheet of paper, you'll get unwanted scaling again.
(And note that the = signs can as well be replaced by spaces; pstopnm
will parse the arguments correctly either way.)

Now we find that ps3001.ppm has the correct dimensions,
displays correctly in the Gimp, and is in fact identical to our original file!
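
The same options can be bundled into a shell function so the width and height are the only things you have to supply. A sketch, using -xmax and -ymax as recommended above; the function name ps_to_pnm and the PSTOPNM override variable are hypothetical conveniences, not part of netpbm.

```sh
# Convert a PostScript image back to PNM at exactly WIDTH x HEIGHT pixels,
# with no added borders.
# PSTOPNM may be overridden (e.g. with echo for a dry run);
# it defaults to the real pstopnm.
ps_to_pnm() {
    width=$1 height=$2 psfile=$3
    "${PSTOPNM:-pstopnm}" -xborder=0 -yborder=0 \
        -xmax="$width" -ymax="$height" "$psfile"
}
```

For the example image, that would be ps_to_pnm 547 606 ps3.ps, which should again leave its result in ps3001.ppm.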

Discussion: the gs display problem

You might have noticed that gs messes up the display when
(and only when) the image is re-scaled. When it's displayed pixel for
pixel on the screen, there's no problem.

Typically, the artifacts have a blocky appearance: there are periodic
discontinuities in the image in both x and y. That's a
symptom of Floyd-Steinberg dithering — which, it turns out, is what
gs uses to present continuous-tone images on bitmapped displays.

As a horrible example of bad dithering, here's what the Gimp shows if you
open ps2.ps at 360 dpi:

This display is enlarged to show details of the dithering artifacts.
The effect produced looks like an uneven mixture of wide and narrow lines,
badly out of synch.

There seems to be no way to prevent this unwanted dithering from the gs
command line, apart from specifying a fictitious “page size” that
exactly matches the image. It's a nuisance to have to run identify
to learn what those required dimensions are, but it appears to be
unavoidable.

More problems

Unfortunately, you sometimes can't avoid rescaling and the resulting
dithering problems. For example, suppose you want to match the width of a
figure to the width of the text column in a PDF file. Unless your
original just happens to have the right width, it will have to be
scaled to fit the column width.
If you use gv to display the PDF file, it will be using gs
again; the image will be scaled, and will suffer dithering errors.

However, there's a way to ameliorate the display problem.
You can tell gs to interpolate instead of dithering.
The problem is that you don't have access to the gs command line
used by gv or the Gimp. But you can set the environment variable
GS_OPTIONS to include the -dDOINTERPOLATE flag, like this:

export GS_OPTIONS='-dDOINTERPOLATE'

The resulting display will look a lot better, even though it's not
perfect. Here's an example of setting that flag in the environment and
then displaying ps2.ps with gimp at 360 dpi:

As you can see, the limb detail is now smooth and looks acceptable. But
the interpolation has produced an artifact at the edge between sub-panels
of the figure: look at the row of half-bright pixels at the boundary.
So this clearly isn't an acceptable way around the
conversion problem; it just improves the display.
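
One caution about the export shown above: it replaces anything already in GS_OPTIONS. If you keep other Ghostscript options there, a small helper can append the flag instead. This is a sketch; the function name add_gs_option is made up, and it assumes the options in GS_OPTIONS are separated by single spaces.

```sh
# Append a flag to GS_OPTIONS unless it is already present,
# keeping whatever options were set before.
add_gs_option() {
    case " ${GS_OPTIONS:-} " in
        *" $1 "*) ;;   # already there; leave the variable alone
        *) GS_OPTIONS="${GS_OPTIONS:+$GS_OPTIONS }$1" ;;
    esac
    export GS_OPTIONS
}

add_gs_option -dDOINTERPOLATE
```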

Plan B

There's another way to avoid these display/conversion problems.
If you convert a PostScript file to PDF format, you can use xpdf
to display the result. And it has an entirely different rendering engine;
it does not use gs. Consequently, xpdf shows
images without appreciable distortion.

Furthermore, there are conversion utilities based on xpdf
that will produce much better images. For example, pdftopbm
generates quite good bitmaps from line drawings. And pdfimages
will extract equally good PPM images (both black-and-white and color)
from PDF files.

Now the problem is to make sure the image wasn't corrupted when it was
made into a PDF. The shell script ps2pdf, like its cousin, the
Perl script epstopdf, invokes gs; but they only add a
PDF wrapper to the original PostScript code, and so do not
corrupt the images. Any artifacts occur when the PDF file is displayed,
not when it's created.

Curiously, there is even some image corruption when the PDF file is
displayed by Adobe's acroread reader. Apparently it also does
some dithering and/or interpolation.

PostScript image formats

Another complication has to do with the way images are encoded in
PostScript. Image data can be stored with 1, 2, 4, 8, or (in Level 2) 12
bits per component:

1 bit per component is appropriate for black-and-white figures,
such as line drawings. It corresponds to the PBM bitmap format.

2 and 4 bits per component don't seem very useful.

8 bits per component is appropriate for grayscale (PGM) images (0 to 255).
It's also used for full-color (PPM) images, with 3 (RGB) or 4 (CMYK)
components per pixel.

While the pstopnm command lets you specify explicitly which of
the PNM formats (PBM, PGM, or PPM) to write, the pnmtops command
doesn't let you pick which PS image encoding is used. Instead, the latter
maps input PBM files to 1 bit per component, PGM to 8, and PPM to
8×3.

A related complication is that the Gimp won't write PBM images, only PGM
ones. And, to make matters worse, the identify command doesn't
distinguish among PBM, PGM, and PPM, but reports them all as PNM.
(You'll need to use the file command to make sure what's actually
in a PNM file.)
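
If you'd rather not depend on the file command, the first two bytes of the file tell you the same thing: every PNM file starts with a "magic number" from P1 to P6. Here's a minimal sketch; the function name pnm_kind is my own.

```sh
# Report which PNM variant a file holds, from its two-byte magic number.
# P1/P4 = bitmap, P2/P5 = graymap, P3/P6 = pixmap (plain/raw forms).
pnm_kind() {
    magic=$(head -c 2 "$1")
    case $magic in
        P1|P4) echo PBM ;;   # pnmtops would use 1 bit per component
        P2|P5) echo PGM ;;   # 8 bits per component
        P3|P6) echo PPM ;;   # 8 bits for each of 3 components
        *)     echo unknown ;;
    esac
}
```

Running pnm_kind start.pnm on the example image should print PPM, since it's a color image.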

The examples offered above all involved color images and PPM files, so these
problems don't arise. But when you're dealing with black-and-white
line drawings,
you may need to invoke the pgmtopbm command to reduce
the file sizes before converting to PostScript.

So, to sum up:

To go from PostScript to PPM (which can be converted to any other graphic
format, like PNG), use

pstopnm -xborder=0 -yborder=0 -xmax=<width> -ymax=<height> image.ps

and the portable pixmap is written to image001.ppm. [Note that
the output is named a ppm file, although the command is
pstopnm.] You'll need to run identify image.ps
first, to find the numerical values to use for <width> and
<height>. And, if the original EPS file doesn't have the
right page size, you could have more problems; see the page on
converting EPS to PNG for details.
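
Because identify reports the size as a single string like 547x606, a tiny helper can turn that string into the two size options. A sketch only; the function name size_opts is hypothetical, and it assumes the geometry has the plain form WIDTHxHEIGHT with nothing after it.

```sh
# Turn a geometry string such as 547x606 into pstopnm size options.
size_opts() {
    geom=$1
    width=${geom%%x*}    # text before the first "x"
    height=${geom#*x}    # text after the first "x"
    printf '%s\n' "-xmax=$width -ymax=$height"
}
```

For the example image, size_opts 547x606 prints -xmax=547 -ymax=606, ready to paste into the pstopnm command line.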

Usually, you'll want to convert the PNM file to PNG, which is enormously
more compact (and uses lossless compression, so the compressed version is
still an exact copy of the original). To do that: