This document suggests three ways of presenting an image with
a caption in HTML. Styling in CSS is also discussed.

Summary: three methods

Sadly enough, there is no
markup for image captions in HTML,
unless you count the
figcaption element in HTML5 proposals.
What comes closest to
semantically associating some text content with some image is putting them
into a table so that the image is in one cell and the text is either in
another cell or in a caption element.
Then there’s the “semantically empty” approach,
which is better than
semantically wrong (such as suggestions to use definition list markup).

There are two basic ways to use a table for an image and its caption,
so as a whole, we have three alternative methods:

a table with the image in one cell and the caption in another cell under it

a table with the image in its only cell and the caption
as the table caption (caption) element

a div element containing both the image and
an inner div element, which contains the caption

The two-cell approach

This approach generates by default (i.e. if you don’t use
style sheets or additional attributes to affect the rendering)
a presentation that is illustrated on the right.
The image and the caption text are in two cells of a one-column table.
The markup for this approach assigns a class, caption, to the caption
text cell, but it’s there just to make styling easier.
The same applies to class image assigned to
the table. There is nothing magic in class names in HTML
and CSS;
they are just names chosen by an author as he finds convenient and
hopefully descriptive to anyone who reads the code.

By default, text in a table cell (td element)
is left-aligned, but you can change this by
using e.g. align="center" in the td tag,
or in CSS (e.g.,
td.caption { text-align: center }).

A table is normally left-aligned by default
and appears with no other content on either side of it.
You can affect
this using an align attribute in the table or,
more flexibly, using CSS. It might be a good idea to set
just some left margin for the table, using e.g. the CSS code
table.image { margin-left:
2em }.
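
The pieces of the two-cell approach could fit together roughly as follows. This is a minimal sketch: the class names image and caption and the two CSS rules come from the text above, whereas the file name dalmatian.jpg, the image dimensions and the alt text are assumptions made for illustration.

```html
<style>
  /* Center the caption text within its cell */
  td.caption { text-align: center }
  /* Indent the whole table a little from the left edge */
  table.image { margin-left: 2em }
</style>

<!-- One-column table: image in the first cell, caption in the cell under it -->
<table class="image">
  <tr>
    <!-- dalmatian.jpg and its dimensions are hypothetical -->
    <td><img src="dalmatian.jpg" alt="A Dalmatian dog" width="200" height="150"></td>
  </tr>
  <tr>
    <td class="caption">A Dalmatian</td>
  </tr>
</table>
```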

A single-cell table with caption element

This approach is similar to the first one, but instead of putting
the caption text into a cell, you put it inside a caption
element. It is by definition a caption for the entire table, but in this
case, the table has but one cell, containing the image.

By default, the caption would appear above the image, but the
attribute align="bottom" puts it below the image.
You could do the same in CSS using
table.image caption {
caption-side: bottom; },
but this is poorly supported: no support in Internet Explorer.

If you wish to affect the horizontal alignment in a
caption element, use the text-align
property in CSS. For example,
<caption align="bottom"
style="text-align: left">.
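
A sketch of the single-cell variant, under the same assumptions as before (the file name, dimensions and alt text are hypothetical). The caption element must be the first child of the table, even though align="bottom" makes it render below the image.

```html
<!-- Single-cell table; the caption element carries the caption text -->
<table class="image">
  <!-- align="bottom" puts the caption under the image;
       text-align controls the horizontal alignment of the text -->
  <caption align="bottom" style="text-align: left">A Dalmatian</caption>
  <tr>
    <td><img src="dalmatian.jpg" alt="A Dalmatian dog" width="200" height="150"></td>
  </tr>
</table>
```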

Using div elements

[Example image with the caption “A Dalmatian”]

<div class="image">
<img ...>
<div>caption text</div>
</div>

This is the simplest method, using just div markup.
The inner div element is used for two reasons: to make
the caption text appear on a line of its own, and to make it an element,
so that it can be referred to in CSS (using a selector like
div.image div).

It might be argued that it is even simpler to omit the inner
div markup and use just <br> to create
a line break between the image and the caption. Even the outer div
markup could be omitted on similar grounds. However, the markup
presented here is the simplest reasonable alternative.
The use of div makes it possible to treat the caption text and
the combination of the image and caption as styleable elements.

A div element
has no top or bottom margin by default. You can change this in CSS. For example,
div.image { margin: 1em 0; } would set a top
and bottom margin of 1em. On the other hand, the construct is often
preceded by an element that has a bottom margin, or followed by
an element that has a top margin, such as a paragraph or a heading,
so it does not need margins of its own.

The caption text is left-aligned by default. This can be changed
in different ways, but note that if you use align="center"
for the inner div, the text will be horizontally
centered within the available space, not with respect to the image.

Notes on styling

These three approaches give a tolerable rendering in non-CSS situations
(showing the caption under the image), and they are each a relatively good
starting point for styling. When using a table, you need to consider cell
spacing and cell padding, which are by default nonzero. But there wouldn’t
be strange browser idiosyncrasies to worry about.
The rest really depends on the desired appearance as well as the
properties of the image and the text.

The font in caption texts

[Example image with the caption “A Dalmatian”]

Typically we’d probably want to set
caption text size to a bit smaller than copy text, and maybe the font face
to something different too, and we might wish to center the text
(though this may depend on its length). In the first approach you could use
the following:
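
A minimal sketch of such a style sheet, using the selector .image .caption for the two-cell markup. The specific font size and font family values are assumptions chosen for illustration, not recommendations.

```html
<style>
  .image .caption {
    font-size: 85%;                              /* a bit smaller than copy text (assumed value) */
    font-family: Arial, Helvetica, sans-serif;   /* a different face (assumed value) */
    text-align: center;                          /* center the caption under the image */
  }
</style>
```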

In the two other approaches, you would replace
.image .caption by
.image caption or
.image div,
respectively.

Wrapping long captions

For long caption texts, you need to decide whether
they should wrap according to the width of the image or be set to
some other width. It’s probably best to make the width the same
as that of the image or (for narrow images) just a little wider.

By default, browsers handle the second approach (using a table
and a caption element)
so that the text is wrapped to the same width as the image.
This is because they determine the width of the table according
to the cell containing the image. If you wish to make sure of this,
you could explicitly set the table width to the same as the image width.
In the first approach, you would need to be explicit about the table
width, either in CSS or in HTML.

[Example: image with the caption “A Dalmatian dog. Drawing by Liisa Sarakontu.” in a caption element]

In the above example, the caption element has
grey background to illustrate that it extends a bit to the left and to the right
of the image width. This is usually not serious when the text there is centered.
The phenomenon is caused by default cell padding and cell spacing that browsers
apply when rendering a table.
If it becomes a problem, you can fix it in HTML by setting
cellspacing="0" cellpadding="0" in the table element
or in CSS by setting
table { border-collapse: collapse; } td { padding: 0; }.

In the third approach, the caption text by default uses the
available width. The reason is that the width of a div element
by default extends across the available width.

[Example: image with the caption “A Dalmatian dog. Drawing by Liisa Sarakontu.”]

You could change the appearance by explicitly setting the width of the
outer div element, e.g.
<div class="image" style="width:200px">.
Using a style attribute is a practical choice here,
since the width needs to depend on the specific image that appears
inside the element.
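
For instance, assuming a hypothetical image file dalmatian.jpg that is 200 pixels wide, the div approach with an explicit width might be sketched like this:

```html
<!-- The outer div is as wide as the image, so the caption wraps at the image width -->
<div class="image" style="width: 200px">
  <!-- File name, dimensions and alt text are hypothetical -->
  <img src="dalmatian.jpg" alt="A Dalmatian dog" width="200" height="150">
  <div>A Dalmatian dog. Drawing by Liisa Sarakontu.</div>
</div>
```

The width in the style attribute has to be kept in sync with the width of the image by hand.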

Of course, in many cases you could meaningfully use explicit
line breaks (with <br>) markup inside the caption text,
especially if the text has fairly separate parts. For example, you could
write <div>A Dalmatian dog.<br><small>Drawing by Liisa Sarakontu.</small></div>.

[Example: image with the caption “A Dalmatian dog.” and, on a second line in smaller text, “Drawing by Liisa Sarakontu.”]

Centering

As described above, the caption text can be centered relative to the image
by setting a width for the text and using align="center" (HTML) or
text-align: center (CSS) for it.

On the other hand, if you wish to center the image and
its caption as a whole horizontally, then you can simply use
align="center" in the table tag,
if you are using one of the table approaches. In the
div approach, you would use CSS. You could use CSS
in the table approach too, of course.
Note that centering tables and other blocks is surprisingly
problematic. Many constructs that might be expected to center a block
will actually center each line instead, depending on browser.
Please refer to the excellent treatises by Nick Theodorakis:
“Centering tables” and “Centering blocks with CSS”.

The following example shows an image as centered so that the
caption under it is left-aligned to the left edge of the image.
A simple way to achieve this is to use the two-cell table approach, with
align="center" for the table element
and with the alignment of cells (td) defaulted to
align="left".
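
A sketch of that combination; the file name, dimensions and alt text are assumptions, and the CSS alternative mentioned in the comment is one common technique rather than the only one.

```html
<!-- The table as a whole is centered; the cells keep their default left alignment.
     A CSS alternative to align="center" would be
     table.image { margin-left: auto; margin-right: auto } -->
<table class="image" align="center">
  <tr>
    <td><img src="dalmatian.jpg" alt="A Dalmatian dog" width="200" height="150"></td>
  </tr>
  <tr>
    <td class="caption">A Dalmatian dog.<br><small>Drawing by Liisa Sarakontu.</small></td>
  </tr>
</table>
```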

[Example: centered image with the left-aligned caption “A Dalmatian dog.” and, on a second line, “Drawing by Liisa Sarakontu.”]

Floating the image and the caption

Using the align attribute in an img element,
you can float an image so that it appears on the right or on the left of
some text, so that the text flows on the other side of the image.
You can use a more modern approach as well, the float
property in CSS. It’s more logically named as well, since this
is really not about alignment but about floating. Moreover, you should usually
set some left margin for an image floated on the right (and right margin for
an image floated on the left), and CSS is the only way to do this reasonably.
Thus, a simple way to float an image would be to use the attribute
style="float: right; margin-left: 0.5em"
in an img tag.

[Example: image floated on the right, with the caption “A Dalmatian”]

It is almost as easy to do the same when the image has a caption.
Actually, such techniques were already used previously on this page.
In the table-based approaches, you can just use align="right"
in the table, or float: right for it in CSS.
In the third approach, it is clearly best to use the CSS method, since
there is no direct way to float a div in HTML. Here, too,
CSS is the way to set a margin so that text does not come too close to the
image.

To end floating, you can either use
<br clear="all"> in HTML or
clear: both in CSS (for the first element that
should appear with no floating elements on either side).
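
As a sketch, floating the div construct to the right and ending the floating afterwards could look like the following. The class name after-image, the file name, the width and the margin values are assumptions made for this illustration.

```html
<style>
  /* Float the image-plus-caption block to the right and keep the text away from it */
  div.image { float: right; width: 200px; margin: 0 0 0.5em 0.5em; }
  /* The first element that should have nothing floating on either side */
  .after-image { clear: both; }
</style>

<div class="image">
  <img src="dalmatian.jpg" alt="A Dalmatian dog" width="200" height="150">
  <div>A Dalmatian</div>
</div>
<p>This text flows on the left side of the floated image and its caption.</p>
<p class="after-image">This paragraph starts below the floated block.</p>
```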

Fluid galleries

If you have a set of images and you would like to present them as
a collection on one page so that there are several
images side by side, there are several approaches.

A common approach is to use a table, with images in one row,
captions in another, then more images in a third row, etc.
This approach does not linearize well, since when processed
rowwise, the connection between images and captions is lost.
But more importantly, it requires a fixed layout, with a fixed
number of images in one row. This means that the page requires
a minimum width to be viewed without horizontal scrolling, and
on the other hand it does not utilize the full available width
in a wide window.

The goal here is to make an image gallery
adapt to the available width. For simplicity, let’s assume that the
images are of equal size.

In the simplest case, you could just write img elements in
succession. A browser will then present the images so that it puts as
many images side by side as fit in the available width.
In effect, a browser treats img elements as big letters
and processes a string of images as if it were text consisting of such
letters. The following string of identical images illustrates this.

I use a space between the img elements
in HTML source. This tends to cause some spacing between the images
on common browsers. Whether this is correct is debatable. In any
case, if you don’t want any spacing, don’t leave those
spaces or line breaks between img elements. Instead,
you can put line breaks e.g. after the element name img
before the attributes, where they cause no effect. And if exact spacing
is important, do the same and use CSS properties to suggest specific
margin or padding.

If we wish to put captions under the images,
things become more complicated, but not much.
We can float the elements that contain an
image and its caption. We would use the methods discussed above,
except that we float to the left, using
float: left in CSS or
align="left" in HTML for a table.

We probably want to have some spacing in the gallery.
A simple way is to put some margin on the right and below each image.
For this, we can wrap the elements inside a div
element with some class, say class="gallery", and use
CSS code like the following:

.gallery table { float: left;
margin: 0 5px 20px 0; }

This leaves a 5 pixel space on the right of each image
and a 20 pixel space below each image:

[Example gallery: eight floated images, each with the caption “An ornament”]

Remember to stop floating after the
gallery, using the techniques mentioned above.
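
Put together, a small gallery might be sketched as follows. The class gallery and the float rule are taken from the text above; the class name gallery-end, the file name ornament.gif and the image dimensions are assumptions.

```html
<style>
  /* Each image-plus-caption table floats left, with space to the right and below */
  .gallery table { float: left; margin: 0 5px 20px 0; }
  /* Stop floating for whatever follows the gallery */
  .gallery-end { clear: both; }
</style>

<div class="gallery">
  <table class="image">
    <tr><td><img src="ornament.gif" alt="" width="100" height="100"></td></tr>
    <tr><td class="caption">An ornament</td></tr>
  </table>
  <table class="image">
    <tr><td><img src="ornament.gif" alt="" width="100" height="100"></td></tr>
    <tr><td class="caption">An ornament</td></tr>
  </table>
  <table class="image">
    <tr><td><img src="ornament.gif" alt="" width="100" height="100"></td></tr>
    <tr><td class="caption">An ornament</td></tr>
  </table>
</div>
<p class="gallery-end">Text that follows the gallery.</p>
```

The number of tables per row is not fixed; the browser places as many as fit in the available width and wraps the rest onto the next row.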

If the caption texts vary considerably in length,
you need to consider how to
make their boxes equal in size in rendering. This usually requires
you to guess a reasonable height for the boxes.
Moreover, to make the texts vertically aligned to the top
(that is, the bottom of the image), it is simplest to use the
two-cell table approach. In that case, you can simply use
valign="top" (in HTML) or
vertical-align: top
(in CSS) for the cells.
In the next
example, the height of caption cells has
been set to 4em.
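
A sketch of style rules for this, assuming the two-cell markup inside a gallery container and images that are 100 pixels wide (the width is an assumption; the 4em height comes from the text above):

```html
<style>
  /* Fix the table width to the image width so that long captions wrap */
  .gallery table { float: left; width: 100px; margin: 0 5px 20px 0; }
  /* Give every caption cell the same height and start the text at the top */
  .gallery td.caption { height: 4em; vertical-align: top; }
</style>
```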

[Example gallery: floated images with the caption “An ornament”, one of them with the longer caption “This is a caption that is essentially longer than the other captions.”]

Captions and accessibility

A caption should not be confused with an
alt attribute,
which specifies the textual alternative to be
presented in place of the image, when the image itself is not
presented (e.g., on a text-only browser). Neither of these
should be confused with the title attribute,
which specifies an “advisory title” for an element,
typically implemented as a tooltip that is displayed when the
pointer is moved over the element.

If an image is purely decorative or just visualizes
something that has been said in the text, it is appropriate
to use an empty alternate text, alt="".
In that case, when accessing the page without images,
the page would appear as if the image were not there at all.
This however creates problems if the image has a caption.
The caption text would appear on its own, leaving the user
in confusion: what does this relate to? Thus, in such cases,
it might be suitable to include the caption text into the
image itself, using image processing software.

Normally, on the other hand, if an image has a caption, it is probably a
content image and the caption text just describes what the image is
about, instead of conveying its full message. Then the odds are that
it would be better to have the caption read first, giving those
users who have some way of accessing images (maybe the user is
just surfing with images disabled?) a basis for deciding whether to
try to access this particular image.
The easiest way to achieve this (and still make the caption appear
below the image in visual rendering) is to use the method
of a single-cell table and a caption element with
align="bottom".

Unfortunately, there’s no way to suppress a caption in non-visual
rendering except by making the caption part of the image.
For example, if your page contains
some article that tells about some meeting and is illustrated
by a photo of the meeting, with a caption, then both the photo
and the caption should probably be omitted in non-visual rendering.
In that case it’s probably the least of evils to use a short alt
text like
"(photo of the meeting)".
Putting the caption into the image itself might not be practical enough,
and besides, it might be relevant to the user to know that an image
is available even if they cannot (for now) see the image.

Why not dl markup?

For some odd reason, the suggestion to use dl
(Definition List) markup pops up fairly often. Logically, it makes no
sense; such markup
should be reserved for genuine definitions of terms,
as discussed in
Definition: a definition and an analysis.
Presentationally, it creates a rendering that is rather poor,
as shown below. Although it might be possible to tune the rendering
using CSS, this would be more difficult and less reliable than
styling simple div elements.

The reason why browsers render the construct that way has nothing
to do with images or captions. They render a dl element
so that the dt elements are indented somewhat and the
dd elements are indented even more, and each of those elements
starts on a new line:

    term
        a word or expression that has a precise meaning
        in some uses or is peculiar to a science, art,
        profession, or subject
    terminology
        terms of a particular subject area;
        (study of) proper ways of creating and using terms

So this is why the caption text gets indented relative to the
image. Such indentation is generally not suitable, since normally
captions should be either left-aligned or centered with respect to
the image.
But if desired, the indentation
can be achieved very simply, and with a controlled amount of
indentation,
in the approaches described above,
e.g. by setting a left margin for the caption.

If a speech-based browser implemented a dl
element according to its defined
semantics (ignoring any examples in the specification that contradict
that), it would be natural to read <dl><dt>xxx</dt><dd>yyy</dd></dl>
as follows: “Definition list. Term: xxx. Definition data: yyy.
End of definition list.” Current browsers probably don’t do that,
but would you really like to fear that some browsers start
behaving by the specs? (Maybe there is no fear, because the HTML5 drafts
effectively turn dl to a list of paired items
with no real semantics.)

Using a definition list with a single dt
element and a single dd element inside would be
semantically odd. A list can have just one element, though
it’s a rather pathetic list and makes sense in special
cases only. But this is not the main point. The point is that
neither an image nor its caption
is a term being defined. Well, except in a very special example like the
following, which illustrates the absurdity of using dl
markup for normal combinations of an image and its caption:
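
A sketch of what such markup could look like. Only the file name mass.gif is taken from this document; the alt text and the wording of the definition are assumptions written for illustration.

```html
<!-- The "term" is an image of the word "mass"; the dd element holds its definition -->
<dl>
  <dt><img src="mass.gif" alt="mass"></dt>
  <dd>a measure of the amount of matter in a body</dd>
</dl>
```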

Here mass.gif would refer to an image that
consists of the word “mass”
in some appearance.

The dl element is in practice just a visual layout trick, and a coarse
and unreliable trick at that. Quite often
the layout would not even be suitable but needs tedious styling.
Besides, the dl
is more difficult to style than most elements, since its default
rendering is complicated and hard to describe, and there are quirks
in CSS implementations that make the styling even harder.

The HTML5 figure and
figcaption markup

According to HTML drafts,
figure markup can be used as a container for
an illustration (such as one or more images), with
figcaption element inside it giving a caption
for the image or images.
This means markup like the following:
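
A minimal sketch, in which the file name, dimensions, alt text and caption text are assumptions:

```html
<!-- figure is the container; figcaption gives the caption for its content -->
<figure>
  <img src="dalmatian.jpg" alt="A Dalmatian dog" width="200" height="150">
  <figcaption>A Dalmatian dog. Drawing by Liisa Sarakontu.</figcaption>
</figure>
```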

You should probably also add some top and bottom margin
for figure and also some left margin.
HTML5 drafts suggest a left margin of 40px, but this is currently
not what browsers usually do. So you should explicitly specify
the left margin you want.

In order to have the caption rendered e.g. below the image
in a box that is as wide as the image, it is probably best
to use a small script on the page. The script can traverse
the figure elements on the page and set the width of such
an element equal to the img element contained in it,
if there is just one img element there.
Similar techniques can, of course, be also applied when some
other markup is used for image captions.
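
One way such a script might be written is sketched below. It assumes that it is acceptable to wait for the load event, so that image widths are known, and it uses the rendered width of the image (offsetWidth), which includes any border or padding set on the img element.

```html
<script>
// Run after the page, including its images, has loaded.
window.addEventListener("load", function () {
  var figures = document.getElementsByTagName("figure");
  for (var i = 0; i < figures.length; i++) {
    var images = figures[i].getElementsByTagName("img");
    // Only handle figures that contain exactly one image.
    if (images.length === 1) {
      // Make the figure, and thus its caption, as wide as the image.
      figures[i].style.width = images[0].offsetWidth + "px";
    }
  }
});
</script>
```

When scripting is disabled, the figure simply keeps its default width, so the caption runs across the available width.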

[Example: image with the caption “This is caption text for the image, to be rendered inside a rectangle as wide as the image.”]

For comparison

For a different view on image captions, see
CSS: figures & captions
by Bert Bos. I don’t see any reason to use
paragraph (p) markup in a simple structure
consisting of an image and its caption. But if you use it,
note that paragraphs typically have default rendering that
involves top and bottom margins, though they might be suppressed
if the paragraph is inside a table cell.

See also
Scalable Figures and Captions with CSS and HTML
by Robert J. O’Hara.
It discusses, among other things, the distinction
between a legend (extended prose) and a caption
(a descriptive word or phrase only). Both are treated as
captions in my document, but it is useful to note that there
can be different “captions”
that should be styled differently.

Technically, it is possible to
include a caption into the
image itself
using a suitable graphics program. Although that’s
a simple approach and although many programs generate such images
automatically, it has essential drawbacks.
Text that has been “burned” into an image is not directly
accessible as text to programs, and its font cannot be changed the same
way as normal text font can. If you need to change the text, you need
to manipulate the image instead of simple text editing.
Thus, if you wish to use the image in documents in different languages,
things get awkward.
Moreover, e.g.
Google image search
is based on searching for images using keywords, and Google associates
words with images by their appearance near to each other (and in some
other ways). A caption text embedded into an image itself is of course
not accessible to Google, but if the caption text
appears as real text right after the
image, Google may find the image when someone searches with words
that appear in the caption.

Yet another approach is to wrap an image and its caption in a container and
declare it as an inline block, using
display: inline-block as described in the
CSS 2.1 draft. This approach would have some rather nice features especially
in fluid galleries, but unfortunately browser support is still too small.
In particular, IE has some bugs (e.g., the default width is 100% if the container
is block-level in the HTML sense) and Firefox 2 lacks support. In some years, though,
this might become a feasible alternative.
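
For comparison with the floated gallery, an inline-block version might be sketched like this. The class names gallery and item, the file name and all sizes are assumptions, and the browser limitations mentioned above apply.

```html
<style>
  /* Each container behaves like a big letter in a line of text */
  .gallery .item {
    display: inline-block;
    vertical-align: top;     /* align the containers at their top edges */
    width: 100px;            /* assumed image width; captions wrap at this width */
    margin: 0 5px 20px 0;
  }
</style>

<div class="gallery">
  <div class="item">
    <img src="ornament.gif" alt="" width="100" height="100">
    <div>An ornament</div>
  </div>
  <div class="item">
    <img src="ornament.gif" alt="" width="100" height="100">
    <div>This is a caption that is longer than the other captions.</div>
  </div>
</div>
```

Since nothing is floated here, there is no need to stop floating after the gallery.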