Publication as a Working Draft does not imply endorsement by the W3C membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite W3C Drafts as other than "work in progress".

Abstract

This specification defines XHTML 1.0, a reformulation of HTML 4.0 as an XML 1.0 application, and three DTDs corresponding to the ones defined by HTML 4.0. The semantics of the elements and their attributes are defined in the W3C Recommendation for HTML 4.0. These semantics provide the foundation for future extensibility of XHTML. Compatibility with existing HTML user agents is possible by following a small set of guidelines.

HTML 4.0 [HTML] is an SGML (Standard Generalized Markup Language) application conforming to International Standard ISO 8879, and is widely regarded as the standard publishing language of the World Wide Web.

SGML is a language for describing markup languages, particularly those used in electronic document exchange, document management, and document publishing. HTML is an example of a language defined in SGML.

SGML has been around since the mid-1980s and has remained quite stable. Much of this stability stems from the fact that the language is both feature-rich and flexible. This flexibility, however, comes at a price, and that price is a level of complexity that has inhibited its adoption in a diversity of environments, including the World Wide Web.

HTML, as originally conceived, was to be a language for the exchange of scientific and other technical documents, suitable for use by non-document specialists. HTML addressed the problem of SGML complexity by specifying a small set of structural and semantic tags suitable for authoring relatively simple documents. In addition to simplifying the document structure, HTML added support for hypertext. Multimedia capabilities were added later.

In a remarkably short space of time, HTML became wildly popular and rapidly outgrew its original purpose. Since HTML's inception, there has been rapid invention of new elements for use within HTML (as a standard) and for adapting HTML to vertical, highly specialized, markets. This plethora of new elements has led to compatibility problems for documents across different platforms.

As the heterogeneity of both software and platforms rapidly proliferates, it is clear that the suitability of 'classic' HTML 4.0 for use on these platforms is somewhat limited.

XML™ is the shorthand for Extensible Markup Language [XML].

XML was conceived as a means of regaining the power and flexibility of SGML without most of its complexity. Although a restricted form of SGML, XML nonetheless preserves most of SGML's power and richness, and retains all of SGML's commonly used features.

While retaining these beneficial features, XML removes many of the more complex features of SGML that make the authoring and design of suitable software both difficult and costly.

First, XHTML is designed to be extensible. This extensibility relies upon the XML requirement that documents be well-formed. Under SGML, the addition of a new group of elements would mean alteration of the entire DTD. In an XML-based DTD, all that is required is that the new set of elements be internally consistent and well-formed to be added to an existing DTD. This greatly eases the development and integration of new collections of elements.

Second, XHTML is designed for portability. There will be increasing use of non-desktop user agents to access Internet documents. Some estimates indicate that by the year 2002, 75% of Internet document viewing will be carried out on these alternate platforms. In most cases these platforms will not have the computing power of a desktop platform, and will not be designed to accommodate ill-formed HTML as current user agents tend to do. Indeed, if these user agents do not receive well-formed XHTML, they may simply not display the document.

The following terms are used in this specification. These terms extend the definitions in [RFC2119] in ways based upon similar definitions in ISO/IEC 9945-1:1990 [POSIX.1]:

Implementation-defined

A value or behavior is implementation-defined when it is left to the implementation to define [and document] the corresponding requirements for correct document construction.

May

With respect to implementations, the word "may" is to be interpreted as an optional feature that is not required in this specification but can be provided. With respect to Document Conformance, the word "may" means that the optional feature must not be used. The term "optional" has the same definition as "may".

Must

In this specification, the word "must" is to be interpreted as a mandatory requirement on the implementation or on Strictly Conforming XHTML Documents, depending upon the context. The term "shall" has the same definition as "must".

Reserved

A value or behavior is unspecified, but it is not allowed to be used by Conforming Documents or to be supported by a Conforming User Agent.

Should

With respect to implementations, the word "should" is to be interpreted as an implementation recommendation, but not a requirement. With respect to documents, the word "should" is to be interpreted as recommended programming practice for documents and a requirement for Strictly Conforming XHTML Documents.

Supported

Certain facilities in this specification are optional. If a facility is supported, it behaves as specified by this specification.

Unspecified

When a value or behavior is unspecified, the specification defines no portability requirements for a facility on an implementation even when faced with a document that uses the facility. A document that requires specific behavior in such an instance, rather than tolerating any behavior when using that facility, is not a Strictly Conforming XHTML Document.

Attribute

An attribute is a parameter to an element declared in the DTD. An attribute's type and value range, including a possible default value, are defined in the DTD.

DTD

A DTD, or document type definition, is a collection of XML declarations that, as a collection, defines the legal structure, elements, and attributes that are available for use in a document that complies with the DTD.

Document

A document is a stream of data that, after being combined with any other streams it references, is structured such that it holds information contained within elements that are organized as defined in the associated DTD. See Document Conformance for more information.

Element

An element is a document structuring unit declared in the DTD. The element's content model is defined in the DTD, and additional semantics may be defined in the prose description of the element.

Facilities

Functionality includes elements, attributes, and the semantics associated with those elements and attributes. An implementation supporting that functionality is said to provide the necessary facilities.

Implementation

An implementation is a system that provides a collection of facilities and services that supports this specification. See User Agent Conformance for more information.

Parsing

Parsing is the act whereby a document is scanned, and the information contained within the document is filtered into the context of the elements in which the information is structured.

Rendering

Rendering is the act whereby the information in a document is presented. This presentation is done in the form most appropriate to the environment (e.g. aurally, visually, in print).

User Agent

A user agent is an implementation that retrieves and processes XHTML documents. See User Agent Conformance for more information.

Validation

Validation is a process whereby documents are verified against the associated DTD, ensuring that the structure, use of elements, and use of attributes are consistent with the definitions in the DTD.

Well-formed

A document is well-formed when it is structured according to the rules defined in Section 2.1 of the XML 1.0 Recommendation [XML]. Basically, this definition states that elements, delimited by their start and end tags, are nested properly within one another.

This version of XHTML provides a definition of strictly conforming XHTML documents, which are restricted to tags and attributes from the XHTML 1.0 namespace. See Section 3.1.2 for information on using XHTML with other namespaces, for instance, to include metadata expressed in RDF within XHTML documents.

The root element of the document must designate the XHTML 1.0 namespace using the xmlns attribute [XMLNAMES]. The namespace for XHTML 1.0 is defined to be:

http://www.w3.org/TR/xhtml1

There must be a DOCTYPE declaration in the document prior to the root element. If present, the public identifier included in the DOCTYPE declaration must reference one of the three DTDs found in Appendix A using the respective Formal Public Identifier. The system identifier may be modified appropriately.
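Taken together, the requirements above suggest a document shaped like the following minimal sketch. The namespace URI is the one given in this section. The public and system identifiers in the DOCTYPE declaration are assumed here for illustration only; the normative values are those listed in Appendix A.

```xml
<?xml version="1.0"?>
<!-- Identifiers below are illustrative assumptions; see Appendix A for the normative ones. -->
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/strict.dtd">
<html xmlns="http://www.w3.org/TR/xhtml1">
  <head>
    <title>A minimal document</title>
  </head>
  <body>
    <p>All elements and attributes come from the XHTML 1.0 namespace.</p>
  </body>
</html>
```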

XHTML Documents may be labeled with the Internet Media Type text/html or text/xml. When labeled as text/html, documents should follow the guidelines set forth in Appendix C. Failure to follow these guidelines will almost certainly cause the document to fail to be processed on older implementations.

The XHTML 1.0 namespace may be used with other XML namespaces as per [XMLNAMES], although such documents are not strictly conforming XHTML 1.0 documents as defined above. Future work by W3C will address ways to specify conformance for documents involving multiple namespaces.

The following example shows the way in which XHTML 1.0 could be used in conjunction with the MathML Recommendation:
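A minimal sketch of such a document might look as follows. The MathML namespace URI and the particular MathML elements used are assumptions made for illustration; neither is specified in this section.

```xml
<html xmlns="http://www.w3.org/TR/xhtml1">
  <head>
    <title>A Math Example</title>
  </head>
  <body>
    <p>The following is MathML markup:</p>
    <!-- Assumed MathML namespace URI. Elements from outside the XHTML
         namespace mean this is not a strictly conforming XHTML 1.0 document. -->
    <math xmlns="http://www.w3.org/1998/Math/MathML">
      <apply>
        <log/>
        <logbase><cn>3</cn></logbase>
        <ci>x</ci>
      </apply>
    </math>
  </body>
</html>
```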

In order to be consistent with the XML 1.0 Recommendation [XML], the user agent must parse and evaluate an XHTML document for well-formedness. If the user agent claims to be a validating user agent, it must also validate documents against their referenced DTDs according to [XML].

When the user agent claims to support facilities defined within this specification or required by this specification through normative reference, it must do so in ways consistent with the facilities' definition.

If a user agent encounters an element it does not recognize, it must render the element's content.

If a user agent encounters an attribute it does not recognize, it must ignore the entire attribute specification (i.e., the attribute and its value).

If a user agent encounters an attribute value it doesn't recognize, it must use the default attribute value.

If a user agent encounters an undeclared entity, the entity must be treated as character data.
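The recovery rules above can be illustrated with a small fragment. The element name `stockquote` and the attribute name `mood` are hypothetical and are not defined by any XHTML DTD; they stand in for markup that a user agent does not recognize.

```xml
<!-- Hypothetical markup: "stockquote" and "mood" are not defined by XHTML 1.0. -->
<p mood="cheerful">
  Shares closed at <stockquote>42.50</stockquote> today.
</p>
<!-- Under the rules above, a user agent would ignore mood="cheerful" entirely
     and still render the text "42.50" as part of the paragraph. -->
```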

Well-formedness is a new concept introduced by [XML]. Essentially this means that all elements must either have closing tags or be written in a special form (as described below), and that all the elements must nest.

Although overlapping is illegal in SGML, it was widely tolerated in SGML-based browsers.
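As an illustration of the nesting requirement, the first paragraph below is well-formed, while the second overlaps its `p` and `em` elements and would be rejected by an XML processor. The fragment is a sketch for illustration only.

```xml
<!-- Correct: em is closed before its parent p is closed. -->
<p>here is an emphasized <em>paragraph</em>.</p>

<!-- Incorrect: p is closed while em is still open, so the elements overlap. -->
<p>here is an emphasized <em>paragraph.</p></em>
```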

In SGML-based HTML 4.0 certain elements were permitted to omit the end tag, with the elements that followed implying closure. This omission is not permitted in XML-based XHTML. All elements other than those declared in the DTD as EMPTY must have an end tag.
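A short sketch of the difference follows. The form shown for `br` is the XML empty-element tag syntax; reading it as the "special form" mentioned earlier is an assumption, since that description is not reproduced in this section.

```xml
<!-- Correct: every non-empty element has an explicit end tag. -->
<p>here is a paragraph.</p>
<p>here is another paragraph.</p>

<!-- Incorrect in XHTML: end tags omitted, as HTML 4.0 allowed for p. -->
<p>here is a paragraph.
<p>here is another paragraph.

<!-- An element declared EMPTY, such as br, written with XML empty-element syntax. -->
<p>first line<br />second line</p>
```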

User agents will strip leading and trailing white space from attribute values and map sequences of one or more white space characters (including line breaks) to a single inter-word space (an ASCII space character for western scripts). See Section 3.3.3 of [XML].
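For instance, following the rule as stated above, a user agent would be expected to treat the two `title` values below as the same string. The fragment is a sketch with made-up values; as Section 3.3.3 of [XML] describes, the exact normalization an XML processor applies depends on the declared type of the attribute.

```xml
<!-- As written by the author: padding and a line break inside the value. -->
<img src="logo.png" alt="logo" title="  Example
      Corporation  " />

<!-- What the user agent would work with after stripping and collapsing. -->
<img src="logo.png" alt="logo" title="Example Corporation" />
```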

In XHTML, the script and style elements are declared as having #PCDATA content. As a result, < and & will be treated as the start of markup, and entities such as &lt; and &amp; will be recognized as entity references by the XML processor and expanded to < and & respectively. Wrapping the content of the script or style element within a CDATA marked section avoids the expansion of these entities.

<script>
<![CDATA[
... unescaped script content ...
]]>
</script>

CDATA sections are recognized by the XML processor and appear as nodes in the Document Object Model; see Section 1.3 of the DOM Level 1 Recommendation [DOM].
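To illustrate why the CDATA marked section matters, compare the two forms below. The script body is a made-up example, and the function name `lessAndTrue` is hypothetical.

```xml
<!-- Without CDATA: < and & must be escaped, or the document is not well-formed. -->
<script type="text/javascript">
  function lessAndTrue(a, b, c) { return a &lt; b &amp;&amp; c; }
</script>

<!-- With CDATA: the same script written literally. -->
<script type="text/javascript">
<![CDATA[
  function lessAndTrue(a, b, c) { return a < b && c; }
]]>
</script>
```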

SGML gives the writer of a DTD the ability to exclude specific elements from being contained within an element. Such prohibitions (called "exclusions") are not possible in XML.

For example, the HTML 4.0 Strict DTD forbids the nesting of an 'a' element within another 'a' element to any descendant depth. It is not possible to spell out such prohibitions in XML. Even though these prohibitions cannot be defined in the DTD, certain elements should not be nested. A summary of such elements and the elements that should not be nested in them is found in the normative Appendix B.
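As a sketch of the kind of nesting meant here, the fragment below places one `a` element inside another at a deeper level. An XML DTD cannot rule this out, since each individual parent-child step can be legal, yet the result is still prohibited as described above. The URIs are placeholders.

```xml
<!-- Should not be used: an a element nested (via em) inside another a. -->
<p>
  <a href="outer.html">outer link
    <em><a href="inner.html">inner link</a></em>
  </a>
</p>
```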

The current HTML 4.0 DTDs do not reflect errata changes made to the HTML 4.0 Recommendation [HTML]. The XHTML DTDs incorporate these errata, and thus errors in HTML 4.0 DTDs are corrected in the XHTML DTDs. The errata can be found at [ERRATA].

HTML Tidy is W3C sample code that automatically converts existing web content to XHTML. It can cope with a wide range of markup errors, and offers a means to smoothly transition existing HTML documents to XHTML. For more information, see [TIDY].

Although there is no requirement for XHTML 1.0 documents to be compatible with existing user agents, in practice this is easy to accomplish. Guidelines for creating compatible documents can be found in Appendix C.

Work is currently in progress to determine how Internet media types [RFC2046] should be used when delivering XML documents, and this will be the subject of a future W3C document.

Since XHTML is an XML application, XHTML documents may be delivered using the Internet media type text/xml. Additionally, since one of the aims of XHTML is to allow migration from existing HTML user agents to XHTML user agents, XHTML documents may be delivered using the Internet media type text/html. In this case, it is recommended that the documents follow the guidelines in Appendix C to decrease the chance of document processing failure.

XHTML 1.0 provides the basis for a family of document types that will extend and subset XHTML, in order to support a wide range of new devices and applications, by defining modules and specifying a mechanism for combining these modules. This mechanism will enable the extension and subsetting of XHTML 1.0 in a uniform way through the definition of new modules.

As the use of XHTML moves from the traditional desktop user agents to other platforms, it is clear that not all of the XHTML elements will be required on all platforms. For example, a handheld device or a cell phone may only support a subset of XHTML elements.

The process of modularization breaks XHTML up into a series of smaller element sets. These elements can then be recombined to meet the needs of different communities.

A document profile specifies the syntax and semantics of a set of documents. Conformance to a document profile provides a basis for interoperability guarantees. The document profile specifies the facilities required to process documents of that type, e.g. which image formats can be used, levels of scripting, style sheet support, and so on.

For product designers, this enables various groups to define their own standard profile.

For authors, this will obviate the need to write several different versions of documents for different clients.

For special groups such as chemists, medical doctors, or mathematicians, this allows a special profile to be built using standard HTML elements plus a group of elements geared to the specialist's needs.

These DTDs and entity sets form a normative part of this specification. The complete set of DTD files together with an XML declaration and SGML Open Catalog is included in the zip file for this specification.

The XHTML entity sets are the same as for HTML 4.0, but have been modified to be valid XML 1.0 entity declarations. Note that the entity for the Euro currency sign (&euro; or &#8364; or &#x20AC;) is defined as part of the special characters.

Avoid line breaks and multiple white space characters within attribute values. These are handled inconsistently by user agents.

Don't include more than one isindex element in the document head. The isindex element is deprecated in favor of the input element.

Use both the lang and xml:lang attributes when specifying the language of an element. The value of the xml:lang attribute takes precedence.
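A minimal illustration, assuming French content; the language code `fr` and the text are examples only:

```xml
<!-- Both attributes carry the same value; xml:lang takes precedence if they differ. -->
<p lang="fr" xml:lang="fr">Bonjour le monde.</p>
```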

In XML, URIs that end with fragment identifiers of the form "#foo" do not refer to elements with an attribute name="foo"; rather, they refer to elements with an attribute defined to be of type ID, e.g., the id attribute in HTML 4.0. Many existing HTML clients don't support the use of ID-type attributes in this way, so if you want to be able to process the document on HTML clients, you may wish to supply both id and name values on the target element, e.g., <a id="foo" name="foo">...</a>.

To specify a character encoding in the document, use both the encoding attribute specification on the xml declaration (e.g. <?xml version="1.0" encoding="EUC-JP"?>) and a meta http-equiv statement (e.g. <meta http-equiv="Content-type" content='text/html; charset="EUC-JP"' />). The value of the encoding attribute of the xml declaration takes precedence.

"Composite
Capability/Preference Profiles (CC/PP): A user side framework for content negotiation", F. Reynolds, J. Hjelm, S. Dawkins, S. Singhal, 30 November 1998. This document describes a method for using the Resource Description Framework (RDF) to create a general, yet extensible framework for describing user preferences and device capabilities. Servers can exploit this to customize the service or content provided. Available at: http://www.w3.org/TR/NOTE-CCPP

"HTML
Tidy" is a tool for detecting and correcting a wide range of markup errors prevalent in HTML. It can also be used as a tool for converting existing HTML content to be well-formed XML. Tidy is being made available on the same terms as other W3C sample code, i.e. free for any purpose, and entirely at your own risk. It is available from: http://www.w3.org/Status.html#TIDY

"Associating
stylesheets with XML documents Version 1.0", J. Clark, 14 January 1999. This document describes a means for a stylesheet to be associated with an XML document by including one or more processing instructions with a target of xml-stylesheet in the document's prolog. Available at: http://www.w3.org/TR/PR-xml-stylesheet