Both DTDs and XML Schema are designed to accomplish the same fundamental task: to define the structure of XML document types. In this sense both are simply different text representations for the
same underlying data structures. However, Schema and DTDs differ significantly in several ways, both in structure and capabilities.

Some differences worth noting are:

Common XML features

XML Schema are XML documents themselves and therefore share many aspects of the languages they define.

Data typing

Schemas are designed with a much larger set of built-in data types than DTDs, and provide methods for creating user-defined types.

Namespaces

DTDs only partially support XML Namespaces [XMLNAMES], which are inherently a part of XML Schema.

Extension

XML Schema have a rich set of extension mechanisms including inheritance, redefinition, and substitution.

Entities

There is no mechanism in XML Schema corresponding to the use of entities for data abstraction in DTDs. In many cases the functionality of entities can be replaced through other XML-based
mechanisms. However, there is currently no support for named character entity references as used in XHTML within XML Schema. In the XML Schema modules described here, named character entities for
XHTML are included using a DTD.

DTDs and Document Order Dependence

A more subtle feature of modularized DTDs is their dependence on the document order; the order in which elements and entities are defined within DTD files has a large impact on language
development. XML Schema are far less dependent on document order.

XML language definitions, regardless of their text representation, contain at least three types of data structures. When combined into a coherent and consistent whole, they form a complete
language definition. These three components are:

Elements

Attributes

Content models

Additional abstract data structures may be defined for use in the language definition, such as common content models or attribute groups, whose use is shared by other data structures within the
language definition. The definition of these structures is the primary task of language development, and the core of the modularization framework.

The framework also includes a set of modularization conventions that describe how the individual modules work together and how they can be modified or extended.

In XHTML-MOD, every object in the DTDs is represented by an XML entity. These entities are then composed into larger sets of entities and so on, resulting in a set of data abstractions that can
be generalized and used modularly. These multiple levels of abstraction are tied together by the use of a specific naming convention and a set of abstract modules.

Generic classes of entities (composed of sub- and sub-sub-entities) are used to create definitions of the three components listed above. Content models, attribute lists and elements are defined
separately, sometimes in separate modules, and the ordering of the modules in the DTD structure is strictly defined (due to document order dependence). They are then combined to form the resulting
document type. Extensibility is accomplished through the extensive use of INCLUDE/IGNORE sections in the DTD modules. How each of these structures relates to its Schema-based counterpart is
summarized in Table 1 below.

Both the DTD and schema-based modularization frameworks implement a set of formalized data structures, often in a conceptually similar way. The modularization framework described here is designed
around the use of similar data structures, which can be represented (more or less) equally well in either representation. This is accomplished through the use of a straightforward mapping of data
structures defined in the DTD modules onto equivalent data structures in the XML Schema language.

In XHTML-MOD, content models for elements are defined using three classes of entities, identified through the naming conventions by the suffixes ".content", ".class", and ".mix". Each of these
classes of entities is mapped onto a corresponding Schema counterpart in the following way:

".content" models - these models are used to define the contents of individual elements. For each element there is a corresponding ".content" object. In XML Schema,
".content" entities are mapped directly onto groups:

".class" models - these models are used to define abstract classes of content models made up of either ".content" entities or other ".class" entities (or elements). In XML Schema
they correspond to groups that may also contain substitution groups:

".mix" models - these models correspond to content models that are mixed groupings of ".class", ".content", and ".mix" entities and serve as abstract content models often
used in common by many elements in the DTD. They correspond to groups in XML Schema:
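
The three mappings above can be sketched in a single schema fragment. This is only an illustration under stated assumptions: the group names follow the naming conventions described here, but the content models, the placeholder element declarations, and the "customlist" element are hypothetical and are not taken from the normative modules.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.w3.org/1999/xhtml"
            targetNamespace="http://www.w3.org/1999/xhtml"
            elementFormDefault="qualified">

  <!-- ".content": the content of one element.
       DTD form (simplified): <!ENTITY % ul.content "( li )+" > -->
  <xsd:group name="ul.content">
    <xsd:sequence>
      <xsd:element ref="li" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:group>

  <!-- ".class": an abstract class of alternatives.
       DTD form (simplified): <!ENTITY % List.class "ul | ol" > -->
  <xsd:group name="List.class">
    <xsd:choice>
      <xsd:element ref="ul"/>
      <xsd:element ref="ol"/>
    </xsd:choice>
  </xsd:group>

  <xsd:group name="Heading.class">
    <xsd:choice>
      <xsd:element ref="h1"/>
      <xsd:element ref="h2"/>
    </xsd:choice>
  </xsd:group>

  <!-- ".mix": a grouping of classes shared by many elements.
       DTD form (simplified):
       <!ENTITY % Block.mix "%Heading.class; | %List.class;" > -->
  <xsd:group name="Block.mix">
    <xsd:choice>
      <xsd:group ref="Heading.class"/>
      <xsd:group ref="List.class"/>
    </xsd:choice>
  </xsd:group>

  <!-- Placeholder declarations so that the references above resolve -->
  <xsd:element name="li" type="xsd:string"/>
  <xsd:element name="h1" type="xsd:string"/>
  <xsd:element name="h2" type="xsd:string"/>
  <xsd:element name="ul">
    <xsd:complexType>
      <xsd:group ref="ul.content"/>
    </xsd:complexType>
  </xsd:element>
  <xsd:element name="ol">
    <xsd:complexType>
      <xsd:group ref="ul.content"/>
    </xsd:complexType>
  </xsd:element>

  <!-- Hypothetical substitution group member: may appear wherever
       "ul" is referenced, including inside List.class -->
  <xsd:element name="customlist" substitutionGroup="ul"/>
</xsd:schema>
```

Because "List.class" refers to the global "ul" declaration, any member of the substitution group headed by "ul" is accepted there without changing the group itself.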

In addition to these three content model groupings, XHTML-MOD includes an additional grouping ".extra". These are currently omitted from the schema modules. (If needed, a developer could add them
to the schema modules in a conformant way.)

Attributes and Attribute lists in DTDs correspond directly to attribute and attributeGroup elements in XML Schema. The translation from one to the other is relatively simple and straightforward.
Here is an example:
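
A minimal sketch of such a translation follows. The attribute set is a simplified illustration, assumed for this example, and the "br.attlist" group is a hypothetical name that follows the ".attlist" convention; neither is copied from the normative modules.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.w3.org/1999/xhtml"
            targetNamespace="http://www.w3.org/1999/xhtml">

  <!-- DTD form (simplified):
       <!ENTITY % Core.attrib
         "id     ID        #IMPLIED
          class  NMTOKENS  #IMPLIED
          title  CDATA     #IMPLIED" > -->
  <xsd:attributeGroup name="Core.attrib">
    <xsd:attribute name="id" type="xsd:ID"/>
    <xsd:attribute name="class" type="xsd:NMTOKENS"/>
    <xsd:attribute name="title" type="xsd:string"/>
  </xsd:attributeGroup>

  <!-- DTD form (simplified): <!ATTLIST br %Core.attrib; > -->
  <xsd:attributeGroup name="br.attlist">
    <xsd:attributeGroup ref="Core.attrib"/>
  </xsd:attributeGroup>
</xsd:schema>
```

Attributes declared without a "use" attribute are optional, which matches #IMPLIED in the DTD form.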

Complex attribute groups that are used by many different elements are grouped in the DTDs using entities suffixed with ".attrib". These attribute entities map directly onto attributeGroup
elements in XML Schema as shown above.

The XML Schema specification allows elements as well as attribute values to be strongly typed. In defining elements in the modularized schema, an element type is created for each element that is
a complex type composed of the content model (element.content) and the attribute list (element.attlist) as shown below:
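
The following sketch shows one way this composition might look for a list element. The content model and attribute list are simplified placeholders chosen for illustration; only the ".content", ".attlist", and ".type" naming pattern is taken from the text.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.w3.org/1999/xhtml"
            targetNamespace="http://www.w3.org/1999/xhtml"
            elementFormDefault="qualified">

  <!-- Content model (element.content) -->
  <xsd:group name="ul.content">
    <xsd:sequence>
      <xsd:element ref="li" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:group>

  <!-- Attribute list (element.attlist) -->
  <xsd:attributeGroup name="ul.attlist">
    <xsd:attribute name="id" type="xsd:ID"/>
    <xsd:attribute name="class" type="xsd:NMTOKENS"/>
  </xsd:attributeGroup>

  <!-- Element type: content model plus attribute list -->
  <xsd:complexType name="ul.type">
    <xsd:group ref="ul.content"/>
    <xsd:attributeGroup ref="ul.attlist"/>
  </xsd:complexType>

  <!-- Global element declarations -->
  <xsd:element name="ul" type="ul.type"/>
  <xsd:element name="li" type="xsd:string"/>
</xsd:schema>
```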

XML Schema allows inheritance and redefinition of elements, groups, attributes and attributeGroups. In several cases modules require modification of previously declared attribute lists. This is
done by using the <xsd:redefine> element to redefine the attributeGroup that needs to be modified.
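
A sketch of such a redefinition is given below. The file name "example-attribs-01.xsd" and the added "style" attribute are hypothetical; the fragment assumes that the referenced schema declares an attributeGroup named "Core.attrib" in the same target namespace, so it cannot be validated on its own.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.w3.org/1999/xhtml"
            targetNamespace="http://www.w3.org/1999/xhtml">

  <!-- Hypothetical module that originally declares Core.attrib -->
  <xsd:redefine schemaLocation="example-attribs-01.xsd">
    <xsd:attributeGroup name="Core.attrib">
      <!-- The self-reference keeps the original attributes ... -->
      <xsd:attributeGroup ref="Core.attrib"/>
      <!-- ... and this declaration adds one more -->
      <xsd:attribute name="style" type="xsd:string"/>
    </xsd:attributeGroup>
  </xsd:redefine>
</xsd:schema>
```

Every module that refers to "Core.attrib" through this schema then sees the extended attribute group.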

Notations are an SGML feature that allows non-SGML data within documents to be interpreted locally [CATALOG]. Notations for XHTML are preserved in the
Schema modules using the notation element in a straightforward way.
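
As an illustration of the syntax only, a notation declaration in XML Schema might look like the following. The notation shown is a generic example and is not one of the XHTML notations.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.w3.org/1999/xhtml"
            targetNamespace="http://www.w3.org/1999/xhtml">

  <!-- DTD form: <!NOTATION gif PUBLIC "image/gif" > -->
  <xsd:notation name="gif" public="image/gif"/>
</xsd:schema>
```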

The strong typing mechanism in XML Schema, along with the large set of intrinsic types and the ability to create user-defined types, provides for a high level of type safety in instance
documents. This feature can be used to express more strict data type constraints, such as those of attribute values, when using XML Schema for validation.
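
For instance, an attribute value that a DTD can only declare as CDATA could be restricted to non-negative integers. The sketch below is an assumption made for illustration: the type name "Number" and the attribute group "cellspan.attrib" are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.w3.org/1999/xhtml"
            targetNamespace="http://www.w3.org/1999/xhtml">

  <!-- User-defined type derived from a built-in type -->
  <xsd:simpleType name="Number">
    <xsd:restriction base="xsd:nonNegativeInteger"/>
  </xsd:simpleType>

  <!-- DTD form (simplified): colspan CDATA "1", rowspan CDATA "1" -->
  <xsd:attributeGroup name="cellspan.attrib">
    <xsd:attribute name="colspan" type="Number" default="1"/>
    <xsd:attribute name="rowspan" type="Number" default="1"/>
  </xsd:attributeGroup>
</xsd:schema>
```

With this declaration a value such as colspan="two" is rejected during schema validation, whereas a DTD validator accepts any CDATA string.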

XML Schema provides no means of duplicating XHTML's named character entity mechanism. In most cases data abstraction through entities can be dispensed with in schemas. However, in the case of
named character references, no replacement method is available.

Character entities are used to represent characters that occur in document data that may not be processed natively on the user's machine, for instance the copyright symbol. XHTML makes use of
three sets of named character entities: ISO Latin 1, Symbols, and Special.

A general solution for the resolution of language-specific named character entities is outside the scope of this document.

In this framework, entities are currently referenced through a DTD reference to the three individual DTD modules that define them.
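
One way such a reference might appear is as an internal DTD subset in an instance document, as sketched below. This is an assumption about usage and not a normative form: the system identifiers are relative file names that would need to resolve to local copies of the XHTML entity sets.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html [
  <!-- Each parameter entity pulls in one set of named character entities -->
  <!ENTITY % HTMLlat1 PUBLIC
     "-//W3C//ENTITIES Latin 1 for XHTML//EN" "xhtml-lat1.ent">
  %HTMLlat1;
  <!ENTITY % HTMLsymbol PUBLIC
     "-//W3C//ENTITIES Symbols for XHTML//EN" "xhtml-symbol.ent">
  %HTMLsymbol;
  <!ENTITY % HTMLspecial PUBLIC
     "-//W3C//ENTITIES Special for XHTML//EN" "xhtml-special.ent">
  %HTMLspecial;
]>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Entity example</title>
  </head>
  <body>
    <!-- The copyright entity is defined in the Latin 1 set -->
    <p>&copy; Example Organization</p>
  </body>
</html>
```

The DTD subset only supplies the entity definitions; structural validation is still left to the schema.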

One further issue of note in the conversion of DTDs to XML Schema is that it is absolutely necessary to define all elements globally. Otherwise they are not considered to be in the XHTML
namespace but only "associated" [XMLSCHEMA_COMPOSITION] with it. This document does not make use of this association feature in XML Schema.
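
The difference can be seen in a small sketch. The content models here are placeholders chosen for illustration; the point is only the contrast between a reference to a global declaration and a local declaration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.w3.org/1999/xhtml"
            targetNamespace="http://www.w3.org/1999/xhtml">

  <!-- Global declaration: "p" is in the target (XHTML) namespace -->
  <xsd:element name="p" type="xsd:string"/>

  <xsd:element name="div">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Reference to the global declaration. A local declaration
             written here with name="p" would, under the default
             elementFormDefault of "unqualified", describe an element
             in no namespace. -->
        <xsd:element ref="p" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```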

This modularization framework consists of a complete set of XHTML schema modules and a set of framework conventions that describe how to use them. The use of the framework conventions is required
for conformance.

The Schema hub document is the base document for the schema. It contains only annotations and modules, which in turn contain <xsd:include> statements referencing other modules. The hub
document corresponds to the DTD "driver" module in XHTML-MOD, but is much simpler. The hub document allows the author to modify the schema's contents by the simple expedient of commenting out modules
that are not used. Note that some modules are always required in order to ensure conformance.

The (non-normative) example hub document described here contains <include> elements for two modules, named "required" and "optional". Each of these included modules is itself a container
module.
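
A hub document of this shape might be sketched as follows. The file names are hypothetical and merely follow the naming pattern used in this framework; the fragment assumes that both container modules exist alongside the hub.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.w3.org/1999/xhtml"
            targetNamespace="http://www.w3.org/1999/xhtml">

  <xsd:annotation>
    <xsd:documentation>
      Example hub document (non-normative sketch).
    </xsd:documentation>
  </xsd:annotation>

  <!-- Container for modules that are always required -->
  <xsd:include schemaLocation="xhtml-required-01.xsd"/>

  <!-- Container for optional modules; comment out this include
       (or individual includes inside the container) to omit them -->
  <xsd:include schemaLocation="xhtml-optional-01.xsd"/>
</xsd:schema>
```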

Module containers, reasonably enough, include other modules. Modules and their containers are organized according to function. Including the hub document, which is a special case of a module
container, there are ten included module containers.

In addition to the module containers listed above, there are around forty schema modules which contain only element definitions and their associated attribute and content model definitions. By
convention, Schema modularizations may contain either <include> statements or element definitions but not both.

To make the contents of any particular schema module easy to identify, this section defines a naming convention for modules. The convention also provides a simple means of
distinguishing modules by language version, which may improve the maintainability of the modules themselves.

The module naming convention adopted here is the same in almost all respects as that used in XHTML-MOD.

Schema modules for XHTML should have names that:

Are supported on all common platforms

Identify the contents of the modules

Identify the language version of the module

Modules used in this modularization framework must have names that conform to the following syntax:

Example 10 - Schema Module Naming Convention

Pattern: languagename-filecontentsdescription-versionnumber.xsd

Example: xhtml-table-01.xsd

Exceptions to this rule are made for the Schema hub modules whose names are the same as above but may omit the content description syllable for brevity.

Version numbers of hub modules may omit the leading zero in the version number, but should include the minor version number.

Example: xhtml-1.1.xsd

In the case where a hub module contains elements or attributes from external namespaces, the name(s) of the external module(s) should be appended to the base language name using the "+"
character.

Example: xhtml+fml-1.0.xsd

This module naming convention is intended also to comply with the required use of the media type in [XHTMLMIME].

In order to establish a physical structure for the composition of the Schema modules that corresponds to the abstract modules in XHTML, a module hierarchy structure has been used to organize the
physical modules. The hierarchy structure looks like this:

These correspond to the divisions of XHTML into abstract modules described in detail in Section 3.2. The hierarchy structure is intended to match the abstract module structure as closely as
possible. This feature is not present in DTD modularization, and is not required for Schema modularization. It does, however, allow the developer to organize the modules in accordance with their
hierarchical structure. The directories listed in Table 2 also correspond exactly to the module container modules in this framework.

Element names are not suffixed in XHTML-MOD. This document uses the notion of element types, which are complexTypes used to define elements and are suffixed with ".type". The ".type" suffix was
used in XHTML-MOD for attribute data types. This is superfluous in XML Schema (since attribute types are arguments to the "type" attribute) and so the suffix is used in a different way in this
framework.

This document establishes a convention for the internal structure of XHTML Schema modules. This convention provides a consistent and predictable way of organizing schema modules internally. This
convention applies also to the hub document, which is itself simply a module of modules, albeit a somewhat specialized one.

Each schema module is composed of several components, some of which are required for functional reasons and some of which provide metadata as a convenience to the author. Not every component is
included in every module.

A consistent commenting convention has been imposed on the modules described here. The purpose of a commenting convention is to allow for generating documentation from the comments (as well as
general comprehension). Documentation elements containing Annotation-level comments are assumed to be of the highest importance and should be used to denote information about the module itself, and
for important notes for developers.

Module-level comments are denoted as usual with SGML comment delimiters "<!--" and "-->". By means of this convention, modules can become self-documenting. Tools for extracting these
comments and formatting them suitably may (hopefully) be developed in the future.
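
The two comment levels might be combined in a module as sketched below. The module content and the comment wording are placeholders assumed for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="http://www.w3.org/1999/xhtml"
            targetNamespace="http://www.w3.org/1999/xhtml">

  <!-- Annotation-level comment: information about the module itself -->
  <xsd:annotation>
    <xsd:documentation>
      Example module (hypothetical). Declares a single placeholder
      element to show the commenting convention.
    </xsd:documentation>
  </xsd:annotation>

  <!-- Module-level comment: a note on the declaration that follows -->
  <xsd:element name="p" type="xsd:string"/>
</xsd:schema>
```

Because the annotation is ordinary XML, a documentation tool can select it with standard XML processing, while the SGML-style comments are visible only to tools that preserve comments.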