This document and translations of it may be copied and furnished
to others, and derivative works that comment on or otherwise explain
it or assist in its implementation may be prepared, copied, published
and distributed, in whole or in part, without restriction of any kind,
provided that the above copyright notice and this paragraph are
included on all such copies and derivative works. However, this
document itself may not be modified in any way, such as by removing
the copyright notice or references to OASIS, except as needed for the
purpose of developing OASIS specifications, in which case the
procedures for copyrights defined in the OASIS Intellectual Property
Rights document must be followed, or as required to translate it into
languages other than English.

The limited permissions granted above are perpetual and will not
be revoked by OASIS or its successors or assigns.

This document and the information contained herein is provided
on an "AS IS" basis and OASIS DISCLAIMS ALL WARRANTIES,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE
USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY
IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE.

Abstract

Status of this Document

This Committee Specification was approved for publication by the
OASIS RELAX NG technical committee. It is a stable document which
represents the consensus of the committee. Comments on this document
may be sent to relax-ng-comment@lists.oasis-open.org.

1. Introduction

This specification describes a compact, non-XML syntax for [RELAX NG].

The goals of this syntax are to:

maximize readability;

support all features of RELAX NG; it must be possible
to translate a schema from the XML syntax to the compact syntax and
back without losing significant information;

support separate translation; a RELAX NG schema may be
spread amongst multiple files; it must be possible to represent each
of the files separately in the compact syntax; the representation of
each file must not depend on the other files.

The body of this document contains an informal description of
the syntax and how it maps onto the XML syntax. Developers should
consult Appendix A for a complete, rigorous
description. Appendix B contains an example in
the form of a schema for RELAX NG.

2. Syntax

The following is a summary of the syntax in EBNF. Square
brackets are used to indicate optionality. The reader may find it
helpful to compare this with the syntax in Section 3 of [RELAX NG]. The start symbol is topLevel.

In order to use a keyword as an identifier, it must be quoted
with \. It is not necessary to quote a keyword that
is used as the name of an element or attribute or as a datatype
parameter.

The value of a literal is the concatenation of the values of its
constituent literalSegments. A literalSegment is always terminated
only by an occurrence of the same delimiter that began it. The
delimiter used to begin a literalSegment may be either one or three
occurrences of a single or double quote character. Newlines are
allowed only in literalSegments delimited by three quote characters.
The value of a literal segment consists of the characters between its
delimiters. One way to get a literal whose value contains both a
single and a double quote is to divide the literal into multiple
literalSegments so that the single and double quote are in separate
literalSegments. Another way is to use a literalSegment delimited by
three single or double quotes.
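
The delimiter rules above can be sketched in Python. This is a minimal illustration rather than the normative tokenizer: the function name read_literal_segment is hypothetical, and the sketch assumes escape sequences have already been handled in an earlier stage.

```python
def read_literal_segment(text, pos=0):
    """Return (value, end) for the literalSegment starting at text[pos].

    Hypothetical helper: the delimiter is one or three occurrences of the
    same quote character, and only triple-quoted segments may span lines.
    """
    quote = text[pos]
    if quote not in "'\"":
        raise ValueError("a literalSegment must start with a quote character")
    # Prefer the three-character delimiter when it is present.
    delim = quote * 3 if text.startswith(quote * 3, pos) else quote
    start = pos + len(delim)
    end = text.find(delim, start)
    if end < 0:
        raise ValueError("unterminated literalSegment")
    value = text[start:end]
    if len(delim) == 1 and "\n" in value:
        raise ValueError("newline in a literalSegment with a one-quote delimiter")
    return value, end + len(delim)
```

A literal whose value contains both kinds of quote can then be obtained either by concatenating the values of two segments or by reading one segment delimited by three quotes.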

There is no notion of operator precedence. It is an error
for patterns to combine the "|", "&", "," and "-" operators without
using parentheses to make the grouping explicit. For example,
foo | bar, baz is not allowed; instead,
either (foo | bar), baz or
foo | (bar, baz) must be used. A similar
restriction applies to name classes and the use of the
"|" and "-" operators. These
restrictions are not expressed in the above EBNF but they are made
explicit in the BNF in Section 1 of Appendix A.

The value of an anyURILiteral specified with
include or external is a URI
reference to a grammar in the compact syntax.

3. Lexical structure

Whitespace is allowed between tokens. Tokens are the strings
occurring in double quotes in the EBNF in Section 2, except
that literalSegment, nsName, CName, identifier and quotedIdentifier are
single tokens.

Comments are also allowed between tokens. Comments start with a
# and continue to the end of the line. Comments
starting with ## are treated specially; see Section 5.

A Unicode character with hex code N
can be represented by the escape sequence
\x{N}. Using such an
escape sequence is completely equivalent to entering the
corresponding character directly. For example,

element \x{66}\x{6f}\x{6f} { empty }

is equivalent to

element foo { empty }

4. Declarations

A datatypes declaration declares a prefix
used in a QName identifying a datatype. For example,

RELAX NG has the feature that if a file does not specify an
ns attribute then the ns
attribute can be inherited from the including file. To support this
feature, the keyword inherit can be specified in
place of the namespace URI in a namespace declaration. For
example,

In the absence of an inherit parameter on
include or external, the default
namespace will be inherited by the referenced file.

In the absence of a default namespace
declaration, a declaration of

default namespace = inherit

is assumed.

5. Annotations

The RELAX NG XML syntax allows foreign elements and attributes
to be used to annotate a RELAX NG schema. A schema in the compact
syntax can also have annotations, which will turn into foreign
elements and attributes when the schema is translated into XML syntax.
The way these annotations are specified depends on where the foreign
elements and attributes are to appear in the translated schema. There
is also a special shorthand syntax when the foreign element is a
documentation element as described in [Compatibility].

5.1. Initial annotations

An annotation in square brackets can be inserted immediately
before a pattern, param, nameClass, grammarContent or includeContent. It has
the following syntax:

annotation ::= "[" annotationAttribute* annotationElement* "]"

annotationAttribute ::= name "=" literal

annotationElement ::= name "[" annotationAttribute* (annotationElement | literal)* "]"

Each of the annotationAttributes will turn into attributes on
the corresponding RELAX NG element. Each of the annotationElements
will turn into initial children of the corresponding RELAX NG element,
except in the case where the RELAX NG element cannot have children, in
which case they will turn into following elements.

5.2. Documentation shorthand

Comments starting with ## are used to specify
documentation elements from the
http://relaxng.org/ns/compatibility/annotations/1.0
namespace as described in [Compatibility]. For example,

## comments can only be used immediately
before a pattern, nameClass, grammarContent or includeContent.
Multiple ## comments are allowed. Multiple adjacent
## comments without any intervening blank lines are
merged into a single
documentation element. Any ##
comments must precede any annotation in square brackets.
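
The merging rule for adjacent ## comments can be sketched in Python as follows. The function name merge_doc_comments is hypothetical, and two details are assumptions of this sketch rather than statements of this section: a single space after the ## marker is dropped, and merged lines are joined with a newline. Appendix A remains the authority on the exact text of the documentation element.

```python
def merge_doc_comments(lines):
    """Group adjacent ## comment lines; each group is one documentation text."""
    groups, current = [], []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("##"):
            text = stripped[2:]
            # Assumption: drop a single space after the ## marker.
            current.append(text[1:] if text.startswith(" ") else text)
        else:
            # A blank line (or any other line) ends the current run.
            if current:
                groups.append("\n".join(current))
                current = []
    if current:
        groups.append("\n".join(current))
    return groups
```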

5.3. Following annotations

A pattern or nameClass may be followed by any number of
followAnnotations with the following syntax:

followAnnotation ::= ">>" annotationElement

Each such annotationElement turns into a following sibling of
the RELAX NG element representing the pattern or nameClass.

5.4. Grammar annotations

An annotationElement may be used in any place where
grammarContent or includeContent is allowed. For example,

If the name of such an element is a keyword, then it must be
quoted with \.

6. Conformance

There are three kinds of conformant implementation.

6.1. Validator

A validator conforming to this specification must be able to
determine whether a textual object is a correct RELAX NG Compact
Syntax schema as specified in Appendix A. It must also
be able to determine for any XML document and for any correct RELAX NG
Compact Syntax schema whether the document is valid (as defined in
[RELAX NG]) with respect to the translation of the schema
into XML syntax. It need not be able to output a representation of
the translation of the schema into XML syntax.

The requirements in the preceding paragraph are subject to the
provisions of the second paragraph of Section 8 of [RELAX NG].

6.2. Structure preserving translator

A structure preserving translator must be able to translate any
correct RELAX NG Compact Syntax schema into an XML document whose data
model is strictly equivalent to the translation specified in Appendix A. For this purpose, two instances of the data model
(as specified in Section 2 of [RELAX NG]) are considered
strictly equivalent if they are identical after applying the
simplifications specified in Sections 4.2, 4.3, 4.4, 4.8, 4.9 and 4.10
of [RELAX NG], with the exception that the base URI in the
context of elements may differ.

Note

The RELAX NG compact syntax is not a representation of the
XML syntax of a RELAX NG schema; rather it is a representation of the
semantics of a RELAX NG schema. Details of the XML syntax that were
judged to be insignificant are not captured in the compact syntax.
For example, in the XML syntax if the name class for an
element or attribute pattern
consists of just a single name, it can be expressed either as a
name attribute or as a name
element; however, in the compact syntax, there is only one way to
express such a name class. The simplifications listed in the previous
paragraph correspond to those syntactic details that are not captured
in the compact syntax.

When comparing two include or
externalRef patterns in the XML source for strict
equivalence, the values of the href attributes are
not compared; instead the referenced XML documents are compared for
strict equivalence.

6.3. Non-structure preserving translator

A non-structure preserving translator must be able to translate
any correct RELAX NG Compact Syntax schema into an XML document whose
data model is loosely equivalent to the translation specified in Appendix A. For this purpose, two instances of the data model
(as specified in Section 2 of [RELAX NG]) are considered
loosely equivalent if they are such that, after applying all the
simplifications specified in Section 4 of [RELAX NG], one
can be transformed into the other merely by reordering and renaming
definitions. After the simplifications have been applied, the context
of elements is ignored when comparing the two instances.

Note

A validator for the compact syntax can be implemented as a
combination of a non-structure preserving translator for the compact
syntax and a validator for the XML syntax.

A. Formal description

1. Syntax

The compact syntax is specified by a grammar in BNF. The
translation into the XML syntax is specified by annotations in the
grammar.

The BNF description consists of a set of production rules. Each
production rule has a left-hand side and right-hand side separated by
::=. The left-hand side specifies the name of a
non-terminal. The right-hand side specifies a list of one or more
alternatives separated by |. Each alternative
consists of a sequence of terminals and non-terminals. A non-terminal
is specified by a name in italics. A terminal is either a literal
string in quotes or a named terminal specified by a name in bold
italics. An alternative can also be specified as ε, which
denotes an empty sequence of tokens.

Each alternative may be followed by references to one or more
named constraints that apply to that alternative.

The translation into XML syntax is specified by associating a
value with each terminal and non-terminal in the derivation. Each
alternative in the BNF may be followed by an expression in curly
braces, which specifies how to compute the value associated with the
left-hand side non-terminal. Each terminal and non-terminal on the
right-hand side can be labelled with a subscript specifying a variable
name. When that variable name is used within the curly braces, it
refers to the value associated with that terminal or non-terminal. If
an alternative consists of a single terminal or non-terminal, then the
expression in curly braces can be omitted; in this case the value of
the left-hand side is the value of that terminal or
non-terminal.

The result of the translation is not a string containing the XML
representation of a RELAX NG schema, but rather is an instance of the
data model described in Section 2 of [RELAX NG]; this
instance will match the RELAX NG schema for RELAX NG.

A textual object is a correct RELAX NG Compact Syntax schema
if:

it matches the grammar specified in this section,

it satisfies all the constraints specified in this section, and

the result of the translation is a correct RELAX NG schema.

The computation of the value of a non-terminal may make use of
one or more arguments. When the name of such a non-terminal occurs on
the left-hand side of a production, it is followed by an argument list
that declares the formal arguments for the non-terminal; these formal
arguments may be referred to by expressions on the right-hand side,
as, for example, in simpleNameClass. When the name
occurs on the right-hand side of a production, it may be followed by
one or more assignments that specify the actual arguments which will
be bound to the formal arguments during the computation of the value
of the non-terminal. Arguments may be passed down implicitly: if
there is no actual argument corresponding to a particular formal
argument, then the formal argument is bound to the value of the
variable with the same name as the name of the formal argument. In
other words, for any variable x, a default actual
argument of x := x is assumed. For example, see nameClassChoice.

In addition to explicit arguments, every non-terminal implicitly
has an argument that specifies an environment for the interpretation
of a pattern. By default, the implicit environment argument to each
non-terminal is the same as its parent. This may be overridden for a
particular non-terminal by including environment in the
argument list. For example, see topLevel and preamble.

An environment specifies:

a mapping from datatype prefixes to
URIs;

a mapping from namespace prefixes to URIs; a namespace
prefix may be mapped to a special value inherit as well as to a
URI;

the default namespace; the default namespace is either
a URI or the special value inherit;

the base URI.

The special value inherit is used to indicate that a
namespace URI should be inherited from the referencing schema.

In the initial environment used for the start symbol,
xml is bound as a namespace prefix to
http://www.w3.org/XML/1998/namespace, and
xsd is bound as a datatype prefix to
http://www.w3.org/2001/XMLSchema-datatypes; the
base URI is determined as specified by [RFC 2396].
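
As an illustration, the four components of an environment and the initial bindings can be modelled in Python roughly as follows. The names Environment, INHERIT and initial_environment are hypothetical. Setting the initial default namespace to inherit follows the rule in Section 4 that default namespace = inherit is assumed when no default namespace is declared. The base URI is simply passed in, since its determination is left to [RFC 2396].

```python
from dataclasses import dataclass
from typing import Dict

INHERIT = object()  # hypothetical sentinel for the special value inherit

@dataclass
class Environment:
    datatype_prefixes: Dict[str, str]       # datatype prefix -> URI
    namespace_prefixes: Dict[str, object]   # namespace prefix -> URI or INHERIT
    default_namespace: object               # URI or INHERIT
    base_uri: str

def initial_environment(base_uri):
    """Build the environment used for the start symbol (sketch)."""
    return Environment(
        datatype_prefixes={"xsd": "http://www.w3.org/2001/XMLSchema-datatypes"},
        namespace_prefixes={"xml": "http://www.w3.org/XML/1998/namespace"},
        default_namespace=INHERIT,
        base_uri=base_uri,
    )
```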

The value of an expression is one of the following:

the constants true, false or
inherit;

a string;

a name (a namespace URI/local name pair);

a qualified-name (a prefix/local name pair);

an XML fragment, where an XML fragment is a pair of a
set of zero or more attributes and a content sequence of zero or more
strings and elements, as described in the data model of [RELAX NG]; an XML fragment is thus the same kind of thing as
what is matched against a RELAX NG pattern;

an environment.

Each terminal and non-terminal has an associated type identified
by a name. A type is simply a set of values. The value of a terminal
or non-terminal is always a member of the set of values identified by
the name of its type. The name of the type of a terminal or
non-terminal is given following the keyword returns before ::= in the
production rule. Similarly, each argument has a type, which is given
immediately before the name of the argument. The value of a
non-terminal may also be specified to be void; no expression is given
for the value of such a non-terminal, nor will the value of such a
non-terminal be used in any expression.

The following types are all disjoint:

Boolean contains true and false;

Inherit contains inherit;

String contains all strings;

Name contains all names;

Qname contains all qualified-names;

Environment contains all environments;

Xml contains all XML fragments.

It is also useful to identify some subtypes of Xml. One type is
a subtype of another if the set of values of the one type is a subset
of the set of values of the other.

Content contains all XML fragments that have an empty
set of attributes;

Elements contains all XML fragments that have an empty
set of attributes and whose content sequence does not have any string
members; it is a subtype of Content;

Element contains all XML fragments that have an empty
set of attributes and whose content sequence consists of a single
element; it is a subtype of Elements;

Attributes contains all XML fragments that have an
empty content sequence;

Attribute contains all XML fragments that have an
empty content sequence and whose attribute set consists of a single
attribute.
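
The subtype definitions above amount to simple membership tests on an XML fragment. The following Python sketch shows one way to express them; Fragment, ElementNode and the is_* predicates are hypothetical names, and elements are reduced to a bare name for brevity.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ElementNode:
    """Hypothetical stand-in for an element in a content sequence."""
    name: str

@dataclass(frozen=True)
class Fragment:
    """An XML fragment: a set of attributes plus a content sequence."""
    attributes: frozenset = frozenset()   # (name, value) pairs
    content: tuple = ()                   # strings and ElementNode objects

def is_content(f):
    return not f.attributes

def is_elements(f):
    return is_content(f) and all(isinstance(c, ElementNode) for c in f.content)

def is_element(f):
    return is_elements(f) and len(f.content) == 1

def is_attributes(f):
    return not f.content

def is_attribute(f):
    return is_attributes(f) and len(f.attributes) == 1
```

Because each predicate is defined in terms of the broader one, the subtype relationships (Element within Elements within Content, Attribute within Attributes) hold by construction.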

In addition it is useful to have the following union type.

NamespaceURI is the union of String and
Inherit.

Expressions use the following notation:

x denotes the value of the variable
named x;

( ) denotes an empty
XML fragment;

(x, y) denotes the concatenation of the XML fragments
x and y; the attributes of the
resulting XML fragment consist of the union of the attributes of
x and y and the content sequence
consists of the concatenation of the content sequence of x and y (this is the same as the meaning of
the comma operator in the compact syntax);

environment denotes the value of the implicit
environment argument;

true, false and inherit are used
to denote the corresponding special constant;

"xyzzy" denotes a string
consisting of the characters
xyzzy;

f(x, y, . . . ) denotes the
value of the function f applied to the arguments x,
y, . . . ; the available functions are as
follows:

returns an XML fragment consisting of an element with
name y and attributes and children z; the namespace map of the context of the element contains
all the mappings from namespace prefixes to URIs from x except those mappings that map to inherit or to the
empty string; the namespace map may contain an additional mapping from
an implementation-dependent prefix to the compatibility annotations
URI; the default namespace of namespace map of the context of the
element is the RELAX NG namespace URI
http://relaxng.org/ns/structure/1.0; the base URI
of the context of the element is not
constrained;

returns a
URI; y is a URI reference of a resource containing a
schema in the syntax described by this specification; the returned URI
is the URI of a resource containing the translation of this schema
into RELAX NG XML syntax; y is resolved into an
absolute form as described in section 5.2 of [RFC 2396]
using the base URI from the environment x; the
restriction on the use of fragment identifiers specified in section
4.5 of [RELAX NG] applies to y;

returns an element
whose name is the name of y, whose attributes are
the union of the first member of x and the
attributes of y, and whose children are the
concatenation of the second member of x and the
children of y;

returns a
set of two attributes; both attributes have the empty string as their
namespace URI; one attribute has local name
datatypeLibrary and value x; the
other attribute has local name
type and value y;

returns the name of the documentation element
defined in [Compatibility], that is, the name with namespace
URI http://relaxng.org/ns/compatibility/annotations/1.0 and
local name documentation;

x ? y : z is a conditional
expression, which denotes y if x
is true and z if x is
false;

<foo x>y</foo> denotes
an XML fragment containing an element from the RELAX NG namespace with
local name foo, attributes x and
content y; the context of the element is determined
from the implicit environment argument as specified for the element
function above.
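
The ( ) and (x, y) notations defined earlier in this list can be illustrated with a small Python sketch. It assumes, for illustration only, that an XML fragment is represented as a pair of a frozenset of attributes and a tuple of content items; the names EMPTY and concat are hypothetical.

```python
EMPTY = (frozenset(), ())  # the empty XML fragment, written ( )

def concat(x, y):
    """(x, y): union of the attribute sets, concatenation of the content."""
    x_attrs, x_content = x
    y_attrs, y_content = y
    return (x_attrs | y_attrs, x_content + y_content)
```

Attributes are unordered, so they combine by set union, while content order is preserved, with the content of x preceding the content of y.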

It is an error if the value of the literal used with
external or include declaration
does not meet the requirements for the anyURI symbol specified in
Section 3 of [RELAX NG].

2. Lexical structure

This section describes how to transform the textual
representation of a RELAX NG schema in compact syntax into a sequence
of tokens, which can be parsed using the grammar specified in Section 1.

There are five distinct stages, which are logically consecutive;
the result of each stage is the input to the following stage.

2.1. Character encoding

The textual representation of the RELAX NG schema in compact
syntax may be either a sequence of Unicode characters or a sequence of
bytes. In the latter case, the first stage is to transform the
sequence of bytes to the sequence of characters. The sequence of
bytes may have associated metadata specifying the encoding. One
example of such metadata is the charset parameter
in a MIME media type [RFC 2046].
If there is such metadata, then the specified
encoding is used. Otherwise, the first two bytes of the sequence are
examined. If these are #xFF followed by #xFE or #xFE followed by
#xFF, then an encoding of UTF-16 [Unicode] will be
used, little-endian in the former case, big-endian in the latter case.
Otherwise an encoding of UTF-8 [Unicode] is used. It
is an error if the sequence of bytes is not a legal sequence in the
selected encoding.
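
The selection order described above can be sketched in Python. The function name decode_schema and its charset parameter are hypothetical; charset stands in for whatever external metadata is available. The byte order mark is deliberately left in the decoded text, since its removal belongs to the next stage.

```python
def decode_schema(data, charset=None):
    """Decode schema bytes following the order in this section (sketch)."""
    if charset is not None:
        encoding = charset            # external metadata takes priority
    elif data[:2] == b"\xff\xfe":
        encoding = "utf-16-le"        # #xFF #xFE: little-endian UTF-16
    elif data[:2] == b"\xfe\xff":
        encoding = "utf-16-be"        # #xFE #xFF: big-endian UTF-16
    else:
        encoding = "utf-8"
    # errors="strict" makes an illegal byte sequence raise UnicodeDecodeError.
    return data.decode(encoding, errors="strict")
```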

2.2. BOM stripping

If the first character of the sequence is a byte order mark
(#xFEFF), then it is removed.

2.3. Newline normalization

Representations of newlines are normalized to a newline
marker. Specifically, each occurrence of

a #xA character,

a #xD character that is not followed by a #xA character, or

a #xD, #xA character pair

is transformed to a newline marker. The result of this stage is
thus a sequence whose members are Unicode characters and newline
markers.
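
A minimal Python sketch of this stage follows. The sentinel NEWLINE is a hypothetical stand-in for the newline marker, chosen so that markers stay distinct from any #xA or #xD characters introduced by later stages.

```python
NEWLINE = object()  # hypothetical sentinel standing in for the newline marker

def normalize_newlines(text):
    """Return a list of characters and NEWLINE markers."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\r":
            # A #xD, #xA pair collapses to a single marker.
            if i + 1 < len(text) and text[i + 1] == "\n":
                i += 1
            out.append(NEWLINE)
        elif ch == "\n":
            out.append(NEWLINE)
        else:
            out.append(ch)
        i += 1
    return out
```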

2.4. Escape interpretation

In this stage, each escape sequence of the form
\x{n}, where
n is a hexadecimal number, is replaced by
the character with Unicode code n. The
escape sequence must match the production escapeSequence; the value computed in the BNF is the Unicode
code of the replacement character. It is an error if the replacement
character does not match the Char production of
[XML 1.0]. It is an error if the input character
sequence contains a character sequence escapeOpen
that does not start an escapeSequence. After an
escape sequence has been replaced, scanning for escape sequences
continues following the replacement character; thus
\x{5C}x{5C} is transformed to
\x{5C} not to \. The
replacement for \x{A} or \x{D}
is a character, as for all other escape sequences, not a newline
marker. Thus the sequence that results from this stage can contain #xA
and #xD characters as well as newline markers.
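
The scanning behaviour described above, in particular that the replacement character is never rescanned, can be sketched in Python. This is a simplified illustration under stated assumptions: the regular expressions approximate escapeOpen and escapeSequence, the input is a plain string rather than a sequence containing newline markers, and the check against the Char production of [XML 1.0] is omitted. The name interpret_escapes is hypothetical.

```python
import re

# Assumption: "\" followed by one or more "x" and "{" opens an escape.
ESCAPE_OPEN = re.compile(r"\\x+\{")
ESCAPE = re.compile(r"\\x+\{([0-9A-Fa-f]+)\}")

def interpret_escapes(text):
    """Replace each escape sequence by the character it denotes."""
    out = []
    pos = 0
    while True:
        opener = ESCAPE_OPEN.search(text, pos)
        if opener is None:
            out.append(text[pos:])
            return "".join(out)
        match = ESCAPE.match(text, opener.start())
        if match is None:
            raise ValueError("escape open that does not start an escape sequence")
        out.append(text[pos:match.start()])
        out.append(chr(int(match.group(1), 16)))
        # Continue after the replacement; it is never rescanned.
        pos = match.end()
```

Applied to \x{5C}x{5C}, the sketch yields \x{5C}, matching the example in the text.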

Note

The \ character that opens an escape
sequence may be followed by more than one x. This
makes it possible for there to be a reversible transformation that
maps a schema to a form containing only ASCII characters; the
transformation adds an extra x to each
existing escape sequence, and replaces every non-ASCII character by an
escape sequence with exactly one x.
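
The reversible transformation mentioned in this note might look as follows in Python. The function names to_ascii and from_ascii and the regular expression are assumptions of this sketch, which also presumes the input schema contains only well-formed escape sequences.

```python
import re

# Assumption: approximates an escape sequence with one or more "x".
ESCAPE = re.compile(r"\\(x+)\{([0-9A-Fa-f]+)\}")

def to_ascii(schema):
    """Add an x to existing escapes, then escape non-ASCII characters."""
    lengthened = ESCAPE.sub(
        lambda m: "\\x" + m.group(1) + "{" + m.group(2) + "}", schema)
    return "".join(
        ch if ord(ch) < 128 else "\\x{%X}" % ord(ch) for ch in lengthened)

def from_ascii(text):
    """Undo to_ascii: expand single-x escapes, shorten all others by one x."""
    def restore(m):
        xs, code = m.group(1), m.group(2)
        if len(xs) == 1:
            return chr(int(code, 16))
        return "\\" + xs[1:] + "{" + code + "}"
    return ESCAPE.sub(restore, text)
```

After to_ascii, every escape with exactly one x was introduced by the transformation, which is what allows from_ascii to tell the two kinds apart.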

If the longest such initial subsequence matches separator, this subsequence is discarded. Otherwise, a
single non-terminal is produced from this initial subsequence. In
either case, the tokenization proceeds with the rest of the sequence.
It is an error if there is no such initial subsequence.

The production rules below use some additional notation. Square
brackets enclose a character class. A character class of the form
[^chars] specifies any
legal XML character that does not occur in
chars. A legal XML character is a
character that matches the Char production of [XML 1.0]. A character class of the form
[chars], where
chars does not begin with
^, specifies any single character that occurs in
chars. XML hexadecimal character
references are used to denote a single character, as in XML. A
newline marker is denoted by &newline;. NCName
is defined in [XML Namespaces].

The value of a variable bound to a character class is a string
of length 1 containing the character that matched the character class;
if the character class matches a newline marker, then the string
contains the character #xA.