This specification is a draft reflecting consensus reached by members of
the yaml-core
mailing list. Any questions regarding this draft should be
raised on this list. We expect all further changes will be strictly
limited to wording corrections and fixing production bugs.

We wish to thank implementers who have tirelessly tracked earlier
versions of this specification, and our fabulous user community whose
feedback has both validated and clarified our direction.

Abstract

YAML™ (rhymes with “camel”) is a
human-friendly, cross-language, Unicode-based data serialization
language designed around the common native data structures of agile
programming languages. It is broadly useful for programming needs
ranging from configuration files to Internet messaging to object
persistence to data auditing. Together with the Unicode standard for characters,
this specification provides all the information necessary to understand
YAML Version 1.1 and to create programs that process YAML
information.

Chapter 1. Introduction

“YAML Ain't Markup Language” (abbreviated YAML) is a data
serialization language designed to be human-friendly and work well with
modern programming languages for common everyday tasks. This
specification is both an introduction to the YAML language and the
concepts supporting it and also a complete reference of the information
needed to develop applications
for processing YAML.

Open, interoperable and readily understandable tools have advanced
computing immensely. YAML was designed from the start to be useful and
friendly to people working with data. It uses Unicode printable characters, some of
which provide structural information while the rest contain the data
itself. YAML achieves a unique cleanness by minimizing the number of
structural characters and allowing the data to show itself in a natural
and meaningful way. For example, indentation may be used for structure, colons separate
mapping
key: value pairs, and dashes are used to
“bullet” lists.

There are myriad flavors of data structures, but they can all be
adequately represented with three
basic primitives: mappings (hashes/dictionaries), sequences
(arrays/lists) and scalars (strings/numbers). YAML leverages these
primitives and adds a simple typing system and aliasing mechanism to form a
complete language for serializing any data structure. While most
programming languages can use YAML for data serialization, YAML excels in
those languages that are fundamentally built around the three basic
primitives. These include the new wave of agile languages such as Perl,
Python, PHP, Ruby and JavaScript.

There are hundreds of different languages for programming, but only a
handful of languages for storing and transferring data. Even though its
potential is virtually boundless, YAML was specifically created to work
well for common use cases such as: configuration files, log files,
interprocess messaging, cross-language data sharing, object persistence
and debugging of complex data structures. When data is easy to view and
understand, programming becomes a simpler task.

1.1. Goals

The design goals for YAML are:

YAML is easily readable by humans.

YAML matches the native data structures of agile languages.

YAML data is portable between programming languages.

YAML has a consistent model to support generic tools.

YAML supports one-pass processing.

YAML is expressive and extensible.

YAML is easy to implement and use.

1.2. Prior Art

YAML's initial direction was set by the data serialization and markup
language discussions among SML-DEV members. Later
on it directly incorporated experience from Brian Ingerson's Perl
module Data::Denter. Since then YAML has matured through ideas and
support from its user community.

The syntax of YAML was motivated by Internet Mail (RFC0822) and remains
partially compatible with that standard. Further, borrowing from MIME
(RFC2045), YAML's top-level production is a stream of independent documents;
ideal for message-based distributed processing systems.

YAML's core type system is based on the requirements of agile languages
such as Perl, Python, and Ruby. YAML directly supports both collection
(mapping, sequence) and scalar
content. Support for common types enables programmers to use
their language's native data structures for YAML manipulation, instead
of requiring a special document object model (DOM).

YAML was designed to support incremental interfaces that include both
input pull-style and output push-style one-pass (SAX-like) interfaces.
Together these enable YAML to support the processing of large documents,
such as a transaction log, or continuous streams, such as a feed from a
production machine.

1.3. Relation to XML

Newcomers to YAML often search for its correlation to the eXtensible
Markup Language (XML). While the two languages may actually compete in
several application domains, there is no direct correlation between
them.

YAML is primarily a data serialization language. XML was designed to be
backwards compatible with the Standard Generalized Markup Language
(SGML) and thus had many design constraints placed on it that YAML does
not share. Inheriting SGML's legacy, XML is designed to support
structured documentation, where YAML is more closely targeted at data
structures and messaging. Where XML is a pioneer in many domains, YAML
is the result of lessons learned from XML and other technologies.

It should be mentioned that there are ongoing efforts to define
standard XML/YAML mappings. This generally requires that a subset of
each language be used. For more information on using both XML and YAML,
please visit http://yaml.org/xml/index.html.

1.4. Terminology

This specification uses key words based on RFC2119 to indicate
requirement level. In particular, the following words are used to
describe the actions of a YAML processor:

Should

The word should, or the
adjective recommended,
mean that there could be reasons for a YAML processor to deviate from the
behavior described, but that such deviation could hurt
interoperability and should therefore be advertised with
appropriate notice.

Must

The word must, or the term
required or shall, mean that the behavior
described is an absolute requirement of the specification.

Chapter 2. Preview

This section provides a quick glimpse into the expressive power of YAML.
It is not expected that the first-time reader grok all of the examples.
Rather, these selections are used as motivation for the remainder of the
specification.

2.2. Structures

YAML uses three dashes (“---”) to separate documents
within a stream. Three dots (“...”) indicate the end of
a document without starting a new one, for use in communication
channels. Comment lines begin with the octothorpe “#” (usually
called the “hash” or “pound” sign).
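
The marker rules above can be sketched in a few lines of Python. This is only an illustration, not a YAML parser: the function name split_documents is hypothetical, and the sketch assumes that markers occupy a whole line and that comment and blank lines can be dropped, which is not safe inside block scalars.

```python
def split_documents(stream):
    """Split a character stream into lists of content lines, one list per document.

    Simplified sketch: recognizes only whole-line "---" and "..." markers.
    """
    documents = []
    current = []
    in_document = False
    for line in stream.splitlines():
        stripped = line.rstrip()
        if stripped == "---":
            # "---" separates documents: close the open one, start the next.
            if in_document:
                documents.append(current)
            current = []
            in_document = True
        elif stripped == "...":
            # "..." ends a document without starting a new one.
            if in_document:
                documents.append(current)
            current = []
            in_document = False
        elif not stripped or stripped.lstrip().startswith("#"):
            # Blank lines and comment lines carry no content in this sketch.
            continue
        else:
            in_document = True  # content with no marker starts a document implicitly
            current.append(line)
    if in_document:
        documents.append(current)
    return documents


stream = "# first\n---\n- a\n- b\n\n# second\n---\n- c\n...\n"
documents = split_documents(stream)
```

A repeated “...” line is harmless here, because the second marker finds no open document to close.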

Example 2.7. Two Documents in a Stream (each with a leading comment)

Chapter 3. Processing YAML Information

YAML is both a text format and a method for presenting any data structure in this format.
Therefore, this specification defines two concepts: a class of data
objects called YAML representations, and a syntax for
presenting YAML representations as a series of
characters, called a YAML stream. A YAML processor is a tool for converting
information between these complementary views. It is assumed that a YAML
processor does its work on behalf of another module, called an application. This chapter describes the
information structures a YAML processor must provide to or obtain from
the application.

YAML information is used in two ways: for machine processing, and for
human consumption. The challenge of reconciling these two perspectives is
best done in three distinct translation stages: representation, serialization, and presentation. Representation addresses how YAML
views native data structures to achieve portability between programming
environments. Serialization
concerns itself with turning a YAML representation into a serial form,
that is, a form with sequential access constraints. Presentation deals with the formatting
of a YAML serialization as a
series of characters in a human-friendly manner.

Figure 3.1. Processing Overview

A YAML processor need not expose the serialization or representation stages. It may
translate directly between native data structures and a character
stream
(dump and load in the diagram above). However, such a
direct translation should take place so that the native data structures
are constructed only from
information available in the representation.

3.1. Processes

This section details the processes shown in the diagram above. Note that a
YAML processor need not provide
all these processes. For example, a YAML library may provide only YAML
input ability, for loading configuration files, or only output ability,
for sending data to other applications.

3.1.1. Represent

YAML represents any native
data structure using three node
kinds: the sequence, the mapping and
the scalar. By sequence we mean an ordered
series of entries, by mapping we mean an unordered
association of unique keys to
values, and by scalar we mean any datum with
opaque structure presentable as
a series of Unicode characters. Combined, these primitives generate
directed graph structures. These primitives were chosen because they
are both powerful and familiar: the sequence corresponds to a
Perl array and a Python list, the mapping corresponds to a Perl
hash table and a Python dictionary. The scalar represents strings,
integers, dates and other atomic data types.

Each YAML node requires, in addition to its kind and content, a tag specifying
its data type. Type specifiers are either global URIs, or are local in scope to a single application. For example, an integer
is represented in YAML with a scalar plus the global tag
“tag:yaml.org,2002:int”. Similarly, an invoice object,
particular to a given organization, could be represented as a
mapping together with the local tag “!invoice”. This simple model
can represent any data structure independent of programming language.
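
As an illustration of this model, the following Python sketch turns a few native values into nodes that carry a kind, a tag and content. The Node and Invoice classes and the represent function are hypothetical names chosen for this example; the sketch covers only strings, integers, lists and dictionaries, and it does not handle shared or cyclic structures.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(eq=False)
class Node:
    kind: str     # "scalar", "sequence" or "mapping"
    tag: str      # global URI tag, or a local tag starting with "!"
    content: Any  # str, list of Node, or list of (key Node, value Node) pairs


@dataclass
class Invoice:    # hypothetical application-specific type
    number: int
    customer: str


def represent(value):
    """Build a node for a native value (simplified; no aliasing or cycles)."""
    if isinstance(value, Invoice):
        pairs = [(represent(k), represent(v)) for k, v in vars(value).items()]
        return Node("mapping", "!invoice", pairs)
    if isinstance(value, str):
        return Node("scalar", "tag:yaml.org,2002:str", value)
    if isinstance(value, int) and not isinstance(value, bool):
        return Node("scalar", "tag:yaml.org,2002:int", str(value))
    if isinstance(value, list):
        return Node("sequence", "tag:yaml.org,2002:seq", [represent(v) for v in value])
    if isinstance(value, dict):
        pairs = [(represent(k), represent(v)) for k, v in value.items()]
        return Node("mapping", "tag:yaml.org,2002:map", pairs)
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")


node = represent({"total": 42})
invoice_node = represent(Invoice(number=1, customer="Example Corp"))
```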

3.2.1. Representation Graph

YAML's representation of
native data is a rooted, connected, directed graph of tagged nodes. By
“directed graph” we mean a set of nodes and
directed edges (“arrows”), where each edge connects one
node
to another (see a formal
definition). All the nodes must be reachable from
the root node via such edges.
Note that the YAML graph may include cycles, and a node may have
more than one incoming edge.

3.2.1.1. Nodes

YAML nodes have content of one of three
kinds: scalar, sequence, or
mapping. In addition, each node has a tag which serves to
restrict the set of possible values which the node's content can
have.

Scalar

The content of a scalar node is an
opaque datum that can be presented as a series of zero or
more Unicode characters.

Sequence

The content of a sequence node is an
ordered series of zero or more nodes. In particular, a
sequence may contain the same node more than once or it could
even contain itself (directly or indirectly).

Mapping

The content of a mapping node is an
unordered set of key: value
node pairs, with the restriction that each of the keys is
unique. YAML places no
further restrictions on the nodes. In particular, keys may be
arbitrary nodes, the same node may be used as the value of
several key: value pairs, and a mapping could even
contain itself as a key or a value (directly or indirectly).

When appropriate, it is convenient to consider sequences and
mappings together, as collections. In this view,
sequences are treated as mappings with integer keys starting at
zero. Having a unified collections view for sequences and mappings
is helpful both for creating practical YAML tools and APIs and for
theoretical analysis.
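
A tool could expose this unified view with a helper such as the one below. It is a minimal sketch using native Python lists and dictionaries in place of YAML nodes; the name entries is an assumption made for this example.

```python
def entries(collection):
    """Return (key, value) pairs for a mapping or a sequence.

    Sequences are viewed as mappings whose keys are integers starting at zero.
    """
    if isinstance(collection, dict):
        return list(collection.items())
    if isinstance(collection, (list, tuple)):
        return list(enumerate(collection))
    raise TypeError("not a collection")
```

With this helper, generic code such as a path walker can treat both collection kinds uniformly.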

3.2.1.2. Tags

YAML represents type
information of native data structures with a simple identifier,
called a tag. Global
tags are URIs and hence
globally unique across all applications. The
“tag:” URI
scheme is
recommended for all global YAML tags. In contrast, local tags are specific to a single
application. Local tags
start with “!”, are not URIs and are not
expected to be globally unique. YAML provides a “TAG” directive to
make tag notation less verbose; it also offers easy migration from
local to global tags. To ensure this, local tags are restricted to
the URI character set and use URI character escaping.

YAML does not mandate any special relationship between different
tags that begin with the same substring. Tags ending with URI
fragments (containing “#”) are no exception; tags that
share the same base URI but differ in their fragment part are
considered to be different, independent tags. By convention,
fragments are used to identify different “variants” of
a tag, while “/” is used to define nested tag
“namespace” hierarchies. However, this is merely a
convention, and each tag may employ its own rules. For example,
Perl tags may use “::” to express namespace
hierarchies, Java tags may use “.”, etc.

3.2.1.3. Nodes Comparison

Since YAML mappings require key uniqueness, representations must include a
mechanism for testing the equality of nodes. This is non-trivial
since YAML allows various ways to format a given scalar content. For
example, the integer eleven can be written as “013”
(octal) or “0xB” (hexadecimal). If both forms are
used as keys in the same mapping, only a YAML
processor which recognizes
integer formats would correctly
flag the duplicate key as an error.

Canonical Form

YAML supports the need for scalar equality by
requiring that every scalar tag
must specify a mechanism for producing the canonical form of any
formatted content. This
form is a Unicode character string which presents the content and can be used for equality
testing. While this requirement is stronger than a well
defined equality operator, it has other uses, such as the
production of digital signatures.
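
To make the role of the canonical form concrete, the sketch below canonicalizes a few integer formats and uses the result to detect duplicate mapping keys. It is a simplification made for illustration: only decimal, octal (leading “0”) and hexadecimal (“0x”) spellings are handled, and the function names are hypothetical.

```python
def canonical_int(text):
    """Return the canonical decimal form of an integer scalar (simplified)."""
    sign, body = "", text
    if text[:1] in ("+", "-"):
        sign, body = text[0], text[1:]
    if body.lower().startswith("0x"):
        value = int(body[2:], 16)
    elif len(body) > 1 and body.startswith("0"):
        value = int(body[1:], 8)
    else:
        value = int(body, 10)
    # Negative zero collapses to "0" so that equal values share one form.
    return ("-" if sign == "-" and value else "") + str(value)


def find_duplicate_keys(keys):
    """Return (first spelling, duplicate spelling) pairs among integer keys."""
    seen = {}
    duplicates = []
    for key in keys:
        form = canonical_int(key)
        if form in seen:
            duplicates.append((seen[form], key))
        else:
            seen[form] = key
    return duplicates
```

A processor that compared the keys only as character strings would miss the clash between “013” and “0xB”.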

Two nodes are identical only when they
represent the same
native data structure. Typically, this corresponds to a
single memory address. Identity should not be confused with
equality; two equal nodes need not have
the same identity. A YAML processor may treat equal
scalars as if they were identical. In
contrast, the separate identity of two distinct but equal
collections must be preserved.
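
The difference matters as soon as a collection is modified after loading. The following Python fragment is only an analogy using native lists, where “==” tests equality and “is” tests identity.

```python
first = ["a", "b"]
second = ["a", "b"]  # equal to first, but a distinct collection
shared = first       # the same collection under another name

equal_but_distinct = (first == second) and (first is not second)
document = {"x": first, "y": second, "z": shared}

# Changing one collection must affect exactly the nodes identical to it.
first.append("c")
```

After the append, the values under “x” and “z” both show the new entry while the value under “y” does not. Had a processor merged the two equal lists into one, the change would have leaked into “y”.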

3.2.2. Serialization Tree

To express a YAML representation using a serial API,
it is necessary to impose an order on mapping keys and employ
alias
nodes to indicate a subsequent occurrence of a previously
encountered node. The result of this process is a serialization tree, where each
node
has an ordered set of children. This tree can be traversed for a
serial event based API. Construction of native structures from
the serial interface should not use key
order or anchors for the preservation of important data.
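
The two steps, ordering keys and replacing repeated nodes with aliases, can be sketched as follows. This is an illustrative sketch over native Python lists and dictionaries, not a prescribed API: the event names, the anchor naming scheme and the choice to sort keys by their string form are all assumptions.

```python
from itertools import count


def serialize(root):
    """Flatten nested lists, dicts and scalars into a list of event tuples."""
    occurrences = {}

    def scan(node):
        # Pass 1: count how often each collection is reached.
        if isinstance(node, (list, dict)):
            occurrences[id(node)] = occurrences.get(id(node), 0) + 1
            if occurrences[id(node)] > 1:
                return  # already scanned; this also stops on cycles
            for child in (node if isinstance(node, list) else node.values()):
                scan(child)

    anchors = {}
    names = count(1)
    events = []

    def emit(node):
        # Pass 2: emit events, anchoring repeated collections on first use.
        if not isinstance(node, (list, dict)):
            events.append(("scalar", str(node)))
            return
        if id(node) in anchors:
            events.append(("alias", anchors[id(node)]))
            return
        anchor = None
        if occurrences[id(node)] > 1:
            anchor = anchors[id(node)] = f"id{next(names):03d}"
        if isinstance(node, list):
            events.append(("sequence-start", anchor))
            for child in node:
                emit(child)
            events.append(("sequence-end",))
        else:
            events.append(("mapping-start", anchor))
            for key in sorted(node, key=str):  # impose an order on the keys
                emit(key)
                emit(node[key])
            events.append(("mapping-end",))

    scan(root)
    emit(root)
    return events


shared = ["x"]
events = serialize({"a": shared, "b": shared})
```

The key order and the anchor names produced here are serialization details. Code that rebuilds native structures from the events should not attach meaning to either.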

Tag resolution is specific to the application, hence a YAML processor should provide a mechanism
allowing the application to
specify the tag resolution rules. It is recommended that nodes having
the “!” non-specific tag should be resolved as
“tag:yaml.org,2002:seq”,
“tag:yaml.org,2002:map” or
“tag:yaml.org,2002:str” depending on the node's kind.
This convention allows the author of a YAML character stream to
exert some measure of control over the tag resolution process. By
explicitly specifying that a plain scalar has the
“!” non-specific tag, the node is resolved as a string,
as if it were quoted or written in a block style. Note,
however, that each application may override this
behavior. For example, an application may automatically detect
the type of programming language used in source code presented as a non-plain scalar and resolve it accordingly.

When a node has more than one occurrence (using an anchor and
alias
nodes), tag resolution must depend only on the path to the
first occurrence of the node. Typically, the path leading to a node is
sufficient to determine its specific tag. In cases where the path
does not imply a single specific tag, the resolution also needs to
consider the node content to select amongst the set of possible
tags.
Thus, plain scalars may be matched against a set of
regular expressions to provide automatic resolution of integers,
floats, timestamps and similar types. Similarly, the content of
mapping
nodes may be matched against sets of expected keys to
automatically resolve points, complex numbers and similar types.
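
Content-based resolution of plain scalars might look like the sketch below. The regular expressions are deliberately simplified and are not the patterns of the YAML tag repository; the resolver table and function name are hypothetical.

```python
import re

# Ordered list of (tag, pattern); the first full match wins.
IMPLICIT_RESOLVERS = [
    ("tag:yaml.org,2002:int", re.compile(r"[-+]?(0|[1-9][0-9]*)")),
    ("tag:yaml.org,2002:float", re.compile(r"[-+]?[0-9]+\.[0-9]*([eE][-+]?[0-9]+)?")),
]


def resolve_scalar_tag(text, plain=True):
    """Pick a tag for an untagged scalar; non-plain scalars resolve to strings."""
    if plain:
        for tag, pattern in IMPLICIT_RESOLVERS:
            if pattern.fullmatch(text):
                return tag
    return "tag:yaml.org,2002:str"
```

An application with different needs would supply its own table, which is why a processor should let the application specify the resolution rules.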

The combined effect of these rules is to ensure that tag resolution
can be performed as soon as a node is first encountered in
the stream, typically before its content is
parsed. Also, tag resolution only
requires referring to a relatively small number of previously parsed
nodes. Thus, tag resolution in one-pass processors is both possible and
practical.

nb-

A production matching one or more characters starting and ending
with a non-break
character.

s-

A production matching one or more characters starting and ending
with a space character.

ns-

A production matching one or more characters starting and ending
with a non-space character.

X-Y-

A production matching a sequence of one or more characters,
starting with an X-
character and ending with a
Y- character.

l-

A production matching one or more lines (shorthand for
s-b-).

X+,
X-Y+

A production as above, with the additional property that the
indentation level
used is greater than the specified n parameter.

Productions are generally introduced in a “bottom-up” order;
basic productions are specified before the more complex productions using
them. Examples accompanying the productions list display sample YAML text
side-by-side with equivalent YAML text using only flow collections and
double quoted
scalars. For improved readability, the equivalent YAML text
uses the “!!seq”, “!!map” and
“!!str” shorthands instead of the verbatim “!<tag:yaml.org,2002:seq>”,
“!<tag:yaml.org,2002:map>” and
“!<tag:yaml.org,2002:str>” forms. These types are
used to resolve all untagged nodes, except for a few
examples that use the “!!int” and “!!float”
types.

4.1. Characters

4.1.1. Character Set

YAML streams
use the printable
subset of the Unicode character set. On input, a YAML processor must accept all printable
ASCII characters, the space, tab,
line break, and
all Unicode characters beyond #x9F. On output, a YAML processor must only produce these
acceptable characters, and should also escape all non-printable Unicode
characters. The allowed character range explicitly excludes the
surrogate block #xD800-#xDFFF, DEL
#x7F, the C0 control block
#x0-#x1F, the C1 control block
#x80-#x9F, #xFFFE and
#xFFFF. Any such characters must be presented using escape
sequences.
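
The accepted range can be expressed as a small predicate. The sketch below follows the ranges listed above and additionally assumes that the next-line character #x85 is accepted as a line break even though it lies in the C1 block. The escape spellings (\x, \u, \U) are those of double quoted scalars, and the function names are hypothetical.

```python
def is_printable(ch):
    """Return True if a character may appear unescaped in a YAML stream."""
    cp = ord(ch)
    return (
        cp in (0x09, 0x0A, 0x0D, 0x85)  # tab and line breaks (#x85 is an assumption)
        or 0x20 <= cp <= 0x7E           # space and printable ASCII
        or 0xA0 <= cp <= 0xD7FF         # beyond #x9F, up to the surrogate block
        or 0xE000 <= cp <= 0xFFFD       # excludes #xFFFE and #xFFFF
        or 0x10000 <= cp <= 0x10FFFF
    )


def escape_non_printable(text):
    """Replace non-printable characters with escape sequences (simplified)."""
    pieces = []
    for ch in text:
        cp = ord(ch)
        if is_printable(ch):
            pieces.append(ch)
        elif cp <= 0xFF:
            pieces.append(f"\\x{cp:02X}")
        elif cp <= 0xFFFF:
            pieces.append(f"\\u{cp:04X}")
        else:
            pieces.append(f"\\U{cp:08X}")
    return "".join(pieces)
```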

4.1.2. Character Encoding

All characters mentioned in this specification are Unicode code
points. Each such code point is written as one or more octets
depending on the character
encoding used. Note that in UTF-16, characters above
#xFFFF are written as four octets, using a
surrogate pair. A YAML processor must support the UTF-16 and
UTF-8 character encodings. If a character stream does not begin with a byte order mark
(#xFEFF), the character encoding shall be
UTF-8. Otherwise it shall be either UTF-8, UTF-16 LE or UTF-16 BE
as indicated by the byte order mark. On output, it is recommended
that a byte order mark should only be emitted for UTF-16 character
encodings. Note that the UTF-32 encoding is explicitly not
supported. For more information about the byte order mark and the
Unicode character encoding schemes see the Unicode FAQ.
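
The detection rule can be sketched with the byte order mark constants from Python's standard codecs module. The function names are hypothetical, and the sketch does nothing special for unsupported encodings such as UTF-32.

```python
import codecs


def detect_encoding(data):
    """Choose the character encoding of a byte stream from its byte order mark."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return "utf-8"  # no byte order mark: the encoding shall be UTF-8


def decode_stream(data):
    """Decode a byte stream and drop the leading byte order mark, if any."""
    text = data.decode(detect_encoding(data))
    return text[1:] if text.startswith("\ufeff") else text
```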

4.1.5. Miscellaneous Characters

An ignored space character outside scalar content. Such spaces are
used for indentation and separation between tokens. To maintain
portability, tab characters
must not be used in these cases, since different systems treat
tabs differently. Note that most modern editors may be
configured so that pressing the tab key results in the
insertion of an appropriate number of spaces.

A URI character for tags, as specified in RFC2396 with
the addition of the “[” and “]” for presenting
IPv6 addresses as proposed in RFC2732. A
limited form of 8-bit escaping is available using the “%”
character. By convention, URIs containing 16 and 32 bit Unicode
characters are encoded in UTF-8, and then each octet is
written as a separate character.
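
The convention of encoding to UTF-8 first and then escaping each octet matches what Python's urllib.parse.quote does by default, so it can serve as an illustration. The tag below uses example.com as a placeholder authority, and the set of characters passed as safe is an assumption chosen for this one tag, not the full URI character set.

```python
from urllib.parse import quote, unquote

raw_tag = "tag:example.com,2002:café"

# "é" becomes the two UTF-8 octets C3 A9, each written as a "%" escape.
escaped_tag = quote(raw_tag, safe=":,/")
restored_tag = unquote(escaped_tag)
```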

4.2. Syntax Primitives

4.2.1. Production Parameters

As YAML's syntax is designed for maximal readability, it makes heavy
use of the context that each syntactical entity appears in. For
notational compactness, this is expressed using parameterized BNF
productions. The set of parameters and the range of allowed values
depend on the specific production. The full list of possible
parameters and their values is:

Indentation: n or m

Since the character stream depends upon indentation level to
delineate blocks, many productions are parameterized by it. In
some cases, the notations “production(<n)”,
“production(≤n)” and
“production(>n)” are used; these are
shorthands for “production(m)” for some specific
m where
0 ≤ m < n,
0 ≤ m ≤ n and
m > n,
respectively.

4.2.2. Indentation Spaces

In a YAML character stream, structure is often determined
from indentation,
where indentation is defined as a line break character (or the start of the
stream)
followed by zero or more space characters. Note that indentation
must not contain any tab
characters. The amount of indentation is a presentation detail used
exclusively to delineate structure and is otherwise ignored. In
particular, indentation characters must never be considered part of
a node's
content information.

In general, a node must be indented further than its
parent node. All
sibling nodes
must use the exact same indentation level; however, the content of each
sibling node may
be further indented independently. The “-”, “?” and “:” characters used to denote
block
collection entries are perceived by people to be part of
the indentation. Hence the indentation rules are slightly more
flexible when dealing with these indicators. First, a block
sequence need not be indented relative to its parent
node, unless
that node is
also a block sequence. Second, compact in-line
notations allow a nested collection to begin immediately
following the indicator (where
the indicator is counted as
part of the indentation). This provides for an intuitive collection nesting
syntax.

YAML usually allows separation spaces to include a comment ending
the line and additional comment lines. Note that the token
following the separation comment lines must be properly
indented, even
though there is no such restriction on the separation comment lines
themselves.

Each directive is specified on a separate non-indented line starting with
the “%”
indicator, followed by the directive name and a
space-separated list of parameters. The semantics of these tokens
depend on the specific directive. A YAML processor should ignore unknown
directives with an appropriate warning.

4.3.1.1. “YAML” Directive

The “YAML”
directive specifies the version of YAML the document adheres
to. This specification defines version “1.1”. A
version 1.1 YAML processor
should accept documents with an explicit
“%YAML 1.1” directive, as well as documents lacking
a “YAML” directive. Documents with a
“YAML” directive specifying a higher minor version
(e.g. “%YAML 1.2”) should be processed with
an appropriate warning. Documents with a
“YAML” directive specifying a higher major version
(e.g. “%YAML 2.0”) should be rejected with an
appropriate error message.
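
A version 1.1 processor could implement this policy roughly as follows. The function name and the messages are hypothetical; the sketch assumes the version parameter has the form “major.minor”, and it silently accepts lower versions, which the text above does not address.

```python
import warnings

SUPPORTED_VERSION = (1, 1)


def check_yaml_directive(version):
    """Apply the version policy to the parameter of a "YAML" directive."""
    major, minor = (int(part) for part in version.split("."))
    if major > SUPPORTED_VERSION[0]:
        # A higher major version should be rejected.
        raise ValueError(f"unsupported YAML version {version}")
    if (major, minor) > SUPPORTED_VERSION:
        # A higher minor version should be processed, with a warning.
        warnings.warn(f"processing YAML {version} document as version 1.1")
```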

If the prefix begins with a character other than “!”, it
must be a valid URI prefix, and should contain at
least the scheme and the authority. Shorthands using the associated
handle are
expanded to globally unique URI tags, and their semantics
is consistent across applications. In
particular, two documents in different
streams must assign the same
semantics to the same global tag.

The primary tag
handle is a single “!” character. This
allows using the most compact possible notation for a
single “primary” name space. By default, the
prefix associated with this handle is “!”.
Thus, by default, shorthands using this handle are
interpreted as local
tags. It is possible to override this behavior
by providing an explicit “TAG” directive
associating a different prefix for this handle. This
provides smooth migration from using local tags to using
global tags by a
simple addition of a single “TAG”
directive.

The secondary tag
handle is written as “!!”. This allows
using a compact notation for a single
“secondary” name space. By default, the
prefix associated with this handle is
“tag:yaml.org,2002:” used by the YAML tag
repository providing recommended tags for increasing the portability of
YAML documents between different
applications. It
is possible to override this behavior by providing an
explicit “TAG” directive associating a
different prefix for this handle.
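
Shorthand expansion with the two default handles can be sketched as below. The function name is hypothetical, example.com stands in for a real authority, and the sketch does not report an error for an undeclared named handle, as a real processor would.

```python
DEFAULT_HANDLES = {
    "!": "!",                    # primary handle: local tags by default
    "!!": "tag:yaml.org,2002:",  # secondary handle: YAML tag repository
}


def expand_tag(shorthand, handles=None):
    """Expand a tag shorthand using the default and any declared handles."""
    table = dict(DEFAULT_HANDLES)
    table.update(handles or {})
    # Try longer handles first so that "!!" is not mistaken for "!".
    for handle in sorted(table, key=len, reverse=True):
        if shorthand.startswith(handle):
            return table[handle] + shorthand[len(handle):]
    raise ValueError(f"not a tag shorthand: {shorthand!r}")


local_tag = expand_tag("!invoice")
global_tag = expand_tag("!invoice", {"!": "tag:example.com,2002:"})
```

The second call shows the migration path described above: the document body is unchanged, and a single “TAG” directive for the primary handle turns every local tag into a global one.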

When YAML is used as the format of a communication channel, it is
useful to be able to indicate the end of a document without
closing the stream, independent of starting the
next document. Lacking such a marker, the
YAML processor reading the
stream would
be forced to wait for the header of the next document (that may
be a long time in coming) in order to detect the end of the previous
one. To support this scenario, a YAML document may be terminated by an
explicit end line denoted by “...”, followed by
optional comments. To ease the task of
concatenating YAML streams, the end marker may be
repeated.

4.3.4. Complete Stream

A sequence of bytes is a YAML character stream if, taken as a whole, it
complies with the l-yaml-stream
production. The stream begins with a prefix containing an optional
byte order mark
denoting its character
encoding, followed by optional comments. Note that the stream may
contain no documents, even if it contains a
non-empty prefix. In particular, a stream containing no characters
is valid and contains no documents.

4.4.1. Node Anchors

The anchor
property marks a node for future reference. An anchor
is denoted by the “&” indicator. An alias node can then be
used to indicate additional inclusions of the anchored node by
specifying its anchor. An anchored node need not be referenced by
any alias
node; in particular, it is valid for all nodes to be anchored.

4.4.4. Alias Nodes

Subsequent occurrences of a previously serialized node are presented as alias nodes, denoted by the “*” indicator. The first
occurrence of the node must be marked by an anchor to allow
subsequent occurrences to be presented as alias nodes. An alias node
refers to the most recent preceding node having the same anchor. It is an
error to have an alias node use an anchor that does not previously occur
in the document. It is not an error to
specify an anchor that is not used by any alias
node. Note that an alias node must not specify any properties or content, as these
were already specified at the first occurrence of the node.
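
These rules amount to keeping a table from anchor names to the most recent node seen with that anchor. The sketch below works on a flat list of hypothetical (kind, anchor, value) events and uses scalars only, for brevity.

```python
def resolve_aliases(events):
    """Replace alias events with the value of the most recent matching anchor."""
    anchors = {}
    resolved = []
    for kind, anchor, value in events:
        if kind == "alias":
            if anchor not in anchors:
                raise ValueError(f"alias *{anchor} has no preceding anchor")
            resolved.append(anchors[anchor])
        else:
            if anchor is not None:
                anchors[anchor] = value  # a later anchor of the same name takes over
            resolved.append(value)
    return resolved


values = resolve_aliases([
    ("scalar", "a", "one"),
    ("alias", "a", None),
    ("scalar", "a", "two"),
    ("alias", "a", None),
    ("scalar", "unused", "three"),  # an anchor without aliases is not an error
])
```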

4.5.1. Flow Scalar Styles

All flow
scalar styles may span multiple lines, except when used
in simple keys. Flow scalars
are subject to (flow) line
folding. This allows flow scalar content to be broken
anywhere a single space character (#x20)
separates non-space characters, at the cost of requiring an empty line to present each line feed character.
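
The effect of flow line folding on a multi-line scalar can be sketched as follows. This is a simplified illustration rather than the normative productions: it assumes line feed is the only line break, always treats the final line as content, and ignores escape sequences. The function name is hypothetical.

```python
def fold_flow_lines(text):
    """Fold the line breaks of a multi-line flow scalar (simplified)."""
    lines = text.split("\n")
    cleaned = []
    for index, line in enumerate(lines):
        if index > 0:
            line = line.lstrip(" \t")  # leading white space of continuation lines
        if index < len(lines) - 1:
            line = line.rstrip(" \t")  # trailing white space before a line break
        cleaned.append(line)

    result = cleaned[0]
    empty_lines = 0
    rest = cleaned[1:]
    for index, line in enumerate(rest):
        if line == "" and index < len(rest) - 1:
            empty_lines += 1  # each empty line presents one line feed
            continue
        # A lone line break folds into a space; empty lines become line feeds.
        result += ("\n" * empty_lines if empty_lines else " ") + line
        empty_lines = 0
    return result
```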

4.5.1.2. Single Quoted

The single quoted style is specified by
surrounding “'” indicators. Therefore, within
a single quoted scalar such characters need to be repeated. This
is the only form of escaping performed in single quoted scalars. In
particular, the “\” and “"” characters may
be freely used. This restricts single quoted scalars to printable characters.
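
For a single-line scalar, the escaping rule reduces to doubling and undoubling the quote character. A minimal sketch follows, with hypothetical function names and no handling of line folding.

```python
def quote_single(content):
    """Present single-line content as a single quoted scalar."""
    return "'" + content.replace("'", "''") + "'"


def unquote_single(text):
    """Recover the content of a single-line single quoted scalar."""
    if len(text) < 2 or text[0] != "'" or text[-1] != "'":
        raise ValueError("not a single quoted scalar")
    return text[1:-1].replace("''", "'")
```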

All leading and trailing white
space of inner lines is excluded from the content. Note that
while prefix white
space may contain tab
characters, line indentation is restricted to space characters
only. Unlike double quoted scalars, it is
impossible to force the inclusion of the leading or trailing
spaces in the content. Therefore, single quoted
scalar lines can only be broken where a single space character
separates two non-space characters.

The first plain character is further restricted to avoid most
indicators as these would
cause ambiguity with various YAML structures. However, the first
character may be “-”, “?” or “:” provided it is followed by a
non-space character.

Thus, a single line plain scalar is a sequence of valid plain
non-break printable
characters, beginning and ending with a non-space character and not
conflicting with document boundary markers. All characters are
considered content, including any inner space
characters.

In a multi-line plain scalar, line breaks are subject to (flow) line folding. Any prefix and trailing
spaces are excluded from the content. Like single quoted
scalars, in plain scalars it is impossible to force the
inclusion of the leading or trailing spaces in the content.
Therefore, plain scalar lines can only be broken where a single
space character separates two non-space characters.

Folded content may start with either line type. If the content begins
with a “more indented” line (starting with spaces),
an indentation
indicator must be specified in the block header. Note
that leading empty lines
and empty lines
separating lines of a different type are never folded.

A simple key has no
identifying mark. It is recognized as being a key either
due to being inside a flow mapping, or by being followed by
an explicit value. Hence, to avoid unbounded lookahead in
YAML processors,
simple keys are restricted to a single line and must not
span more than 1024 stream characters (hence the
need for the flow-key
context). Note that the 1024 character limit is in
terms of Unicode characters rather than stream octets, and
that it includes the separation following the key itself.
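
The distinction between characters and octets is easy to get wrong in an implementation. The check below is a sketch with hypothetical names; it assumes the caller passes the key text together with the separation that follows it, and it relies on Python strings being measured in Unicode code points.

```python
MAX_SIMPLE_KEY_CHARACTERS = 1024


def is_valid_simple_key(key_with_separation):
    """Check the single-line and 1024 character restrictions on a simple key."""
    return (
        "\n" not in key_with_separation
        and len(key_with_separation) <= MAX_SIMPLE_KEY_CHARACTERS
    )


accented_key = "é" * 1024
octet_count = len(accented_key.encode("utf-8"))
```

The accented key is valid because it has 1024 characters, even though it occupies twice as many octets in UTF-8.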

YAML also allows omitting the surrounding “{” and
“}” characters when nesting a flow mapping in a
flow
sequence if the mapping consists of a single
key: value pair and neither the mapping nor the
key have any properties specified. In this case, only
three of the combinations may be used, to prevent ambiguity.

In an explicit key entry, value nodes begin on a separate
line and are denoted by the “:” character. Here again
YAML allows the use of the inline compact notation, in which
case the “:” character is considered
part of the value's indentation.

YAML allows the “?” character to be omitted
for simple keys.
As in flow mappings, such a key is recognized by a
following “:” character. Again, to
avoid unbounded lookahead in YAML processors, simple keys are
restricted to a single line and must not span more than
1024 stream characters. Again, this
limit is in terms of Unicode characters rather than stream
octets, and includes the separation following the key, if any.

In a simple key entry, an explicit value node may be presented in the same line. Note
however that in this case, the key is not considered to be a
form of indentation, hence the compact in-line
notation must not be used. The value following the simple key
may also be completely empty.