Abstract

This module describes, in
general terms, the basic structure and syntax of CSS stylesheets. It
defines, in detail, the syntax and parsing of CSS - how to turn a stream
of bytes into a meaningful stylesheet. CSS is a language for describing the
rendering of structured documents (such as HTML and XML) on screen, on
paper, in speech, etc.

Status of this document

This section describes the status of this document at the time of
its publication. Other documents may supersede this document. A list of
current W3C publications and the latest revision of this technical report
can be found in the W3C technical reports
index at http://www.w3.org/TR/.

A Candidate Recommendation is a document that has been widely reviewed
and is ready for implementation. W3C encourages everybody to implement
this specification and return comments to the (archived)
public mailing list
www-style@w3.org (see instructions). When sending
e-mail, please put the text “css-syntax-3” in the subject, preferably
like this: “[css-syntax-3] …summary of
comment…”

Publication as a Candidate Recommendation does not imply endorsement by
the W3C Membership. This is a draft document and may be updated, replaced
or obsoleted by other documents at any time. It is inappropriate to cite
this document as other than work in progress.

1. Introduction

This module defines the abstract syntax and parsing of CSS stylesheets
and other things which use CSS syntax (such as the HTML style
attribute).

It defines algorithms for converting a stream of Unicode code points
(in other words, text) into a stream of CSS tokens, and then further into
CSS objects such as stylesheets, rules, and declarations.

1.1. Module interactions

This module defines the syntax and parsing of CSS stylesheets. It
supersedes the lexical scanner and grammar defined in CSS 2.1.

2. Description of CSS’s Syntax

This section is not normative.

A CSS document is a series of qualified rules, which
are usually style rules that apply CSS properties to elements, and at-rules, which
define special processing rules or values for the CSS document.

A qualified rule starts with a prelude then has a {}-wrapped block
containing a sequence of declarations. The meaning of the prelude varies
based on the context that the rule appears in - for style rules, it’s a
selector which specifies what elements the declarations will apply to.
Each declaration has a name, followed by a colon and the declaration
value. Declarations are separated by semicolons.

A typical rule might look something like this:

p > a {
color: blue;
text-decoration: underline;
}

In the above rule, "p > a" is the selector, which, if
the source document is HTML, selects any <a> elements
that are children of a <p> element.

"color: blue;" is a declaration specifying that, for the
elements that match the selector, their color property should have the value blue. Similarly, their text-decoration property should have the value
underline.

At-rules are all different, but they have a basic structure in common.
They start with an "@" code point followed by their name. Some at-rules are simple
statements, with their name followed by more CSS values to specify their
behavior, and finally ended by a semicolon. Others are blocks; they can
have CSS values following their name, but they end with a {}-wrapped
block, similar to a qualified rule. Even the contents of these
blocks are specific to the given at-rule: sometimes they contain a sequence of
declarations, like a qualified rule; other times, they may contain
additional blocks, or at-rules, or other structures altogether.

Here are several examples of at-rules that illustrate the varied syntax they may
contain.

@import "my-styles.css";

The @import at-rule is a simple statement. After its name, it takes
a single string or url() function to indicate the stylesheet that it
should import.

@page :left {
margin-left: 4cm;
margin-right: 3cm;
}

The @page at-rule consists of
an optional page selector (the :left pseudoclass), followed by a block of
properties that apply to the page when printed. In this way, it’s very
similar to a normal style rule, except that its properties don’t apply
to any "element", but rather the page itself.

@media print {
body { font-size: 10pt }
}

The @media at-rule begins with a media type and a list of optional
media queries. Its block contains entire rules, which are only applied
when the @media’s conditions are fulfilled.

Property names and at-rule names are always identifiers, which have to start
with a letter or a hyphen followed by a letter, and then can contain
letters, numbers, hyphens, or underscores. You can include any code point at
all, even ones that CSS uses in its syntax, by escaping it.

The syntax of selectors is defined in the Selectors spec. Similarly, the
syntax of the wide variety of CSS values is defined in the Values & Units spec. The
special syntaxes of individual at-rules can be found in the specs that define them.

2.1. Escaping

Any Unicode code point can be included in an identifier or quoted
string by escaping it. CSS escape sequences start with a backslash, and
continue with:

Any code point that is not a hex digit or a newline: the escape
sequence is replaced by that code point.

Or one to six hex digits, followed by an optional whitespace.
The escape sequence is replaced by the Unicode code point whose value is given
by the hexadecimal digits. This optional whitespace allows hexadecimal
escape sequences to be followed by "real" hex digits.

An identifier with the value "&B" could be written
as \26 B or
\000026B.

A "real" space after the escape sequence must be doubled.

2.2. Error Handling

This section is not normative.

When errors occur in CSS, the parser attempts to recover gracefully,
throwing away only the minimum amount of content before returning to
parsing as normal. This is because errors aren’t always mistakes - new
syntax looks like an error to an old parser, and it’s useful to be able
to add new syntax to the language without worrying about stylesheets that
include it being completely broken in older UAs.

The precise error-recovery behavior is detailed in the parser itself,
but it’s simple enough that a short description is fairly accurate:

At the "top level" of a stylesheet, an <at-keyword-token> starts an
at-rule. Anything else starts a qualified rule, and is included in the
rule’s prelude. This may produce an invalid selector, but that’s not
the concern of the CSS parser — at worst, it means the selector will
match nothing.

Once an at-rule starts, nothing is invalid from the parser’s
standpoint; it’s all part of the at-rule’s prelude. Encountering a <semicolon-token> ends the at-rule
immediately, while encountering an opening curly-brace <{-token> starts the at-rule’s
body. The at-rule seeks forward, matching blocks (content surrounded by
(), {}, or []) until it finds a closing curly-brace <}-token> that isn’t matched by
anything else or inside of another block. The contents of the at-rule are
then interpreted according to the at-rule’s own grammar.

Qualified rules work similarly, except that semicolons don’t end
them; instead, they are just taken in as part of the rule’s prelude.
When the first {} block is found, the contents are always interpreted as
a list of declarations.

When interpreting a list of declarations, unknown syntax at any point
causes the parser to throw away whatever declaration it’s currently
building, and seek forward until it finds a semicolon (or the end of the
block). It then starts fresh, trying to parse a declaration again.
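The recovery behavior for declaration lists can be sketched as follows, assuming tokens are simple (type, value) pairs; nested-block handling is omitted for brevity, so this is illustrative rather than a full implementation:

```python
def parse_declaration_list(tokens):
    """Simplified sketch of declaration-list error recovery: on unknown
    syntax, discard up to the next semicolon and start fresh."""
    declarations = []
    i = 0
    while i < len(tokens):
        if tokens[i][0] == "semicolon":
            i += 1                      # empty declaration slot; skip it
            continue
        if tokens[i][0] == "ident" and i + 1 < len(tokens) and tokens[i + 1][0] == "colon":
            name = tokens[i][1]
            i += 2
            value = []
            while i < len(tokens) and tokens[i][0] != "semicolon":
                value.append(tokens[i][1])
                i += 1
            declarations.append((name, value))
        else:
            # Parse error: seek forward to the next semicolon, then retry.
            while i < len(tokens) and tokens[i][0] != "semicolon":
                i += 1
    return declarations
```

Note how the malformed middle declaration is dropped without affecting the declarations before or after it.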

If the stylesheet ends while any rule, declaration, function, string,
etc. are still open, everything is automatically closed. This doesn’t
make them invalid, though they may be incomplete and thus thrown away
when they are verified against their grammar.

3. Tokenizing and Parsing CSS

User agents must use the parsing rules described in this specification
to generate the CSSOM trees from text/css resources. Together, these rules
define what is referred to as the CSS parser.

This specification defines the parsing rules for CSS documents, whether
they are syntactically correct or not. Certain points in the parsing
algorithm are said to be parse errors. The error handling for parse errors is
well-defined: user agents must either act as described below when
encountering such problems, or must abort processing at the first error
that they encounter for which they do not wish to apply the rules
described below.

Conformance checkers must report at least one parse error condition to
the user if one or more parse error conditions exist in the document and
must not report parse error conditions if none exist in the document.
Conformance checkers may report more than one parse error condition if
more than one parse error condition exists in the document. Conformance
checkers are not required to recover from parse errors, but if they do,
they must recover in the same way as user agents.

3.1. Overview of the Parsing Model

The input to the CSS parsing process consists of a stream of Unicode code points,
which is passed through a tokenization stage followed by a tree
construction stage. The output is a CSSStyleSheet object.

Note: Implementations that do not support scripting do not
have to actually create a CSSOM CSSStyleSheet object, but the CSSOM tree
in such cases is still used as the model for the rest of the
specification.

3.2. The input byte stream

When parsing a stylesheet, the stream of Unicode code points that comprises the
input to the tokenization stage might be initially seen by the user agent
as a stream of bytes (typically coming over the network or from the local
file system). If so, the user agent must decode these bytes into code points
according to a particular character encoding.

To decode the stream of bytes into a stream of code points, UAs must use the
decode algorithm defined in [ENCODING], with the fallback encoding determined as
follows.

Note: The decode algorithm gives precedence to a byte order mark
(BOM), and only uses the fallback when none is found.

To determine the fallback encoding:

If HTTP or equivalent protocol defines an encoding (e.g. via the
charset parameter of the Content-Type header), get an
encoding [ENCODING] for the specified value. If that does not
return failure, use the return value as the fallback encoding.

Otherwise, check the byte stream. If the first 1024 bytes of the
stream begin with the hex sequence

40 63 68 61 72 73 65 74 20 22 XX* 22 3B

where each XX byte is a value between 0x00 and 0x21 inclusive, or a
value between 0x23 and 0x7F inclusive, then get an encoding for
the sequence of XX bytes, interpreted as
ASCII.
What does that byte sequence mean?

The byte sequence above, when decoded as ASCII, is the string "@charset "…";", where the
"…" is the sequence of bytes corresponding to the encoding’s label.

If the return value was utf-16be or
utf-16le, use utf-8 as the fallback encoding;
if it was anything else except failure, use the return value as the
fallback encoding.
Why use utf-8 when the declaration says
utf-16?

The bytes of the encoding declaration spell out “@charset "…";” in ASCII,
but UTF-16 is not ASCII-compatible. Either you’ve typed in complete
gibberish (like 䁣桡牳整•utf-16be∻) to get the
right bytes in the document, which we don’t want to encourage, or your
document is actually in an ASCII-compatible encoding and your encoding
declaration is lying.

Either way, defaulting to UTF-8 is a decent answer.

As well, this mimics the behavior of HTML’s <meta
charset> attribute.

Note: Note that the syntax of an encoding declaration
looks like the syntax of an @charset rule, but it’s actually much more
restrictive. A number of things you can do in CSS that would produce a
valid @charset rule, such as
using multiple spaces, comments, or single quotes, will cause the
encoding declaration to not be recognized. This behavior keeps the
encoding declaration as simple as possible, and thus maximizes the
likelihood of it being implemented correctly.

Otherwise, if an environment encoding is provided by the
referring document, use that as the fallback encoding.

Otherwise, use utf-8 as the fallback encoding.
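The byte-sniffing step above can be sketched as follows, assuming the input is available as a bytes object (the function name is a placeholder; a real implementation must also apply the utf-16-to-utf-8 substitution described above and the remaining fallback steps):

```python
def sniff_charset(data: bytes):
    """Look for an encoding declaration in the first 1024 bytes.

    Matches the exact byte pattern `@charset "<label>";` and returns the
    label decoded as ASCII, or None if the pattern is absent.
    """
    head = data[:1024]
    prefix = b'@charset "'
    if not head.startswith(prefix):
        return None
    end = head.find(b'";', len(prefix))
    if end == -1:
        return None
    label = head[len(prefix):end]
    # Each label byte must be 0x00-0x21 or 0x23-0x7F (ASCII, excluding `"`).
    if any(b == 0x22 or b > 0x7F for b in label):
        return None
    return label.decode("ascii")
```

Because the match is byte-exact, extra whitespace or single quotes cause the sniff to fail, as the note below explains.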

Though UTF-8 is the default encoding for the web, and many newer
web-based file formats assume or require UTF-8 encoding, CSS was created
before it was clear which encoding would win, and thus can’t
automatically assume the stylesheet is UTF-8.

Stylesheet authors should author their stylesheets in UTF-8,
and ensure that either an HTTP header (or equivalent method) declares the
encoding of the stylesheet to be UTF-8, or that the referring document
declares its encoding to be UTF-8. (In HTML, this is done by adding a
<meta charset=utf-8> element to the head of the
document.)

If neither of these options are available, authors should begin the
stylesheet with a UTF-8 BOM or the exact characters

@charset "utf-8";

Document languages that refer to CSS stylesheets that are decoded from
bytes may define an environment encoding for each such stylesheet, which is
used as a fallback when other encoding hints are not available or can not
be used.

The concept of environment encoding only exists for
compatibility with legacy content. New formats and new linking mechanisms
should not provide an environment
encoding, so the stylesheet defaults to UTF-8 instead in the absence
of more explicit information.

3.3. Preprocessing the input stream

The input stream consists of the code points pushed into it as
the input byte stream is decoded.

Before sending the input stream to the tokenizer, implementations must
make the following code point substitutions:

Replace any U+000D CARRIAGE RETURN (CR) code point, U+000C FORM FEED
(FF) code
point, or pairs of U+000D CARRIAGE RETURN (CR) followed by U+000A
LINE FEED (LF) by a single U+000A LINE FEED (LF) code point.

Replace any U+0000 NULL code point with U+FFFD REPLACEMENT CHARACTER
(�).
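The two substitutions above can be sketched directly (a minimal sketch; the function name is an assumption):

```python
def preprocess(text: str) -> str:
    """Apply the input-stream substitutions: normalize CR, FF, and CRLF
    pairs to a single LF, and replace NULL with U+FFFD."""
    text = text.replace("\r\n", "\n")   # CRLF pair first, so it maps to one LF
    text = text.replace("\r", "\n").replace("\f", "\n")
    return text.replace("\x00", "\ufffd")
```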

4. Tokenization

Implementations must act as if they used the following algorithms to
tokenize CSS. To transform a stream of code points into a stream of
tokens, repeatedly consume a token until an <EOF-token> is reached, collecting the
returned tokens into a stream. Each call to the consume a token
algorithm returns a single token, so it can also be used "on-demand" to
tokenize a stream of code points during parsing, if so
desired.
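The driving loop can be sketched as follows, with consume_a_token standing in for the spec’s consume a token algorithm and ("EOF", None) standing in for <EOF-token>:

```python
def tokenize(consume_a_token, stream):
    """Repeatedly consume a token until an EOF token is returned,
    collecting the tokens into a list."""
    tokens = []
    while True:
        token = consume_a_token(stream)
        tokens.append(token)
        if token[0] == "EOF":
            return tokens
```

Because each call returns exactly one token, the same consume_a_token callable can also be invoked on demand during parsing.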

Note: The type flag of hash tokens is used in the Selectors
syntax [SELECT]. Only hash tokens with the "id"
type are valid ID
selectors.

Note: As a technical note, the tokenizer defined here
requires only three code points of look-ahead. The tokens it produces
are designed to allow Selectors to be parsed with one token of look-ahead,
and additional tokens may be added in the future to maintain this
invariant.

4.1. Token Railroad Diagrams

This section is non-normative.

This section presents an informative view of the tokenizer, in the form
of railroad diagrams. Railroad diagrams are more compact than an explicit
parser, but often easier to read than a regular expression.

These diagrams are informative and incomplete; they
describe the grammar of "correct" tokens, but do not describe
error-handling at all. They are provided solely to make it easier to get
an intuitive grasp of the syntax of each token.

Diagrams with names such as <foo-token> represent
tokens. The rest are productions referred to by other diagrams.

If the next input code point is U+002A
ASTERISK (*), consume it and all following code points up to and
including the first U+002A ASTERISK (*) followed by a U+002F SOLIDUS (/),
or up to an EOF code point. Then consume a token and
return it.

Note: This algorithm assumes that the initial "u+" has been
consumed, and the next code point verified to be a hex digit or a "?".

Execute the following steps in order:

Consume as many hex digits as possible, but no more than 6. If
less than 6 hex digits were consumed, consume as many U+003F
QUESTION MARK (?) code points as possible, but no more than enough
to make the total of hex digits and U+003F QUESTION MARK (?) code points
equal to 6.

This section describes how to consume an escaped code point. It
assumes that the U+005C REVERSE SOLIDUS (\) has already been consumed and
that the next input code point has already been verified to not be a newline. It will
return a code
point.

This section describes how to consume a name from a stream of code points. It returns a
string containing the largest name that can be formed from adjacent code points
in the stream, starting from the first.

This section describes how to consume a number from a stream of code points.
It returns a 3-tuple of a string representation, a numeric value, and a
type flag which is either "integer" or "number".

Note: This algorithm does not do the verification of the
first few code points that are necessary to ensure a number
can be obtained from the stream. Ensure that the stream starts with a number before calling this
algorithm.

Execute the following steps in order:

Initially set repr to the empty string and type
to "integer".

If the next input code point is U+002B PLUS
SIGN (+) or U+002D HYPHEN-MINUS (-), consume it and append it to
repr.

If the next 2 or 3 input code points are
U+0045 LATIN CAPITAL LETTER E (E) or U+0065 LATIN SMALL LETTER E (e),
optionally followed by U+002D HYPHEN-MINUS (-) or U+002B PLUS SIGN (+),
followed by a digit,
then:

This section describes how to convert a string to a number. It
returns a number.

Note: This algorithm does not do any verification to ensure
that the string contains only a number. Ensure that the string contains
only a valid CSS number before calling this algorithm.

Divide the string into seven components, in order from left to right:

A sign: a single U+002B PLUS SIGN (+) or U+002D HYPHEN-MINUS
(-), or the empty string. Let s be the number -1 if the sign
is U+002D HYPHEN-MINUS (-); otherwise, let s be the number 1.

An integer part: zero or more digits. If there is at least one digit,
let i be the number formed by interpreting the digits as a
base-10 integer; otherwise, let i be the number 0.

A decimal point: a single U+002E FULL STOP (.), or the empty
string.

A fractional part: zero or more digits. If there is at least one digit,
let f be the number formed by interpreting the digits as a
base-10 integer and d be the number of digits; otherwise, let
f and d be the number 0.

An exponent indicator: a single U+0045 LATIN CAPITAL LETTER E
(E) or U+0065 LATIN SMALL LETTER E (e), or the empty string.

An exponent sign: a single U+002B PLUS SIGN (+) or U+002D
HYPHEN-MINUS (-), or the empty string. Let t be the number -1
if the sign is U+002D HYPHEN-MINUS (-); otherwise, let t be
the number 1.

An exponent: zero or more digits. If there is at least one digit, let e
be the number formed by interpreting the digits as a base-10 integer;
otherwise, let e be the number 0.
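The spec combines these seven components into the final value as s·(i + f·10^(−d))·10^(t·e), which can be written directly (a minimal sketch):

```python
def convert_to_number(s, i, f, d, t, e):
    """Combine the components of a CSS number:
    sign * (integer part + fraction part scaled by its digit count),
    scaled by the signed exponent."""
    return s * (i + f * 10.0 ** -d) * 10.0 ** (t * e)
```

For example, "-3.25e2" decomposes into s=-1, i=3, f=25, d=2, t=1, e=2.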

This section describes how to consume the remnants of a bad url from
a stream of code points, "cleaning up" after the tokenizer
realizes that it’s in the middle of a <bad-url-token> rather than a <url-token>. It returns nothing; its sole
use is to consume enough of the input stream to reach a recovery point
where normal tokenizing can resume.
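This recovery can be sketched as follows, assuming the stream yields code points one at a time; escaped code points are consumed as pairs so that an escaped ")" does not end the recovery early:

```python
def consume_bad_url_remnants(stream):
    """Discard input until a closing paren or EOF, so normal tokenizing
    can resume."""
    while True:
        cp = next(stream, None)
        if cp is None or cp == ")":
            return
        if cp == "\\":
            next(stream, None)   # consume the escaped code point as well
```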

5. Parsing

The input to the parsing stage is a stream or list of tokens from the
tokenization stage. The output depends on how the parser is invoked, as
defined by the entry points listed later in this section. The parser
output can consist of at-rules, qualified rules, and/or declarations.

The parser’s output is constructed according to the fundamental
syntax of CSS, without regards for the validity of any specific item.
Implementations may check the validity of items as they are returned by
the various parser algorithms and treat the algorithm as returning nothing
if the item was invalid according to the implementation’s own grammar
knowledge, or may construct a full tree as specified and "clean up"
afterwards by removing any invalid items.

The items that can appear in the tree are:

at-rule

An at-rule has a name, a prelude consisting of a list of component
values, and an optional block consisting of a simple {} block.

Note: This specification places no limits on what an
at-rule’s block may contain. Individual at-rules must define whether
they accept a block, and if so, how to parse it (preferably using one of
the parser algorithms or entry points defined in this specification).

qualified rule

A qualified rule has a prelude consisting of a list of component
values, and a block consisting of a simple {} block.

declaration

A declaration has a name, a value consisting of a list of component
values, and an important flag which is initially unset.

Declarations are further categorized as "properties" or
"descriptors", with the former typically appearing in qualified rules and the latter appearing in
at-rules. (This
categorization does not occur at the Syntax level; instead, it is a
product of where the declaration appears, and is defined by the
respective specifications defining the given rule.)

component value

A component value is one of the preserved tokens, a function, or a
simple block.

Note: The non-preserved tokens listed above are always
consumed into higher-level objects, either functions or simple blocks,
and so never appear in any parser output themselves.

Note: The tokens <}-token>s, <)-token>s, <]-token>s, <bad-string-token>s, and <bad-url-token>s are always parse
errors, but they are preserved in the token stream by this specification
to allow other specs, such as Media Queries, to define more
fine-grained error-handling than just dropping an entire declaration or
block.

function

A function has a name and a value consisting of a list of component
values.

simple block

A simple block has an associated token (either a <[-token>, <(-token>, or <{-token>) and a value consisting
of a list of component values.

5.1. Parser Railroad Diagrams

This section is non-normative.

This section presents an informative view of the parser, in the form of
railroad diagrams. Railroad diagrams are more compact than a
state-machine, but often easier to read than a regular expression.

These diagrams are informative and incomplete; they
describe the grammar of "correct" stylesheets, but do not describe
error-handling at all. They are provided solely to make it easier to get
an intuitive grasp of the syntax.

[Railroad diagrams appeared here for the following productions:
Stylesheet, Rule list, At-rule, Qualified rule, Declaration list,
Declaration, !important, Component value, {} block, () block, [] block,
and Function block.]

5.2. Definitions

current input token

The token or component value currently being operated on,
from the list of tokens produced by the tokenizer.

When two strings are to be matched ASCII case-insensitively,
temporarily convert both of them to ASCII lower-case form by adding 32
(0x20) to the value of each code point between U+0041 LATIN CAPITAL LETTER A
(A) and U+005A LATIN CAPITAL LETTER Z (Z), inclusive, and check if this
results in identical sequences of code points.
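This matching can be sketched directly (function names are placeholders):

```python
def ascii_lower(s):
    """Lower-case only A-Z by adding 0x20, leaving every other code point
    (including non-ASCII letters) untouched."""
    return "".join(chr(ord(c) + 0x20) if "A" <= c <= "Z" else c for c in s)

def ascii_case_insensitive_equal(a, b):
    return ascii_lower(a) == ascii_lower(b)
```

Note that non-ASCII pairs such as "ü" and "Ü" do not match, because only the A-Z range is folded.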

5.3. Parser Entry Points

The algorithms defined in this section produce high-level CSS objects
from lower-level objects. They assume that they are invoked on a token
stream, but they may also be invoked on a string; if so, first perform input preprocessing to produce a code point
stream, then perform tokenization to produce a
token stream.

Note: Despite the name, this actually parses a mixed list of
declarations and at-rules, as CSS 2.1 does for @page. Unexpected at-rules
(which could be all of them, in a given context) are invalid and should be
ignored by the consumer.

5.4. Parser Algorithms

The following algorithms comprise the parser. They are called by the
parser entry points above.

These algorithms may be called with a list of either tokens or of
component values. (The difference being that some tokens are replaced by
functions and
simple
blocks in a list of component values.) Similar to how the input stream
returned EOF code points to represent when it was empty during the
tokenization stage, the lists in this stage must return an <EOF-token> when the next token is requested
but they are empty.

An algorithm may be invoked with a specific list, in which case it
consumes only that list (and when that list is exhausted, it begins
returning <EOF-token>s).
Otherwise, it is implicitly invoked with the same list as the invoking
algorithm.
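This list behavior can be sketched as a small wrapper (names are placeholders):

```python
class TokenList:
    """Wrap a list of tokens or component values so that requests past
    the end yield an <EOF-token> stand-in instead of raising."""
    EOF = ("EOF", None)

    def __init__(self, items):
        self.items = list(items)
        self.pos = 0

    def next(self):
        if self.pos >= len(self.items):
            return self.EOF
        item = self.items[self.pos]
        self.pos += 1
        return item
```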

6. The An+B microsyntax

Several things in CSS, such as the :nth-child() pseudoclass, need to indicate
indexes in a list. The An+B microsyntax is
useful for this, allowing an author to easily indicate single elements or
all elements at regularly-spaced intervals in a list.

The An+B notation defines an integer step
(A) and offset (B), and represents the An+Bth elements in a list, for every positive
integer or zero value of n, with the first element in the list
having index 1 (not 0).

For values of A and B greater than 0, this effectively divides the
list into groups of A elements (the last group
taking the remainder), and selecting the Bth
element of each group.

The An+B notation also accepts the even and odd keywords, which have the same
meaning as 2n and
2n+1,
respectively.

Examples:

2n+0 /* represents all of the even elements in the list */
even /* same */
4n+1 /* represents the 1st, 5th, 9th, 13th, etc. elements in the list */

The values of A and B can be negative, but only the positive results
of An+B, for n ≥ 0, are used.

Example:

-n+6 /* represents the first 6 elements of the list */

If both A and B
are 0, the pseudo-class represents no element in the list.
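Under this definition, membership can be checked directly (a sketch; anb_matches is a hypothetical helper):

```python
def anb_matches(a, b, index):
    """True if the 1-based `index` is an An+B'th element, i.e.
    index == a*n + b for some integer n >= 0."""
    if a == 0:
        return index == b
    n, rem = divmod(index - b, a)
    return rem == 0 and n >= 0
```

For example, 2n+1 matches the odd indexes, -n+6 matches the first six, and 0n+0 matches nothing.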

6.1. Informal Syntax Description

This section is non-normative.

When A is 0, the An part may be
omitted (unless the B part is already
omitted). When An is not included and B is non-negative, the + sign before B (when allowed) may also be omitted. In this
case the syntax simplifies to just B.

The An+B notation was originally defined
using a slightly different tokenizer than the rest of CSS, resulting in a
somewhat odd definition when expressed in terms of CSS tokens. This
section describes how to recognize the An+B
notation in terms of CSS tokens (thus defining the <an+b> type for CSS grammar
purposes), and how to interpret the CSS tokens to obtain values for A and B.

A is the dimension’s value or 1,
respectively. B is the dimension’s unit or
ident’s value, respectively, with the first code point removed and the
remainder interpreted as a base-10 number. B is
negative.

A is the dimension’s value, 1, or -1,
respectively. B is the integer’s value. If
a - was provided
between the two, B is instead the negation of
the integer’s value.

7. Defining Grammars for Rules and Other Values

The Values spec defines
how to specify a grammar for properties. This section does the same, but
for rules.

Just like in property grammars, the notation <foo>
refers to the "foo" grammar term, assumed to be defined elsewhere.
Substituting the <foo> for its definition results in a
semantically identical grammar.

Several types of tokens are written literally, without quotes:

<ident-token>s (such as auto, disc, etc), which are simply
written as their value.

<at-keyword-token>s, which are
written as an @ character followed by the token’s value, like "@media".

<function-token>s, which are written
as the function name followed by a ( character, like "translate(".

Although it is possible, with escaping, to construct an <ident-token>
whose value ends with ( or starts with @, such a
token is not a <function-token> or an <at-keyword-token> and does not match
corresponding grammar definitions. <delim-token>s are written with their
value enclosed in single quotes. For example, a <delim-token>
containing the "+" code point is written as '+'.
Similarly, the <[-token> and <]-token>s must be written in
single quotes, as they’re used by the syntax of the grammar itself to
group clauses. <whitespace-token> is never indicated
in the grammar; <whitespace-token>s are allowed
before, after, and between any two tokens, unless explicitly specified
otherwise in prose definitions. (For example, if the prelude of a rule is
a selector, whitespace is significant.)

When defining a function or a block, the ending token must be specified
in the grammar, but if it’s not present in the eventual token stream, it
still matches.

The CSS parser parses this as a style rule containing one declaration,
whose value is a function named "translate". This matches the above
grammar, even though the ending token didn’t appear in the token
stream, because by the time the parser is finished, the presence of the
ending token is no longer possible to determine; all you have is the fact
that there’s a block and a function.

The CSS parser is agnostic as to the contents of blocks, such as those
that come at the end of some at-rules. Defining the generic grammar of the
blocks in terms of tokens is non-trivial, but there are dedicated and
unambiguous algorithms defined for parsing this.

The <declaration-list> production represents a list
of declarations. It may only be used in grammars as the sole value in a
block, and represents that the contents of the block must be parsed using
the consume a list of declarations
algorithm.

Similarly, the <rule-list> production represents a list of
rules, and may only be used in grammars as the sole value in a block. It
represents that the contents of the block must be parsed using the consume a list of rules algorithm.

Finally, the <stylesheet> production represents a list of
rules. It is identical to <rule-list>, except that blocks using it
default to accepting all rules that aren’t otherwise limited to a
particular context.

For example, the @font-face rule is defined to have an empty
prelude, and to contain a list of declarations. This is expressed with the
following grammar:

For rules that use <declaration-list>, the spec for the
rule must define which properties, descriptors, and/or at-rules are valid
inside the rule; this may be as simple as saying "The @foo rule accepts
the properties/descriptors defined in this specification/section.", and
extension specs may simply say "The @foo rule additionally accepts the
following properties/descriptors.". Any declarations or at-rules found
inside the block that are not defined as valid must be removed from the
rule’s value.

Within a <declaration-list>,
!important is automatically invalid on any descriptors. If
the rule accepts properties, the spec for the rule must define whether the
properties interact with the cascade, and with what specificity. If they
don’t interact with the cascade, properties containing
!important are automatically invalid; otherwise using
!important is valid and has its usual effect on the cascade
origin of the property.

For example, the grammar for @font-face in the previous example must, in
addition to what is written there, define that the allowed declarations
are the descriptors defined in the Fonts spec.

For rules that use <rule-list>, the
spec for the rule must define what types of rules are valid inside the
rule, same as <declaration-list>, and unrecognized
rules must similarly be removed from the rule’s value.

For example, the grammar for @keyframes in the previous
example must, in addition to what is written there, define that the only
allowed rules are <keyframe-rule>s, which are defined as:

The content of the qualified rule’s block is parsed as a list of declarations. Unless
defined otherwise by another specification or a future level of this
specification, at-rules in that list are invalid and must be ignored.
Declarations for an unknown CSS property or whose value does not match the
syntax defined by the property are invalid and must be ignored. The
validity of the style rule’s contents has no effect on the validity of
the style rule itself. Unless otherwise specified, property names are ASCII case-insensitive.
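The error-handling rule above can be sketched as a simple filter; the helper name and the shape of the declaration list are hypothetical, not part of the specification:

```python
def filter_declarations(decls, supported):
    # Sketch of the error handling described above: a declaration is
    # dropped if its property name is unknown or its value fails the
    # property's syntax. Property names are matched ASCII
    # case-insensitively (plain .lower() suffices for ASCII names).
    kept = []
    for name, value in decls:
        value_matches = supported.get(name.lower())
        if value_matches is not None and value_matches(value):
            kept.append((name, value))
    return kept
```

Note that an invalid declaration is removed without affecting its siblings or the enclosing rule.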

The @charset rule is an artifact of the algorithm used to determine the fallback
encoding for the stylesheet. That algorithm looks for a specific byte
sequence as the very first few bytes in the file, which has the syntactic
form of an @-rule. Those bytes are not discarded from the input, whether
or not they influence the encoding actually used to process the
stylesheet.

Therefore, the stylesheet parser recognizes an @-rule with the general
syntax

and, for backward compatibility, includes it in the object model for
the stylesheet. Modifying, adding, or removing an @charset rule via the
object model has no effect (in particular it does not
cause the stylesheet to be rescanned in a different encoding). The
@charset rule is invalid if it is not the very first, top-level rule in
the stylesheet, but it is parsed according to the normal syntax for
@-rules, which are less restrictive than the algorithm that determines the
fallback encoding. Therefore, an @charset rule may appear in the object
model even if it was ignored by that algorithm. (For instance, if it was
written with extra whitespace or with single rather than double quotes.)
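The fallback-encoding byte scan described above can be sketched as follows. This is not the normative algorithm; the function name is hypothetical, and only the exact double-quoted form matches, within the 1024-byte limit noted in the changes section:

```python
def fallback_charset(first_bytes):
    # Sketch: look for the exact byte sequence `@charset "<label>";`
    # at the very start of the stream. Single quotes or extra
    # whitespace do not match (though the normal @-rule parser would
    # still accept them), and the whole sequence must fit within the
    # first 1024 bytes.
    prefix = b'@charset "'
    if not first_bytes.startswith(prefix):
        return None
    end = first_bytes.find(b'";', len(prefix), 1024)
    if end == -1:
        return None
    return first_bytes[len(prefix):end].decode("ascii", "replace")
```

A stylesheet beginning `@charset 'utf-8';` therefore contributes nothing to the fallback encoding, yet still yields an @charset rule in the object model.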

9. Serialization

The tokenizer described in this specification does not produce tokens
for comments, or otherwise preserve them in any way. Implementations may
preserve the contents of comments and their location in the token stream.
If they do, this preserved information must have no effect on the parsing
step.

This specification does not define how to serialize CSS in general,
leaving that task to the CSSOM and individual feature specifications. In
particular, the serialization of comments and whitespace is not defined.

The only requirement for serialization is that it must "round-trip"
with parsing, that is, parsing the stylesheet must produce the same data
structures as parsing, serializing, and parsing again, except for
consecutive <whitespace-token>s, which may be
collapsed into a single token.

Note: This exception can exist because CSS grammars always
interpret any amount of whitespace as identical to a single space.
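The whitespace exception can be expressed as a comparison over abstract token streams; here `"ws"` is a hypothetical stand-in for a <whitespace-token>, not spec terminology:

```python
def streams_match(a, b):
    # Sketch: two token streams count as "the same data structures"
    # if they are equal after collapsing each run of consecutive
    # whitespace tokens ("ws") into a single token.
    def collapse(tokens):
        out = []
        for t in tokens:
            if t == "ws" and out and out[-1] == "ws":
                continue
            out.append(t)
        return out
    return collapse(a) == collapse(b)
```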

To satisfy this requirement:

A <delim-token> containing U+005C REVERSE
SOLIDUS (\) must be serialized as U+005C REVERSE SOLIDUS followed by a
newline. (The
tokenizer only ever emits such a token followed by a <whitespace-token> that starts with
a newline.)

A <hash-token>
with the "unrestricted" type flag may not need as much escaping as the
same token with the "id" type flag.

The unit of a <dimension-token> may need escaping
to disambiguate with scientific notation.
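One way to perform that disambiguation is to hex-escape the unit's leading "e" whenever the serialized form could re-tokenize as an exponent. The helper below is purely illustrative (its name and escaping strategy are assumptions, and one escape style among several valid ones):

```python
def serialize_dimension(value, unit):
    # Hypothetical serializer: if the unit could be misread as the
    # exponent of scientific notation (it starts with "e"/"E" followed
    # by a digit or "-"), hex-escape its first character so the
    # dimension re-tokenizes as number + unit rather than one number.
    s = str(value)
    if len(unit) >= 2 and unit[0] in "eE" and (unit[1].isdigit() or unit[1] == "-"):
        return s + "\\" + format(ord(unit[0]), "x") + " " + unit[1:]
    return s + unit
```

For example, a value of 1 with the (hypothetical) unit "e2" must not serialize as `1e2`, which would re-tokenize as the single number 100.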

For any consecutive pair of tokens, if the first token shows up in
the row headings of either of the following two tables, and the second
token shows up in the column headings, and there’s a ✗ in the cell
denoted by the intersection of the chosen row and column, the pair of
tokens must be serialized with a comment between them.

If the tokenizer preserves comments, the preserved comment should be
used; otherwise, an empty comment (/**/) must be inserted.
(Preserved comments may be reinserted even if the following tables
don’t require a comment between two tokens.)
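The table-driven rule above amounts to a pass over adjacent token pairs; the predicate below is a hypothetical stand-in for the table lookup:

```python
def serialize_with_comments(tokens, needs_comment):
    # Sketch: walk adjacent token pairs and insert an empty comment
    # wherever the (hypothetical) predicate says the pair would
    # otherwise re-tokenize differently when concatenated.
    parts = []
    for i, tok in enumerate(tokens):
        parts.append(tok)
        if i + 1 < len(tokens) and needs_comment(tok, tokens[i + 1]):
            parts.append("/**/")
    return "".join(parts)
```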

Single characters in the row and column headings represent a <delim-token> with that value, except
for "(", which represents a (-token.

Serialize A and append it to
s. Append "n" to s. If B is positive, append "+" to s.
Serialize B and append it to s.

Return s.
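The steps above transcribe directly into code. Note this excerpt covers only the general case; the algorithm's earlier branches (e.g. when A or B is zero) are not shown here, and `str` stands in for the number serialization the steps call "Serialize":

```python
def serialize_an_plus_b(a, b):
    # Direct transcription of the steps above for the general case.
    s = ""
    s += str(a)      # Serialize A and append it to s.
    s += "n"         # Append "n" to s.
    if b > 0:
        s += "+"     # If B is positive, append "+" to s.
    s += str(b)      # Serialize B and append it to s
                     # (a negative B brings its own "-" sign).
    return s         # Return s.
```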

10. Changes

This section is non-normative.

10.1.
Changes from the 5 November 2013 Last Call Working Draft

The Serialization section has been
rewritten to make only the "round-trip" requirement normative, and move
the details of how to achieve it into a note. Some corner cases in these
details have been fixed.

[ENCODING] has been added to the list
of normative references. It was already referenced in normative text
before, just not listed as such.

In the algorithm to determine the fallback
encoding of a stylesheet, limit the @charset byte sequence to
1024 bytes. This aligns with what HTML does for <meta
charset> and makes sure the size of the sequence is bounded.
This only makes a difference with leading or trailing whitespace in the
encoding label:

@charset " (lots of whitespace) utf-8";

10.2.
Changes from the 19 September 2013 Working Draft

The concept of environment encoding was added. The
behavior does not change, but some of the definitions should be moved to
the relevant specs.

10.3. Changes from CSS 2.1 and
Selectors Level 3

Note: The point of this spec is to match reality; changes
from CSS2.1 are nearly always because CSS 2.1 specified something that
doesn’t match actual browser behavior, or left something unspecified. If
some detail doesn’t match browsers, please let me know as it’s almost
certainly unintentional.

Tokenization does not emit COMMENT or BAD_COMMENT tokens anymore.
BAD_COMMENT is now considered the same as a normal token (not an error).
Serialization is responsible for inserting
comments as necessary between tokens that need to be separated, e.g. two
consecutive <ident-token>s.

The <unicode-range-token> is now more
restrictive, and doesn’t include nonsensical patterns that the 2.1
definition allowed.

The BAD_URI token (now <bad-url-token>) is "self-contained".
In other words, once the tokenizer realizes it’s in a <bad-url-token> rather than a <url-token>, it just seeks forward to look
for the closing ), ignoring everything else. This behavior is simpler
than treating it like a <function-token> and paying attention
to opened blocks and such. Only WebKit exhibits this behavior, but it
doesn’t appear that we’ve gotten any compat bugs from it.

<number-token>, <percentage-token>, and <dimension-token> have been changed
to include the preceding +/- sign as part of their value (rather than as
a separate <delim-token>
that needs to be manually handled every time the token is mentioned in
other specs). The only consequence of this is that comments can no longer
be inserted between the sign and the number.

Scientific notation is supported for numbers/percentages/dimensions
to match SVG, per WG resolution.

<column-token> has been added, to keep
Selectors parsing in single-token lookahead.

Hexadecimal escapes for surrogate code
points now emit a replacement character rather than the surrogate.
This allows implementations to safely use UTF-16 internally.
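The change can be sketched as a post-processing step on the escaped code point; the helper name is hypothetical and the full tokenizer's other handling of escapes is not shown:

```python
def escape_to_code_point(hex_digits):
    # Sketch of the change above: a hex escape naming a surrogate
    # code point (U+D800..U+DFFF) yields U+FFFD REPLACEMENT CHARACTER,
    # so UTF-16-based implementations never see a lone surrogate.
    cp = int(hex_digits, 16)
    if 0xD800 <= cp <= 0xDFFF:
        return "\uFFFD"
    return chr(cp)
```

For example, the CSS escape `\D800` now produces U+FFFD rather than an unpaired surrogate.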

Parsing changes:

Any list of declarations now also accepts at-rules, like @page, per WG
resolution. This makes a difference in error handling even if no such
at-rules are defined yet: an at-rule, valid or not, ends at a {} block
without a <semicolon-token> and lets the next declaration begin.

The handling of some miscellaneous "special" tokens (like an unmatched
<}-token>) showing up in
various places in the grammar has been specified with some reasonable
behavior shown by at least one browser. Previously, stylesheets with
those tokens in those places just didn’t match the stylesheet grammar
at all, so their handling was totally undefined. Specifically:

Acknowledgments

Conformance

10.4. Document
conventions

Conformance requirements are expressed with a combination of descriptive
assertions and RFC 2119 terminology. The key words "MUST", "MUST NOT",
"REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED",
"MAY", and "OPTIONAL" in the normative parts of this document are to be
interpreted as described in RFC 2119. However, for readability, these
words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections
explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words "for
example" or are set apart from the normative text with
class="example", like this:

This is an example of an informative example.

Informative notes begin with the word "Note" and are set apart from the
normative text with class="note", like this:

Note, this is an informative note.

10.5. Conformance classes

Conformance to this specification is defined for three conformance
classes:

A style sheet is conformant to this specification if all of its
statements that use syntax defined in this module are valid according to
the generic CSS grammar and the individual grammars of each feature
defined in this module.

A renderer is conformant to this specification if, in addition to
interpreting the style sheet as defined by the appropriate specifications,
it supports all the features defined by this specification by parsing them
correctly and rendering the document accordingly. However, the inability
of a UA to correctly render a document due to limitations of the device
does not make the UA non-conformant. (For example, a UA is not required to
render color on a monochrome monitor.)

An authoring tool is conformant to this specification if it writes style
sheets that are syntactically correct according to the generic CSS grammar
and the individual grammars of each feature in this module, and meet all
other conformance requirements of style sheets as described in this
module.

10.6. Partial
implementations

So that authors can exploit the forward-compatible parsing rules to
assign fallback values, CSS renderers must treat as
invalid (and ignore as
appropriate) any at-rules, properties, property values, keywords, and
other syntactic constructs for which they have no usable level of support.
In particular, user agents must not selectively ignore
unsupported component values and honor supported values in a single
multi-value property declaration: if any value is considered invalid (as
unsupported values must be), CSS requires that the entire declaration be
ignored.

10.7. Experimental
implementations

To avoid clashes with future CSS features, the CSS2.1 specification
reserves a prefixed
syntax for proprietary and experimental extensions to CSS.

Prior to a specification reaching the Candidate Recommendation stage in
the W3C process, all implementations of a CSS feature are considered
experimental. The CSS Working Group recommends that implementations use a
vendor-prefixed syntax for such features, including those in W3C Working
Drafts. This avoids incompatibilities with future changes in the draft.

10.8. Non-experimental
implementations

Once a specification reaches the Candidate Recommendation stage,
non-experimental implementations are possible, and implementors should
release an unprefixed implementation of any CR-level feature they can
demonstrate to be correctly implemented according to spec.

To establish and maintain the interoperability of CSS across
implementations, the CSS Working Group requests that non-experimental CSS
renderers submit an implementation report (and, if necessary, the
testcases used for that implementation report) to the W3C before releasing
an unprefixed implementation of any CSS features. Testcases submitted to
W3C are subject to review and correction by the CSS Working Group.

10.9. CR exit criteria

For this specification to be advanced to Proposed Recommendation, there
must be at least two independent, interoperable implementations of each
feature. Each feature may be implemented by a different set of products;
there is no requirement that all features be implemented by a single
product. For the purposes of this criterion, we define the following
terms:

independent

each implementation must be developed by a different party and cannot
share, reuse, or derive from code used by another qualifying
implementation. Sections of code that have no bearing on the
implementation of this specification are exempt from this requirement.

interoperable

passing the respective test case(s) in the official CSS test suite,
or, if the implementation is not a Web browser, an equivalent test. Every
relevant test in the test suite should have an equivalent test created if
such a user agent (UA) is to be used to claim interoperability. In
addition, if such a UA is to be used to claim interoperability, then there
must be one or more additional UAs which can also pass those equivalent
tests in the same way for the purpose of interoperability. The equivalent
tests must be made publicly available for the purposes of peer review.

implementation

a user agent which:

implements the specification.

is available to the general public. The implementation may be a
shipping product or other publicly available version (i.e., beta
version, preview release, or "nightly build"). Non-shipping product
releases must have implemented the feature(s) for a period of at least
one month in order to demonstrate stability.

is not experimental (i.e., a version specifically designed to pass
the test suite and is not intended for normal usage going forward).

The specification will remain Candidate Recommendation for at least six
months.