where V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals (w can be empty). A formal grammar is considered "context free" when its production rules can be applied regardless of the context of a nonterminal. No matter which symbols surround it, the single nonterminal on the left-hand side can always be replaced by the right-hand side. This is what distinguishes it from a context-sensitive grammar.

Languages generated by context-free grammars are known as context-free languages (CFL). Different context-free grammars can generate the same context-free language. It is important to distinguish properties of the language (intrinsic properties) from properties of a particular grammar (extrinsic properties). The language equality question (do two given context-free grammars generate the same language?) is undecidable.

Since the time of Pāṇini, at least, linguists have described the grammars of languages in terms of their block structure, and described how sentences are recursively built up from smaller phrases, and eventually individual words or word elements. An essential property of these block structures is that logical units never overlap.

A context-free grammar provides a simple and mathematically precise mechanism for describing the methods by which phrases in some natural language are built from smaller blocks, capturing the "block structure" of sentences in a natural way. Its simplicity makes the formalism amenable to rigorous mathematical study. Important features of natural language syntax such as agreement and reference are not part of the context-free grammar, but the basic recursive structure of sentences, the way in which clauses nest inside other clauses, and the way in which lists of adjectives and adverbs are swallowed by nouns and verbs are described exactly.

Block structure was introduced into computer programming languages by the Algol project (1957–1960), which, as a consequence, also featured a context-free grammar to describe the resulting Algol syntax. This became a standard feature of computer languages, and the notation for grammars used in concrete descriptions of computer languages came to be known as Backus-Naur Form, after two members of the Algol language design committee.[2] The "block structure" aspect that context-free grammars capture is so fundamental to grammar that the terms syntax and grammar are often identified with context-free grammar rules, especially in computer science. Formal constraints not captured by the grammar are then considered to be part of the "semantics" of the language.

Context-free grammars are simple enough to allow the construction of efficient parsing algorithms which, for a given string, determine whether and how it can be generated from the grammar. An Earley parser is an example of such an algorithm, while the widely used LR and LL parsers are simpler algorithms that deal only with more restrictive subsets of context-free grammars.

A context-free grammar G is defined by the 4-tuple G = (V, Σ, R, S), where:

V is a finite set; each element v ∈ V is called a non-terminal character or a variable. Each variable represents a different type of phrase or clause in the sentence. Variables are also sometimes called syntactic categories. Each variable defines a sub-language of the language defined by G.

Σ is a finite set of terminals, disjoint from V, which make up the actual content of the sentence. The set of terminals is the alphabet of the language defined by the grammar G.

R is a finite relation from V to (V ∪ Σ)*, where the asterisk represents the Kleene star operation. The members of R are called the (rewrite) rules or productions of the grammar. (R is also commonly symbolized by a P.)

S is the start variable (or start symbol), used to represent the whole sentence (or program). It must be an element of V.

A production rule in R is formalized mathematically as a pair (α, β) ∈ R, where α ∈ V is a non-terminal and β ∈ (V ∪ Σ)* is a string of variables and/or terminals; rather than using ordered pair notation, production rules are usually written using an arrow operator with α as its left-hand side and β as its right-hand side: α → β.

It is allowed for β to be the empty string, and in this case it is customary to denote it by ε. The form α → ε is called an ε-production.[5]

It is common to list all right-hand sides for the same left-hand side on the same line, using | (the pipe symbol) to separate them. Rules α → β₁ and α → β₂ can hence be written as α → β₁ | β₂. In this case, β₁ and β₂ are called the first and second alternative, respectively.

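For any strings u, v ∈ (V ∪ Σ)*, we say u yields v, written as u ⇒* v (or u ⇒⇒ v in some textbooks), if there are k ≥ 1 and strings u₁, …, uₖ ∈ (V ∪ Σ)* such that u = u₁ ⇒ u₂ ⇒ ⋯ ⇒ uₖ = v, where each single step ⇒ replaces one occurrence of a nonterminal α by β for some rule α → β in R. In this case, if k ≥ 2 (i.e., at least one rule is applied), the relation u ⇒+ v holds. In other words, ⇒* and ⇒+ are the reflexive transitive closure (allowing a word to yield itself) and the transitive closure (requiring at least one step) of ⇒, respectively.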
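The yield relations can be made concrete with a small sketch. The grammar used here (S → aSb | ε) and the function names `directly_yields` and `yields` are illustrative assumptions, not part of the definition; strings are plain Python strings with one character per symbol, and the search is cut off after a fixed number of steps.

```python
# Hypothetical example grammar: S -> aSb | ε (the empty string is written "").
RULES = [("S", "aSb"), ("S", "")]

def directly_yields(u):
    """Return every string v such that u yields v in exactly one rule application."""
    results = set()
    for lhs, rhs in RULES:
        start = u.find(lhs)
        while start != -1:
            # Replace this single occurrence of the nonterminal by the right-hand side.
            results.add(u[:start] + rhs + u[start + len(lhs):])
            start = u.find(lhs, start + 1)
    return results

def yields(u, v, max_steps):
    """Check whether u yields v using at most max_steps rule applications."""
    frontier = {u}
    for _ in range(max_steps + 1):
        if v in frontier:
            return True
        frontier = {w for form in frontier for w in directly_yields(form)}
    return False

print(directly_yields("aSb"))   # the two one-step successors of aSb
print(yields("S", "aabb", 3))   # S => aSb => aaSbb => aabb
```

Because zero steps are allowed, `yields` models the reflexive transitive closure up to the step bound; requiring at least one step would model the transitive closure instead.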

Every context-free grammar can be effectively transformed into a weakly equivalent one without unreachable symbols,[7] a weakly equivalent one without unproductive symbols,[8] and a weakly equivalent one without cycles.[9] Every context-free grammar not producing ε can be effectively transformed into a weakly equivalent one without ε-productions;[10] altogether, every such grammar can be effectively transformed into a weakly equivalent proper CFG.

The canonical example of a context-free grammar is parenthesis matching, which is representative of the general case. There are two terminal symbols "(" and ")" and one nonterminal symbol S. The production rules are

S → SS

S → (S)

S → ()

The first rule allows Ss to multiply; the second rule allows Ss to become enclosed by matching parentheses; and the third rule terminates the recursion.
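As a rough illustration of how the three rules interact, the following Python sketch enumerates every terminal string of bounded length that the grammar derives. It explores leftmost rewrites breadth-first; the function name `generate` and the length bound are assumptions made for this example. The search terminates because no rule shortens a sentential form.

```python
from collections import deque

# Right-hand sides of the three rules for S.
RULES = ["SS", "(S)", "()"]

def generate(max_len):
    """Return all terminal strings of length <= max_len derivable from S."""
    seen = {"S"}
    queue = deque(["S"])
    words = set()
    while queue:
        form = queue.popleft()
        i = form.find("S")          # always rewrite the leftmost S
        if i == -1:
            words.add(form)         # no nonterminal left: a finished string
            continue
        for rhs in RULES:
            new_form = form[:i] + rhs + form[i + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(words, key=lambda w: (len(w), w))

print(generate(4))  # ['()', '(())', '()()']
```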

A second canonical example is two different kinds of matching nested parentheses, described by the productions:

S → SS

S → ()

S → (S)

S → []

S → [S]

with terminal symbols [ ] ( ) and nonterminal S.

The following sequence can be derived in that grammar:

([ [ [ ()() [ ][ ] ] ]([ ]) ])
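The strings this grammar derives are exactly the non-empty sequences in which every bracket is closed by a partner of the same kind in properly nested order, so membership can be checked with a stack. The sketch below is a recognizer written under that observation, not a general parser; the name `derivable` is an assumption for illustration, and whitespace in the input is ignored.

```python
# Maps each closing bracket to the opening bracket it must match.
PAIRS = {")": "(", "]": "["}

def derivable(text):
    """Return True if text (ignoring spaces) is a non-empty, properly nested bracket string."""
    symbols = [c for c in text if not c.isspace()]
    if not symbols:
        return False                 # the grammar has no rule producing the empty string
    stack = []
    for c in symbols:
        if c in PAIRS.values():
            stack.append(c)          # opening bracket: remember it
        elif c in PAIRS:
            if not stack or stack.pop() != PAIRS[c]:
                return False         # closing bracket without a matching opener
        else:
            return False             # not a terminal of this grammar
    return not stack                 # every opener must have been closed

print(derivable("([ [ [ ()() [ ][ ] ] ]([ ]) ])"))  # True
print(derivable("[ ( ] )"))                         # False: the two kinds cross
```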

However, there is no context-free grammar for generating all sequences of two different types of parentheses, each separately balanced disregarding the other, where the two types need not nest inside one another, for example:

[ ( ] )

Here is a context-free grammar for syntactically correct infix algebraic expressions in the variables x, y and z:

S → x

S → y

S → z

S → S + S

S → S - S

S → S * S

S → S / S

S → ( S )

This grammar can, for example, generate the string

( x + y ) * x - z * y / ( x + x )

as follows:

S (the start symbol)

→ S - S (by rule 5)

→ S * S - S (by rule 6, applied to the leftmost S)

→ S * S - S / S (by rule 7, applied to the rightmost S)

→ ( S ) * S - S / S (by rule 8, applied to the leftmost S)

→ ( S ) * S - S / ( S ) (by rule 8, applied to the rightmost S)

→ ( S + S ) * S - S / ( S ) (etc.)

→ ( S + S ) * S - S * S / ( S )

→ ( S + S ) * S - S * S / ( S + S )

→ ( x + S ) * S - S * S / ( S + S )

→ ( x + y ) * S - S * S / ( S + S )

→ ( x + y ) * x - S * S / ( S + S )

→ ( x + y ) * x - S * y / ( S + S )

→ ( x + y ) * x - S * y / ( x + S )

→ ( x + y ) * x - z * y / ( x + S )

→ ( x + y ) * x - z * y / ( x + x )
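The derivation can be replayed mechanically. In the sketch below, rules are numbered 1 to 8 in the order they are listed above, and each step is a pair (rule number, which S to rewrite, counted from the left starting at 0). The names `apply_rule`, `replay` and `STEPS` are assumptions for this illustration, and every single rewrite is listed as its own step.

```python
# Rules numbered in the order they appear in the grammar above.
RULES = {
    1: ["x"], 2: ["y"], 3: ["z"],
    4: ["S", "+", "S"], 5: ["S", "-", "S"],
    6: ["S", "*", "S"], 7: ["S", "/", "S"],
    8: ["(", "S", ")"],
}

def apply_rule(form, rule, occurrence):
    """Rewrite the occurrence-th S (0-based, from the left) using the given rule."""
    positions = [i for i, sym in enumerate(form) if sym == "S"]
    i = positions[occurrence]
    return form[:i] + RULES[rule] + form[i + 1:]

def replay(steps):
    """Apply the steps in order, starting from the start symbol S."""
    form = ["S"]
    for rule, occurrence in steps:
        form = apply_rule(form, rule, occurrence)
    return " ".join(form)

STEPS = [(5, 0), (6, 0), (7, 2), (8, 0), (8, 3), (4, 0), (6, 3), (4, 5),
         (1, 0), (2, 0), (1, 0), (2, 1), (1, 1), (3, 0), (1, 0)]

# Same rewrites, but with the second and third steps performed in the opposite order.
SWAPPED = STEPS[:1] + [(7, 1), (6, 0)] + STEPS[3:]

print(replay(STEPS))    # ( x + y ) * x - z * y / ( x + x )
print(replay(SWAPPED))  # the same string
```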

Note that many choices were made along the way as to which rewrite was going to be performed next. These choices look quite arbitrary. As a matter of fact, they are, in the sense that the string finally generated is always the same. For example, the second and third rewrites

→ S * S - S (by rule 6, applied to the leftmost S)

→ S * S - S / S (by rule 7, applied to the rightmost S)

could be done in the opposite order:

→ S - S / S (by rule 7, applied to the rightmost S)

→ S * S - S / S (by rule 6, applied to the leftmost S)

Also, many choices were made on which rule to apply to each selected S. Changing the choices made and not only the order they were made in usually affects which terminal string comes out at the end.

Let's look at this in more detail. Consider the parse tree of this derivation:

Starting at the top, step by step, an S in the tree is expanded, until no more unexpanded S symbols (non-terminals) remain. Picking a different order of expansion will produce a different derivation, but the same parse tree. The parse tree will only change if we pick a different rule to apply at some position in the tree.

But can a different parse tree still produce the same terminal string, which is ( x + y ) * x - z * y / ( x + x ) in this case? Yes, for this particular grammar, this is possible. Grammars with this property are called ambiguous.

For example, x + y * z can be produced with these two different parse trees:
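A minimal Python sketch can enumerate the parse trees by brute force. It tries every rule of the expression grammar on every token span and writes each tree as a bracketed string, using square brackets for the grouping imposed by the tree so that they are not confused with the terminal parentheses. The function name `parse_trees` and the requirement that tokens be separated by spaces are assumptions of this example.

```python
from functools import lru_cache

OPERATORS = {"+", "-", "*", "/"}
VARIABLES = {"x", "y", "z"}

def parse_trees(text):
    """Return all parse trees of a space-separated token string, as bracketed strings."""
    tokens = tuple(text.split())

    @lru_cache(maxsize=None)
    def trees(i, j):
        found = []
        # S -> x | y | z
        if j - i == 1 and tokens[i] in VARIABLES:
            found.append(tokens[i])
        # S -> ( S )
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            found += ["(" + inner + ")" for inner in trees(i + 1, j - 1)]
        # S -> S op S, trying every operator position as the top-level split
        for k in range(i + 1, j - 1):
            if tokens[k] in OPERATORS:
                found += [f"[{left} {tokens[k]} {right}]"
                          for left in trees(i, k)
                          for right in trees(k + 1, j)]
        return tuple(found)

    return list(trees(0, len(tokens)))

for tree in parse_trees("x + y * z"):
    print(tree)
# [x + [y * z]]
# [[x + y] * z]
```

Under the same assumptions, the longer string ( x + y ) * x - z * y / ( x + x ) has 14 parse trees, one for each way of grouping its four top-level operators.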

However, the language described by this grammar is not inherently ambiguous: an alternative, unambiguous grammar can be given for the language, for example:

T → x

T → y

T → z

S → S + T

S → S - T

S → S * T

S → S / T

T → ( S )

S → T

(once again picking S as the start symbol). This alternative grammar will produce x + y * z with a parse tree similar to the left one above, i.e. implicitly assuming the association (x + y) * z, which is not according to standard operator precedence. More elaborate, unambiguous and context-free grammars can be constructed that produce parse trees that obey all desired operator precedence and associativity rules.
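Since every operator rule of this alternative grammar has the form S → S op T, a parser can read one T and then repeatedly absorb "op T" pairs, which groups operators strictly from left to right. The following Python sketch does this for space-separated tokens; the function name `parse` and the bracketed output format (square brackets mark the grouping) are assumptions for illustration.

```python
OPERATORS = {"+", "-", "*", "/"}
VARIABLES = {"x", "y", "z"}

def parse(text):
    """Parse with S -> S op T | T and T -> x | y | z | ( S ); return a bracketed tree."""
    tokens = text.split()
    pos = 0

    def parse_t():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token in VARIABLES:
            return token
        if token == "(":
            inner = parse_s()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return "(" + inner + ")"
        raise SyntaxError(f"unexpected token {token!r}")

    def parse_s():
        nonlocal pos
        tree = parse_t()
        # Each further "op T" wraps everything parsed so far: left association.
        while pos < len(tokens) and tokens[pos] in OPERATORS:
            op = tokens[pos]
            pos += 1
            right = parse_t()
            tree = f"[{tree} {op} {right}]"
        return tree

    result = parse_s()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result

print(parse("x + y * z"))  # [[x + y] * z]
```

There is exactly one way to parse each input, which reflects the unambiguity of the grammar, but the grouping ignores the usual precedence of * over +.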

A context-free grammar for the language consisting of all strings over {a,b} containing an unequal number of a's and b's:

S → U | V

U → TaU | TaT | UaT

V → TbV | TbT | VbT

T → aTbT | bTaT | ε

Here, the nonterminal T can generate all strings with the same number of a's as b's, the nonterminal U generates all strings with more a's than b's and the nonterminal V generates all strings with fewer a's than b's. Omitting the third alternative in the rules for U and V doesn't restrict the grammar's language.
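These claims can be checked experimentally for short strings. The sketch below computes, by fixed-point iteration, every string of bounded length that each nonterminal derives, and compares the result for S with a direct enumeration of strings having unequal counts. It is a bounded sanity check under an assumed length limit, not a proof; the names `bounded_language`, `unequal_strings` and `REDUCED` are hypothetical.

```python
from itertools import product

# Each nonterminal maps to its alternatives; "" stands for ε.
RULES = {
    "S": ["U", "V"],
    "U": ["TaU", "TaT", "UaT"],
    "V": ["TbV", "TbT", "VbT"],
    "T": ["aTbT", "bTaT", ""],
}

# The same grammar with the third alternative of U and V removed.
REDUCED = dict(RULES, U=["TaU", "TaT"], V=["TbV", "TbT"])

def bounded_language(rules, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start."""
    lang = {nt: set() for nt in rules}
    changed = True
    while changed:
        changed = False
        for nt, alternatives in rules.items():
            for alt in alternatives:
                # partial holds every string derivable from the symbols read so far.
                partial = {""}
                for sym in alt:
                    choices = lang[sym] if sym in rules else {sym}
                    partial = {p + c for p in partial for c in choices
                               if len(p) + len(c) <= max_len}
                if not partial <= lang[nt]:
                    lang[nt] |= partial
                    changed = True
    return lang[start]

def unequal_strings(max_len):
    """All strings over {a, b} of length <= max_len with unequal numbers of a's and b's."""
    return {w for n in range(max_len + 1)
            for w in map("".join, product("ab", repeat=n))
            if w.count("a") != w.count("b")}

print(bounded_language(RULES, "S", 6) == unequal_strings(6))
print(bounded_language(REDUCED, "S", 6) == unequal_strings(6))
```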

A derivation of a string for a grammar is a sequence of grammar rule applications that transforms the start symbol into the string. A derivation proves that the string belongs to the grammar's language.

The distinction between leftmost derivation and rightmost derivation is important because in most parsers the transformation of the input is defined by giving a piece of code for every grammar rule that is executed whenever the rule is applied. Therefore, it is important to know whether the parser determines a leftmost or a rightmost derivation, because this determines the order in which the pieces of code will be executed. See, for example, LL parsers and LR parsers.

A derivation also imposes in some sense a hierarchical structure on the string that is derived. For example, if the string "1 + 1 + a" is derived according to the leftmost derivation:

S → S + S (1)

→ 1 + S (2)

→ 1 + S + S (1)

→ 1 + 1 + S (2)

→ 1 + 1 + a (3)

the structure of the string would be:

{ { 1 }S + { { 1 }S + { a }S }S }S

where { ... }S indicates a substring recognized as belonging to S. This hierarchy can also be seen as a tree:

           S
          /|\
         / | \
        /  |  \
       S  '+'  S
       |      /|\
       |     / | \
      '1'   S '+' S
            |     |
           '1'   'a'

This tree is called a parse tree or "concrete syntax tree" of the string, by contrast with the abstract syntax tree. In this case the presented leftmost and the rightmost derivations define the same parse tree; however, there is another (rightmost) derivation of the same string

S → S + S (1)

→ S + a (3)

→ S + S + a (1)

→ S + 1 + a (2)

→ 1 + 1 + a (2)

and this defines the following parse tree:

           S
          /|\
         / | \
        /  |  \
       S  '+'  S
      /|\      |
     / | \     |
    S '+' S   'a'
    |     |
   '1'   '1'
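The correspondence between a derivation and its parse tree can be sketched in Python. The rule numbering (1) S → S + S, (2) S → 1, (3) S → a is inferred from the annotations on the derivations above and should be read as an assumption, as should the names `build` and `show`. A derivation is given as its sequence of rule numbers; the tree is rebuilt by always expanding the leftmost (or rightmost) unexpanded S.

```python
# Rule numbering inferred from the derivation annotations (an assumption).
RULES = {1: ["S", "+", "S"], 2: ["1"], 3: ["a"]}

def build(rule_sequence, rightmost=False):
    """Rebuild the parse tree of a leftmost (or rightmost) derivation as nested lists."""
    remaining = iter(rule_sequence)

    def expand():
        rhs = RULES[next(remaining)]
        children = [None] * len(rhs)
        order = reversed(range(len(rhs))) if rightmost else range(len(rhs))
        for i in order:
            # Nonterminals are expanded recursively, terminals are kept as leaves.
            children[i] = expand() if rhs[i] == "S" else rhs[i]
        return children

    return expand()

def show(node):
    """Render a tree in the { ... }S bracket notation used above."""
    if isinstance(node, str):
        return node
    return "{ " + " ".join(show(child) for child in node) + " }S"

print(show(build([1, 2, 1, 2, 3])))                  # { { 1 }S + { { 1 }S + { a }S }S }S
print(show(build([1, 3, 1, 2, 2], rightmost=True)))  # { { { 1 }S + { 1 }S }S + { a }S }S
```

The two outputs differ, matching the two parse trees drawn above for the same string.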

If, for certain strings in the language of the grammar, there is more than one parse tree, then the grammar is said to be an ambiguous grammar. Such grammars are usually hard to parse because the parser cannot always decide which grammar rule it has to apply. Usually, ambiguity is a feature of the grammar, not the language, and an unambiguous grammar can be found that generates the same context-free language. However, there are certain languages that can only be generated by ambiguous grammars; such languages are called inherently ambiguous languages.

Every context-free grammar that does not generate the empty string can be transformed into one in which there is no ε-production (that is, a rule that has the empty string as its right-hand side). If a grammar does generate the empty string, it will be necessary to include the rule S → ε, but there need be no other ε-rule. Every context-free grammar with no ε-production has an equivalent grammar in Chomsky normal form or Greibach normal form. "Equivalent" here means that the two grammars generate the same language.

The especially simple form of production rules in Chomsky Normal Form grammars has both theoretical and practical implications. For instance, given a context-free grammar, one can use the Chomsky Normal Form to construct a polynomial-time algorithm that decides whether a given string is in the language represented by that grammar or not (the CYK algorithm).
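A compact sketch of the CYK idea in Python follows. The grammar is a Chomsky normal form version of the parenthesis grammar S → SS | (S) | (), obtained by hand for this example; the helper nonterminals L, R and A and the function name `cyk` are assumptions. The table entry for a start position and a length holds the nonterminals that derive that substring, and the three nested loops over length, start and split point give cubic running time in the length of the input.

```python
# Hand-converted CNF grammar for balanced parentheses (helper names are hypothetical):
#   S -> S S | L R | L A,   A -> S R,   L -> "(",   R -> ")"
TERMINAL_RULES = [("L", "("), ("R", ")")]
PAIR_RULES = [("S", ("S", "S")), ("S", ("L", "R")), ("S", ("L", "A")), ("A", ("S", "R"))]

def cyk(word, start="S"):
    """Return True if the CNF grammar above derives word."""
    n = len(word)
    if n == 0:
        return False  # this grammar has no rule producing the empty string
    # table[i][length] = set of nonterminals deriving word[i:i + length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, symbol in enumerate(word):
        table[i][1] = {lhs for lhs, terminal in TERMINAL_RULES if terminal == symbol}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split]
                right = table[i + split][length - split]
                for lhs, (first, second) in PAIR_RULES:
                    if first in left and second in right:
                        table[i][length].add(lhs)
    return start in table[0][n]

print(cyk("(())()"))  # True
print(cyk("(()"))     # False
```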

Some questions that are undecidable for wider classes of grammars become decidable for context-free grammars; e.g. the emptiness problem (whether the grammar generates any terminal strings at all) is undecidable for context-sensitive grammars, but decidable for context-free grammars.
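One standard way to decide emptiness is to compute the set of productive nonterminals, those that derive at least one terminal string, and test whether the start symbol is among them. The sketch below assumes a grammar given as a dictionary from nonterminals to lists of right-hand sides, with every symbol that is not a key treated as a terminal; the function name and the two example grammars are hypothetical.

```python
def generates_any_string(rules, start):
    """Return True if the grammar derives at least one terminal string from start."""
    productive = set()
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in rules.items():
            if lhs in productive:
                continue
            for rhs in alternatives:
                # A rule is usable once every symbol in it is a terminal
                # or an already productive nonterminal.
                if all(sym in productive or sym not in rules for sym in rhs):
                    productive.add(lhs)
                    changed = True
                    break
    return start in productive

# Hypothetical grammars: A -> aA never terminates, so A is unproductive.
NON_EMPTY = {"S": ["AB", "a"], "A": ["aA"], "B": ["b"]}
EMPTY = {"S": ["AB"], "A": ["aA"], "B": ["b"]}

print(generates_any_string(NON_EMPTY, "S"))  # True
print(generates_any_string(EMPTY, "S"))      # False
```

The loop adds at least one nonterminal per pass until nothing changes, so it stops after at most as many passes as there are nonterminals.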

However, many problems are undecidable even for context-free grammars. Examples are:

Given a CFG, does it generate the language of all strings over the alphabet of terminal symbols used in its rules?[17][18]

A reduction can be demonstrated to this problem from the well-known undecidable problem of determining whether a Turing machine accepts a particular input (the halting problem). The reduction uses the concept of a computation history, a string describing an entire computation of a Turing machine. A CFG can be constructed that generates all strings that are not accepting computation histories for a particular Turing machine on a particular input, and thus it will generate all strings only if the machine doesn't accept that input.

An obvious way to extend the context-free grammar formalism is to allow nonterminals to have arguments, the values of which are passed along within the rules. This allows natural language features such as agreement and reference, and programming language analogs such as the correct use and definition of identifiers, to be expressed in a natural way. E.g. we can now easily express that in English sentences, the subject and verb must agree in number. In computer science, examples of this approach include affix grammars, attribute grammars, indexed grammars, and Van Wijngaarden two-level grammars. Similar extensions exist in linguistics.

An extended context-free grammar (or regular right part grammar) is one in which the right-hand side of the production rules is allowed to be a regular expression over the grammar's terminals and nonterminals. Extended context-free grammars describe exactly the context-free languages.[20]

Another extension is to allow additional terminal symbols to appear at the left hand side of rules, constraining their application. This produces the formalism of context-sensitive grammars.

LR parsing extends LL parsing to support a larger range of grammars; in turn, generalized LR (GLR) parsing extends LR parsing to support arbitrary context-free grammars. On LL grammars and LR grammars, GLR parsing essentially performs LL parsing and LR parsing, respectively, while on nondeterministic grammars, it is as efficient as can be expected. Although GLR parsing was developed in the 1980s, many new language definitions and parser generators continue to be based on LL, LALR or LR parsing up to the present day.

Transformation rules are another standard device in traditional linguistics; e.g. passivization in English. Much of generative grammar has been devoted to finding ways of refining the descriptive mechanisms of phrase-structure grammar and transformation rules such that exactly the kinds of things can be expressed that natural language actually allows. Allowing arbitrary transformations doesn't meet that goal: they are much too powerful, being Turing complete unless significant restrictions are added (e.g. no transformations that introduce and then rewrite symbols in a context-free fashion).

Chomsky's general position regarding the non-context-freeness of natural language has held up since then,[21] although his specific examples regarding the inadequacy of context-free grammars in terms of their weak generative capacity were later disproved.[22] Gerald Gazdar and Geoffrey Pullum have argued that despite a few non-context-free constructions in natural language (such as cross-serial dependencies in Swiss German[21] and reduplication in Bambara[23]), the vast majority of forms in natural language are indeed context-free.[22]
