1.
Cross-serial dependencies
–
In linguistics, cross-serial dependencies (also called crossing dependencies) occur when the lines representing the dependency relations between two series of words cross over each other. They are of particular interest to linguists who wish to determine the syntactic structure of natural language: languages containing an arbitrary number of cross-serial dependencies cannot be generated by a context-free grammar, and by this fact, Dutch and Swiss-German have been proved to be non-context-free. As Swiss-German allows verbs and their arguments to be ordered cross-serially, we have the following example, taken from Shieber: ...mer em Hans es huus hälfed aastriiche. That is, "we help Hans paint the house." Notice that the noun phrases em Hans (Hans) and es huus (the house) and the verbs hälfed (help) and aastriiche (paint) each form a series. Notice also that the dative verb hälfed and the accusative verb aastriiche take the dative em Hans and the accusative es huus as their arguments, respectively. In Swiss-German sentences, the number of verbs of a grammatical case (dative or accusative) must match the number of objects of that case. Additionally, a sentence containing an arbitrary number of such objects is admissible. Hence, the following formal language is grammatical:

L = { De Jan säit das mer (d'chind)^m (em Hans)^n es huus händ wele (laa)^m (hälfe)^n aastriiche | m, n ≥ 0 }

It can be seen that L is of the form { x a^m b^n y c^m d^n z | m, n ≥ 0 }. By taking a homomorphic image to remove the x, y and z, we obtain a language of the form L′ = { a^m b^n c^m d^n | m, n ≥ 0 }, which is not context-free. All spoken languages which contain cross-serial dependencies also contain a language of a form similar to L′. Cross-serial dependencies can be expressed in, for example, linear context-free rewriting systems (LCFRS); one can write an LCFRS grammar for { a^n b^n c^n d^n } for example.
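To make the count-matching constraint concrete, here is a minimal Python sketch (an illustration, not from the source) that recognizes the simplified language { a^m b^n c^m d^n : m, n ≥ 0 }; the crossed dependencies show up as the requirement that the first and third counts agree, and likewise the second and fourth:

```python
import re

def in_crossed_language(s: str) -> bool:
    """Membership test for { a^m b^n c^m d^n : m, n >= 0 }.

    The a's must be matched by equally many c's and the b's by
    equally many d's, mirroring how each verb in the Swiss-German
    example must be matched by an object of the same case.
    """
    m = re.fullmatch(r'(a*)(b*)(c*)(d*)', s)
    if m is None:
        return False
    na, nb, nc, nd = (len(g) for g in m.groups())
    return na == nc and nb == nd

assert in_crossed_language('aabccd')      # m = 2, n = 1
assert not in_crossed_language('aabbcd')  # only one c for two a's
```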

2.
Abstract rewriting system
–
In mathematical logic and theoretical computer science, an abstract rewriting system (ARS) is a formalism that captures the quintessential notion and properties of rewriting systems. In its simplest form, an ARS is simply a set together with a binary relation, traditionally denoted with →. Despite its simplicity, an ARS is sufficient to describe important properties of rewriting systems like normal forms, termination, and various notions of confluence. Historically, there have been several formalizations of rewriting in an abstract setting, each with its idiosyncrasies. This is due in part to the fact that some notions are equivalent. The formalization that is most commonly encountered in monographs and textbooks, and which is generally followed here, is due to Gérard Huet. An abstract reduction system is the most general notion for specifying a set of objects and the rules that can be applied to transform them; more recently, authors use the term abstract rewriting system as well. An ARS is a set A, whose elements are called objects, together with a binary relation on A, traditionally denoted by → and called the reduction relation. This terminology using "reduction" is a little misleading, because the relation is not necessarily reducing some measure of the objects. In some contexts it is useful to distinguish subsets of the reduction relation; consequently, some authors define the reduction relation → as the union of several relations, for instance →1 ∪ →2 = →. As a mathematical object, an ARS is the same as an unlabelled state transition system; the focus of the study and the terminology are different, however. In a state transition system one is interested in interpreting the labels as actions, whereas in an ARS the focus is on how objects may be transformed (rewritten) into others.

Example 1. Suppose the set of objects is T = {a, b, c} and the relation is given by the rules a → b, b → a, a → c, and b → c. Observe that these rules can be applied to both a and b to get c. Note also that c is, in a sense, a "simplest" object in the system, since nothing can be applied to c to transform it any further. Such a property is clearly an important one. Example 1 leads us to define some important notions in the general setting of an ARS. First we need some basic notions and notations. →* is the transitive closure of → ∪ =, where = is the identity relation; it is also known as the reflexive transitive closure of →. An object x in A is called reducible if there exists some other y in A such that x → y; otherwise it is called irreducible or a normal form. An object y is called a normal form of x if x →* y and y is irreducible. If x has a unique normal form, then this is usually denoted with x↓. In example 1 above, c is a normal form, and c = a↓ = b↓.
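Example 1 is small enough to run. A minimal Python sketch, assuming the relation is given as a map from each object to its one-step successors (the names are illustrative):

```python
def normal_forms(obj, rel):
    """Return the set of normal forms reachable from obj in an ARS.

    rel maps each object to the set of objects it rewrites to in one
    step; an object with no successors is irreducible (a normal form).
    Cycles, like a -> b -> a in Example 1, are handled by tracking
    visited objects.
    """
    seen, found, stack = set(), set(), [obj]
    while stack:
        x = stack.pop()
        if x in seen:
            continue
        seen.add(x)
        succs = rel.get(x, set())
        if not succs:
            found.add(x)        # irreducible: a normal form of obj
        else:
            stack.extend(succs)
    return found

# Example 1: a -> b, b -> a, a -> c, b -> c
rel = {'a': {'b', 'c'}, 'b': {'a', 'c'}}
assert normal_forms('a', rel) == {'c'}   # c = a↓ = b↓
```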

3.
Ambiguous grammar
–
Many languages admit both ambiguous and unambiguous grammars, while some languages admit only ambiguous grammars. Any non-empty language admits an ambiguous grammar, obtained by taking an unambiguous grammar and adding a duplicate rule. A language that only admits ambiguous grammars is called an inherently ambiguous language, and there are inherently ambiguous context-free languages. Deterministic context-free grammars are always unambiguous and are an important subclass of unambiguous grammars, though there are non-deterministic unambiguous grammars. For computer programming languages, the reference grammar is often ambiguous, due to issues such as the dangling else problem. If present, these ambiguities are resolved by adding precedence rules or other context-sensitive parsing rules. For example, the grammar S → S | ε for the trivial language consisting of only the empty string is ambiguous: the empty string has leftmost derivations of length 1, 2, 3, and indeed of any length. In the same way, any grammar for a non-empty language can be made ambiguous by adding duplicates.

In many languages, the else in an if–then(–else) statement is optional, so nested conditionals can give rise to ambiguous phrase structures. This is resolved in various ways in different languages. Sometimes the grammar is modified so that it is unambiguous, such as by requiring an endif statement or making else mandatory. In other cases the grammar is left ambiguous, but the ambiguity is resolved by making the overall phrase grammar context-sensitive, such as by associating an else with the nearest if. In this latter case the language is unambiguous, but the context-free grammar is ambiguous. The simple grammar S → A + A, A → 0 | 1 is an unambiguous grammar for the language { 0+0, 0+1, 1+0, 1+1 }.

The decision problem of whether an arbitrary grammar is ambiguous is undecidable, because it can be shown that it is equivalent to the Post correspondence problem. At the least, there are tools implementing some semi-decision procedure for detecting ambiguity of context-free grammars. The efficiency of context-free grammar parsing is determined by the automaton that accepts it. Deterministic context-free grammars are accepted by deterministic pushdown automata and can be parsed in linear time, for example by the LR parser. They are a subset of the context-free grammars, which are accepted by the pushdown automaton and can be parsed in polynomial time, for example by the CYK algorithm. Unambiguous context-free grammars can be nondeterministic; for example, the language of even-length palindromes on the alphabet of 0 and 1 has the unambiguous context-free grammar S → 0S0 | 1S1 | ε. Nevertheless, removing grammar ambiguity may produce a deterministic context-free grammar. Compiler generators such as YACC include features for resolving some kinds of ambiguity, such as by using the precedence and associativity constraints. The existence of inherently ambiguous languages was proven with Parikh's theorem in 1961 by Rohit Parikh in an MIT research report. While some context-free languages have both ambiguous and unambiguous grammars, there exist context-free languages for which no unambiguous context-free grammar can exist. An example of an inherently ambiguous language is the union of { a^n b^m c^m d^n | n, m > 0 } with { a^n b^n c^m d^m | n, m > 0 }.
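Ambiguity can be observed directly by counting parse trees. A short Python sketch for the classic ambiguous grammar A → A + A | a (a standard textbook example, named here as such rather than taken from this excerpt):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def parse_count(s: str) -> int:
    """Count distinct parse trees of s under the ambiguous grammar
    A -> A + A | a."""
    if s == 'a':
        return 1
    total = 0
    # Try every position of a top-level '+', splitting s into A + A.
    for i, ch in enumerate(s):
        if ch == '+':
            total += parse_count(s[:i]) * parse_count(s[i + 1:])
    return total

assert parse_count('a+a') == 1
assert parse_count('a+a+a') == 2   # (a+a)+a and a+(a+a)
```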

4.
Definite clause grammar
–
A definite clause grammar (DCG) is a way of expressing grammar, either for natural or formal languages, in a logic programming language such as Prolog. It is closely related to the concept of attribute grammars / affix grammars from which Prolog was originally developed. DCGs are usually associated with Prolog, but similar languages such as Mercury also include DCGs. They are called definite clause grammars because they represent a grammar as a set of definite clauses in first-order logic. The term DCG refers to the specific type of expression in Prolog and other similar languages; however, all of the capabilities or properties of DCGs will be the same for any grammar that is represented with definite clauses in essentially the same way as in Prolog. This has the advantage of making it so that recognition and parsing of expressions in a language becomes a matter of proving statements.

The history of DCGs is closely tied to the history of Prolog. According to Robert Kowalski, an early developer of Prolog, the first Prolog system was developed in 1972 by Alain Colmerauer and Phillipe Roussel. The first program written in the language was a large natural-language processing system. Fernando Pereira and David Warren at the University of Edinburgh were also involved in the early development of Prolog. Colmerauer had previously worked on a language processing system called Q-systems that was used to translate between English and French. In 1978, Colmerauer wrote a paper about a way of representing grammars called metamorphosis grammars, which were part of the early version of Prolog called Marseille Prolog; in this paper, he gave a description of metamorphosis grammars. Fernando Pereira and David Warren, two other architects of Prolog, coined the term "definite clause grammar" and created the notation for DCGs that is used in Prolog today. They gave credit for the idea to Colmerauer and Kowalski, and they introduced the idea in an article called "Definite Clause Grammars for Language Analysis", where they describe DCGs as a formalism in which grammars are expressed as clauses of first-order predicate logic that constitute effective programs of the programming language Prolog. Pereira, Warren, and other pioneers of Prolog later wrote about several other aspects of DCGs. Pereira and Warren wrote an article called "Parsing as Deduction", describing matters such as how the Earley deduction proof procedure is used for parsing.

A basic example helps to illustrate what DCGs are; the grammar sketched below generates sentences such as "the cat eats the bat" and "a bat eats the cat". One can generate all of the expressions in the language generated by this grammar at a Prolog interpreter by typing sentence(X, []). Similarly, one can test whether a sentence is valid in the language by typing something like sentence([the, bat, eats, the, cat], []). DCG notation is just syntactic sugar for normal definite clauses in Prolog. Using Prolog's notation for lists, a singleton list prefix P = [a] can be seen as the difference between [a|X] and X, and thus represented with the pair ([a|X], X); for instance, saying that P is the difference between A and B is the same as saying that append(P, B, A) holds.
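The grammar alluded to above is, in its usual presentation, along the following lines (a sketch of the standard cat/bat example; the predicate names are the conventional ones, not quoted from this excerpt):

```prolog
% DCG rules; 'sentence' derives word lists like [the, cat, eats, a, bat].
sentence --> noun_phrase, verb_phrase.
noun_phrase --> det, noun.
verb_phrase --> verb, noun_phrase.
det --> [the].
det --> [a].
noun --> [cat].
noun --> [bat].
verb --> [eats].
```

Loaded into a Prolog interpreter, the query sentence(X, []) enumerates the word lists of the language on backtracking.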

5.
Antimatroid
–
Antimatroids can be viewed as a special case of greedoids and of semimodular lattices, and as a generalization of partial orders and of distributive lattices. Antimatroids are equivalent, by complementation, to convex geometries, an abstraction of convex sets in geometry. An antimatroid can be defined as a finite family F of sets, called feasible sets, with two properties: the union of any two feasible sets is also feasible (that is, F is closed under unions), and if S is a nonempty feasible set, then there exists some x in S such that S \ {x} is also feasible (that is, F is an accessible set system). Antimatroids also have an equivalent definition as a formal language. A language L defining an antimatroid must satisfy the following properties: each word of L contains at most one copy of any symbol; every prefix of a string in L is also in L; and if s and t are strings in L, and s contains at least one symbol that is not in t, then there is a symbol x in s such that tx is another string in L. If L is an antimatroid defined as a language, then the sets of symbols in strings of L form an accessible union-closed set system. Thus, these two definitions lead to mathematically equivalent classes of objects.

A chain antimatroid has as its formal language the prefixes of a single word; for instance, the chain antimatroid defined by the word abcd has as its formal language the strings {ε, a, ab, abc, abcd} and as its feasible sets the sets Ø, {a}, {a,b}, {a,b,c}, and {a,b,c,d}. A poset antimatroid has as its feasible sets the lower sets of a finite partially ordered set. By Birkhoff's representation theorem for distributive lattices, the feasible sets in a poset antimatroid form a distributive lattice. Thus, antimatroids can be seen as generalizations of distributive lattices; a chain antimatroid is the special case of a poset antimatroid for a total order. A shelling sequence of a finite set U of points in the plane (or a higher-dimensional Euclidean space) is an ordering of the points such that each point p is a vertex of the convex hull of p together with all the later points; equivalently, p must be a vertex of the hull of the points not yet removed. The partial shelling sequences of a point set form an antimatroid, and the feasible sets of the shelling antimatroid are the intersections of U with the complement of a convex set. Every antimatroid is isomorphic to a shelling antimatroid of points in a sufficiently high-dimensional space. A perfect elimination ordering of a chordal graph is an ordering of its vertices such that, for each vertex v, the neighbors of v that occur later than v in the ordering form a clique.
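The two set-system axioms are directly checkable. A small Python sketch (illustrative, with the chain antimatroid of the word abcd as the test case):

```python
from itertools import combinations

def is_antimatroid(feasible):
    """Check the two set-system axioms for an antimatroid.

    feasible: a collection of sets assumed to contain the empty set.
    Checks (1) closure under union and (2) accessibility: every
    nonempty feasible set S has some x with S - {x} feasible.
    """
    fam = set(map(frozenset, feasible))
    union_closed = all(a | b in fam for a, b in combinations(fam, 2))
    accessible = all(
        any(s - {x} in fam for x in s)
        for s in fam if s
    )
    return union_closed and accessible

# Chain antimatroid of 'abcd': feasible sets are the symbol sets
# of its prefixes: Ø, {a}, {a,b}, {a,b,c}, {a,b,c,d}.
chain = [set('abcd'[:i]) for i in range(5)]
assert is_antimatroid(chain)
```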

6.
Abstract syntax tree
–
In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of source code written in a programming language. Each node of the tree denotes a construct occurring in the source code. The syntax is "abstract" in not representing every detail appearing in the real syntax: for instance, grouping parentheses are implicit in the tree structure. This distinguishes abstract syntax trees from concrete syntax trees, traditionally designated parse trees. Once built, additional information is added to the AST by means of subsequent processing, e.g., contextual analysis. Abstract syntax trees are used in program analysis and program transformation systems.

Abstract syntax trees are data structures widely used in compilers, due to their property of representing the structure of program code. An AST is usually the result of the syntax analysis phase of a compiler. Being the product of that phase, it often serves as an intermediate representation of the program through several stages that the compiler requires. Compared to the source code, an AST does not include certain elements, such as inessential punctuation. A more important difference is that the AST can be edited and enhanced with information such as properties and annotations; such editing and annotation is impossible with the source code of a program, since it would imply changing it. This information may be used, for instance, to notify the user of the location of an error in the code. ASTs are needed because of the inherent nature of programming languages and their documentation. Languages are often ambiguous by nature; in order to avoid this ambiguity, programming languages are often specified as a context-free grammar (CFG). However, there are aspects of programming languages that a CFG can't express. These are details that require a context to determine their validity. For example, if a language allows new types to be declared, a CFG cannot predict the names of such types nor the way in which they should be used. Even if a language has a predefined set of types, enforcing proper usage usually requires some context. Another example is duck typing, where the type of an element can change depending on context. Operator overloading is yet another case where correct usage and final function are determined based on the context; Java provides an excellent example, where the + operator is both numerical addition and concatenation of strings. Although there are other data structures involved in the inner workings of a compiler, the AST performs a unique function.
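Python's standard library makes the "abstract" part easy to see; a minimal sketch using the ast module (the printed form varies slightly across Python versions):

```python
import ast

# Parse an expression; the AST keeps the operator structure, but the
# grouping parentheses survive only implicitly, as tree shape.
tree = ast.parse('(1 + 2) * 3', mode='eval')
print(ast.dump(tree.body))
# Roughly: BinOp(left=BinOp(left=Constant(value=1), op=Add(),
#                right=Constant(value=2)), op=Mult(), right=Constant(value=3))
```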

7.
Chomsky hierarchy
–
In the formal languages of computer science and linguistics, the Chomsky hierarchy is a containment hierarchy of classes of formal grammars. This hierarchy of grammars was described by Noam Chomsky in 1956. It is also named after Marcel-Paul Schützenberger, who played a crucial role in the development of the theory of formal languages. A rule may be applied by replacing an occurrence of the symbols on its left-hand side with those that appear on its right-hand side. A sequence of rule applications is called a derivation. Such a grammar defines the formal language of all words consisting solely of terminal symbols which can be reached by a derivation from the start symbol. Nonterminals are often represented by uppercase letters, terminals by lowercase letters. For example, consider the grammar with terminals {a, b}, nonterminals {S, A, B}, production rules S → AB, S → ε, A → aS, B → b, and start symbol S, which defines the language { a^n b^n | n ≥ 0 }. A similar grammar over English words can derive sequences such as "ideas hate great linguists" and "ideas generate"; while these sentences are nonsensical, they are syntactically correct, and a syntactically incorrect sentence cannot be derived from the grammar.

The following table summarizes each of Chomsky's four types of grammars: the class of language it generates, the type of automaton that recognizes it, and the form its rules must have.

Type-0: recursively enumerable languages; Turing machine; unrestricted rules α → β (α non-empty)
Type-1: context-sensitive languages; linear bounded automaton; rules αAβ → αγβ
Type-2: context-free languages; non-deterministic pushdown automaton; rules A → γ
Type-3: regular languages; finite-state automaton; rules A → a and A → aB

Note that the set of grammars corresponding to recursive languages is not a member of this hierarchy. Every regular language is context-free, every context-free language is context-sensitive, every context-sensitive language is recursive, and every recursive language is recursively enumerable.

Type-0 grammars include all formal grammars. They generate exactly all languages that can be recognized by a Turing machine; these languages are also known as the recursively enumerable or Turing-recognizable languages. Note that this is different from the recursive languages, which can be decided by an always-halting Turing machine. Type-1 grammars generate the context-sensitive languages. These grammars have rules of the form αAβ → αγβ with A a nonterminal and α, β and γ strings of terminals and/or nonterminals. The strings α and β may be empty, but γ must be nonempty; the rule S → ε is allowed if S does not appear on the right side of any rule. The languages described by these grammars are exactly all languages that can be recognized by a linear bounded automaton. Type-2 grammars generate the context-free languages. These are defined by rules of the form A → γ with A a nonterminal and γ a string of terminals and/or nonterminals. These languages are exactly all languages that can be recognized by a non-deterministic pushdown automaton. Often a subset of the context-free grammars is used to make parsing easier, such as by an LL parser. Type-3 grammars generate the regular languages. Such a grammar restricts its rules to a single nonterminal on the left-hand side and a right-hand side consisting of a single terminal, possibly followed by a single nonterminal (right regular). Alternatively, the right-hand side can consist of a single terminal, possibly preceded by a single nonterminal (left regular); both conventions generate the same class of regular languages.
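The Type-2 example grammar above can be animated with a few lines of Python; this sketch repeatedly rewrites the leftmost nonterminal, and every run prints a derivation of some word a^n b^n:

```python
import random

# Rules of the example grammar: S -> AB | ε, A -> aS, B -> b.
RULES = {'S': ['AB', ''], 'A': ['aS'], 'B': ['b']}

def derive(word: str = 'S') -> str:
    """Rewrite the leftmost nonterminal until only terminals remain,
    printing each sentential form of the derivation."""
    while any(c in RULES for c in word):
        i, nt = next((i, c) for i, c in enumerate(word) if c in RULES)
        word = word[:i] + random.choice(RULES[nt]) + word[i + 1:]
        print(word or 'ε')
    return word

derive()  # e.g. AB, aSB, aB, ab -- always some word a^n b^n
```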

8.
Closest string
–
In theoretical computer science, the closest string problem is an NP-hard computational problem, which tries to find the geometrical center of a set of input strings. To understand the word center, it is necessary to define a distance between two strings; usually, this problem is studied with the Hamming distance in mind. More formally, given n length-m strings s1, s2, ..., sn, the closest string problem seeks a new length-m string s such that d(s, si) ≤ k for all i, where d denotes the Hamming distance and k is as small as possible. A decision problem version of the closest string problem, which is NP-complete, instead takes k as another input. The closest string problem can be seen as an instance of the 1-center problem in which the distances between elements are measured using the Hamming distance. In bioinformatics, the closest string problem is an intensively studied facet of the problem of finding signals in DNA.

Instances of closest string may contain information that is not essential to the problem; in some sense, the input of closest string contains redundant information. When all input strings, which share the same length, are written on top of each other, certain column types have essentially the same implications for the solution. An input instance can be normalized by replacing, in each column, the character that occurs most often with a, the character that occurs second most often with b, and so on. Given a solution to the normalized instance, the original solution can be recovered by remapping the characters of the solution to their original versions in every column. The order of the columns does not contribute to the hardness of the problem: if we permute the columns of all input strings according to a certain permutation π and obtain a solution string s to that modified instance, then π−1(s) will be a solution to the original instance.

For example, given an instance with three input strings uvwx, xuwv, and xvwu, this could be written as a matrix like this:

u v w x
x u w v
x v w u

The first column has the values (u, x, x). As x is the character that appears most often, we replace it by a, and we replace u, the second most frequent character, by b. The second column has the values (v, u, v); as for the first column, v is replaced by a and u is replaced by b. Doing the same with all columns gives the normalized instance. Normalizing the input reduces the alphabet size to at most the number of input strings, which can be useful for algorithms whose running times depend on the alphabet size.

Li et al. developed a polynomial-time approximation scheme, which is practically unusable because of the large hidden constants. Closest string can be solved in O(kL + kd·d^d) time, where k is the number of strings, L is the length of all strings, and d is the desired maximum Hamming distance. Closest string is a special case of the more general closest substring problem; while closest string turns out to be fixed-parameter tractable in a number of ways, closest substring is W[1]-hard with regard to these parameters.
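The normalization just described takes only a few lines of Python. A sketch (the function name and the tie-breaking for equally frequent characters are my own choices):

```python
from collections import Counter
import string

def normalize(strs):
    """Column-wise normalization of a closest-string instance: in each
    column, the most frequent character becomes 'a', the next most
    frequent 'b', and so on. Returns the normalized strings and the
    per-column maps needed to undo the renaming."""
    cols, maps = [], []
    for col in zip(*strs):
        ranked = [ch for ch, _ in Counter(col).most_common()]
        mapping = {ch: string.ascii_lowercase[i] for i, ch in enumerate(ranked)}
        maps.append(mapping)
        cols.append([mapping[ch] for ch in col])
    normalized = [''.join(row) for row in zip(*cols)]
    return normalized, maps

norm, maps = normalize(['uvwx', 'xuwv', 'xvwu'])
print(norm)  # ['baaa', 'abab', 'aaac'] (last column's ties broken arbitrarily)
```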

9.
Controlled grammar
–
Controlled grammars are a class of grammars that extend, usually, the context-free grammars with additional controls on the derivations of a sentence in the language. The main divisions are indexed grammars, grammars with prescribed derivation sequences, grammars with contextual conditions on rule application, and grammars with parallelism in rule application; because indexed grammars are so well established in the field, this article will address only the latter three kinds of controlled grammars.

Grammars with prescribed sequences are grammars in which the sequence of rule application is constrained in some way. There are four different versions of prescribed sequence grammars: language controlled grammars, matrix grammars, vector grammars, and programmed grammars. In the standard formalism, a context-free grammar is a 4-tuple G = (N, T, S, P), with N the nonterminals, T the terminals, S the start symbol, and P the production rules. Productions over such a grammar are sequences of rules in P that, applied in order, lead from S to a string. That is, one can view the set of imaginable derivations in G as the set of such rule sequences, and the language of G as the set of terminal strings L(G) = { w ∈ T* : S ⇒* w }. Control grammars take seriously this definition of the language generated by a grammar, adding a control set R of permissible rule sequences. The set R, due to its infinitude, is almost always described via some more convenient mechanism, such as a grammar, or a set of matrices or vectors. The different variations of prescribed sequence grammars thus differ by how the sequence of derivations is defined on top of the context-free base. Because matrix grammars and vector grammars are essentially special cases of language controlled grammars, examples of the former two will not be provided below. Language controlled grammars also often have a failure set F in the grammar tuple, making it G = (N, T, S, P, R, F). This version of language controlled grammars, the one with what is called appearance checking, is the one used henceforth; the requirement that the non-vacuously applicable rules must apply is the appearance-checking aspect of such a grammar. The language for this kind of grammar is simply the set of terminal strings L(G) = { w ∈ T* : S ⇒* w via some rule sequence in R }.

A simple modification to such a grammar, changing its control sequence set R into a set of the shape (f* g h* k)* l*, forces the derivations to proceed in synchronized passes. To see how, let us consider the general case of some string with n instances of S in it, i.e. S^n. If we choose some arbitrary production sequence f^u g h^v k with n > u, we rewrite only u instances of S, leaving at least one instance of S to be rewritten by the subsequent application of g, which rewrites S as X. Given that no rule of this grammar ever rewrites X, such a derivation is destined never to produce a terminal string; thus only derivations with n = u will ever successfully rewrite the string S^n. Similar reasoning holds for the number of As and v. In general, then, we can say that the only valid derivations have the structure S^n ⇒_{f^n} … ⇒ S^{2n}, and only a final block of applications of l will produce terminal strings of the grammar, the Ss finally being rewritten as as. In this way, the number of Ss doubles for each instantiation of f* g h* k that appears in a terminal-deriving sequence.

For convenience, a matrix grammar is not represented as a control grammar over P, but rather with just a set of matrices in place of both the control language and the production rules. Thus, a matrix grammar is the 5-tuple G = (N, T, S, M, F), where N, T, S, and F are defined essentially as previously done, and M is a set of matrices m_i = (p_{i,1}, …, p_{i,n_i}), where each p_{i,j} is a context-free production rule. The derives relation in a matrix grammar is thus defined simply as: given some strings x and y, both in (N ∪ T)*, and some matrix m = p_1 p_2 … p_n, x ⇒_m y holds when y results from applying the rules p_1 through p_n to x, in order.
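To illustrate the matrix mechanism, here is a hedged Python sketch: apply_matrix applies a sequence of context-free rules in order, each to the leftmost occurrence of its left-hand side, and the hypothetical two-rule matrix (my own example, not from the source) keeps the counts of a's, b's and c's in lockstep, yielding { a^n b^n c^n }:

```python
def apply_matrix(word, matrix):
    """Apply one matrix (a sequence of context-free rules) to word.

    Each rule is a pair (nonterminal, replacement), with the
    nonterminal a single character; the rules must all apply, in
    order, each rewriting the leftmost occurrence of its nonterminal.
    Returns None if some rule cannot apply (no appearance checking
    in this sketch)."""
    for lhs, rhs in matrix:
        i = word.find(lhs)
        if i < 0:
            return None            # the matrix is not applicable
        word = word[:i] + rhs + word[i + 1:]
    return word

# Hypothetical matrix grammar for { a^n b^n c^n }: matrix m rewrites
# A and B together, so their counts stay equal.
m = [('A', 'aA'), ('B', 'bBc')]
final = [('A', ''), ('B', '')]
w = 'AB'
w = apply_matrix(w, m)        # 'aAbBc'
w = apply_matrix(w, m)        # 'aaAbbBcc'
w = apply_matrix(w, final)    # 'aabbcc'
print(w)
```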


Figure: A schematic showing cross-serial dependencies. The w's and v's, which represent words, each form a series, and the lines representing the dependency relations mutually cross.

Figure: A Swiss-German sentence containing cross-serial dependencies (shown as lines between the verbs and their objects). The English translation, whose dependencies do not cross, is shown for comparison.