where each $latex {\Sigma_i \in \{<,=,>\}}&fg=000000$ and each $latex {p_i \in {\mathbb R}[x][y_1, \dots, y_m]}&fg=000000$ is a polynomial over the reals. Why? Well, if one quantifier can be eliminated, then we can inductively eliminate all of them. And we can always convert the single quantifier to an $latex {\exists}&fg=000000$ by negation if necessary and then distribute the $latex {\exists}&fg=000000$ over disjunctions to arrive at a boolean combination of formulas of the type above.

I will try to describe now how we could “rediscover” a quantifier elimination procedure due to Albert Muchnik. My description is based on a very nice exposition by Michaux and Ozturk but with some shortcuts and a more staged revealing of the full details. The specific algorithm for quantifier elimination we describe will be much less efficient than the current state-of-the-art described in the previous post. The focus here is on clarity of presentation instead of performance.

We will complete the proof in three stages.

1. $latex {m = 0}&fg=000000$ and linear polynomials

First, let’s restrict ourselves to the simplest case: $latex {m = 0}&fg=000000$ and all the polynomials $latex {p_i}&fg=000000$ are of degree one. In fact, let’s even start with a specific example:

We will partition the real line into a constant number of intervals so that none of the functions changes sign inside any of the intervals. If we do this, then to decide $latex {\phi}&fg=000000$, we merely have to examine the finitely many intervals to see if on any of them, the three functions ($latex {x+1, -2x+3}&fg=000000$, and $latex {x}&fg=000000$) are all positive.

Each of the linear functions has one root and is always negative on one side of the root and always positive on the other. We will keep ourselves from using the exact value of the root, even though finding it here is trivial, because this will help us in the general case. Whether the function is negative to the left or right side of the root depends on the sign of the leading coefficient. So, the first function $latex {x+1}&fg=000000$ has a root at $latex {\gamma_1}&fg=000000$, is always negative to the left of $latex {\gamma_1}&fg=000000$ and always positive to the left.

The second function, $latex {-2x+3}&fg=000000$, has another root $latex {\gamma_2}&fg=000000$, is always negative to its right, and always positive to its left. How do we know if $latex {\gamma_2}&fg=000000$ is to the left or right of $latex {\gamma_1}&fg=000000$? Well, $latex {-2x+3 = -2(x+1) + 5}&fg=000000$, and since $latex {\gamma_1}&fg=000000$ is a root of $latex {x+1}&fg=000000$, it follows that $latex {-2\gamma_1 + 3 = 5 > 0}&fg=000000$. So, $latex {\gamma_1}&fg=000000$ is to the left of $latex {\gamma_2}&fg=000000$. This results in the following sign configuration diagram (for each pair of signs, the first indicates the sign of $latex {x+1}&fg=000000$, the second the sign of $latex {-2x+3}&fg=000000$):

Finally, the third function $latex {x}&fg=000000$ has a root $latex {\gamma_3}&fg=000000$, is always negative to the left, and always positive to the right. Also, $latex {x = (x+1)-1}&fg=000000$ and $latex {x=-\frac{1}{2}(-2x+3) + 1.5}&fg=000000$, and so $latex {x}&fg=000000$ is negative at $latex {\gamma_1}&fg=000000$ and positive at $latex {\gamma_2}&fg=000000$. This means the following sign configuration diagram:

Since there exists an interval $latex {(\gamma_3,\gamma_2)}&fg=000000$ on which the sign configuration is the desired $latex {(+,+,+)}&fg=000000$, it follows that $latex {\phi}&fg=000000$ evaluates to true. The algorithm we described via this example will obviously work for any $latex {\phi}&fg=000000$ with $latex {m=0}&fg=000000$ and linear inequalities.

2. $latex {m=0}&fg=000000$ and general polynomials

Now, let’s see what changes if some of the polynomials are of higher degree with $latex {m}&fg=000000$ still equal to $latex {0}&fg=000000$:

where each $latex {\Sigma_i \in \{<,>,=\}}&fg=000000$. Again, we’ll try to find a sign configuration diagram and check at the end if any of the intervals is described by the wanted sign configuration.

If we try to implement this approach, a fundamental issue crops up. Suppose $latex {p_i}&fg=000000$ is a degree $latex {d}&fg=000000$ polynomial and we know that $latex {p_i(\gamma_1) > 0}&fg=000000$ and $latex {p_i(\gamma_2)>0}&fg=000000$. In the case $latex {p_i}&fg=000000$ was linear, we could safely conclude that $latex {p_i(x)}&fg=000000$ is positive everywhere in the interval $latex {(\gamma_1,\gamma_2)}&fg=000000$. Not so fast here! It could very well be for a higher degree polynomial $latex {p_i}&fg=000000$ that it becomes negative somewhere in between $latex {\gamma_1}&fg=000000$ and $latex {\gamma_2}&fg=000000$. But note that this would mean $latex {p_i’ = \frac{\partial}{\partial x} p_i}&fg=000000$ must change sign somewhere at $latex {\gamma_3 \in (\gamma_1, \gamma_2)}&fg=000000$ by Rolle’s theorem. New idea then! Let’s also partition according to the sign changes of $latex {p_i’}&fg=000000$, a degree $latex {(d-1)}&fg=000000$ polynomial. Suppose we have done so successfully by an inductive argument on the degree, and $latex {\gamma_3}&fg=000000$ is the only sign change occurring in $latex {(\gamma_1, \gamma_2)}&fg=000000$ for $latex {p_i’}&fg=000000$.

We now want to also know the sign of $latex {p_i(\gamma_3)}&fg=000000$. Imitating what we did for $latex {d=1}&fg=000000$, note that if $latex {q_i = (p_i \text{ mod } p_i’)}&fg=000000$, then $latex {p_i(\gamma_3) = q_i(\gamma_3)}&fg=000000$ since $latex {\gamma_3}&fg=000000$ is a root of $latex {p_i’}&fg=000000$. The polynomial $latex {q_i}&fg=000000$ is degree less than $latex {d-1}&fg=000000$, so let’s assume by induction that we know the sign pattern of $latex {q_i}&fg=000000$ as well. Thus, we have at hand the sign of $latex {p_i(\gamma_3)}&fg=000000$.

There are now two situations that can happen. First is if $latex {p_i(\gamma_3) > 0}&fg=000000$ or $latex {p_i(\gamma_3) = 0}&fg=000000$.

This is because $latex {p_i}&fg=000000$ reaches the minimum in $latex {(\gamma_1, \gamma_2)}&fg=000000$ at $latex {\gamma_3}&fg=000000$, and since $latex {p_i(\gamma_3) \geq 0}&fg=000000$, $latex {p_i}&fg=000000$ is not negative anywhere in $latex {(\gamma_1, \gamma_2)}&fg=000000$.

The other situation is the case $latex {p_i(\gamma_3)<0}&fg=000000$. By the intermediate value theorem, $latex {p_i}&fg=000000$ has a root $latex {\gamma_4}&fg=000000$ in $latex {(\gamma_1,\gamma_3)}&fg=000000$ and a root $latex {\gamma_5}&fg=000000$ in $latex {(\gamma_3, \gamma_2)}&fg=000000$.

Note again that because $latex {\gamma_3}&fg=000000$ is the only root of $latex {p_i’}&fg=000000$ in the interval $latex {(\gamma_1, \gamma_2)}&fg=000000$, there cannot be more than one sign change of $latex {p_i}&fg=000000$ between $latex {\gamma_1}&fg=000000$ and $latex {\gamma_3}&fg=000000$ and between $latex {\gamma_3}&fg=000000$ and $latex {\gamma_2}&fg=000000$.

So, from these considerations, it’s clear that knowing the signs of the derivatives of the functions $latex {p_i}&fg=000000$ and of the remainders with respect to these derivatives is very helpful. But of course, obtaining these signs might recursively entail working with the second derivatives and so forth. Thus, let’s define the following hierarchy of collections of polynomials:

The degree of the new polynomials added decreases at each stage. So, there is some $latex {k}&fg=000000$ such that $latex {\mathcal{P}_k}&fg=000000$ doesn’t grow anymore. Let $latex {\mathcal{P}_k = \{t_1, t_2, \dots, t_N\}}&fg=000000$ with $latex {\deg(t_1) \leq \deg(t_2) \leq \cdots \leq \deg(t_N)}&fg=000000$. Note that $latex {N}&fg=000000$ is bounded as a function of $latex {n}&fg=000000$ (though the bound is admittedly pretty poor).

Now, we construct the sign configuration diagram for the polynomials $latex {\{t_1, t_2, \dots, t_N\}}&fg=000000$, starting from that of $latex {t_1}&fg=000000$ (a constant), then adding $latex {t_2}&fg=000000$, and so on. When adding $latex {t_i}&fg=000000$, the real line has already been partitioned according to the roots of $latex {t_1, \dots, t_{i-1}}&fg=000000$. As described earlier, $latex {\text{sign}(t_i)}&fg=000000$ at a root $latex {\gamma}&fg=000000$ of $latex {t_j}&fg=000000$ is the sign of $latex {(t_i \text{ mod } t_j)}&fg=000000$ at $latex {\gamma}&fg=000000$, which has already been computed inductively. Then, we can proceed according to the earlier discussion by partitioning into two any interval where there’s a change in the sign of $latex {t_i}&fg=000000$. The only other technicality is that the sign of $latex {t_i}&fg=000000$ at $latex {-\infty}&fg=000000$ and its sign at $latex {+\infty}&fg=000000$ are not determined by the other polynomials. But these are easy to find given the parity of the degree of $latex {t_i}&fg=000000$ and the sign of its leading coefficient.

3. $latex {m>0}&fg=000000$ and general polynomials

Finally, consider the most general case, $latex {m>0}&fg=000000$ and arbitrary bounded degree polynomials. Let’s interpret $latex {{\mathbb R}[x,y_1,y_2,\dots, y_m]}&fg=000000$ as polynomials in $latex {x}&fg=000000$ with coefficients that are polynomials in $latex {y_1, \dots, y_m}&fg=000000$, and try to push through what we did above. We can start by extending the definitions of the hierarchy of collections of polynomials. Suppose $latex {p(x,\mathbf{y}) = p_D(\mathbf{y}) x^D + p_{D-1}(\mathbf{y}) x^{D-1} + \cdots + p_0(\mathbf{y})}&fg=000000$ and $latex {q(x,\mathbf{y}) = q_E(\mathbf{y}) x^E + q_{E-1}(\mathbf{y}) x^{E-1} + \cdots + q_0(\mathbf{y})}&fg=000000$ with $latex {d \leq D}&fg=000000$ the largest integer such that $latex {p_d(\mathbf{y}) \neq 0}&fg=000000$, $latex {e \leq E}&fg=000000$ the largest integer such that $latex {q_e(\mathbf{y}) \neq 0}&fg=000000$, and $latex {d \geq e}&fg=000000$. One problem is that $latex {(p \text{ mod } q)}&fg=000000$, as a polynomial in $latex {x}&fg=000000$, may not have polynomials in $latex {y}&fg=000000$ as its coefficients. To skirt this issue, let:

Then, whenever $latex {q(x,\mathbf{y}) = 0}&fg=000000$, the sign of $latex {p(x,\mathbf{y})}&fg=000000$ equals the sign of $latex {(p\text{ zmod } q)}&fg=000000$ at $latex {(x,\mathbf{y})}&fg=000000$ divided by the sign of $latex {q_e(\mathbf{y})}&fg=000000$. Moreover, $latex {(p \text{ zmod } q)}&fg=000000$ is a degree $latex {(d-1)}&fg=000000$ polynomial. So, we will want to replace mod by zmod when constructing the collections $latex {\mathcal{P}_i}&fg=000000$. All this is fine except that we have no idea what $latex {d}&fg=000000$ and $latex {e}&fg=000000$ are and what specifically the sign of $latex {q_e(\mathbf{y})}&fg=000000$ is. Recall that $latex {\mathbf{y}}&fg=000000$ is an unknown parameter! So, what to do? Well, the most naive thing: for each polynomial $latex {p}&fg=000000$, let’s just guess whether each coefficient of $latex {p}&fg=000000$ is positive, negative, or zero.

Now, we can define the hierarchy of collections of polynomials $latex {\mathcal{P}_0, \mathcal{P}_1, \dots}&fg=000000$ in the same way as in the previous section, except that here, the polynomials are in $latex {{\mathbb R}[x,y_1, \dots, y_m]}&fg=000000$, we use zmod instead of mod, and each time we add a polynomial, we make a guess about the signs of its coefficients. The degree of the newly added polynomials decreases at each stage, so that there’s some finite bound $latex {k}&fg=000000$ such that $latex {\mathcal{P}_k}&fg=000000$ doesn’t grow anymore. We can now run the procedure described in the previous section to find the sign configuration diagram for $latex {\mathcal{P}_k}&fg=000000$. In the course of this procedure, whenever we need to know the sign of the leading coefficient of a polynomial, we use the value already guessed for it. Finally, we check whether any interval in the sign configuration diagram contains the desired sign pattern.

Consider the whole set of guesses we’ve made in the course of arriving at this sign configuration diagram. Notice that the total number of guesses is finite, bounded as a function of $latex {n}&fg=000000$. Now, let’s call the set of guesses good if it results in a sign configuration diagram that indeed contains an interval with the wanted sign pattern. Our final quantifier-free formula is a disjunction over all good sets of guesses that checks whether the given input values of $latex {y_1, \dots, y_m}&fg=000000$ is consistent with a good set of guesses.

About a month ago, Ankur Moitra dropped by my office. We started chatting about what each of us was up to. He told me a story about a machine learning problem that he was working on with Sanjeev Arora, Rong Ge, and Ravi Kannan. On its face, it was not even clear that the problem (non-negative rank) was decidable, let alone solvable in polynomial time. But on the other hand, they observed that previous work had already shown the existence of an algorithm using quantifier elimination. Ankur was a little taken aback by the claim, by the power of quantifier elimination. He knew of the theory somewhere in the back of his mind, in the same way that you probably know of Brownian motion or universal algebra (possible future topics in this “Something you should know about” series!), but he’d never had the occasion to really use it till then. On the train ride back home, he realized that quantifier elimination not only showed decidability of the problem but could also be helpful in devising a more efficient algorithm.

Quantifier elimination has a bit of a magical feel to it. After the conversation with Ankur, I spent some time revisiting the area, and this post is a consequence of that. I’ll mainly focus on the theory over the reals. It’s a remarkable result that you definitely should know about!

1. What is Quantifier Elimination?

A zeroth-order logic deals with declarative propositions that evaluate to either true or false. It is defined by a set of symbols, a set of logical operators, some inference rules and some axioms. A first-order logic adds functions, relations and quantifiers to the mix. Some examples of sentences in a first-order logic:

The above are examples of sentences, meaning that they contain no free variables (i.e., variables that are not bounded by the quantifiers), whereas a formula $latex {\phi(x_1,\dots,x_n)}&fg=000000$ has $latex {x_1, \dots, x_n}&fg=000000$ as free variables. A quantifier-free formula is one in which no variable in the formula is quantified. Thus, note that a quantifier-free sentence is simply a proposition.

Definition 1A first-order logic is said to admit quantifier elimination if for any formula $latex {\phi(x_1,\dots,x_n)}&fg=000000$, there exists a quantifier-free formula $latex {\psi(x_1,\dots,x_n)}&fg=000000$ which is logically equivalent to $latex {\phi(x_1,\dots,x_n)}&fg=000000$.

If the quantifier elimination process can be described algorithmically, then decidability of sentences in the logic reduces to decidability of quantifier-free sentences which is often a much easier question. (Note though that algorithmic quantifier elimination of formulas is a stronger condition than decidability of sentences.)

2. Quantifier Elimination over the Reals

The real numbers, being an infinite system, cannot be exactly axiomatized using first-order logic because of the Löwenheim-Skolem Theorem. But the axioms of ordered fields along with the intermediate value theorem yields a natural first-order logic, called the real closed field. The real closed field has the same first-order properties as the reals.

Tarski (1951) showed that the real closed field admits quantifier elimination. As a consequence, one has the following (Seidenberg was responsible for popularizing Tarski’s result):

Theorem 2 (Tarski-Seidenberg) Suppose a formula $latex {\phi(y_1,\dots,y_m)}&fg=000000$ over the real closed field is of the following form: \begin{equation*} Q_1x_1~Q_2x_2~\cdots Q_nx_n~(\rho(y_1,\dots,y_m,x_1,\dots,x_n)) \end{equation*} where $latex {Q_i \in \{\exists,\forall\}}&fg=000000$ and $latex {\rho}&fg=000000$ is a boolean combination of equalities and inequalities of the form:

$latex \displaystyle f_i(y_1,\dots,y_m,x_1,\dots,x_n) = 0&fg=000000$

$latex \displaystyle g_i(y_1,\dots,y_m,x_1,\dots,x_n) > 0&fg=000000$

where each $latex {f_i}&fg=000000$ and $latex {g_i}&fg=000000$ is a polynomial with coefficients in $latex {{\mathbb R}}&fg=000000$, mapping $latex {{\mathbb R}^{m+n}}&fg=000000$ to $latex {{\mathbb R}}&fg=000000$. Then, one can explicitly construct a logically equivalent formula $latex {\phi’(y_1, \dots, y_m)}&fg=000000$ of the same form but quantifier-free. Moreover, there is a proof of the equivalence that uses only the axioms of ordered fields and the intermediate value theorem for polynomials.

I should be more explicit about what “explicitly construct” means. Usually, it is assumed that the coefficients of the polynomials $latex {f_i}&fg=000000$ and $latex {g_i}&fg=000000$ are integers, so that there is a bound on the complexity of the quantifier elimination algorithm in terms of the size of the largest coefficient. (If the coefficients are integers, all the computations only involve integers.) But, even when the coefficients are real numbers, there is a bound on the “arithmetic complexity” of the algorithm, meaning the number of arithmetic operations $latex {+, -, \times, \div}&fg=000000$ performed with infinite precision.

A quick example. Suppose we are given an $latex {r}&fg=000000$-by-$latex {s}&fg=000000$ matrix $latex {M = (M_{i,j})}&fg=000000$, and we’d like to find out whether the rows of $latex {M}&fg=000000$ are linearly dependent, i.e. the following condition:

Tarski-Seidenberg immediately asserts an algorithm to transform the above formula into one that does not involve the $latex {\lambda_i}&fg=000000$’s, only the entries of the matrix. Of course, for this particular example, we have an extremely efficient algorithm (Gaussian elimination) but quantifier elimination gives a much more generic explanation for the decidability of the problem.

2.1. Semialgebraic sets

If $latex {m=0}&fg=000000$ in Theorem 2, then Tarski-Seidenberg produces a proposition with no variables that can then be evaluated directly, such as in the linear dependency example. This shows that the feasibility of semialgebraic setsis a decidable problem. A semialgebraic set $latex {S}&fg=000000$ is a finite union of sets of the form:

where $latex {f_1,\dots, f_{\ell_1}, g_1,\dots, g_{\ell_2}: {\mathbb R}^n \rightarrow {\mathbb R}}&fg=000000$ are $latex {n}&fg=000000$-variate polynomials over the reals. The feasibility problem for a semialgebraic set $latex {S}&fg=000000$ is deciding if $latex {S}&fg=000000$ is empty or not. An algorithm to decide feasibility directly follows from applying quantifier elimination.

Note that this is in stark contrast to the first-order theory over the integers for which Godel’s Incompleteness Theorem shows undecidability. Also, over the rationals, Robinson proved undecidability (in her Ph.D. thesis advised by Tarski) by showing how to express any first-order sentence over the integers as a first-order sentence over the rationals. This latter step now has several different proofs. Poonen has written a nice survey of (un)decidability of first-order theories over various domains.

2.2. Efficiency

What about the complexity of quantifier elimination over the reals? The algorithm proposed by Tarski does not even show complexity that is elementary recursive! The situation was much improved by Collins (1975) who gave a doubly-exponential algorithm using a technique called “cylindrical algebraic decomposition”. More precisely, the running time of the algorithm is $latex {poly(C,(\ell d)^{2^{O(n)}})}&fg=000000$, where $latex {C}&fg=000000$ measures the size of the largest coefficient in the polynomials, $latex {\ell}&fg=000000$ is the number of polynomials, $latex {d}&fg=000000$ is the maximum degree of the polynomials, and $latex {n}&fg=000000$ is the total number of variables.

A more detailed understanding of the complexity of quantifier elimination emerged from an important work of Ben-Or, Kozen and Reif (1986). Their main contribution was an ingenious polynomial time algorithm to test the consistency of univariate polynomial constraints: given a set of univariate polynomials $latex {{f_i}}&fg=000000$ and a system of constraints of the form $latex {f_i(x) \leq 0}&fg=000000$, $latex {f_i(x) = 0}&fg=000000$ or $latex {f_i(x) > 0}&fg=000000$, does the system have a solution $latex {x}&fg=000000$? Such an algorithm exists despite the fact that it’s not known how to efficiently find an $latex {x}&fg=000000$ that makes the signs of the polynomials attain a given configuration. Ben-Or, Kozen and Reif also claimed an extension of their method to multivariate polynomials, but this analysis was later found to be flawed.

Nevertheless, subsequent works have found clever ways to reduce to the univariate case. These more recent papers have shown that one can make the complexity of the quantification elimination algorithm be only singly exponential if the number of quantifier alternations (number of switches between $latex {\exists}&fg=000000$ and $latex {\forall}&fg=000000$) is bounded or if the model of computation is parallel. See Basu (1999) for the current record and for references to prior work.

As for lower bounds, Davenport and Heintz (1988) showed that doubly-exponential time is required for quantifier elimination, by explicitly constructing formulas for which the length of the quantifier-free expression blows up doubly-exponentially. Brown and Davenport (2007) showed that the doubly exponential dependence is necessary even when all the polynomials in the first order formula are linear and there is only one free variable. I do not know if a doubly exponential lower bound is known for the decision problem when there are no free variables.

Thanks to Ankur for helpful suggestions. Part II of this post will contain a relatively quick proof of Tarski’s theorem! Stay tuned.