Solving Real Polynomial Equations Numerically

Learn how to solve real univariate polynomial equations numerically by a bisection algorithm using Sturm chains.

Note: At the bottom of every page, there also is a link to a full implementation of this algorithm.

Introduction

Solving equations numerically is a very common problem. I will discuss and implement an algorithm to solve real polynomial equations, whereas you can fix the index of the solution.

The focus of this article lies on a sample implementation and possible problems implementing the algorithm given by Sturm rather than on details about the mathmatical background.

Polynomial Equations

It often turns out that the solution of a problem demands solving a polynomial equation. Sometimes, this equation is given directly by the problem; sometimes it is the result of trying to solve the problem by the help of an algebra tool (you know those "RootOf" expressions produced by Maple). Anyway, it may be necessary to compute a solution of an equation of the formp1(x) = p2(x)
where p1 and p2 are real univariate polynomials (if you are lucky). As this equation is equivalent to the equationp1(x) - p2(x) = 0
you can reduce this problem to finding zeroes of a polynomial.

One algorithm is the bisection procedure based on Sturm's theorem, which is presented here. It takes a necessarily square-free (explained later) polynomial, an interval, where to look for the zero, and the index of the zero you are searching (counted from the lower bound of the interval on) and provides you with an approximation of the zero.

Sturm Chains

A polynomial p is called "square-free", if there is no square, which divides it. This implies that there are also no factors of higher order. This again implies that every zero has multiplicity 1.

As already stated, the algorithm you are dealing with assumes that the polynomial is square-free. This isn't really a limitation because every polynomial can be factorized to square-free polynomials (if you find out that the square b*b divides your polynomial, just divide your polynomial by b and perform this iteratively) and finding zeroes of factors means finding zeroes of the original polynomial. But as non-square-free polynomials occur pretty improbably, I don't deal with it here.

Given a square-free polynomial q_0, a Sturm chain is the inverse (!) sequence p of intermediary results when applying Euclid's algorithm (iterative modulo-division) to q_0 and its derivative q_1 = q_0'.

Sturm's Theorem

So, what is all this about? Sturm's theorem tells you that you can use Sturm chains to count zeroes of the given polynomial q_0. It says that if w(x) is the number of sign changes in the Sturm chain evaluated at a given point x, the number of zeroes of q_0 in the interval ]a,b] is w(a)-w(b).

If you are able to count sign changes in a Sturm chain (and it is relatively easy to do so), you are able to determine the number of zeroes in every given interval and can enclose certain zeroes by controlled downsizing of an interval.

Because the polynomials are able to become zero instead of being just positive or negative and recursion is not always very transparent, the mathmatical definition of the "number of sign changes"-function w is defined a bit complicated:

But already, the representation in pseudo-code using a for-loop looks much more attractive:

Bisection algorithm

Now, because you are able to determine how many zeroes there are in an interval, you are in a state where it is possible for you to enclose a zero with a given index. Starting with a user-defined interval, you split it iteratively into two equal-sized parts and decide by the help of the w-function applied to the border of the intervals, in which half you can find your zero.

Let n be the index of the wanted zero, ref the reference point (where to start counting the zeroes) and ]lower, upper] the current interval, you have to check whether w(ref) - w(middle) >= n. In the first case (which means that there are at least n zeroes left of middle), you continue with the interval ]lower,middle], in the second case with the interval ]middle, upper].

The algorithm stops after a given number of iterations or when it reaches machine precision.

Note: In fact, the bisection algorithm works under even looser conditions for the Sturm chain, namely the axioms for "Generalized Sturm chains". But, as your Sturm chains fulfill those conditions and you know how to constuct them, you don't need to waste much thought about the more general concept.

Programming Polynomials

Determining the requirements

If you inspect the algorithm above regarding which operations are needed, you find out that your polynomial implementations must be capable of the following operations:

Determining the Sturm chain:

For determining P1: Symbolic derivation

For determining P2,...,Pn: "mod"; in other words, calculating the remainder of a polynomial division

For determining P2,...,Pn: Sign change; in other words, Multiplication of polynomials by -1 (more general: Multiplication by scalars)

For determining n: Checking whether a polynomial is constant (more general: Determination of the degree of a polynomial)

For the bisection algorithm, you need the following two operations:

Deciding whether two polynomials have different signs at a given point

Determining whether a polynomial evaluates to zero at a given point

Both operations easily can be deduced from an evaluation operation. This tells you the value of the polynomial at a given point.

Polynomial Interface

These demands lead you to the following interfaces:

public interface Function {
/**
* Evaluates a function at a given point
*
* @param x
* the point at which the function has to be evaluated
* @return the function value at x
*/
public double valueAt(double x);
}

public interface Polynomial extends Function {
/**
* Returns the coefficients array of the polynomial
*
* @return the coefficients array of the polynomial, where
* result[n] is the coefficient before X^n
*/
public double[] toArray();
/**
* Calculates the first derivation of the polynomial
*
* @return the derivation
*/
public Polynomial diff();
/**
* Calculates the remainder of the polynomial division of this
* by other
*
* @param other
* the divisor (must not be constant)
* @return the remainder of the polynomial division
*/
public Polynomial mod(Polynomial other);
/**
* Multiplies the polynomial by a real scalar
*
* @param scalar
* the scalar to multiply the polynomial by
* @return the multiplied polynomial
*/
public Polynomial multiply(double scalar);
/**
* Determines the degree of the polynomial
*
* @return the degree of the polynomial, where the degree of
* the zeropolynomial is defined as -1
*/
public int degree();
}

The distinction between functions in general and polynomials is not necessary for your purposes, but may be useful for future extensions.

Likewise, the toArray() method is not needed definitely, but will be useful if you want to implement different polynomial classes (for example, polynomials of degree less or equal than a given constant). The toArray() method then provides you with a common representation and makes you able to use different polynomial implementations in the same operation (for example, you can calculate the remainder of a Polynomial4 and a general Polynomial).

Implementation

In this tutorial, you confine yourself to polynomials of degree less or equal than 4. So, you need 5 coefficients as attributes of the class and constructors for initially setting them. The toArray() method and the toString() method (used for debugging, not necessary) are also easy to implement.

As firstly the derivation of a4*X*X*X*X + a3*X*X*X + a2*X*X + a1*X + a0 is apparrently 4*a4*X*X*X + 3*a3*X*X + 2*a2*X + a1 (as every primary-school pupil can approve), secondly multiplication of a polynomial by a scalar is defined as multiplication of every coefficient by the scalar and thirdly the degree of a polynomial is the maximum index of a non-zero coefficient, you can implement those three methods:

The algorithm for calculating the remainder of a polynomial division becomes more difficult. Furthermore, here one cannot make much benefit (in form of efficiency) of the fact that the polynomials have a degree less or equal then 4. So, you implement a more general algorithm.

So first of all, you have to implement the polynomial reduction. The reduction of a by b is defined as:a' = a - b * (hc(a) / hc(b)) * X^(deg(a)-deg(b))
where hc(p) means the non-zero coefficient of p, which has got the highest index ("highest coefficient").
One can proove that deg(a') < deg(a), which will be needed later.

As you know that polynomial reduction reduces the degree (see above), termination of this algorithm is assured.

Polynomial division by a constant polynomial is undefined (IllegalArgumentException)

Because the function above still works on double arrays instead of polynomial objects, you have to introduce one (or two) wrapper method(s):

// Actually a degree4-result is sufficient (and we want to make
// our users benefit from this fact, so we define a
// Polynomial4-mod-method)...
public Polynomial4 mod(Polynomial4 other) {
return new Polynomial4(mod(toArray(), degree(),
other.toArray(), other.degree()));
}
// ...but to satisfy the interface we need also a method, which
// returns a general polynomial.
public Polynomial mod(Polynomial other) {
return new Polynomial4(mod(toArray(), degree(),
other.toArray(), other.degree()));
}

Implementation of the Bisection Algorithm

By using the available polynomial methods, you now are able to create a Sturm chain from a given polynomial by simply using the definition:

For simplicity reasons (you don't know the length of the Sturm chain at first), you use a list at first. But for efficiency purposes (we are going to iterate through the Sturm chain very often), you then convert this list to an array.

The implementation of the "w" method used in Sturm's theorem is relatively straightforward using the already given pseudo-code (see page 1):

Implementation note: As you have to live and calculate with the impreciseness of the machine operations, I introduced a tolerance factor precision, which is be used when you check a function for being zero. This should be a very small positive value. Defining it to large or too small will reduce the precision of the result in some cases. More facts on this issue are given in the last section.

So, you have got everything needed for your bisection algorithm, which is not very complicated any more: Take the interval and halve it depending on whether the "w" function tells you that the wanted zero is left or right of the middle. The rest should be self-explanatory:

Because the user is usually not expected to calculate the Sturm chain itself or give all necessary parameters, you define some wrapper functions to support him/her and place all the stuff into a new class:

So, apparantly the algorithm is working and producing good approximations of the zeroes.

What happens in the second test case? you print out two solutions, though your polynomial has got only one. So, the bisection algorithm always knows that the demanded zero is not in the left interval and therefore chooses the right interval. This leads to producing a result near the upper bound 10 of the given interval [-10, 10].

If you consider the last test case, you may notice that this polynomial is not square-free. Nevertheless, the algorithm returns the correct zeroes. So, after all, it seems to be robust against non-square-free polynomials.

Imaginable Modifications/Extensions

Performing some considerations about limits should enable you to find zeroes counted from minus infinity instead of a real reference point.

As pointed out in the section about Sturm's theorem, one could extend the algorithm to work with non-square-free polynomials, too.

A check whether there exists a zero with the given index in the given interval would also be nice (instead of just returning a value near the upper bound of the interval).