Abstract

It is shown how the mid-point iterative method with cubic rate of convergence can be applied to find the principal matrix square root. Using an identity between the matrix sign function and the matrix square root, we construct a variant of the mid-point method which is asymptotically stable in a neighborhood of the solution. Finally, the application of the presented approach is illustrated by solving a matrix differential equation.

1. Introductory Notes

Let f be a scalar function and A an n-by-n complex matrix, and let f(A) denote the corresponding matrix function of the same dimensions as A. Higham in [1] indicates that if f is defined on the spectrum of A, then one has the following properties for the matrix function f(A): (i) f(A) commutes with A; (ii) f(A^T) = f(A)^T; (iii) f(X A X^{-1}) = X f(A) X^{-1} for any nonsingular X; (iv) the eigenvalues of f(A) are f(lambda_i), where lambda_i are the eigenvalues of A; (v) if X commutes with A, then X commutes with f(A).

In this paper, we are interested in the numerical computation of the matrix square root, which is one of the most fundamental matrix functions, with many potential applications. An example involving the computation of matrix square roots appears in a semidefinite programming algorithm [2]. A step length is calculated using a line search algorithm that involves computing the unique symmetric positive definite (SPD) square roots of a sequence of SPD matrices, where each matrix differs little from the previous one. That is, given an SPD matrix B and its SPD square root B^{1/2}, one wishes to find the SPD square root of B + E, where E is small. This has to be done repeatedly, with B changing slightly each time.

It is known [3] that A has a square root if and only if, in the "ascent sequence" of integers d_1, d_2, ... defined by
d_k = dim(null(A^k)) - dim(null(A^{k-1})), k = 1, 2, ...,
no two terms are the same odd integer.

Among the square roots of a matrix, the principal square root is distinguished by its usefulness in theory and applications. Let A have no eigenvalues on R^- (the closed negative real axis). Then there is a unique square root X of A all of whose eigenvalues lie in the open right half-plane, and it is a primary matrix function of A [4]. We refer to X as the principal square root of A and write X = A^{1/2}. Note that if A is real, then A^{1/2} is real.

Björck and Hammarling [5] developed a numerical method for the principal square root via the Schur form. Among all the available numerical procedures, Newton's matrix method and its variants have received much attention for tackling this problem. To be more precise, applying the famous Newton's method to the matrix equation
F(X) = X^2 - A = 0 (2)
would yield the following (simplified) matrix iteration method [4]:
X_{k+1} = (1/2)(X_k + X_k^{-1} A), k = 0, 1, 2, ..., (3)
for suitable initial values X_0, wherein the commutativity of the matrices X_k and A has been used to simplify the procedure as much as possible. Unfortunately, this iteration becomes unstable in most cases, as pointed out by Higham [4].

To remedy this, Denman and Beavers (DB) in [6] proposed a quadratically convergent and numerically stable variant of Newton's matrix method (3):
X_{k+1} = (1/2)(X_k + Y_k^{-1}), Y_{k+1} = (1/2)(Y_k + X_k^{-1}), X_0 = A, Y_0 = I. (4)
This method generates the sequences {X_k} and {Y_k}, which converge to A^{1/2} and A^{-1/2}, respectively. For further reading, one may refer to [7, 8].
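For concreteness, the coupled DB iteration can be sketched in a few lines of pure Python on a 2-by-2 SPD example (the helper routines below are illustrative, not from any library):

```python
# Denman-Beavers coupled iteration:
#   X_{k+1} = (X_k + Y_k^{-1})/2,  Y_{k+1} = (Y_k + X_k^{-1})/2,
# with X_0 = A, Y_0 = I; X_k -> A^{1/2} and Y_k -> A^{-1/2} quadratically.

def inv2(M):
    # inverse of a 2x2 matrix via the adjugate formula
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def db_sqrt(A, iters=20):
    X, Y = A, [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(iters):
        Xi, Yi = inv2(X), inv2(Y)   # invert the OLD iterates
        X = [[(X[i][j] + Yi[i][j]) / 2 for j in range(2)] for i in range(2)]
        Y = [[(Y[i][j] + Xi[i][j]) / 2 for j in range(2)] for i in range(2)]
    return X, Y                     # approximations to A^(1/2), A^(-1/2)

A = [[4.0, 0.0], [0.0, 9.0]]
X, Y = db_sqrt(A)
# X approaches diag(2, 3) and Y approaches diag(1/2, 1/3).
```

The same loop works for any size once the 2-by-2 inverse is replaced by a general solver; both square root and inverse square root come out of one run, which is the practical appeal of the DB form.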

In the next section, we first apply the following mid-point cubically convergent scheme of Frontini and Sormani (given for scalar nonlinear equations f(x) = 0 [9]):
y_k = x_k - f(x_k)/(2 f'(x_k)), x_{k+1} = x_k - f(x_k)/f'(y_k), (5)
to derive a new iterative scheme for the numerical computation of A^{1/2}. In Section 2, we theoretically prove that the scheme is convergent under some proper conditions. A variant of the mid-point rule (5) is then brought forward to provide a stable iteration for finding both the principal matrix square root and the principal inverse matrix square root. We note that our results in this work extend those of [10]. In order to derive a higher-order variant of the mid-point method for computing square roots, we draw on some methods in the literature and obtain a new scheme for the principal matrix pth root as well. Section 3 is devoted to the application of the discussed matrix methods in solving some examples. Finally, Section 4 draws a conclusion.
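Before moving to matrices, the scalar behavior of the mid-point scheme (5) on f(x) = x^2 - a is easy to observe; a minimal sketch (function and variable names are ours):

```python
# Mid-point method of Frontini and Sormani applied to f(x) = x^2 - a:
#   y_k     = x_k - f(x_k) / (2 f'(x_k)),
#   x_{k+1} = x_k - f(x_k) / f'(y_k).
# Each step roughly triples the number of correct digits (cubic order).

def midpoint_sqrt(a, x0, iters=5):
    x = x0
    for _ in range(iters):
        f = x * x - a
        y = x - f / (4.0 * x)   # f'(x) = 2x, so f/(2 f'(x)) = f/(4x)
        x = x - f / (2.0 * y)   # f'(y) = 2y
    return x

print(midpoint_sqrt(2.0, 1.0))  # converges to sqrt(2)
```

Starting from x_0 = 1, the errors fall roughly as 0.6, 1.4e-2, 3.6e-7, and then below machine precision, which is the cubic pattern the matrix analysis of Section 2 mirrors.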

2. Extension of the Mid-Point Method

Multipoint nonlinear solvers such as (5) belong to the class of the most efficient methods for scalar nonlinear equations. Recent interest in the research and development of this type of method has arisen from its capability to overcome the theoretical limits of one-point methods concerning convergence order and computational efficiency.

Let us apply (5) to the nonlinear matrix equation (2). We attain the following scheme:
X_{k+1} = X_k (X_k^2 + 3A)(3 X_k^2 + A)^{-1}, k = 0, 1, 2, ..., (6)
considering the fact that the initial approximation is of one of the following forms:
X_0 = alpha I, (7)
or
X_0 = beta A, (8)
wherein alpha and beta are suitable positive parameters chosen to ensure convergence. The derivation of (6) may not be clear yet. Hence, we give a Mathematica code written to obtain it as quickly as possible in symbolic language as follows:

ClearAll["Global`*"]
f[X_] := X^2 - A;
Y = X - f[X]/(2 f'[X]) // FullSimplify;
FullSimplify[X - f[X]/f'[Y]]

Lemma 1. Let A be a nonsingular complex matrix with no eigenvalues on R^-. If X_0 A = A X_0, then for the sequence {X_k} of (6) one has that
X_k A = A X_k (9)
holds for all k >= 0.

Proof. The proof is by induction. Let X_0 be one of the initial values (7) or (8); then it is obvious that X_0 A = A X_0. For the inductive hypothesis, we assume X_k A = A X_k and show that
X_{k+1} A = X_k (X_k^2 + 3A)(3 X_k^2 + A)^{-1} A = A X_k (X_k^2 + 3A)(3 X_k^2 + A)^{-1} = A X_{k+1}. (10)
Note that, from X_k A = A X_k, we obtain A (3 X_k^2 + A)^{-1} = (3 X_k^2 + A)^{-1} A. This completes the proof.
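The lemma can also be checked numerically. The sketch below assumes the update of (6) has the form X <- X(X^2 + 3A)(3X^2 + A)^{-1}, consistent with the mid-point derivation above, and verifies that the commutator XA - AX stays at round-off level while X converges to the SPD square root:

```python
# Starting from X_0 = I (which commutes with A), every iterate of
#   X -> X (X^2 + 3A)(3X^2 + A)^{-1}
# is a rational function of A alone, hence commutes with A.

def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(P, Q, a=1.0):
    # returns P + a*Q
    return [[P[i][j] + a * Q[i][j] for j in range(2)] for i in range(2)]

def inv2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

A = [[5.0, 2.0], [2.0, 3.0]]        # SPD test matrix
X = [[1.0, 0.0], [0.0, 1.0]]        # X_0 = I commutes with A
for _ in range(8):
    X2 = mat_mul(X, X)
    num = mat_add(X2, A, 3.0)       # X^2 + 3A
    den = mat_add(A, X2, 3.0)       # 3X^2 + A
    X = mat_mul(mat_mul(X, num), inv2(den))

# commutator XA - AX; its entries remain at the level of round-off
commutator = mat_add(mat_mul(X, A), mat_mul(A, X), -1.0)
```

After a handful of iterations X satisfies X^2 = A to working accuracy, and every entry of the commutator is of the order of the unit round-off, as Lemma 1 predicts.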

Let (X, d) be a metric space. We say that a sequence {x_k} of points of X converges to a point x of X if
lim_{k -> infinity} d(x_k, x) = 0. (11)
The convergence properties will depend on the choice of the distance d on X, but for a given distance the speed of convergence of the sequence {x_k} is characterized by the speed of convergence of the sequence of nonnegative numbers {d(x_k, x)} [11]. In what follows, we investigate the rate of convergence of (6).

Theorem 2. Let A have no eigenvalues on R^-. If the initial approximation X_0 is chosen according to rule (7) or (8), then the iterative method (6) converges to the principal square root A^{1/2} with local third order of convergence (without considering round-off error).

Proof. Using the methodology of [10], we now prove the convergence order of (6). If A is considered to be diagonalizable and nonsingular, then
A = T D T^{-1}, D = diag(lambda_1, ..., lambda_n), (12)
where T is a nonsingular matrix. In addition, assume that B_k = T^{-1} X_k T. From the iterative method (6), we attain
B_{k+1} = B_k (B_k^2 + 3D)(3 B_k^2 + D)^{-1}, (13)
where {B_k} is the sequence of matrices obtained from {X_k} by this similarity transformation. Note that if B_0 is a diagonal matrix, then all successive B_k are diagonal too. The recursive relation (13) is then equivalent to n scalar equations as follows:
b_{k+1}^i = b_k^i ((b_k^i)^2 + 3 lambda_i)/(3 (b_k^i)^2 + lambda_i), 1 <= i <= n. (14)
For the initial values (7) and (8), each sequence {b_k^i} converges to lambda_i^{1/2}, that is,
lim_{k -> infinity} b_k^i = lambda_i^{1/2}, 1 <= i <= n. (15)
Then the fact that B_k -> D^{1/2} yields
lim_{k -> infinity} X_k = T D^{1/2} T^{-1} = A^{1/2}. (16)
Using the transformation e_k^i = b_k^i - lambda_i^{1/2}, we have
e_{k+1}^i = (e_k^i)^3 / (3 (b_k^i)^2 + lambda_i). (17)
Taking absolute values in (17), we obtain that
|e_{k+1}^i| <= |e_k^i|^3 / |3 (b_k^i)^2 + lambda_i|. (18)
Therefore, the method (6) possesses local third order of convergence. Equation (11) reveals, in a metric space, how convergence should be understood in (17). Furthermore, since A has no nonpositive real eigenvalues, the mid-point approach computes the principal matrix square root.

We remark that all iterates generated by (6) are well defined as long as the denominators 3 (b_k^i)^2 + lambda_i in (14) are nonzero. In fact, the initial values (7) and (8) determine a unique real number b_0^i for each i. In general, "well defined" means that an object is described up to some arbitrary choices, with the arbitrary choices having no material effect. Accordingly, using the initial choices (7) and (8), the sequence {X_k} is well defined.

Under the assumptions of Theorem 2 (without considering round-off error), and since A is nonsingular, X_k^{-1} tends to (A^{1/2})^{-1}. This means that X_k^{-1} tends to A^{-1/2}, that is, the principal inverse matrix square root.

Unfortunately, scheme (6) turns out to be unstable when the size of the input matrix is large or the matrix is ill-conditioned. A reasoning similar to that in [4] reveals this point. Therefore, (6) can only be employed for symmetric positive definite matrices with an extremely well-conditioned structure.

Iannazzo in [12] showed that, in the case that the initial matrix X_0 is the identity matrix, Newton's method converges for any matrix A having eigenvalues with modulus less than 1 and with positive real parts. Similar results can be deduced for (6).

In order to reach third order of convergence by the mid-point approach in solving (2), as well as to obtain a stable sequence, we use the following fact (identity):
sign([0, A; I, 0]) = [0, A^{1/2}; A^{-1/2}, 0], (19)
which indicates an important relationship between the matrix square root and the matrix sign function sign(.).

We remark that (19) has an extra important advantage: it yields the inverse matrix square root along with the matrix square root at the same time.

Applying (5) to solve the matrix equation X^2 = I yields, in its reciprocal form, the iteration
X_{k+1} = (I + 3 X_k^2)[X_k (3I + X_k^2)]^{-1}, k = 0, 1, 2, .... (20)
This method, which is similar to the application of Halley's method for solving X^2 - I = 0 (see, e.g., [13, 14]), computes the matrix sign of the nonsingular matrix
B = [0, A; I, 0], (21)
and in the convergence phase gives
sign(B) = [0, A^{1/2}; A^{-1/2}, 0]. (22)
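Identity (19) and the limit (22) can be checked quickly. The sketch below uses the classical Newton sign iteration X <- (X + X^{-1})/2 in place of (20), purely for brevity, on the scalar case a = 4:

```python
# sign([0, a; 1, 0]) = [0, sqrt(a); 1/sqrt(a), 0] for a scalar a > 0.
# Newton's sign iteration X_{k+1} = (X_k + X_k^{-1})/2 converges
# quadratically and preserves the anti-diagonal block structure.

def inv2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

a = 4.0
X = [[0.0, a], [1.0, 0.0]]      # the block matrix of (21), scalar case
for _ in range(30):
    Xi = inv2(X)
    X = [[(X[i][j] + Xi[i][j]) / 2 for j in range(2)] for i in range(2)]

# X approaches [0, 2; 0.5, 0]: the (1,2) entry is sqrt(a) and the
# (2,1) entry is 1/sqrt(a), both delivered by one sign computation.
```

The diagonal entries stay exactly zero throughout, illustrating why sign-based square-root methods can exploit the structure of (21).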
It is easy to deduce the following convergence theorem for (20).

Theorem 3. Let A have no eigenvalues on R^-. If X_0 is the matrix (21), then the iterative method (20) converges to (22).

Proof. The proof of this theorem is based on the diagonalization properties of the sign matrix sign(B). It is hence omitted.

Lemma 4. The sequence {X_k} generated by (20) using (21) is asymptotically stable.

Proof. Let D_k denote the numerical perturbation introduced at the kth iterate of (20), so that the computed iterate is
X~_k = X_k + D_k. (23)
Here, we perform a first-order error analysis; that is, we formally neglect quadratic (and higher) terms such as D_k^2. Expanding the iteration function of (20) about X_k, we have
X~_{k+1} = X_{k+1} + (first-order terms in D_k), (24)
where the following identity is used (for any nonsingular matrix B and any matrix E of sufficiently small norm):
(B + E)^{-1} = B^{-1} - B^{-1} E B^{-1} + O(||E||^2). (25)
Further simplification yields
D_{k+1} ≈ (1/2)(D_k - S D_k S), (27)
where S = sign(B) and where, for large enough k, we have considered X_k ≈ S and used S^2 = I. Applying (27) recursively, and noting that the map D -> (1/2)(D - S D S) is idempotent, we have
D_{k+1} ≈ (1/2)(D_0 - S D_0 S). (28)
From (28), we conclude that the perturbation at the (k+1)th iterate is bounded. Therefore, the sequence {X_k} generated by (20) is asymptotically stable.

The iteration (20) requires one matrix inversion per step and yields both A^{1/2} and A^{-1/2}, which are of interest in most practical problems. The implementation of (20) for computing principal square roots deserves careful attention in order to save effort: since the intermediate matrices are all structured and sparse (at least half of their entries are zero), one can exploit sparse-matrix techniques to save memory and time.

2.1. A Novel Variant of Mid-Point Scheme

The analysis of the method obtained by applying the mid-point method to (2) was of limited interest, because it yields an iteration that is not in general stable. The cubically convergent method (20), derived via the matrix sign function, is more useful. However, this method is not entirely novel, since it can be viewed as a special case of a family of iteration methods due to Kenney and Laub [15], derived from the (explicitly known) rational Padé approximations of (1 - ξ)^{-1/2}. These methods are surveyed in [1, Section 5.4], and their application to the matrix square root is discussed in [1, Section 6.7]. General results on the stability and rate of convergence for this family are given in Theorems 6.11 and 6.12 of [1].

To move forward, we recall that the authors of [16] recently proposed an efficient variant of the mid-point method for nonlinear equations and extended it to the polar decomposition, while the authors of [17] developed that variant for the matrix sign function as follows:
X_{k+1} = (I + 18 X_k^2 + 13 X_k^4)[X_k (7I + X_k^2)(I + 3 X_k^2)]^{-1}, k = 0, 1, 2, ..., (29)
where X_0 is given by (21).

Theorem 5. Let A have no eigenvalues on R^-. If one uses (21) as the initial matrix, then the iterative method (29) converges to (22).

Proof. The proof of this theorem is similar to the proof of Theorem 3.

Lemma 6. The sequence {X_k} generated by (29) using (21) is asymptotically stable.

Note that, in general, the inverse of a nonsingular matrix of the form
M = [0, B; C, 0]
is of the form
M^{-1} = [0, C^{-1}; B^{-1}, 0].
In fact, all iterates in (29) retain this block anti-diagonal form. This itself helps to reduce the computational cost of the matrix inversion inside the iteration.
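The claimed block anti-diagonal inverse is easy to verify directly; a small pure-Python check (block sizes and helper names are illustrative):

```python
# For M = [0, B; C, 0] with B, C nonsingular, the inverse is
# M^{-1} = [0, C^{-1}; B^{-1}, 0]: only two half-size inversions are
# needed, which is the saving exploited in implementations of (29).

def mat_mul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inv2(M):
    # inverse of a 2x2 block
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def embed(B, C):
    # build [0, B; C, 0] as a dense 4x4 matrix from 2x2 blocks
    M = [[0.0] * 4 for _ in range(4)]
    for i in range(2):
        for j in range(2):
            M[i][2 + j] = B[i][j]
            M[2 + i][j] = C[i][j]
    return M

B = [[2.0, 1.0], [0.0, 3.0]]
C = [[1.0, 0.0], [4.0, 5.0]]
M = embed(B, C)
Minv = embed(inv2(C), inv2(B))   # claimed inverse from half-size inverses
P = mat_mul(M, Minv)             # should be the 4x4 identity
```

Multiplying out blockwise gives M Minv = diag(B C^{-1}... ) only when the blocks are placed as above; the product P in the sketch reduces exactly to diag(B B^{-1}, C C^{-1}) = I.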

An implementation of (29) to compute A^{1/2} and A^{-1/2} for a square matrix A having no eigenvalues on R^-, in the programming package Mathematica, is brought forward as follows:

sqrtMatrix2[A_, maxIterations_, tolerance_] :=
 Module[{k = 0},
  {n, n} = Dimensions[A];
  Id = SparseArray[{{i_, i_} -> 1.}, {n, n}];
  A1 = SparseArray@ArrayFlatten[{{0, A}, {Id, 0}}];
  Y[0] = A1; R[0] = 1;
  Id2 = SparseArray[{{i_, i_} -> 1.}, {2 n, 2 n}];
  While[k < maxIterations && R[k] >= tolerance,
   Y2 = SparseArray[Y[k].Y[k]];
   l1 = SparseArray[Y[k].(7 Id2 + Y2).(Id2 + 3 Y2)];
   l2 = SparseArray@ArrayFlatten[{{0, Inverse@l1[[n + 1 ;; 2 n, 1 ;; n]]},
      {Inverse@l1[[1 ;; n, n + 1 ;; 2 n]], 0}}];
   Y[k + 1] = SparseArray[(Id2 + 18 Y2 + 13 Y2.Y2).l2];
   R[k + 1] = Norm[Y[k + 1] - Y[k], Infinity]/Norm[Y[k + 1], Infinity];
   k++];
  Y[k]];

The above three-argument function sqrtMatrix2 computes the principal matrix square root and its inverse at the same time by entering the three arguments: the matrix A with no eigenvalues on R^-, the maximum number of iterations that (29) may take, and the tolerance of the stopping criterion in the infinity norm. Clearly, sqrtMatrix2[A, maxIterations, tolerance][[1;;n, n+1;;2n]] and sqrtMatrix2[A, maxIterations, tolerance][[n+1;;2n, 1;;n]] give A^{1/2} and A^{-1/2}, respectively, wherein n is the size of the input matrix A.
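For readers without Mathematica, the update inside sqrtMatrix2, which we read off from the code as X <- (I + 18X^2 + 13X^4)[X(7I + X^2)(I + 3X^2)]^{-1}, can be examined in its scalar form, where it drives any positive starting value to sign = 1:

```python
# Scalar transcription of the update used in sqrtMatrix2:
#   x <- (1 + 18 x^2 + 13 x^4) / (x (7 + x^2)(1 + 3 x^2)).
# x = 1 is a fixed point (numerator and denominator both equal 32),
# and the error shrinks with high order once x is near 1.

def variant_midpoint_step(x):
    return (1.0 + 18.0 * x * x + 13.0 * x ** 4) / \
           (x * (7.0 + x * x) * (1.0 + 3.0 * x * x))

x = 3.0                      # any starting value with positive real part
errors = []
for _ in range(6):
    errors.append(abs(x - 1.0))
    x = variant_midpoint_step(x)

# errors fall roughly as 2.0, 9.5e-2, 9.4e-6, and then to round-off,
# i.e. far faster than the quadratic DB pattern.
```

The matrix version simply replaces the scalar arithmetic with matrix products and one structured inversion per step, exactly as in the Mathematica listing above.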

2.2. Extension to the Matrix pth Root

Computing pth roots of matrices arises in certain computations. The unique matrix X such that X^p = A (p >= 2 an integer) and whose eigenvalues lie in the segment
{z : -pi/p < arg(z) < pi/p}
is called the principal pth root of A and is denoted by A^{1/p}.

Many authors have investigated methods for computing pth roots of matrices. The methods are normally based on iterations or on the Schur normal form; see, for example, [18] and the references therein. In general, the number of pth roots of a given matrix may be zero, finite, or infinite.

In this subsection, we provide a combination of Algorithm 7.9 in [1] and our method (29) to compute the principal matrix pth root, as follows.

Given A having no eigenvalues on R^- and an integer p >= 2, the following steps compute A^{1/p} using (29):
(1) compute A^{1/2} by (29) and set B = A^{1/2};
(2) set C = B/theta, where theta is any upper bound for rho(B) (e.g., theta = ||B||_F);
(3) use the coupled Newton iteration (35) with the scaled matrix C to compute C^{2/p};
(4) recover A^{1/p} by undoing the scaling, where, when k -> infinity, the iterates of (35) tend to the required root.

3. Applications

We test the contributed method (29), denoted by PM1, using Mathematica 8 in machine precision [19]. Apart from this scheme, several iterative methods have been considered for comparison: (3), (4), (20) (denoted by PM), and the method of Meini, originally developed in [20] on the basis of the cyclic reduction algorithm (CR), given as follows:
Y_{k+1} = -Y_k Z_k^{-1} Y_k, Z_{k+1} = Z_k - 2 Y_k Z_k^{-1} Y_k, Y_0 = I - A, Z_0 = 2(I + A). (36)
This method generates the sequence {Z_k}, which converges to 4 A^{1/2}. Note that (36) only computes A^{1/2}, while (4), (20), and (29) produce both A^{1/2} and A^{-1/2}.

The computational order of convergence for computing the principal matrix square root by matrix iterations can be estimated by
rho ≈ ln(||X_{k+1} - X_k|| / ||X_k - X_{k-1}||) / ln(||X_k - X_{k-1}|| / ||X_{k-1} - X_{k-2}||), (37)
wherein X_{k-1}, X_k, X_{k+1} are the last three approximations of A^{1/2} in the convergence phase.
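The computational-order estimate above is straightforward to evaluate from the norms of three consecutive corrections; a small sketch (the synthetic norms below mimic a cubically convergent error sequence):

```python
import math

# Computational order of convergence from three consecutive corrections:
#   rho ≈ ln(d_new / d_mid) / ln(d_mid / d_old),
# where d_* are norms of successive differences ||X_{k+1} - X_k||.

def coc(d_new, d_mid, d_old):
    return math.log(d_new / d_mid) / math.log(d_mid / d_old)

# For a cubically convergent scheme the correction norms behave like
# e, e^3, e^9, and the estimate comes out near 3:
e = 1e-2
print(coc(e ** 9, e ** 3, e))
```

In practice one stores the last three iterates of the matrix method, takes infinity norms of their differences, and feeds them to coc; values near 2, 3, or 4 then confirm the theoretical order of the scheme being tested.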

Example 7. As the first example, we consider a matrix whose eigenvalues and principal matrix square root are known in closed form.

We compare the behavior of the different methods and report the numerical results, using the infinity norm for all norms involved, with the stopping criterion
||X_{k+1} - X_k|| <= tolerance,
in Table 1. The numerical results are in harmony with the theoretical discussion of Section 2.

A differential equation is a mathematical equation for an unknown function of one or several variables that relates the values of the function itself to its derivatives of various orders. A matrix differential equation contains more than one function stacked into vector form, with a matrix relating the functions to their derivatives. We now illustrate the application of the new solver in solving matrix differential equations.

Example 8. In this test, we compare the behavior of the different methods for solving the matrix differential equation (41), whose exact solution can be expressed in terms of A^{1/2} and A^{-1/2}, and where the sparse SPD matrix A (of dimension n) is defined in Mathematica as follows:

A = SparseArray[{{i_, i_} -> 10., {i_, j_} /; Abs[i - j] == 1 -> -5.}, {n, n}];

The interesting point in this example is that we need both A^{1/2} and A^{-1/2} at the same time, and this paves the way for the application of (29). We compare the behavior of the different methods in Figure 1 for finding the principal matrix square root of A. The results are promising for the proposed method in terms of the number of iterations. Figure 1 reveals that the fastest method is (29), while Newton's matrix method is numerically unstable, which makes it diverge after the seventh iterate.

Figure 1: The comparison of different methods for finding A^{1/2} in Example 8.

Note again that, among the compared methods, only the method of DB and our proposed method PM1 are able to find A^{1/2} and A^{-1/2} at the same time, which is fruitful in solving the matrix differential equation (41). Due to page limitations, we cannot present the final solution of (41) explicitly here. Clearly, substituting the obtained principal matrix square root and principal inverse matrix square root produces the final solution.

4. Conclusion

In this paper, we have developed and discussed the mid-point iterative root-finding method for solving some special and important matrix problems. We discussed under what conditions the obtained method converges to the principal matrix square root, that is, when A has no nonpositive real eigenvalues.

Furthermore, an asymptotically stable variant of the mid-point method using an identity between the matrix square root and the matrix sign function has been derived. The numerical experiments which we have performed, reported at the end of the paper, show that the presented procedure is a reliable tool for computing the principal matrix square root and its inverse.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgment

The authors would like to record their sincerest thanks to the referees, whose comments have helped to enhance the readability of this paper.