This is a very weird notion of "elementary", isn't it? Defining the complex numbers using the reals takes hardly more than 1 page. There is a real-analysis proof of the spectral theorem which never uses complex numbers; instead, it uses induction and Lagrange multipliers to find the maximum of $\left|\left|Ax\right|\right|$ over $x\in S\left(0,1\right)$ (the sphere with center $0$ and radius $1$). This maximum is then shown to be an eigenvalue of $A$, and the vector $x$ for which the maximum is achieved is an eigenvector. ...
– darij grinberg, Jan 11 '13 at 13:28


What does "has real eigenvalues" mean? Apparently it is not to be understood as "has no nonreal eigenvalues", since mention of complex numbers is forbidden. Does it mean "has at least one real eigenvalue"? Does it mean: (where the size is $n \times n$) "has $n$ linearly independent eigenvectors with real eigenvalues"?
– Gerald Edgar, Jan 11 '13 at 14:16


I'm with Gerald in not being sure exactly what the question's asking. By definition, the eigenvalues of a matrix over a field $k$ are elements of $k$. So strictly speaking, the question is trivial; looking for a nontrivial interpretation, I guess it must be one of the two possibilities that Gerald mentions. @Z254R: yes, I think Gerald is helping to formulate the problem.
– Tom Leinster, Jan 11 '13 at 17:08

@marjeta: the point is that we shouldn't have to spend time guessing exactly what your question means, which is what many of these comments are trying to do. You should make it clear what your question means.
– Tom Leinster, Jan 12 '13 at 22:23

10 Answers

First minimize the Rayleigh ratio $R(x)=(x^TAx)/(x^Tx).$ The minimum exists and is real.
This is your first eigenvalue.

Then repeat the usual proof by induction on the dimension of the space.

Alternatively, you can consider the minimax or maximin problem with the same Rayleigh ratio
(take the minimum of its restriction to a subspace, then the maximum over all
subspaces of a given dimension); this gives all the eigenvalues.

But of course any proof requires some topology: the standard proof requires the Fundamental
Theorem of Algebra, while this proof requires the existence of a minimum.
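
For illustration, here is a minimal numerical sketch of the first step (it assumes NumPy and SciPy are available; the symmetric test matrix and the starting point are arbitrary choices): minimize the Rayleigh ratio and check that the minimizer satisfies the eigenvector equation.

```python
# Illustrative sketch only: minimize R(x) = (x^T A x)/(x^T x) for a random
# symmetric A and check that the minimizer is an eigenvector whose eigenvalue
# is the minimum value of R.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2                      # a random real symmetric matrix

def rayleigh(x):
    return (x @ A @ x) / (x @ x)

# R is homogeneous of degree 0, so unconstrained minimization (away from 0) works.
res = minimize(rayleigh, rng.standard_normal(5))
x = res.x / np.linalg.norm(res.x)
lam = rayleigh(x)

print(lam, np.linalg.eigvalsh(A)[0])    # minimum of R vs. smallest eigenvalue
print(np.linalg.norm(A @ x - lam * x))  # residual of Ax = lam x, close to 0
```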

Alexander, when you said that the minimum is an eigenvalue, did you mean to prove it by applying the Lagrange multiplier equation to the function $f(x)=x^tAx$ restricted to a level set of $g(x)=x^tx$, or did you have a different idea in mind?
– Marcos Cossarini, Jan 13 '13 at 23:56


Marcos. In that case, you can prove the Lagrange multiplier relation by hand: if $\lambda$ is the minimum, then for every $y$, $(x+y)^TA(x+y)\geq \lambda (x+y)^T(x+y)$, that is, $2\,y^T(Ax-\lambda x)\geq \lambda y^Ty - y^TAy$. The LHS is homogeneous of degree $1$ in $y$, the RHS of degree $2$. So the LHS has to be zero for every $y$ (replace $y$ by $ty$ and let $t\to 0^{\pm}$). This implies $Ax=\lambda x$.
– ACL, Jan 14 '13 at 0:15

Marcos: yes. ACL's explanation is one way to do it.
– Alexandre Eremenko, Jan 14 '13 at 21:31

Let me give it a try. This one only uses the existence of a maximum on a compact set, and the Cauchy-Schwarz inequality.

Let $T$ be a selfadjoint operator in a finite dimensional inner product space.

Claim: $T$ has an eigenvalue $\pm\|T\|$.

Proof: Let $v$ in the unit sphere be such that $\|Tv\|$ attains its maximum value $M=\|T\|$. Let $w$ also in the unit sphere be such that $Mw=Tv$ (which is like saying that $w=\frac{Tv}{\|Tv\|}$, except in the trivial case $T=0$).

This implies that $\langle w,Tv\rangle=M$. In fact, the only way that two unit vectors $v$ and $w$ can satisfy this equation is to have $Tv=Mw$. (Since we know that $\|w\|=1$ and $\|Tv\|\leq M$, the Cauchy-Schwarz inequality tells us that $|\langle w,Tv\rangle|\leq M$, and the equality case is only attainable when $Tv$ is a scalar multiple of $w$, with the scalar $\lambda$ determined by the computation
$M=\langle w,Tv\rangle=\langle w,\lambda w\rangle=\lambda\langle w,w\rangle=\lambda$.)

But by selfadjointness of $T$, we also know that $\langle v,Tw\rangle=M$, and this implies, by the same Cauchy-Schwarz-equality reasoning, that $Tw=Mv$.

Now, one of the two vectors $v\pm w$ is nonzero, and we can compute

$T(v\pm w)=Tv\pm Tw=Mw\pm Mv=M(w\pm v)=\pm M(v\pm w)$.

This concludes the proof that $\pm\|T\|$ is an eigenvalue with eigenvector $v\pm w$. The reality of the other eigenvalues can be proved by induction, restricting to $(v\pm w)^\bot$ as in the usual proof of the spectral theorem.

Remark: The proof above works with real or complex spaces, and also for compact operators in Hilbert spaces.

Comment: I would like to know if this proof can be found in the literature. I obtained it while trying to simplify a proof of the fact that if $T$ is a bounded selfadjoint operator, then $\|T\|=\sup_{\|v\|\leq 1} \langle Tv,v\rangle$ (as found, for example, on p.32 of Conway J.B., "An Introduction to Functional Analysis"). In the case of non-compact operators, one can only prove that $T$ has as an approximate eigenvalue one of the numbers $\pm\|T\|$. The argument is similar to the one above, but knowledge of the equality case of Cauchy-Schwarz is not enough. One has to know that near-equality implies near-dependence. More precisely, let $v$ be a fixed unit vector, $M\geq 0$ and $\varepsilon\in[0,M]$. If $z$ is a vector with $\|z\|\leq M$ such that $|\langle v,z\rangle|\geq \sqrt{M^2-\varepsilon^2}$, then it can be proved that $z$ is within distance $\varepsilon$ of $\langle v,z\rangle v$.

Exercise: Follow the proof (find the possible vectors $v$ and $w$) for the cases in which $T:\mathbb R^2\to\mathbb R^2$ is given by any of the matrices $\begin{pmatrix}2&0\\0&1\end{pmatrix}$, $\begin{pmatrix}-2&0\\0&1\end{pmatrix}$, $\begin{pmatrix}2&0\\0&-2\end{pmatrix}$. This may make clear how the proof was made. Notice that $v$ and $w$ are already eigenvectors in some ("most") cases.
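
For concreteness, a small numerical sketch of the construction in the proof, applied to the second exercise matrix (an illustration only; it assumes NumPy, and the matrix and tolerance are arbitrary choices):

```python
# Illustrative sketch: follow the construction v -> w = Tv/||Tv|| -> v ± w for
# the exercise matrix diag(-2, 1), whose norm is attained at v = e_1.
import numpy as np

T = np.array([[-2.0, 0.0],
              [ 0.0, 1.0]])

v = np.array([1.0, 0.0])           # maximizes ||Tv|| on the unit sphere
M = np.linalg.norm(T @ v)          # M = ||T|| = 2
w = (T @ v) / M                    # here w = -e_1

for s in (+1, -1):
    u = v + s * w
    if np.linalg.norm(u) > 1e-12:  # one of v ± w is nonzero
        lam = s * M
        print(lam, np.linalg.norm(T @ u - lam * u))   # prints -2.0 and a residual ~ 0
```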

I don't see why this is different from Alexander Eremenko's answer.
– Deane Yang, Jan 13 '13 at 22:10


I don't understand Alexander's answer. How do you prove that if $R(x)=\frac{x^tAx}{x^tx}$ is maximal at $x$, then $x$ is an eigenvector? I got nowhere by differentiating $R$, and the only easy way that I see to complete his proof is to normalize $x$ to get a maximum of $x^tAx$ on the unit sphere, and then write the Lagrange multipliers equation that tells you that $x$ is an eigenvector.
– Marcos Cossarini, Jan 13 '13 at 23:35

But the Lagrange multiplier argument is, in my opinion, different from the argument above, which in fact was originally designed to deal with bounded operators, as explained in the comment. Can Lagrange multipliers be used to prove that $\pm\|T\|$ is an approximate eigenvalue of a bounded operator $T$? If not, is this enough to conclude that the proofs are different?
– Marcos Cossarini, Jan 13 '13 at 23:45


I think that the main difference is that Alexander extremises $x^tAx$ and I extremise $y^tAx$. That the two situations are not trivially equal is the subject of p.32 of Conway.
– Marcos Cossarini, Jan 14 '13 at 0:33

I love this answer because it gets by with such minimal machinery. Really just Cauchy-Schwarz.
– Steven Gubkin, Aug 12 '14 at 15:59

This is quite an interesting question, perhaps a research problem.
I think an elementary answer should be a high school algebra answer in the
sense of Abhyankar and it would have to be in the spirit of what follows.
But first a little story.

I was teaching linear algebra and had just covered eigenvalues and characteristic polynomials, but was not yet at the chapter on the spectral theorem for real symmetric matrices. I was looking for problems in the textbook we were using to assign to my students as homework.
One of the exercises was to show that a real matrix
$$
A=\left[
\begin{array}{cc}
\alpha & \beta \\
\beta & \gamma
\end{array}
\right]
$$
only had real eigenvalues.
Not too hard. Write
the characteristic polynomial
$$
\chi(\lambda)=\det(\lambda I-A)=\lambda^2-(\alpha+\gamma)\lambda+\alpha\gamma-\beta^2
$$
then its discriminant is
$$
\Delta=(\alpha+\gamma)^2-4(\alpha\gamma-\beta^2)=(\alpha-\gamma)^2+4\beta^2\ge 0\ .
$$
Hence two real roots.
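
A quick symbolic check of this discriminant identity, for anyone who wants one (a sketch assuming SymPy is available):

```python
# Illustrative check that the discriminant of the 2x2 characteristic polynomial
# equals (alpha - gamma)^2 + 4*beta^2.
import sympy as sp

a, b, g, lam = sp.symbols('alpha beta gamma lambda', real=True)
A = sp.Matrix([[a, b], [b, g]])
chi = (lam * sp.eye(2) - A).det()
Delta = sp.discriminant(chi, lam)
print(sp.simplify(Delta - ((a - g)**2 + 4 * b**2)))   # prints 0
```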

The next problem in the book was to do the same for
$$
A=\left[
\begin{array}{ccc}
\alpha & \beta & \gamma\\
\beta & \delta & \varepsilon \\
\gamma & \varepsilon & \zeta
\end{array}
\right]
$$
and (silly me) I also assigned it...

The sum-of-squares formula for the discriminant of the characteristic polynomial of a real symmetric matrix comes from a paper by Ilyushechkin in Mat. Zametki, 51, 16-23, 1992.

I suspect the elementary answer should be as follows.
First find a list of invariants or covariants of binary forms $C_1,C_2,\ldots$
such that a form with real coefficients has only real roots iff these covariants are nonnegative. Apply this to the characteristic polynomial of a general real symmetric matrix and show that you get sums of squares. I suppose these covariants, via Sturm's sequence type arguments, should correspond to subresultants or rather subdiscriminants.
This seems also related to Part 2) of Godsil's answer.

Edit: Another recent research reference which relates to the above sum-of-squares formula
is the article The entropic discriminant by Sanyal, Sturmfels and Vinzant.

Edit 2: I just found out that the problem I mentioned above has been completely solved!
See Proposition 4.50 page 127 in the book by Basu, Pollack and Roy on real algebraic geometry. The connection with classical invariants/covariants of binary forms is not
apparent but it is there: their proof is based on subresultants and subdiscriminants which are leading terms of $SL_2$ covariants.

Theorem 4.48 on the previous page looks more on target to me. In any case, nice answer! By the way, a typo: $(\alpha - \gamma)^2 + 4 \beta^2$, not $(\alpha+\gamma)^2 + 4 \beta^2$ in the $2 \times 2$ case.
– David Speyer, Aug 1 '14 at 12:28

Another elementary proof, based on the order structure of symmetric matrices. Let me first recall the basic definitions and facts to avoid misunderstandings: we define $A\ge B$ iff $(A-B)x\cdot x\ge0$ for all $x\in\mathbb{R}^n$. Also, a lemma:

A symmetric matrix $A$ which is positive and invertible is also positive definite (that is, $A\ge\epsilon I$ for some $\epsilon > 0$).

We may say, equivalently: if $A$ is positive but, for every $\epsilon >0$, the matrix $A-\epsilon I$ is not, then $A$ is not invertible. (A quick proof passes through the square root of $A$: $Ax\cdot x=\|A^{1/2} x\|^2 \ge \|A^{-1/2}\|^{-2} \| x\|^2$; one has to construct $A^{1/2}$ beforehand, without diagonalization, of course.)

As a consequence, $\alpha^*:=\sup_{|x|=1}(Ax \cdot x)$ is an eigenvalue of $A$: the matrix $\alpha^*I-A$ is positive, but for every $\epsilon>0$ the matrix $\alpha^*I-A-\epsilon I$ is not, so $\alpha^*I-A$ is not invertible (and $\alpha_*:=\inf_{|x|=1}(Ax \cdot x)$ is an eigenvalue too, for analogous reasons).

The complete diagonalization is then performed inductively, as in other proofs.

Fix real numbers $a_1 > a_2 > \cdots > a_n$ and, for a symmetric matrix $X$, set $\psi(X)=\sum_i a_i X_{ii}$. Let $M$ be the matrix we're trying to diagonalize. Maximize $\psi(J M J^T)$ over $J$ in $SO(n)$. Since $SO(n)$ is compact, $\psi$ has a maximum value; let $X = JMJ^T$ achieve this maximum. For any skew-symmetric matrix $Y$, we compute:
$$\psi \left( \exp(Y) X \exp(-Y) \right) =\psi \left( X + (YX-XY) + O(|Y|^2) \right) = $$
$$\psi(X) + \sum_{i,j} \left(a_{i} Y_{ij} X_{ji} - a_i X_{ij} Y_{ji} \right) +O(|Y|^2) = \psi(X) + 2 \sum_{i<j} (a_i-a_j) Y_{ij} X_{ij} +O(|Y|^2).$$
(Recall that $X$ is symmetric and $Y$ is skew-symmetric.) So
$$\left. \frac{\partial \psi}{\partial Y_{ij}} \right|_{Y=0} = 2 (a_i - a_j) X_{ij}.$$
We see that, at a critical point, all the off-diagonal $X_{ij}$ are zero. One can also compute that the Hessian is negative definite only when $X_{11} > X_{22} > \cdots > X_{nn}$. So the maximum occurs at the unique diagonalization for which the eigenvalues appear in order. (If there are repeated eigenvalues, then there is still a unique maximum on the orbit $J M J^T$, but it is achieved by multiple values of $J$, so the Hessian is only negative semi-definite.)
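
For what it's worth, here is a rough numerical sketch of this argument (an illustration only, assuming NumPy and SciPy; the weights $a_i$, step size, and iteration count are arbitrary choices): repeatedly move along $\exp(tY)$ in the ascent direction read off from the first-order term, and watch $JMJ^T$ become approximately diagonal with decreasing diagonal entries.

```python
# Illustrative sketch: gradient ascent of psi(J M J^T) = sum_i a_i (J M J^T)_{ii}
# over SO(n), stepping along exp(t Y) with Y_{ij} = (a_i - a_j) X_{ij}, which is
# skew-symmetric because X is symmetric.
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n))
M = (B + B.T) / 2                          # the symmetric matrix to diagonalize
a = np.arange(n, 0, -1.0)                  # a_1 > a_2 > ... > a_n

X, J = M.copy(), np.eye(n)
for _ in range(5000):
    Y = np.subtract.outer(a, a) * X        # ascent direction from the first-order term
    step = expm(0.01 * Y)
    X, J = step @ X @ step.T, step @ J

print(np.round(X, 4))                       # approximately diagonal, entries decreasing
print(np.sort(np.linalg.eigvalsh(M))[::-1]) # eigenvalues of M, for comparison
```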

Awesome! I was trying to think of a proof along these lines yesterday.
– Steven Gubkin, Aug 1 '14 at 12:56

you maximize over both symmetric matrices $X$ and rotation matrices $J= \exp(Y)\in SO(n)$?
– john mangual, Aug 1 '14 at 13:58

No, $X$ is the matrix I'm trying to diagonalize. Maximize over $J$.
– David Speyer, Aug 1 '14 at 14:19

Now edited to reduce confusion. I violated one of my basic rules of exposition: Never use the same variable name for two things except when performing induction/recursion. In this case, I had used the same name for the matrix I am trying to diagonalize (now called $M$) and its putative diagonalization (now called $X$).
– David Speyer, Aug 1 '14 at 14:23

@DavidSpeyer I'm curious as to the relation you have in mind between this nice proof and Horn's theorem. So far as I can see Horn himself didn't use this method (of maximizing a function and writing that the derivative in appropriate directions vanishes), but I know at least one proof that does work similarly. (With due apologies ;-)
– Francois Ziegler, Aug 1 '14 at 16:30

Just found in Godsil-Royle's Algebraic graph theory: One first proves that two eigenvectors associated with two different eigenvalues are necessarily orthogonal to each other, in the sense that $x^Ty=0$ (pretty standard), then observes that if $u$ is an eigenvector associated with eigenvalue $\lambda$, then $\bar u$ is an eigenvector associated with eigenvalue $\bar\lambda$. Now the eigenvalues $\lambda,\bar\lambda$ cannot be different, for otherwise by the above observation $0=u^T\bar u=\|u\|^2$ although $u\not=0$.

(It does contain complex numbers, but is still amazingly straightforward).

This is what I would call the standard approach (going through operators on ${\mathbb C}^n$) and as such I don't think it really fulfils the requirements of the original question.
– Yemon Choi, Feb 26 '13 at 23:45

Yes, this is how an operator theorist would do it. But the question was also about the existence of an eigenvalue (possibly without the fundamental theorem of algebra). Is there an argument for that too?
– András Bátkai, Feb 27 '13 at 7:26

This is just the details of the first step of Alexander Eremenko's answer (so upvote his answer if you like mine), which I think is by far the most elementary. You only need two facts: A continuous function on a compact set in $R^n$ achieves its maximum (or minimum), and the derivative of a smooth function vanishes at a local maximum. And there's no need for Lagrange multipliers at all.

Let $C$ be any closed annulus centered at $0$.
The function
$$
R(x) = \frac{x\cdot Ax}{x\cdot x},
$$
is continuous on $R^n\backslash\{0\}$ and therefore achieves a maximum on $C$. Since $R$ is homogeneous of degree $0$, any maximum point $x \in C$ is a maximum point on all of $R^n\backslash\{0\}$. Therefore, for any $v \in R^n$, $t = 0$ is a local maximum for the function
$$
f(t) = R(x + tv).
$$
Differentiating this, we get
$$
0 = f'(0) = \frac{2}{x\cdot x}[Ax - R(x) x]\cdot v
$$
This holds for any $v$ and therefore $x$ is an eigenvector of $A$ with eigenvalue $R(x)$.
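
A symbolic check of this derivative identity, in case it is useful (a sketch assuming SymPy, for a generic $2\times2$ symmetric $A$ and arbitrary $x$ and $v$):

```python
# Illustrative check that d/dt R(x + t v) at t = 0 equals (2/(x.x)) [Ax - R(x) x] . v.
import sympy as sp

a11, a12, a22, x1, x2, v1, v2, t = sp.symbols('a11 a12 a22 x1 x2 v1 v2 t', real=True)
A = sp.Matrix([[a11, a12], [a12, a22]])
x = sp.Matrix([x1, x2])
v = sp.Matrix([v1, v2])

R = lambda y: y.dot(A * y) / y.dot(y)      # the Rayleigh quotient
lhs = sp.diff(R(x + t * v), t).subs(t, 0)
rhs = (2 / x.dot(x)) * (A * x - R(x) * x).dot(v)
print(sp.simplify(lhs - rhs))              # prints 0
```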

The fact that a real symmetric matrix is orthogonally diagonalizable
can be proved by induction. The crucial part is the start, namely the
observation that such a matrix has at least one (real) eigenvalue.
This can be done in three steps.

(1) An easy observation (using direct matrix multiplication) shows that all
columns of a matrix $\mathbf{A}\in\mathbb{R}_{m\times n}$ are orthogonal to any vector
$z\in\mathbb{R}_{m\times 1}$ iff $z$ belongs to the null space of the transpose
$\mathbf{A}^{\sf T}$, i.e. $\mathcal{N}(\mathbf{A}^{\sf T})=\mathcal{R}(\mathbf{A})^{\perp}$.

(2) If $\mathbf{S}^{\sf T}=\mathbf{S}$ and $\mathbf{S}x \neq 0$ for every $x\neq 0$, then
the dot product $\langle\mathbf{S}x,x\rangle\neq 0$ for any $x\neq 0$ as well. Otherwise,
if $\langle \mathbf{S}z,z\rangle=0$ for some $z\neq 0$, then we have, using (1),
$z\in\mathcal{R}(\mathbf{S})^{\perp}=\mathcal{N}(\mathbf{S}^{\sf T})=
\mathcal{N}(\mathbf{S})$, i.e. a contradiction $z\ne0$ and $\mathbf{S}z=0$.

(3) If the matrix $\mathbf{A}=\mathbf{A}^{\sf T}\in\mathbb{R}_{n\times n}$ has no (real) eigenvalue,
then $(t\mathbf{I}-\mathbf{A})x\neq 0$ for every $x\neq 0$ and every $t\in\mathbb{R}$.
Consequently, according to (2), we have $\langle(t\mathbf{I}-\mathbf{A})y,y\rangle\neq 0$
for any fixed $y\neq 0$ and every $t\in\mathbb{R}$. Therefore
$t\|y\|^2 -\langle\mathbf{A}y,y\rangle \neq 0$ for every $t\in\mathbb{R}$, which is impossible, since the left-hand side is a continuous function of $t$ that changes sign as $t$ runs from $-\infty$ to $+\infty$.

Thank you for the comment. Unfortunately, the relation $z\perp\mathbf{S}z$ does not imply $z\perp y$ for every $y\in\mathcal{R}(\mathbf{S})$. I apologize for the false proof. Vito Lampret
– Vito, Jan 13 '13 at 13:48

The idea is simple: define $\Sigma(A)=\sum_{i=1}^n\sum_{j=i+1}^n a_{ij}^2$ for a symmetric real matrix $A=(a_{ij})$. Then minimize the function $O(n)\ni J \mapsto \Sigma(J^TAJ)$ over the orthogonal group $O(n)$. The function is continuous and bounded below by zero, and $O(n)$ is compact, so the minimum is attained. But it cannot be strictly positive, because if there is an $a_{ij}\not=0$ with $i\not=j$, then you can make it zero by a rotation that acts only on the $i$th and $j$th rows and columns, and this decreases $\Sigma$ (a simple little calculation with $2\times 2$ matrices). Therefore the minimum is zero and it is attained at a matrix $J$ for which $J^TAJ$ is diagonal.

The eigenvalues of $A$ are now the (diagonal) entries of $J^TAJ$. No complex numbers are used, but you have to know that the minimum exists. We get the existence of an orthonormal basis consisting of eigenvectors with real eigenvalues.
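
Here is a small sketch of that "simple little calculation with $2\times 2$ matrices" (an illustration only, assuming NumPy; the test matrix is an arbitrary choice): a single rotation in the $(i,j)$-plane that zeroes the $(i,j)$ entry, from which the decrease of $\Sigma$ follows.

```python
# Illustrative sketch: one Jacobi-style rotation acting on rows/columns i and j
# that zeroes the (i, j) entry of a symmetric matrix A.
import numpy as np

def plane_rotation(A, i, j):
    """Orthogonal J, equal to the identity outside rows/columns i and j,
    with (J.T @ A @ J)[i, j] == 0."""
    n = len(A)
    J = np.eye(n)
    if A[i, j] == 0:
        return J
    theta = 0.5 * np.arctan2(2 * A[i, j], A[i, i] - A[j, j])
    J[i, i] = J[j, j] = np.cos(theta)
    J[i, j], J[j, i] = -np.sin(theta), np.sin(theta)
    return J

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3))
A = (B + B.T) / 2
J = plane_rotation(A, 0, 1)
print((J.T @ A @ J)[0, 1])      # ~ 0: the (0, 1) entry has been rotated away
print(J.T @ J)                  # identity: J is orthogonal
```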

To add a little more detail: The total energy $\frac 12\sum_{i,j} a_{ij}^2$, which is the sum of the energy on the diagonal and $\Sigma$, is invariant under orthogonal conjugation, so we want to move it to the diagonal. When you apply a rotation $J$ in the plane spanned by the canonical vectors $e_i$ and $e_j$, which only affects the $i$th and $j$th rows and columns, the resulting coefficients $ii$, $ij$, $ji$, $jj$ of $J^tAJ$ depend only on the same coefficients of $A$, so the problem is reduced to increasing the energy on the diagonal of a $2\times 2$ matrix.
– Marcos Cossarini, Jan 13 '13 at 2:00

Step 1: show that if $A$ is a real symmetric matrix, there is an orthogonal matrix $L$ such that $A=LHL^T$, where $H$ is tridiagonal and its off-diagonal entries are non-negative.
(Apply Gram-Schmidt to sets of vectors of the form $\{x,Ax,\ldots,A^mx\}$, or use Householder transformations, which is the same thing.)

Step 2. We need to show that the eigenvalues of tridiagonal matrices with non-negative off-diagonal entries are real. We can reduce to the case where $H$ is indecomposable. Assume it is $n\times n$ and let $\phi_{n-r}$ be the characteristic polynomial of the matrix we get by deleting the first $r$ rows and columns of $H$. Then
$$
\phi_{n-r+1} = (t-a_r)\phi_{n-r} -b_r \phi_{n-r-1},
$$
where $b_r>0$. Now prove by induction on $n$ that the zeros of $\phi_{n-r}$ are real and are interlaced by the zeros of $\phi_{n-r-1}$. The key here is to observe that this induction hypothesis is equivalent to the claim that all poles and zeros of $\phi_{n-r-1}/\phi_{n-r}$ are real, and in its partial fraction expansion all numerators are positive. From this it follows that the derivative of this rational function is negative everywhere it is defined and hence, between each consecutive pair of zeros of $\phi_{n-r-1}$ there must be a real zero of $\phi_{n-r}$.
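
A numerical illustration of this step (a sketch only, assuming NumPy; the test matrix is a random tridiagonal matrix with strictly positive off-diagonal entries): build the $\phi_k$ from the recurrence and check that consecutive ones have real, interlacing zeros.

```python
# Illustrative sketch: phi_k is the characteristic polynomial of the trailing
# k x k submatrix, built from phi_{k+1} = (t - a) phi_k - b^2 phi_{k-1}; the roots
# of consecutive polynomials are then checked for interlacing.
import numpy as np

rng = np.random.default_rng(3)
n = 5
a = rng.standard_normal(n)             # diagonal entries a_1, ..., a_n
b = rng.random(n - 1) + 0.1            # strictly positive off-diagonal entries

phis = [np.poly1d([1.0]), np.poly1d([1.0, -a[n - 1]])]   # phi_0 = 1, phi_1 = t - a_n
for r in range(n - 2, -1, -1):                            # add rows from the bottom up
    phis.append(np.poly1d([1.0, -a[r]]) * phis[-1] - (b[r] ** 2) * phis[-2])

for k in range(1, n):
    small = np.sort(phis[k].roots.real)
    big = np.sort(phis[k + 1].roots.real)
    print(np.all(big[:-1] <= small) and np.all(small <= big[1:]))   # True: interlacing
```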

Might it be done using eigenvalue interlacing on the original matrix rather than reducing to tridiagonal form first?
– Brendan McKay, Jan 13 '13 at 5:54

I can do it if I am allowed to use spectral decomposition. Write $A$ as $A_1 + bb^T$, where the first row and column of $A_1$ are both zero. (If needed replace $A$ by $-A$.) Then $$ \det(tI-A) = \det(tI-A_1-bb^T) = \det(tI-A_1)\det(I-(tI-A_1)^{-1}bb^T) $$ and since $\det(I-uv^T)=1-v^Tu$, we get that $\det(tI-A)/\det(tI-A_1)$ is equal to $1-b^T(tI-A_1)^{-1}b$. Now use spectral decomposition to deduce that the numerators in $b^T(tI-A_1)^{-1}b$ are real. (This argument is logical, but it might not be a lot of fun in a classroom.)
– Chris Godsil, Jan 13 '13 at 15:38