Section O: Orthogonality

In this section we define a couple more operations with vectors, and prove a few
theorems. At first blush these definitions and results will not appear central to
what follows, but we will make use of them at key points in the remainder of the
course (such as Section MINM, Section OD). Because we have chosen to use
ℂ as our
set of scalars, this subsection is a bit more, uh, … complex than it would be for the
real numbers. We’ll explain as we go along how things get easier for the real numbers
ℝ. If you
haven’t already, now would be a good time to review some of the basic properties
of arithmetic with complex numbers described in Section CNO. With that done,
we can extend the basics of complex number arithmetic to our study of vectors in
{ℂ}^{m}.

Subsection CAV: Complex Arithmetic and Vectors

We know how the addition and multiplication of complex
numbers is employed in defining the operations for vectors in
{ℂ}^{m}
(Definition CVA and Definition CVSM). We can also extend the idea of the
conjugate to vectors.

In the case where the entries of our vectors are all real numbers (as in the
second part of Example CSIP), the computation of the inner product may look
familiar and be known to you as a dot product or scalar product. So you can
view the inner product as a generalization of the scalar product to vectors from
{ℂ}^{m} (rather
than {ℝ}^{m}).

Also, note that we have chosen to conjugate the entries of the second vector
listed in the inner product, while many authors choose to conjugate entries from
the first vector. It really makes no difference which choice is made; it just
requires that subsequent definitions and theorems be consistent with the
choice. You can study the conclusion of Theorem IPAC as an explanation
of the magnitude of the difference that results from this choice. But be
careful as you read other treatments of the inner product or its use in
applications, and be sure you know ahead of time which choice has been
made.
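To make the convention concrete, here is a small computational sketch (Python with NumPy; the function name ip and the sample vectors are our own illustration, not the text's notation). It conjugates the entries of the second vector and then checks the relationship described by Theorem IPAC.

    import numpy as np

    def ip(u, v):
        # <u, v>: multiply entries of u by the conjugates of the entries of v, then sum
        return np.sum(u * np.conjugate(v))

    u = np.array([2 + 3j, 5 + 2j, -3 + 1j])
    v = np.array([1 + 2j, -4 + 5j, 0 + 5j])

    print(ip(u, v))                # conjugates the entries of the second vector, v
    print(np.conjugate(ip(v, u)))  # same value, illustrating Theorem IPAC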

There are several quick theorems we can now prove, and they will each be
useful later.

Proof The proofs of the two parts are very similar, with the second one
requiring just a bit more effort due to the conjugation that occurs. We will prove
part 2 and you can prove part 1 (Exercise O.T10).

Proof The proofs of the two parts are very similar, with the second one
requiring just a bit more effort due to the conjugation that occurs. We will prove
part 2 and you can prove part 1 (Exercise O.T11).

Subsection N: Norm

When treating linear algebra in a more geometric fashion, the length of a
vector arises naturally, and it is exactly what you would expect from its name.
With complex numbers, we will define a similar function. Recall that if
c is a complex
number, then \left \vert c\right \vert
denotes its modulus (Definition MCN).

Definition NV Norm of a Vector The norm of the vector u
is the scalar quantity in ℂ

\left \Vert u\right \Vert = \sqrt{\sum _{i=1}^{m}{\left \vert {\left [u\right ]}_{i}\right \vert }^{2}} = \sqrt{\left \langle u, u\right \rangle }

△

Since each modulus is squared, every term is nonnegative, and so the sum must
also be nonnegative. (Notice that in general the inner product is a complex
number and cannot be compared with zero, but in the special case of
\left \langle u, u\right \rangle the result is a
real number.) The phrase, “with equality if and only if” means that we want to show that
the statement \left \langle u, u\right \rangle = 0
(i.e. with equality) is equivalent (“if and only if”) to the statement
u = 0.

If u = 0,
then it is a straightforward computation to see that
\left \langle u, u\right \rangle = 0. In the other direction,
assume that \left \langle u, u\right \rangle = 0.
As before, \left \langle u, u\right \rangle
is a sum of squares of moduli. So we have

Now we have a sum of squares equaling zero, so each term must be zero. Then by similar
logic, \left \vert {\left [u\right ]}_{i}\right \vert = 0 will
imply that {\left [u\right ]}_{i} = 0,
since 0 + 0i
is the only complex number with zero modulus. Thus every entry of
u is zero
and so u = 0, as
desired. ■

The results contained in Theorem PIP are summarized by saying “the inner
product is positive definite.”
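As a small illustration of Definition NV and Theorem PIP (a sketch in Python with NumPy, using a made-up vector), the norm computed from the squared moduli of the entries agrees with the square root of \left \langle u, u\right \rangle , and \left \langle u, u\right \rangle itself is a nonnegative real number.

    import numpy as np

    def ip(u, v):
        return np.sum(u * np.conjugate(v))

    def norm(u):
        # square root of the sum of the squared moduli of the entries
        return np.sqrt(np.sum(np.abs(u) ** 2))

    u = np.array([3 + 2j, 1 - 6j, 2 + 4j, 2 + 1j])

    print(norm(u))                       # the norm of u
    print(np.sqrt(ip(u, u).real))        # the same value; <u, u> is real and nonnegative
    print(ip(np.zeros(4), np.zeros(4)))  # only the zero vector gives <u, u> = 0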

Subsection OV: Orthogonal Vectors

“Orthogonal” is a generalization of “perpendicular.” You may have used
mutually perpendicular vectors in a physics class, or you may recall from a
calculus class that perpendicular vectors have a zero dot product. We will
now extend these ideas into the realm of higher dimensions and complex
scalars.

We extend this definition to whole sets by requiring vectors to be pairwise
orthogonal. Despite using the same word, careful thought about what objects you
are using will eliminate any source of confusion.

Definition OSV Orthogonal Set of Vectors Suppose that S = \left \{{u}_{1},{u}_{2},{u}_{3},…,{u}_{n}\right \} is a
set of vectors from {ℂ}^{m}.
Then S
is an orthogonal set if every pair of different vectors from
S is orthogonal,
that is \left \langle {u}_{i},{u}_{j}\right \rangle = 0
whenever i ≠ j.
△
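A quick way to check Definition OSV numerically (a sketch with NumPy; the three real vectors below are a made-up example) is to test every pair of different vectors for a zero inner product.

    import numpy as np
    from itertools import combinations

    def ip(u, v):
        return np.sum(u * np.conjugate(v))

    S = [np.array([1, 1, 0]), np.array([1, -1, 0]), np.array([0, 0, 1])]

    # test each pair of different vectors once; by Theorem IPAC the other order is then zero too
    print(all(np.isclose(ip(ui, uj), 0) for ui, uj in combinations(S, 2)))  # True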

We now define the prototypical orthogonal set, which we will reference
repeatedly.

Notice that {e}_{j} is
identical to column j
of the m × m identity
matrix {I}_{m}
(Definition IM). This observation will often be useful. It is not hard to see that
the set of standard unit vectors is an orthogonal set. We will reserve the notation
{e}_{i} for
these vectors.
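For instance, a brief check (a sketch; the size m = 4 is arbitrary) that the standard unit vectors can be read off as the columns of the identity matrix and are pairwise orthogonal:

    import numpy as np

    m = 4
    I = np.eye(m)
    e = [I[:, j] for j in range(m)]   # e_1, e_2, ..., e_m as the columns of I_m

    print(e[1])                # the second standard unit vector: [0. 1. 0. 0.]
    print(np.dot(e[0], e[2]))  # 0.0: different standard unit vectors are orthogonal
                               # (entries are real, so a plain dot product suffices here)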

is an orthogonal set. Since the inner product is anti-commutative
(Theorem IPAC) we can test pairs of different vectors in any order. If the result is
zero, then it will also be zero if the inner product is computed in the opposite
order. This means there are six pairs of different vectors to use in an inner
product computation. We’ll do two and you can practice your inner products on
the other four.

So far, this section has seen lots of definitions, and lots of theorems
establishing un-surprising consequences of those definitions. But here is our first
theorem that suggests that inner products and orthogonal vectors have some
utility. It is also one of our first illustrations of how to arrive at linear
independence as the conclusion of a theorem.

So we conclude that {α}_{i} = 0
for all 1 ≤ i ≤ n in any relation of
linear dependence on S.
But this says that S
is a linearly independent set since the only way to form a relation
of linear dependence is the trivial way (Definition LICV). Boom!
■

Subsection GSP: Gram-Schmidt Procedure

The Gram-Schmidt Procedure is really a theorem. It says that if we begin with a linearly
independent set of p
vectors, S,
then we can do a number of calculations with these vectors and produce an orthogonal
set of p
vectors, T,
so that \left \langle S\right \rangle = \left \langle T\right \rangle .
Given the large number of computations involved, it is indeed a procedure to do
all the necessary computations, and it is best employed on a computer. However,
it also has value in proofs where we may on occasion wish to replace a linearly
independent set by an orthogonal set.
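Here is a sketch of the computations the procedure performs (Python with NumPy; the helper names and the sample vectors are our own illustration). The update used is the standard Gram-Schmidt step, consistent with the denominators \left \langle {u}_{i},{u}_{i}\right \rangle that appear in the proof below: each new u is the corresponding v minus its components along the previously constructed vectors.

    import numpy as np

    def ip(u, v):
        return np.sum(u * np.conjugate(v))

    def gram_schmidt(S):
        # S: a linearly independent list of vectors; returns an orthogonal list T with <T> = <S>
        T = []
        for v in S:
            u = v.astype(complex)
            for w in T:
                u = u - (ip(v, w) / ip(w, w)) * w   # subtract the component of v along w
            T.append(u)
        return T

    S = [np.array([1, 1 + 1j, 1]), np.array([-1j, 1, 1 + 1j]), np.array([0, 1j, 1j])]
    T = gram_schmidt(S)
    # every pair of different vectors in T has (numerically) zero inner product
    print([np.round(ip(T[i], T[j]), 10) for i in range(3) for j in range(i + 1, 3)])

Of course this is only a numerical illustration; the proof that follows explains why the output always has the claimed properties.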

This is our first occasion to use the technique of “mathematical induction” for
a proof, a technique we will see again several times, especially in Chapter D. So
study the simple example described in Technique I first.

Proof We will prove the result by using induction on
p (Technique I). To begin,
we prove that T has the
desired properties when p = 1.
In this case {u}_{1} = {v}_{1}
and T = \left \{{u}_{1}\right \} = \left \{{v}_{1}\right \} = S.
Because S and
T are equal,
\left \langle S\right \rangle = \left \langle T\right \rangle . Equally trivial,
T is an orthogonal
set. If {u}_{1} = 0,
then S
would be a linearly dependent set, a contradiction.

and let T = {T}^{′} ∪\left \{{u}_{p}\right \}. We now need
to show that T
has several properties by building on what we know about
{T}^{′}. But
first notice that the above equation has no problems with the denominators
(\left \langle {u}_{i},{u}_{i}\right \rangle ) being zero,
since the {u}_{i}
are from {T}^{′},
which is composed of nonzero vectors.

The term {a}_{p}{u}_{p} is a linear
combination of vectors from {T}^{′}
and the vector {v}_{p},
while the remaining terms are a linear combination of vectors from
{T}^{′}. Since
\left \langle {T}^{′}\right \rangle = \left \langle {S}^{′}\right \rangle , any term that is a
multiple of a vector from {T}^{′}
can be rewritten as a linear combination of vectors from
{S}^{′}. The remaining
term {a}_{p}{v}_{p} is a multiple
of a vector in S.
So we see that x
can be rewritten as a linear combination of vectors from
S,
i.e. x ∈\left \langle S\right \rangle .

Rearrange our defining equation for
{u}_{p} by solving
for {v}_{p}. Then
the term {a}_{p}{v}_{p}
is a multiple of a linear combination of elements of
T.
The remaining terms are a linear combination of
{v}_{1},{v}_{2},{v}_{3},…,{v}_{p−1}, hence an
element of \left \langle {S}^{′}\right \rangle = \left \langle {T}^{′}\right \rangle .
Thus these remaining terms can be written as a linear combination of the vectors
in {T}^{′}. So
y is a linear combination
of vectors from T,
i.e. y ∈\left \langle T\right \rangle .

The elements of {T}^{′} are
nonzero, but what about {u}_{p}?
Suppose to the contrary that {u}_{p} = 0,

Since \left \langle {S}^{′}\right \rangle = \left \langle {T}^{′}\right \rangle we can
write the vectors {u}_{1},{u}_{2},{u}_{3},…,{u}_{p−1}
on the right side of this equation in terms of the vectors
{v}_{1},{v}_{2},{v}_{3},…,{v}_{p−1} and we then
have the vector {v}_{p}
expressed as a linear combination of the other
p − 1 vectors in
S, implying
that S is a
linearly dependent set (Theorem DLDS), contrary to our lone hypothesis about
S.

Finally, it is a simple matter to establish that
T is an
orthogonal set, though it will not appear so simple looking. Think about your objects
as you work through the following — what is a vector and what is a scalar. Since
{T}^{′}
is an orthogonal set by induction, most pairs of elements in
T are
already known to be orthogonal. We just need to test “new” inner products, between
{u}_{p} and
{u}_{i}, for
1 ≤ i ≤ p − 1. Here
we go, using summation notation,

is an orthogonal set (which you can check) of nonzero vectors and
\left \langle T\right \rangle = \left \langle S\right \rangle (all
by Theorem GSP). Of course, as a by-product of orthogonality, the set
T is also linearly independent
(Theorem OSLI). ⊠

One final definition related to orthogonal vectors.

Definition ONS OrthoNormal Set Suppose S = \left \{{u}_{1},{u}_{2},{u}_{3},…,{u}_{n}\right \} is an orthogonal
set of vectors such that \left \Vert {u}_{i}\right \Vert = 1
for all 1 ≤ i ≤ n. Then
S is an orthonormal
set of vectors. △

Once you have an orthogonal set, it is easy to convert it to an orthonormal set
— multiply each vector by the reciprocal of its norm, and the resulting vector will
have norm 1. This scaling of each vector will not affect the orthogonality
properties (apply Theorem IPSM).
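A sketch of that scaling step (NumPy again, with a made-up orthogonal set): divide each vector by its norm and the results have norm 1 while remaining orthogonal.

    import numpy as np

    def ip(u, v):
        return np.sum(u * np.conjugate(v))

    T = [np.array([1.0, 1, 0]), np.array([1.0, -1, 0]), np.array([0.0, 0, 1])]  # an orthogonal set
    W = [w / np.sqrt(ip(w, w).real) for w in T]   # scale each vector by the reciprocal of its norm

    print([np.round(np.sqrt(ip(w, w).real), 10) for w in W])  # every norm is now 1
    print(np.round(ip(W[0], W[1]), 10))                       # still orthogonal: 0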

to an orthogonal set via the Gram-Schmidt Procedure (Theorem GSP) and then
scale the vectors to norm 1 to create an orthonormal set. You should get the same set
you would if you scaled the orthogonal set of Example AOS to become an orthonormal
set. ⊠

It is crazy to do all but the simplest and smallest instances of the
Gram-Schmidt procedure by hand. Well, OK, maybe just once or twice to get a
good understanding of Theorem GSP. After that, let a machine do the
work for you. That’s what they are for. See: Computation GSP.MMA.
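For example, one common machine route (a sketch, not the book's Computation GSP.MMA) is a reduced QR factorization: given a matrix whose columns are linearly independent, NumPy's qr returns a matrix Q whose columns are orthonormal and span the same subspace as the columns of the input.

    import numpy as np

    A = np.column_stack([np.array([1, 1 + 1j, 1]),
                         np.array([-1j, 1, 1 + 1j]),
                         np.array([0, 1j, 1j])])
    Q, R = np.linalg.qr(A)
    print(np.round(Q.conj().T @ Q, 10))   # the identity matrix: the columns of Q are orthonormal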

T30 Suppose that the set S
in the hypothesis of Theorem GSP is not just linearly independent, but is also orthogonal.
Prove that the set T
created by the Gram-Schmidt procedure is equal to
S.
(Note that we are getting a stronger conclusion than
\left \langle T\right \rangle = \left \langle S\right \rangle — the conclusion
is that T = S.)
In other words, it is pointless to apply the Gram-Schmidt procedure to a set that
is already orthogonal. Contributed by Steve Canfield

Annotated Acronyms V: Vectors

Theorem VSPCV These are the fundamental rules for working with the addition and scalar
multiplication of column vectors. We will see something very similar in the next
chapter (Theorem VSPM) and then this will be generalized into what is arguably
our most important definition, Definition VS.

Theorem SLSLC Vector addition and scalar multiplication are the two fundamental operations on
vectors, and linear combinations roll them both into one. Theorem SLSLC
connects linear combinations with systems of equations. This one we will see often
enough that it is worth memorizing.

Theorem PSPHS This theorem is interesting in its own right, and sometimes the vagueness surrounding the
choice of z
can seem mysterious. But we list it here because we will see an important theorem
in Section ILT which will generalize this result (Theorem KPI).

Theorem LIVRN If you have a set of column vectors, this is the fastest computational approach to determine
if the set is linearly independent. Make the vectors the columns of a matrix, row-reduce,
compare r
and n.
That’s it — and you always get an answer. Put this one in your toolkit.
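A sketch of that computation (NumPy; matrix_rank is used here in place of row-reducing by hand, and the three vectors are a made-up example): make the vectors the columns of a matrix, take r to be its rank, and compare with n.

    import numpy as np

    vectors = [np.array([1.0, 2, 3]), np.array([0.0, 1, 4]), np.array([1.0, 3, 7])]
    A = np.column_stack(vectors)    # the vectors become the columns of A
    n = len(vectors)
    r = np.linalg.matrix_rank(A)    # the number of pivot columns after row-reducing
    print(r == n)                   # False: the third vector is the sum of the first two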

Theorem BNS We will have several theorems (all listed in these “Annotated Acronyms” sections)
whose conclusions will provide a linearly independent set of vectors whose span
equals some set of interest (the null space here). While the notation in
this theorem might appear a bit gruesome, in practice it can become very
routine to apply. So practice this one — we’ll be using it all through the
book.

Theorem BS As promised, another theorem that provides a linearly independent set of vectors
whose span equals some set of interest (a span now). You can use this one to clean
up any span.