A FIRST COURSE IN LINEAR ALGEBRA

An Open Text by Ken Kuttler

LYRYX SERVICE COURSE SOLUTIONS ADAPTATION

LYRYX LEARNING
MATHEMATICS - LINEAR ALGEBRA I
ANYTIME - ANYONE

Creative Commons Attribution License (CC BY)

This text, including the art and illustrations, is available under the Creative Commons Attribution license (CC BY), allowing anyone to reuse, revise, remix and redistribute the text.


LYRYX SERVICE COURSE SOLUTIONS

This is an open text supported by Lyryx Service Course Solutions (LSCS) products & services.

While there is no requirement that users of the book do anything more than download the pdf file and use the text for educational purposes, the text is aligned with the LSCS products and services offering the following benefits.

OPEN TEXT

The text can be downloaded in electronic format, printed, and can be distributed to students at no cost. In collaboration with the authors, Lyryx will also adapt the content and provide custom editions for specific courses that adopt Lyryx Service Course Solutions, and Lyryx will also provide the original TeX files if instructors wish to adapt certain sections themselves.

ONLINE ASSESSMENT

Lyryx has developed corresponding formative online assessment for homework and quiz purposes. These are genuine questions for the subject and adapted to the content. Student answers are carefully analyzed by the system and personalized feedback is immediately provided to help students improve on their work. Lyryx provides all the tools required to manage your online assessment including student grade reports and student performance statistics.

INSTRUCTOR SUPPLEMENTS

Among other things the book is accompanied by a full set of beamer slides and a partial solution manual.

SUPPORT

Lyryx provides timely support to both instructors and students. Starting from the course preparation time to beyond the end of the course, the Lyryx staff is available 7 days/week to provide assistance. This may include adapting the text, managing multiple sections of the course, providing course supplements, as well as timely assistance to students with registration, navigation, and daily organization.

Contact Lyryx! solutions@lyryx.com

A First Course in Linear Algebra

Ken Kuttler

Version 2014 Revision A

2012-2014: The content of the text has been modified and adapted by Stephanie Keyowski with the addition of new material and several images. All new content (text and images) is released under the same license as noted above.


Preface

Linear Algebra: A First Course presents an introduction to the fascinating subject of linear algebra. As the title suggests, this text is designed as a first course in linear algebra for students who have a reasonable understanding of basic algebra. Major topics of linear algebra are presented in detail, with proofs of important theorems provided. Connections to additional topics covered in advanced courses are introduced, in an effort to assist those students who are interested in continuing on in linear algebra.

Each chapter begins with a list of desired outcomes which a student should be able to achieve upon completing the chapter. Throughout the text, examples and diagrams are given to reinforce ideas and provide guidance on how to approach various problems. Suggested exercises are given at the end of each section, and students are encouraged to work through a selection of these exercises.

A brief review of complex numbers is given, which can serve as an introduction to anyone unfamiliar with the topic.

Linear algebra is a wonderful and interesting subject, which should not be limited to a challenge of correct arithmetic. The use of a computer algebra system can be a great help in long and difficult computations. Some of the standard computations of linear algebra are easily done by the computer, including finding the reduced row-echelon form. While the use of a computer system is encouraged, it is not meant to be done without the student having an understanding of the computations.


1. Systems of Equations

1.1 Systems of Equations, Geometry

Outcomes

A. Relate the types of solution sets of a system of two (three) variables to the intersections of lines in a plane (the intersection of planes in three space).

As you may remember, linear equations like $2x + 3y = 6$ can be graphed as straight lines in the coordinate plane. We say that this equation is in two variables, in this case $x$ and $y$. Suppose you have two such equations, each of which can be graphed as a straight line, and consider the resulting graph of two lines. What would it mean if there exists a point of intersection between the two lines? This point, which lies on both graphs, gives $x$ and $y$ values for which both equations are true. In other words, this point gives the ordered pair $(x, y)$ that satisfies both equations. If the point $(x, y)$ is a point of intersection, we say that $(x, y)$ is a solution to the two equations. In linear algebra, we often are concerned with finding the solution(s) to a system of equations, if such solutions exist. First, we consider graphical representations of solutions and later we will consider the algebraic methods for finding solutions.

When looking for the intersection of two lines in a graph, several situations may arise. The following picture demonstrates the possible situations when considering two equations (two lines in the graph) involving two variables.

[Figure: three graphs in the $xy$-plane, titled One Solution, No Solutions, and Infinitely Many Solutions]

In the first diagram, there is a unique point of intersection, which means that there is only one (unique) solution to the two equations. In the second, there are no points of intersection and no solution. When no solution exists, this means that the two lines are parallel and they never intersect. The third situation which can occur, as demonstrated in diagram three, is that the two lines are really the same line. For example, $x + y = 1$ and $2x + 2y = 2$ are equations which when graphed yield the same line. In this case there are infinitely many points which are solutions of these two equations, as every ordered pair which is on the graph of the line satisfies both equations. When considering linear systems of equations, there are always three types of solutions possible: exactly one (unique) solution, infinitely many solutions, or no solution.

Example 1.1: A Graphical Solution

Use a graph to find the solution to the following system of equations
\[
\begin{array}{c}
x + y = 3 \\
y - x = 5
\end{array}
\]

Solution. Through graphing the above equations and identifying the point of intersection, we can find the solution(s). Remember that we must have either one solution, infinitely many, or no solutions at all. The following graph shows the two equations, as well as the intersection. Remember, the point of intersection represents the solution of the two equations, or the $(x, y)$ which satisfies both equations. In this case, there is one point of intersection at $(-1, 4)$, which means we have one unique solution, $x = -1$, $y = 4$.

[Figure: graph of the two lines in the $xy$-plane, with the point of intersection labeled $(x, y) = (-1, 4)$]
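The three situations for two lines can also be checked with a little arithmetic. The following is a minimal Python sketch, not a method from this text: the function name `intersect_lines` and the encoding of a line $ax + by = c$ as three integers are assumptions made for illustration, and each equation is assumed to describe a genuine line ($a$ and $b$ not both zero).

```python
from fractions import Fraction

def intersect_lines(a1, b1, c1, a2, b2, c2):
    """Classify the system a1*x + b1*y = c1, a2*x + b2*y = c2 (integer inputs).

    Returns ("unique", (x, y)), ("none", None) or ("infinite", None).
    """
    det = a1 * b2 - a2 * b1
    if det != 0:
        # The lines have different directions, so they cross at exactly one point.
        x = Fraction(c1 * b2 - c2 * b1, det)
        y = Fraction(a1 * c2 - a2 * c1, det)
        return "unique", (x, y)
    # det == 0: the lines are parallel or identical.
    if a1 * c2 - a2 * c1 == 0 and b1 * c2 - b2 * c1 == 0:
        return "infinite", None   # same line
    return "none", None           # distinct parallel lines

# Example 1.1: x + y = 3 and y - x = 5 (the second written as -x + y = 5).
kind, point = intersect_lines(1, 1, 3, -1, 1, 5)
print(kind, point)
```

Exact fractions are used so that a point of intersection with non-integer coordinates is not blurred by rounding.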

In the above example, we investigated the intersection point of two equations in two variables, $x$ and $y$. Now we will consider the graphical solutions of three equations in two variables.

Consider a system of three equations in two variables. Again, these equations can be graphed as straight lines in the plane, so that the resulting graph contains three straight lines. Recall the three possible types of solutions: no solution, one solution, and infinitely many solutions. There are now more complex ways of achieving these situations, due to the presence of the third line. For example, you can imagine the case of three intersecting lines having no common point of intersection. Perhaps you can also imagine three intersecting lines which do intersect at a single point. These two situations are illustrated below.

[Figure: two graphs of three lines in the $xy$-plane, titled No Solution and One Solution]

Consider the first picture above. While all three lines intersect with one another, there is no common point of intersection where all three lines meet at one point. Hence, there is no solution to the three equations. Remember, a solution is a point $(x, y)$ which satisfies all three equations. In the case of the second picture, the lines intersect at a common point. This means that there is one solution to the three equations whose graphs are the given lines. You should take a moment now to draw the graph of a system which results in three parallel lines. Next, try the graph of three identical lines. Which type of solution is represented in each of these graphs?

We have now considered the graphical solutions of systems of two equations in two variables, as well as three equations in two variables. However, there is no reason to limit our investigation to equations in two variables. We will now consider equations in three variables.

You may recall that equations in three variables, such as $2x + 4y - 5z = 8$, form a plane. Above, we were looking for intersections of lines in order to identify any possible solutions. When graphically solving systems of equations in three variables, we look for intersections of planes. These points of intersection give the $(x, y, z)$ that satisfy all the equations in the system. What types of solutions are possible when working with three variables? Consider the following picture involving two planes, which are given by two equations in three variables.

Notice how these two planes intersect in a line. This means that the points $(x, y, z)$ on this line satisfy both equations in the system. Since the line contains infinitely many points, this system has infinitely many solutions.

It could also happen that the two planes fail to intersect. However, is it possible to have two planes intersect at a single point? Take a moment to attempt drawing this situation, and convince yourself that it is not possible! This means that when we have only two equations in three variables, there is no way to have a unique solution! Hence, the types of solutions possible for two equations in three variables are no solution or infinitely many solutions.

Now imagine adding a third plane. In other words, consider three equations in three variables. What types of solutions are now possible? Consider the following diagram.

[Figure: three planes with no common point, the third plane labeled New Plane]

In this diagram, there is no point which lies in all three planes. There is no intersection between all planes so there is no solution. The picture illustrates the situation in which the line of intersection of the new plane with one of the original planes forms a line parallel to the line of intersection of the first two planes. However, in three dimensions, it is possible for two lines to fail to intersect even though they are not parallel. Such lines are called skew lines.

Recall that when working with two equations in three variables, it was not possible to have a unique solution. Is it possible when considering three equations in three variables? In fact, it is possible, and we demonstrate this situation in the following picture.

[Figure: three planes meeting at a single point, the third plane labeled New Plane]

In this case, the three planes have a single point of intersection. Can you think of other types of solutions possible? Another is that the three planes could intersect in a line, resulting in infinitely many solutions, as in the following diagram.

We have now seen how three equations in three variables can have no solution, a unique solution, or intersect in a line resulting in infinitely many solutions. It is also possible that the three equations graph the same plane, which also leads to infinitely many solutions.

You can see that when working with equations in three variables, there are many more ways to achieve the different types of solutions than when working with two variables. It may prove enlightening to spend time imagining (and drawing) many possible scenarios, and you should take some time to try a few.

You should also take some time to imagine (and draw) graphs of systems in more than three variables. Equations like $x + y - 2z + 4w = 8$ with more than three variables are often called hyper-planes. You may soon realize that it is tricky to draw the graphs of hyper-planes! Through the tools of linear algebra, we can algebraically examine these types of systems which are difficult to graph. In the following section, we will consider these algebraic tools.

1.1.1. Exercises

1. Graphically, find the point $(x_1, y_1)$ which lies on both lines, $x + 3y = 1$ and $4x - y = 3$. That is, graph each line and see where they intersect.

2. Graphically, find the point of intersection of the two lines $3x + y = 3$ and $x + 2y = 1$. That is, graph each line and see where they intersect.

3. You have a system of $k$ equations in two variables, $k \geq 2$. Explain the geometric significance of
   (a) No solution.
   (b) A unique solution.
   (c) An infinite number of solutions.


1.2 Systems Of Equations, Algebraic Procedures

Outcomes

A. Use elementary operations to find the solution to a linear system of equations.

B. Find the row-echelon form and reduced row-echelon form of a matrix.

C. Determine whether a system of linear equations has no solution, a unique solution or an infinite number of solutions from its row-echelon form.

D. Solve a system of equations using Gaussian Elimination and Gauss-Jordan Elimination.

E. Model a physical system with linear equations and then solve.

We have taken an in depth look at graphical representations of systems of equations, as well as how to find possible solutions graphically. Our attention now turns to working with systems algebraically.

Definition 1.2: System of Linear Equations

A system of linear equations is a list of equations,
\[
\begin{array}{c}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = b_1 \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n = b_2 \\
\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = b_m
\end{array}
\]
where $a_{ij}$ and $b_i$ are real numbers. The above is a system of $m$ equations in the $n$ variables, $x_1, x_2, \cdots, x_n$. Written more simply in terms of summation notation, the above can be written in the form
\[
\sum_{j=1}^{n} a_{ij} x_j = b_i, \quad i = 1, 2, 3, \cdots, m
\]

The relative size of $m$ and $n$ is not important here. Notice that we have allowed $a_{ij}$ and $b_i$ to be any real number. We can also call these numbers scalars. We will use this term throughout the text, so keep in mind that the term scalar just means that we are working with real numbers.
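The summation form of the definition translates directly into a check of whether a given list of values satisfies every equation. The sketch below is only an illustration: the name `is_solution` and the representation of the system as a list of coefficient rows `A` and a list of constants `b` are assumptions, not notation used in this text.

```python
def is_solution(A, b, x):
    """Return True if sum_j A[i][j] * x[j] == b[i] holds for every equation i."""
    return all(
        sum(a_ij * x_j for a_ij, x_j in zip(row, x)) == b_i
        for row, b_i in zip(A, b)
    )

# The system x + y = 1, 2x + 2y = 2 from Section 1.1 (both equations graph the same line).
A = [[1, 1],
     [2, 2]]
b = [1, 2]

print(is_solution(A, b, (0, 1)))   # a point on the line
print(is_solution(A, b, (1, 1)))   # a point off the line
```

The same function works unchanged for any number of equations $m$ and variables $n$, which reflects the remark that the relative size of $m$ and $n$ is not important.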


Now, suppose we have a system where $b_i = 0$ for all $i$. In other words every equation equals 0. This is a special type of system.

Definition 1.3: Homogeneous System of Equations

A system of equations is called homogeneous if each equation in the system is equal to 0. A homogeneous system has the form
\[
\begin{array}{c}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n = 0 \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n = 0 \\
\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n = 0
\end{array}
\]
where $a_{ij}$ are scalars and $x_i$ are variables.

Recall from the previous section that our goal when working with systems of linear equations was to find the point of intersection of the equations when graphed. In other words, we looked for the solutions to the system. We now wish to find these solutions algebraically. We want to find values for $x_1, \cdots, x_n$ which solve all of the equations. If such values exist, we call $(x_1, \cdots, x_n)$ a solution, and the collection of all solutions is the solution set.

Recall the above discussions about the types of solutions possible. We will see that systems of linear equations will have one unique solution, infinitely many solutions, or no solution. Consider the following definition.

Definition 1.4: Consistent and Inconsistent Systems

A system of linear equations is called consistent if there exists at least one solution. It is called inconsistent if there is no solution.

If you think of each equation as a condition which must be satisfied by the variables, consistent would mean there is some choice of variables which can satisfy all the conditions. Inconsistent would mean there is no choice of the variables which can satisfy all of the conditions.

The following sections provide methods for determining if a system is consistent or inconsistent, and finding solutions if they exist.

1.2.1. Elementary Operations

We begin this section with an example. Recall from Example 1.1 that the solution to the given system was $(x, y) = (-1, 4)$.


Example 1.5: Verifying an Ordered Pair is a Solution

Algebraically verify that $(x, y) = (-1, 4)$ is a solution to the following system of equations.
\[
\begin{array}{c}
x + y = 3 \\
y - x = 5
\end{array}
\]

Solution. By graphing these two equations and identifying the point of intersection, we previously found that $(x, y) = (-1, 4)$ is the unique solution.

We can verify algebraically by substituting these values into the original equations, and ensuring that the equations hold. First, we substitute the values into the first equation and check that it equals 3.
\[
x + y = (-1) + (4) = 3
\]
This equals 3 as needed, so we see that $(-1, 4)$ is a solution to the first equation. Substituting the values into the second equation yields
\[
y - x = (4) - (-1) = 4 + 1 = 5
\]
which is true. For $(x, y) = (-1, 4)$ each equation is true and therefore, this is a solution to the system.

Now, the interesting question is this: If you were not given these numbers to verify, how could you algebraically determine the solution? Linear algebra gives us the tools needed to answer this question. The following basic operations are important tools that we will utilize.

Definition 1.6: Elementary Operations

Elementary operations are those operations consisting of the following.

1. Interchange the order in which the equations are listed.

2. Multiply any equation by a nonzero number.

3. Replace any equation with itself added to a multiple of another equation.

It is important to note that none of these operations will change the set of solutions of the system of equations. In fact, elementary operations are the key tool we use in linear algebra to find solutions to systems of equations.

Consider the following example.


Example 1.7: Effects of an Elementary Operation

Show that the system
\[
\begin{array}{c}
x + y = 7 \\
2x - y = 8
\end{array}
\]
has the same solution as the system
\[
\begin{array}{c}
x + y = 7 \\
-3y = -6
\end{array}
\]

Solution. Notice that the second system has been obtained by taking the second equation of the first system and adding $-2$ times the first equation, as follows:
\[
2x - y + (-2)(x + y) = 8 + (-2)(7)
\]
By simplifying, we obtain
\[
-3y = -6
\]
which is the second equation in the second system. Now, from here we can solve for $y$ and see that $y = 2$. Next, we substitute this value into the first equation as follows
\[
x + y = x + 2 = 7
\]
Hence $x = 5$ and so $(x, y) = (5, 2)$ is a solution to the second system. We want to check if $(5, 2)$ is also a solution to the first system. We check this by substituting $(x, y) = (5, 2)$ into the system and ensuring the equations are true.
\[
\begin{array}{c}
x + y = (5) + (2) = 7 \\
2x - y = 2(5) - (2) = 8
\end{array}
\]
Hence, $(5, 2)$ is also a solution to the first system.

This example illustrates how an elementary operation applied to a system of two equations in two variables does not affect the solution set. However, a linear system may involve many equations and many variables and there is no reason to limit our study to small systems. For any size of system in any number of variables, the solution set is still the collection of solutions to the equations. In every case, the above operations of Definition 1.6 do not change the set of solutions to the system of linear equations.

In the following theorem, we use the notation $E_i$ to represent an equation, while $b_i$ denotes a constant.


Theorem 1.8: Elementary Operations and Solutions

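Suppose you have a system of two linear equations
\[
\begin{array}{c}
E_1 = b_1 \\
E_2 = b_2
\end{array}
\tag{1.1}
\]
Then the following systems have the same solution set as 1.1:

1.
\[
\begin{array}{c}
E_2 = b_2 \\
E_1 = b_1
\end{array}
\tag{1.2}
\]

2.
\[
\begin{array}{c}
E_1 = b_1 \\
kE_2 = kb_2
\end{array}
\tag{1.3}
\]
for any scalar $k$, provided $k \neq 0$.

3.
\[
\begin{array}{c}
E_1 = b_1 \\
E_2 + kE_1 = b_2 + kb_1
\end{array}
\tag{1.4}
\]
for any scalar $k$ (including $k = 0$).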

Recall the elementary operations that we used to modify the system in the solution to the example. First, we added $(-2)$ times the first equation to the second equation. In terms of Theorem 1.8, this action is given by
\[
E_2 + (-2) E_1 = b_2 + (-2) b_1
\]
or
\[
2x - y + (-2)(x + y) = 8 + (-2) 7
\]
This gave us the second system in Example 1.7, given by
\[
\begin{array}{c}
E_1 = b_1 \\
E_2 + (-2) E_1 = b_2 + (-2) b_1
\end{array}
\]
From this point, we were able to find the solution to the system. Theorem 1.8 tells us that the solution we found is in fact a solution to the original system.

We will now prove Theorem 1.8.

Proof.


1. The proof that the systems 1.1 and 1.2 have the same solution set is as follows. Suppose that $(x_1, \cdots, x_n)$ is a solution to $E_1 = b_1$, $E_2 = b_2$. We want to show that this is a solution to the system in 1.2 above. This is clear, because the system in 1.2 is the original system, but listed in a different order. Changing the order does not affect the solution set, so $(x_1, \cdots, x_n)$ is a solution to 1.2.

2. Next we want to prove that the systems 1.1 and 1.3 have the same solution set. That is, $E_1 = b_1$, $E_2 = b_2$ has the same solution set as the system $E_1 = b_1$, $kE_2 = kb_2$ provided $k \neq 0$. Let $(x_1, \cdots, x_n)$ be a solution of $E_1 = b_1$, $E_2 = b_2$. We want to show that it is a solution to $E_1 = b_1$, $kE_2 = kb_2$. Notice that the only difference between these two systems is that the second involves multiplying the equation $E_2 = b_2$ by the scalar $k$. Recall that when you multiply both sides of an equation by the same number, the sides are still equal to each other. Hence if $(x_1, \cdots, x_n)$ is a solution to $E_2 = b_2$, then it will also be a solution to $kE_2 = kb_2$. Hence, $(x_1, \cdots, x_n)$ is also a solution to 1.3.

Similarly, let $(x_1, \cdots, x_n)$ be a solution of $E_1 = b_1$, $kE_2 = kb_2$. We want to show that it is a solution to $E_1 = b_1$, $E_2 = b_2$. To do so, multiply the equation $kE_2 = kb_2$ by the scalar $1/k$, which is possible only because we have required that $k \neq 0$. Just as above, this action preserves equality and we obtain the equation $E_2 = b_2$. Hence $(x_1, \cdots, x_n)$ is also a solution to $E_1 = b_1$, $E_2 = b_2$.

3. Finally, we will prove that the systems 1.1 and 1.4 have the same solution set. We will show that any solution of $E_1 = b_1$, $E_2 = b_2$ is also a solution of 1.4. Then, we will show that any solution of 1.4 is also a solution of $E_1 = b_1$, $E_2 = b_2$. Let $(x_1, \cdots, x_n)$ be a solution to $E_1 = b_1$, $E_2 = b_2$. Then in particular it solves $E_1 = b_1$. Hence, it solves the first equation in 1.4. Similarly, it also solves $E_2 = b_2$. By our proof of 1.3, it also solves $kE_1 = kb_1$. Notice that if we add $E_2$ and $kE_1$, this is equal to $b_2 + kb_1$. Therefore, if $(x_1, \cdots, x_n)$ solves $E_1 = b_1$, $E_2 = b_2$ it must also solve $E_2 + kE_1 = b_2 + kb_1$.

Now suppose $(x_1, \cdots, x_n)$ solves the system $E_1 = b_1$, $E_2 + kE_1 = b_2 + kb_1$. Then in particular it is a solution of $E_1 = b_1$. Again by our proof of 1.3, it is also a solution to $kE_1 = kb_1$. Now if we subtract these equal quantities from both sides of $E_2 + kE_1 = b_2 + kb_1$ we obtain $E_2 = b_2$, which shows that the solution also satisfies $E_1 = b_1$, $E_2 = b_2$.
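As a numerical sanity check of Theorem 1.8 (not a substitute for the proof), the three elementary operations can be applied to the first system of Example 1.7 and the solution $(5, 2)$ tested against each result. This is a sketch under stated assumptions: each equation is stored as a hypothetical row `[a1, ..., an, b]`, rows are numbered from 0, and the function names are illustrative.

```python
def swap(eqs, i, j):
    """Operation 1: interchange equations i and j."""
    out = [row[:] for row in eqs]
    out[i], out[j] = out[j], out[i]
    return out

def scale(eqs, i, k):
    """Operation 2: multiply equation i by a nonzero number k."""
    if k == 0:
        raise ValueError("k must be nonzero")
    out = [row[:] for row in eqs]
    out[i] = [k * v for v in out[i]]
    return out

def add_multiple(eqs, i, j, k):
    """Operation 3: replace equation i with itself plus k times equation j."""
    out = [row[:] for row in eqs]
    out[i] = [vi + k * vj for vi, vj in zip(out[i], out[j])]
    return out

def satisfies(eqs, point):
    """Check a1*x1 + ... + an*xn == b for every row [a1, ..., an, b]."""
    return all(
        sum(a * x for a, x in zip(row[:-1], point)) == row[-1]
        for row in eqs
    )

# First system of Example 1.7: x + y = 7, 2x - y = 8.
system = [[1, 1, 7], [2, -1, 8]]
# Add (-2) times the first equation to the second, giving -3y = -6.
second = add_multiple(system, 1, 0, -2)

print(second)
print(satisfies(system, (5, 2)), satisfies(second, (5, 2)))
```

Each function returns a new list rather than modifying its input, so the original system stays available for comparison.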

Stated simply, the above theorem shows that the elementary operations do not change the solution set of a system of equations.

We will now look at an example of a system of three equations and three variables. Similarly to the previous examples, the goal is to find values for $x, y, z$ such that each of the given equations is satisfied when these values are substituted in.

Example 1.9: Solving a System of Equations with Elementary Operations

Find the solution to the system
\[
\begin{array}{c}
x + 3y + 6z = 25 \\
2x + 7y + 14z = 58 \\
2y + 5z = 19
\end{array}
\tag{1.5}
\]

Solution. We can relate this system to Theorem 1.8 above. In this case, we have
\[
\begin{array}{ll}
E_1 = x + 3y + 6z, & b_1 = 25 \\
E_2 = 2x + 7y + 14z, & b_2 = 58 \\
E_3 = 2y + 5z, & b_3 = 19
\end{array}
\]
Theorem 1.8 claims that if we do elementary operations on this system, we will not change the solution set. Therefore, we can solve this system using the elementary operations given in Definition 1.6. First, replace the second equation by $(-2)$ times the first equation added to the second. This yields the system
\[
\begin{array}{c}
x + 3y + 6z = 25 \\
y + 2z = 8 \\
2y + 5z = 19
\end{array}
\tag{1.6}
\]
Now, replace the third equation with $(-2)$ times the second added to the third. This yields the system
\[
\begin{array}{c}
x + 3y + 6z = 25 \\
y + 2z = 8 \\
z = 3
\end{array}
\tag{1.7}
\]
At this point, we can easily find the solution. Simply take $z = 3$ and substitute this back into the previous equation to solve for $y$, and similarly to solve for $x$.
\[
\begin{array}{c}
x + 3y + 6(3) = x + 3y + 18 = 25 \\
y + 2(3) = y + 6 = 8 \\
z = 3
\end{array}
\]
The second equation is now
\[
y + 6 = 8
\]
You can see from this equation that $y = 2$. Therefore, we can substitute this value into the first equation as follows:
\[
x + 3(2) + 18 = 25
\]
By simplifying this equation, we find that $x = 1$. Hence, the solution to this system is $(x, y, z) = (1, 2, 3)$. This process is called back substitution.


Alternatively, in 1.7 you could have continued as follows. Add $(-2)$ times the third equation to the second and then add $(-6)$ times the third to the first. This yields
\[
\begin{array}{c}
x + 3y = 7 \\
y = 2 \\
z = 3
\end{array}
\]
Now add $(-3)$ times the second to the first. This yields
\[
\begin{array}{c}
x = 1 \\
y = 2 \\
z = 3
\end{array}
\]
a system which has the same solution set as the original system. This avoided back substitution and led to the same solution set. It is your decision which you prefer to use, as both methods lead to the correct solution, $(x, y, z) = (1, 2, 3)$.
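Back substitution on a triangular system such as 1.7 is mechanical enough to sketch in code. The following Python is an illustration only: the name `back_substitute` is an assumption, and the function expects a square system whose coefficient rows are upper triangular with nonzero diagonal entries, which is the situation reached in 1.7.

```python
from fractions import Fraction

def back_substitute(U, c):
    """Solve U x = c for upper triangular U with nonzero diagonal entries."""
    n = len(U)
    x = [Fraction(0)] * n
    # Work from the last equation up, reusing the values already found.
    for i in range(n - 1, -1, -1):
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(c[i] - known) / U[i][i]
    return x

# System 1.7: x + 3y + 6z = 25, y + 2z = 8, z = 3.
U = [[1, 3, 6],
     [0, 1, 2],
     [0, 0, 1]]
c = [25, 8, 3]
solution = back_substitute(U, c)
print(solution)
```

The loop mirrors the hand computation: $z$ is read off first, then $y$, then $x$.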

1.2.2. Gaussian Elimination

The work we did in the previous section will always find the solution to the system. In this section, we will explore a less cumbersome way to find the solutions. First, we will represent a linear system with an augmented matrix. A matrix is simply a rectangular array of numbers. The size or dimension of a matrix is defined as $m \times n$ where $m$ is the number of rows and $n$ is the number of columns. In order to construct an augmented matrix from a linear system, we create a coefficient matrix from the coefficients of the variables in the system, as well as a constant matrix from the constants. The coefficients from one equation of the system create one row of the augmented matrix.

For example, consider the linear system in Example 1.9
\[
\begin{array}{c}
x + 3y + 6z = 25 \\
2x + 7y + 14z = 58 \\
2y + 5z = 19
\end{array}
\]
This system can be written as an augmented matrix, as follows
\[
\left[ \begin{array}{rrr|r}
1 & 3 & 6 & 25 \\
2 & 7 & 14 & 58 \\
0 & 2 & 5 & 19
\end{array} \right]
\]
Notice that it has exactly the same information as the original system. Here it is understood that the first column contains the coefficients from $x$ in each equation, in order,
$\left[ \begin{array}{r} 1 \\ 2 \\ 0 \end{array} \right]$.
Similarly, we create a column from the coefficients on $y$ in each equation,
$\left[ \begin{array}{r} 3 \\ 7 \\ 2 \end{array} \right]$,
and a column from the coefficients on $z$ in each equation,
$\left[ \begin{array}{r} 6 \\ 14 \\ 5 \end{array} \right]$.
For a system of more than three variables, we would continue in this way constructing a column for each variable. Similarly, for a system of less than three variables, we simply construct a column for each variable.

Finally, we construct a column from the constants of the equations,
$\left[ \begin{array}{r} 25 \\ 58 \\ 19 \end{array} \right]$.

The rows of the augmented matrix correspond to the equations in the system. For example, the top row of the augmented matrix, $\left[ \begin{array}{rrr|r} 1 & 3 & 6 & 25 \end{array} \right]$, corresponds to the equation $x + 3y + 6z = 25$.
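The construction just described can be sketched in a few lines of Python. This is an illustrative sketch, not notation from this text: a matrix is represented as a list of rows, and the names `augmented_matrix` and `column` are assumptions.

```python
def augmented_matrix(coefficients, constants):
    """Append each constant b_i to its row of coefficients."""
    if len(coefficients) != len(constants):
        raise ValueError("need exactly one constant per equation")
    return [list(row) + [b] for row, b in zip(coefficients, constants)]

def column(matrix, j):
    """Return column j (counting from 0) as a list."""
    return [row[j] for row in matrix]

# Linear system of Example 1.9.
coefficients = [[1, 3, 6],
                [2, 7, 14],
                [0, 2, 5]]
constants = [25, 58, 19]

aug = augmented_matrix(coefficients, constants)
print(aug)
print(column(aug, 0))   # coefficients of x, in order
print(column(aug, 3))   # constants of the equations
```

A missing variable, such as $x$ in the third equation, must be entered with coefficient 0 so that every column lines up with a single variable.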

Consider the following definition.

Definition 1.10: Augmented Matrix of a Linear System

For a linear system of the form
\[
\begin{array}{c}
a_{11} x_1 + \cdots + a_{1n} x_n = b_1 \\
\vdots \\
a_{m1} x_1 + \cdots + a_{mn} x_n = b_m
\end{array}
\]
where the $x_i$ are variables and the $a_{ij}$ and $b_i$ are constants, the augmented matrix of this system is given by
\[
\left[ \begin{array}{ccc|c}
a_{11} & \cdots & a_{1n} & b_1 \\
\vdots & & \vdots & \vdots \\
a_{m1} & \cdots & a_{mn} & b_m
\end{array} \right]
\]

Now, consider elementary operations in the context of the augmented matrix. The elementary operations in Definition 1.6 can be used on the rows just as we used them on equations previously. Changes to a system of equations as a result of an elementary operation are equivalent to changes in the augmented matrix resulting from the corresponding row operation. Note that Theorem 1.8 implies that any elementary row operations used on an augmented matrix will not change the solution to the corresponding system of equations. We now formally define elementary row operations. These are the key tool we will use to find solutions to systems of equations.


Definition 1.11: Elementary Row Operations

The elementary row operations (also known as row operations) consist of the following

1. Switch two rows.

2. Multiply a row by a nonzero number.

3. Replace a row by any multiple of another row added to it.

Recall how we solved Example 1.9. We can do the exact same steps as above, except now in the context of an augmented matrix and using row operations. The augmented matrix of this system is
\[
\left[ \begin{array}{rrr|r}
1 & 3 & 6 & 25 \\
2 & 7 & 14 & 58 \\
0 & 2 & 5 & 19
\end{array} \right]
\]
Thus the first step in solving the system given by 1.5 would be to take $(-2)$ times the first row of the augmented matrix and add it to the second row,
\[
\left[ \begin{array}{rrr|r}
1 & 3 & 6 & 25 \\
0 & 1 & 2 & 8 \\
0 & 2 & 5 & 19
\end{array} \right]
\]
Note how this corresponds to 1.6. Next take $(-2)$ times the second row and add to the third,
\[
\left[ \begin{array}{rrr|r}
1 & 3 & 6 & 25 \\
0 & 1 & 2 & 8 \\
0 & 0 & 1 & 3
\end{array} \right]
\]
This augmented matrix corresponds to the system
\[
\begin{array}{c}
x + 3y + 6z = 25 \\
y + 2z = 8 \\
z = 3
\end{array}
\]
which is the same as 1.7. By back substitution you obtain the solution $x = 1$, $y = 2$, and $z = 3$.

Through a systematic procedure of row operations, we can simplify an augmented matrix and carry it to row-echelon form or reduced row-echelon form, which we define next. These forms are used to find the solutions of the system of equations corresponding to the augmented matrix.

In the following definitions, the term leading entry refers to the first nonzero entry of a row when scanning the row from left to right.
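Before moving on to the definitions, here is a minimal Python sketch that replays the two row operations used above on the augmented matrix of Example 1.9 and then reads off a leading entry. The function names `add_row_multiple` and `leading_entry` are illustrative choices, and rows and columns are counted from 0 in the code.

```python
def add_row_multiple(M, target, source, k):
    """Return a copy of M with row `target` replaced by itself plus k times row `source`."""
    out = [row[:] for row in M]
    out[target] = [t + k * s for t, s in zip(out[target], out[source])]
    return out

def leading_entry(row):
    """Return (column index, value) of the first nonzero entry, or None for a row of zeros."""
    for j, v in enumerate(row):
        if v != 0:
            return j, v
    return None

# Augmented matrix of Example 1.9.
M0 = [[1, 3, 6, 25],
      [2, 7, 14, 58],
      [0, 2, 5, 19]]
M1 = add_row_multiple(M0, 1, 0, -2)   # (-2) times the first row added to the second
M2 = add_row_multiple(M1, 2, 1, -2)   # (-2) times the second row added to the third

print(M2)
print(leading_entry(M2[2]))
```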


Definition 1.12: Row-Echelon Form

An augmented matrix is in row-echelon form if

1. All nonzero rows are above any rows of zeros.

2. Each leading entry of a row is in a column to the right of the leading entries of any row above it.

3. Each leading entry of a row is equal to 1.

We also consider another reduced form of the augmented matrix which has one further condition.

Definition 1.13: Reduced Row-Echelon Form

An augmented matrix is in reduced row-echelon form if

1. All nonzero rows are above any rows of zeros.

2. Each leading entry of a row is in a column to the right of the leading entries of any rows above it.

3. Each leading entry of a row is equal to 1.

4. All entries in a column above and below a leading entry are zero.

Notice that the first three conditions on a reduced row-echelon form matrix are the same as those for row-echelon form.

Hence, every reduced row-echelon form matrix is also in row-echelon form. The converse is not necessarily true; we cannot assume that every matrix in row-echelon form is also in reduced row-echelon form. However, it often happens that the row-echelon form is sufficient to provide information about the solution of a system.

The following examples describe matrices in these various forms. As an exercise, take the time to carefully verify that they are in the specified form.

Example 1.14: Not in Row-Echelon Form

The following augmented matrices are not in row-echelon form (and therefore also not in reduced row-echelon form).
\[
\left[ \begin{array}{rrr|r}
0 & 0 & 0 & 0 \\
1 & 2 & 3 & 3 \\
0 & 1 & 0 & 2 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0
\end{array} \right],
\left[ \begin{array}{rr|r}
1 & 2 & 3 \\
2 & 4 & -6 \\
4 & 0 & 7
\end{array} \right],
\left[ \begin{array}{rrr|r}
0 & 2 & 3 & 3 \\
1 & 5 & 0 & 2 \\
7 & 5 & 0 & 1 \\
0 & 0 & 1 & 0
\end{array} \right]
\]

Example 1.15: Matrices in Row-Echelon Form

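The following augmented matrices are in row-echelon form, but not in reduced row-echelon form.
\[
\left[ \begin{array}{rrrrr|r}
1 & 0 & 6 & 5 & 8 & 2 \\
0 & 0 & 1 & 2 & 7 & 3 \\
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0
\end{array} \right],
\left[ \begin{array}{rrr|r}
1 & 3 & 5 & 4 \\
0 & 1 & 0 & 7 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0
\end{array} \right],
\left[ \begin{array}{rrr|r}
1 & 0 & 6 & 0 \\
0 & 1 & 4 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0
\end{array} \right]
\]

Notice that we could apply further row operations to these matrices to carry them to reduced row-echelon form. Take the time to try that on your own. Consider the following matrices, which are in reduced row-echelon form.

Example 1.16: Matrices in Reduced Row-Echelon Form

The following augmented matrices are in reduced row-echelon form.
\[
\left[ \begin{array}{rrrrr|r}
1 & 0 & 0 & 5 & 0 & 0 \\
0 & 0 & 1 & 2 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0
\end{array} \right],
\left[ \begin{array}{rrr|r}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0
\end{array} \right],
\left[ \begin{array}{rrr|r}
1 & 0 & 0 & 4 \\
0 & 1 & 0 & 3 \\
0 & 0 & 1 & 2
\end{array} \right]
\]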
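The conditions of Definitions 1.12 and 1.13 can be turned into a mechanical check, which may help when verifying examples by hand. The sketch below is an assumption-laden illustration: matrices are lists of rows, the function names are hypothetical, and the test matrices are the three augmented matrices that arise from Example 1.9.

```python
def leading_columns(M):
    """Column index of the leading entry of each row, or None for a row of zeros."""
    return [next((j for j, v in enumerate(row) if v != 0), None) for row in M]

def is_row_echelon(M):
    previous = -1
    seen_zero_row = False
    for row, lead in zip(M, leading_columns(M)):
        if lead is None:
            seen_zero_row = True
            continue
        if seen_zero_row:       # condition 1: a nonzero row lies below a row of zeros
            return False
        if lead <= previous:    # condition 2: leading entries must move to the right
            return False
        if row[lead] != 1:      # condition 3: each leading entry equals 1
            return False
        previous = lead
    return True

def is_reduced_row_echelon(M):
    if not is_row_echelon(M):
        return False
    for i, lead in enumerate(leading_columns(M)):
        if lead is None:
            continue
        # condition 4: all other entries in the column of a leading entry are zero
        if any(M[r][lead] != 0 for r in range(len(M)) if r != i):
            return False
    return True

original = [[1, 3, 6, 25], [2, 7, 14, 58], [0, 2, 5, 19]]
echelon = [[1, 3, 6, 25], [0, 1, 2, 8], [0, 0, 1, 3]]
reduced = [[1, 0, 0, 1], [0, 1, 0, 2], [0, 0, 1, 3]]

for M in (original, echelon, reduced):
    print(is_row_echelon(M), is_reduced_row_echelon(M))
```

Note that the checks treat the last column like any other column, just as the definitions do for an augmented matrix.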

One way in which the row-echelon form of a matrix is useful is in identifying the pivot positions and pivot columns of the matrix.

Definition 1.17: Pivot Position and Pivot Column

A pivot position in a matrix is the location of a leading entry in the row-echelon form of a matrix. A pivot column is a column that contains a pivot position.

For example consider the following.

Example 1.18: Pivot Position

Let
\[
A = \left[ \begin{array}{rrr|r}
1 & 2 & 3 & 4 \\
3 & 2 & 1 & 6 \\
4 & 4 & 4 & 10
\end{array} \right]
\]
Where are the pivot positions and pivot columns of the augmented matrix $A$?

Solution. The row-echelon form of this matrix is
\[
\left[ \begin{array}{rrr|r}
1 & 2 & 3 & 4 \\
0 & 1 & 2 & \frac{3}{2} \\
0 & 0 & 0 & 0
\end{array} \right]
\]
This is all we need in this example, but note that this matrix is not in reduced row-echelon form.

In order to identify the pivot positions in the original matrix, we look for the leading entries in the row-echelon form of the matrix. Here, the entry in the first row and first column, as well as the entry in the second row and second column are the leading entries. Hence, these locations are the pivot positions. We identify the pivot positions in the original matrix, as in the following:
\[
\left[ \begin{array}{rrr|r}
\boxed{1} & 2 & 3 & 4 \\
3 & \boxed{2} & 1 & 6 \\
4 & 4 & 4 & 10
\end{array} \right]
\]
Thus the pivot columns in the matrix are the first two columns.
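The pivot positions found in Example 1.18 can be double-checked with a short forward elimination. This is a sketch, assuming a nonempty matrix given as a list of rows; the name `pivot_positions` is illustrative and positions are reported counting from 0, so `(0, 0)` means first row, first column.

```python
from fractions import Fraction

def pivot_positions(A):
    """Return the (row, column) pivot positions of A, counting from 0."""
    M = [[Fraction(v) for v in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivots = []
    r = 0
    for c in range(cols):
        if r == rows:
            break
        # Look for a nonzero entry in column c, at or below row r.
        p = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        # Create zeros below the pivot position.
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivots.append((r, c))
        r += 1
    return pivots

A = [[1, 2, 3, 4],
     [3, 2, 1, 6],
     [4, 4, 4, 10]]
positions = pivot_positions(A)
print(positions)
print(sorted({c for _, c in positions}))   # pivot columns
```

The leading entries are not scaled to 1 here, since scaling a row does not move its leading entry.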

The following is an algorithm for carrying a matrix to row-echelon form and reduced row-echelon form. You may wish to use this algorithm to carry the above matrix to row-echelon form or reduced row-echelon form yourself for practice.

Algorithm 1.19: Reduced Row-Echelon Form Algorithm

This algorithm provides a method for using row operations to take a matrix to its reduced row-echelon form. We begin with the matrix in its original form.

1. Starting from the left, find the first nonzero column. This is the first pivot column, and the position at the top of this column is the first pivot position. Switch rows if necessary to place a nonzero number in the first pivot position.

2. Use row operations to make the entries below the first pivot position (in the first pivot column) equal to zero.

3. Ignoring the row containing the first pivot position, repeat steps 1 and 2 with the remaining rows. Repeat the process until there are no more rows to modify.

4. Divide each nonzero row by the value of the leading entry, so that the leading entry becomes 1. The matrix will then be in row-echelon form.

The following step will carry the matrix from row-echelon form to reduced row-echelon form.

5. Moving from right to left, use row operations to create zeros in the entries of the pivot columns which are above the pivot positions. The result will be a matrix in reduced row-echelon form.

Most often we will apply this algorithm to an augmented matrix in order to find the solution to a system of linear equations. However, we can use this algorithm to compute the reduced row-echelon form of any matrix which could be useful in other applications.

Consider the following example of Algorithm 1.19.
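As a companion to the worked example, the five steps of Algorithm 1.19 can be sketched in Python. This is one possible rendering under stated assumptions, not a definitive implementation: the name `rref` is illustrative, the matrix is a nonempty list of rows, and exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction

def rref(A):
    """Carry A to reduced row-echelon form following the steps of Algorithm 1.19."""
    M = [[Fraction(v) for v in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivots = []
    r = 0
    # Steps 1-3: find each pivot column, switch rows if needed, clear entries below.
    for c in range(cols):
        if r == rows:
            break
        p = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        pivots.append((r, c))
        r += 1
    # Step 4: divide each nonzero row by its leading entry.
    for pr, pc in pivots:
        lead = M[pr][pc]
        M[pr] = [v / lead for v in M[pr]]
    # Step 5: moving from right to left, clear the entries above each pivot position.
    for pr, pc in reversed(pivots):
        for i in range(pr):
            factor = M[i][pc]
            M[i] = [a - factor * b for a, b in zip(M[i], M[pr])]
    return M

# Augmented matrix of Example 1.9.
R = rref([[1, 3, 6, 25],
          [2, 7, 14, 58],
          [0, 2, 5, 19]])
print(R)
```

For the augmented matrix of Example 1.9, the last column of the result holds the solution $(x, y, z) = (1, 2, 3)$ found earlier by elementary operations.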


Example 1.20: Finding Row-Echelon Form and Reduced Row-Echelon Form of a Matrix

Let

\[
A = \begin{bmatrix} 0 & -5 & -4 \\ 1 & 4 & 3 \\ 5 & 10 & 7 \end{bmatrix}
\]

Find the row-echelon form of A. Then complete the process until A is in reduced row-echelon form.

Solution. In working through this example, we will use the steps outlined in Algorithm 1.19.

1. The first pivot column is the first column of the matrix, as this is the first nonzero column from the left. Hence the first pivot position is the one in the first row and first column. Switch the first two rows to obtain a nonzero entry in the first pivot position, outlined in a box below.

\[
\begin{bmatrix} \boxed{1} & 4 & 3 \\ 0 & -5 & -4 \\ 5 & 10 & 7 \end{bmatrix}
\]

2. Step two involves creating zeros in the entries below the first pivot position. The first entry of the second row is already a zero. All we need to do is subtract 5 times the first row from the third row. The resulting matrix is

\[
\begin{bmatrix} 1 & 4 & 3 \\ 0 & -5 & -4 \\ 0 & -10 & -8 \end{bmatrix}
\]

3. Now ignore the top row. Apply steps 1 and 2 to the smaller matrix

\[
\begin{bmatrix} -5 & -4 \\ -10 & -8 \end{bmatrix}
\]

In this matrix, the first column is a pivot column, and −5 is in the first pivot position. Therefore, we need to create a zero below it. To do this, add −2 times the first row (of this matrix) to the second. The resulting matrix is

\[
\begin{bmatrix} -5 & -4 \\ 0 & 0 \end{bmatrix}
\]

Our original matrix now looks like

\[
\begin{bmatrix} 1 & 4 & 3 \\ 0 & -5 & -4 \\ 0 & 0 & 0 \end{bmatrix}
\]

We can see that there are no more rows to modify.


4. Now, we need to create leading 1s in each row. The first row already has a leading 1, so no work is needed here. Divide the second row by −5 to create a leading 1. The resulting matrix is

\[
\begin{bmatrix} 1 & 4 & 3 \\ 0 & 1 & \frac{4}{5} \\ 0 & 0 & 0 \end{bmatrix}
\]

This matrix is now in row-echelon form.

5. Now create zeros in the entries above pivot positions in each column, in order to carry this matrix all the way to reduced row-echelon form. Notice that there is no pivot position in the third column, so we do not need to create any zeros in this column! The column in which we need to create zeros is the second. To do so, subtract 4 times the second row from the first row. The resulting matrix is

\[
\begin{bmatrix} 1 & 0 & -\frac{1}{5} \\ 0 & 1 & \frac{4}{5} \\ 0 & 0 & 0 \end{bmatrix}
\]

This matrix is now in reduced row-echelon form.

The above algorithm gives you a simple way to obtain the row-echelon form and reduced row-echelon form of a matrix. The main idea is to do row operations in such a way as to end up with a matrix in row-echelon form or reduced row-echelon form. This process is important because the resulting matrix will allow you to describe the solutions to the corresponding linear system of equations in a meaningful way.

In the next example, we look at how to solve a system of equations using the corresponding augmented matrix.

Example 1.21: Finding the Solution to a System

Give the complete solution to the following system of equations

\[
\begin{array}{r}
2x + 4y - 3z = -1 \\
5x + 10y - 7z = -2 \\
3x + 6y + 5z = 9
\end{array}
\]

Solution. The augmented matrix for this system is

\[
\left[\begin{array}{rrr|r} 2 & 4 & -3 & -1 \\ 5 & 10 & -7 & -2 \\ 3 & 6 & 5 & 9 \end{array}\right]
\]

In order to find the solution to this system, we wish to carry the augmented matrix to reduced row-echelon form. We will do so using Algorithm 1.19. Notice that the first column is nonzero, so this is our first pivot column. The first entry in the first row, 2, is the first leading entry and it is in the first pivot position. We will use row operations to create zeros in the entries below the 2. First, replace the second row with −5 times the first row plus 2 times the second row. This yields

\[
\left[\begin{array}{rrr|r} 2 & 4 & -3 & -1 \\ 0 & 0 & 1 & 1 \\ 3 & 6 & 5 & 9 \end{array}\right]
\]

Now, replace the third row with −3 times the first row plus 2 times the third row. This yields

\[
\left[\begin{array}{rrr|r} 2 & 4 & -3 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 19 & 21 \end{array}\right]
\]

Now the entries in the first column below the pivot position are zeros. We now look for the second pivot column, which in this case is column three. Here, the 1 in the second row and third column is in the pivot position. We need to do just one row operation to create a zero below the 1.

Taking −19 times the second row and adding it to the third row yields

\[
\left[\begin{array}{rrr|r} 2 & 4 & -3 & -1 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 2 \end{array}\right]
\]

We could proceed with the algorithm to carry this matrix to row-echelon form or reduced row-echelon form. However, remember that we are looking for the solutions to the system of equations. Take another look at the third row of the matrix. Notice that it corresponds to the equation

\[
0x + 0y + 0z = 2
\]

There is no solution to this equation because for all x, y, z, the left side will equal 0 and $0 \neq 2$. This shows there is no solution to the given system of equations. In other words, this system is inconsistent.

The following is another example of how to find the solution to a system of equations by carrying the corresponding augmented matrix to reduced row-echelon form.

Example 1.22: An Infinite Set of Solutions

Give the complete solution to the system of equations

\[
\begin{array}{r}
3x - y - 5z = 9 \\
y - 10z = 0 \\
-2x + y = -6
\end{array}
\]

Solution. The augmented matrix of this system is

\[
\left[\begin{array}{rrr|r} 3 & -1 & -5 & 9 \\ 0 & 1 & -10 & 0 \\ -2 & 1 & 0 & -6 \end{array}\right]
\tag{1.8}
\]

In order to find the solution to this system, we will carry the augmented matrix to reduced row-echelon form, using Algorithm 1.19. The first column is the first pivot column. We want to use row operations to create zeros beneath the first entry in this column, which is in the first pivot position. Replace the third row with 2 times the first row added to 3 times the third row. This gives

\[
\left[\begin{array}{rrr|r} 3 & -1 & -5 & 9 \\ 0 & 1 & -10 & 0 \\ 0 & 1 & -10 & 0 \end{array}\right]
\]

Now, we have created zeros beneath the 3 in the first column, so we move on to the second pivot column (which is the second column) and repeat the procedure. Take −1 times the second row and add to the third row.

\[
\left[\begin{array}{rrr|r} 3 & -1 & -5 & 9 \\ 0 & 1 & -10 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]
\]

The entry below the pivot position in the second column is now a zero. Notice that we have no more pivot columns because we have only two leading entries.

At this stage, we also want the leading entries to be equal to one. To do so, divide the first row by 3.

\[
\left[\begin{array}{rrr|r} 1 & -\frac{1}{3} & -\frac{5}{3} & 3 \\ 0 & 1 & -10 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]
\]

This matrix is now in row-echelon form.

Let's continue with row operations until the matrix is in reduced row-echelon form. This involves creating zeros above the pivot positions in each pivot column. This requires only one step, which is to add $\frac{1}{3}$ times the second row to the first row.

\[
\left[\begin{array}{rrr|r} 1 & 0 & -5 & 3 \\ 0 & 1 & -10 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]
\]

This is in reduced row-echelon form, which you should verify using Definition 1.13. The equations corresponding to this reduced row-echelon form are

\[
\begin{array}{r}
x - 5z = 3 \\
y - 10z = 0
\end{array}
\]

or

\[
\begin{array}{l}
x = 3 + 5z \\
y = 10z
\end{array}
\]

Observe that z is not restrained by any equation. In fact, z can equal any number. For example, we can let z = t, where we can choose t to be any number. In this context t is called a parameter. Therefore, the solution set of this system is

\[
\begin{array}{l}
x = 3 + 5t \\
y = 10t \\
z = t
\end{array}
\]

where t is arbitrary. The system has an infinite set of solutions which are given by these equations. For any value of t we select, x, y, and z will be given by the above equations. For example, if we choose t = 4 then the corresponding solution would be

\[
\begin{array}{l}
x = 3 + 5(4) = 23 \\
y = 10(4) = 40 \\
z = 4
\end{array}
\]
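A quick numerical check can make the role of the parameter concrete. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes the system of Example 1.22 reads 3x − y − 5z = 9, y − 10z = 0, −2x + y = −6. It substitutes the parametric solution back into the three equations for several values of t.

```python
def solution(t):
    """Parametric solution of Example 1.22: x = 3 + 5t, y = 10t, z = t."""
    return (3 + 5 * t, 10 * t, t)


def satisfies_system(x, y, z):
    """Check the three equations of the (assumed) original system."""
    return (
        3 * x - y - 5 * z == 9
        and y - 10 * z == 0
        and -2 * x + y == -6
    )


# Every value of the parameter should give a solution of the system.
for t in (-2, 0, 4, 7):
    print(t, solution(t), satisfies_system(*solution(t)))
```

Each choice of t produces a different point, and all of them satisfy the system, which is what it means for the solution set to be infinite.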

In Example 1.22 the solution involved one parameter. It may happen that the solution to a system involves more than one parameter, as shown in the following example.

Example 1.23: A Two Parameter Set of Solutions

Find the solution to the system

\[
\begin{array}{r}
x + 2y - z + w = 3 \\
x + y - z + w = 1 \\
x + 3y - z + w = 5
\end{array}
\]

Solution. The augmented matrix is

\[
\left[\begin{array}{rrrr|r} 1 & 2 & -1 & 1 & 3 \\ 1 & 1 & -1 & 1 & 1 \\ 1 & 3 & -1 & 1 & 5 \end{array}\right]
\]

We wish to carry this matrix to row-echelon form. Here, we will outline the row operations used. However, make sure that you understand the steps in terms of Algorithm 1.19.

Take −1 times the first row and add to the second. Then take −1 times the first row and add to the third. This yields

\[
\left[\begin{array}{rrrr|r} 1 & 2 & -1 & 1 & 3 \\ 0 & -1 & 0 & 0 & -2 \\ 0 & 1 & 0 & 0 & 2 \end{array}\right]
\]

Now add the second row to the third row and divide the second row by −1.

\[
\left[\begin{array}{rrrr|r} 1 & 2 & -1 & 1 & 3 \\ 0 & 1 & 0 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]
\tag{1.9}
\]

This matrix is in row-echelon form and we can see that x and y correspond to pivot columns, while z and w do not. Therefore, we will assign parameters to the variables z and w. Assign the parameter s to z and the parameter t to w. Then the first row yields the equation x + 2y − s + t = 3, while the second row yields the equation y = 2. Since y = 2, the first equation becomes x + 4 − s + t = 3, showing that the solution is given by

\[
\begin{array}{l}
x = -1 + s - t \\
y = 2 \\
z = s \\
w = t
\end{array}
\]

It is customary to write this solution in the form

\[
\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} -1 + s - t \\ 2 \\ s \\ t \end{bmatrix}
\tag{1.10}
\]

This example shows a system of equations with an infinite solution set which depends on two parameters. It can be less confusing in the case of an infinite solution set to first place the augmented matrix in reduced row-echelon form rather than just row-echelon form before seeking to write down the description of the solution.

In the above steps, this means we do not stop with the row-echelon form in equation 1.9. Instead we first place it in reduced row-echelon form as follows.

\[
\left[\begin{array}{rrrr|r} 1 & 0 & -1 & 1 & -1 \\ 0 & 1 & 0 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]
\]

Then the solution is y = 2 from the second row and x = −1 + z − w from the first. Thus letting z = s and w = t, the solution is given by 1.10.

You can see here that there are two paths to the correct answer, which both yield the same answer. Hence, either approach may be used. The process which we first used in the above solution is called Gaussian Elimination. This process involves carrying the matrix to row-echelon form, converting back to equations, and using back substitution to find the solution. When you do row operations until you obtain reduced row-echelon form, the process is called Gauss-Jordan Elimination.

We have now found solutions for systems of equations with no solution and infinitely many solutions, with one parameter as well as two parameters. Recall the three types of solution sets which we discussed in the previous section: no solution, one solution, and infinitely many solutions. Each of these types of solutions could be identified from the graph of the system. It turns out that we can also identify the type of solution from the reduced row-echelon form of the augmented matrix.

No Solution: In the case where the system of equations has no solution, the row-echelon form of the augmented matrix will have a row of the form

\[
\left[\begin{array}{rrr|r} 0 & 0 & 0 & 1 \end{array}\right]
\]

This row indicates that the system is inconsistent and has no solution.

One Solution: In the case where the system of equations has one solution, every column of the coefficient matrix is a pivot column. The following is an example of an augmented matrix in reduced row-echelon form for a system of equations with one solution.

\[
\left[\begin{array}{rrr|r} 1 & 0 & 0 & 5 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & -2 \end{array}\right]
\]

Infinitely Many Solutions: In the case where the system of equations has infinitely many solutions, the solution contains parameters. There will be columns of the coefficient matrix which are not pivot columns. The following are examples of augmented matrices in reduced row-echelon form for systems of equations with infinitely many solutions.

\[
\left[\begin{array}{rrr|r} 1 & 0 & 0 & 5 \\ 0 & 1 & 2 & -3 \\ 0 & 0 & 0 & 0 \end{array}\right]
\quad \text{or} \quad
\left[\begin{array}{rrr|r} 1 & 0 & 0 & 5 \\ 0 & 1 & 0 & -3 \end{array}\right]
\]
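The three cases above can be read off mechanically from a reduced row-echelon form. The following Python sketch illustrates that reading; it is not part of the original text, the function name `classify` is hypothetical, and it assumes its input is already an augmented matrix in reduced row-echelon form whose last column holds the constants.

```python
def classify(rref_aug):
    """Classify a linear system from its augmented matrix.

    Assumption: rref_aug is already in reduced row-echelon form and
    its last column is the column of constants.
    """
    n_vars = len(rref_aug[0]) - 1
    pivot_columns = []
    for row in rref_aug:
        # Position of the leading entry of this row, if the row is nonzero.
        lead = next((j for j, v in enumerate(row) if v != 0), None)
        if lead is None:
            continue  # a row of zeros carries no information
        if lead == n_vars:
            # Leading entry in the constants column: a row [0 ... 0 | 1].
            return "no solution"
        pivot_columns.append(lead)
    if len(pivot_columns) == n_vars:
        return "one solution"  # every coefficient column is a pivot column
    return "infinitely many solutions"


print(classify([[1, 0, 0, 5], [0, 1, 0, 0], [0, 0, 1, -2]]))
print(classify([[1, 0, 0, 5], [0, 1, 2, -3], [0, 0, 0, 0]]))
print(classify([[1, 0, 0, 0], [0, 0, 0, 1]]))
```

The only information used is the location of the leading entries, which is why the sign or size of the constants does not affect the classification.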

1.2.3. Uniqueness of the Reduced Row-Echelon Form

As we have seen in earlier sections, every matrix can be brought into reduced row-echelon form by a sequence of elementary row operations. Here we will prove that the resulting matrix is unique; in other words, the resulting matrix in reduced row-echelon form does not depend upon the particular sequence of elementary row operations or the order in which they were performed.

Let A be the augmented matrix of a homogeneous system of linear equations in the variables $x_1, x_2, \dots, x_n$ which is also in reduced row-echelon form. The matrix A divides the set of variables into two different types. We say that $x_i$ is a basic variable whenever A has a leading 1 in column number i, in other words, when column i is a pivot column. Otherwise we say that $x_i$ is a free variable.

Recall Example 1.23.

Example 1.24: Basic and Free Variables

Find the basic and free variables in the system

\[
\begin{array}{r}
x + 2y - z + w = 3 \\
x + y - z + w = 1 \\
x + 3y - z + w = 5
\end{array}
\]

Solution. Recall from the solution of Example 1.23 that the row-echelon form of the augmented matrix of this system is given by

\[
\left[\begin{array}{rrrr|r} 1 & 2 & -1 & 1 & 3 \\ 0 & 1 & 0 & 0 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]
\]

You can see that columns 1 and 2 are pivot columns. These columns correspond to variables x and y, making these the basic variables. Columns 3 and 4 are not pivot columns, which means that z and w are free variables.

We can write the solution to this system as

\[
\begin{array}{l}
x = -1 + s - t \\
y = 2 \\
z = s \\
w = t
\end{array}
\]

Here the free variables are written as parameters, and the basic variables are given by linear functions of these parameters.

In general, all solutions can be written in terms of the free variables. In such a description, the free variables can take any values (they become parameters), while the basic variables become simple linear functions of these parameters. Indeed, a basic variable $x_i$ is a linear function of only those free variables $x_j$ with $j > i$. This leads to the following observation.

Proposition 1.25: Basic and Free Variables

If $x_i$ is a basic variable of a homogeneous system of linear equations, then any solution of the system with $x_j = 0$ for all those free variables $x_j$ with $j > i$ must also have $x_i = 0$.

Using this proposition, we prove a lemma which will be used in the proof of the main result of this section below.

Lemma 1.26: Solutions and the Reduced Row-Echelon Form of a Matrix

Let A and B be two distinct augmented matrices for two homogeneous systems of m equations in n variables, such that A and B are each in reduced row-echelon form. Then, the two systems do not have exactly the same solutions.

Proof. With respect to the linear systems associated with the matrices A and B, there are two cases to consider:

Case 1: the two systems have the same basic variables

Case 2: the two systems do not have the same basic variables

In case 1, the two matrices will have exactly the same pivot positions. However, since A and B are not identical, there is some row of A which is different from the corresponding row of B and yet the rows each have a pivot in the same column position. Let i be the index of this column position. Since the matrices are in reduced row-echelon form, the two rows must differ at some entry in a column $j > i$. Let these entries be a in A and b in B, where $a \neq b$. Since A is in reduced row-echelon form, if $x_j$ were a basic variable for its linear system, we would have a = 0. Similarly, if $x_j$ were a basic variable for the linear system of the matrix B, we would have b = 0. Since a and b are unequal, they cannot both be equal to 0, and hence $x_j$ cannot be a basic variable for both linear systems. However, since the systems have the same basic variables, $x_j$ must then be a free variable for each system. We now look at the solutions of the systems in which $x_j$ is set equal to 1 and all other free variables are set equal to 0. For this choice of parameters, the solution of the system for matrix A has $x_i = -a$, while the solution of the system for matrix B has $x_i = -b$, so that the two systems have different solutions.

In case 2, there is a variable $x_i$ which is a basic variable for one matrix, let's say A, and a free variable for the other matrix B. The system for matrix B has a solution in which $x_i = 1$ and $x_j = 0$ for all other free variables $x_j$. However, by Proposition 1.25 this cannot be a solution of the system for the matrix A. This completes the proof of case 2.

Now, we say that the matrix B is equivalent to the matrix A provided that B can be obtained from A by performing a sequence of elementary row operations beginning with A. The importance of this concept lies in the following result.

Theorem 1.27: Equivalent Matrices

The two linear systems of equations corresponding to two equivalent augmented matrices have exactly the same solutions.

The proof of this theorem is left as an exercise.

Now, we can use Lemma 1.26 and Theorem 1.27 to prove the main result of this section.

Theorem 1.28: Uniqueness of the Reduced Row-Echelon Form

Every matrix A is equivalent to a unique matrix in reduced row-echelon form.

Proof. Let A be an $m \times n$ matrix and let B and C be matrices in reduced row-echelon form, each equivalent to A. It suffices to show that B = C.

Let $A^+$ be the matrix A augmented with a new rightmost column consisting entirely of zeros. Similarly, augment matrices B and C each with a rightmost column of zeros to obtain $B^+$ and $C^+$. Note that $B^+$ and $C^+$ are matrices in reduced row-echelon form which are obtained from $A^+$ by respectively applying the same sequence of elementary row operations which were used to obtain B and C from A.

Now, $A^+$, $B^+$, and $C^+$ can all be considered as augmented matrices of homogeneous linear systems in the variables $x_1, x_2, \dots, x_n$. Because $B^+$ and $C^+$ are each equivalent to $A^+$, Theorem 1.27 ensures that all three homogeneous linear systems have exactly the same solutions. By Lemma 1.26 we conclude that $B^+ = C^+$. By construction, we must also have B = C.

According to this theorem we can say that each matrix A has a unique reduced row-echelon form.
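Theorem 1.28 can be illustrated numerically, although an illustration is of course not a proof. In the following Python sketch (an assumption-laden example, not part of the original text) the helper `rref` is a compact Gauss-Jordan routine written for this illustration, `A` is the 3 x 4 matrix used earlier in the section, and `B` is a matrix obtained from `A` by three elementary row operations chosen arbitrarily. Both reduce to the same matrix.

```python
from fractions import Fraction


def rref(rows):
    """Compact Gauss-Jordan reduction using exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # row that will receive the next pivot
    for c in range(len(m[0])):
        src = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if src is None:
            continue  # no pivot in this column
        m[r], m[src] = m[src], m[r]
        lead = m[r][c]
        m[r] = [a / lead for a in m[r]]  # make the leading entry 1
        for i in range(len(m)):
            if i != r:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return m


A = [[1, 2, 3, 4], [3, 2, 1, 6], [4, 4, 4, 10]]

# B is equivalent to A: switch rows 1 and 3, multiply row 2 by 5,
# then add 2 times row 1 to row 3.
B = [[4, 4, 4, 10], [15, 10, 5, 30], [9, 10, 11, 24]]

print(rref(A) == rref(B))
```

This routine normalizes each pivot as soon as it is found, so it follows a different sequence of row operations than Algorithm 1.19, yet by the theorem it must arrive at the same reduced row-echelon form.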

1.2.4. Rank and Homogeneous Systems

There is a special type of system which requires additional study. This type of system is called a homogeneous system of equations, which we defined above in Definition 1.3. Our focus in this section is to consider what types of solutions are possible for a homogeneous system of equations.

Consider the following definition.

Definition 1.29: Trivial Solution

Consider the homogeneous system of equations given by

\[
\begin{array}{c}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = 0 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = 0 \\
\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = 0
\end{array}
\]

Then, $x_1 = 0, x_2 = 0, \dots, x_n = 0$ is always a solution to this system. We call this the trivial solution.

If the system has a solution in which not all of the $x_1, \dots, x_n$ are equal to zero, then we call this solution nontrivial. The trivial solution does not tell us much about the system, as it says that 0 = 0! Therefore, when working with homogeneous systems of equations, we want to know when the system has a nontrivial solution.

Suppose we have a homogeneous system of m equations, using n variables, and suppose that n > m. In other words, there are more variables than equations. Then, it turns out that this system always has a nontrivial solution. Not only will the system have a nontrivial solution, but it also will have infinitely many solutions. It is also possible, but not required, to have a nontrivial solution if n = m or n < m.

Consider the following example.

Example 1.30: Solutions to a Homogeneous System of Equations

Find the nontrivial solutions to the following homogeneous system of equations

\[
\begin{array}{r}
2x + y - z = 0 \\
x + 2y - 2z = 0
\end{array}
\]

Solution. Notice that this system has m = 2 equations and n = 3 variables, so n > m. Therefore by our previous discussion, we expect this system to have infinitely many solutions.

The process we use to find the solutions for a homogeneous system of equations is the same process we used in the previous section. First, we construct the augmented matrix, given by

\[
\left[\begin{array}{rrr|r} 2 & 1 & -1 & 0 \\ 1 & 2 & -2 & 0 \end{array}\right]
\]

Then, we carry this matrix to its reduced row-echelon form, given below.

\[
\left[\begin{array}{rrr|r} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \end{array}\right]
\]

The corresponding system of equations is

\[
\begin{array}{r}
x = 0 \\
y - z = 0
\end{array}
\]

Since z is not restrained by any equation, we know that this variable will become our parameter. Let z = t where t is any number. Therefore, our solution has the form

\[
\begin{array}{l}
x = 0 \\
y = z = t \\
z = t
\end{array}
\]

Hence this system has infinitely many solutions, with one parameter t.

Suppose we were to write the solution to the previous example in another form. Specifically,

\[
\begin{array}{l}
x = 0 \\
y = 0 + t \\
z = 0 + t
\end{array}
\]

can be written as

\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} + t \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}
\]

Notice that we have constructed a column from the constants in the solution (all equal to 0), as well as a column corresponding to the coefficients on t in each equation. While we will discuss this form of solution more in further chapters, for now consider the column of coefficients of the parameter t. In this case, this is the column $\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$.

There is a special name for this column, which is basic solution. The basic solutions of a system are columns constructed from the coefficients on parameters in the solution. We often denote basic solutions by $X_1$, $X_2$, etc., depending on how many solutions occur. Therefore, Example 1.30 has the basic solution $X_1 = \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}$.

We explore this further in the following example.

Example 1.31: Basic Solutions of a Homogeneous System

Consider the following homogeneous system of equations.

\[
\begin{array}{r}
x + 4y + 3z = 0 \\
3x + 12y + 9z = 0
\end{array}
\]

Find the basic solutions to this system.


Solution. The augmented matrix of this system and the resulting reduced row-echelon form are

\[
\left[\begin{array}{rrr|r} 1 & 4 & 3 & 0 \\ 3 & 12 & 9 & 0 \end{array}\right]
\rightarrow \cdots \rightarrow
\left[\begin{array}{rrr|r} 1 & 4 & 3 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]
\]

When written in equations, this system is given by

\[
x + 4y + 3z = 0
\]

Notice that only x corresponds to a pivot column. In this case, we will have two parameters, one for y and one for z. Let y = s and z = t for any numbers s and t. Then, our solution becomes

\[
\begin{array}{l}
x = -4s - 3t \\
y = s \\
z = t
\end{array}
\]

which can be written as

\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} + s \begin{bmatrix} -4 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}
\]

You can see here that we have two columns of coefficients corresponding to parameters, specifically one for s and one for t. Therefore, this system has two basic solutions! These are

\[
X_1 = \begin{bmatrix} -4 \\ 1 \\ 0 \end{bmatrix}, \quad X_2 = \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix}
\]

We now present a new definition.

Definition 1.32: Linear Combination

Let $X_1, \dots, X_n, V$ be column matrices. Then V is said to be a linear combination of the columns $X_1, \dots, X_n$ if there exist scalars $a_1, \dots, a_n$ such that

\[
V = a_1 X_1 + \cdots + a_n X_n
\]

A remarkable result of this section is that a linear combination of the basic solutions is again a solution to the system. Even more remarkable is that every solution can be written as a linear combination of these solutions. Therefore, if we take a linear combination of the two solutions to Example 1.31, this would also be a solution. For example, we could take the following linear combination

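\[
3 \begin{bmatrix} -4 \\ 1 \\ 0 \end{bmatrix} + 2 \begin{bmatrix} -3 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -18 \\ 3 \\ 2 \end{bmatrix}
\]

You should take a moment to verify that

\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} -18 \\ 3 \\ 2 \end{bmatrix}
\]

is in fact a solution to the system in Example 1.31.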
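The verification suggested above can also be done by machine. The following Python sketch is illustrative only: the names `combine` and `is_solution` are hypothetical, and the basic solutions are assumed to be $X_1 = (-4, 1, 0)$ and $X_2 = (-3, 0, 1)$ for the system x + 4y + 3z = 0, 3x + 12y + 9z = 0 of Example 1.31.

```python
# Basic solutions of Example 1.31, written as tuples (x, y, z).
X1 = (-4, 1, 0)
X2 = (-3, 0, 1)


def combine(a, b):
    """Return the linear combination a*X1 + b*X2."""
    return tuple(a * u + b * v for u, v in zip(X1, X2))


def is_solution(v):
    """Check both equations of the homogeneous system."""
    x, y, z = v
    return x + 4 * y + 3 * z == 0 and 3 * x + 12 * y + 9 * z == 0


V = combine(3, 2)
print(V, is_solution(V))

# Any other choice of scalars also yields a solution.
print(all(is_solution(combine(a, b)) for a in range(-3, 4) for b in range(-3, 4)))
```

The second check samples a grid of scalars, which is consistent with the claim that every linear combination of basic solutions is again a solution.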

Another way in which we can find out more information about the solutions of a homogeneous system is to consider the rank of the associated coefficient matrix. We now define what is meant by the rank of a matrix.

Definition 1.33: Rank of a Matrix

Let A be a matrix and consider any row-echelon form of A. Then, the number r of leading entries of A does not depend on the row-echelon form you choose, and is called the rank of A. We denote it by rank(A).

Similarly, we could count the number of pivot positions (or pivot columns) to determine the rank of A.

Example 1.34: Finding the Rank of a Matrix

Consider the matrix

\[
\begin{bmatrix} 1 & 2 & 3 \\ 1 & 5 & 9 \\ 2 & 4 & 6 \end{bmatrix}
\]

What is its rank?

Solution. First, we need to find the reduced row-echelon form of A. Through the usual algorithm, we find that this is

\[
\begin{bmatrix} \boxed{1} & 0 & -1 \\ 0 & \boxed{1} & 2 \\ 0 & 0 & 0 \end{bmatrix}
\]

Here we have two leading entries, or two pivot positions, shown above in boxes. The rank of A is r = 2.

Notice that we would have achieved the same answer if we had found the row-echelon form of A instead of the reduced row-echelon form.

Suppose we have a homogeneous system of m equations in n variables, and suppose that n > m. From our above discussion, we know that this system will have infinitely many solutions. If we consider the rank of the coefficient matrix of this system, we can find out even more about the solution. Note that we are looking at just the coefficient matrix, not the entire augmented matrix.


Theorem 1.35: Rank and Solutions to a Homogeneous System

Let A be the $m \times n$ coefficient matrix corresponding to a homogeneous system of equations, and suppose A has rank r. Then, the solution to the corresponding system has $n - r$ parameters.

Consider our above Example 1.31 in the context of this theorem. The system in this example has m = 2 equations in n = 3 variables. First, because n > m, we know that the system has a nontrivial solution, and therefore infinitely many solutions. This tells us that the solution will contain at least one parameter. The rank of the coefficient matrix can tell us even more about the solution! The rank of the coefficient matrix of the system is 1, as it has one leading entry in row-echelon form. Theorem 1.35 tells us that the solution will have $n - r = 3 - 1 = 2$ parameters. You can check that this is true in the solution to Example 1.31.

Notice that if n = m or n < m, it is possible to have either a unique solution (which will be the trivial solution) or infinitely many solutions.

We are not limited to homogeneous systems of equations here. The rank of a matrix can be used to learn about the solutions of any system of linear equations. In the previous section, we discussed that a system of equations can have no solution, a unique solution, or infinitely many solutions. Suppose the system is consistent, whether it is homogeneous or not. The following theorem tells us how we can use the rank to learn about the type of solution we have.

Theorem 1.36: Rank and Solutions to a Consistent System of Equations

Let A be the $m \times (n + 1)$ augmented matrix corresponding to a consistent system of equations in n variables, and suppose A has rank r. Then

1. the system has a unique solution if r = n

2. the system has infinitely many solutions if r < n

We will not present a formal proof of this, but consider the following discussions.

1. No Solution. The above theorem assumes that the system is consistent, that is, that it has a solution. It turns out that it is possible for the augmented matrix of a system with no solution to have any rank r as long as r > 1. Therefore, we must know that the system is consistent in order to use this theorem!

2. Unique Solution. Suppose r = n. Then, there is a pivot position in every column of the coefficient matrix of A. Hence, there is a unique solution.

3. Infinitely Many Solutions. Suppose r < n. Then there are infinitely many solutions. There are fewer pivot positions (and hence fewer leading entries) than columns, meaning that not every column is a pivot column. The columns which are not pivot columns correspond to parameters. In fact, in this case we have $n - r$ parameters.
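The count of parameters in Theorem 1.35 is easy to compute once the rank is known. The following Python sketch is an illustration under stated assumptions, not the text's own method: the function names `rank` and `parameter_count` are hypothetical, the rank is found by counting pivots during forward elimination, and the coefficient matrices are those assumed for Examples 1.30, 1.31 and 1.34.

```python
from fractions import Fraction


def rank(rows):
    """Number of leading entries in a row-echelon form of the matrix."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    for c in range(len(m[0])):
        src = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if src is None:
            continue  # column c has no pivot
        m[r], m[src] = m[src], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def parameter_count(coefficients):
    """n - r for a homogeneous system with the given coefficient matrix."""
    n = len(coefficients[0])
    return n - rank(coefficients)


print(rank([[1, 2, 3], [1, 5, 9], [2, 4, 6]]))       # Example 1.34
print(parameter_count([[1, 4, 3], [3, 12, 9]]))      # Example 1.31
print(parameter_count([[2, 1, -1], [1, 2, -2]]))     # Example 1.30
```

Only the forward pass of the elimination is needed here, since the rank depends on the number of leading entries and not on their values.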

1.2.5. Balancing Chemical Reactions

The tools of linear algebra can also be used in the subject area of Chemistry, specifically for balancing chemical reactions.

Consider the chemical reaction

\[
\mathrm{SnO_2} + \mathrm{H_2} \rightarrow \mathrm{Sn} + \mathrm{H_2O}
\]

Here the elements involved are tin (Sn), oxygen (O), and hydrogen (H). A chemical reaction occurs and the result is a combination of tin (Sn) and water (H2O). When considering chemical reactions, we want to investigate how much of each element we began with and how much of each element is involved in the result.

An important theory we will use here is the mass balance theory. It tells us that we cannot create or delete elements within a chemical reaction. For example, in the above expression, we must have the same number of oxygen, tin, and hydrogen atoms on both sides of the reaction. Notice that this is not currently the case. For example, there are two oxygen atoms on the left and only one on the right. In order to fix this, we want to find numbers x, y, z, w such that

\[
x\,\mathrm{SnO_2} + y\,\mathrm{H_2} \rightarrow z\,\mathrm{Sn} + w\,\mathrm{H_2O}
\]

where both sides of the reaction have the same number of atoms of the various elements. This is a familiar problem. We can solve it by setting up a system of equations in the variables x, y, z, w. Thus you need

\[
\begin{array}{ll}
\mathrm{Sn}: & x = z \\
\mathrm{O}: & 2x = w \\
\mathrm{H}: & 2y = 2w
\end{array}
\]

We can rewrite these equations as

\[
\begin{array}{ll}
\mathrm{Sn}: & x - z = 0 \\
\mathrm{O}: & 2x - w = 0 \\
\mathrm{H}: & 2y - 2w = 0
\end{array}
\]

The augmented matrix for this system of equations is given by

\[
\left[\begin{array}{rrrr|r} 1 & 0 & -1 & 0 & 0 \\ 2 & 0 & 0 & -1 & 0 \\ 0 & 2 & 0 & -2 & 0 \end{array}\right]
\]

The reduced row-echelon form of this matrix is

\[
\left[\begin{array}{rrrr|r} 1 & 0 & 0 & -\frac{1}{2} & 0 \\ 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 1 & -\frac{1}{2} & 0 \end{array}\right]
\]

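The solution is given by

\[
\begin{array}{r}
x - \frac{1}{2}w = 0 \\
y - w = 0 \\
z - \frac{1}{2}w = 0
\end{array}
\]

which we can write as

\[
\begin{array}{l}
x = \frac{1}{2}t \\
y = t \\
z = \frac{1}{2}t \\
w = t
\end{array}
\]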

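For example, let w = 2 and this would yield x = 1, y = 2, and z = 1. We can put these values back into the expression for the reaction, which yields

\[
\mathrm{SnO_2} + 2\,\mathrm{H_2} \rightarrow \mathrm{Sn} + 2\,\mathrm{H_2O}
\]

Observe that each side of the expression contains the same number of atoms of each element. This means that it preserves the total number of atoms, as required, and so the chemical reaction is balanced.

Consider another example.

Example 1.37: Balancing a Chemical Reaction

Potassium is denoted by K, oxygen by O, phosphorus by P and hydrogen by H. Consider the reaction given by

\[
\mathrm{KOH} + \mathrm{H_3PO_4} \rightarrow \mathrm{K_3PO_4} + \mathrm{H_2O}
\]

Balance this chemical reaction.

Solution. We will use the same procedure as above to solve this problem. We need to find values for x, y, z, w such that

\[
x\,\mathrm{KOH} + y\,\mathrm{H_3PO_4} \rightarrow z\,\mathrm{K_3PO_4} + w\,\mathrm{H_2O}
\]

preserves the total number of atoms of each element. Finding these values can be done by finding the solution to the following system of equations.

\[
\begin{array}{ll}
\mathrm{K}: & x = 3z \\
\mathrm{O}: & x + 4y = 4z + w \\
\mathrm{H}: & x + 3y = 2w \\
\mathrm{P}: & y = z
\end{array}
\]

The augmented matrix for this system is

\[
\left[\begin{array}{rrrr|r} 1 & 0 & -3 & 0 & 0 \\ 1 & 4 & -4 & -1 & 0 \\ 1 & 3 & 0 & -2 & 0 \\ 0 & 1 & -1 & 0 & 0 \end{array}\right]
\]

and the reduced row-echelon form is

\[
\left[\begin{array}{rrrr|r} 1 & 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & -\frac{1}{3} & 0 \\ 0 & 0 & 1 & -\frac{1}{3} & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]
\]

The solution is given by

\[
\begin{array}{r}
x - w = 0 \\
y - \frac{1}{3}w = 0 \\
z - \frac{1}{3}w = 0
\end{array}
\]

which can be written as

\[
\begin{array}{l}
x = t \\
y = \frac{1}{3}t \\
z = \frac{1}{3}t \\
w = t
\end{array}
\]

Choose a value for t, say 3. Then w = 3 and this yields x = 3, y = 1, z = 1. It follows that the balanced reaction is given by

\[
3\,\mathrm{KOH} + 1\,\mathrm{H_3PO_4} \rightarrow 1\,\mathrm{K_3PO_4} + 3\,\mathrm{H_2O}
\]

Note that this results in the same number of atoms on both sides.

Of course these numbers you are finding would typically be the number of moles of the molecules on each side. Thus three moles of KOH added to one mole of H3PO4 yields one mole of K3PO4 and three moles of H2O.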
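The claim that both sides contain the same number of atoms can be checked by counting. The following Python sketch is illustrative and not part of the original text: the helper `atoms` and the dictionary representation of each molecule are assumptions made for this example, and the coefficients tested are the ones stated above for the two reactions of this section.

```python
from collections import Counter


def atoms(side):
    """Total atom counts for one side of a reaction.

    side is a list of (coefficient, formula) pairs, where each formula
    is a dict mapping an element symbol to its count in the molecule.
    """
    total = Counter()
    for coefficient, formula in side:
        for element, count in formula.items():
            total[element] += coefficient * count
    return total


# Molecules written as element counts (an illustrative representation).
KOH = {"K": 1, "O": 1, "H": 1}
H3PO4 = {"H": 3, "P": 1, "O": 4}
K3PO4 = {"K": 3, "P": 1, "O": 4}
H2O = {"H": 2, "O": 1}
SnO2 = {"Sn": 1, "O": 2}
H2 = {"H": 2}
Sn = {"Sn": 1}

# 3 KOH + 1 H3PO4 -> 1 K3PO4 + 3 H2O
print(atoms([(3, KOH), (1, H3PO4)]) == atoms([(1, K3PO4), (3, H2O)]))

# SnO2 + 2 H2 -> Sn + 2 H2O
print(atoms([(1, SnO2), (2, H2)]) == atoms([(1, Sn), (2, H2O)]))
```

Replacing the coefficients with the unbalanced values (all equal to 1) makes both comparisons fail, which matches the observation that the original expressions were not balanced.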

1.2.6. Dimensionless Variables

This section shows how solving systems of equations can be used to determine appropriate dimensionless variables. It is only an introduction to this topic and considers a specific example of a simple airplane wing shown below. We assume for simplicity that it is a flat plane at an angle to the wind which is blowing against it with speed V as shown.

[Figure: a flat wing at an angle to a wind of speed V]

The angle α is called the angle of incidence, B is the span of the wing and A is called the chord. Denote by l the lift. Then this should depend on various quantities like α, V, B, A and so forth. Here is a table which indicates various quantities on which it is reasonable to expect l to depend.

Variable          Symbol   Units
chord             A        m
span              B        m
angle incidence   α        m^0 kg^0 sec^0
speed of wind     V        m sec^-1
speed of sound    V_0      m sec^-1
density of air    ρ        kg m^-3
viscosity         μ        kg sec^-1 m^-1
lift              l        kg sec^-2 m

Here m denotes meters, sec refers to seconds and kg refers to kilograms. All of these are likely familiar except for μ, which we will discuss in further detail now.

Viscosity is a measure of how much internal friction is experienced when the fluid moves. It is roughly a measure of how "sticky" the fluid is. Consider a piece of area parallel to the direction of motion of the fluid. To say that the viscosity is large is to say that the tangential force applied to this area must be large in order to achieve a given change in speed of the fluid in a direction normal to the tangential force. Thus

\[
\mu \, (\text{area}) \, (\text{velocity gradient}) = \text{tangential force}
\]

Hence

\[
(\text{units on } \mu) \; \mathrm{m}^2 \left( \frac{\mathrm{m}}{\mathrm{sec} \; \mathrm{m}} \right) = \mathrm{kg} \; \mathrm{sec}^{-2} \; \mathrm{m}
\]

Thus the units on μ are

\[
\mathrm{kg} \; \mathrm{sec}^{-1} \; \mathrm{m}^{-1}
\]

as claimed above.

Returning to our original discussion, you may think that we would want

\[
l = f(A, B, \alpha, V, V_0, \rho, \mu)
\]

This is very cumbersome because it depends on seven variables. Also, it is likely that without much care, a change in the units such as going from meters to feet would result in an incorrect value for l. The way to get around this problem is to look for l as a function of dimensionless variables multiplied by something which has units of force. It is helpful because first of all, you will likely have fewer independent variables and secondly, you could expect the formula to hold independent of the way of specifying length, mass and so forth. One looks for

\[
l = f(g_1, \dots, g_k) \, \rho V^2 A B
\]

where the units on $\rho V^2 A B$ are

\[
\frac{\mathrm{kg}}{\mathrm{m}^3} \left( \frac{\mathrm{m}}{\mathrm{sec}} \right)^2 \mathrm{m}^2 = \frac{\mathrm{kg} \times \mathrm{m}}{\mathrm{sec}^2}
\]

which are the units of force. Each of these $g_i$ is of the form

\[
A^{x_1} B^{x_2} \alpha^{x_3} V^{x_4} V_0^{x_5} \rho^{x_6} \mu^{x_7}
\tag{1.11}
\]

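and each $g_i$ is independent of the dimensions. That is, this expression must not depend on meters, kilograms, seconds, etc. Thus, placing in the units for each of these quantities, one needs

\[
\mathrm{m}^{x_1} \, \mathrm{m}^{x_2} \left( \mathrm{m}^{x_4} \mathrm{sec}^{-x_4} \right) \left( \mathrm{m}^{x_5} \mathrm{sec}^{-x_5} \right) \left( \mathrm{kg} \, \mathrm{m}^{-3} \right)^{x_6} \left( \mathrm{kg} \, \mathrm{sec}^{-1} \, \mathrm{m}^{-1} \right)^{x_7} = \mathrm{m}^0 \, \mathrm{kg}^0 \, \mathrm{sec}^0
\]

Notice that there are no units on α because it is just the radian measure of an angle. Hence its dimensions consist of length divided by length, thus it is dimensionless. Then this leads to the following equations for the $x_i$.

\[
\begin{array}{ll}
\mathrm{m}: & x_1 + x_2 + x_4 + x_5 - 3x_6 - x_7 = 0 \\
\mathrm{sec}: & -x_4 - x_5 - x_7 = 0 \\
\mathrm{kg}: & x_6 + x_7 = 0
\end{array}
\]

Solving this system, the variables $x_2, x_3, x_5, x_7$ are free, and

\[
x_1 = -x_2 - x_7, \quad x_4 = -x_5 - x_7, \quad x_6 = -x_7
\]

By assigning values to the free variables, we obtain dimensionless variables by placing the resulting values of the $x_i$ in the formula 1.11. Letting $x_2 = 1$ and all other free variables equal 0 gives $A^{-1}B$, the ratio between the span and the chord, called the aspect ratio and denoted AR. Letting $x_3 = 1$ and the others equal 0 gives the angle α. Letting $x_5 = 1$ and the others equal 0 gives $V^{-1}V_0$, which is customarily written as $V/V_0$ and called the Mach number M. Finally, let $x_7 = 1$ and all the other free variables equal 0. Then $x_1 = -1$, $x_4 = -1$, $x_6 = -1$, and the dimensionless variable which results from this is $A^{-1}V^{-1}\rho^{-1}\mu$. It is customary to write it as $Re = (AV\rho)/\mu$. This one is called the Reynolds number. It is the one which involves viscosity. Thus we would look for

\[
l = f(Re, AR, \alpha, M) \; \mathrm{kg} \times \mathrm{m} / \mathrm{sec}^2
\]

This is quite interesting because it is easy to vary Re by simply adjusting the velocity or A but it is hard to vary things like μ or ρ. Note that all the quantities are easy to adjust. Now this could be used, along with wind tunnel experiments, to get a formula for the lift which would be reasonable. You could also consider more variables and more complicated situations in the same way.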
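The bookkeeping behind a dimensionless variable can be checked by adding unit exponents. The following Python sketch is an illustration only: the dictionary `UNITS`, the names `alpha`, `rho` and `mu` for the angle of incidence, the density and the viscosity, and the function `dimensions` are assumptions made for this example. Each quantity is stored as its exponents of (m, sec, kg) from the table of units, and a product of powers is dimensionless exactly when the exponents sum to zero.

```python
# Exponents of (m, sec, kg) for each quantity, following the table of units.
UNITS = {
    "A": (1, 0, 0),        # chord
    "B": (1, 0, 0),        # span
    "alpha": (0, 0, 0),    # angle of incidence (dimensionless)
    "V": (1, -1, 0),       # speed of wind
    "V0": (1, -1, 0),      # speed of sound
    "rho": (-3, 0, 1),     # density of air
    "mu": (-1, -1, 1),     # viscosity
}


def dimensions(powers):
    """Exponents of (m, sec, kg) for a product of powers of the quantities.

    powers maps a quantity name to the exponent it carries in the product.
    """
    total = [0, 0, 0]
    for name, power in powers.items():
        for k, unit_exponent in enumerate(UNITS[name]):
            total[k] += power * unit_exponent
    return tuple(total)


# Reynolds number Re = (A V rho) / mu
print(dimensions({"A": 1, "V": 1, "rho": 1, "mu": -1}))

# rho V^2 A B, which should carry the units of force, kg m / sec^2
print(dimensions({"rho": 1, "V": 2, "A": 1, "B": 1}))
```

A result of (0, 0, 0) means the product has no units, while (1, -2, 1) corresponds to m sec^-2 kg, the units of force.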

1.2.7. Exercises1. Find the point (x1 , y1 ) which lies on both lines, x + 3y = 1 and 4x y = 3.2. Find the point of intersection of the two lines 3x + y = 3 and x + 2y = 1.3. Do the three lines, x + 2y = 1, 2x y = 1, and 4x + 3y = 3 have a common point ofintersection? If so, find the point and if not, tell why they dont have such a commonpoint of intersection.4. Do the three planes, x + y 3z = 2, 2x + y + z = 1, and 3x + 2y 2z = 0 havea common point of intersection? If so, find one and if not, tell why there is no suchpoint.5. Four times the weight of Gaston is 150 pounds more than the weight of Ichabod.Four times the weight of Ichabod is 660 pounds less than seventeen times the weightof Gaston. Four times the weight of Gaston plus the weight of Siegfried equals 290pounds. Brunhilde would balance all three of the others. Find the weights of the fourpeople.6. Consider the following augmented matrix in which denotes an arbitrary numberand denotes a nonzero number. Determine whether the given augmented matrix isconsistent. If consistent, is the solution unique?

0 0

0 0 0 0 0 0 7. Consider the following augmented matrix in which denotes an arbitrary numberand denotes a nonzero number. Determine whether the given augmented matrix isconsistent. If consistent, is the solution unique?

0 0 0 8. Consider the following augmented matrix in which denotes an arbitrary numberand denotes a nonzero number. Determine whether the given augmented matrix is


consistent. If consistent, is the solution

0

0 00 0

unique?

000

0

9. Consider the following augmented matrix in which denotes an arbitrary number

and denotes a nonzero number. Determine whether the given augmented matrix isconsistent. If consistent, is the solution unique?

0 0

0 0 0 0 0 0 0 0 0 10. Suppose a system of equations has fewer equations than variables. Will such a systemnecessarily be consistent? If so, explain why and if not, give an example which is notconsistent.11. If a system of equations has more equations than variables, can it have a solution? Ifso, give an example and if not, tell why not.12. Find h such that

2 h 43 6 7

1 h 32 4 6

1 1 43 h 12

is the augmented matrix of an inconsistent system.

13. Find h such that

is the augmented matrix of a consistent system.

14. Find h such that

is the augmented matrix of a consistent system.

15. Choose h and k such that the augmented matrix shown has each of the following:(a) one solution(b) no solution(c) infinitely many solutions

1 h 22 4 k49

16. Choose h and k such that the augmented matrix shown has each of the following:(a) one solution(b) no solution(c) infinitely many solutions

1 2 22 h k

17. Determine if the system is consistent. If so, is the solution unique?

40. Find the solution to the system of equations 19x + 8y = 108, 71x + 30y = 404, 2x + y = 12, 4x + z = 14.

41. Suppose a system of equations has fewer equations than variables and you have found a solution to this system of equations. Is it possible that your solution is the only one? Explain.

42. Suppose a system of linear equations has a 2 × 4 augmented matrix and the last column is a pivot column. Could the system of linear equations be consistent? Explain.

43. Suppose the coefficient matrix of a system of n equations with n variables has the property that every column is a pivot column. Does it follow that the system of equations must have a solution? If so, must the solution be unique? Explain.

44. Suppose there is a unique solution to a system of linear equations. What must be true of the pivot columns in the augmented matrix?

45. The steady state temperature, u, of a plate solves Laplace's equation, Δu = 0. One way to approximate the solution is to divide the plate into a square mesh and require the temperature at each node to equal the average of the temperature at the four adjacent nodes. In the following picture, the numbers represent the observed temperature at the indicated nodes. Find the temperature at the interior nodes, indicated by x, y, z, and w. One of the equations is z = (1/4)(10 + 0 + w + x).

          20    20
    30     y     w     0
    30     x     z     0
          10    10

46. Consider the following diagram of four circuits.

35 volts

I2

I1

I3

210 volts

20 volts1

6I4

The jagged lines denote resistors and the numbers next to them give their resistancein ohms, written as . The breaks in the lines having one short line and one longline denote a voltage source which causes the current to flow in the direction whichgoes from the longer of the two lines toward the shorter along the unbroken part ofthe circuit. The current in amps in the four circuits is denoted by I1 , I2 , I3 , I4 and53

it is understood that the motion is in the counter clockwise direction. If Ik ends up

being negative, then it just means the current flows in the clockwise direction. ThenKirchhoffs law states:The sum of the resistance times the amps in the counter clockwise direction around aloop equals the sum of the voltage sources in the same direction around the loop.In the above diagram, the top left circuit should give the equation2I2 2I1 + 5I2 5I3 + 3I2 = 5For the circuit on the lower left, you should have4I1 + I1 I4 + 2I1 2I2 = 10Write equations for each of the other two circuits and then give a solution to theresulting system of equations.47. Consider the following diagram of three circuits.310 volts

I1

12 volts7 I2

1I3

4The jagged lines denote resistors and the numbers next to them give their resistancein ohms, written as . The breaks in the lines having one short line and one longline denote a voltage source which causes the current to flow in the direction whichgoes from the longer of the two lines toward the shorter along the unbroken part ofthe circuit. The current in amps in the four circuits is denoted by I1 , I2 , I3 and itis understood that the motion is in the counter clockwise direction. If Ik ends upbeing negative, then it just means the current flows in the clockwise direction. ThenKirchhoffs law states:The sum of the resistance times the amps in the counter clockwise direction around aloop equals the sum of the voltage sources in the same direction around the loop.Find I1 , I2 , I3 .48. Find the rank of the following matrix.

4 16 1 5 1 40 1 1 4 1 254

49. Find the rank of the following matrix.

3 6 5 12 1 2 2 5 1 2 1 250. Find the rank of the following matrix.

00 103 1410 8

14012 1 40 1 251. Find the rank of the following matrix.

4 4 3 9 1 1 1 2 1 1 0 352. Find the rank of the following matrix.

2 1

11

0000

1100

0011

53. Find the rank of the following matrix.

4 15 29 1 4 8

1 3 53 9 15

10

7 7

54. Find the rank of the following matrix.

00 101 123 2 18

122 1 111 2 2111


55. Find the rank of the following matrix.

1 2 1 2

1 200

0000

3 114 15

3 11 0 0

56. Find the rank of the following matrix.

2 3 2 111

10130 3

57. Find the rank of the following matrix.

4 4 20 1 17 1 1 50 5

1 1 5 1 2 3 3 15 3 658. Find the rank of the following matrix.

134 38 1 3 42 5

1 3 41 2 268 2459. Suppose A is an m n matrix. Explain why the rank of A is always no larger thanmin (m, n) .60. State whether each of the following sets of data are possible for the matrix equationAX = B. If possible, describe the solution set. That is, tell whether there existsa unique solution, no solution or infinitely many solutions. Here, [A|B] denotes theaugmented matrix.(a) A is a 5 6 matrix, rank (A) = 4 and rank [A|B] = 4.

(d) A is a 5 × 5 matrix, rank (A) = 4 and rank [A|B] = 5.

zero and so 5x + 2y z = 5x 2y z which is equivalent to y = 0. Does it follow that x and z can equal anything? Notice that when x = 1, z = 4, and y = 0 are plugged in to the equations, the equations do not equal 0. Why?

62. Balance the following chemical reactions.
(a) KNO3 + H2CO3 → K2CO3 + HNO3
(b) AgI + Na2S → Ag2S + NaI
(c) Ba3N2 + H2O → Ba(OH)2 + NH3
(d) CaCl2 + Na3PO4 → Ca3(PO4)2 + NaCl

63. In the section on dimensionless variables it was observed that V 2 AB has the units offorce. Describe a systematic way to obtain such combinations of the variables whichwill yield something which has the units of force.


2. Matrices

2.1 Matrix Arithmetic

Outcomes

A. Perform the matrix operations of matrix addition, scalar multiplication, transposition and matrix multiplication. Identify when these operations are not defined. Represent these operations in terms of the entries of a matrix.
B. Prove algebraic properties for matrix addition, scalar multiplication, transposition, and matrix multiplication. Apply these properties to manipulate an algebraic expression involving matrices.
C. Compute the inverse of a matrix using row operations, and prove identities involving matrix inverses.
E. Solve a linear system using matrix algebra.
F. Use multiplication by an elementary matrix to apply row operations.
G. Write a matrix as a product of elementary matrices.

You have now solved systems of equations by writing them in terms of an augmented matrix and then doing row operations on this augmented matrix. It turns out that matrices are important not only for systems of equations but also in many applications.

Recall that a matrix is a rectangular array of numbers. Several of them are referred to as matrices. For example, here is a matrix.

\[
\begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 2 & 8 & 7 \\ 6 & 9 & 1 & 2 \end{bmatrix} \tag{2.1}
\]

Recall that the size or dimension of a matrix is defined as m × n where m is the number of rows and n is the number of columns. The above matrix is a 3 × 4 matrix because there are three rows and four columns. You can remember the columns are like columns in a Greek temple. They stand upright while the rows lay flat like rows made by a tractor in a plowed field.

When specifying the size of a matrix, you always list the number of rows before the number of columns. You might remember that you always list the rows before the columns by using the phrase Rowman Catholic.

Consider the following definition.

Definition 2.1: Square Matrix
A matrix A which has size n × n is called a square matrix. In other words, A is a square matrix if it has the same number of rows and columns.

There is some notation specific to matrices which we now introduce. We denote the columns of a matrix A by $A_j$ as follows

\[
A = \begin{bmatrix} A_1 & A_2 & \cdots & A_n \end{bmatrix}
\]

Therefore, $A_j$ is the jth column of A, when counted from left to right.

The individual elements of the matrix are called entries or components of A. Elements of the matrix are identified according to their position. The (i, j)-entry of a matrix is the entry in the ith row and jth column. For example, in the matrix 2.1 above, 8 is in position (2, 3) (and is called the (2, 3)-entry) because it is in the second row and the third column.

In order to remember which matrix we are speaking of, we will denote the entry in the ith row and the jth column of matrix A by $a_{ij}$. Then, we can write A in terms of its entries, as $A = [a_{ij}]$. Using this notation on the matrix in 2.1, $a_{23} = 8$, $a_{32} = 9$, $a_{12} = 2$, etc.

There are various operations which are done on matrices of appropriate sizes. Matrices can be added to and subtracted from other matrices, multiplied by a scalar, and multiplied by other matrices. We will never divide a matrix by another matrix, but we will see later how matrix inverses play a similar role.

In doing arithmetic with matrices, we often define the action by what happens in terms of the entries (or components) of the matrices. Before looking at these operations in depth, consider a few general definitions.

Definition 2.2: The Zero Matrix
The m × n zero matrix is the m × n matrix having every entry equal to zero. It is denoted by 0.

One possible zero matrix is shown in the following example.

Example 2.3: The Zero Matrix

Definition 2.4: Equality of Matrices

Let A and B be two m × n matrices. Then A = B means that for $A = [a_{ij}]$ and $B = [b_{ij}]$, $a_{ij} = b_{ij}$ for all 1 ≤ i ≤ m and 1 ≤ j ≤ n.

In other words, two matrices are equal exactly when they are the same size and the corresponding entries are identical. Thus

\[
\begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \neq \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\]

because they are different sizes. Also,

\[
\begin{bmatrix} 0 & 1 \\ 3 & 2 \end{bmatrix} \neq \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}
\]

because, although they are the same size, their corresponding entries are not identical.

In the following section, we explore addition of matrices.

2.1.1. Addition of Matrices

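When adding matrices, all matrices in the sum need to have the same size. For example,

\[
\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 2 \end{bmatrix}
\quad \text{and} \quad
\begin{bmatrix} 1 & 4 & 8 \\ 2 & 8 & 5 \end{bmatrix}
\]

cannot be added, as one has size 3 × 2 while the other has size 2 × 3.

However, the addition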

Definition 2.5: Addition of Matrices

This definition tells us that when adding matrices, we simply add corresponding entries of the matrices. This is demonstrated in the next example.

Example 2.6: Addition of Matrices of Same Size
Add the following matrices, if possible.

\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 0 & 4 \end{bmatrix}, \quad
B = \begin{bmatrix} 5 & 2 & 3 \\ -6 & 2 & 1 \end{bmatrix}
\]

Solution. Notice that both A and B are of size 2 × 3. Since A and B are of the same size, the addition is possible. Using Definition 2.5, the addition is done as follows.

\[
A + B = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 0 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 2 & 3 \\ -6 & 2 & 1 \end{bmatrix}
= \begin{bmatrix} 1+5 & 2+2 & 3+3 \\ 1+(-6) & 0+2 & 4+1 \end{bmatrix}
= \begin{bmatrix} 6 & 4 & 6 \\ -5 & 2 & 5 \end{bmatrix}
\]

Addition of matrices obeys very much the same properties as normal addition with numbers. Note that when we write for example A + B then we assume that both matrices are of equal size so that the operation is indeed possible.

Proposition 2.7: Properties of Matrix Addition
Let A, B and C be matrices. Then, the following properties hold.

Commutative Law of Addition
\[ A + B = B + A \tag{2.2} \]

Associative Law of Addition
\[ (A + B) + C = A + (B + C) \tag{2.3} \]

Existence of an Additive Identity
There exists a zero matrix 0 such that
\[ A + 0 = A \tag{2.4} \]

Existence of an Additive Inverse
There exists a matrix −A such that
\[ A + (-A) = 0 \tag{2.5} \]

Proof. Consider the Commutative Law of Addition given in 2.2. Let A, B, C, and D be matrices such that A + B = C and B + A = D. We want to show that D = C. To do so, we will use the definition of matrix addition given in Definition 2.5. Now,

\[ c_{ij} = a_{ij} + b_{ij} = b_{ij} + a_{ij} = d_{ij} \]

Therefore, C = D because the ijth entries are the same for all i and j. Note that the conclusion follows from the commutative law of addition of numbers, which says that if a and b are two numbers, then a + b = b + a. The proofs of the other results are similar, and are left as an exercise.

We call the zero matrix in 2.4 the additive identity. Similarly, we call the matrix −A in 2.5 the additive inverse. −A is defined to equal $(-1)A = [-a_{ij}]$. In other words, every entry of A is multiplied by −1. In the next section we will study scalar multiplication in more depth to understand what is meant by (−1)A.
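The properties in Proposition 2.7 can also be checked numerically. The following is a minimal sketch, assuming matrices are stored as Python lists of rows; the function name `mat_add` and the sample matrices are illustrative choices and are not taken from the text. A numerical check on one pair of matrices is of course not a proof.

```python
def mat_add(A, B):
    """Add two matrices (lists of rows) entry by entry."""
    same_size = len(A) == len(B) and all(
        len(ra) == len(rb) for ra, rb in zip(A, B)
    )
    if not same_size:
        raise ValueError("matrices of different sizes cannot be added")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


# Illustrative 2 x 3 matrices (hypothetical, not from the text).
A = [[1, 2, 3], [4, 0, -1]]
B = [[0, 5, -2], [1, 1, 1]]
zero = [[0, 0, 0], [0, 0, 0]]
neg_A = [[-a for a in row] for row in A]   # the additive inverse (-1)A

print(mat_add(A, B))                       # entrywise sum
print(mat_add(A, B) == mat_add(B, A))      # commutative law (2.2)
print(mat_add(A, zero) == A)               # additive identity (2.4)
print(mat_add(A, neg_A) == zero)           # additive inverse (2.5)
```

Raising an error for mismatched sizes mirrors the remark that the sum is only defined for matrices of the same size.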

2.1.2. Scalar Multiplication of Matrices

Recall that we use the word scalar when referring to numbers. Therefore, scalar multiplication of a matrix is the multiplication of a matrix by a number. To illustrate this concept, consider the following example in which a matrix is multiplied by the scalar 3.

\[
3 \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 2 & 8 & 7 \\ 6 & 9 & 1 & 2 \end{bmatrix}
= \begin{bmatrix} 3 & 6 & 9 & 12 \\ 15 & 6 & 24 & 21 \\ 18 & 27 & 3 & 6 \end{bmatrix}
\]

The new matrix is obtained by multiplying every entry of the original matrix by the given scalar.

The formal definition of scalar multiplication is as follows.

Definition 2.8: Scalar Multiplication of Matrices
If $A = [a_{ij}]$ and k is a scalar, then $kA = [k a_{ij}]$.

Consider the following example.

Example 2.9: Effect of Multiplication by a Scalar
Find the result of multiplying the following matrix A by 7.

\[
A = \begin{bmatrix} 2 & 0 \\ 1 & 4 \end{bmatrix}
\]

Solution. By Definition 2.8, we multiply each element of A by 7. Therefore,

\[
7A = 7 \begin{bmatrix} 2 & 0 \\ 1 & 4 \end{bmatrix}
= \begin{bmatrix} 7(2) & 7(0) \\ 7(1) & 7(4) \end{bmatrix}
= \begin{bmatrix} 14 & 0 \\ 7 & 28 \end{bmatrix}
\]
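Definition 2.8 translates directly into code. Here is a short sketch, again assuming a matrix is a list of rows; the name `scalar_mul` and the sample matrix are hypothetical and chosen only for illustration.

```python
def scalar_mul(k, A):
    """Return kA: every entry of A multiplied by the scalar k."""
    return [[k * a for a in row] for row in A]


# Illustrative matrix (not from the text).
A = [[1, -2], [0, 3]]

print(scalar_mul(7, A))    # each entry multiplied by 7
print(scalar_mul(-1, A))   # (-1)A, the additive inverse of A
```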

Similarly to addition of matrices, there are several properties of scalar multiplication

The proof of this proposition is similar to the proof of Proposition 2.7 and is left as an exercise to the reader.

2.1.3. Multiplication of Matrices

The next important matrix operation we will explore is multiplication of matrices. The operation of matrix multiplication is one of the most important and useful of the matrix operations. Throughout this section, we will also demonstrate how matrix multiplication relates to linear systems of equations.

First, we provide a formal definition of row and column vectors.

Definition 2.11: Row and Column Vectors
Matrices of size n × 1 or 1 × n are called vectors. If X is such a matrix, then we write $x_i$ to denote the entry of X in the ith row of a column matrix, or the ith column of a row matrix.

The n × 1 matrix

\[
X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}
\]

is called a column vector. The 1 × n matrix

\[
X = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}
\]

is called a row vector.


We may simply use the term vector throughout this text to refer to either a column or row vector. If we do so, the context will make it clear which we are referring to.

In this chapter, we will again use the notion of linear combination of vectors as in Definition 1.32. In this context, a linear combination is a sum consisting of vectors multiplied by scalars. For example,

\[
\begin{bmatrix} 50 \\ 122 \end{bmatrix}
= 7 \begin{bmatrix} 1 \\ 4 \end{bmatrix} + 8 \begin{bmatrix} 2 \\ 5 \end{bmatrix} + 9 \begin{bmatrix} 3 \\ 6 \end{bmatrix}
\]

is a linear combination of three vectors.

It turns out that we can express any system of linear equations as a linear combination of vectors. In fact, the vectors that we will use are just the columns of the corresponding augmented matrix!

Definition 2.12: The Vector Form of a System of Linear Equations
Suppose we have a system of equations given by

\[
\begin{array}{c}
a_{11} x_1 + \cdots + a_{1n} x_n = b_1 \\
\vdots \\
a_{m1} x_1 + \cdots + a_{mn} x_n = b_m
\end{array}
\]

We can express this system in vector form which is as follows:

\[
x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}
+ x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}
+ \cdots
+ x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}
\]

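Notice that each vector used here is one column from the corresponding augmented matrix. There is one vector for each variable in the system, along with the constant vector.

The first important form of matrix multiplication is multiplying a matrix by a vector. Consider the product given by

\[
\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 7 \\ 8 \\ 9 \end{bmatrix}
\]

We will soon see that this equals

\[
7 \begin{bmatrix} 1 \\ 4 \end{bmatrix} + 8 \begin{bmatrix} 2 \\ 5 \end{bmatrix} + 9 \begin{bmatrix} 3 \\ 6 \end{bmatrix}
= \begin{bmatrix} 50 \\ 122 \end{bmatrix}
\]

In general terms,

\[
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
= x_1 \begin{bmatrix} a_{11} \\ a_{21} \end{bmatrix}
+ x_2 \begin{bmatrix} a_{12} \\ a_{22} \end{bmatrix}
+ x_3 \begin{bmatrix} a_{13} \\ a_{23} \end{bmatrix}
= \begin{bmatrix} a_{11} x_1 + a_{12} x_2 + a_{13} x_3 \\ a_{21} x_1 + a_{22} x_2 + a_{23} x_3 \end{bmatrix}
\]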

Thus you take $x_1$ times the first column, add to $x_2$ times the second column, and finally $x_3$ times the third column. The above sum is a linear combination of the columns of the matrix. When you multiply a matrix on the left by a vector on the right, the numbers making up the vector are just the scalars to be used in the linear combination of the columns as illustrated above.

Here is the formal definition of how to multiply an m × n matrix by an n × 1 column vector.

Definition 2.13: Multiplication of Vector by Matrix
Let $A = [a_{ij}]$ be an m × n matrix and let X be an n × 1 matrix given by

\[
A = \begin{bmatrix} A_1 & \cdots & A_n \end{bmatrix}, \quad
X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}
\]

Then the product AX is the m × 1 column vector which equals the following linear combination of the columns of A:

\[
x_1 A_1 + x_2 A_2 + \cdots + x_n A_n = \sum_{j=1}^{n} x_j A_j
\]

If we write the columns of A in terms of their entries, they are of the form

\[
A_j = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix}
\]

Then, we can write the product AX as

\[
AX = x_1 \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}
+ x_2 \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}
+ \cdots
+ x_n \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}
\]

Note that multiplication of an m × n matrix and an n × 1 vector produces an m × 1 vector.

Here is an example.


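Example 2.14: A Vector Multiplied by a Matrix
Compute the product AX for

\[
A = \begin{bmatrix} 1 & 2 & 1 & 3 \\ 0 & 2 & 1 & -2 \\ 2 & 1 & 4 & 1 \end{bmatrix}, \quad
X = \begin{bmatrix} 1 \\ 2 \\ 0 \\ 1 \end{bmatrix}
\]

Solution. We will use Definition 2.13 to compute the product. Therefore, we compute the product AX as follows.

\[
1 \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}
+ 2 \begin{bmatrix} 2 \\ 2 \\ 1 \end{bmatrix}
+ 0 \begin{bmatrix} 1 \\ 1 \\ 4 \end{bmatrix}
+ 1 \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix}
= \begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix}
+ \begin{bmatrix} 4 \\ 4 \\ 2 \end{bmatrix}
+ \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
+ \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix}
= \begin{bmatrix} 8 \\ 2 \\ 5 \end{bmatrix}
\]

Using the above operation, we can also write a system of linear equations in matrix form. In this form, we express the system as a matrix multiplied by a vector. Consider the following definition.

Definition 2.15: The Matrix Form of a System of Linear Equations
Suppose we have a system of equations given by

\[
\begin{array}{c}
a_{11} x_1 + \cdots + a_{1n} x_n = b_1 \\
a_{21} x_1 + \cdots + a_{2n} x_n = b_2 \\
\vdots \\
a_{m1} x_1 + \cdots + a_{mn} x_n = b_m
\end{array}
\]

Then we can express this system in matrix form as follows.

\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{bmatrix}
\]

This is also known as the form AX = B. The matrix A is simply the coefficient matrix of the system. The vector X is the column vector constructed from the variables of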

the system, and the vector B is the column vector constructed from the constants of the system. It is important to note that any system of linear equations can be written in this form.

Notice that if we write a homogeneous system of equations in matrix form, it would have the form AX = 0, for the zero vector 0.

You can see from this definition that a vector

\[
X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]

will satisfy the equation AX = B only when the entries $x_1, x_2, \cdots, x_n$ of the vector X are solutions to the original system.

Now that we have examined how to multiply a matrix by a vector, we wish to consider the case where we multiply two matrices of more general sizes, although these sizes still need to be appropriate as we will see. For example, in Example 2.14, we multiplied a 3 × 4 matrix by a 4 × 1 vector. We want to investigate how to multiply other sizes of matrices.

We have not yet given any conditions on when matrix multiplication is possible! For matrices A and B, in order to form the product AB, the number of columns of A must equal the number of rows of B. Consider a product AB where A has size m × n and B has size n × p. Then, the product in terms of size of matrices is given by

\[
(m \times \underbrace{n)\,(n}_{\text{these must match!}} \times p) = m \times p
\]

Note the two outside numbers give the size of the product. One of the most important rules regarding matrix multiplication is the following. If the two middle numbers don't match, you can't multiply the matrices!

When the number of columns of A equals the number of rows of B the two matrices are said to be conformable and the product AB is obtained as follows.

Definition 2.16: Multiplication of Two Matrices
Let A be an m × n matrix and let B be an n × p matrix of the form

\[ B = \begin{bmatrix} B_1 & \cdots & B_p \end{bmatrix} \]

where $B_1, \ldots, B_p$ are the n × 1 columns of B. Then the m × p matrix AB is defined as follows:

\[ AB = A \begin{bmatrix} B_1 & \cdots & B_p \end{bmatrix} = \begin{bmatrix} (AB)_1 & \cdots & (AB)_p \end{bmatrix} \]

where $(AB)_k = A B_k$ is an m × 1 matrix or column vector which gives the kth column of AB.

Consider the following example.


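Example 2.17: Multiplying Two Matrices
Find AB if possible.

\[
A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 2 & 0 \\ 0 & 3 & 1 \\ -2 & 1 & 1 \end{bmatrix}
\]

Solution. The first thing you need to verify when calculating a product is whether the multiplication is possible. The first matrix has size 2 × 3 and the second matrix has size 3 × 3. The inside numbers are equal, so A and B are conformable matrices. According to the above discussion AB will be a 2 × 3 matrix. Definition 2.16 gives us a way to calculate each column of AB, as follows.

\[
\left[\;
\overbrace{\begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ -2 \end{bmatrix}}^{\text{First column}},\;
\overbrace{\begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \\ 1 \end{bmatrix}}^{\text{Second column}},\;
\overbrace{\begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}}^{\text{Third column}}
\;\right]
\]

You know how to multiply a matrix times a vector, using Definition 2.13 for each of the three columns. Thus

\[
\begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 0 \\ 0 & 3 & 1 \\ -2 & 1 & 1 \end{bmatrix}
= \begin{bmatrix} -1 & 9 & 3 \\ -2 & 7 & 3 \end{bmatrix}
\]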
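Definitions 2.13 and 2.16 build the product one column at a time, and that construction can be sketched in code. In the sketch below matrices are lists of rows and vectors are flat lists; the helper names `columns`, `mat_vec` and `mat_mul` are illustrative assumptions, not notation from the text.

```python
def columns(M):
    """Return the columns of M (a list of rows) as lists."""
    return [list(col) for col in zip(*M)]


def mat_vec(A, x):
    """AX as the linear combination x1*A1 + ... + xn*An of the columns of A."""
    cols = columns(A)
    if len(cols) != len(x):
        raise ValueError("number of columns of A must equal the length of X")
    result = [0] * len(A)
    for xj, col in zip(x, cols):
        # add xj times the j-th column to the running sum
        result = [r + xj * c for r, c in zip(result, col)]
    return result


def mat_mul(A, B):
    """AB column by column: the k-th column of AB is A times the k-th column of B."""
    product_cols = [mat_vec(A, b) for b in columns(B)]
    return [list(row) for row in zip(*product_cols)]


# The 2 x 3 times 3 x 1 product discussed before Definition 2.13.
print(mat_vec([[1, 2, 3], [4, 5, 6]], [7, 8, 9]))

# A 2 x 3 matrix times a 3 x 3 matrix gives a 2 x 3 matrix.
A = [[1, 2, 1], [0, 2, 1]]
B = [[1, 2, 0], [0, 3, 1], [-2, 1, 1]]
print(mat_mul(A, B))
```

Writing `mat_mul` in terms of `mat_vec` keeps the code close to the definitions; a production routine would normally compute entries directly instead.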

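Since vectors are simply n × 1 or 1 × m matrices, we can also multiply a vector by another vector.

Example 2.18: Vector Times Vector Multiplication
Multiply if possible

\[
\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 0 \end{bmatrix}.
\]

Solution. In this case we are multiplying a matrix of size 3 × 1 by a matrix of size 1 × 4. The inside numbers match so the product is defined. Note that the product will be a matrix of size 3 × 4. Using Definition 2.16, we can compute this product as follows

\[
\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \begin{bmatrix} 1 & 2 & 1 & 0 \end{bmatrix}
= \left[\;
\overbrace{\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \begin{bmatrix} 1 \end{bmatrix}}^{\text{First column}},\;
\overbrace{\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \begin{bmatrix} 2 \end{bmatrix}}^{\text{Second column}},\;
\overbrace{\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \begin{bmatrix} 1 \end{bmatrix}}^{\text{Third column}},\;
\overbrace{\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix} \begin{bmatrix} 0 \end{bmatrix}}^{\text{Fourth column}}
\;\right]
\]

You can use Definition 2.13 to verify that this product is

\[
\begin{bmatrix} 1 & 2 & 1 & 0 \\ 2 & 4 & 2 & 0 \\ 1 & 2 & 1 & 0 \end{bmatrix}
\]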

Example 2.19: A Multiplication Which is Not Defined
Find BA if possible.

\[
B = \begin{bmatrix} 1 & 2 & 0 \\ 0 & 3 & 1 \\ -2 & 1 & 1 \end{bmatrix}, \quad
A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 2 & 1 \end{bmatrix}
\]

Solution. First check if it is possible. This product is of the form (3 × 3)(2 × 3). The inside numbers do not match and so you can't do this multiplication.

In this case, we say that the multiplication is not defined. Notice that these are the same matrices which we used in Example 2.17. In this example, we tried to calculate BA instead of AB. This demonstrates another property of matrix multiplication. While the product AB may be defined, we cannot assume that the product BA will be possible. Therefore, it is important to always check that the product is defined before carrying out any calculations.

Earlier, we defined the zero matrix 0 to be the matrix (of appropriate size) containing zeros in all entries. Consider the following example for multiplication by the zero matrix.

Example 2.20: Multiplication by the Zero Matrix
Compute the product A0 for the matrix

\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
\]

and the 2 × 2 zero matrix given by

\[
0 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\]

Solution. In this product, we compute

\[
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
\]

Hence, A0 = 0.

Notice that we could also multiply A by the 2 × 1 zero vector given by $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$. The result would be the 2 × 1 zero vector. Therefore, it is always the case that A0 = 0, for an appropriately sized zero matrix or vector.
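The size check that comes before every product can be written as a small helper. This is only a sketch; the function name `product_shape` and the convention of returning `None` for an undefined product are assumptions made for illustration.

```python
def product_shape(shape_a, shape_b):
    """Size of AB for A of size (m, n) and B of size (n2, p).

    Returns (m, p) when the inside numbers match, and None when
    the product is not defined.
    """
    m, n = shape_a
    n2, p = shape_b
    return (m, p) if n == n2 else None


print(product_shape((2, 3), (3, 3)))   # AB with A 2 x 3 and B 3 x 3
print(product_shape((3, 3), (2, 3)))   # BA for the same matrices: not defined
print(product_shape((3, 1), (1, 4)))   # column vector times row vector
```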


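2.1.4. The ijth Entry of a Product

In previous sections, we used the entries of a matrix to describe the action of matrix addition and scalar multiplication. We can also study matrix multiplication using the entries of matrices.

What is the ijth entry of AB? It is the entry in the ith row and the jth column of the product AB.

Now if A is m × n and B is n × p, then we know that the product AB has the form

\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1j} & \cdots & b_{1p} \\
b_{21} & b_{22} & \cdots & b_{2j} & \cdots & b_{2p} \\
\vdots & \vdots & & \vdots & & \vdots \\
b_{n1} & b_{n2} & \cdots & b_{nj} & \cdots & b_{np}
\end{bmatrix}
\]

The jth column of AB is of the form

\[
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{nj} \end{bmatrix}
\]

which is an m × 1 column vector. It is calculated by

\[
b_{1j} \begin{bmatrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{bmatrix}
+ b_{2j} \begin{bmatrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{bmatrix}
+ \cdots
+ b_{nj} \begin{bmatrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{bmatrix}
\]

Therefore, the ijth entry is the entry in row i of this vector. This is computed by

\[
a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj} = \sum_{k=1}^{n} a_{ik} b_{kj}
\]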

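The following is the formal definition for the ijth entry of a product of matrices.

Definition 2.21: The ijth Entry of a Product
Let $A = [a_{ij}]$ be an m × n matrix and let $B = [b_{ij}]$ be an n × p matrix. Then AB is an m × p matrix and the (i, j)-entry of AB is defined as

\[
(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}
\]

Another way to write this is

\[
(AB)_{ij} = \begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{bmatrix}
\begin{bmatrix} b_{1j} \\ b_{2j} \\ \vdots \\ b_{nj} \end{bmatrix}
= a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}
\]

In other words, to find the (i, j)-entry of the product AB, or $(AB)_{ij}$, you multiply the ith row of A, on the left by the jth column of B. To express AB in terms of its entries, we write $AB = [(AB)_{ij}]$.

Consider the following example.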

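Example 2.22: The Entries of a Product
Compute AB if possible. If it is, find the (3, 2)-entry of AB using Definition 2.21.

\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{bmatrix}, \quad
B = \begin{bmatrix} 2 & 3 & 1 \\ 7 & 6 & 2 \end{bmatrix}
\]

Solution. First check if the product is possible. It is of the form (3 × 2)(2 × 3) and since the inside numbers match, it is possible to do the multiplication. The result should be a 3 × 3 matrix. We can first compute AB:

\[
\left[\;
\begin{bmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{bmatrix} \begin{bmatrix} 2 \\ 7 \end{bmatrix},\;
\begin{bmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{bmatrix} \begin{bmatrix} 3 \\ 6 \end{bmatrix},\;
\begin{bmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix}
\;\right]
\]

where the commas separate the columns in the resulting product. Thus the above product equals

\[
\begin{bmatrix} 16 & 15 & 5 \\ 13 & 15 & 5 \\ 46 & 42 & 14 \end{bmatrix}
\]

which is a 3 × 3 matrix as desired. Thus, the (3, 2)-entry equals 42.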


Now using Definition 2.21, we can find that the (3, 2)-entry equals

\[
\sum_{k=1}^{2} a_{3k} b_{k2} = a_{31} b_{12} + a_{32} b_{22} = 2 \times 3 + 6 \times 6 = 42
\]

Consulting our result for AB above, this is correct!
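The entry formula can be checked in code as well. The sketch below assumes matrices stored as lists of rows and uses a hypothetical helper `entry` that takes 1-based indices to match the notation of Definition 2.21.

```python
def entry(A, B, i, j):
    """(i, j)-entry of AB, with i and j counted from 1 as in the text."""
    n = len(B)                       # number of rows of B
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must match rows of B")
    # sum over k of a_ik * b_kj (shifted to Python's 0-based indexing)
    return sum(A[i - 1][k] * B[k][j - 1] for k in range(n))


# Matrices of Example 2.22.
A = [[1, 2], [3, 1], [2, 6]]
B = [[2, 3, 1], [7, 6, 2]]

print(entry(A, B, 3, 2))             # 2*3 + 6*6

# Assemble the whole product from its entries.
AB = [[entry(A, B, i, j) for j in range(1, 4)] for i in range(1, 4)]
print(AB)
```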

You may wish to use this method to verify that the rest of the entries in AB are correct.

Here is another example.

Example 2.23: Finding the Entries of a Product
Determine if the product AB is defined. If it is, find the (2, 1)-entry of the product.

\[
A = \begin{bmatrix} 2 & 3 & 1 \\ 7 & 6 & 2 \\ 0 & 0 & 0 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 2 \\ 3 & 1 \\ 2 & 6 \end{bmatrix}
\]

Solution. This product is of the form (3 × 3)(3 × 2). The middle numbers match so the matrices are conformable and it is possible to compute the product.

We want to find the (2, 1)-entry of AB, that is, the entry in the second row and first column of the product. We will use Definition 2.21, which states

\[
(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}
\]

In this case, n = 3, i = 2 and j = 1. Hence the (2, 1)-entry is found by computing

\[
(AB)_{21} = \sum_{k=1}^{3} a_{2k} b_{k1}
= \begin{bmatrix} a_{21} & a_{22} & a_{23} \end{bmatrix} \begin{bmatrix} b_{11} \\ b_{21} \\ b_{31} \end{bmatrix}
\]

Substituting in the appropriate values, this product becomes

\[
\begin{bmatrix} a_{21} & a_{22} & a_{23} \end{bmatrix} \begin{bmatrix} b_{11} \\ b_{21} \\ b_{31} \end{bmatrix}
= \begin{bmatrix} 7 & 6 & 2 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix}
= 1 \times 7 + 3 \times 6 + 2 \times 2 = 29
\]

Hence, $(AB)_{21} = 29$.

You should take a moment to find a few other entries of AB. You can multiply the matrices to check that your answers are correct. The product AB is given by

\[
AB = \begin{bmatrix} 13 & 13 \\ 29 & 32 \\ 0 & 0 \end{bmatrix}
\]


2.1.5. Properties of Matrix Multiplication

As pointed out above, it is sometimes possible to multiply matrices in one order but not in the other order. However, even if both AB and BA are defined, they may not be equal.

Example 2.24: Matrix Multiplication is Not Commutative
Compare the products AB and BA, for matrices

\[
A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad
B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\]

Solution. First, notice that A and B are both of size 2 × 2. Therefore, both products AB and BA are defined. The first product, AB is

\[
AB = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
= \begin{bmatrix} 2 & 1 \\ 4 & 3 \end{bmatrix}
\]

The second product, BA is

\[
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
= \begin{bmatrix} 3 & 4 \\ 1 & 2 \end{bmatrix}
\]

Therefore, AB ≠ BA.

This example illustrates that you cannot assume AB = BA even when multiplication is defined in both orders. If for some matrices A and B it is true that AB = BA, then we say that A and B commute. This is one important property of matrix multiplication.

The following are other important properties of matrix multiplication. Notice that these properties hold only when the sizes of the matrices are such that the products are defined.

Proposition 2.25: Properties of Matrix Multiplication
The following hold for matrices A, B, and C and for scalars r and s,

aik ckj = r (AB)ij + s (AC)ij

Thus A (rB + sC) = r(AB) + s(AC) as claimed.

The proof of 2.7 follows the same pattern and is left as an exercise.

Statement 2.8 is the associative law of multiplication. Using Definition 2.21,

\[
(A(BC))_{ij} = \sum_{k} a_{ik} (BC)_{kj} = \sum_{k} a_{ik} \sum_{l} b_{kl} c_{lj} = \sum_{l} (AB)_{il} c_{lj} = ((AB)C)_{ij}.
\]

This proves 2.8.
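A quick numerical experiment illustrates both points of this section: products in different orders can differ, while the grouping of a triple product does not matter. The sketch below uses the matrices A and B of Example 2.24; the third matrix C and the helper name `mat_mul` are illustrative assumptions. Such a check only confirms the laws for one choice of matrices and does not replace the proofs.

```python
def mat_mul(A, B):
    """Product AB computed entry by entry as the sum of a_ik * b_kj."""
    if any(len(row) != len(B) for row in A):
        raise ValueError("columns of A must match rows of B")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[2, 0], [1, 3]]      # hypothetical third matrix, not from the text

print(mat_mul(A, B))      # AB
print(mat_mul(B, A))      # BA, which differs from AB
print(mat_mul(A, mat_mul(B, C)) == mat_mul(mat_mul(A, B), C))  # associativity
```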

2.1.6. The Transpose

Another important operation on matrices is that of taking the transpose. For a matrix A, we denote the transpose of A by $A^T$. Before formally defining the transpose, we explore this operation on the following matrix.

\[
\begin{bmatrix} 1 & 4 \\ 3 & 1 \\ 2 & 6 \end{bmatrix}^T = \begin{bmatrix} 1 & 3 & 2 \\ 4 & 1 & 6 \end{bmatrix}
\]

What happened? The first column became the first row and the second column became the second row. Thus the 3 × 2 matrix became a 2 × 3 matrix. The number 4 was in the first row and the second column and it ended up in the second row and first column.

The definition of the transpose is as follows.

Definition 2.26: The Transpose of a Matrix
Let A be an m × n matrix. Then $A^T$, the transpose of A, denotes the n × m matrix given by

\[ A^T = [a_{ij}]^T = [a_{ji}] \]

The (i, j)-entry of A becomes the (j, i)-entry of $A^T$.

Consider the following example.

Example 2.27: The Transpose of a Matrix
Calculate $A^T$ for the following matrix

\[
A = \begin{bmatrix} 1 & 2 & 6 \\ 3 & 5 & 4 \end{bmatrix}
\]


Solution. By Definition 2.26, we know that for $A = [a_{ij}]$, $A^T = [a_{ji}]$. In other words, we switch the row and column location of each entry. The (1, 2)-entry becomes the (2, 1)-entry. Thus,

\[
A^T = \begin{bmatrix} 1 & 3 \\ 2 & 5 \\ 6 & 4 \end{bmatrix}
\]

Notice that A is a 2 × 3 matrix, while $A^T$ is a 3 × 2 matrix.
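Swapping rows and columns is a one-line operation in code. The following sketch assumes a matrix is a list of rows; the function name `transpose` is an illustrative choice.

```python
def transpose(A):
    """Return the transpose: the (i, j)-entry of A becomes the (j, i)-entry."""
    return [list(col) for col in zip(*A)]


# The 3 x 2 matrix used at the start of this section.
M = [[1, 4], [3, 1], [2, 6]]

print(transpose(M))                    # a 2 x 3 matrix
print(transpose(transpose(M)) == M)    # transposing twice gives M back
```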

The transpose of a matrix has the following important properties.

Lemma 2.28: Properties of the Transpose of a Matrix
Let A be an m × n matrix, B an n × p matrix, and r and s scalars. Then

1. $(A^T)^T = A$

2. $(AB)^T = B^T A^T$

3. $(rA + sB)^T = rA^T + sB^T$

[bik ]T [akj ]T = [bij ]T [aij ]T = B T AT

The proof of Formula 3 is left as an exercise.
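The three properties of Lemma 2.28 can be spot-checked on sample matrices. In this sketch all matrices, the scalars r and s, and the helper names are hypothetical choices for illustration; for the third property the two matrices are taken to have the same size so that the sum is defined. Passing such a check is evidence, not a proof.

```python
def transpose(A):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def mat_mul(A, B):
    """Product AB computed entry by entry."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def lin_comb(r, A, s, D):
    """Return rA + sD for matrices A and D of the same size."""
    return [[r * a + s * d for a, d in zip(ra, rd)] for ra, rd in zip(A, D)]


A = [[1, 2, 0], [3, -1, 4]]        # 2 x 3
B = [[2, 1], [0, 1], [5, -2]]      # 3 x 2
D = [[0, 1, 1], [2, 2, -3]]        # 2 x 3, same size as A
r, s = 2, -3

print(transpose(transpose(A)) == A)                                   # property 1
print(transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A)))  # property 2
print(transpose(lin_comb(r, A, s, D))
      == lin_comb(r, transpose(A), s, transpose(D)))                  # property 3
```

Note the reversed order in property 2: the product of the transposes in the original order, $A^T B^T$, has the wrong shape in general.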

The transpose of a matrix is related to other important topics. Consider the following definition.

Definition 2.29: Symmetric and Skew Symmetric Matrices
An n × n matrix A is said to be symmetric if $A = A^T$. It is said to be skew symmetric if $A = -A^T$.

We will explore these definitions in the following examples.

Example 2.30: Symmetric Matrices
Let

\[
A = \begin{bmatrix} 2 & 1 & 3 \\ 1 & 5 & 3 \\ 3 & 3 & 7 \end{bmatrix}
\]

Use Definition 2.29 to show that A is symmetric.


Solution. By Definition 2.29, we need to show that $A = A^T$. Now, using Definition 2.26,

\[
A^T = \begin{bmatrix} 2 & 1 & 3 \\ 1 & 5 & 3 \\ 3 & 3 & 7 \end{bmatrix}
\]

Hence, $A = A^T$, so A is symmetric.

Example 2.31: A Skew Symmetric Matrix
Let

\[
A = \begin{bmatrix} 0 & 1 & 3 \\ -1 & 0 & 2 \\ -3 & -2 & 0 \end{bmatrix}
\]

Show that A is skew symmetric.

Solution. By Definition 2.29,

\[
A^T = \begin{bmatrix} 0 & -1 & -3 \\ 1 & 0 & -2 \\ 3 & 2 & 0 \end{bmatrix}
\]

You can see that each entry of $A^T$ is equal to −1 times the same entry of A. Hence, $A^T = -A$ and so by Definition 2.29, A is skew symmetric.
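Both conditions of Definition 2.29 are easy to test by comparing a matrix with its transpose. The sketch below uses illustrative matrices S and K and hypothetical function names; it treats a non-square matrix as neither symmetric nor skew symmetric, since its transpose has a different size.

```python
def transpose(A):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*A)]


def is_symmetric(A):
    """True when A equals its transpose."""
    return A == transpose(A)


def is_skew_symmetric(A):
    """True when the transpose of A equals -A."""
    return transpose(A) == [[-a for a in row] for row in A]


# Illustrative matrices (assumed for this sketch).
S = [[2, 1, 3], [1, 5, -3], [3, -3, 7]]
K = [[0, 1, 3], [-1, 0, 2], [-3, -2, 0]]

print(is_symmetric(S), is_skew_symmetric(S))
print(is_symmetric(K), is_skew_symmetric(K))
```

Observe that a skew symmetric matrix must have zeros on its main diagonal, because each diagonal entry has to equal its own negative.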

2.1.7. The Identity and Inverses

There is a special matrix, denoted I, which is referred to as the identity matrix. The identity matrix is always a square matrix, and it has the property that there are ones down the main diagonal and zeroes elsewhere. Here are some identity matrices of various sizes.

\[
[1], \quad
\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\]

The first is the 1 × 1 identity matrix, the second is the 2 × 2 identity matrix, and so on. By extension, you can likely see what the n × n identity matrix would be. When it is necessary to distinguish which size of identity matrix is being discussed, we will use the notation $I_n$ for the n × n identity matrix.

The identity matrix is so important that there is a special symbol to denote the ijth entry of the identity matrix. This symbol is given by $I_{ij} = \delta_{ij}$ where $\delta_{ij}$ is the Kronecker symbol defined by

\[
\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}
\]

$I_n$ is called the identity matrix because it is a multiplicative identity in the following sense.

Lemma 2.32: Multiplication by the Identity Matrix
Suppose A is an m × n matrix and $I_n$ is the n × n identity matrix. Then $A I_n = A$. If $I_m$ is the m × m identity matrix, it also follows that $I_m A = A$.

Proof. The (i, j)-entry of $A I_n$ is given by:

\[
\sum_{k} a_{ik} \delta_{kj} = a_{ij}
\]

and so $A I_n = A$. The other case is left as an exercise for you.
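The Kronecker symbol gives a direct way to build identity matrices in code, and Lemma 2.32 can then be checked on a sample matrix. In the sketch below the names `kronecker`, `identity` and `mat_mul`, as well as the 2 × 3 matrix A, are assumptions made for illustration.

```python
def kronecker(i, j):
    """Kronecker symbol: 1 if i equals j, otherwise 0."""
    return 1 if i == j else 0


def identity(n):
    """The n x n identity matrix, built entry by entry from the Kronecker symbol."""
    return [[kronecker(i, j) for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    """Product AB computed entry by entry."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[1, 2, 3], [4, 5, 6]]             # an illustrative 2 x 3 matrix

print(identity(3))
print(mat_mul(A, identity(3)) == A)    # A times I_3 is A
print(mat_mul(identity(2), A) == A)    # I_2 times A is A
```

Note that the two identity matrices have different sizes here, exactly as in the statement of the lemma for a non-square A.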

We now define the matrix operation which in some ways plays the role of division.

Definition 2.33: The Inverse of a Matrix
A square n × n matrix A is said to have an inverse $A^{-1}$ if and only if

\[ A A^{-1} = A^{-1} A = I_n \]

In this case, the matrix A is called invertible.

Such a matrix $A^{-1}$ will have the same size as the matrix A. It is very important to observe that the inverse of a matrix, if it exists, is unique. Another way to think of this is that if it acts like the inverse, then it is the inverse.

Theorem 2.34: Uniqueness of Inverse
Suppose A is an n × n matrix such that an inverse $A^{-1}$ exists. Then there is only one such inverse matrix. That is, given any matrix B such that AB = BA = I, $B = A^{-1}$.

Proof. In this proof, it is assumed that I is the n × n identity matrix. Let A, B be n × n matrices such that $A^{-1}$ exists and AB = BA = I. We want to show that $A^{-1} = B$. Now using properties we have seen, we get:

\[
A^{-1} = A^{-1} I = A^{-1} (AB) = (A^{-1} A) B = I B = B
\]

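Example 2.35: Verifying the Inverse of a Matrix
Let $A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}$. Show that $\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}$ is the inverse of $A$.

Solution. To check this, multiply
\[
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I
\]
and
\[
\begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I
\]
showing that this matrix is indeed the inverse of $A$.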

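Unlike ordinary multiplication of numbers, it can happen that $A \neq 0$ but $A$ may fail to have an inverse. This is illustrated in the following example.

Example 2.36: A Nonzero Matrix With No Inverse
Let $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$. Show that $A$ does not have an inverse.

Solution. One might think $A$ would have an inverse because it does not equal zero. However, note that
\[
\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}
\]
If $A^{-1}$ existed, we would have the following
\[
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
= A^{-1} \begin{bmatrix} 0 \\ 0 \end{bmatrix}
= A^{-1} \left( A \begin{bmatrix} -1 \\ 1 \end{bmatrix} \right)
= \left( A^{-1} A \right) \begin{bmatrix} -1 \\ 1 \end{bmatrix}
= I \begin{bmatrix} -1 \\ 1 \end{bmatrix}
= \begin{bmatrix} -1 \\ 1 \end{bmatrix}
\]
This says that
\[
\begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}
\]
which is impossible! Therefore, $A$ does not have an inverse.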

In the next section, we will explore how to find the inverse of a matrix, if it exists.
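The definition of the inverse can be tested directly by forming both products. The sketch below is an illustration only (the function names `matmul`, `identity`, and `is_inverse` are assumptions made here, not notation from the text). It checks the pair of matrices from Example 2.35 and reproduces the product that rules out an inverse in Example 2.36.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def identity(n):
    """Return the n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def is_inverse(A, B):
    """Return True when AB = BA = I, as required by Definition 2.33."""
    n = len(A)
    return matmul(A, B) == identity(n) and matmul(B, A) == identity(n)

# The matrices of Example 2.35.
A = [[1, 1], [1, 2]]
B = [[2, -1], [-1, 1]]
print(is_inverse(A, B))        # True

# Example 2.36: a nonzero matrix sends a nonzero vector to zero,
# which is what prevents an inverse from existing.
S = [[1, 1], [1, 1]]
v = [[-1], [1]]
print(matmul(S, v))            # [[0], [0]]
```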


2.1.8. Finding the Inverse of a Matrix

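In Example 2.35, we were given $A^{-1}$ and asked to verify that this matrix was in fact the inverse of $A$. In this section, we explore how to find $A^{-1}$.

Let
\[
A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix}
\]
as in Example 2.35. In order to find $A^{-1}$, we need to find a matrix $\begin{bmatrix} x & z \\ y & w \end{bmatrix}$ such that
\[
\begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x & z \\ y & w \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]
We can multiply these two matrices, and see that in order for this equation to be true, we must find the solution to the systems of equations,
\[
\begin{array}{c} x + y = 1 \\ x + 2y = 0 \end{array}
\]
and
\[
\begin{array}{c} z + w = 0 \\ z + 2w = 1 \end{array}
\]
Writing the augmented matrix for these two systems gives
\[
\left[ \begin{array}{rr|r} 1 & 1 & 1 \\ 1 & 2 & 0 \end{array} \right]
\]
for the first system and
\[
\left[ \begin{array}{rr|r} 1 & 1 & 0 \\ 1 & 2 & 1 \end{array} \right] \tag{2.9}
\]
for the second.

Let's solve the first system. Take $-1$ times the first row and add to the second to get
\[
\left[ \begin{array}{rr|r} 1 & 1 & 1 \\ 0 & 1 & -1 \end{array} \right]
\]
Now take $-1$ times the second row and add to the first to get
\[
\left[ \begin{array}{rr|r} 1 & 0 & 2 \\ 0 & 1 & -1 \end{array} \right]
\]
Writing in terms of variables, this says $x = 2$ and $y = -1$.

Now solve the second system, (2.9), to find $z$ and $w$. You will find that $z = -1$ and $w = 1$.

If we take the values found for $x$, $y$, $z$, and $w$ and put them into our inverse matrix, we see that the inverse is
\[
A^{-1} = \begin{bmatrix} x & z \\ y & w \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix}
\]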

After taking the time to solve the second system, you may have noticed that exactly the same row operations were used to solve both systems. In each case, the end result was something of the form $[I|X]$ where $I$ is the identity and $X$ gave a column of the inverse. In the above,
\[
\begin{bmatrix} x \\ y \end{bmatrix}
\]
the first column of the inverse, was obtained by solving the first system, and then the second column
\[
\begin{bmatrix} z \\ w \end{bmatrix}
\]
was obtained by solving the second.

To simplify this procedure, we could have solved both systems at once! To do so, we could have written
\[
\left[ \begin{array}{rr|rr} 1 & 1 & 1 & 0 \\ 1 & 2 & 0 & 1 \end{array} \right]
\]
and row reduced until we obtained
\[
\left[ \begin{array}{rr|rr} 1 & 0 & 2 & -1 \\ 0 & 1 & -1 & 1 \end{array} \right]
\]
and read off the inverse as the $2 \times 2$ matrix on the right side.

This exploration motivates the following important algorithm.

Algorithm 2.37: Matrix Inverse Algorithm
Suppose $A$ is an $n \times n$ matrix. To find $A^{-1}$ if it exists, form the augmented $n \times 2n$ matrix
\[
[A|I]
\]
If possible do row operations until you obtain an $n \times 2n$ matrix of the form
\[
[I|B]
\]
When this has been done, $B = A^{-1}$. In this case, we say that $A$ is invertible. If it is impossible to row reduce to a matrix of the form $[I|B]$, then $A$ has no inverse.

This algorithm shows how to find the inverse if it exists. It will also tell you if $A$ does not have an inverse.

Consider the following example.

Example 2.38: Finding the Inverse
Let $A = \begin{bmatrix} 1 & 2 & 2 \\ 1 & 0 & 2 \\ 3 & 1 & -1 \end{bmatrix}$. Find $A^{-1}$ if it exists.


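Solution. Set up the augmented matrix
\[
[A|I] = \left[ \begin{array}{rrr|rrr} 1 & 2 & 2 & 1 & 0 & 0 \\ 1 & 0 & 2 & 0 & 1 & 0 \\ 3 & 1 & -1 & 0 & 0 & 1 \end{array} \right]
\]
Now we row reduce, with the goal of obtaining the $3 \times 3$ identity matrix on the left hand side. First, take $-1$ times the first row and add to the second followed by $-3$ times the first row added to the third row. This yields
\[
\left[ \begin{array}{rrr|rrr} 1 & 2 & 2 & 1 & 0 & 0 \\ 0 & -2 & 0 & -1 & 1 & 0 \\ 0 & -5 & -7 & -3 & 0 & 1 \end{array} \right]
\]
Then take 5 times the second row and add to $-2$ times the third row.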

At this point, you can see there will be no way to obtain $I$ on the left side of this augmented matrix. Hence, there is no way to complete this algorithm, and therefore the inverse of $A$ does not exist. In this case, we say that $A$ is not invertible.

If the algorithm provides an inverse for the original matrix, it is always possible to check your answer. To do so, use the method demonstrated in Example 2.35. Check that the products $AA^{-1}$ and $A^{-1}A$ both equal the identity matrix. Through this method, you can always be sure that you have calculated $A^{-1}$ properly!

One way in which the inverse of a matrix is useful is to find the solution of a system of linear equations. Recall from Definition 2.15 that we can write a system of equations in matrix form, which is of the form $AX = B$. Suppose you find the inverse of the matrix, $A^{-1}$. Then you could multiply both sides of this equation on the left by $A^{-1}$ and simplify to obtain
\[
\begin{aligned}
\left( A^{-1} \right) AX &= A^{-1}B \\
\left( A^{-1}A \right) X &= A^{-1}B \\
IX &= A^{-1}B \\
X &= A^{-1}B
\end{aligned}
\]
Therefore we can find $X$, the solution to the system, by computing $X = A^{-1}B$. Note that once you have found $A^{-1}$, you can easily get the solution for different right hand sides (different $B$). It is always just $A^{-1}B$.

We will explore this method of finding the solution to a system in the following example.

Example 2.40: Using the Inverse to Solve a System of Equations
Consider the following system of equations. Use the inverse of a suitable matrix to give the solutions to this system.
\[
\begin{array}{c} x + z = 1 \\ x - y + z = 3 \\ x + y - z = 2 \end{array}
\]

Solution. First, we can write the system of equations in matrix form

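\[
AX = \begin{bmatrix} 1 & 0 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} = B \tag{2.10}
\]
The inverse of the matrix
\[
A = \begin{bmatrix} 1 & 0 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{bmatrix}
\]
is
\[
A^{-1} = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & -1 & 0 \\ 1 & -\frac{1}{2} & -\frac{1}{2} \end{bmatrix}
\]
Verifying this inverse is left as an exercise.

From here, the solution to the given system (2.10) is found by
\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = A^{-1}B = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & -1 & 0 \\ 1 & -\frac{1}{2} & -\frac{1}{2} \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 2 \end{bmatrix} = \begin{bmatrix} \frac{5}{2} \\ -2 \\ -\frac{3}{2} \end{bmatrix}
\]
What if the right side, $B$, of (2.10) had been $\begin{bmatrix} 0 \\ 1 \\ 3 \end{bmatrix}$? In other words, what would be the solution to
\[
\begin{bmatrix} 1 & 0 & 1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 3 \end{bmatrix}?
\]
By the above discussion, the solution is given by
\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = A^{-1}B = \begin{bmatrix} 0 & \frac{1}{2} & \frac{1}{2} \\ 1 & -1 & 0 \\ 1 & -\frac{1}{2} & -\frac{1}{2} \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ -2 \end{bmatrix}
\]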

This illustrates that for a system $AX = B$ where $A^{-1}$ exists, it is easy to find the solution when the vector $B$ is changed.

We conclude this section with some important properties of the inverse.

Theorem 2.41: Inverses of Transposes and Products
Let $A$, $B$, and $A_i$ for $i = 1, \ldots, k$ be $n \times n$ matrices.
1. If $A$ is an invertible matrix, then $(A^T)^{-1} = (A^{-1})^T$
2. If $A$ and $B$ are invertible matrices, then $AB$ is invertible and $(AB)^{-1} = B^{-1}A^{-1}$
3. If $A_1, A_2, \ldots, A_k$ are invertible, then the product $A_1 A_2 \cdots A_k$ is invertible, and $(A_1 A_2 \cdots A_k)^{-1} = A_k^{-1} A_{k-1}^{-1} \cdots A_2^{-1} A_1^{-1}$

Consider the following theorem.

Theorem 2.42: Properties of the Inverse
Let $A$ be an $n \times n$ matrix and $I$ the usual identity matrix.
1. $I$ is invertible and $I^{-1} = I$
2. If $A$ is invertible then so is $A^{-1}$, and $(A^{-1})^{-1} = A$
3. If $A$ is invertible then so is $A^k$, and $(A^k)^{-1} = (A^{-1})^k$
4. If $A$ is invertible and $p$ is a nonzero real number, then $pA$ is invertible and $(pA)^{-1} = \frac{1}{p} A^{-1}$
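The matrix inverse algorithm of this section (Algorithm 2.37) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the function names `inverse` and `matvec` are chosen here, exact fractions are used to avoid rounding, and the order of row operations differs from the hand computations above, although the resulting inverse is the same because the inverse is unique.

```python
from fractions import Fraction

def inverse(A):
    """Row reduce [A | I] to [I | B] and return B, or None if A has no inverse."""
    n = len(A)
    # Form the augmented n x 2n matrix [A | I] with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Look for a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                      # [I | B] cannot be reached
        M[col], M[pivot] = M[pivot], M[col]  # switch two rows
        p = M[col][col]
        M[col] = [x / p for x in M[col]]     # scale the row to get a leading 1
        for r in range(n):
            if r != col:
                f = M[r][col]
                # add -f times the pivot row to row r
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

def matvec(A, v):
    """Multiply a matrix A by a column vector v given as a flat list."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# The coefficient matrix of Example 2.40 with both right hand sides.
A = [[1, 0, 1], [1, -1, 1], [1, 1, -1]]
A_inv = inverse(A)
print([str(x) for x in matvec(A_inv, [1, 3, 2])])   # ['5/2', '-2', '-3/2']
print([str(x) for x in matvec(A_inv, [0, 1, 3])])   # ['2', '-1', '-2']

# The matrix of Example 2.36 has no inverse.
print(inverse([[1, 1], [1, 1]]))                    # None
```

Once `A_inv` has been computed, each new right hand side costs only one matrix-vector product, which is the point made at the end of Example 2.40.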

2.1.9. Elementary Matrices

We now turn our attention to a special type of matrix called an elementary matrix. An elementary matrix is always a square matrix. Recall the row operations given in Definition 1.11. Any elementary matrix, which we often denote by $E$, is obtained from applying one row operation to the identity matrix of the same size.

For example, the matrix
\[
E = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\]
is the elementary matrix obtained from switching the two rows. The matrix
\[
E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\]
is the elementary matrix obtained from multiplying the second row of the $3 \times 3$ identity matrix by 3. The matrix
\[
E = \begin{bmatrix} 1 & 0 \\ 3 & 1 \end{bmatrix}
\]
is the elementary matrix obtained from adding 3 times the first row to the second row.

You may construct an elementary matrix from any row operation, but remember that you can only apply one operation.

Consider the following definition.

Definition 2.43: Elementary Matrices and Row Operations
Let $E$ be an $n \times n$ matrix. Then $E$ is an elementary matrix if it is the result of applying one row operation to the $n \times n$ identity matrix $I_n$.
Those which involve switching rows of the identity matrix are called permutation matrices.

Therefore, $E$ constructed above by switching the two rows of $I_2$ is called a permutation matrix.

Elementary matrices can be used in place of row operations and therefore are very useful. It turns out that multiplying (on the left hand side) by an elementary matrix $E$ will have the same effect as doing the row operation used to obtain $E$.

The following theorem is an important result which we will use throughout this text.

Theorem 2.44: Multiplication by an Elementary Matrix and Row Operations
To perform any of the three row operations on a matrix $A$ it suffices to take the product $EA$, where $E$ is the elementary matrix obtained by using the desired row operation on the identity matrix.

Therefore, instead of performing row operations on a matrix $A$, we can row reduce through matrix multiplication with the appropriate elementary matrix. We will examine this theorem in detail for each of the three row operations given in Definition 1.11.

First, consider the following lemma.


Lemma 2.45: Action of Permutation Matrix
Let $P^{ij}$ denote the elementary matrix which involves switching the $i$th and the $j$th rows. Then $P^{ij}$ is a permutation matrix and
\[
P^{ij}A = B
\]
where $B$ is obtained from $A$ by switching the $i$th and the $j$th rows.

We will explore this idea more in the following example.

Example 2.46: Switching Rows with an Elementary Matrix
Let
\[
P^{12} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad A = \begin{bmatrix} a & b \\ g & d \\ e & f \end{bmatrix}
\]
Find $B$ where $B = P^{12}A$.

Solution. You can see that the matrix $P^{12}$ is obtained by switching the first and second rows of the $3 \times 3$ identity matrix $I$.

Using our usual procedure, compute the product $P^{12}A = B$. The result is given by
\[
B = \begin{bmatrix} g & d \\ a & b \\ e & f \end{bmatrix}
\]
Notice that $B$ is the matrix obtained by switching rows 1 and 2 of $A$. Therefore by multiplying $A$ by $P^{12}$, the row operation which was applied to $I$ to obtain $P^{12}$ is applied to $A$ to obtain $B$.

Theorem 2.44 applies to all three row operations, and we now look at the row operation of multiplying a row by a scalar. Consider the following lemma.

Lemma 2.47: Multiplication by a Scalar and Elementary Matrices
Let $E(k, i)$ denote the elementary matrix corresponding to the row operation in which the $i$th row is multiplied by the nonzero scalar, $k$. Then
\[
E(k, i)A = B
\]
where $B$ is obtained from $A$ by multiplying the $i$th row of $A$ by $k$.

We will explore this lemma further in the following example.

Example 2.48: Multiplication of a Row by 5 Using Elementary Matrix
Let
\[
E(5, 2) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad A = \begin{bmatrix} a & b \\ c & d \\ e & f \end{bmatrix}
\]
Find the matrix $B$ where $B = E(5, 2)A$.

Solution. You can see that $E(5, 2)$ is obtained by multiplying the second row of the identity matrix by 5.

Using our usual procedure for multiplication of matrices, we can compute the product $E(5, 2)A$. The resulting matrix is given by
\[
B = \begin{bmatrix} a & b \\ 5c & 5d \\ e & f \end{bmatrix}
\]
Notice that $B$ is obtained by multiplying the second row of $A$ by the scalar 5.

There is one last row operation to consider. The following lemma discusses the final operation of adding a multiple of a row to another row.

Lemma 2.49: Adding Multiples of Rows and Elementary Matrices
Let $E(k \times i + j)$ denote the elementary matrix obtained from $I$ by adding $k$ times the $i$th row to the $j$th. Then
\[
E(k \times i + j)A = B
\]
where $B$ is obtained from $A$ by adding $k$ times the $i$th row to the $j$th row of $A$.

Consider the following example.

Example 2.50: Adding Two Times the First Row to the Last
Let
\[
E(2 \times 1 + 3) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 2 & 0 & 1 \end{bmatrix}, \quad A = \begin{bmatrix} a & b \\ c & d \\ e & f \end{bmatrix}
\]
Find $B$ where $B = E(2 \times 1 + 3)A$.

Solution. You can see that the matrix $E(2 \times 1 + 3)$ was obtained by adding 2 times the first row of $I$ to the third row of $I$.

Using our usual procedure, we can compute the product $E(2 \times 1 + 3)A$. The resulting matrix $B$ is given by
\[
B = \begin{bmatrix} a & b \\ c & d \\ 2a + e & 2b + f \end{bmatrix}
\]

You can see that $B$ is the matrix obtained by adding 2 times the first row of $A$ to the third row.

Suppose we have applied a row operation to a matrix $A$. Consider the row operation required to return $A$ to its original form, to undo the row operation. It turns out that this action is how we find the inverse of an elementary matrix $E$.

Consider the following theorem.

Theorem 2.51: Elementary Matrices and Inverses
Every elementary matrix is invertible and its inverse is also an elementary matrix.

In fact, the inverse of an elementary matrix is constructed by doing the reverse row operation on $I$. $E^{-1}$ will be obtained by performing the row operation which would carry $E$ back to $I$.

• If $E$ is obtained by switching rows $i$ and $j$, then $E^{-1}$ is also obtained by switching rows $i$ and $j$.
• If $E$ is obtained by multiplying row $i$ by the scalar $k$, then $E^{-1}$ is obtained by multiplying row $i$ by the scalar $\frac{1}{k}$.
• If $E$ is obtained by adding $k$ times row $i$ to row $j$, then $E^{-1}$ is obtained by subtracting $k$ times row $i$ from row $j$.

Consider the following example.

Example 2.52: Inverse of an Elementary Matrix
Let
\[
E = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}
\]
Find $E^{-1}$.

Solution. Consider the elementary matrix $E$ given by
\[
E = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}
\]
Here, $E$ is obtained from the $2 \times 2$ identity matrix by multiplying the second row by 2. In order to carry $E$ back to the identity, we need to multiply the second row of $E$ by $\frac{1}{2}$. Hence, $E^{-1}$ is given by
\[
E^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{2} \end{bmatrix}
\]
We can verify that $EE^{-1} = I$. Take the product $EE^{-1}$, given by
\[
EE^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & \frac{1}{2} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]
This equals $I$ so we know that we have computed $E^{-1}$ properly.
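The three kinds of elementary matrices, and the reverse operations of Theorem 2.51, can be built and checked in code. The following is a minimal sketch, assuming helper names (`swap`, `scale`, `add_multiple`) invented for this illustration and a sample matrix with numeric entries in place of the letters used in the examples. Rows are numbered from 0 in the code, so row 1 of the text is index 0.

```python
from fractions import Fraction

def identity(n):
    """Return the n x n identity matrix with exact rational entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def swap(n, i, j):
    """Elementary matrix that switches rows i and j (0-based)."""
    E = identity(n)
    E[i], E[j] = E[j], E[i]
    return E

def scale(n, k, i):
    """Elementary matrix that multiplies row i (0-based) by the nonzero scalar k."""
    E = identity(n)
    E[i][i] = Fraction(k)
    return E

def add_multiple(n, k, i, j):
    """Elementary matrix that adds k times row i to row j (0-based, i != j)."""
    E = identity(n)
    E[j][i] = Fraction(k)
    return E

A = [[1, 2], [3, 4], [5, 6]]   # sample 3 x 2 matrix (illustrative values)

# Left multiplication performs the row operation on A.
print(matmul(swap(3, 0, 1), A) == [[3, 4], [1, 2], [5, 6]])
print(matmul(scale(3, 5, 1), A) == [[1, 2], [15, 20], [5, 6]])
print(matmul(add_multiple(3, 2, 0, 2), A) == [[1, 2], [3, 4], [7, 10]])

# Each inverse is the elementary matrix of the reverse row operation.
I3 = identity(3)
print(matmul(swap(3, 0, 1), swap(3, 0, 1)) == I3)
print(matmul(scale(3, Fraction(1, 5), 1), scale(3, 5, 1)) == I3)
print(matmul(add_multiple(3, -2, 0, 2), add_multiple(3, 2, 0, 2)) == I3)
```

All six lines print `True` for this sample matrix.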

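Suppose an $m \times n$ matrix $A$ is row reduced to its reduced row-echelon form. By tracking each row operation completed, this row reduction can be completed through multiplication by elementary matrices. Consider the following definition.

Definition 2.53: The Form B = UA
Let $A$ be an $m \times n$ matrix and let $B$ be the reduced row-echelon form of $A$. Then we can write $B = UA$ where $U$ is the product of all elementary matrices representing the row operations done to $A$ to obtain $B$.

Consider the following example.

Example 2.54: The Form B = UA
Let $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 2 & 0 \end{bmatrix}$. Find $B$, the reduced row-echelon form of $A$ and write it in the form $B = UA$.

Solution. To find $B$, row reduce $A$. For each step, we will record the appropriate elementary matrix. First, switch rows 1 and 2.
\[
\begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 2 & 0 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 2 & 0 \end{bmatrix}
\]
This step is the same as multiplying $A$ on the left by $P^{12}$. Next, add $-2$ times the first row to the third row.
\[
\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 2 & 0 \end{bmatrix} \rightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} = B
\]
This step is the same as multiplying on the left by $E(-2 \times 1 + 3)$.

It remains to find the matrix $U$.
\[
U = E(-2 \times 1 + 3) P^{12}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & -2 & 1 \end{bmatrix}
\]
We can verify that $B = UA$ holds for this matrix $U$:
\[
UA = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & -2 & 1 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 0 \\ 2 & 0 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} = B
\]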
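The product defining $U$ in Example 2.54 can be confirmed with a short computation. This sketch is illustrative only; the variable names are chosen here, and the entry $-2$ in the second elementary matrix is the value needed for $UA = B$ to hold.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[0, 1], [1, 0], [2, 0]]

P12 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]       # switch rows 1 and 2
E_add = [[1, 0, 0], [0, 1, 0], [-2, 0, 1]]    # add -2 times row 1 to row 3

# The operation performed first sits closest to A, so U = E_add * P12.
U = matmul(E_add, P12)
print(U)              # [[0, 1, 0], [1, 0, 0], [0, -2, 1]]
print(matmul(U, A))   # [[1, 0], [0, 1], [0, 0]]
```

The order of the factors matters: reversing them gives a different matrix that does not carry $A$ to $B$.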

While the process used in the above example is reliable and simple when only a few row operations are used, it becomes cumbersome in a case where many row operations are needed to carry $A$ to $B$. The following theorem provides an alternate way to find the matrix $U$.

Theorem 2.55: Finding the Matrix U
Let $A$ be an $m \times n$ matrix and let $B$ be its reduced row-echelon form. Then $B = UA$ where $U$ is an invertible $m \times m$ matrix found by forming the matrix $[A|I_m]$ and row reducing to $[B|U]$.

Let's revisit the above example using the process outlined in Theorem 2.55.

Example 2.56: The Form B = UA, Revisited

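Now, row reduce this matrix until the left side equals the reduced row-echelon form of $A$.
\[
\left[ \begin{array}{rr|rrr} 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 2 & 0 & 0 & 0 & 1 \end{array} \right]
\rightarrow
\left[ \begin{array}{rr|rrr} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 2 & 0 & 0 & 0 & 1 \end{array} \right]
\rightarrow
\left[ \begin{array}{rr|rrr} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & -2 & 1 \end{array} \right]
\]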

The left side of this matrix is $B$, and the right side is $U$. Comparing this to the matrix $U$ found above in Example 2.54, you can see that the same matrix is obtained regardless of which process is used.

Recall from Algorithm 2.37 that an $n \times n$ matrix $A$ is invertible if and only if $A$ can be carried to the $n \times n$ identity matrix using the usual row operations. This leads to an important consequence related to the above discussion.

Suppose $A$ is an $n \times n$ invertible matrix. Then, set up the matrix $[A|I_n]$ as done above, and row reduce until it is of the form $[B|U]$. In this case, $B = I_n$ because $A$ is invertible.
\[
\begin{aligned}
B &= UA \\
I_n &= UA \\
U^{-1} &= A
\end{aligned}
\]
Now suppose that $U = E_1 E_2 \cdots E_k$ where each $E_i$ is an elementary matrix representing a row operation used to carry $A$ to $I$. Then,
\[
U^{-1} = (E_1 E_2 \cdots E_k)^{-1} = E_k^{-1} \cdots E_2^{-1} E_1^{-1}
\]
Remember that if $E_i$ is an elementary matrix, so too is $E_i^{-1}$. It follows that
\[
A = U^{-1} = E_k^{-1} \cdots E_2^{-1} E_1^{-1}
\]
and $A$ can be written as a product of elementary matrices.

Theorem 2.57: Product of Elementary Matrices
Let $A$ be an $n \times n$ matrix. Then $A$ is invertible if and only if it can be written as a product of elementary matrices.

Consider the following example.

Example 2.58: Product of Elementary Matrices
Let $A = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix}$. Write $A$ as a product of elementary matrices.

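Solution. We will use the process outlined in Theorem 2.55 to write $A$ as a product of elementary matrices. We will set up the matrix $[A|I]$ and row reduce, recording each row operation as an elementary matrix.

First:
\[
\left[ \begin{array}{rrr|rrr} 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & -2 & 1 & 0 & 0 & 1 \end{array} \right]
\rightarrow
\left[ \begin{array}{rrr|rrr} 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & -2 & 1 & 0 & 0 & 1 \end{array} \right]
\]
represented by the elementary matrix $E_1 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.

Secondly:
\[
\left[ \begin{array}{rrr|rrr} 1 & 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & -2 & 1 & 0 & 0 & 1 \end{array} \right]
\rightarrow
\left[ \begin{array}{rrr|rrr} 1 & 0 & 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & -2 & 1 & 0 & 0 & 1 \end{array} \right]
\]
represented by the elementary matrix $E_2 = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.

Finally:
\[
\left[ \begin{array}{rrr|rrr} 1 & 0 & 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & -2 & 1 & 0 & 0 & 1 \end{array} \right]
\rightarrow
\left[ \begin{array}{rrr|rrr} 1 & 0 & 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 2 & 0 & 1 \end{array} \right]
\]
represented by the elementary matrix $E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix}$.

Notice that the reduced row-echelon form of $A$ is $I$. Hence $I = UA$ where $U$ is the product of the above elementary matrices. It follows that $A = U^{-1}$. Since we want to write $A$ as a product of elementary matrices, we wish to express $U^{-1}$ as a product of elementary matrices.
\[
U^{-1} = (E_3 E_2 E_1)^{-1} = E_1^{-1} E_2^{-1} E_3^{-1}
\]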
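The factorization in Example 2.58 can be checked by multiplying out the inverses of the three elementary matrices. The sketch below is an illustration under an assumption: the matrix $A$ is taken to be the left block of the first augmented matrix in the solution, and each inverse is written down from the reverse row operation described in Theorem 2.51.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Assumed from the left block of [A | I] in the solution.
A = [[0, 1, 0], [1, 1, 0], [0, -2, 1]]

E1 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]    # switch rows 1 and 2
E2 = [[1, -1, 0], [0, 1, 0], [0, 0, 1]]   # add -1 times row 2 to row 1
E3 = [[1, 0, 0], [0, 1, 0], [0, 2, 1]]    # add 2 times row 2 to row 3

# Inverses come from the reverse row operations.
E1_inv = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
E2_inv = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
E3_inv = [[1, 0, 0], [0, 1, 0], [0, -2, 1]]

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# U = E3 E2 E1 carries A to the identity.
U = matmul(E3, matmul(E2, E1))
print(matmul(U, A) == I3)                             # True

# A is recovered as the product of the inverses in reverse order.
print(matmul(E1_inv, matmul(E2_inv, E3_inv)) == A)    # True
```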

2.1.10. More on Matrix Inverses

In this section, we will prove three theorems which will clarify the concept of matrix inverses. In order to do this, first recall some important properties of elementary matrices.

Recall that an elementary matrix is a square matrix obtained by performing an elementary operation on an identity matrix. Each elementary matrix is invertible, and its inverse is also an elementary matrix. If $E$ is an $m \times m$ elementary matrix and $A$ is an $m \times n$ matrix, then the product $EA$ is the result of applying to $A$ the same elementary row operation that was applied to the $m \times m$ identity matrix in order to obtain $E$.

Let $R$ be the reduced row-echelon form of an $m \times n$ matrix $A$. $R$ is obtained by iteratively applying a sequence of elementary row operations to $A$. Denote by $E_1, E_2, \ldots, E_k$ the elementary matrices associated with the elementary row operations which were applied, in order, to the matrix $A$ to obtain the resulting $R$. We then have that $R = (E_k \cdots (E_2 (E_1 A))) = E_k \cdots E_2 E_1 A$. Let $E$ denote the product matrix $E_k \cdots E_2 E_1$ so that we can write $R = EA$ where $E$ is an invertible matrix whose inverse is the product $(E_1)^{-1} (E_2)^{-1} \cdots (E_k)^{-1}$.

Now, we will consider some preliminary lemmas.

Lemma 2.59: Invertible Matrix and Zeros
Suppose that $A$ and $B$ are matrices such that the product $AB$ is an identity matrix. Then the reduced row-echelon form of $A$ does not have a row of zeros.

Proof: Let $R$ be the reduced row-echelon form of $A$. Then $R = EA$ for some invertible square matrix $E$ as described above. By hypothesis $AB = I$ where $I$ is an identity matrix, so we have a chain of equalities
\[
R(BE^{-1}) = (EA)(BE^{-1}) = E(AB)E^{-1} = EIE^{-1} = EE^{-1} = I
\]
If $R$ would have a row of zeros, then so would the product $R(BE^{-1})$. But since the identity matrix $I$ does not have a row of zeros, neither can $R$ have one.

We now consider a second important lemma.

Lemma 2.60: Size of Invertible Matrix
Suppose that $A$ and $B$ are matrices such that the product $AB$ is an identity matrix. Then $A$ has at least as many columns as it has rows.

Proof: Let $R$ be the reduced row-echelon form of $A$. By Lemma 2.59, we know that $R$ does not have a row of zeros, and therefore each row of $R$ has a leading 1. Since each column of $R$ contains at most one of these leading 1s, $R$ must have at least as many columns as it has rows.

An important theorem follows from this lemma.

Theorem 2.61: Invertible Matrices are Square
Only square matrices can be invertible.

Proof: Suppose that $A$ and $B$ are matrices such that both products $AB$ and $BA$ are identity matrices. We will show that $A$ and $B$ must be square matrices of the same size. Let the matrix $A$ have $m$ rows and $n$ columns, so that $A$ is an $m \times n$ matrix. Since the product $AB$ exists, $B$ must have $n$ rows, and since the product $BA$ exists, $B$ must have $m$ columns so that $B$ is an $n \times m$ matrix. To finish the proof, we need only verify that $m = n$.

We first apply Lemma 2.60 with $A$ and $B$, to obtain the inequality $m \leq n$. We then apply Lemma 2.60 again (switching the order of the matrices), to obtain the inequality $n \leq m$. It follows that $m = n$, as we wanted.

Of course, not all square matrices are invertible. In particular, zero matrices are not invertible, along with many other square matrices.

The following proposition will be useful in proving the next theorem.

Proposition 2.62: Reduced Row-Echelon Form of a Square Matrix
If $R$ is the reduced row-echelon form of a square matrix, then either $R$ has a row of zeros or $R$ is an identity matrix.

The proof of this proposition is left as an exercise to the reader. We now consider the second important theorem of this section.

Theorem 2.63: Unique Inverse of a Matrix
Suppose $A$ and $B$ are square matrices such that $AB = I$ where $I$ is an identity matrix. Then it follows that $BA = I$. Further, both $A$ and $B$ are invertible and $B = A^{-1}$ and $A = B^{-1}$.

Proof: Let $R$ be the reduced row-echelon form of a square matrix $A$. Then, $R = EA$ where $E$ is an invertible matrix. Since $AB = I$, Lemma 2.59 gives us that $R$ does not have a row of zeros. By noting that $R$ is a square matrix and applying Proposition 2.62, we see that $R = I$. Hence, $EA = I$.

Using both that $EA = I$ and $AB = I$, we can finish the proof with a chain of equalities as given by
\[
\begin{aligned}
BA = IBIA &= (EA)B(E^{-1}E)A \\
&= E(AB)E^{-1}(EA) \\
&= EIE^{-1}I \\
&= EE^{-1} = I
\end{aligned}
\]
It follows from the definition of the inverse of a matrix that $B = A^{-1}$ and $A = B^{-1}$.

This theorem is very useful, since with it we need only test one of the products $AB$ or $BA$ in order to check that $B$ is the inverse of $A$. The hypothesis that $A$ and $B$ are square matrices is very important, and without this the theorem does not hold.

We will now consider an example.


Example 2.64: Non Square Matrices
Let
\[
A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
\]
Show that $A^T A = I$ but $AA^T \neq I$.

Solution. Consider the product $A^T A$ given by
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
\]
Therefore, $A^T A = I_2$, where $I_2$ is the $2 \times 2$ identity matrix. However, the product $AA^T$ is
\[
\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\]
Hence $AA^T$ is not the $3 \times 3$ identity matrix. This shows that for Theorem 2.63, it is essential that both matrices be square and of the same size.

Is it possible to have matrices $A$ and $B$ such that $AB = I$, while $BA = 0$? This question is left to the reader to answer, and you should take a moment to consider the answer.

We conclude this section with an important theorem.

Theorem 2.65: The Reduced Row-Echelon Form of an Invertible Matrix
For any matrix $A$ the following conditions are equivalent:
• $A$ is invertible
• The reduced row-echelon form of $A$ is an identity matrix

Proof. In order to prove this, we show that for any given matrix $A$, each condition implies the other. We first show that if $A$ is invertible, then its reduced row-echelon form is an identity matrix, then we show that if the reduced row-echelon form of $A$ is an identity matrix, then $A$ is invertible.

If $A$ is invertible, there is some matrix $B$ such that $AB = I$. By Lemma 2.59, we get that the reduced row-echelon form of $A$ does not have a row of zeros. Then by Theorem 2.61, it follows that $A$ and the reduced row-echelon form of $A$ are square matrices. Finally, by Proposition 2.62, this reduced row-echelon form of $A$ must be an identity matrix. This proves the first implication.

Now suppose the reduced row-echelon form of $A$ is an identity matrix $I$. Then $I = EA$ for some product $E$ of elementary matrices. By Theorem 2.63, we can conclude that $A$ is invertible.

Theorem 2.65 corresponds to Algorithm 2.37, which claims that $A^{-1}$ is found by row reducing the augmented matrix $[A|I]$ to the form $[I|A^{-1}]$. This will be a matrix product $E[A|I]$ where $E$ is a product of elementary matrices. By the rules of matrix multiplication, we have that $E[A|I] = [EA|EI] = [EA|E]$.

It follows that the reduced row-echelon form of $[A|I]$ is $[EA|E]$, where $EA$ gives the reduced row-echelon form of $A$. By Theorem 2.65, if $EA \neq I$, then $A$ is not invertible, and if $EA = I$, $A$ is invertible. If $EA = I$, then by Theorem 2.63, $E = A^{-1}$. This proves that Algorithm 2.37 does in fact find $A^{-1}$.
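Two facts from this section lend themselves to a quick numerical check: the block identity $E[A|I] = [EA|E]$, and the failure of Theorem 2.63 for non square matrices. The sketch below is illustrative only; the helper names are chosen here, and the $2 \times 2$ matrices are the pair from Section 2.1.8, where $E$ plays the role of $A^{-1}$.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B (lists of rows)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    """Return the transpose of A."""
    return [list(col) for col in zip(*A)]

# E [A | I] = [EA | E], with E equal to the inverse of A.
A = [[1, 1], [1, 2]]
E = [[2, -1], [-1, 1]]
I2 = [[1, 0], [0, 1]]
augmented = [row_a + row_i for row_a, row_i in zip(A, I2)]   # the matrix [A | I]
print(matmul(E, augmented))   # [[1, 0, 2, -1], [0, 1, -1, 1]]

# For a non square matrix, one product can be an identity while the other is not.
N = [[1, 0], [0, 1], [0, 0]]
print(matmul(transpose(N), N))   # [[1, 0], [0, 1]]
print(matmul(N, transpose(N)))   # [[1, 0, 0], [0, 1, 0], [0, 0, 0]]
```

The left half of the first result is $EA = I$ and the right half is $E$ itself, which is exactly what the algorithm reads off as the inverse.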

2.1.11. Exercises

1. For the following pairs of matrices, determine if the sum $A + B$ is defined. If so, find the sum.

1 00 1(a) A =,B =0 11 0

1 0 32 1 2,B =(b) A =0 1 41 1 0

1 0271(c) A = 2 3 , B =0 344 22. For each matrix A, find the matrix A such that A + (A) = 0.

1 2(a) A =2 1

2 3(b) A =0 2

01 2(c) A = 1 1 3 42 0

3. In the context of Proposition 2.7, describe $-A$ and $0$.

4. For each matrix $A$, find the product $(-2)A$, $0A$, and $3A$.

1 2(a) A =2 1

2 3(b) A =0 2

01 2(c) A = 1 1 3 42 097

5. Using only the properties given in Proposition 2.7 and Proposition 2.10, show $-A$ is unique.

6. Using only the properties given in Proposition 2.7 and Proposition 2.10 show $0$ is unique.

7. Using only the properties given in Proposition 2.7 and Proposition 2.10 show $0A = 0$. Here the 0 on the left is the scalar 0 and the 0 on the right is the zero matrix of appropriate size.

8. Using only the properties given in Proposition 2.7 and Proposition 2.10, as well as previous problems show $(-1)A = -A$.

1 23 1 21 2 3,D =,C =,B =9. Consider the matrices A =3 132 12 1 7

122,E =.2 33Find the following if possible. If it is not possible explain why.(a) 3A

(b) 3B A(c) AC

(d) CB(e) AE(f) EA

1212252,D =10. Consider the matrices A = 32 ,B =,C =5 032 111

111,E =4 33Find the following if possible. If it is not possible explain why.(a) 3A

23. Write the system

24. A matrix $A$ is called idempotent if $A^2 = A$.

25. For each pair of matrices, find the (1, 2)-entry and (2, 3)-entry of the product AB.

1 2 14 6 20 ,B = 7 21 (a) A = 3 42 511 00

1 3 12 3 0

(b) A = 0 2 4 , B = 4 16 1 1 0 50 2 2

26. Suppose $A$ and $B$ are square matrices of the same size. Which of the following are necessarily true?
(a) $(A - B)^2 = A^2 - 2AB + B^2$
(b) $(AB)^2 = A^2 B^2$
(c) $(A + B)^2 = A^2 + 2AB + B^2$
(d) $(A + B)^2 = A^2 + AB + BA + B^2$
(e) $A^2 B^2 = A(AB)B$
(f) $(A + B)^3 = A^3 + 3A^2 B + 3AB^2 + B^3$
(g) $(A + B)(A - B) = A^2 - B^2$

121 22 5 2

32 ,B =,D =27. Consider the matrices A =,C =5 032 111

111,E =34 3Find the following if possible. If it is not possible explain why.(a) 3AT

(b) 3B AT(c) E T B

(d) $EE^T$
(e) $B^T B$
(f) $CA^T$
(g) $D^T BE$

28. Let $A$ be an $n \times n$ matrix. Show $A$ equals the sum of a symmetric and a skew symmetric matrix. Hint: Show that $\frac{1}{2}(A^T + A)$ is symmetric and then consider using this as one of the matrices.

29. Show that the main diagonal of every skew symmetric matrix consists of only zeros. Recall that the main diagonal consists of every entry of the matrix which is of the form $a_{ii}$.

30. Prove 3. That is, show that for an $m \times n$ matrix $A$, an $n \times p$ matrix $B$, and scalars $r, s$, the following holds:
\[
(rA + sB)^T = rA^T + sB^T
\]

31. Prove that $I_m A = A$ where $A$ is an $m \times n$ matrix.

32. Suppose $AB = AC$ and $A$ is an invertible $n \times n$ matrix. Does it follow that $B = C$? Explain why or why not.

33. Suppose $AB = AC$ and $A$ is a non invertible $n \times n$ matrix. Does it follow that $B = C$? Explain why or why not.

34. Give an example of a matrix $A$ such that $A^2 = I$ and yet $A \neq I$ and $A \neq -I$.

35. Let $A =$

2 11 3

Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, explain why.

101

36. LetA=

0 15 3

2 13 0

2 14 2

Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, explain why.

37. LetA=

Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, explain why.

38. LetA=

Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, explain why.

39. Let $A$ be a $2 \times 2$ invertible matrix, with $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Find a formula for $A^{-1}$ in terms of $a, b, c, d$.

40. Let

1 2 3A= 2 1 4 1 0 2

Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, explain why.

41. Let

1 0 3A= 2 3 4 1 0 2

Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, explain why.

42. Let

1 2 3A= 2 1 4 4 5 10

Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, explain why.

43. Let

1 1A= 21

20 212 0

1 3 2 21 2

Find $A^{-1}$ if possible. If $A^{-1}$ does not exist, explain why.

44. Using the inverse of the matrix, find the solution to the systems:

102

(a)

(b)

2 41 1

xy

2 41 1

xy

12

20

Now give the solution in terms of a and b to

2 4xa=1 1yb45. Using the inverse of the matrix, find the solution to the systems:(a)

(b)

1 0 3x1 2 3 4 y = 0 1 0 2z1

1 0 3x3 2 3 4 y = 1 1 0 2z2

Now give the solution in terms of

1 21

a, b, and c to the following:

0 3xa3 4 y = b 0 2zc

46. Show that if $A$ is an $n \times n$ invertible matrix and $X$ is an $n \times 1$ matrix such that $AX = B$ for $B$ an $n \times 1$ matrix, then $X = A^{-1}B$.

47. Prove that if $A^{-1}$ exists and $AX = 0$ then $X = 0$.

48. Show that if $A^{-1}$ exists for an $n \times n$ matrix, then it is unique. That is, if $BA = I$ and $AB = I$, then $B = A^{-1}$.

49. Show that if $A$ is an invertible $n \times n$ matrix, then so is $A^T$ and $(A^T)^{-1} = (A^{-1})^T$.

12 15 1 . Suppose a row operation is applied to A and the result57. Let A = 02 1 4

12 1B = 2 1 4 .05 1

is

(a) Find the elementary matrix E such that EA = B.

(b) Find the

158. Let A = 02

12B = 0 102 1

(a) Find the

(b) Find the

159. Let A = 02

12

05B=1 12

inverse of E, E 1 , such that E 1 B = A.

2 15 1 . Suppose a row operation is applied to A and the result is1 4

12 .4

elementary matrix E such that EA = B.

inverse of E, E 1 , such that E 1 B = A.

2 15 1 . Suppose a row operation is applied to A and the result is1 4

11 .2104

(a) Find the elementary matrix E such that EA = B.

(b) Find the

060. Let A =2

12

4B= 22 1

inverse of E, E 1 , such that E 1 B = A.

2 15 1 . Suppose a row operation is applied to A and the result is1 415 .4

(a) Find the elementary matrix E such that EA = B.

(b) Find the inverse of $E$, $E^{-1}$, such that $E^{-1}B = A$.


3. Determinants

3.1 Basic Techniques and Properties

Outcomes
A. Evaluate the determinant of a square matrix using either Laplace Expansion or row operations.
B. Demonstrate the effects that row operations have on determinants.
C. Verify the following:
(a) The determinant of a product of matrices is the product of the determinants.
(b) The determinant of a matrix is equal to the determinant of its transpose.

3.1.1. Cofactors and 2 × 2 Determinants

Let $A$ be an $n \times n$ matrix. That is, let $A$ be a square matrix. The determinant of $A$, denoted by $\det(A)$, is a very important number which we will explore throughout this section.

If $A$ is a $2 \times 2$ matrix, the determinant is given by the following formula.

Definition 3.1: Determinant of a Two By Two Matrix
Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. Then
\[
\det(A) = ad - cb
\]
The determinant is also often denoted by enclosing the matrix with two vertical lines. Thus
\[
\det \begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{vmatrix} a & b \\ c & d \end{vmatrix} = ad - bc
\]
The following is an example of finding the determinant of a $2 \times 2$ matrix.


Example 3.2: A Two by Two Determinant
Find $\det(A)$ for the matrix $A = \begin{bmatrix} 2 & 4 \\ -1 & 6 \end{bmatrix}$.

Solution. From Definition 3.1,
\[
\det(A) = (2)(6) - (-1)(4) = 12 + 4 = 16
\]

The $2 \times 2$ determinant can be used to find the determinant of larger matrices. We will now explore how to find the determinant of a $3 \times 3$ matrix, using several tools including the $2 \times 2$ determinant.

We begin with the following definition.

Definition 3.3: The ij-th Minor of a Matrix
Let $A$ be a $3 \times 3$ matrix. The $ij$th minor of $A$, denoted as $\text{minor}(A)_{ij}$, is the determinant of the $2 \times 2$ matrix which results from deleting the $i$th row and the $j$th column of $A$.
In general, if $A$ is an $n \times n$ matrix, then the $ij$th minor of $A$ is the determinant of the $(n-1) \times (n-1)$ matrix which results from deleting the $i$th row and the $j$th column of $A$.

Hence, there is a minor associated with each entry of $A$. Consider the following example which demonstrates this definition.

Example 3.4: Finding Minors of a Matrix
Let
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 3 & 2 \\ 3 & 2 & 1 \end{bmatrix}
\]
Find $\text{minor}(A)_{12}$ and $\text{minor}(A)_{23}$.

Solution. First we will find $\text{minor}(A)_{12}$. By Definition 3.3, this is the determinant of the $2 \times 2$ matrix which results when you delete the first row and the second column. This minor is given by
\[
\text{minor}(A)_{12} = \det \begin{bmatrix} 4 & 2 \\ 3 & 1 \end{bmatrix}
\]
Using Definition 3.1, we see that
\[
\det \begin{bmatrix} 4 & 2 \\ 3 & 1 \end{bmatrix} = (4)(1) - (3)(2) = 4 - 6 = -2
\]
Therefore $\text{minor}(A)_{12} = -2$.

Similarly, $\text{minor}(A)_{23}$ is the determinant of the $2 \times 2$ matrix which results when you delete the second row and the third column. This minor is therefore
\[
\text{minor}(A)_{23} = \det \begin{bmatrix} 1 & 2 \\ 3 & 2 \end{bmatrix} = -4
\]
Finding the other minors of $A$ is left as an exercise.

The $ij$th minor of a matrix $A$ is used in another important definition, given next.

Definition 3.5: The ij-th Cofactor of a Matrix
Suppose $A$ is an $n \times n$ matrix. The $ij$th cofactor, denoted by $\text{cof}(A)_{ij}$, is defined to be
\[
\text{cof}(A)_{ij} = (-1)^{i+j} \, \text{minor}(A)_{ij}
\]

It is also convenient to refer to the cofactor of an entry of a matrix as follows. If $a_{ij}$ is the $ij$th entry of the matrix, then its cofactor is just $\text{cof}(A)_{ij}$.

Example 3.6: Finding Cofactors of a Matrix
Consider the matrix
\[
A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 3 & 2 \\ 3 & 2 & 1 \end{bmatrix}
\]
Find $\text{cof}(A)_{12}$ and $\text{cof}(A)_{23}$.

Solution. We will use Definition 3.5 to compute these cofactors.

First, we will compute $\text{cof}(A)_{12}$. Therefore, we need to find $\text{minor}(A)_{12}$. This is the determinant of the $2 \times 2$ matrix which results when you delete the first row and the second column. Thus $\text{minor}(A)_{12}$ is given by
\[
\det \begin{bmatrix} 4 & 2 \\ 3 & 1 \end{bmatrix} = -2
\]
Then,
\[
\text{cof}(A)_{12} = (-1)^{1+2} \, \text{minor}(A)_{12} = (-1)^{1+2} (-2) = 2
\]
Hence, $\text{cof}(A)_{12} = 2$.

Similarly, we can find $\text{cof}(A)_{23}$. First, find $\text{minor}(A)_{23}$, which is the determinant of the $2 \times 2$ matrix which results when you delete the second row and the third column. This minor is therefore
\[
\det \begin{bmatrix} 1 & 2 \\ 3 & 2 \end{bmatrix} = -4
\]
Hence,
\[
\text{cof}(A)_{23} = (-1)^{2+3} \, \text{minor}(A)_{23} = (-1)^{2+3} (-4) = 4
\]
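The minors and cofactors of Examples 3.4 and 3.6 can be reproduced with a short sketch. The function names `det2`, `minor`, and `cofactor` are chosen here for illustration; the code is written for $3 \times 3$ matrices only and takes row and column numbers starting from 1, as in the text.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix, following Definition 3.1."""
    return M[0][0] * M[1][1] - M[1][0] * M[0][1]

def minor(A, i, j):
    """ij-th minor of a 3 x 3 matrix A: delete row i and column j (numbered from 1)."""
    sub = [[A[r][c] for c in range(3) if c != j - 1]
           for r in range(3) if r != i - 1]
    return det2(sub)

def cofactor(A, i, j):
    """ij-th cofactor of a 3 x 3 matrix A, following Definition 3.5."""
    return (-1) ** (i + j) * minor(A, i, j)

A = [[1, 2, 3],
     [4, 3, 2],
     [3, 2, 1]]

print(minor(A, 1, 2), minor(A, 2, 3))        # -2 -4
print(cofactor(A, 1, 2), cofactor(A, 2, 3))  # 2 4
```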

109
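The minors and cofactors of Examples 3.4 and 3.6 can be checked with a short script. This is only a sketch: the helper names `submatrix`, `det2`, `minor` and `cofactor` are illustrative choices, not notation from the text, and `det2` assumes the usual $2 \times 2$ formula $ad - cb$ for Definition 3.1.

```python
def submatrix(A, i, j):
    """Return A with row i and column j deleted (1-indexed, as in the text)."""
    return [row[:j - 1] + row[j:] for k, row in enumerate(A, start=1) if k != i]


def det2(M):
    """Determinant of a 2 x 2 matrix [[a, b], [c, d]], computed as ad - cb."""
    return M[0][0] * M[1][1] - M[1][0] * M[0][1]


def minor(A, i, j):
    """ij-th minor of a 3 x 3 matrix: determinant after deleting row i, column j."""
    return det2(submatrix(A, i, j))


def cofactor(A, i, j):
    """ij-th cofactor: the minor with the sign (-1)^(i+j) attached."""
    return (-1) ** (i + j) * minor(A, i, j)


A = [[1, 2, 3], [4, 3, 2], [3, 2, 1]]
print(minor(A, 1, 2), cofactor(A, 1, 2))  # minor and cofactor of entry (1, 2)
print(minor(A, 2, 3), cofactor(A, 2, 3))  # minor and cofactor of entry (2, 3)
```

Looping `i` and `j` over 1, 2, 3 gives the remaining minors and cofactors of this matrix.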

You may wish to find the remaining cofactors for the above matrix. Remember that there is a cofactor for every entry in the matrix.

We have now established the tools we need to find the determinant of a $3 \times 3$ matrix.

Definition 3.7: The Determinant of a Three By Three Matrix
Let $A$ be a $3 \times 3$ matrix. Then, $\det(A)$ is calculated by picking a row (or column) and taking the product of each entry in that row (column) with its cofactor and adding these products together.
This process when applied to the $i^{th}$ row (column) is known as expanding along the $i^{th}$ row (column). For the $i^{th}$ row it is given by
\[ \det(A) = a_{i1}\operatorname{cof}(A)_{i1} + a_{i2}\operatorname{cof}(A)_{i2} + a_{i3}\operatorname{cof}(A)_{i3} \]

When calculating the determinant, you can choose to expand any row or any column. Regardless of your choice, you will always get the same number which is the determinant of the matrix $A$. This method of evaluating a determinant by expanding along a row or a column is called Laplace Expansion or Cofactor Expansion.

Consider the following example.

Example 3.8: Finding the Determinant of a Three by Three Matrix
Let
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 3 & 2 \\ 3 & 2 & 1 \end{bmatrix} \]
Find $\det(A)$ using the method of Laplace Expansion.

Solution. First, we will calculate $\det(A)$ by expanding along the first column. Using Definition 3.7, we take the 1 in the first column and multiply it by its cofactor,
\[ 1(-1)^{1+1} \begin{vmatrix} 3 & 2 \\ 2 & 1 \end{vmatrix} = (1)(1)(-1) = -1 \]
Similarly, we take the 4 in the first column and multiply it by its cofactor, as well as with the 3 in the first column. Finally, we add these numbers together, as given in the following equation.
\[ \det(A) = 1\overbrace{(-1)^{1+1} \begin{vmatrix} 3 & 2 \\ 2 & 1 \end{vmatrix}}^{\operatorname{cof}(A)_{11}} + 4\overbrace{(-1)^{2+1} \begin{vmatrix} 2 & 3 \\ 2 & 1 \end{vmatrix}}^{\operatorname{cof}(A)_{21}} + 3\overbrace{(-1)^{3+1} \begin{vmatrix} 2 & 3 \\ 3 & 2 \end{vmatrix}}^{\operatorname{cof}(A)_{31}} \]
Calculating each of these, we obtain
\[ \det(A) = 1(1)(-1) + 4(-1)(-4) + 3(1)(-5) = -1 + 16 - 15 = 0 \]
Hence, $\det(A) = 0$.

As mentioned in Definition 3.7, we can choose to expand along any row or column. Let's try now by expanding along the second row. Here, we take the 4 in the second row and multiply it by its cofactor, then add this to the 3 in the second row multiplied by its cofactor, and the 2 in the second row multiplied by its cofactor. The calculation is as follows.
\[ \det(A) = 4\overbrace{(-1)^{2+1} \begin{vmatrix} 2 & 3 \\ 2 & 1 \end{vmatrix}}^{\operatorname{cof}(A)_{21}} + 3\overbrace{(-1)^{2+2} \begin{vmatrix} 1 & 3 \\ 3 & 1 \end{vmatrix}}^{\operatorname{cof}(A)_{22}} + 2\overbrace{(-1)^{2+3} \begin{vmatrix} 1 & 2 \\ 3 & 2 \end{vmatrix}}^{\operatorname{cof}(A)_{23}} \]
Calculating each of these products, we obtain
\[ \det(A) = 4(-1)(-4) + 3(1)(-8) + 2(-1)(-4) = 16 - 24 + 8 = 0 \]

You can see that for both methods, we obtained $\det(A) = 0$.

As mentioned above, we will always come up with the same value for $\det(A)$ regardless of the row or column we choose to expand along. You should try to compute the above determinant by expanding along other rows and columns. This is a good way to check your work, because you should come up with the same number each time!

We present this idea formally in the following theorem.

Theorem 3.9: The Determinant is Well Defined
Expanding the $n \times n$ matrix along any row or column always gives the same answer, which is the determinant.

We have now looked at the determinant of $2 \times 2$ and $3 \times 3$ matrices. It turns out that the method used to calculate the determinant of a $3 \times 3$ matrix can be used to calculate the determinant of any sized matrix. Notice that Definition 3.3, Definition 3.5 and Definition 3.7 can all be applied to a matrix of any size.

For example, the $ij^{th}$ minor of a $4 \times 4$ matrix is the determinant of the $3 \times 3$ matrix you obtain when you delete the $i^{th}$ row and the $j^{th}$ column. Just as with the $3 \times 3$ determinant, we can compute the determinant of a $4 \times 4$ matrix by Laplace Expansion, along any row or column.

Consider the following example.

Example 3.10: Determinant of a Four by Four Matrix
Find $\det(A)$ where
\[ A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 4 & 2 & 3 \\ 1 & 3 & 4 & 5 \\ 3 & 4 & 3 & 2 \end{bmatrix} \]

Solution. As in the case of a $3 \times 3$ matrix, you can expand this along any row or column. Let's pick the third column. Then, using Laplace Expansion,
\[ \det(A) = 3(-1)^{1+3} \begin{vmatrix} 5 & 4 & 3 \\ 1 & 3 & 5 \\ 3 & 4 & 2 \end{vmatrix} + 2(-1)^{2+3} \begin{vmatrix} 1 & 2 & 4 \\ 1 & 3 & 5 \\ 3 & 4 & 2 \end{vmatrix} + 4(-1)^{3+3} \begin{vmatrix} 1 & 2 & 4 \\ 5 & 4 & 3 \\ 3 & 4 & 2 \end{vmatrix} + 3(-1)^{4+3} \begin{vmatrix} 1 & 2 & 4 \\ 5 & 4 & 3 \\ 1 & 3 & 5 \end{vmatrix} \]
Now, you can calculate each $3 \times 3$ determinant using Laplace Expansion, as we did above. You should complete these as an exercise and verify that $\det(A) = -12$.
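Laplace Expansion translates naturally into a recursive function, which can also be used to observe that every row and every column gives the same value. The following is a minimal sketch, not an efficient algorithm; the names `det_along_row` and `det_along_col` are illustrative, and indices are 0-based here (the sign $(-1)^{i+j}$ is the same as with 1-based indices).

```python
def submatrix(A, i, j):
    """Delete row i and column j of A (0-indexed)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det_along_row(A, i=0):
    """Laplace expansion of det(A) along row i."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (i + j) * A[i][j] * det_along_row(submatrix(A, i, j))
               for j in range(n))


def det_along_col(A, j=0):
    """Laplace expansion of det(A) along column j."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** (i + j) * A[i][j] * det_along_row(submatrix(A, i, j))
               for i in range(n))


A3 = [[1, 2, 3], [4, 3, 2], [3, 2, 1]]                          # Example 3.8
A4 = [[1, 2, 3, 4], [5, 4, 2, 3], [1, 3, 4, 5], [3, 4, 3, 2]]   # Example 3.10

print([det_along_row(A3, i) for i in range(3)])  # one value per row
print([det_along_col(A4, j) for j in range(4)])  # one value per column
```

Each list should contain a single repeated number, in line with Theorem 3.9.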

The following provides a formal definition for the determinant of an $n \times n$ matrix. You may wish to take a moment and consider the above definitions for $2 \times 2$ and $3 \times 3$ determinants in context of this definition.

Definition 3.11: The Determinant of an $n \times n$ Matrix
Let $A$ be an $n \times n$ matrix where $n \geq 2$ and suppose the determinant of an $(n-1) \times (n-1)$ matrix has been defined. Then
\[ \det(A) = \sum_{j=1}^{n} a_{ij} \operatorname{cof}(A)_{ij} = \sum_{i=1}^{n} a_{ij} \operatorname{cof}(A)_{ij} \]
The first formula consists of expanding the determinant along the $i^{th}$ row and the second expands the determinant along the $j^{th}$ column.

In the following sections, we will explore some important properties and characteristics of the determinant.

3.1.2. The Determinant of a Triangular Matrix

There is a certain type of matrix for which finding the determinant is a very simple procedure. Consider the following definition.

Definition 3.12: Triangular Matrices
A matrix $A$ is upper triangular if $a_{ij} = 0$ whenever $i > j$. Thus the entries of such a matrix below the main diagonal equal 0, as shown. Here, $*$ refers to an entry that may be nonzero.
\[ \begin{bmatrix} * & * & \cdots & * \\ 0 & * & \cdots & \vdots \\ \vdots & \vdots & \ddots & * \\ 0 & \cdots & 0 & * \end{bmatrix} \]
A lower triangular matrix is defined similarly as a matrix for which all entries above the main diagonal are equal to zero.

The following theorem provides a useful way to calculate the determinant of a triangular matrix.

Theorem 3.13: Determinant of a Triangular Matrix
Let $A$ be an upper or lower triangular matrix. Then $\det(A)$ is obtained by taking the product of the entries on the main diagonal.

The verification of this Theorem can be done by computing the determinant using Laplace Expansion along the first row or column.

Consider the following example.

Example 3.14: Determinant of a Triangular Matrix
Let
\[ A = \begin{bmatrix} 1 & 2 & 3 & 77 \\ 0 & 2 & 6 & 7 \\ 0 & 0 & 3 & 33.7 \\ 0 & 0 & 0 & -1 \end{bmatrix} \]
Find $\det(A)$.

Solution. From Theorem 3.13, it suffices to take the product of the elements on the main diagonal. Thus $\det(A) = 1 \times 2 \times 3 \times (-1) = -6$.

Without using Theorem 3.13, you could use Laplace Expansion. We will expand along the first column. This gives
\[ \det(A) = 1 \begin{vmatrix} 2 & 6 & 7 \\ 0 & 3 & 33.7 \\ 0 & 0 & -1 \end{vmatrix} + 0(-1)^{2+1} \begin{vmatrix} 2 & 3 & 77 \\ 0 & 3 & 33.7 \\ 0 & 0 & -1 \end{vmatrix} + 0(-1)^{3+1} \begin{vmatrix} 2 & 3 & 77 \\ 2 & 6 & 7 \\ 0 & 0 & -1 \end{vmatrix} + 0(-1)^{4+1} \begin{vmatrix} 2 & 3 & 77 \\ 2 & 6 & 7 \\ 0 & 3 & 33.7 \end{vmatrix} \]
and the only nonzero term in the expansion is
\[ 1 \begin{vmatrix} 2 & 6 & 7 \\ 0 & 3 & 33.7 \\ 0 & 0 & -1 \end{vmatrix} \]
Now find the determinant of this $3 \times 3$ matrix, by expanding along the first column to obtain
\[ \det(A) = 1 \times \left( 2 \times \begin{vmatrix} 3 & 33.7 \\ 0 & -1 \end{vmatrix} + 0(-1)^{2+1} \begin{vmatrix} 6 & 7 \\ 0 & -1 \end{vmatrix} + 0(-1)^{3+1} \begin{vmatrix} 6 & 7 \\ 3 & 33.7 \end{vmatrix} \right) = 1 \times 2 \times \begin{vmatrix} 3 & 33.7 \\ 0 & -1 \end{vmatrix} \]
Next use Definition 3.1 to find the determinant of this $2 \times 2$ matrix, which is just $3 \times (-1) - 0 \times 33.7 = -3$. Putting all these steps together, we have
\[ \det(A) = 1 \times 2 \times 3 \times (-1) = -6 \]
which is just the product of the entries down the main diagonal of the original matrix!

You can see that while both methods result in the same answer, Theorem 3.13 provides a much quicker method.

In the next section, we explore some important properties of determinants.
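Theorem 3.13 reduces the determinant of a triangular matrix to a single product, which is easy to express in code. The sketch below uses illustrative function names and assumes Python 3.8 or later for `math.prod`; it checks that the matrix really is triangular before multiplying the diagonal entries.

```python
import math


def is_upper_triangular(A):
    """True if every entry below the main diagonal is zero."""
    return all(A[i][j] == 0 for i in range(len(A)) for j in range(i))


def is_lower_triangular(A):
    """True if every entry above the main diagonal is zero."""
    n = len(A)
    return all(A[i][j] == 0 for i in range(n) for j in range(i + 1, n))


def det_triangular(A):
    """Determinant of a triangular matrix: the product of its diagonal entries."""
    if not (is_upper_triangular(A) or is_lower_triangular(A)):
        raise ValueError("matrix is not triangular")
    return math.prod(A[i][i] for i in range(len(A)))


# The matrix of Example 3.14.
A = [[1, 2, 3, 77],
     [0, 2, 6, 7],
     [0, 0, 3, 33.7],
     [0, 0, 0, -1]]
print(det_triangular(A))
```

Only the four diagonal entries are multiplied, so the off-diagonal entries such as 77 and 33.7 never enter the computation.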

3.1.3. Properties of Determinants

There are many important properties of determinants. Since many of these properties involve the row operations discussed in Chapter 1, we recall that definition now.

Definition 3.15: Row Operations
The row operations consist of the following
1. Switch two rows.
2. Multiply a row by a nonzero number.
3. Replace a row by a multiple of another row added to itself.

We will now consider the effect of row operations on the determinant of a matrix. In future sections, we will see that using the following properties can greatly assist in finding determinants.

The first theorem explains the effect on the determinant of a matrix when two rows are switched.

Theorem 3.16: Switching Rows
Let $A$ be an $n \times n$ matrix and let $B$ be a matrix which results from switching two rows of $A$. Then $\det(B) = -\det(A)$.

When we switch two rows of a matrix, the determinant is multiplied by $-1$. Consider the following example.

Example 3.17: Switching Two Rows

Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and let $B = \begin{bmatrix} 3 & 4 \\ 1 & 2 \end{bmatrix}$. Knowing that $\det(A) = -2$, find $\det(B)$.

Solution. By Definition 3.1, $\det(A) = 1 \times 4 - 3 \times 2 = -2$. Notice that the rows of $B$ are the rows of $A$ but switched. By Theorem 3.16 since two rows of $A$ have been switched, $\det(B) = -\det(A) = -(-2) = 2$. You can verify this using Definition 3.1.

The next theorem demonstrates the effect on the determinant of a matrix when we multiply a row by a scalar.

Theorem 3.18: Multiplying a Row by a Scalar
Let $A$ be an $n \times n$ matrix and let $B$ be a matrix which results from multiplying some row of $A$ by a scalar $k$. Then $\det(B) = k \det(A)$.

Notice that this theorem is true when we multiply one row of the matrix by $k$. If we were to multiply two rows of $A$ by $k$ to obtain $B$, we would have $\det(B) = k^2 \det(A)$. Suppose we were to multiply all $n$ rows of $A$ by $k$ to obtain the matrix $B$, so that $B = kA$. Then, $\det(B) = k^n \det(A)$. This gives the next theorem.

Theorem 3.19: Scalar Multiplication
Let $A$ and $B$ be $n \times n$ matrices and $k$ a scalar, such that $B = kA$. Then $\det(B) = k^n \det(A)$.

Consider the following example.

Example 3.20: Multiplying a Row by 5

Now, let's compute $\det(B)$ using Theorem 3.18 and see if we obtain the same answer. Notice that the first row of $B$ is 5 times the first row of $A$, while the second row of $B$ is equal to the second row of $A$. By Theorem 3.18, $\det(B) = 5 \times \det(A) = 5 \times (-2) = -10$. You can see that this matches our answer above.

Finally, consider the next theorem for the last row operation, that of adding a multiple of a row to another row.

Theorem 3.21: Adding a Multiple of a Row to Another Row
Let $A$ be an $n \times n$ matrix and let $B$ be a matrix which results from adding a multiple of a row to another row. Then $\det(A) = \det(B)$.

Therefore, when we add a multiple of a row to another row, the determinant of the matrix is unchanged. Note that if a matrix $A$ contains a row which is a multiple of another row, $\det(A)$ will equal 0. To see this, suppose the first row of $A$ is equal to $-1$ times the second row. By Theorem 3.21, we can add the first row to the second row, and the determinant will be unchanged. However, this row operation will result in a row of zeros. Using Laplace Expansion along the row of zeros, we find that the determinant is 0.

Consider the following example.

Example 3.22: Adding a Row to Another Row
Let $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and let $B = \begin{bmatrix} 1 & 2 \\ 5 & 8 \end{bmatrix}$. Find $\det(B)$.

Solution. By Definition 3.1, $\det(A) = -2$. Notice that the second row of $B$ is two times the first row of $A$ added to the second row. By Theorem 3.21, $\det(B) = \det(A) = -2$. As usual, you can verify this answer using Definition 3.1.

Example 3.23: Multiple of a Row
Let $A = \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}$. Show that $\det(A) = 0$.

Solution. Using Definition 3.1, the determinant is given by
\[ \det(A) = 1 \times 4 - 2 \times 2 = 0 \]
However notice that the second row is equal to 2 times the first row. Then by the discussion above following Theorem 3.21 the determinant will equal 0.

Until now, our focus has primarily been on row operations. However, we can carry out the same operations with columns, rather than rows. The three operations outlined in Definition 3.15 can be done with columns instead of rows. In this case, in Theorems 3.16, 3.18, and 3.21 you can replace the word "row" with the word "column".
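The effect of each row operation on the determinant can be observed directly on a $2 \times 2$ matrix. The sketch below starts from the matrix with rows (1, 2) and (3, 4) and applies one operation at a time; the variable names are illustrative, and `det2` assumes the usual formula $ad - cb$ for Definition 3.1.

```python
def det2(M):
    """Determinant of a 2 x 2 matrix [[a, b], [c, d]], computed as ad - cb."""
    return M[0][0] * M[1][1] - M[1][0] * M[0][1]


A = [[1, 2], [3, 4]]

swapped = [A[1], A[0]]                                       # switch the two rows
scaled = [[5 * x for x in A[0]], A[1]]                       # multiply row 1 by 5
added = [A[0], [2 * a + b for a, b in zip(A[0], A[1])]]      # add 2 * row 1 to row 2
tripled = [[3 * x for x in row] for row in A]                # the matrix 3A

print(det2(A))        # determinant of the original matrix
print(det2(swapped))  # sign flips (Theorem 3.16)
print(det2(scaled))   # multiplied by 5 (Theorem 3.18)
print(det2(added))    # unchanged (Theorem 3.21)
print(det2(tripled))  # multiplied by 3**2 (Theorem 3.19 with n = 2)
```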

There are several other major properties of determinants which do not involve row (or column) operations. The first is the determinant of a product of matrices.

Theorem 3.24: Determinant of a Product
Let $A$ and $B$ be two $n \times n$ matrices. Then
\[ \det(AB) = \det(A)\det(B) \]

In order to find the determinant of a product of matrices, we can simply take the product of the determinants.

Consider the following example.

Example 3.25: The Determinant of a Product
Compare $\det(AB)$ and $\det(A)\det(B)$ for
\[ A = \begin{bmatrix} 1 & 2 \\ -3 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & 2 \\ 4 & 1 \end{bmatrix} \]

Solution. First compute $AB$, which is given by
\[ AB = \begin{bmatrix} 1 & 2 \\ -3 & 2 \end{bmatrix} \begin{bmatrix} 3 & 2 \\ 4 & 1 \end{bmatrix} = \begin{bmatrix} 11 & 4 \\ -1 & -4 \end{bmatrix} \]
and so by Definition 3.1
\[ \det(AB) = \det \begin{bmatrix} 11 & 4 \\ -1 & -4 \end{bmatrix} = -40 \]
Now
\[ \det(A) = \det \begin{bmatrix} 1 & 2 \\ -3 & 2 \end{bmatrix} = 8 \]
and
\[ \det(B) = \det \begin{bmatrix} 3 & 2 \\ 4 & 1 \end{bmatrix} = -5 \]
Computing $\det(A) \times \det(B)$ we have $8 \times (-5) = -40$. This is the same answer as above and you can see that $\det(A)\det(B) = 8 \times (-5) = -40 = \det(AB)$.

Consider the next important property.

Theorem 3.26: Determinant of the Transpose
Let $A$ be a matrix where $A^T$ is the transpose of $A$. Then,
\[ \det(A^T) = \det(A) \]

This theorem is illustrated in the following example.

Example 3.27: Determinant of the Transpose
Let
\[ A = \begin{bmatrix} 2 & 5 \\ 4 & 3 \end{bmatrix} \]
Find $\det(A^T)$.

Solution. First, note that
\[ A^T = \begin{bmatrix} 2 & 4 \\ 5 & 3 \end{bmatrix} \]
Using Definition 3.1, we can compute $\det(A)$ and $\det(A^T)$. It follows that $\det(A) = 2 \times 3 - 4 \times 5 = -14$ and $\det(A^T) = 2 \times 3 - 5 \times 4 = -14$. Hence, $\det(A) = \det(A^T)$.

The following provides an essential property of the determinant, as well as a useful way to determine if a matrix is invertible.

Theorem 3.28: Determinant of the Inverse
Let $A$ be an $n \times n$ matrix. Then $A$ is invertible if and only if $\det(A) \neq 0$. In this case,
\[ \det(A^{-1}) = \frac{1}{\det(A)} \]

Consider the following example.

Example 3.29: Determinant of an Invertible Matrix
Let $A = \begin{bmatrix} 3 & 6 \\ 2 & 4 \end{bmatrix}$, $B = \begin{bmatrix} 2 & 3 \\ 5 & 1 \end{bmatrix}$. For each matrix, determine if it is invertible. If so, find the determinant of the inverse.

Solution. Consider the matrix $A$ first. Using Definition 3.1 we can find the determinant as follows:
\[ \det(A) = 3 \times 4 - 2 \times 6 = 12 - 12 = 0 \]
By Theorem 3.28 $A$ is not invertible.

Now consider the matrix $B$. Again by Definition 3.1 we have
\[ \det(B) = 2 \times 1 - 5 \times 3 = 2 - 15 = -13 \]
By Theorem 3.28 $B$ is invertible and the determinant of the inverse is given by
\[ \det(B^{-1}) = \frac{1}{\det(B)} = \frac{1}{-13} = -\frac{1}{13} \]
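Theorems 3.24, 3.26 and 3.28 can be spot-checked numerically on the matrices used in the examples above. The following sketch uses exact fractions so that the inverse has no rounding error. The helper names are illustrative, and `inverse2` relies on the standard closed form for a $2 \times 2$ inverse, which is an assumption brought in for the check rather than something established at this point in the text.

```python
from fractions import Fraction


def det2(M):
    """Determinant of a 2 x 2 matrix [[a, b], [c, d]], computed as ad - cb."""
    return M[0][0] * M[1][1] - M[1][0] * M[0][1]


def matmul2(A, B):
    """Product of two 2 x 2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(M):
    return [list(col) for col in zip(*M)]


def inverse2(M):
    """Inverse of a 2 x 2 matrix via (1/det) * [[d, -b], [-c, a]]."""
    d = Fraction(det2(M))
    if d == 0:
        raise ValueError("matrix is not invertible")
    (a, b), (c, e) = M
    return [[e / d, -b / d], [-c / d, a / d]]


A = [[1, 2], [-3, 2]]
B = [[3, 2], [4, 1]]
print(det2(matmul2(A, B)), det2(A) * det2(B))  # product rule

T = [[2, 5], [4, 3]]
print(det2(T), det2(transpose(T)))             # transpose rule

C = [[2, 3], [5, 1]]
print(det2(inverse2(C)), 1 / Fraction(det2(C)))  # inverse rule
```

Each printed pair should consist of two equal numbers.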

3.1.4. Finding Determinants using Row Operations

Theorems 3.16, 3.18 and 3.21 illustrate how row operations affect the determinant of a matrix. In this section, we look at two examples where row operations are used to find the determinant of a large matrix. Recall that when working with large matrices, Laplace Expansion is effective but time-consuming, as there are many steps involved. This section provides useful tools for an alternative method. By first applying row operations, we can obtain a simpler matrix to which we apply Laplace Expansion.

While working through questions such as these, it is useful to record your row operations as you go along. Keep this in mind as you read through the next example.

Example 3.30: Finding a Determinant
Find the determinant of the matrix
\[ A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 5 & 1 & 2 & 3 \\ 4 & 5 & 4 & 3 \\ 2 & 2 & -4 & 5 \end{bmatrix} \]

Solution. We will use the properties of determinants outlined above to find $\det(A)$. First, add $-5$ times the first row to the second row. Then add $-4$ times the first row to the third row, and $-2$ times the first row to the fourth row. This yields the matrix
\[ B = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & -9 & -13 & -17 \\ 0 & -3 & -8 & -13 \\ 0 & -2 & -10 & -3 \end{bmatrix} \]
Notice that the only row operation we have done so far is adding a multiple of a row to another row. Therefore, by Theorem 3.21, $\det(B) = \det(A)$.

At this stage, you could use Laplace Expansion to find $\det(B)$. However, we will continue with row operations to find an even simpler matrix to work with.

Add $-3$ times the third row to the second row. By Theorem 3.21 this does not change the value of the determinant. Then, multiply the fourth row by $-3$. This results in the matrix
\[ C = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 0 & 11 & 22 \\ 0 & -3 & -8 & -13 \\ 0 & 6 & 30 & 9 \end{bmatrix} \]
Here, $\det(C) = -3\det(B)$, which means that $\det(B) = \left(-\frac{1}{3}\right)\det(C)$.

Since $\det(A) = \det(B)$, we now have that $\det(A) = \left(-\frac{1}{3}\right)\det(C)$. Again, you could use Laplace Expansion here to find $\det(C)$. However, we will continue with row operations.

Now add 2 times the third row to the fourth row. This does not change the value of the determinant by Theorem 3.21. Finally switch the third and second rows. This causes the determinant to be multiplied by $-1$. Thus $\det(C) = -\det(D)$ where
\[ D = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & -3 & -8 & -13 \\ 0 & 0 & 11 & 22 \\ 0 & 0 & 14 & -17 \end{bmatrix} \]
Hence, $\det(A) = \left(-\frac{1}{3}\right)\det(C) = \left(\frac{1}{3}\right)\det(D)$.

You could do more row operations or you could note that this can be easily expanded along the first column. Then, expand the resulting $3 \times 3$ matrix also along the first column. This results in
\[ \det(D) = 1(-3) \begin{vmatrix} 11 & 22 \\ 14 & -17 \end{vmatrix} = 1485 \]
and so $\det(A) = \left(\frac{1}{3}\right)(1485) = 495$.

You can see that by using row operations, we can simplify a matrix to the point where Laplace Expansion involves only a few steps. In Example 3.30, we also could have continued until the matrix was in upper triangular form, and taken the product of the entries on the main diagonal. Whenever computing the determinant, it is useful to consider all the possible methods and tools.

Consider the next example.

Example 3.31: Find the Determinant
Find the determinant of the matrix
\[ A = \begin{bmatrix} 1 & 2 & 3 & 2 \\ 1 & -3 & 2 & 1 \\ 2 & 1 & 2 & 5 \\ 3 & -4 & 1 & 2 \end{bmatrix} \]

Solution. Once again, we will simplify the matrix through row operations. Add $-1$ times the first row to the second row. Next add $-2$ times the first row to the third and finally take $-3$ times the first row and add to the fourth row. This yields
\[ B = \begin{bmatrix} 1 & 2 & 3 & 2 \\ 0 & -5 & -1 & -1 \\ 0 & -3 & -4 & 1 \\ 0 & -10 & -8 & -4 \end{bmatrix} \]
By Theorem 3.21, $\det(A) = \det(B)$.

Remember you can work with the columns also. Take $-5$ times the fourth column and add to the second column. This yields
\[ C = \begin{bmatrix} 1 & -8 & 3 & 2 \\ 0 & 0 & -1 & -1 \\ 0 & -8 & -4 & 1 \\ 0 & 10 & -8 & -4 \end{bmatrix} \]
By Theorem 3.21 $\det(A) = \det(C)$.

Now take $-1$ times the third row and add to the top row. This gives
\[ D = \begin{bmatrix} 1 & 0 & 7 & 1 \\ 0 & 0 & -1 & -1 \\ 0 & -8 & -4 & 1 \\ 0 & 10 & -8 & -4 \end{bmatrix} \]
which by Theorem 3.21 has the same determinant as $A$.

Now, we can find $\det(D)$ by expanding along the first column as follows. You can see that there will be only one nonzero term.
\[ \det(D) = 1 \det \begin{bmatrix} 0 & -1 & -1 \\ -8 & -4 & 1 \\ 10 & -8 & -4 \end{bmatrix} + 0 + 0 + 0 \]
Expanding again along the first column, we have
\[ \det(D) = 1 \left( 0 + 8 \det \begin{bmatrix} -1 & -1 \\ -8 & -4 \end{bmatrix} + 10 \det \begin{bmatrix} -1 & -1 \\ -4 & 1 \end{bmatrix} \right) = -82 \]
Now since $\det(A) = \det(D)$, it follows that $\det(A) = -82$.

Remember that you can verify these answers by using Laplace Expansion on $A$. Similarly, if you first compute the determinant using Laplace Expansion, you can use the row operation method to verify.
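The row-operation strategy of Examples 3.30 and 3.31 can be automated. The sketch below is one possible systematic version, not the exact sequence of steps used in the examples: it reduces the matrix to upper triangular form using only row switches (each flips the sign, Theorem 3.16) and additions of a multiple of one row to another (no change, Theorem 3.21), then multiplies the diagonal entries (Theorem 3.13). The function name is illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def det_by_row_reduction(A):
    """Determinant via reduction to upper triangular form."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    sign = 1
    for c in range(n):
        # Find a row at or below position c with a nonzero entry in column c.
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)          # no pivot in this column: determinant is 0
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign                # switching two rows flips the sign
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            # Adding a multiple of row c to row r leaves the determinant unchanged.
            M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    result = Fraction(sign)
    for i in range(n):
        result *= M[i][i]               # product of the diagonal entries
    return result


A30 = [[1, 2, 3, 4], [5, 1, 2, 3], [4, 5, 4, 3], [2, 2, -4, 5]]     # Example 3.30
A31 = [[1, 2, 3, 2], [1, -3, 2, 1], [2, 1, 2, 5], [3, -4, 1, 2]]    # Example 3.31
print(det_by_row_reduction(A30))
print(det_by_row_reduction(A31))
```

Because this version never multiplies a row by a scalar, there is no factor such as $\frac{1}{3}$ to keep track of; only the number of row switches matters.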

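3.1.5. Exercises

1. Find the determinants of the following matrices.
(a) $\begin{bmatrix} 1 & 3 \\ 0 & 2 \end{bmatrix}$
(b) $\begin{bmatrix} 0 & 3 \\ 0 & 2 \end{bmatrix}$
(c) $\begin{bmatrix} 4 & 3 \\ 6 & 2 \end{bmatrix}$

2. Let $A = \begin{bmatrix} 1 & 2 & 4 \\ 0 & 1 & 3 \\ 2 & 5 & 1 \end{bmatrix}$. Find the following.
(a) $\operatorname{minor}(A)_{11}$
(b) $\operatorname{minor}(A)_{21}$
(c) $\operatorname{minor}(A)_{32}$
(d) $\operatorname{cof}(A)_{11}$
(e) $\operatorname{cof}(A)_{21}$
(f) $\operatorname{cof}(A)_{32}$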

3. Find the determinants of the following matrices.

1 2 3(a) 3 2 2 0 9 8

43 27 8 (b) 13 9 3

1 2 3 2 1 3 2 3

(c) 4 1 5 0 1 2 1 2

4. Find the following determinant by expanding along the first row and second column.

1 2 1

2 1 3

2 1 1 5. Find the following determinant by expanding along the first column and third row.

1 2 1

1 0 1

2 1 1

6. Find the following determinant by expanding along the second row and first column.

8. Find the determinant of the following matrices.

9. An operation is done to get from the first matrix to the second. Identify what was done and tell how it will affect the value of the determinant.
\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \to \begin{bmatrix} a & c \\ b & d \end{bmatrix} \]

10. An operation is done to get from the first matrix to the second. Identify what was done and tell how it will affect the value of the determinant.
\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \to \begin{bmatrix} c & d \\ a & b \end{bmatrix} \]

11. An operation is done to get from the first matrix to the second. Identify what was done and tell how it will affect the value of the determinant.
\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \to \begin{bmatrix} a & b \\ a+c & b+d \end{bmatrix} \]

12. An operation is done to get from the first matrix to the second. Identify what was done and tell how it will affect the value of the determinant.
\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \to \begin{bmatrix} a & b \\ 2c & 2d \end{bmatrix} \]

13. An operation is done to get from the first matrix to the second. Identify what was done and tell how it will affect the value of the determinant.
\[ \begin{bmatrix} a & b \\ c & d \end{bmatrix} \to \begin{bmatrix} b & a \\ d & c \end{bmatrix} \]

14. Let $A$ be an $r \times r$ matrix and suppose there are $r - 1$ rows (columns) such that all rows (columns) are linear combinations of these $r - 1$ rows (columns). Show $\det(A) = 0$.

15. Show $\det(aA) = a^n \det(A)$ for an $n \times n$ matrix $A$ and scalar $a$.

16. Construct $2 \times 2$ matrices $A$ and $B$ to show that $\det A \det B = \det(AB)$.

17. Is it true that $\det(A + B) = \det(A) + \det(B)$? If this is so, explain why. If it is not so, give a counter example.

18. An $n \times n$ matrix is called nilpotent if for some positive integer $k$ it follows that $A^k = 0$. If $A$ is a nilpotent matrix and $k$ is the smallest possible integer such that $A^k = 0$, what are the possible values of $\det(A)$?

19. A matrix is said to be orthogonal if $A^T A = I$. Thus the inverse of an orthogonal matrix is just its transpose. What are the possible values of $\det(A)$ if $A$ is an orthogonal matrix?

20. Let $A$ and $B$ be two $n \times n$ matrices. $A \sim B$ ($A$ is similar to $B$) means there exists an invertible matrix $P$ such that $A = P^{-1}BP$. Show that if $A \sim B$, then $\det(A) = \det(B)$.

21. Tell whether each statement is true or false. If true, provide a proof. If false, provide a counter example.
(a) If $A$ is a $3 \times 3$ matrix with a zero determinant, then one column must be a multiple of some other column.
(b) If any two columns of a square matrix are equal, then the determinant of the matrix equals zero.
(c) For two $n \times n$ matrices $A$ and $B$, $\det(A + B) = \det(A) + \det(B)$.
(d) For an $n \times n$ matrix $A$, $\det(3A) = 3\det(A)$.
(e) If $A^{-1}$ exists then $\det(A^{-1}) = \det(A)^{-1}$.
(f) If $B$ is obtained by multiplying a single row of $A$ by 4 then $\det(B) = 4\det(A)$.
(g) For $A$ an $n \times n$ matrix, $\det(-A) = (-1)^n \det(A)$.
(h) If $A$ is a real $n \times n$ matrix, then $\det(A^T A) \geq 0$.
(i) If $A^k = 0$ for some positive integer $k$, then $\det(A) = 0$.
(j) If $AX = 0$ for some $X \neq 0$, then $\det(A) = 0$.

22. Find the determinant using row operations to first simplify.

1 2 1

2 3 2

4 1 2

23. Find the determinant using row operations to first simplify.

2 13

2 42

1 4 5

24. Find the determinant using row operations to first simplify.

1 212

3 1 23

1 031

2 32 2

25. Find the determinant using row operations to first simplify.

1 412

3 2 23

1 033

2 12 2

3.2 Applications of the Determinant

Outcomes
A. Use determinants to determine whether a matrix has an inverse, and evaluate the inverse using cofactors.
B. Apply Cramer's Rule to solve a $2 \times 2$ or a $3 \times 3$ linear system.
C. Given data points, find an appropriate interpolating polynomial and use it to estimate points.

3.2.1. A Formula for the Inverse

The determinant of a matrix also provides a way to find the inverse of a matrix. Recall the definition of the inverse of a matrix in Definition 2.33. We say that $A^{-1}$, an $n \times n$ matrix, is the inverse of $A$, also $n \times n$, if $AA^{-1} = I$ and $A^{-1}A = I$.

We now define a new matrix called the cofactor matrix of $A$. The cofactor matrix of $A$ is the matrix whose $ij^{th}$ entry is the $ij^{th}$ cofactor of $A$. The formal definition is as follows.

Definition 3.32: The Cofactor Matrix
Let $A = [a_{ij}]$ be an $n \times n$ matrix. Then the cofactor matrix of $A$, denoted $\operatorname{cof}(A)$, is defined by $\operatorname{cof}(A) = \left[\operatorname{cof}(A)_{ij}\right]$ where $\operatorname{cof}(A)_{ij}$ is the $ij^{th}$ cofactor of $A$.

Note that $\operatorname{cof}(A)_{ij}$ denotes the $ij^{th}$ entry of the cofactor matrix.

We will use the cofactor matrix to create a formula for the inverse of $A$. First, we define the adjugate of $A$ to be the transpose of the cofactor matrix. We can also call this matrix the classical adjoint of $A$, and we denote it by $\operatorname{adj}(A)$.

In the specific case where $A$ is a $2 \times 2$ matrix given by
\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]
then $\operatorname{adj}(A)$ is given by
\[ \operatorname{adj}(A) = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \]
In general, $\operatorname{adj}(A)$ can always be found by taking the transpose of the cofactor matrix of $A$. The following theorem provides a formula for $A^{-1}$ using the determinant and adjugate of $A$.

Theorem 3.33: The Inverse and the Determinant
Let $A$ be an $n \times n$ matrix. Then $A$ is invertible if and only if $\det(A) \neq 0$. If $\det(A) \neq 0$ (so $A$ is invertible), then
\[ A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A) \]

The proof of this Theorem is below, after two examples demonstrating this concept. Notice that this formula is only defined when $\det(A) \neq 0$.

Consider the following example.

Example 3.34: Find Inverse Using the Determinant
Find the inverse of the matrix
\[ A = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 0 & 1 \\ 1 & 2 & 1 \end{bmatrix} \]
using the formula in Theorem 3.33.

Solution. According to Theorem 3.33,
\[ A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A) \]
First we will find the determinant of this matrix. Using Theorems 3.16, 3.18, and 3.21, we can first simplify the matrix through row operations. First, add $-3$ times the first row to the second row. Then add $-1$ times the first row to the third row to obtain

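We will look at another example of how to use this formula to find $A^{-1}$.

Example 3.35: Find the Inverse From a Formula
Find the inverse of the matrix
\[ A = \begin{bmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ -\frac{1}{6} & \frac{1}{3} & -\frac{1}{2} \\ -\frac{5}{6} & \frac{2}{3} & -\frac{1}{2} \end{bmatrix} \]
using the formula given in Theorem 3.33.

Solution. First we need to find $\det(A)$. This step is left as an exercise and you should verify that $\det(A) = \frac{1}{6}$. The inverse is therefore equal to
\[ A^{-1} = \frac{1}{(1/6)} \operatorname{adj}(A) = 6 \operatorname{adj}(A) \]
We continue to calculate as follows. Here we show the $2 \times 2$ determinants needed to find the cofactors.
\[ A^{-1} = 6 \begin{bmatrix}
\begin{vmatrix} \frac{1}{3} & -\frac{1}{2} \\ \frac{2}{3} & -\frac{1}{2} \end{vmatrix} &
-\begin{vmatrix} -\frac{1}{6} & -\frac{1}{2} \\ -\frac{5}{6} & -\frac{1}{2} \end{vmatrix} &
\begin{vmatrix} -\frac{1}{6} & \frac{1}{3} \\ -\frac{5}{6} & \frac{2}{3} \end{vmatrix} \\[2ex]
-\begin{vmatrix} 0 & \frac{1}{2} \\ \frac{2}{3} & -\frac{1}{2} \end{vmatrix} &
\begin{vmatrix} \frac{1}{2} & \frac{1}{2} \\ -\frac{5}{6} & -\frac{1}{2} \end{vmatrix} &
-\begin{vmatrix} \frac{1}{2} & 0 \\ -\frac{5}{6} & \frac{2}{3} \end{vmatrix} \\[2ex]
\begin{vmatrix} 0 & \frac{1}{2} \\ \frac{1}{3} & -\frac{1}{2} \end{vmatrix} &
-\begin{vmatrix} \frac{1}{2} & \frac{1}{2} \\ -\frac{1}{6} & -\frac{1}{2} \end{vmatrix} &
\begin{vmatrix} \frac{1}{2} & 0 \\ -\frac{1}{6} & \frac{1}{3} \end{vmatrix}
\end{bmatrix}^T \]
Expanding all the $2 \times 2$ determinants, this yields
\[ A^{-1} = 6 \begin{bmatrix} \frac{1}{6} & \frac{1}{3} & \frac{1}{6} \\ \frac{1}{3} & \frac{1}{6} & -\frac{1}{3} \\ -\frac{1}{6} & \frac{1}{6} & \frac{1}{6} \end{bmatrix}^T = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & 1 \\ 1 & -2 & 1 \end{bmatrix} \]
Again, you can always check your work by multiplying $A^{-1}A$ and $AA^{-1}$ and ensuring these products equal $I$.
\[ A^{-1}A = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 1 & 1 \\ 1 & -2 & 1 \end{bmatrix} \begin{bmatrix} \frac{1}{2} & 0 & \frac{1}{2} \\ -\frac{1}{6} & \frac{1}{3} & -\frac{1}{2} \\ -\frac{5}{6} & \frac{2}{3} & -\frac{1}{2} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \]
This tells us that our calculation for $A^{-1}$ is correct. It is left to the reader to verify that $AA^{-1} = I$.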
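The formula of Theorem 3.33 can be carried out mechanically: build the cofactor matrix, transpose it to get the adjugate, and divide by the determinant. The sketch below does this with exact fractions for the matrix of Example 3.35. The function names are illustrative, and the determinant is computed by Laplace Expansion along the first row, which is fine for small matrices but slow for large ones.

```python
from fractions import Fraction


def submatrix(A, i, j):
    """Delete row i and column j of A (0-indexed)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def det(A):
    """Laplace expansion along the first row."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det(submatrix(A, 0, j)) for j in range(len(A)))


def adjugate(A):
    """Transpose of the cofactor matrix (A must be at least 2 x 2)."""
    n = len(A)
    cof = [[(-1) ** (i + j) * det(submatrix(A, i, j)) for j in range(n)]
           for i in range(n)]
    return [[cof[j][i] for j in range(n)] for i in range(n)]


def inverse(A):
    """A^{-1} = adj(A) / det(A), defined only when det(A) is nonzero."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is not invertible")
    return [[entry / d for entry in row] for row in adjugate(A)]


F = Fraction
A = [[F(1, 2), F(0), F(1, 2)],
     [F(-1, 6), F(1, 3), F(-1, 2)],
     [F(-5, 6), F(2, 3), F(-1, 2)]]

print(det(A))
for row in inverse(A):
    print([str(x) for x in row])
```

Forgetting the transpose in `adjugate` is the same common error the text warns about; multiplying the result by `A` and comparing with the identity catches it.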

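The verification step is very important, as it is a simple way to check your work! If you multiply $A^{-1}A$ and $AA^{-1}$ and these products are not both equal to $I$, be sure to go back and double check each step. One common error is to forget to take the transpose of the cofactor matrix, so be sure to complete this step.

We will now prove Theorem 3.33.

Proof. First if $A$ is invertible, then by Theorem ?? we have:
\[ 1 = \det(I) = \det(AA^{-1}) = \det(A)\det(A^{-1}) \]
and thus $\det(A) \neq 0$. Equivalently, if $\det(A) = 0$, then $A$ is not invertible.

Now assume $\det(A) \neq 0$. From the definition of the determinant in terms of expansion along a column, and letting $A = [a_{ir}]$,
\[ \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ir} \det(A)^{-1} = \det(A)\det(A)^{-1} = 1 \]
Now consider
\[ \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1} \]
when $k \neq r$. The sum $\sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik}$ is the expansion along the $k^{th}$ column of the matrix obtained from $A$ by replacing its $k^{th}$ column with its $r^{th}$ column. That matrix has two equal columns, so its determinant is zero (see the discussion following Theorem 3.21, applied to columns). Summarizing,
\[ \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} \det(A)^{-1} = \delta_{rk} = \begin{cases} 1 & \text{if } r = k \\ 0 & \text{if } r \neq k \end{cases} \]
Now
\[ \sum_{i=1}^{n} a_{ir} \operatorname{cof}(A)_{ik} = \sum_{i=1}^{n} \operatorname{cof}(A)^T_{ki} \, a_{ir} \]
which is the $kr^{th}$ entry of $\operatorname{cof}(A)^T A$. Therefore,
\[ \frac{\operatorname{cof}(A)^T}{\det(A)} A = I \tag{3.1} \]
Using the other formula in Definition 3.11, and similar reasoning,
\[ \sum_{j=1}^{n} a_{rj} \operatorname{cof}(A)_{kj} \det(A)^{-1} = \delta_{rk} \]
Now
\[ \sum_{j=1}^{n} a_{rj} \operatorname{cof}(A)_{kj} = \sum_{j=1}^{n} a_{rj} \operatorname{cof}(A)^T_{jk} \]
which is the $rk^{th}$ entry of $A \operatorname{cof}(A)^T$. Therefore,
\[ A \frac{\operatorname{cof}(A)^T}{\det(A)} = I \tag{3.2} \]
and it follows from 3.1 and 3.2 that $A^{-1} = \left[a^{-1}_{ij}\right]$, where $a^{-1}_{ij}$ denotes the $ij^{th}$ entry of $A^{-1}$ and
\[ a^{-1}_{ij} = \operatorname{cof}(A)_{ji} \det(A)^{-1} \]
In other words,
\[ A^{-1} = \frac{\operatorname{cof}(A)^T}{\det(A)} \]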

This method for finding the inverse of $A$ is useful in many contexts. In particular, it is useful with complicated matrices where the entries are functions, rather than numbers.

Consider the following example.

Example 3.36: Inverse for Non-Constant Matrix
Suppose

3.2.2. Cramer's Rule

Another context in which the formula given in Theorem 3.33 is important is Cramer's Rule. Recall that we can represent a system of linear equations in the form $AX = B$, where the solutions to this system are given by $X$. Cramer's Rule gives a formula for the solutions $X$ in the special case that $A$ is a square invertible matrix. Note this rule does not apply if you have a system of equations in which there is a different number of equations than variables (in other words, when $A$ is not square), or when $A$ is not invertible.

Suppose we have a system of equations given by $AX = B$, and we want to find solutions $X$ which satisfy this system. Then recall that if $A^{-1}$ exists,
\begin{align*}
AX &= B \\
A^{-1}(AX) &= A^{-1}B \\
(A^{-1}A)X &= A^{-1}B \\
IX &= A^{-1}B \\
X &= A^{-1}B
\end{align*}
Hence, the solutions $X$ to the system are given by $X = A^{-1}B$. Since we assume that $A^{-1}$ exists, we can use the formula for $A^{-1}$ given above. Substituting this formula into the equation for $X$, we have
\[ X = A^{-1}B = \frac{1}{\det(A)} \operatorname{adj}(A) B \]
Let $x_i$ be the $i^{th}$ entry of $X$ and $b_j$ be the $j^{th}$ entry of $B$. Writing $a^{-1}_{ij}$ for the $ij^{th}$ entry of $A^{-1}$, this equation becomes
\[ x_i = \sum_{j=1}^{n} a^{-1}_{ij} b_j = \sum_{j=1}^{n} \frac{1}{\det(A)} \operatorname{adj}(A)_{ij} b_j \]
where $\operatorname{adj}(A)_{ij}$ is the $ij^{th}$ entry of $\operatorname{adj}(A)$.

By the formula for the expansion of a determinant along a column,
\[ x_i = \frac{1}{\det(A)} \det \begin{bmatrix} * & \cdots & b_1 & \cdots & * \\ \vdots & & \vdots & & \vdots \\ * & \cdots & b_n & \cdots & * \end{bmatrix} \]
where here the $i^{th}$ column of $A$ is replaced with the column vector $[b_1, \cdots, b_n]^T$. The determinant of this modified matrix is taken and divided by $\det(A)$. This formula is known as Cramer's rule.

We formally define this method now.

Procedure 3.37: Using Cramer's Rule
Suppose $A$ is an $n \times n$ invertible matrix and we wish to solve the system $AX = B$ for $X = [x_1, \cdots, x_n]^T$. Then Cramer's rule says
\[ x_i = \frac{\det(A_i)}{\det(A)} \]
where $A_i$ is the matrix obtained by replacing the $i^{th}$ column of $A$ with the column matrix
\[ B = \begin{bmatrix} b_1 \\ \vdots \\ b_n \end{bmatrix} \]

We illustrate this procedure in the following example.

Example 3.38: Using Cramer's Rule
Find $x, y, z$ if

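\[ \begin{bmatrix} 1 & 2 & 1 \\ 3 & 2 & 1 \\ 2 & -3 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \]

Solution. We will use the method outlined in Procedure 3.37 to find the values for $x, y, z$ which give the solution to this system. Let
\[ B = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \]
In order to find $x$, we calculate
\[ x = \frac{\det(A_1)}{\det(A)} \]
where $A_1$ is the matrix obtained from replacing the first column of $A$ with $B$. Hence, $A_1$ is given by
\[ A_1 = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 2 & 1 \\ 3 & -3 & 2 \end{bmatrix} \]
Therefore,
\[ x = \frac{\det(A_1)}{\det(A)} = \frac{\begin{vmatrix} 1 & 2 & 1 \\ 2 & 2 & 1 \\ 3 & -3 & 2 \end{vmatrix}}{\begin{vmatrix} 1 & 2 & 1 \\ 3 & 2 & 1 \\ 2 & -3 & 2 \end{vmatrix}} = \frac{1}{2} \]
Similarly, to find $y$ we construct $A_2$ by replacing the second column of $A$ with $B$. Hence, $A_2$ is given by
\[ A_2 = \begin{bmatrix} 1 & 1 & 1 \\ 3 & 2 & 1 \\ 2 & 3 & 2 \end{bmatrix} \]
Therefore,
\[ y = \frac{\det(A_2)}{\det(A)} = \frac{\begin{vmatrix} 1 & 1 & 1 \\ 3 & 2 & 1 \\ 2 & 3 & 2 \end{vmatrix}}{\begin{vmatrix} 1 & 2 & 1 \\ 3 & 2 & 1 \\ 2 & -3 & 2 \end{vmatrix}} = -\frac{1}{7} \]
Similarly, $A_3$ is constructed by replacing the third column of $A$ with $B$. Then, $A_3$ is given by
\[ A_3 = \begin{bmatrix} 1 & 2 & 1 \\ 3 & 2 & 2 \\ 2 & -3 & 3 \end{bmatrix} \]
Therefore, $z$ is calculated as follows.
\[ z = \frac{\det(A_3)}{\det(A)} = \frac{\begin{vmatrix} 1 & 2 & 1 \\ 3 & 2 & 2 \\ 2 & -3 & 3 \end{vmatrix}}{\begin{vmatrix} 1 & 2 & 1 \\ 3 & 2 & 1 \\ 2 & -3 & 2 \end{vmatrix}} = \frac{11}{14} \]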
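Procedure 3.37 is straightforward to implement for small systems. The following is a minimal sketch with illustrative function names: it builds each $A_i$ by replacing column $i$ of $A$ with $B$ and divides the two determinants, using exact fractions. It is meant to mirror the procedure, not to be an efficient solver.

```python
from fractions import Fraction


def det(A):
    """Laplace expansion along the first row (adequate for small matrices)."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))


def cramer(A, B):
    """Solve AX = B for a square invertible A using Cramer's rule."""
    d = det(A)
    if d == 0:
        raise ValueError("Cramer's rule requires det(A) to be nonzero")
    solution = []
    for i in range(len(A)):
        # A_i: replace the i-th column of A with the entries of B.
        A_i = [row[:i] + [B[r]] + row[i + 1:] for r, row in enumerate(A)]
        solution.append(Fraction(det(A_i), d))
    return solution


A = [[1, 2, 1], [3, 2, 1], [2, -3, 2]]
B = [1, 2, 3]
x, y, z = cramer(A, B)
print(x, y, z)
```

The intermediate determinants here are $\det(A) = -14$, $\det(A_1) = -7$, $\det(A_2) = 2$ and $\det(A_3) = -11$.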

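Cramer's Rule gives you another tool to consider when solving a system of linear equations.

We can also use Cramer's Rule when the entries of the matrix $A$ are functions rather than numbers; the system is still linear in the unknowns. Consider the following system.

Example 3.39: Use Cramer's Rule for Non-Constant Matrix
Solve for $z$ if
\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & e^t \cos t & e^t \sin t \\ 0 & -e^t \sin t & e^t \cos t \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ t \\ t^2 \end{bmatrix} \]

Solution. We are asked to find the value of $z$ in the solution. We will solve using Cramer's rule. Thus
\[ z = \frac{\begin{vmatrix} 1 & 0 & 1 \\ 0 & e^t \cos t & t \\ 0 & -e^t \sin t & t^2 \end{vmatrix}}{\begin{vmatrix} 1 & 0 & 0 \\ 0 & e^t \cos t & e^t \sin t \\ 0 & -e^t \sin t & e^t \cos t \end{vmatrix}} = t\left((\cos t)\, t + \sin t\right) e^{-t} \]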

3.2.3. Polynomial Interpolation

In studying a set of data that relates variables $x$ and $y$, it may be the case that we can use a polynomial to fit to the data. If such a polynomial can be established, it can be used to estimate values of $x$ and $y$ which have not been provided.

Consider the following example.

Example 3.40: Polynomial Interpolation
Given data points $(1, 4)$, $(2, 9)$, $(3, 12)$, find an interpolating polynomial $p(x)$ of degree at most 2 and then estimate the value corresponding to $x = \frac{1}{2}$.
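One way to approach a problem like Example 3.40 is to write $p(x) = r_0 + r_1 x + r_2 x^2$ and substitute each data point, which gives a $3 \times 3$ linear system for the coefficients. The sketch below solves that system with Cramer's rule. The names `det3`, `interpolate_quadratic` and `evaluate` are illustrative, the coefficient labels $r_0, r_1, r_2$ are an assumption, and the evaluation point $\frac{1}{2}$ is an assumed reading of the point requested in the example.

```python
from fractions import Fraction


def det3(M):
    """Determinant of a 3 x 3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def interpolate_quadratic(points):
    """Coefficients [r0, r1, r2] of p(x) = r0 + r1*x + r2*x^2 through three points."""
    xs = [Fraction(x) for x, _ in points]
    ys = [Fraction(y) for _, y in points]
    V = [[Fraction(1), x, x * x] for x in xs]   # one row [1, x, x^2] per data point
    d = det3(V)
    if d == 0:
        raise ValueError("the x-values must be distinct")
    coeffs = []
    for k in range(3):
        # Cramer's rule: replace column k of V with the y-values.
        V_k = [row[:k] + [ys[r]] + row[k + 1:] for r, row in enumerate(V)]
        coeffs.append(det3(V_k) / d)
    return coeffs


def evaluate(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


points = [(1, 4), (2, 9), (3, 12)]
coeffs = interpolate_quadratic(points)
print([str(c) for c in coeffs])
print(evaluate(coeffs, Fraction(1, 2)))
```

A quick sanity check is that the resulting polynomial reproduces the original data: evaluating it at $x = 1, 2, 3$ should return $4, 9, 12$.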

Determine whether the matrix A has an inverse by finding whether the determinantis non zero. If the determinant is nonzero, find the inverse using the formula for theinverse which involves the cofactor matrix.2. Let

1 2 0A= 0 2 1 3 1 1

Determine whether the matrix A has an inverse by finding whether the determinantis non zero. If the determinant is nonzero, find the inverse using the formula for theinverse.136

3. Let

1 3 3A= 2 4 1 0 1 1

Determine whether the matrix A has an inverse by finding whether the determinantis non zero. If the determinant is nonzero, find the inverse using the formula for theinverse.4. Let

1 2 3A= 0 2 1 2 6 7

Determine whether the matrix A has an inverse by finding whether the determinantis non zero. If the determinant is nonzero, find the inverse using the formula for theinverse.5. Let

1 0 3A= 1 0 1 3 1 0

Determine whether the matrix A has an inverse by finding whether the determinantis non zero. If the determinant is nonzero, find the inverse using the formula for theinverse.6. For the following matrices, determine if they are invertible. If so, use the formula forthe inverse in terms of the cofactor matrix to find each inverse. If the inverse does notexist, explain why.

1 1(a)1 2

1 2 3(b) 0 2 1 4 1 1

1 2 1(c) 2 3 0 0 1 27. Consider the matrix

1 00A = 0 cos t sin t 0 sin t cos t

Does there exist a value of t for which this matrix fails to have an inverse? Explain.


8. Consider the matrix

1 t t2A = 0 1 2t t 0 2

Does there exist a value of t for which this matrix fails to have an inverse? Explain.9. Consider the matrix

et cosh t sinh tA = et sinh t cosh t et cosh t sinh t

Does there exist a value of t for which this matrix fails to have an inverse? Explain.10. Consider the matrix

Does there exist a value of $t$ for which this matrix fails to have an inverse? Explain.

11. Show that if $\det(A) \neq 0$ for $A$ an $n \times n$ matrix, it follows that if $AX = 0$, then $X = 0$.

12. Suppose $A, B$ are $n \times n$ matrices and that $AB = I$. Show that then $BA = I$. Hint: First explain why $\det(A)$, $\det(B)$ are both nonzero. Then $(AB)A = A$ and then show $BA(BA - I) = 0$. From this use what is given to conclude $A(BA - I) = 0$. Then use Problem 11.

13. Use the formula for the inverse in terms of the cofactor matrix to find the inverse of the matrix
\[ A = \begin{bmatrix} e^t & 0 & 0 \\ 0 & e^t \cos t & e^t \sin t \\ 0 & e^t \cos t - e^t \sin t & e^t \cos t + e^t \sin t \end{bmatrix} \]

14. Find the inverse, if it exists, of the matrix

t

ecos tsin tA = et sin t cos t et cos t sin t

15. Suppose A is an upper triangular matrix. Show that A1 exists if and only if allelements of the main diagonal are non zero. Is it true that A1 will also be uppertriangular? Explain. Could the same be concluded for lower triangular matrices?16. If A, B, and C are each n n matrices and ABC is invertible, show why each of A, B,and C are invertible.17. Decide if this statement is true or false: Cramers rule is useful for finding solutionsto systems of linear equations in which there is an infinite set of solutions.138

18. Use Cramer's rule to find the solution to

Outcomes

A. Find the position vector of a point in $\mathbb{R}^n$.

The notation $\mathbb{R}^n$ refers to the collection of ordered lists of $n$ real numbers, that is

$$\mathbb{R}^n = \{(x_1, \cdots, x_n) : x_j \in \mathbb{R} \text{ for } j = 1, \cdots, n\}$$

In this chapter, we take a closer look at vectors in $\mathbb{R}^n$. First, we will consider what $\mathbb{R}^n$ looks like in more detail. Recall that the point given by $0 = (0, \cdots, 0)$ is called the origin.

Now, consider the case of $\mathbb{R}^n$ for $n = 1$. Then from the definition we can identify $\mathbb{R}$ with points in $\mathbb{R}^1$ as follows:

$$\mathbb{R} = \mathbb{R}^1 = \{(x_1) : x_1 \in \mathbb{R}\}$$

Hence, $\mathbb{R}$ is defined as the set of all real numbers and geometrically, we can describe this as all the points on a line.

Now suppose $n = 2$. Then, from the definition,

$$\mathbb{R}^2 = \{(x_1, x_2) : x_j \in \mathbb{R} \text{ for } j = 1, 2\}$$

Consider the familiar coordinate plane, with an $x$ axis and a $y$ axis. Any point within this coordinate plane is identified by where it is located along the $x$ axis, and also where it is located along the $y$ axis. Consider as an example the following diagram.


[Figure: the coordinate plane with $x$ and $y$ axes, showing two labeled points $P$ and $Q$.]

Hence, every element in $\mathbb{R}^2$ is identified by two components, $x$ and $y$, in the usual manner. The coordinates $x, y$ (or $x_1, x_2$) uniquely determine a point in the plane. Note that while the definition uses $x_1$ and $x_2$ to label the coordinates and you may be used to $x$ and $y$, these notations are equivalent.

Now suppose $n = 3$. You may have previously encountered the 3-dimensional coordinate system, given by

$$\mathbb{R}^3 = \{(x_1, x_2, x_3) : x_j \in \mathbb{R} \text{ for } j = 1, 2, 3\}$$

Points in $\mathbb{R}^3$ will be determined by three coordinates, often written $(x, y, z)$ which correspond to the $x$, $y$, and $z$ axes. We can think as above that the first two coordinates determine a point in a plane. The third component determines the height above or below the plane, depending on whether this number is positive or negative, and all together this determines a point in space. You see that the ordered triples correspond to points in space just as the ordered pairs correspond to points in a plane and single real numbers correspond to points on a line.

The idea behind the more general $\mathbb{R}^n$ is that we can extend these ideas beyond $n = 3$. This discussion regarding points in $\mathbb{R}^n$ leads into a study of vectors in $\mathbb{R}^n$. While we consider $\mathbb{R}^n$ for all $n$, we will largely focus on $n = 2, 3$ in this section.

Consider the following definition.

Definition 4.1: The Position Vector

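Let $P = (p_1, \cdots, p_n)$ be the coordinates of a point in $\mathbb{R}^n$. Then the vector $\overrightarrow{0P}$ with its tail at $0 = (0, \cdots, 0)$ and its tip at $P$ is called the position vector of the point $P$. We write

$$\overrightarrow{0P} = \begin{bmatrix} p_1 \\ \vdots \\ p_n \end{bmatrix}$$

This definition is illustrated in the following picture for the special case of $\mathbb{R}^3$.

[Figure: the point $P = (p_1, p_2, p_3)$ and its position vector $\overrightarrow{0P} = \begin{bmatrix} p_1 & p_2 & p_3 \end{bmatrix}^T$.]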

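Thus every point $P$ in $\mathbb{R}^n$ determines its position vector $\overrightarrow{0P}$. Conversely, every such position vector $\overrightarrow{0P}$ which has its tail at $0$ and point at $P$ determines the point $P$ of $\mathbb{R}^n$.

Now suppose we are given two points, $P$, $Q$ whose coordinates are $(p_1, \cdots, p_n)$ and $(q_1, \cdots, q_n)$ respectively. We can also determine the position vector from $P$ to $Q$ (also called the vector from $P$ to $Q$) defined as follows.

$$\overrightarrow{PQ} = \begin{bmatrix} q_1 - p_1 \\ \vdots \\ q_n - p_n \end{bmatrix} = \overrightarrow{0Q} - \overrightarrow{0P}$$

Now, imagine taking a vector in $\mathbb{R}^n$ and moving it around, always keeping it pointing in the same direction as shown in the following picture.

[Figure: the position vector $\overrightarrow{0P} = \begin{bmatrix} p_1 & p_2 & p_3 \end{bmatrix}^T$ and a translated copy $\overrightarrow{AB}$ with its tail at $A$ and its tip at $B$.]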

After moving it around, it is regarded as the same vector. Each vector, $\overrightarrow{0P}$ and $\overrightarrow{AB}$, has the same length (or magnitude) and direction. Therefore, they are equal.

Consider now the general definition for a vector in $\mathbb{R}^n$.

Definition 4.2: Vectors in $\mathbb{R}^n$

Let $\mathbb{R}^n = \{(x_1, \cdots, x_n) : x_j \in \mathbb{R} \text{ for } j = 1, \cdots, n\}$. Then,

$$\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$$

is called a vector. Vectors have both size (magnitude) and direction. The numbers $x_j$ are called the components of $\vec{x}$.

Using this notation, we may use $\vec{p}$ to denote the position vector of point $P$. Notice that in this context, $\vec{p} = \overrightarrow{0P}$. These notations may be used interchangeably.


You can think of the components of a vector as directions for obtaining the vector. Consider $n = 3$. Draw a vector with its tail at the point $(0, 0, 0)$ and its tip at the point $(a, b, c)$. This vector is obtained by starting at $(0, 0, 0)$, moving parallel to the $x$ axis to $(a, 0, 0)$ and then from here, moving parallel to the $y$ axis to $(a, b, 0)$ and finally parallel to the $z$ axis to $(a, b, c)$. Observe that the same vector would result if you began at the point $(d, e, f)$, moved parallel to the $x$ axis to $(d + a, e, f)$, then parallel to the $y$ axis to $(d + a, e + b, f)$, and finally parallel to the $z$ axis to $(d + a, e + b, f + c)$. Here, the vector would have its tail sitting at the point determined by $A = (d, e, f)$ and its point at $B = (d + a, e + b, f + c)$. It is the same vector because it will point in the same direction and have the same length. It is like you took an actual arrow, and moved it from one location to another keeping it pointing the same direction.

We conclude this section with a brief discussion regarding notation. In previous sections, we have written vectors as columns, or $n \times 1$ matrices. For convenience in this chapter we may write vectors as the transpose of row vectors, or $1 \times n$ matrices. These are of course equivalent and we may move between both notations. Therefore, recognize that

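$$\begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 & 3 \end{bmatrix}^T$$

Notice that two vectors $\vec{u} = [u_1 \cdots u_n]^T$ and $\vec{v} = [v_1 \cdots v_n]^T$ are equal if and only if all corresponding components are equal. Precisely,

$$\vec{u} = \vec{v} \text{ if and only if } u_j = v_j \text{ for all } j = 1, \cdots, n$$

Thus $\begin{bmatrix} 1 & 2 & 4 \end{bmatrix}^T \in \mathbb{R}^3$ and $\begin{bmatrix} 2 & 1 & 4 \end{bmatrix}^T \in \mathbb{R}^3$ but $\begin{bmatrix} 1 & 2 & 4 \end{bmatrix}^T \neq \begin{bmatrix} 2 & 1 & 4 \end{bmatrix}^T$ because, even though the same numbers are involved, the order of the numbers is different.

For the specific case of $\mathbb{R}^3$, there are three special vectors which we often use. They are given by

$$\vec{i} = \begin{bmatrix} 1 & 0 & 0 \end{bmatrix}^T, \quad \vec{j} = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}^T, \quad \vec{k} = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T$$

We can write any vector $\vec{u} = \begin{bmatrix} u_1 & u_2 & u_3 \end{bmatrix}^T$ as a linear combination of these vectors, written as $\vec{u} = u_1\vec{i} + u_2\vec{j} + u_3\vec{k}$. This notation will be used throughout this chapter.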

Addition and scalar multiplication are two important algebraic operations done with vectors. Notice that these operations apply to vectors in $\mathbb{R}^n$, for any value of $n$. We will explore these operations in more detail in the following sections.

4.2.1. Addition of Vectors in Rn

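Addition of vectors in $\mathbb{R}^n$ is defined as follows.

Definition 4.3: Addition of Vectors in $\mathbb{R}^n$

If $\vec{u} = \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}$, $\vec{v} = \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} \in \mathbb{R}^n$ then $\vec{u} + \vec{v} \in \mathbb{R}^n$ and is defined by

$$\vec{u} + \vec{v} = \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix} + \begin{bmatrix} v_1 \\ \vdots \\ v_n \end{bmatrix} = \begin{bmatrix} u_1 + v_1 \\ \vdots \\ u_n + v_n \end{bmatrix}$$

To add vectors, we simply add corresponding components exactly as we did for matrices. Therefore, in order to add vectors, they must be the same size.

Similarly to matrices, addition of vectors satisfies some important properties. These are outlined in the following theorem.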

The Existence of an Additive Inverse

$$\vec{u} + (-\vec{u}) = \vec{0}$$

The proof of this theorem follows from the similar theorem given for matrices in Proposition 2.7. Thus the additive identity shown in equation 4.1 is also called the zero vector, the $n \times 1$ vector in which all components are equal to 0. Further, $-\vec{u}$ is simply the vector with all components having the same value as those of $\vec{u}$ but opposite sign; this is just $(-1)\vec{u}$. This will be made more explicit in the next section when we explore scalar multiplication of vectors. Note that subtraction is defined as $\vec{u} - \vec{v} = \vec{u} + (-\vec{v})$.

4.2.2. Scalar Multiplication of Vectors in Rn

Scalar multiplication of vectors in $\mathbb{R}^n$ is defined as follows. Notice that, just like addition, this definition is the same as the corresponding definition for matrices.

Definition 4.5: Scalar Multiplication of Vectors in $\mathbb{R}^n$

If $\vec{u} \in \mathbb{R}^n$ and $k \in \mathbb{R}$ is a scalar, then $k\vec{u} \in \mathbb{R}^n$ is defined by

$$k\vec{u} = k \begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix} = \begin{bmatrix} ku_1 \\ \vdots \\ ku_n \end{bmatrix}$$

Just as with addition, scalar multiplication of vectors satisfies several important properties. These are outlined in the following theorem.

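Proof: Again the verification of these properties follows from the corresponding properties for scalar multiplication of matrices, given in Proposition 2.10.

As a refresher we can show that

$$k(\vec{u} + \vec{v}) = k\vec{u} + k\vec{v}$$

Note that:

$$\begin{aligned}
k(\vec{u} + \vec{v}) &= k[u_1 + v_1 \cdots u_n + v_n]^T \\
&= [k(u_1 + v_1) \cdots k(u_n + v_n)]^T \\
&= [ku_1 + kv_1 \cdots ku_n + kv_n]^T \\
&= [ku_1 \cdots ku_n]^T + [kv_1 \cdots kv_n]^T \\
&= k\vec{u} + k\vec{v}
\end{aligned}$$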
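The componentwise definitions of addition and scalar multiplication translate directly into code. The following is a minimal sketch, assuming vectors are stored as Python lists; the helper names `add` and `scale` and the sample vectors are illustrative choices, not part of the text. It checks the distributive property $k(\vec{u} + \vec{v}) = k\vec{u} + k\vec{v}$ on one example.

```python
# Illustrative helpers (hypothetical names, not from the text).

def add(u, v):
    """Componentwise sum of two vectors of the same size."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return [a + b for a, b in zip(u, v)]


def scale(k, u):
    """Multiply every component of u by the scalar k."""
    return [k * a for a in u]


# Sample data chosen only for illustration.
u = [1, 2, -3]
v = [4, 0, 5]
k = 2

left = scale(k, add(u, v))              # k(u + v)
right = add(scale(k, u), scale(k, v))   # ku + kv
print(left, right)                      # both are [10, 4, 4]
```

A single numerical example does not prove the property, but it is a quick way to catch a sign or indexing mistake when working by hand.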

4.3 Geometric Meaning of Vector Addition

Outcomes

A. Understand vector addition, geometrically.

Recall that an element of $\mathbb{R}^n$ is an ordered list of numbers. For the specific case of $n = 2, 3$ this can be used to determine a point in two or three dimensional space. This point is specified relative to some coordinate axes.

Consider the case $n = 3$. Recall that taking a vector and moving it around without changing its length or direction does not change the vector. This is important in the geometric representation of vector addition.

Suppose we have two vectors, $\vec{u}$ and $\vec{v}$ in $\mathbb{R}^3$. Each of these can be drawn geometrically by placing the tail of each vector at $0$ and its point at $(u_1, u_2, u_3)$ and $(v_1, v_2, v_3)$ respectively. Suppose we slide the vector $\vec{v}$ so that its tail sits at the point of $\vec{u}$. We know that this does not change the vector $\vec{v}$. Now, draw a new vector from the tail of $\vec{u}$ to the point of $\vec{v}$. This vector is $\vec{u} + \vec{v}$.

The geometric significance of vector addition in $\mathbb{R}^n$ for any $n$ is given in the following definition.

Definition 4.7: Geometry of Vector Addition

Let $\vec{u}$ and $\vec{v}$ be two vectors. Slide $\vec{v}$ so that the tail of $\vec{v}$ is on the point of $\vec{u}$. Then draw the arrow which goes from the tail of $\vec{u}$ to the point of $\vec{v}$. This arrow represents the vector $\vec{u} + \vec{v}$.

[Figure: $\vec{v}$ placed with its tail at the point of $\vec{u}$, and $\vec{u} + \vec{v}$ drawn from the tail of $\vec{u}$ to the point of $\vec{v}$.]

This definition is illustrated in the following picture in which $\vec{u} + \vec{v}$ is shown for the special case $n = 3$.


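[Figure: the vectors $\vec{u}$, $\vec{v}$ and $\vec{u} + \vec{v}$ drawn relative to the coordinate axes, with a copy of $\vec{v}$ translated to the point of $\vec{u}$.]

Notice the parallelogram created by $\vec{u}$ and $\vec{v}$ in the above diagram. Then $\vec{u} + \vec{v}$ is the directed diagonal of the parallelogram determined by the two vectors $\vec{u}$ and $\vec{v}$.

When you have a vector $\vec{v}$, its additive inverse $-\vec{v}$ will be the vector which has the same magnitude as $\vec{v}$ but the opposite direction. When one writes $\vec{u} - \vec{v}$, the meaning is $\vec{u} + (-\vec{v})$ as with real numbers. The following example illustrates these definitions and conventions.

Example 4.8: Graphing Vector Addition

Consider the following picture of vectors $\vec{u}$ and $\vec{v}$.

[Figure: the vectors $\vec{u}$ and $\vec{v}$.]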

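Sketch a picture of $\vec{u} + \vec{v}$, $\vec{u} - \vec{v}$.

Solution. We will first sketch $\vec{u} + \vec{v}$. Begin by drawing $\vec{u}$ and then at the point of $\vec{u}$, place the tail of $\vec{v}$ as shown. Then $\vec{u} + \vec{v}$ is the vector which results from drawing a vector from the tail of $\vec{u}$ to the tip of $\vec{v}$.

[Figure: $\vec{u}$, $\vec{v}$ placed at the point of $\vec{u}$, and the resulting vector $\vec{u} + \vec{v}$.]

Next consider $\vec{u} - \vec{v}$. This means $\vec{u} + (-\vec{v})$. From the above geometric description of vector addition, $-\vec{v}$ is the vector which has the same length but which points in the opposite direction to $\vec{v}$. Here is a picture.

[Figure: $\vec{u}$, $-\vec{v}$ placed at the point of $\vec{u}$, and the resulting vector $\vec{u} - \vec{v}$.]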

4.4 Length of a Vector

Outcomes

A. Find the length of a vector and the distance between two points in $\mathbb{R}^n$.

B. Find the corresponding unit vector to a vector in $\mathbb{R}^n$.

In this section, we explore what is meant by the length of a vector in $\mathbb{R}^n$. We develop this concept by first looking at the distance between two points in $\mathbb{R}^n$.

First, we will consider the concept of distance for $\mathbb{R}$, that is, for points in $\mathbb{R}^1$. Here, the distance between two points $P$ and $Q$ is given by the absolute value of their difference. We denote the distance between $P$ and $Q$ by $d(P, Q)$ which is defined as

$$d(P, Q) = \sqrt{(P - Q)^2} \tag{4.2}$$

Consider now the case for $n = 2$, demonstrated by the following picture.

[Figure: the points $P = (p_1, p_2)$ and $Q = (q_1, q_2)$ joined by a solid line, which is the diagonal of a dotted rectangle with a corner at $(p_1, q_2)$.]

There are two points $P = (p_1, p_2)$ and $Q = (q_1, q_2)$ in the plane. The distance between these points is shown in the picture as a solid line. Notice that this line is the hypotenuse of a right triangle which is half of the rectangle shown in dotted lines. We want to find the length of this hypotenuse which will give the distance between the two points. Note the lengths of the sides of this triangle are $|p_1 - q_1|$ and $|p_2 - q_2|$, the absolute value of the difference in these values. Therefore, the Pythagorean Theorem implies the length of the hypotenuse (and thus the distance between $P$ and $Q$) equals

$$\left( |p_1 - q_1|^2 + |p_2 - q_2|^2 \right)^{1/2} = \left( (p_1 - q_1)^2 + (p_2 - q_2)^2 \right)^{1/2} \tag{4.3}$$

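For two points $P = (p_1, p_2, p_3)$ and $Q = (q_1, q_2, q_3)$ in $\mathbb{R}^3$, the corresponding expression is

$$d(P, Q) = \left( (p_1 - q_1)^2 + (p_2 - q_2)^2 + (p_3 - q_3)^2 \right)^{1/2} \tag{4.4}$$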

This discussion motivates the following definition for the distance between points in $\mathbb{R}^n$.

Definition 4.9: Distance Between Points

Let $P = (p_1, \cdots, p_n)$ and $Q = (q_1, \cdots, q_n)$ be two points in $\mathbb{R}^n$. Then the distance between these points is defined as

$$\text{distance between } P \text{ and } Q = d(P, Q) = \left( \sum_{k=1}^{n} |p_k - q_k|^2 \right)^{1/2}$$

This is called the distance formula. We may also write $|P - Q|$ as the distance between $P$ and $Q$.

From the above discussion, you can see that Definition 4.9 holds for the special cases $n = 1, 2, 3$, as in Equations 4.2, 4.3, 4.4. In the following example, we use Definition 4.9 to find the distance between two points in $\mathbb{R}^4$.


Example 4.10: Distance Between Points

Find the distance between the points $P$ and $Q$ in $\mathbb{R}^4$, where $P$ and $Q$ are given by

$$P = (1, 2, -4, 6)$$

and

$$Q = (2, 3, -1, 0)$$

Solution. We will use the formula given in Definition 4.9 to find the distance between $P$ and $Q$. Use the distance formula and write

$$d(P, Q) = \left( (1 - 2)^2 + (2 - 3)^2 + (-4 - (-1))^2 + (6 - 0)^2 \right)^{1/2} = \sqrt{47}$$

Therefore, $d(P, Q) = \sqrt{47}$.
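The distance formula of Definition 4.9 can be checked numerically. The sketch below is one possible implementation, assuming points are given as tuples; the function name `distance` is a hypothetical choice. The coordinates are taken as $P = (1, 2, -4, 6)$ and $Q = (2, 3, -1, 0)$, the signs that reproduce the stated result $\sqrt{47}$.

```python
import math


def distance(p, q):
    """Distance between two points of R^n, following Definition 4.9."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of coordinates")
    return math.sqrt(sum((pk - qk) ** 2 for pk, qk in zip(p, q)))


# Coordinates assumed so that the squared differences are 1 + 1 + 9 + 36 = 47.
P = (1, 2, -4, 6)
Q = (2, 3, -1, 0)

d = distance(P, Q)
print(d)  # the square root of 47, roughly 6.856
```

Swapping the arguments gives the same value, which matches the symmetry property of distance.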

There are certain properties of the distance between points which are important in our study. These are outlined in the following theorem.

Theorem 4.11: Properties of Distance

Let $P$ and $Q$ be points in $\mathbb{R}^n$, and let the distance between them, $d(P, Q)$, be given as in Definition 4.9. Then, the following properties hold.

• $d(P, Q) = d(Q, P)$
• $d(P, Q) \geq 0$, and equals 0 exactly when $P = Q$.

There are many applications of the concept of distance. For instance, given two points, we can ask what collection of points are all the same distance from each of the given points. This is explored in the following example.

Example 4.12: The Plane Between Two Points

Describe the points in $\mathbb{R}^3$ which are at the same distance from $(1, 2, 3)$ and $(0, 1, 2)$.

Solution. Let $P = (p_1, p_2, p_3)$ be such a point. Therefore, $P$ is the same distance from $(1, 2, 3)$ and $(0, 1, 2)$. Then by Definition 4.9,

$$\sqrt{(p_1 - 1)^2 + (p_2 - 2)^2 + (p_3 - 3)^2} = \sqrt{(p_1 - 0)^2 + (p_2 - 1)^2 + (p_3 - 2)^2}$$

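Squaring both sides and expanding, the squared terms $p_1^2$, $p_2^2$, $p_3^2$ cancel, leaving $-2p_1 - 4p_2 - 6p_3 + 14 = -2p_2 - 4p_3 + 5$. Simplifying, this becomes

$$2p_1 + 2p_2 + 2p_3 = 9 \tag{4.5}$$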

Therefore, the points $P = (p_1, p_2, p_3)$ which are the same distance from each of the given points form a plane whose equation is given by 4.5.

We can now use our understanding of the distance between two points to define what is meant by the length of a vector. Consider the following definition.

Definition 4.13: Length of a Vector

Let $\vec{u} = [u_1 \cdots u_n]^T$ be a vector in $\mathbb{R}^n$. Then, the length of $\vec{u}$, written $\|\vec{u}\|$, is given by

$$\|\vec{u}\| = \sqrt{u_1^2 + \cdots + u_n^2}$$

This definition corresponds to Definition 4.9, if you consider the vector $\vec{u}$ to have its tail at the point $0 = (0, \cdots, 0)$ and its tip at the point $U = (u_1, \cdots, u_n)$. Then the length of $\vec{u}$ is equal to the distance between $0$ and $U$, $d(0, U)$. In general, $d(P, Q) = \|\overrightarrow{PQ}\|$.

Consider Example 4.10. By Definition 4.13, we could also find the distance between $P$ and $Q$ as the length of the vector connecting them. Hence, if we were to draw a vector $\overrightarrow{PQ}$ with its tail at $P$ and its point at $Q$, this vector would have length equal to $\sqrt{47}$.

We conclude this section with a new definition for the special case of vectors of length 1.

Definition 4.14: Unit Vector

Let $\vec{u}$ be a vector in $\mathbb{R}^n$. Then, we call $\vec{u}$ a unit vector if it has length 1, that is if

$$\|\vec{u}\| = 1$$

Let $\vec{v}$ be a nonzero vector in $\mathbb{R}^n$. Then, the vector $\vec{u}$ which has the same direction as $\vec{v}$ but length equal to 1 is the corresponding unit vector of $\vec{v}$. This vector is given by

$$\vec{u} = \frac{1}{\|\vec{v}\|} \vec{v}$$
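The formula for the corresponding unit vector can be sketched in a few lines. This is an illustration only: the names `norm` and `normalize` and the sample vector are assumptions made here, not taken from the text. The zero vector is rejected because it has no direction and the formula would divide by zero.

```python
import math


def norm(v):
    """Length of v: the square root of the sum of squared components."""
    return math.sqrt(sum(c * c for c in v))


def normalize(v):
    """Return the unit vector (1 / ||v||) v pointing in the direction of v."""
    length = norm(v)
    if length == 0:
        raise ValueError("the zero vector has no corresponding unit vector")
    return [c / length for c in v]


# Illustrative vector with length 5.
v = [3, 0, 4]
u = normalize(v)
print(u, norm(u))  # components close to [0.6, 0.0, 0.8], length close to 1
```

Because of floating point rounding, the computed length of `u` should be compared to 1 with a tolerance rather than with exact equality.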

We often use the term normalize to refer to this process. When we normalize a vector, we find the corresponding unit vector of length 1. Consider the following example.

Example 4.15: Finding a Unit Vector

Let $\vec{v}$ be given by $\vec{v} =$

$$\|k\vec{u}\| = |k| \, \|\vec{u}\|$$


In other words, multiplication by a scalar magnifies or shrinks the length of the vector by a factor of $|k|$. If $|k| > 1$, the length of the resulting vector will be magnified. If $|k| < 1$, the length of the resulting vector will shrink. Remember that by the definition of the absolute value, $|k| \geq 0$.

What about the direction? Draw a picture of $\vec{u}$ and $k\vec{u}$ where $k$ is negative. Notice that this causes the resulting vector to point in the opposite direction while if $k > 0$ it preserves the direction the vector points. Therefore the direction can either reverse, if $k < 0$, or remain preserved, if $k > 0$.

Consider the following example.

Example 4.16: Graphing Scalar Multiplication

Consider the vectors $\vec{u}$ and $\vec{v}$ drawn below.

[Figure: the vectors $\vec{u}$ and $\vec{v}$.]

Draw $-\vec{u}$, $2\vec{v}$, and $-\frac{1}{2}\vec{v}$.

Solution. In order to find $-\vec{u}$, we preserve the length of $\vec{u}$ and simply reverse the direction. For $2\vec{v}$, we double the length of $\vec{v}$, while preserving the direction. Finally $-\frac{1}{2}\vec{v}$ is found by taking half the length of $\vec{v}$ and reversing the direction. These vectors are shown in the following diagram.

[Figure: the vectors $\vec{u}$, $-\vec{u}$, $\vec{v}$, $2\vec{v}$, and $-\frac{1}{2}\vec{v}$.]

Now that we have studied both vector addition and scalar multiplication, we can combine the two actions. Recall Definition 1.32 of linear combinations of column matrices. We can apply this definition to vectors in $\mathbb{R}^n$. A linear combination of vectors in $\mathbb{R}^n$ is a sum of vectors multiplied by scalars.

In the following example, we examine the geometric meaning of this concept.


Example 4.17: Graphing a Linear Combination of Vectors

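Consider the following picture of the vectors $\vec{u}$ and $\vec{v}$.

[Figure: the vectors $\vec{u}$ and $\vec{v}$.]

Sketch a picture of $\vec{u} + 2\vec{v}$, $\vec{u} - \frac{1}{2}\vec{v}$.

Solution. The two vectors are shown below.

[Figure: $\vec{u}$ with $2\vec{v}$ attached at its point, giving $\vec{u} + 2\vec{v}$; and $\vec{u}$ with $-\frac{1}{2}\vec{v}$ attached at its point, giving $\vec{u} - \frac{1}{2}\vec{v}$.]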

4.6 Parametric Lines

Outcomes

A. Find the vector and parametric equations of a line.

We can use the concept of vectors and points to find equations for arbitrary lines in $\mathbb{R}^n$, although in this section the focus will be on lines in $\mathbb{R}^3$.

To begin, consider the case $n = 1$ so we have $\mathbb{R}^1 = \mathbb{R}$. There is only one line here which is the familiar number line, that is $\mathbb{R}$ itself. Therefore it is not necessary to explore the case of $n = 1$ further.


Now consider the case where $n = 2$, in other words $\mathbb{R}^2$. Let $P$ and $P_0$ be two different points in $\mathbb{R}^2$ which are contained in a line $L$. Let $\vec{p}$ and $\vec{p}_0$ be the position vectors for the points $P$ and $P_0$ respectively. Suppose that $Q$ is an arbitrary point on $L$. Consider the following diagram.

[Figure: a line through the points $P_0$, $P$, and $Q$.]

Our goal is to be able to define $Q$ in terms of $P$ and $P_0$. Consider the vector $\overrightarrow{P_0P} = \vec{p} - \vec{p}_0$ which has its tail at $P_0$ and point at $P$. If we add $\vec{p} - \vec{p}_0$ to the position vector $\vec{p}_0$ for $P_0$, the sum would be a vector with its point at $P$. In other words,

$$\vec{p} = \vec{p}_0 + (\vec{p} - \vec{p}_0)$$

Now suppose we were to add $t(\vec{p} - \vec{p}_0)$ to $\vec{p}_0$ where $t$ is some scalar. You can see that by doing so, we could find a vector with its point at $Q$. In other words, we can find $t$ such that

$$\vec{q} = \vec{p}_0 + t(\vec{p} - \vec{p}_0)$$

This equation determines the line $L$ in $\mathbb{R}^2$. In fact, it determines a line $L$ in $\mathbb{R}^n$. Consider the following definition.

Definition 4.18: Vector Equation of a Line

Suppose a line $L$ in $\mathbb{R}^n$ contains the two different points $P$ and $P_0$. Let $\vec{p}$ and $\vec{p}_0$ be the position vectors of these two points, respectively. Then, $L$ is the collection of points $Q$ which have the position vector $\vec{q}$ given by

$$\vec{q} = \vec{p}_0 + t(\vec{p} - \vec{p}_0)$$

where $t \in \mathbb{R}$.

Let $\vec{d} = \vec{p} - \vec{p}_0$. Then $\vec{d}$ is the direction vector for $L$ and the vector equation for $L$ is given by

$$\vec{p} = \vec{p}_0 + t\vec{d}, \quad t \in \mathbb{R}$$

Note that this definition agrees with the usual notion of a line in two dimensions and so this is consistent with earlier concepts. Consider now points in $\mathbb{R}^3$. If a point $P \in \mathbb{R}^3$ is given by $P = (x, y, z)$, $P_0 \in \mathbb{R}^3$ by $P_0 = (x_0, y_0, z_0)$, then we can write

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x_0 \\ y_0 \\ z_0 \end{bmatrix} + t \begin{bmatrix} a \\ b \\ c \end{bmatrix}$$

where $\vec{d} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$. This is the vector equation of $L$ written in component form.

The following proposition claims that such an equation is in fact a line.

Proposition 4.19: Algebraic Description of a Straight Line

Let $\vec{a}, \vec{b} \in \mathbb{R}^n$ with $\vec{b} \neq \vec{0}$. Then $\vec{x} = \vec{a} + t\vec{b}$, $t \in \mathbb{R}$, is a line.

Proof. Let $\vec{x}_1, \vec{x}_2 \in \mathbb{R}^n$. Define $\vec{x}_1 = \vec{a}$ and let $\vec{x}_2 - \vec{x}_1 = \vec{b}$. Since $\vec{b} \neq \vec{0}$, it follows that $\vec{x}_2 \neq \vec{x}_1$. Then $\vec{a} + t\vec{b} = \vec{x}_1 + t(\vec{x}_2 - \vec{x}_1)$. It follows that $\vec{x} = \vec{a} + t\vec{b}$ is a line containing the two different points $X_1$ and $X_2$ whose position vectors are given by $\vec{x}_1$ and $\vec{x}_2$ respectively.

We can use the above discussion to find the equation of a line when given two distinct points. Consider the following example.

Example 4.20: A Line From Two Points

Find a vector equation for the line through the points $P_0 = (1, 2, 0)$ and $P = (2, -4, 6)$.

Solution. We will use the definition of a line given above in Definition 4.18 to write this line in the form

$$\vec{q} = \vec{p}_0 + t(\vec{p} - \vec{p}_0)$$

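Let $\vec{q} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$. Then, we can find $\vec{p}$ and $\vec{p}_0$ by taking the position vectors of points $P$ and $P_0$ respectively. Then,

$$\vec{q} = \vec{p}_0 + t(\vec{p} - \vec{p}_0)$$

can be written as

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} + t \begin{bmatrix} 1 \\ -6 \\ 6 \end{bmatrix}, \quad t \in \mathbb{R}$$

Here, the direction vector $\begin{bmatrix} 1 \\ -6 \\ 6 \end{bmatrix}$ is obtained by $\vec{p} - \vec{p}_0 = \begin{bmatrix} 2 \\ -4 \\ 6 \end{bmatrix} - \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}$ as indicated above in Definition 4.18.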
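The construction in this example can be sketched in code. The helper names `line_through` and `point_on_line` below are hypothetical, and the second point is taken as $P = (2, -4, 6)$, the value consistent with the direction vector found in the solution.

```python
def line_through(p0, p):
    """Return (p0, d) where d = p - p0 is a direction vector of the line."""
    d = [b - a for a, b in zip(p0, p)]
    if all(c == 0 for c in d):
        raise ValueError("the two points must be different")
    return list(p0), d


def point_on_line(p0, d, t):
    """Evaluate q = p0 + t d for a given parameter value t."""
    return [a + t * c for a, c in zip(p0, d)]


p0, d = line_through((1, 2, 0), (2, -4, 6))
print(d)                        # [1, -6, 6]
print(point_on_line(p0, d, 0))  # [1, 2, 0], the point P0
print(point_on_line(p0, d, 1))  # [2, -4, 6], the point P
```

The parameter values $t = 0$ and $t = 1$ recover the two given points, which is a convenient check on any vector equation built this way.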

Notice that in the above example we said that we found a vector equation for the line, not the equation. The reason for this terminology is that there are infinitely many different vector equations for the same line. To see this, replace $t$ with another parameter, say $3s$. Then you obtain a different vector equation for the same line because the same set of points is obtained.

In Example 4.20, the vector given by $\begin{bmatrix} 1 \\ -6 \\ 6 \end{bmatrix}$ is the direction vector defined in Definition 4.18. If we know the direction vector of a line, as well as a point on the line, we can find the vector equation.

Consider the following example.

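Example 4.21: A Line From a Point and a Direction Vector

Find a vector equation for the line which contains the point $P_0 = (1, 2, 0)$ and has direction vector $\vec{d} = \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$.

Solution. We will use Definition 4.18 to write this line in the form $\vec{p} = \vec{p}_0 + t\vec{d}$, $t \in \mathbb{R}$. We are given the direction vector $\vec{d}$. In order to find $\vec{p}_0$, we can use the position vector of the point $P_0$. This is given by $\begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix}$. Letting $\vec{p} = \begin{bmatrix} x \\ y \\ z \end{bmatrix}$, the equation for the line is given by

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} + t \begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R} \tag{4.6}$$

We sometimes elect to write a line such as the one given in 4.6 in the form

$$\begin{cases} x = 1 + t \\ y = 2 + 2t \\ z = t \end{cases} \quad \text{where } t \in \mathbb{R} \tag{4.7}$$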

This set of equations gives the same information as 4.6, and is called the parametric equation of the line.

Consider the following definition.

Definition 4.22: Parametric Equation of a Line

Let $L$ be a line in $\mathbb{R}^3$ which has direction vector $\vec{d} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}$ and goes through the point $P_0 = (x_0, y_0, z_0)$. Then, letting $t$ be a parameter, we can write $L$ as

$$\begin{cases} x = x_0 + ta \\ y = y_0 + tb \\ z = z_0 + tc \end{cases} \quad \text{where } t \in \mathbb{R}$$

This is called a parametric equation of the line $L$.

You can verify that the form discussed following Example 4.21 in equation 4.7 is of the form given in Definition 4.22.

There is one other form for a line which is useful, which is the symmetric form. Consider the line given by 4.7. You can solve for the parameter $t$ to write

$$t = x - 1, \quad t = \frac{y - 2}{2}, \quad t = z$$

Therefore,

$$x - 1 = \frac{y - 2}{2} = z$$

This is the symmetric form of the line.

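In the following example, we look at how to take the equation of a line from symmetric form to parametric form.

Example 4.23: Change Symmetric Form to Parametric Form

Suppose the symmetric form of a line is

$$\frac{x - 2}{3} = \frac{y - 1}{2} = z + 3$$

Write the line in parametric form as well as vector form.

Solution. We want to write this line in the form given by Definition 4.22. Setting each expression equal to the parameter $t$ and solving for $x$, $y$, and $z$ gives

$$\begin{cases} x = 2 + 3t \\ y = 1 + 2t \\ z = -3 + t \end{cases} \quad \text{where } t \in \mathbb{R}$$

This is the parametric equation for this line.

Now, we want to write this line in the form given by Definition 4.18. This is the form

$$\vec{p} = \vec{p}_0 + t\vec{d}$$

where $t \in \mathbb{R}$. This equation becomes

$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \\ -3 \end{bmatrix} + t \begin{bmatrix} 3 \\ 2 \\ 1 \end{bmatrix}, \quad t \in \mathbb{R}$$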
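One way to check a conversion between forms is to generate points from the parametric equations and confirm that each satisfies the symmetric form. The sketch below does this for the line of this example, assuming the parametric form $x = 2 + 3t$, $y = 1 + 2t$, $z = -3 + t$; the function names are illustrative, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction


def point(t):
    """Parametric form: x = 2 + 3t, y = 1 + 2t, z = -3 + t."""
    return (2 + 3 * t, 1 + 2 * t, -3 + t)


def satisfies_symmetric(x, y, z):
    """Symmetric form: (x - 2)/3 = (y - 1)/2 = z + 3 (integer inputs assumed)."""
    return Fraction(x - 2, 3) == Fraction(y - 1, 2) == z + 3


for t in (-2, 0, 1, 5):
    x, y, z = point(t)
    print(t, (x, y, z), satisfies_symmetric(x, y, z))  # True for every t
```

A point that is not on the line, such as the origin, fails the symmetric test, so the check is not vacuous.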


4.6.1. Exercises

1. Find the vector equation for the line through (7, 6, 0) and (1, 1, 4). Then, find the parametric equations for this line.

2. Find parametric equations for the line through the point (7, 7, 1) with a direction vector

1d~ = 6 .23. Parametric equations of the line are

x=t+2y = 6 3tz = t 6Find a direction vector for the line and a point on the line.4. Find the vector equation for the line through the two points (5, 5, 1), (2, 2, 4) . Then,find the parametric equations.5. The equation of a line in two dimensions is written as y = x 5. Find parametricequations for this line.6. Find parametric equations for the line through (6, 5, 2) and (5, 1, 2) .7. Find the vector equation and parametricfor the line through the point equations

1(7, 10, 6) with a direction vector d~ = 1 .38. Parametric equations of the line are

x = 2t + 2y = 5 4tz = t 3Find a direction vector for the line and a point on the line, and write the vectorequation of the line.9. Find the vector equation and parametric equations for the line through the two points(4, 10, 0), (1, 5, 6) .10. Find the point on the line segment from P = (4, 7, 5) to Q = (2, 2, 3) which isof the way from P to Q.

17

11. Suppose a triangle in $\mathbb{R}^n$ has vertices at $P_1$, $P_2$, and $P_3$. Consider the lines which are drawn from a vertex to the mid point of the opposite side. Show these three lines intersect in a point and find the coordinates of this point.

4.7 The Dot Product

Outcomes

A. Compute the dot product of vectors, and use this to compute vector projections.

4.7.1. The Dot Product

There are two ways of multiplying vectors which are of great importance in applications. The first of these is called the dot product. When we take the dot product of vectors, the result is a scalar. For this reason, the dot product is also called the scalar product and sometimes the inner product. The definition is as follows.

Definition 4.24: Dot Product

Let $\vec{u}$, $\vec{v}$ be two vectors in $\mathbb{R}^n$. Then we define the dot product $\vec{u} \cdot \vec{v}$ as

$$\vec{u} \cdot \vec{v} = \sum_{k=1}^{n} u_k v_k$$
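The sum in Definition 4.24 can be written as a short function. This is a minimal sketch with a hypothetical helper name `dot` and sample vectors chosen here for illustration.

```python
def dot(u, v):
    """Dot product: the sum of products of corresponding components."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same number of components")
    return sum(uk * vk for uk, vk in zip(u, v))


# Illustrative vectors, not taken from the text.
u = [1, 2, 3]
v = [4, -5, 6]
print(dot(u, v))  # 4 - 10 + 18 = 12
print(dot(v, u))  # the same value, since the product is symmetric
```

Note that the result is a single number, not a vector, which is why the dot product is also called the scalar product.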

The dot product $\vec{u} \cdot \vec{v}$ is sometimes denoted as $(\vec{u}, \vec{v})$ where a comma replaces $\cdot$. It can also be written as $\langle \vec{u}, \vec{v} \rangle$. If we write the vectors as column matrices, it is equal to the matrix product $\vec{u}^T \vec{v}$.

Consider the following example.

Example 4.25: Compute a Dot Product

Find $\vec{u} \cdot \vec{v}$ for

01 1 2

~u = 0 , ~v = 231

Solution. By Definition 4.24, we must compute

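With this definition, there are several important properties satisfied by the dot product.

Proposition 4.26: Properties of the Dot Product

Let $k$ and $p$ denote scalars and $\vec{u}, \vec{v}, \vec{w}$ denote vectors. Then the dot product $\vec{u} \cdot \vec{v}$ satisfies the following properties.

• $\vec{u} \cdot \vec{v} = \vec{v} \cdot \vec{u}$
• $\vec{u} \cdot \vec{u} \geq 0$ and equals zero if and only if $\vec{u} = \vec{0}$
• $(k\vec{u} + p\vec{v}) \cdot \vec{w} = k(\vec{u} \cdot \vec{w}) + p(\vec{v} \cdot \vec{w})$
• $\vec{u} \cdot (k\vec{v} + p\vec{w}) = k(\vec{u} \cdot \vec{v}) + p(\vec{u} \cdot \vec{w})$
• $\|\vec{u}\|^2 = \vec{u} \cdot \vec{u}$

The proof of this proposition is left as an exercise.

This proposition tells us that we can also use the dot product to find the length of a vector.

Example 4.27: Length of a Vector

Find the length of

$$\vec{u} = \begin{bmatrix} 2 \\ 1 \\ 4 \\ 2 \end{bmatrix}$$

That is, find $\|\vec{u}\|$.

Solution. By Proposition 4.26, $\|\vec{u}\|^2 = \vec{u} \cdot \vec{u}$. Therefore, $\|\vec{u}\| = \sqrt{\vec{u} \cdot \vec{u}}$. First, compute $\vec{u} \cdot \vec{u}$. This is given by

$$\vec{u} \cdot \vec{u} = (2)(2) + (1)(1) + (4)(4) + (2)(2) = 4 + 1 + 16 + 4 = 25$$

Then,

$$\|\vec{u}\| = \sqrt{\vec{u} \cdot \vec{u}} = \sqrt{25} = 5$$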
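The computation in this example can be reproduced with the relation $\|\vec{u}\|^2 = \vec{u} \cdot \vec{u}$. The helper names `dot` and `length` below are illustrative assumptions.

```python
import math


def dot(u, v):
    """Dot product of two vectors of the same size."""
    return sum(uk * vk for uk, vk in zip(u, v))


def length(u):
    """Length of u, computed as the square root of u . u."""
    return math.sqrt(dot(u, u))


u = [2, 1, 4, 2]
print(dot(u, u))   # 25
print(length(u))   # 5.0
```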

You may wish to compare this to our previous definition of length, given in Definition 4.13.

The Cauchy Schwarz inequality is a fundamental inequality satisfied by the dot product. It is given in the following theorem.

Theorem 4.28: Cauchy Schwarz Inequality

The dot product satisfies the inequality

$$|\vec{u} \cdot \vec{v}| \leq \|\vec{u}\| \|\vec{v}\| \tag{4.8}$$

Furthermore equality is obtained if and only if one of $\vec{u}$ or $\vec{v}$ is a scalar multiple of the other.

Proof. First note that if $\vec{v} = \vec{0}$ both sides of 4.8 equal zero and so the inequality holds in this case. Therefore, it will be assumed in what follows that $\vec{v} \neq \vec{0}$.

Define a function of $t \in \mathbb{R}$ by

$$f(t) = (\vec{u} + t\vec{v}) \cdot (\vec{u} + t\vec{v})$$

Then by Proposition 4.26, $f(t) \geq 0$ for all $t \in \mathbb{R}$. Also from Proposition 4.26

$$\begin{aligned}
f(t) &= \vec{u} \cdot (\vec{u} + t\vec{v}) + t\vec{v} \cdot (\vec{u} + t\vec{v}) \\
&= \vec{u} \cdot \vec{u} + t(\vec{u} \cdot \vec{v}) + t\vec{v} \cdot \vec{u} + t^2 \vec{v} \cdot \vec{v} \\
&= \|\vec{u}\|^2 + 2t(\vec{u} \cdot \vec{v}) + \|\vec{v}\|^2 t^2
\end{aligned}$$

Now this means the graph of $y = f(t)$ is a parabola which opens up and either its vertex touches the $t$ axis or else the entire graph is above the $t$ axis. In the first case, there exists some $t$ where $f(t) = 0$ and this requires $\vec{u} + t\vec{v} = \vec{0}$ so one vector is a multiple of the other. Then clearly equality holds in 4.8. In the case where $\vec{u}$ is not a multiple of $\vec{v}$, it follows $f(t) > 0$ for all $t$ which says $f(t)$ has no real zeros and so from the quadratic formula,

$$(2(\vec{u} \cdot \vec{v}))^2 - 4\|\vec{u}\|^2 \|\vec{v}\|^2 < 0$$

which is equivalent to $|\vec{u} \cdot \vec{v}| < \|\vec{u}\| \|\vec{v}\|$.

Notice that this proof was based only on the properties of the dot product listed in Proposition 4.26. This means that whenever an operation satisfies these properties, the Cauchy Schwarz inequality holds. There are many other instances of these properties besides vectors in $\mathbb{R}^n$.

The Cauchy Schwarz inequality provides another proof of the triangle inequality for distances in $\mathbb{R}^n$.

Theorem 4.29: Triangle Inequality

For $\vec{u}, \vec{v} \in \mathbb{R}^n$

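$$\|\vec{u} + \vec{v}\| \leq \|\vec{u}\| + \|\vec{v}\| \tag{4.9}$$

and equality holds if and only if one of the vectors is a non-negative scalar multiple of the other. Also

$$\big| \|\vec{u}\| - \|\vec{v}\| \big| \leq \|\vec{u} - \vec{v}\| \tag{4.10}$$

Proof. By properties of the dot product and the Cauchy Schwarz inequality,

$$\begin{aligned}
\|\vec{u} + \vec{v}\|^2 &= (\vec{u} + \vec{v}) \cdot (\vec{u} + \vec{v}) \\
&= \vec{u} \cdot \vec{u} + \vec{u} \cdot \vec{v} + \vec{v} \cdot \vec{u} + \vec{v} \cdot \vec{v} \\
&= \|\vec{u}\|^2 + 2(\vec{u} \cdot \vec{v}) + \|\vec{v}\|^2 \\
&\leq \|\vec{u}\|^2 + 2\|\vec{u}\| \|\vec{v}\| + \|\vec{v}\|^2 = (\|\vec{u}\| + \|\vec{v}\|)^2
\end{aligned}$$

Taking square roots of both sides gives 4.9.

For 4.10, write $\vec{u} = (\vec{u} - \vec{v}) + \vec{v}$ and apply 4.9 to obtain $\|\vec{u}\| \leq \|\vec{u} - \vec{v}\| + \|\vec{v}\|$, so that

$$\|\vec{u}\| - \|\vec{v}\| \leq \|\vec{u} - \vec{v}\| \tag{4.11}$$

Exchanging the roles of $\vec{u}$ and $\vec{v}$ gives

$$\|\vec{v}\| - \|\vec{u}\| \leq \|\vec{v} - \vec{u}\| = \|\vec{u} - \vec{v}\| \tag{4.12}$$

It follows from 4.11 and 4.12 that 4.10 holds. This is because $\big| \|\vec{u}\| - \|\vec{v}\| \big|$ equals the left side of either 4.11 or 4.12 and either way, $\big| \|\vec{u}\| - \|\vec{v}\| \big| \leq \|\vec{u} - \vec{v}\|$.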
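Inequalities such as these can be spot-checked numerically, which is not a proof but can catch a misremembered direction of an inequality. The sketch below, with hypothetical helper names and randomly generated test vectors, checks the Cauchy Schwarz inequality and both forms of the triangle inequality, allowing a small tolerance for floating point rounding.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def check_inequalities(u, v, tol=1e-9):
    """Return True if the three inequalities hold for u and v, up to rounding."""
    total = [a + b for a, b in zip(u, v)]
    diff = [a - b for a, b in zip(u, v)]
    cauchy_schwarz = abs(dot(u, v)) <= norm(u) * norm(v) + tol
    triangle = norm(total) <= norm(u) + norm(v) + tol
    reverse_triangle = abs(norm(u) - norm(v)) <= norm(diff) + tol
    return cauchy_schwarz and triangle and reverse_triangle


random.seed(0)  # fixed seed so the run is repeatable
results = []
for _ in range(1000):
    u = [random.uniform(-10, 10) for _ in range(4)]
    v = [random.uniform(-10, 10) for _ in range(4)]
    results.append(check_inequalities(u, v))

print(all(results))  # True
```

When one vector is a scalar multiple of the other, the two sides of the Cauchy Schwarz inequality agree up to rounding, which matches the equality case stated in the theorem.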

4.7.2. The Geometric Significance of the Dot Product

Given two vectors, $\vec{u}$ and $\vec{v}$, the included angle is the angle between these two vectors which is less than or equal to 180 degrees. The dot product can be used to determine the included angle between two vectors. Consider the following picture where $\theta$ gives the included angle.

[Figure: the vectors $\vec{u}$ and $\vec{v}$ drawn from a common tail, with the included angle $\theta$ between them.]

Proposition 4.30: The Dot Product and the Included Angle

Let $\vec{u}$ and $\vec{v}$ be two vectors in $\mathbb{R}^n$, and let $\theta$ be the included angle. Then the following equation holds.

$$\vec{u} \cdot \vec{v} = \|\vec{u}\| \|\vec{v}\| \cos \theta$$

In words, the dot product of two vectors equals the product of the magnitude (or length) of the two vectors multiplied by the cosine of the included angle. Note this gives a geometric description of the dot product which does not depend explicitly on the coordinates of the vectors.

Consider the following example.

Example 4.31: Find the Angle Between Two Vectors

Find the angle between the vectors given by

$$\vec{u} = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} 3 \\ 4 \\ 1 \end{bmatrix}$$

Solution. By Proposition 4.30, the cosine of the included angle is $\cos \theta = \dfrac{\vec{u} \cdot \vec{v}}{\|\vec{u}\| \|\vec{v}\|}$. First, we can compute $\vec{u} \cdot \vec{v}$. By Definition 4.24, this equals

$$\vec{u} \cdot \vec{v} = (2)(3) + (1)(4) + (-1)(1) = 9$$

Then,

$$\|\vec{u}\| = \sqrt{(2)(2) + (1)(1) + (-1)(-1)} = \sqrt{6}$$

$$\|\vec{v}\| = \sqrt{(3)(3) + (4)(4) + (1)(1)} = \sqrt{26}$$

Therefore, the cosine of the included angle equals

$$\cos \theta = \frac{9}{\sqrt{26}\sqrt{6}} = 0.7205766...$$

With the cosine known, the angle can be determined by computing the inverse cosine of that value, giving approximately $\theta = 0.76616$ radians.

Another application of the geometric description of the dot product is in finding the angle between two lines. Typically one would assume that the lines intersect. In some situations, however, it may make sense to ask this question when the lines do not intersect, such as the angle between two object trajectories. In any case we understand it to mean the smallest angle between (any of) their direction vectors. The only subtlety here is that if $\vec{u}$ is a direction vector for a line, then so is any nonzero multiple $k\vec{u}$, and thus we will find supplementary angles among all angles between direction vectors for two lines, and we simply take the smaller of the two.

Example 4.32: Find the Angle Between Two Lines

Find the angle between the two lines

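$$L_1 : \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \\ 0 \end{bmatrix} + t \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}$$

and

$$L_2 : \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 4 \\ -3 \end{bmatrix} + s \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}$$

Solution. You can verify that these lines do not intersect, but as discussed above this does not matter and we simply find the smallest angle between any direction vectors for these lines.

To do so we first find the angle between the direction vectors given above:

$$\vec{u} = \begin{bmatrix} -1 \\ 1 \\ 2 \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}$$

to obtain $\cos \theta = -\frac{1}{2}$ and since we choose included angles between $0$ and $\pi$ we obtain $\theta = \frac{2\pi}{3}$.

Now the angles between any two direction vectors for these lines will either be $\frac{2\pi}{3}$ or its supplement $\phi = \pi - \frac{2\pi}{3} = \frac{\pi}{3}$. We choose the smaller angle, and therefore conclude that the angle between the two lines is $\frac{\pi}{3}$.

We can also use Proposition 4.30 to compute the dot product of two vectors.

Example 4.33: Using Geometric Description to Find a Dot Product

Let $\vec{u}$, $\vec{v}$ be vectors with $\|\vec{u}\| = 3$ and $\|\vec{v}\| = 4$. Suppose the angle between $\vec{u}$ and $\vec{v}$ is $\pi/3$. Find $\vec{u} \cdot \vec{v}$.

Solution. From the geometric description of the dot product in Proposition 4.30

$$\vec{u} \cdot \vec{v} = (3)(4) \cos(\pi/3) = 3 \times 4 \times \frac{1}{2} = 6$$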
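The angle computations in Examples 4.31 and 4.32 can be sketched as follows. The function names are hypothetical, and the sample vectors are assumptions: their signs are chosen to reproduce the dot product 9 of Example 4.31 and the value $\cos \theta = -\frac{1}{2}$ of Example 4.32.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def included_angle(u, v):
    """Angle in [0, pi] between two nonzero vectors, in radians."""
    c = dot(u, v) / (norm(u) * norm(v))
    c = max(-1.0, min(1.0, c))  # guard against rounding slightly outside [-1, 1]
    return math.acos(c)


def angle_between_lines(d1, d2):
    """Smaller of the two supplementary angles between direction vectors."""
    theta = included_angle(d1, d2)
    return min(theta, math.pi - theta)


# Vectors assumed for Example 4.31.
print(included_angle([2, 1, -1], [3, 4, 1]))        # approximately 0.76616

# Direction vectors assumed for Example 4.32.
print(angle_between_lines([-1, 1, 2], [2, 1, -1]))  # approximately pi/3
```

Reversing one direction vector changes the included angle to its supplement but leaves the angle between the lines unchanged, which is why the smaller of the two angles is taken.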

Two nonzero vectors are said to be perpendicular, sometimes also called orthogonal, if the included angle is $\pi/2$ radians ($90^\circ$).

Consider the following proposition.

Proposition 4.34: Perpendicular Vectors

Let $\vec{u}$ and $\vec{v}$ be nonzero vectors in $\mathbb{R}^n$. Then, $\vec{u}$ and $\vec{v}$ are said to be perpendicular exactly when

$$\vec{u} \cdot \vec{v} = 0$$

Proof. This follows directly from Proposition 4.30. First if the dot product of two nonzero vectors is equal to 0, this tells us that $\cos \theta = 0$ (this is where we need nonzero vectors). Thus $\theta = \pi/2$ and the vectors are perpendicular.

If on the other hand $\vec{v}$ is perpendicular to $\vec{u}$, then the included angle is $\pi/2$ radians. Hence $\cos \theta = 0$ and $\vec{u} \cdot \vec{v} = 0$.

Consider the following example.

Example 4.35: Determine if Two Vectors are Perpendicular

Determine whether the two vectors,

$$\vec{u} = \begin{bmatrix} 2 \\ 1 \\ -1 \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix}$$

are perpendicular.

Solution. In order to determine if these two vectors are perpendicular, we compute the dot product. This is given by

$$\vec{u} \cdot \vec{v} = (2)(1) + (1)(3) + (-1)(5) = 0$$

Therefore, by Proposition 4.34 these two vectors are perpendicular.

4.7.3. Projections

In some applications, we wish to write a vector as a sum of two related vectors. Through the concept of projections, we can find these two vectors. First, we explore an important theorem. The result of this theorem will provide our definition of a vector projection.

Theorem 4.36: Vector Projections

Let $\vec{v}$ and $\vec{u}$ be nonzero vectors. Then there exist unique vectors $\vec{v}_{||}$ and $\vec{v}_{\perp}$ such that
$$\vec{v} = \vec{v}_{||} + \vec{v}_{\perp} \qquad (4.13)$$
where $\vec{v}_{||}$ is a scalar multiple of $\vec{u}$, and $\vec{v}_{\perp}$ is perpendicular to $\vec{u}$.

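Proof. Suppose 4.13 holds and $\vec{v}_{||} = k\vec{u}$. Taking the dot product of both sides of 4.13 with $\vec{u}$ and using $\vec{v}_{\perp} \bullet \vec{u} = 0$, this yields
$$\vec{v} \bullet \vec{u} = (\vec{v}_{||} + \vec{v}_{\perp}) \bullet \vec{u} = k\vec{u} \bullet \vec{u} + \vec{v}_{\perp} \bullet \vec{u} = k\|\vec{u}\|^2$$
which requires $k = \vec{v} \bullet \vec{u} / \|\vec{u}\|^2$. Thus there can be no more than one vector $\vec{v}_{||}$. It follows $\vec{v}_{\perp}$ must equal $\vec{v} - \vec{v}_{||}$. This verifies there can be no more than one choice for both $\vec{v}_{||}$ and $\vec{v}_{\perp}$ and proves their uniqueness.

Now let
$$\vec{v}_{||} = \frac{\vec{v} \bullet \vec{u}}{\|\vec{u}\|^2}\,\vec{u}$$
and let
$$\vec{v}_{\perp} = \vec{v} - \vec{v}_{||} = \vec{v} - \frac{\vec{v} \bullet \vec{u}}{\|\vec{u}\|^2}\,\vec{u}$$
Then $\vec{v}_{||} = k\vec{u}$ where $k = \frac{\vec{v} \bullet \vec{u}}{\|\vec{u}\|^2}$. It only remains to verify $\vec{v}_{\perp} \bullet \vec{u} = 0$. But
$$\vec{v}_{\perp} \bullet \vec{u} = \vec{v} \bullet \vec{u} - \frac{\vec{v} \bullet \vec{u}}{\|\vec{u}\|^2}\,\vec{u} \bullet \vec{u} = \vec{v} \bullet \vec{u} - \vec{v} \bullet \vec{u} = 0$$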

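The vector $\vec{v}_{||}$ in Theorem 4.36 is called the projection of $\vec{v}$ onto $\vec{u}$ and is denoted by
$$\vec{v}_{||} = \mathrm{proj}_{\vec{u}}(\vec{v})$$
We now make a formal definition of the vector projection.

Definition 4.37: Vector Projection

Let $\vec{u}$ and $\vec{v}$ be vectors. Then, the projection of $\vec{v}$ onto $\vec{u}$ is given by
$$\mathrm{proj}_{\vec{u}}(\vec{v}) = \left(\frac{\vec{v} \bullet \vec{u}}{\vec{u} \bullet \vec{u}}\right)\vec{u} = \frac{\vec{v} \bullet \vec{u}}{\|\vec{u}\|^2}\,\vec{u}$$

Consider the following example of a projection.

Example 4.38: Find the Projection of One Vector Onto Another

Find $\mathrm{proj}_{\vec{u}}(\vec{v})$ if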

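$$\vec{u} = \begin{bmatrix} 2 \\ 3 \\ -4 \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}$$

Solution. We can use the formula provided in Definition 4.37 to find $\mathrm{proj}_{\vec{u}}(\vec{v})$. First, compute $\vec{v} \bullet \vec{u}$. This is given by
$$\vec{v} \bullet \vec{u} = (1)(2) + (-2)(3) + (1)(-4) = -8$$
Similarly, $\|\vec{u}\|^2 = 4 + 9 + 16 = 29$. Therefore
$$\mathrm{proj}_{\vec{u}}(\vec{v}) = -\frac{8}{29}\begin{bmatrix} 2 \\ 3 \\ -4 \end{bmatrix}$$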

We will conclude this section with an important application of projections. Suppose a line $L$ and a point $P$ are given such that $P$ is not contained in $L$. Through the use of projections, we can determine the shortest distance from $P$ to $L$.

Example 4.39: Shortest Distance from a Point to a Line

Let $P = (1, 3, 5)$ be a point in $\mathbb{R}^3$, and let $L$ be the line which goes through the point $P_0 = (0, 4, -2)$ with direction vector $\vec{d} = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}$. Find the shortest distance from $P$ to the line $L$, and find the point $Q$ on $L$ that is closest to $P$.

Solution. In order to determine the shortest distance from $P$ to $L$, we will first find the vector $\overrightarrow{P_0P}$ and then find the projection of this vector onto $L$. The vector $\overrightarrow{P_0P}$ is given by

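$$\begin{bmatrix} 1 \\ 3 \\ 5 \end{bmatrix} - \begin{bmatrix} 0 \\ 4 \\ -2 \end{bmatrix} = \begin{bmatrix} 1 \\ -1 \\ 7 \end{bmatrix}$$
Then, if $Q$ is the point on $L$ closest to $P$, it follows that
$$\overrightarrow{P_0Q} = \mathrm{proj}_{\vec{d}}\,\overrightarrow{P_0P} = \left(\frac{\overrightarrow{P_0P} \bullet \vec{d}}{\|\vec{d}\|^2}\right)\vec{d} = \frac{15}{9}\begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix} = \frac{5}{3}\begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}$$
Now, the distance from $P$ to $L$ is given by
$$\|\overrightarrow{QP}\| = \|\overrightarrow{P_0P} - \overrightarrow{P_0Q}\| = \sqrt{26}$$
The point $Q$ is found by adding the vector $\overrightarrow{P_0Q}$ to the position vector $\overrightarrow{0P_0}$ for $P_0$ as follows
$$\begin{bmatrix} 0 \\ 4 \\ -2 \end{bmatrix} + \frac{5}{3}\begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 10/3 \\ 17/3 \\ 4/3 \end{bmatrix}$$
Therefore, $Q = \left(\frac{10}{3}, \frac{17}{3}, \frac{4}{3}\right)$.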
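As a check on Example 4.39, the computation can be repeated with exact rational arithmetic. The sketch below is illustrative only; the helper names `dot` and `proj` are hypothetical, and `proj(u, v)` follows Definition 4.37 for a nonzero vector `u`.

```python
from fractions import Fraction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def proj(u, v):
    """Projection of v onto the nonzero vector u: ((v . u) / (u . u)) u."""
    k = Fraction(dot(v, u), dot(u, u))
    return [k * x for x in u]

# Data from Example 4.39
P = [1, 3, 5]
P0 = [0, 4, -2]
d = [2, 1, 2]

P0P = [p - p0 for p, p0 in zip(P, P0)]    # vector from P0 to P
P0Q = proj(d, P0P)                         # part of P0P along the line
Q = [p0 + c for p0, c in zip(P0, P0Q)]     # closest point on L
QP = [p - q for p, q in zip(P, Q)]         # part of P0P perpendicular to L
dist_squared = dot(QP, QP)                 # squared distance from P to L

print(Q)             # the coordinates 10/3, 17/3, 4/3 as Fractions
print(dist_squared)  # 26, so the distance is sqrt(26)
```

Working with the squared distance keeps every intermediate value rational; the square root is taken only at the end.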

4.7.4. Exercises

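1. Find $\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} \bullet \begin{bmatrix} 2 \\ 0 \\ 1 \\ 3 \end{bmatrix}$.

2. Use the formula given in Proposition 4.30 to verify the Cauchy Schwarz inequality and to show that equality occurs if and only if one of the vectors is a scalar multiple of the other.

3. For $\vec{u}, \vec{v}$ vectors in $\mathbb{R}^3$, define the product $\vec{u} * \vec{v} = u_1v_1 + 2u_2v_2 + 3u_3v_3$. Show the axioms for a dot product all hold for this product. Prove
$$|\vec{u} * \vec{v}| \leq (\vec{u} * \vec{u})^{1/2}(\vec{v} * \vec{v})^{1/2}$$

4. Let $\vec{a}, \vec{b}$ be vectors. Show that $\vec{a} \bullet \vec{b} = \frac{1}{4}\left(\|\vec{a} + \vec{b}\|^2 - \|\vec{a} - \vec{b}\|^2\right)$.

5. Using the axioms of the dot product, prove the parallelogram identity:
$$\|\vec{a} + \vec{b}\|^2 + \|\vec{a} - \vec{b}\|^2 = 2\|\vec{a}\|^2 + 2\|\vec{b}\|^2$$

6. Let $A$ be a real $m \times n$ matrix and let $\vec{u} \in \mathbb{R}^n$ and $\vec{v} \in \mathbb{R}^m$. Show $A\vec{u} \bullet \vec{v} = \vec{u} \bullet A^T\vec{v}$. Hint: Use the definition of matrix multiplication to do this.

7. Use the result of Problem 6 to verify directly that $(AB)^T = B^TA^T$ without making any reference to subscripts.

8. Find the angle between the vectors
$$\vec{u} = \begin{bmatrix} 3 \\ -1 \\ -1 \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} 1 \\ 4 \\ 2 \end{bmatrix}$$

9. Find the angle between the vectors
$$\vec{u} = \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} 1 \\ 2 \\ -7 \end{bmatrix}$$

10. Find $\mathrm{proj}_{\vec{v}}(\vec{w})$ where $\vec{w} = \begin{bmatrix} 1 \\ 0 \\ -2 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$.

11. Find $\mathrm{proj}_{\vec{v}}(\vec{w})$ where $\vec{w} = \begin{bmatrix} 1 \\ 2 \\ -2 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix}$.

12. Find $\mathrm{proj}_{\vec{v}}(\vec{w})$ where $\vec{w} = \begin{bmatrix} 1 \\ 2 \\ -2 \\ 1 \end{bmatrix}$ and $\vec{v} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 0 \end{bmatrix}$.

13. Let $P = (1, 2, 3)$ be a point in $\mathbb{R}^3$. Let $L$ be the line through the point $P_0 = (1, 4, 5)$ with direction vector $\vec{d} = \begin{bmatrix} 1 \\ -1 \\ 1 \end{bmatrix}$. Find the shortest distance from $P$ to $L$, and find the point $Q$ on $L$ that is closest to $P$.

14. Let $P = (0, 2, 1)$ be a point in $\mathbb{R}^3$. Let $L$ be the line through the point $P_0 = (1, 1, 1)$ with direction vector $\vec{d} = \begin{bmatrix} 3 \\ 0 \\ 1 \end{bmatrix}$. Find the shortest distance from $P$ to $L$, and find the point $Q$ on $L$ that is closest to $P$.

15. Does it make sense to speak of $\mathrm{proj}_{\vec{0}}(\vec{w})$?

16. Prove the Cauchy Schwarz inequality in $\mathbb{R}^n$ as follows. For $\vec{w}, \vec{v}$ vectors, consider
$$(\vec{w} - \mathrm{proj}_{\vec{v}}\vec{w}) \bullet (\vec{w} - \mathrm{proj}_{\vec{v}}\vec{w}) \geq 0$$
Simplify using the axioms of the dot product and then put in the formula for the projection. Notice that this expression equals 0 and you get equality in the Cauchy Schwarz inequality if and only if $\vec{w} = \mathrm{proj}_{\vec{v}}\vec{w}$. What is the geometric meaning of $\vec{w} = \mathrm{proj}_{\vec{v}}\vec{w}$?

17. Let $\vec{v}, \vec{w}, \vec{u}$ be vectors. Show that $(\vec{w} + \vec{u})_{\perp} = \vec{w}_{\perp} + \vec{u}_{\perp}$ where $\vec{w}_{\perp} = \vec{w} - \mathrm{proj}_{\vec{v}}(\vec{w})$.

18. Show that
$$(\vec{v} - \mathrm{proj}_{\vec{u}}(\vec{v}), \vec{u}) = (\vec{v} - \mathrm{proj}_{\vec{u}}(\vec{v})) \bullet \vec{u} = 0$$
and conclude every vector in $\mathbb{R}^n$ can be written as the sum of two vectors, one which is perpendicular and one which is parallel to the given vector.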

4.8 Planes in $\mathbb{R}^n$

Outcomes

A. Find the vector and scalar equations of a plane.

Much like the above discussion with lines, vectors can be used to determine planes in $\mathbb{R}^n$. Given a vector $\vec{n}$ in $\mathbb{R}^n$ and a point $P_0$, it is possible to find a unique plane which contains $P_0$ and is perpendicular to the given vector.

Definition 4.40: Normal Vector

Let $\vec{n}$ be a nonzero vector in $\mathbb{R}^n$. Then $\vec{n}$ is called a normal vector to a plane if and only if
$$\vec{n} \bullet \vec{v} = 0$$
for every vector $\vec{v}$ in the plane.

In other words, we say that $\vec{n}$ is orthogonal (perpendicular) to every vector in the plane.

Consider now a plane with normal vector given by $\vec{n}$, and containing a point $P_0$. Notice that this plane is unique. If $P$ is an arbitrary point on this plane, then by definition the normal vector is orthogonal to the vector between $P_0$ and $P$. Letting $\overrightarrow{0P}$ and $\overrightarrow{0P_0}$ be the position vectors of points $P$ and $P_0$ respectively, it follows that
$$\vec{n} \bullet (\overrightarrow{0P} - \overrightarrow{0P_0}) = 0$$
or
$$\vec{n} \bullet \overrightarrow{P_0P} = 0$$
The first of these equations gives the vector equation of the plane.

Definition 4.41: Vector Equation of a Plane

Let $\vec{n}$ be the normal vector for a plane which contains a point $P_0$. If $P$ is an arbitrary point on this plane, then the vector equation of the plane is given by
$$\vec{n} \bullet (\overrightarrow{0P} - \overrightarrow{0P_0}) = 0$$

Notice that this equation can be used to determine if a point $P$ is contained in a certain plane.

Example 4.42: A Point in a Plane

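Let $\vec{n} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ be the normal vector for a plane which contains the point $P_0 = (2, 1, 4)$. Determine if the point $P = (5, 4, 1)$ is contained in this plane.

Solution. By Definition 4.41, $P$ is a point in the plane if it satisfies the equation
$$\vec{n} \bullet (\overrightarrow{0P} - \overrightarrow{0P_0}) = 0$$
With the given $\vec{n}$, $P_0$, and $P$, the left side becomes
$$\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \bullet \left( \begin{bmatrix} 5 \\ 4 \\ 1 \end{bmatrix} - \begin{bmatrix} 2 \\ 1 \\ 4 \end{bmatrix} \right) = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \bullet \begin{bmatrix} 3 \\ 3 \\ -3 \end{bmatrix} = 3 + 6 - 9 = 0$$
Therefore $P = (5, 4, 1)$ is contained in the plane.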

Hence, the vector equation of the plane is

and the scalar equation is

2x + 4y + 1z = 9

Suppose a point $P$ is not contained in a given plane. We are then interested in the shortest distance from that point $P$ to the given plane. Consider the following example.

Example 4.45: Shortest Distance From a Point to a Plane

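Find the shortest distance from the point $P = (3, 2, 3)$ to the plane given by $2x + y + 2z = 2$, and find the point $Q$ on the plane that is closest to $P$.

Solution. Pick an arbitrary point $P_0$ on the plane. Then, it follows that
$$\overrightarrow{QP} = \mathrm{proj}_{\vec{n}}\,\overrightarrow{P_0P}$$
and $\|\overrightarrow{QP}\|$ is the shortest distance from $P$ to the plane. Further, the vector $\overrightarrow{0Q} = \overrightarrow{0P} - \overrightarrow{QP}$ gives the necessary point $Q$.

From the scalar equation of the plane, a normal vector is $\vec{n} = \begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}$. One point satisfying $2x + y + 2z = 2$ is $P_0 = (1, 0, 0)$, which gives
$$\overrightarrow{P_0P} = \begin{bmatrix} 3 \\ 2 \\ 3 \end{bmatrix} - \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 3 \end{bmatrix}$$

Next, compute $\overrightarrow{QP} = \mathrm{proj}_{\vec{n}}\,\overrightarrow{P_0P}$.
$$\overrightarrow{QP} = \left(\frac{\overrightarrow{P_0P} \bullet \vec{n}}{\|\vec{n}\|^2}\right)\vec{n} = \frac{12}{9}\begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix} = \frac{4}{3}\begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix}$$
Then, $\|\overrightarrow{QP}\| = 4$ so the shortest distance from $P$ to the plane is 4.

Next, to find the point $Q$ on the plane which is closest to $P$ we have
$$\overrightarrow{0Q} = \overrightarrow{0P} - \overrightarrow{QP} = \begin{bmatrix} 3 \\ 2 \\ 3 \end{bmatrix} - \frac{4}{3}\begin{bmatrix} 2 \\ 1 \\ 2 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 1 \\ 2 \\ 1 \end{bmatrix}$$
Therefore, $Q = \left(\frac{1}{3}, \frac{2}{3}, \frac{1}{3}\right)$.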
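The point-to-plane computation of Example 4.45 can be sketched in Python as follows. The function name `closest_point_on_plane` is hypothetical. The sketch uses the fact that for any point $P_0$ on the plane $\vec{n} \bullet \vec{x} = d$ we have $\vec{n} \bullet \overrightarrow{P_0P} = \vec{n} \bullet \overrightarrow{0P} - d$, so no particular $P_0$ has to be chosen.

```python
from fractions import Fraction

def closest_point_on_plane(n, d, P):
    """Closest point Q to P on the plane n . x = d, together with the vector QP."""
    n_dot_P = sum(a * b for a, b in zip(n, P))
    n_dot_n = sum(a * a for a in n)
    # proj_n(P0P) = ((n . P - d) / (n . n)) n for any point P0 on the plane
    k = Fraction(n_dot_P - d, n_dot_n)
    QP = [k * a for a in n]
    Q = [p - c for p, c in zip(P, QP)]
    return Q, QP

# Plane 2x + y + 2z = 2 and point P = (3, 2, 3) from Example 4.45
Q, QP = closest_point_on_plane([2, 1, 2], 2, [3, 2, 3])
dist_squared = sum(c * c for c in QP)

print(Q)             # the coordinates 1/3, 2/3, 1/3 as Fractions
print(dist_squared)  # 16, so the distance is 4
```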

4.9 The Cross Product

Outcomes

A. Compute the cross product and box product of vectors in $\mathbb{R}^3$.

Recall that the dot product is one of two important products for vectors. The second type of product for vectors is called the cross product. It is important to note that the cross product is only defined in $\mathbb{R}^3$. First we discuss the geometric meaning and then a description in terms of coordinates is given, both of which are important. The geometric description is essential in order to understand the applications to physics and geometry while the coordinate description is necessary to compute the cross product.

Consider the following definition.

Definition 4.46: Right Hand System of Vectors

Three vectors, $\vec{u}, \vec{v}, \vec{w}$ form a right hand system if when you extend the fingers of your right hand along the direction of vector $\vec{u}$ and close them in the direction of $\vec{v}$, the thumb points roughly in the direction of $\vec{w}$.

For an example of a right handed system of vectors, see the following picture.

[Figure: a right handed system of vectors $\vec{u}$, $\vec{v}$, $\vec{w}$.]

In this picture the vector $\vec{w}$ points upwards from the plane determined by the other two vectors. Point the fingers of your right hand along $\vec{u}$, and close them in the direction of $\vec{v}$. Notice that if you extend the thumb on your right hand, it points in the direction of $\vec{w}$.

You should consider how a right hand system would differ from a left hand system. Try using your left hand and you will see that the vector $\vec{w}$ would need to point in the opposite direction.

Notice that the special vectors, $\vec{i}, \vec{j}, \vec{k}$ will always form a right handed system. If you extend the fingers of your right hand along $\vec{i}$ and close them in the direction $\vec{j}$, the thumb points in the direction of $\vec{k}$.

[Figure: the special vectors $\vec{i}$, $\vec{j}$, $\vec{k}$.]

The following is the geometric description of the cross product. Recall that the dot product of two vectors results in a scalar. In contrast, the cross product results in a vector, as the product gives a direction as well as magnitude.

Definition 4.47: Geometric Definition of Cross Product

Let $\vec{u}$ and $\vec{v}$ be two vectors in $\mathbb{R}^3$. Then the cross product, written $\vec{u} \times \vec{v}$, is defined by the following two rules.

1. Its length is $\|\vec{u} \times \vec{v}\| = \|\vec{u}\|\|\vec{v}\|\sin\theta$, where $\theta$ is the included angle between $\vec{u}$ and $\vec{v}$.

2. It is perpendicular to both $\vec{u}$ and $\vec{v}$, that is $(\vec{u} \times \vec{v}) \bullet \vec{u} = 0$, $(\vec{u} \times \vec{v}) \bullet \vec{v} = 0$, and $\vec{u}, \vec{v}, \vec{u} \times \vec{v}$ form a right hand system.

The cross product of the special vectors $\vec{i}, \vec{j}, \vec{k}$ is as follows.
$$\begin{array}{ll} \vec{i} \times \vec{j} = \vec{k} & \vec{j} \times \vec{i} = -\vec{k} \\ \vec{k} \times \vec{i} = \vec{j} & \vec{i} \times \vec{k} = -\vec{j} \\ \vec{j} \times \vec{k} = \vec{i} & \vec{k} \times \vec{j} = -\vec{i} \end{array}$$

With this information, the following gives the coordinate description of the cross product.

Proof. Formula 1. follows immediately from the definition. The vectors $\vec{u} \times \vec{v}$ and $\vec{v} \times \vec{u}$ have the same magnitude, $\|\vec{u}\|\|\vec{v}\|\sin\theta$, and an application of the right hand rule shows they have opposite direction.

Formula 2. is proven as follows. If $k$ is a non-negative scalar, the direction of $(k\vec{u}) \times \vec{v}$ is the same as the direction of $\vec{u} \times \vec{v}$, $k(\vec{u} \times \vec{v})$ and $\vec{u} \times (k\vec{v})$. The magnitude is $k$ times the magnitude of $\vec{u} \times \vec{v}$ which is the same as the magnitude of $k(\vec{u} \times \vec{v})$ and $\vec{u} \times (k\vec{v})$. Using this yields equality in 2. In the case where $k < 0$, everything works the same way except the vectors are all pointing in the opposite direction and you must multiply by $|k|$ when comparing their magnitudes.

The distributive laws, 3. and 4., are much harder to establish. For now, it suffices to notice that if we know that 3. is true, 4. follows. Thus, assuming 3., and using 1.,
$$(\vec{v} + \vec{w}) \times \vec{u} = -\vec{u} \times (\vec{v} + \vec{w}) = -(\vec{u} \times \vec{v} + \vec{u} \times \vec{w}) = \vec{v} \times \vec{u} + \vec{w} \times \vec{u}$$

We will now look at an example of how to compute a cross product.

Example 4.50: Find a Cross Product

Find $\vec{u} \times \vec{v}$ for the following vectors

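$$\vec{u} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix}$$

Solution. Note that we can write $\vec{u}, \vec{v}$ in terms of the special vectors $\vec{i}, \vec{j}, \vec{k}$ as
$$\vec{u} = \vec{i} - \vec{j} + 2\vec{k}$$
$$\vec{v} = 3\vec{i} - 2\vec{j} + \vec{k}$$
We will use the equation given by 4.16 to compute the cross product.
$$\vec{u} \times \vec{v} = \begin{vmatrix} \vec{i} & \vec{j} & \vec{k} \\ 1 & -1 & 2 \\ 3 & -2 & 1 \end{vmatrix} = \begin{vmatrix} -1 & 2 \\ -2 & 1 \end{vmatrix}\vec{i} - \begin{vmatrix} 1 & 2 \\ 3 & 1 \end{vmatrix}\vec{j} + \begin{vmatrix} 1 & -1 \\ 3 & -2 \end{vmatrix}\vec{k} = 3\vec{i} + 5\vec{j} + \vec{k}$$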

An important geometrical application of the cross product is as follows. The size of the cross product, $\|\vec{u} \times \vec{v}\|$, is the area of the parallelogram determined by $\vec{u}$ and $\vec{v}$, as shown in the following picture.

[Figure: the parallelogram determined by $\vec{u}$ and $\vec{v}$, with height $\|\vec{v}\|\sin(\theta)$.]

We examine this concept in the following example.

Example 4.51: Area of a Parallelogram

Find the area of the parallelogram determined by the vectors $\vec{u}$ and $\vec{v}$ given by
$$\vec{u} = \begin{bmatrix} 1 \\ -1 \\ 2 \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix}$$

Solution. Notice that these vectors are the same as the ones given in Example 4.50. Recall from the geometric description of the cross product, that the area of the parallelogram is simply the magnitude of $\vec{u} \times \vec{v}$. From Example 4.50, $\vec{u} \times \vec{v} = 3\vec{i} + 5\vec{j} + \vec{k}$. We can also write this as
$$\vec{u} \times \vec{v} = \begin{bmatrix} 3 \\ 5 \\ 1 \end{bmatrix}$$
Thus the area of the parallelogram is
$$\|\vec{u} \times \vec{v}\| = \sqrt{(3)(3) + (5)(5) + (1)(1)} = \sqrt{9 + 25 + 1} = \sqrt{35}$$

We can also use this concept to find the area of a triangle. Consider the following example.

Example 4.52: Area of Triangle

Find the area of the triangle determined by the points $(1, 2, 3)$, $(0, 2, 5)$, $(5, 1, 2)$.

Solution. This triangle is obtained by connecting the three points with lines. Picking $(1, 2, 3)$ as a starting point, there are two displacement vectors, $\begin{bmatrix} -1 & 0 & 2 \end{bmatrix}^T$ and $\begin{bmatrix} 4 & -1 & -1 \end{bmatrix}^T$. Notice that if we add either of these vectors to the position vector of the starting point, the result is the position vectors of the other two points. Now, the area of the triangle is half the area of the parallelogram determined by $\begin{bmatrix} -1 & 0 & 2 \end{bmatrix}^T$ and $\begin{bmatrix} 4 & -1 & -1 \end{bmatrix}^T$. The required cross product is given by
$$\begin{bmatrix} -1 \\ 0 \\ 2 \end{bmatrix} \times \begin{bmatrix} 4 \\ -1 \\ -1 \end{bmatrix} = \begin{bmatrix} 2 \\ 7 \\ 1 \end{bmatrix}$$
Taking the size of this vector gives the area of the parallelogram, given by
$$\sqrt{(2)(2) + (7)(7) + (1)(1)} = \sqrt{4 + 49 + 1} = \sqrt{54}$$
Hence the area of the triangle is $\frac{1}{2}\sqrt{54} = \frac{3}{2}\sqrt{6}$.

In general, if you have three points in $\mathbb{R}^3$, $P, Q, R$, the area of the triangle is given by
$$\frac{1}{2}\|\overrightarrow{PQ} \times \overrightarrow{PR}\|$$
Recall that $\overrightarrow{PQ}$ is the vector running from point $P$ to point $Q$.

[Figure: the triangle with vertices $P$, $Q$, $R$.]
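The two area computations above can be reproduced with a short Python sketch. The function names `cross` and `triangle_area` are hypothetical, and `cross` uses the standard coordinate formula for the cross product in $\mathbb{R}^3$.

```python
import math

def cross(u, v):
    """Cross product of two vectors in R^3, by the coordinate formula."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def triangle_area(P, Q, R):
    """Half the length of PQ x PR."""
    PQ = [q - p for p, q in zip(P, Q)]
    PR = [r - p for p, r in zip(P, R)]
    c = cross(PQ, PR)
    return 0.5 * math.sqrt(sum(x * x for x in c))

# Example 4.50: u x v for u = (1, -1, 2), v = (3, -2, 1)
print(cross([1, -1, 2], [3, -2, 1]))                   # [3, 5, 1]

# Example 4.52: triangle with vertices (1,2,3), (0,2,5), (5,1,2)
print(triangle_area([1, 2, 3], [0, 2, 5], [5, 1, 2]))  # equals (3/2) * sqrt(6) up to rounding
```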

In the next section, we explore another application of the cross product.

4.9.1. The Box Product

Recall that we can use the cross product to find the area of a parallelogram. It follows that we can use the cross product together with the dot product to find the volume of a parallelepiped. We begin with a definition.

Definition 4.53: Parallelepiped

A parallelepiped determined by the three vectors, $\vec{u}, \vec{v}$, and $\vec{w}$ consists of
$$\{r\vec{u} + s\vec{v} + t\vec{w} : r, s, t \in [0, 1]\}$$
That is, if you pick three numbers, $r$, $s$, and $t$ each in $[0, 1]$ and form $r\vec{u} + s\vec{v} + t\vec{w}$ then the collection of all such points makes up the parallelepiped determined by these three vectors.

The following is an example of a parallelepiped.

[Figure: a parallelepiped with edges $\vec{u}$, $\vec{v}$, $\vec{w}$; the vector $\vec{u} \times \vec{v}$ is perpendicular to the base and makes the angle $\theta$ with $\vec{w}$.]

Notice that the base of the parallelepiped is the parallelogram determined by the vectors $\vec{u}$ and $\vec{v}$. Therefore, its area is equal to $\|\vec{u} \times \vec{v}\|$. The height of the parallelepiped is $\|\vec{w}\|\cos\theta$ where $\theta$ is the angle shown in the picture between $\vec{w}$ and $\vec{u} \times \vec{v}$. The volume of this parallelepiped is the area of the base times the height which is just
$$\|\vec{u} \times \vec{v}\|\|\vec{w}\|\cos\theta = \vec{u} \times \vec{v} \bullet \vec{w}$$
This expression is known as the box product and is sometimes written as $[\vec{u}, \vec{v}, \vec{w}]$. You should consider what happens if you interchange the $\vec{v}$ with the $\vec{w}$ or the $\vec{u}$ with the $\vec{w}$. You can see geometrically from drawing pictures that this merely introduces a minus sign. In any case the box product of three vectors always equals either the volume of the parallelepiped determined by the three vectors or else $-1$ times this volume.

Proposition 4.54: The Box Product

Let $\vec{u}, \vec{v}, \vec{w}$ be three vectors in $\mathbb{R}^3$ that define a parallelepiped. Then the volume of the parallelepiped is the absolute value of the box product, given by
$$|\vec{u} \times \vec{v} \bullet \vec{w}|$$

Consider an example of this concept.

Example 4.55: Volume of a Parallelepiped

Find the volume of the parallelepiped determined by the vectors

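$$\vec{u} = \begin{bmatrix} 1 \\ 2 \\ -5 \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} 1 \\ 3 \\ -6 \end{bmatrix}, \quad \vec{w} = \begin{bmatrix} 3 \\ 2 \\ 3 \end{bmatrix}$$

Solution. According to the above discussion, pick any two of these vectors, take the cross product and then take the dot product of this with the third of these vectors. The result will be either the desired volume or $-1$ times the desired volume. Therefore by taking the absolute value of the result, we obtain the volume.

We will take the cross product of $\vec{u}$ and $\vec{v}$. This is given by
$$\vec{u} \times \vec{v} = \begin{bmatrix} 1 \\ 2 \\ -5 \end{bmatrix} \times \begin{bmatrix} 1 \\ 3 \\ -6 \end{bmatrix} = \begin{vmatrix} \vec{i} & \vec{j} & \vec{k} \\ 1 & 2 & -5 \\ 1 & 3 & -6 \end{vmatrix} = 3\vec{i} + \vec{j} + \vec{k} = \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix}$$

Now take the dot product of this vector with $\vec{w}$ which yields
$$(\vec{u} \times \vec{v}) \bullet \vec{w} = \begin{bmatrix} 3 \\ 1 \\ 1 \end{bmatrix} \bullet \begin{bmatrix} 3 \\ 2 \\ 3 \end{bmatrix} = (3\vec{i} + \vec{j} + \vec{k}) \bullet (3\vec{i} + 2\vec{j} + 3\vec{k}) = 9 + 2 + 3 = 14$$

This shows the volume of this parallelepiped is 14 cubic units.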

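There is a fundamental observation which comes directly from the geometric definitions of the cross product and the dot product.

Proposition 4.56: Order of the Product

Let $\vec{u}, \vec{v}$, and $\vec{w}$ be vectors. Then $(\vec{u} \times \vec{v}) \bullet \vec{w} = \vec{u} \bullet (\vec{v} \times \vec{w})$.

Proof. This follows from observing that either $(\vec{u} \times \vec{v}) \bullet \vec{w}$ and $\vec{u} \bullet (\vec{v} \times \vec{w})$ both give the volume of the parallelepiped or they both give $-1$ times the volume.

Recall that we can express the cross product as the determinant of a particular matrix. It turns out that the same can be done for the box product. Suppose you have three vectors,
$$\vec{u} = \begin{bmatrix} a \\ b \\ c \end{bmatrix}, \quad \vec{v} = \begin{bmatrix} d \\ e \\ f \end{bmatrix}, \quad \vec{w} = \begin{bmatrix} g \\ h \\ i \end{bmatrix}$$
Then the box product $\vec{u} \bullet (\vec{v} \times \vec{w})$ is given by
$$\vec{u} \bullet (\vec{v} \times \vec{w}) = \det \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$$

To take the box product, you can simply take the determinant of the matrix which results by letting the rows be the components of the given vectors in the order in which they occur in the box product.

This follows directly from the definition of the cross product given above and the way we expand determinants. Thus the volume of a parallelepiped determined by the vectors $\vec{u}, \vec{v}, \vec{w}$ is just the absolute value of the above determinant.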
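The following Python sketch checks Example 4.55 three ways: as $(\vec{u} \times \vec{v}) \bullet \vec{w}$, as $\vec{u} \bullet (\vec{v} \times \vec{w})$ (Proposition 4.56), and as a $3 \times 3$ determinant whose rows are the vectors. The function names are hypothetical, and `det3` is a plain cofactor expansion along the first row.

```python
def cross(u, v):
    """Cross product of two vectors in R^3, by the coordinate formula."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Vectors from Example 4.55
u, v, w = [1, 2, -5], [1, 3, -6], [3, 2, 3]

box1 = dot(cross(u, v), w)   # (u x v) . w
box2 = dot(u, cross(v, w))   # u . (v x w)
box3 = det3([u, v, w])       # determinant with rows u, v, w

print(box1, box2, box3)      # 14 14 14
print(abs(box1))             # volume of the parallelepiped: 14
```

Interchanging two of the vectors, for instance `dot(cross(u, w), v)`, changes the sign of the result but not its absolute value.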

4.9.2. Exercises

1. Show that if $\vec{a} \times \vec{u} = \vec{0}$ for any unit vector $\vec{u}$, then $\vec{a} = \vec{0}$.

2. Find the area of the triangle determined by the three points, $(1, 2, 3)$, $(4, 2, 0)$ and $(-3, 2, 1)$.

3. Find the area of the triangle determined by the three points, $(1, 0, 3)$, $(4, 1, 0)$ and $(-3, 1, 1)$.

4. Find the area of the triangle determined by the three points, $(1, 2, 3)$, $(2, 3, 4)$ and $(3, 4, 5)$. Did something interesting happen here? What does it mean geometrically?

5. Find the area of the parallelogram determined by the vectors $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$, $\begin{bmatrix} 3 \\ -2 \\ 1 \end{bmatrix}$.

8. Verify directly that the coordinate description of the cross product, $\vec{u} \times \vec{v}$ has the property that it is perpendicular to both $\vec{u}$ and $\vec{v}$. Then show by direct computation that this coordinate description satisfies
$$\|\vec{u} \times \vec{v}\|^2 = \|\vec{u}\|^2\|\vec{v}\|^2 - (\vec{u} \bullet \vec{v})^2 = \|\vec{u}\|^2\|\vec{v}\|^2\left(1 - \cos^2(\theta)\right)$$
where $\theta$ is the angle included between the two vectors. Explain why $\|\vec{u} \times \vec{v}\|$ has the correct magnitude.

9. Suppose $A$ is a $3 \times 3$ skew symmetric matrix such that $A^T = -A$. Show there exists a vector $\vec{\Omega}$ such that for all $\vec{u} \in \mathbb{R}^3$
$$A\vec{u} = \vec{\Omega} \times \vec{u}$$
Hint: Explain why, since $A$ is skew symmetric it is of the form
$$A = \begin{bmatrix} 0 & -\omega_3 & \omega_2 \\ \omega_3 & 0 & -\omega_1 \\ -\omega_2 & \omega_1 & 0 \end{bmatrix}$$
where the $\omega_i$ are numbers. Then consider $\omega_1\vec{i} + \omega_2\vec{j} + \omega_3\vec{k}$.

10. Find the volume of the parallelepiped determined by the vectors $\begin{bmatrix} 1 \\ -7 \\ -5 \end{bmatrix}$, $\begin{bmatrix} 1 \\ -2 \\ -6 \end{bmatrix}$, and $\begin{bmatrix} 3 \\ 2 \\ 3 \end{bmatrix}$.

11. Suppose $\vec{u}, \vec{v}$, and $\vec{w}$ are three vectors whose components are all integers. Can you conclude the volume of the parallelepiped determined from these three vectors will always be an integer?

12. What does it mean geometrically if the box product of three vectors gives zero?

13. Using Problem 12, find an equation of a plane containing the two position vectors, $\vec{p}$ and $\vec{q}$ and the point 0. Hint: If $(x, y, z)$ is a point on this plane, the volume of the parallelepiped determined by $(x, y, z)$ and the vectors $\vec{p}, \vec{q}$ equals 0.

14. Using the notion of the box product yielding either plus or minus the volume of the parallelepiped determined by the given three vectors, show that
$$(\vec{u} \times \vec{v}) \bullet \vec{w} = \vec{u} \bullet (\vec{v} \times \vec{w})$$
In other words, the dot and the cross can be switched as long as the order of the vectors remains the same. Hint: There are two ways to do this, by the coordinate description of the dot and cross product and by geometric reasoning.

15. Simplify $(\vec{u} \times \vec{v}) \bullet (\vec{v} \times \vec{w}) \times (\vec{w} \times \vec{z})$.

16. Simplify $\|\vec{u} \times \vec{v}\|^2 + (\vec{u} \bullet \vec{v})^2 - \|\vec{u}\|^2\|\vec{v}\|^2$.

17. For $\vec{u}, \vec{v}, \vec{w}$ functions of $t$, prove the following product rules:
$$(\vec{u} \times \vec{v})' = \vec{u}' \times \vec{v} + \vec{u} \times \vec{v}'$$
$$(\vec{u} \bullet \vec{v})' = \vec{u}' \bullet \vec{v} + \vec{u} \bullet \vec{v}'$$

4.10 Spanning, Linear Independence and Basis in $\mathbb{R}^n$

Outcomes

A. Determine the span of a set of vectors, and determine if a vector is contained in a specified span.

B. Determine if a set of vectors is linearly independent.

C. Understand the concepts of subspace, basis, and dimension.

D. Find the row space, column space, and null space of a matrix.

By generating all linear combinations of a set of vectors one can obtain various subsets of $\mathbb{R}^n$ which we call subspaces. For example, what set of vectors in $\mathbb{R}^3$ generates the $XY$-plane? What is the smallest such set of vectors you can find? The tools of spanning, linear independence and basis are exactly what is needed to answer these and similar questions and are the focus of this section.

4.10.1. Spanning Set of Vectors

We begin this section with a definition.

Definition 4.57: Span of a Set of Vectors

The collection of all linear combinations of a set of vectors $\{\vec{u}_1, \cdots, \vec{u}_k\}$ in $\mathbb{R}^n$ is known as the span of these vectors and is written as $\mathrm{span}\{\vec{u}_1, \cdots, \vec{u}_k\}$.

Consider the following example.

Example 4.58: Span of Vectors

Describe the span of the vectors $\vec{u} = \begin{bmatrix} 1 & 1 & 0 \end{bmatrix}^T$ and $\vec{v} = \begin{bmatrix} 3 & 2 & 0 \end{bmatrix}^T \in \mathbb{R}^3$.

Solution. You can see that any linear combination of the vectors $\vec{u}$ and $\vec{v}$ yields a vector $\begin{bmatrix} x & y & 0 \end{bmatrix}^T$ in the $XY$-plane.

Moreover every vector in the $XY$-plane is in fact such a linear combination of the vectors $\vec{u}$ and $\vec{v}$. That's because
$$\begin{bmatrix} x \\ y \\ 0 \end{bmatrix} = (-2x + 3y)\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + (x - y)\begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix}$$
so an arbitrary vector in the $XY$-plane can be written as a linear combination of $\vec{u}$ and $\vec{v}$. Thus $\mathrm{span}\{\vec{u}, \vec{v}\}$ is precisely the $XY$-plane.

You can convince yourself that no single vector can span the $XY$-plane. In fact, take a moment to consider what is meant by the span of a single vector.

However you can make the set larger if you wish. For example consider the larger set of vectors $\{\vec{u}, \vec{v}, \vec{w}\}$ where $\vec{w} = \begin{bmatrix} 4 & 5 & 0 \end{bmatrix}^T$. Since the first two vectors already span the entire $XY$-plane, the span is once again precisely the $XY$-plane and nothing has been gained. Of course if you add a new vector such as $\vec{w} = \begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T$ then it does span a different space. What is the span of $\vec{u}, \vec{v}, \vec{w}$ in this case?

The distinction between the sets $\{\vec{u}, \vec{v}\}$ and $\{\vec{u}, \vec{v}, \vec{w}\}$ will be made using the concept of linear independence.

Consider the vectors $\vec{u}, \vec{v}$, and $\vec{w}$ discussed above. In the next example, we will show how to formally demonstrate that $\vec{w}$ is in the span of $\vec{u}$ and $\vec{v}$.

Example 4.59: Vector in a Span

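Let $\vec{u} = \begin{bmatrix} 1 & 1 & 0 \end{bmatrix}^T$ and $\vec{v} = \begin{bmatrix} 3 & 2 & 0 \end{bmatrix}^T \in \mathbb{R}^3$. Show that $\vec{w} = \begin{bmatrix} 4 & 5 & 0 \end{bmatrix}^T$ is in $\mathrm{span}\{\vec{u}, \vec{v}\}$.

Solution. For a vector to be in $\mathrm{span}\{\vec{u}, \vec{v}\}$, it must be a linear combination of these vectors. If $\vec{w} \in \mathrm{span}\{\vec{u}, \vec{v}\}$, we must be able to find scalars $a, b$ such that
$$\vec{w} = a\vec{u} + b\vec{v}$$
We proceed as follows.
$$\begin{bmatrix} 4 \\ 5 \\ 0 \end{bmatrix} = a\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + b\begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix}$$
This is equivalent to the following system of equations
$$\begin{array}{c} a + 3b = 4 \\ a + 2b = 5 \end{array}$$
We solve this system the usual way, constructing the augmented matrix and row reducing to find the reduced row-echelon form.
$$\left[\begin{array}{rr|r} 1 & 3 & 4 \\ 1 & 2 & 5 \end{array}\right] \rightarrow \cdots \rightarrow \left[\begin{array}{rr|r} 1 & 0 & 7 \\ 0 & 1 & -1 \end{array}\right]$$
The solution is $a = 7$, $b = -1$. This means that $\vec{w} = 7\vec{u} - \vec{v}$, and therefore $\vec{w}$ is in $\mathrm{span}\{\vec{u}, \vec{v}\}$.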
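A quick computational check of Example 4.59 is sketched below. Instead of row reduction, this sketch solves the two equations coming from the first two components by Cramer's rule and then verifies that the remaining component agrees. The function name `coefficients_in_span` is hypothetical, and the approach assumes the first two components of $\vec{u}$ and $\vec{v}$ form an invertible $2 \times 2$ system.

```python
from fractions import Fraction

def coefficients_in_span(u, v, w):
    """Return (a, b) with w = a*u + b*v, or None if no such scalars exist.

    Assumes the first two components of u and v give an invertible 2x2 system.
    """
    det = u[0] * v[1] - v[0] * u[1]
    if det == 0:
        raise ValueError("first two components do not determine a and b uniquely")
    # Cramer's rule for  a*u0 + b*v0 = w0,  a*u1 + b*v1 = w1
    a = Fraction(w[0] * v[1] - v[0] * w[1], det)
    b = Fraction(u[0] * w[1] - w[0] * u[1], det)
    # Every component, not just the first two, must agree.
    ok = all(a * ui + b * vi == wi for ui, vi, wi in zip(u, v, w))
    return (a, b) if ok else None

u, v = [1, 1, 0], [3, 2, 0]
print(coefficients_in_span(u, v, [4, 5, 0]))  # a = 7 and b = -1, as Fractions
print(coefficients_in_span(u, v, [0, 0, 1]))  # None: this vector is not in the XY-plane
```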

4.10.2. Linearly Independent Set of Vectors

Together with the notion of a spanning set of vectors, linear independence is a very important property of a set of vectors.

Definition 4.60: Linearly Independent Set of Vectors

A set of non-zero vectors $\{\vec{u}_1, \cdots, \vec{u}_k\}$ in $\mathbb{R}^n$ is said to be linearly independent if no vector in that set is in the span of the other vectors of that set.

Here is an example.

In terms of spanning, a set of vectors is linearly independent if it does not contain unnecessary vectors. In the previous example you can see that the vector $\vec{w}$ does not help to span any new vector not already in the span of the other two vectors. However you can verify that the set $\{\vec{u}, \vec{v}\}$ is linearly independent, since both are required to span the $XY$-plane.

Consider the following important theorem.

Theorem 4.62: Linear Independence as a Linear Combination

The collection of vectors $\{\vec{u}_1, \cdots, \vec{u}_k\}$ in $\mathbb{R}^n$ is linearly independent if and only if whenever
$$\sum_{i=1}^{k} a_i\vec{u}_i = \vec{0}$$
it follows that each $a_i = 0$.

In other words, $\{\vec{u}_1, \cdots, \vec{u}_k\}$ in $\mathbb{R}^n$ is linearly independent exactly when the system of linear equations $AX = 0$ has only the trivial solution, where $A$ is the $n \times k$ matrix having these vectors as columns.

Proof. Suppose first $\{\vec{u}_1, \cdots, \vec{u}_k\}$ is linearly independent. Then by Definition 4.60 none of the vectors is a linear combination of the others. Now suppose for the sake of a contradiction that
$$\sum_{i=1}^{k} a_i\vec{u}_i = \vec{0}$$
and not all the $a_i = 0$. Then pick an $a_i$ which is not zero and divide this equation by it. Solve for $\vec{u}_i$ in terms of the other $\vec{u}_j$, contradicting the fact that none of the $\vec{u}_i$ equals a linear combination of the others. Therefore if the set of vectors is linearly independent and a linear combination of these vectors equals the zero vector, then all the coefficients must equal zero.

Now suppose that whenever a linear combination of the vectors equals the zero vector, all coefficients equal zero. We want to show that the vectors are linearly independent. If $\vec{u}_i$ is a linear combination of the other vectors in the list, then you could obtain an equation of the form
$$\vec{u}_i = \sum_{j \neq i} a_j\vec{u}_j$$
and so we could write
$$\vec{0} = \sum_{j \neq i} a_j\vec{u}_j + (-1)\vec{u}_i$$
which is a linear combination equal to the zero vector in which the coefficient of $\vec{u}_i$ is $-1$, contradicting the condition that all coefficients equal 0.

Finally observe that the expression $\sum_{i=1}^{k} a_i\vec{u}_i = \vec{0}$ can be written as the system of linear equations $AX = 0$ where $A$ is the $n \times k$ matrix having these vectors as columns.

Sometimes we refer to this last condition about sums as follows: The set of vectors, $\{\vec{u}_1, \cdots, \vec{u}_k\}$ is linearly independent if and only if there is no nontrivial linear combination which equals zero. A nontrivial linear combination is one in which not all the scalars equal zero. Similarly, a trivial linear combination is one in which all scalars equal zero.

We can say that a set of vectors is linearly dependent if it is not linearly independent, and hence if at least one vector is a linear combination of the others.

Here is a detailed example in $\mathbb{R}^4$.

Example 4.63: Linear Independence

Determine whether the set of vectors given by

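$$\left\{ \begin{bmatrix} 1 \\ 2 \\ 3 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 2 \\ 2 \\ -1 \end{bmatrix} \right\}$$
is linearly independent. If it is linearly dependent, express one of the vectors as a linear combination of the others.

Solution. Form the $4 \times 4$ matrix $A$ having these vectors as columns:
$$A = \begin{bmatrix} 1 & 2 & 0 & 3 \\ 2 & 1 & 1 & 2 \\ 3 & 0 & 1 & 2 \\ 0 & 1 & 2 & -1 \end{bmatrix}$$
Then by Theorem 4.62, the given set of vectors is linearly independent exactly if the system $AX = 0$ has only the trivial solution.

The augmented matrix for this system and corresponding reduced row-echelon form are given by
$$\left[\begin{array}{rrrr|r} 1 & 2 & 0 & 3 & 0 \\ 2 & 1 & 1 & 2 & 0 \\ 3 & 0 & 1 & 2 & 0 \\ 0 & 1 & 2 & -1 & 0 \end{array}\right] \rightarrow \cdots \rightarrow \left[\begin{array}{rrrr|r} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{array}\right]$$
Not all the columns of the coefficient matrix are pivot columns and so the vectors are not linearly independent. In this case, we say the vectors are linearly dependent.

It follows that there are infinitely many solutions to $AX = 0$, one of which is
$$\begin{bmatrix} 1 \\ 1 \\ -1 \\ -1 \end{bmatrix}$$
Therefore we can write
$$1\begin{bmatrix} 1 \\ 2 \\ 3 \\ 0 \end{bmatrix} + 1\begin{bmatrix} 2 \\ 1 \\ 0 \\ 1 \end{bmatrix} - 1\begin{bmatrix} 0 \\ 1 \\ 1 \\ 2 \end{bmatrix} - 1\begin{bmatrix} 3 \\ 2 \\ 2 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}$$
This can be rearranged as follows
$$1\begin{bmatrix} 1 \\ 2 \\ 3 \\ 0 \end{bmatrix} + 1\begin{bmatrix} 2 \\ 1 \\ 0 \\ 1 \end{bmatrix} - 1\begin{bmatrix} 0 \\ 1 \\ 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \\ 2 \\ -1 \end{bmatrix}$$
This gives the last vector as a linear combination of the first three vectors.

Notice that we could rearrange this equation to write any of the four vectors as a linear combination of the other three.

Consider another example.

Example 4.64: Linear Independence

Determine whether the set of vectors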

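given by
$$\left\{ \begin{bmatrix} 1 \\ 2 \\ 3 \\ 0 \end{bmatrix}, \begin{bmatrix} 2 \\ 1 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 1 \\ 2 \end{bmatrix}, \begin{bmatrix} 3 \\ 2 \\ 2 \\ 0 \end{bmatrix} \right\}$$
is linearly independent. If it is linearly dependent, express one of the vectors as a linear combination of the others.

Solution. In this case the matrix of the corresponding homogeneous system of linear equations is
$$\left[\begin{array}{rrrr|r} 1 & 2 & 0 & 3 & 0 \\ 2 & 1 & 1 & 2 & 0 \\ 3 & 0 & 1 & 2 & 0 \\ 0 & 1 & 2 & 0 & 0 \end{array}\right]$$
The reduced row-echelon form is
$$\left[\begin{array}{rrrr|r} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{array}\right]$$
and so every column is a pivot column. Therefore, these vectors are linearly independent and there is no way to obtain one of the vectors as a linear combination of the others.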
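The test used in Examples 4.63 and 4.64 (every column of $A$ is a pivot column) can be sketched in Python by counting pivots during Gaussian elimination. The function names `rank` and `is_linearly_independent` are hypothetical, and exact fractions are used so that no rounding can hide a zero pivot.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots found by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # index of the next pivot row
    for col in range(len(m[0])):
        # look for a nonzero entry in this column at or below row r
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # not a pivot column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def is_linearly_independent(vectors):
    """True when the matrix with the given vectors as columns has a pivot in every column."""
    A = [list(row) for row in zip(*vectors)]  # the vectors become the columns of A
    return rank(A) == len(vectors)

ex_4_63 = [[1, 2, 3, 0], [2, 1, 0, 1], [0, 1, 1, 2], [3, 2, 2, -1]]
ex_4_64 = [[1, 2, 3, 0], [2, 1, 0, 1], [0, 1, 1, 2], [3, 2, 2, 0]]

print(is_linearly_independent(ex_4_63))  # False
print(is_linearly_independent(ex_4_64))  # True
```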

The following corollary follows from the fact that if a homogeneous system of linear equations has more variables than equations, the system has infinitely many solutions.

Corollary 4.65: Linear Dependence in $\mathbb{R}^n$

Let $\{\vec{u}_1, \cdots, \vec{u}_k\}$ be a set of vectors in $\mathbb{R}^n$. If $k > n$, then the set is linearly dependent (i.e. NOT linearly independent).

Proof. Form the $n \times k$ matrix $A$ having the vectors $\{\vec{u}_1, \cdots, \vec{u}_k\}$ as its columns and suppose $k > n$. Then $A$ has rank $r \leq n < k$, so the system $AX = 0$ has a nontrivial solution by Theorem 1.35, and thus the set is not linearly independent by Theorem 4.62.

4.10.3. A Short Application to Chemistry

The following section applies the concepts of spanning and linear independence to the subject of chemistry.

When working with chemical reactions, there are sometimes a large number of reactions and some are in a sense redundant. Suppose you have the following chemical reactions.
$$\begin{array}{l} CO + \frac{1}{2}O_2 \rightarrow CO_2 \\ H_2 + \frac{1}{2}O_2 \rightarrow H_2O \\ CH_4 + \frac{3}{2}O_2 \rightarrow CO + 2H_2O \\ CH_4 + 2O_2 \rightarrow CO_2 + 2H_2O \end{array}$$
There are four chemical reactions here but they are not independent reactions. There is some redundancy. What are the independent reactions? Is there a way to consider a shorter list of reactions? To analyze this situation, we can write the reactions in a matrix as follows
$$\begin{array}{rrrrrr} CO & O_2 & CO_2 & H_2 & H_2O & CH_4 \\ 1 & 1/2 & -1 & 0 & 0 & 0 \\ 0 & 1/2 & 0 & 1 & -1 & 0 \\ -1 & 3/2 & 0 & 0 & -2 & 1 \\ 0 & 2 & -1 & 0 & -2 & 1 \end{array}$$
Each row contains the coefficients of the respective elements in each reaction. For example, the top row of numbers comes from $CO + \frac{1}{2}O_2 - CO_2 = 0$ which represents the first of the chemical reactions.

We can write these coefficients in the following matrix
$$\begin{bmatrix} 1 & 1/2 & -1 & 0 & 0 & 0 \\ 0 & 1/2 & 0 & 1 & -1 & 0 \\ -1 & 3/2 & 0 & 0 & -2 & 1 \\ 0 & 2 & -1 & 0 & -2 & 1 \end{bmatrix}$$
Rather than listing all of the reactions as above, it would be more efficient to only list those which are independent by throwing out that which is redundant. We can use the concepts of the previous section to accomplish this.

First, take the reduced row-echelon form of the above matrix.
$$\begin{bmatrix} 1 & 0 & 0 & 3 & -1 & -1 \\ 0 & 1 & 0 & 2 & -2 & 0 \\ 0 & 0 & 1 & 4 & -2 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$
The top three rows represent independent reactions which come from the original four reactions. One can obtain each of the original four rows of the matrix given above by taking a suitable linear combination of rows of this reduced row-echelon form matrix.

With the redundant reaction removed, we can consider the simplified reactions as the following equations
$$\begin{array}{l} CO + 3H_2 - 1H_2O - 1CH_4 = 0 \\ O_2 + 2H_2 - 2H_2O = 0 \\ CO_2 + 4H_2 - 2H_2O - 1CH_4 = 0 \end{array}$$
In terms of the original notation, these are the reactions
$$\begin{array}{l} CO + 3H_2 \rightarrow H_2O + CH_4 \\ O_2 + 2H_2 \rightarrow 2H_2O \\ CO_2 + 4H_2 \rightarrow 2H_2O + CH_4 \end{array}$$
These three reactions provide an equivalent system to the original four equations. The idea is that, in terms of what happens chemically, you obtain the same information with the shorter list of reactions. Such a simplification is especially useful when dealing with very large lists of reactions which may result from experimental evidence.
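The row reduction of the reaction matrix can be reproduced with exact arithmetic. The sketch below uses a small hand-written `rref` helper (a hypothetical name, not a library routine) built on Python's `fractions` module; the columns are ordered CO, O2, CO2, H2, H2O, CH4 as in the text.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # find a row at or below pivot_row with a nonzero entry in this column
        pivot = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]   # make the pivot equal to 1
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m

# Columns: CO, O2, CO2, H2, H2O, CH4 (one row per reaction)
reactions = [
    [1, "1/2", -1, 0, 0, 0],
    [0, "1/2", 0, 1, -1, 0],
    [-1, "3/2", 0, 0, -2, 1],
    [0, 2, -1, 0, -2, 1],
]

reduced = rref(reactions)
for row in reduced:
    print([str(x) for x in row])
```

The last row of `reduced` is zero, which is the computational counterpart of the statement that one of the four reactions is redundant.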

4.10.4. Subspaces and Basis

A subspace is simply a set of vectors with the property that linear combinations of these vectors remain in the set. Geometrically, subspaces are represented by lines and planes which contain the origin. The precise definition is as follows.

Definition 4.66: Subspace

Let $V$ be a nonempty collection of vectors in $\mathbb{R}^n$. Then $V$ is called a subspace if whenever $a$ and $b$ are scalars and $\vec{u}$ and $\vec{v}$ are vectors in $V$, the linear combination $a\vec{u} + b\vec{v}$ is also in $V$.

More generally this means that a subspace contains the span of any finite collection of vectors in that subspace. It turns out that in $\mathbb{R}^n$, a subspace is exactly the span of finitely many of its vectors.

Theorem 4.67: Subspaces are Spans

Let V be a nonempty collection of vectors in Rn . Then V is a subspace of Rn if andonly if there exist vectors {~u1 , , ~uk } in V such thatV = span {~u1 , , ~uk }

Proof. Pick a vector ~u1 in V . If V = span {~u1 } , then you have found your list of vectors and are done. If V 6= span {~u1} , then there exists ~u2 a vector of V which is not inspan {~u1 } . Consider span {~u1 , ~u2} . If V = span {~u1 , ~u2 }, we are done. Otherwise, pick ~u3not in span {~u1 , ~u2} . Continue this way. Note that since V is a subspace, these spans areeach contained in V . The process must stop with ~uk for some k n by Corollary P4.65.Now suppose V = span {~u1 , , ~uk }, we must show this is a subspace. So let ki=1 ci ~uiPand ki=1 di~ui be two vectors in V , and let a and b be two scalars. Thena

kXi=1

ci~ui + b

kX

di ~ui =

kX

(aci + bdi ) ~ui

i=1

i=1

which is one of the vectors in span {~u1 , , ~uk } and is therefore contained in V . This showsthat span {~u1 , , ~uk } has the properties of a subspace.Since the vectors ~ui we constructed in the proof above are not in the span of the previousvectors (by definition), they must be linearly independent and thus we obtain the followingcorollary.Corollary 4.68: Subspaces are Spans of Independent VectorsIf V is a subspace of Rn , then there exist linearly independent vectors {~u1 , , ~uk } inV such that V = span {~u1 , , ~uk }.In summary, subspaces of Rn consist of spans of finite, linearly independent collectionsof vectors of Rn .Note that it was just shown in Corollary 4.68 that every subspace of Rn is equal to thespan of a linearly independent collection of vectors of Rn . Such a collection of vectors iscalled a basis.


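Definition 4.69: Basis of a Subspace
Let $V$ be a subspace of $\mathbb{R}^n$. Then $\{\vec{u}_1, \dots, \vec{u}_k\}$ is a basis for $V$ if the following two conditions hold.
1. $\operatorname{span}\{\vec{u}_1, \dots, \vec{u}_k\} = V$
2. $\{\vec{u}_1, \dots, \vec{u}_k\}$ is linearly independent

Note the plural of basis is bases.

The main theorem about bases is not only that they exist, but that they must all be of the same size. To show this, we will need the following fundamental result, called the Exchange Theorem.

Theorem 4.70: Exchange Theorem
Suppose $\{\vec{u}_1, \dots, \vec{u}_r\}$ is linearly independent and each $\vec{u}_k$ is contained in $\operatorname{span}\{\vec{v}_1, \dots, \vec{v}_s\}$. Then $s \geq r$. In words, spanning sets have at least as many vectors as linearly independent sets.

Proof. Since $\{\vec{v}_1, \dots, \vec{v}_s\}$ is a spanning set, there exist scalars $a_{ij}$ such that
$$\vec{u}_j = \sum_{i=1}^{s} a_{ij} \vec{v}_i, \qquad j = 1, \dots, r.$$
Suppose for a contradiction that $s < r$. Then the $s \times r$ matrix $A = [a_{ij}]$ has fewer rows than columns, so the homogeneous system $A\vec{d} = \vec{0}$ has a nonzero solution $\vec{d} = [d_1 \; \cdots \; d_r]^T$. In other words, $\sum_{j=1}^{r} a_{ij} d_j = 0$ for each $i$, with not all $d_j$ equal to zero. Then
$$\sum_{j=1}^{r} d_j \vec{u}_j = \sum_{j=1}^{r} d_j \sum_{i=1}^{s} a_{ij} \vec{v}_i = \sum_{i=1}^{s} \left( \sum_{j=1}^{r} a_{ij} d_j \right) \vec{v}_i = \vec{0}$$
which contradicts the assumption that $\{\vec{u}_1, \dots, \vec{u}_r\}$ is linearly independent, because not all the $d_j = 0$. Thus this contradiction indicates that $s \geq r$.

We are now ready to show that any two bases are of the same size.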

Theorem 4.71: Bases of $\mathbb{R}^n$ are of the Same Size

Let $V$ be a subspace of $\mathbb{R}^n$ and suppose $\{\vec{u}_1, \dots, \vec{u}_k\}$ and $\{\vec{v}_1, \dots, \vec{v}_m\}$ are two bases for $V$. Then $k = m$.

Proof. This follows right away from Theorem 4.70. Indeed, observe that $\{\vec{u}_1, \dots, \vec{u}_k\}$ is a spanning set while $\{\vec{v}_1, \dots, \vec{v}_m\}$ is linearly independent, so $k \geq m$. Also $\{\vec{v}_1, \dots, \vec{v}_m\}$ is a spanning set while $\{\vec{u}_1, \dots, \vec{u}_k\}$ is linearly independent, so $m \geq k$.

The following definition can now be stated.

Definition 4.72: Dimension of a Subspace
Let $V$ be a subspace of $\mathbb{R}^n$. Then the dimension of $V$, written $\dim(V)$, is defined to be the number of vectors in a basis.

The next result follows.

Corollary 4.73: Dimension of $\mathbb{R}^n$
The dimension of $\mathbb{R}^n$ is $n$.

Proof. You only need to exhibit a basis for $\mathbb{R}^n$ which has $n$ vectors. Such a basis is $\{\vec{e}_1, \dots, \vec{e}_n\}$.

We conclude this section by stating further properties of a set of vectors in $\mathbb{R}^n$.

Corollary 4.74: Linearly Independent and Spanning Sets in $\mathbb{R}^n$
The following properties hold in $\mathbb{R}^n$:
- Suppose $\{\vec{u}_1, \dots, \vec{u}_n\}$ is linearly independent and each $\vec{u}_i$ is a vector in $\mathbb{R}^n$. Then $\{\vec{u}_1, \dots, \vec{u}_n\}$ is a basis for $\mathbb{R}^n$.
- Suppose $\{\vec{u}_1, \dots, \vec{u}_m\}$ spans $\mathbb{R}^n$. Then $m \geq n$.
- If $\{\vec{u}_1, \dots, \vec{u}_n\}$ spans $\mathbb{R}^n$, then $\{\vec{u}_1, \dots, \vec{u}_n\}$ is linearly independent.

Proof. Assume first that $\{\vec{u}_1, \dots, \vec{u}_n\}$ is linearly independent; we need to show that this set spans $\mathbb{R}^n$. To do so, let $\vec{v}$ be a vector of $\mathbb{R}^n$; we need to write $\vec{v}$ as a linear combination of the $\vec{u}_i$'s. Consider the matrix $A$ having the vectors $\vec{u}_i$ as columns:
$$A = \begin{bmatrix} \vec{u}_1 & \cdots & \vec{u}_n \end{bmatrix}$$
By linear independence of the $\vec{u}_i$'s, the reduced row-echelon form of $A$ is the identity matrix. Therefore the system $A\vec{x} = \vec{v}$ has a (unique) solution $\vec{x}$ for every $\vec{v}$, so $\vec{v}$ is a linear combination of the $\vec{u}_i$'s.

To establish the second claim, suppose that $m < n$. Then letting $\vec{u}_{i_1}, \dots, \vec{u}_{i_k}$ be the pivot columns of the matrix
$$\begin{bmatrix} \vec{u}_1 & \cdots & \vec{u}_m \end{bmatrix}$$
it follows that $k \leq m < n$ and these $k$ pivot columns would be a basis for $\mathbb{R}^n$ having fewer than $n$ vectors, contrary to Corollary 4.73.

Finally consider the third claim. If $\{\vec{u}_1, \dots, \vec{u}_n\}$ is not linearly independent, then replace this list with $\{\vec{u}_{i_1}, \dots, \vec{u}_{i_k}\}$, where these are the pivot columns of the matrix
$$\begin{bmatrix} \vec{u}_1 & \cdots & \vec{u}_n \end{bmatrix}$$
Then $\{\vec{u}_{i_1}, \dots, \vec{u}_{i_k}\}$ spans $\mathbb{R}^n$ and is linearly independent, so it is a basis having fewer than $n$ vectors, again contrary to Corollary 4.73.
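Corollary 4.74 suggests a simple computational test: $n$ vectors in $\mathbb{R}^n$ form a basis exactly when the matrix built from them has rank $n$. The following is a minimal sketch of that test, not part of the original text. The helper names `rank` and `is_basis_of_Rn` are hypothetical, and the vectors are stored as rows, which is harmless because a matrix and its transpose have the same rank.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (given as a list of rows) by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # index of the next pivot row
    for c in range(len(m[0]) if m else 0):
        # Look for a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def is_basis_of_Rn(vectors):
    """True when the given vectors of R^n number exactly n and have rank n."""
    n = len(vectors[0])
    return len(vectors) == n and rank(vectors) == n

print(is_basis_of_Rn([[1, 1, 0], [3, 2, 0], [0, 0, 1]]))
print(is_basis_of_Rn([[1, 1, 0], [3, 2, 0], [4, 5, 0]]))
```

The second list fails the test because all three of its vectors have third entry zero, so they cannot span $\mathbb{R}^3$.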

4.10.5. Row Space, Column Space, and Null Space of a Matrix

We begin this section with a new definition.

Definition 4.75: Row and Column Space
Let $A$ be an $m \times n$ matrix. The column space of $A$, written $\operatorname{col}(A)$, is the span of the columns. The row space of $A$, written $\operatorname{row}(A)$, is the span of the rows.

Using the reduced row-echelon form, we can obtain an efficient description of the row and column space of a matrix. Of course the column space can be obtained by simply saying that it equals the span of all the columns. However, you can often get the column space as the span of fewer columns than this. This is what we mean by an efficient description. The next example illustrates this concept.

Example 4.76: Rank, Column and Row Space
Find the rank of the following matrix and describe the column and row spaces efficiently.
$$A = \begin{bmatrix} 1 & 2 & 1 & 3 & 2 \\ 1 & 3 & 6 & 0 & 2 \\ 3 & 7 & 8 & 6 & 6 \end{bmatrix}$$

Solution. The reduced row-echelon form of $A$ is
$$\begin{bmatrix} 1 & 0 & -9 & 9 & 2 \\ 0 & 1 & 5 & -3 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$
Therefore, the rank is 2. Notice that all columns of this reduced row-echelon form matrix are in
$$\operatorname{span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} \right\}$$
For example,
$$\begin{bmatrix} -9 \\ 5 \\ 0 \end{bmatrix} = -9 \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} + 5 \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}$$
Since the original matrix and its reduced row-echelon form are equivalent, all columns of the original matrix are similarly contained in the span of the first two columns of that matrix. For example, consider the third column of the original matrix. It can be written as a linear combination of the first two columns of the original matrix as follows.
$$\begin{bmatrix} 1 \\ 6 \\ 8 \end{bmatrix} = -9 \begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix} + 5 \begin{bmatrix} 2 \\ 3 \\ 7 \end{bmatrix}$$
The column space of the original matrix equals the span of the first two columns of the original matrix. This is the desired efficient description of the column space.
$$\operatorname{col}(A) = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 1 \\ 3 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 7 \end{bmatrix} \right\}$$

What about an efficient description of the row space? When row operations are used, the resulting vectors remain in the row space. Thus the rows in the reduced row-echelon form are in the row space of the original matrix. Furthermore, by reversing the row operations, each row of the original matrix can be obtained as a linear combination of the rows in the reduced row-echelon form. It follows that the span of the nonzero rows in the reduced row-echelon form equals the span of the original rows. For the above matrix, the row space equals
$$\operatorname{row}(A) = \operatorname{span}\left\{ \begin{bmatrix} 1 & 0 & -9 & 9 & 2 \end{bmatrix}, \begin{bmatrix} 0 & 1 & 5 & -3 & 0 \end{bmatrix} \right\}$$
Notice that the column space of $A$ is given as the span of columns of the original matrix, while the row space of $A$ is the span of rows of the reduced row-echelon form of $A$.

Consider another example.

Example 4.77: Rank, Column and Row Space
Find the rank of the following matrix and describe the column and row spaces efficiently.
$$A = \begin{bmatrix} 1 & 2 & 1 & 3 & 2 \\ 1 & 3 & 6 & 0 & 2 \\ 1 & 2 & 1 & 3 & 2 \\ 1 & 3 & 2 & 4 & 0 \end{bmatrix}$$

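Solution. The reduced row-echelon form is
$$\begin{bmatrix} 1 & 0 & 0 & 0 & \frac{13}{2} \\ 0 & 1 & 0 & 2 & -\frac{5}{2} \\ 0 & 0 & 1 & -1 & \frac{1}{2} \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$
and so the rank is 3. The row space is given by
$$\operatorname{row}(A) = \operatorname{span}\left\{ \begin{bmatrix} 1 & 0 & 0 & 0 & \frac{13}{2} \end{bmatrix}, \begin{bmatrix} 0 & 1 & 0 & 2 & -\frac{5}{2} \end{bmatrix}, \begin{bmatrix} 0 & 0 & 1 & -1 & \frac{1}{2} \end{bmatrix} \right\}$$
Notice that the first three columns of the reduced row-echelon form are pivot columns. The column space is the span of the first three columns in the original matrix,
$$\operatorname{col}(A) = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 2 \\ 3 \\ 2 \\ 3 \end{bmatrix}, \begin{bmatrix} 1 \\ 6 \\ 1 \\ 2 \end{bmatrix} \right\}$$

Consider the solution given above for Example 4.77, where the rank of $A$ equals 3. Notice that the row space and the column space each had dimension equal to 3. It turns out that this is not a coincidence. This essential result is referred to as the Rank Theorem and is given now.

Theorem 4.78: Rank Theorem
Let $A$ be an $m \times n$ matrix. Then
$$\dim(\operatorname{col}(A)) = \dim(\operatorname{row}(A)) = \operatorname{rank}(A)$$

The following statements are results of the Rank Theorem.

Corollary 4.79: Results of the Rank Theorem
1. For any matrix $A$, $\operatorname{rank}(A) = \operatorname{rank}(A^T)$.
2. For any $m \times n$ matrix $A$, $\operatorname{rank}(A) \leq m$ and $\operatorname{rank}(A) \leq n$.
3. For any $n \times n$ matrix $A$, $A$ is invertible if and only if $\operatorname{rank}(A) = n$.
4. For any matrix $A$ and invertible matrices $B$, $C$ of appropriate size, $\operatorname{rank}(A) = \operatorname{rank}(BA) = \operatorname{rank}(AC)$.

Consider the following example.

Example 4.80: Rank of the Transpose
Let
$$A = \begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix}$$
Find $\operatorname{rank}(A)$ and $\operatorname{rank}(A^T)$.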

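Solution. To find $\operatorname{rank}(A)$ we first row reduce to find the reduced row-echelon form.
$$A = \begin{bmatrix} 1 & 2 \\ -1 & 1 \end{bmatrix} \rightarrow \cdots \rightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
Therefore the rank of $A$ is 2. Now consider $A^T$ given by
$$A^T = \begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix}$$
Again we row reduce to find the reduced row-echelon form.
$$\begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix} \rightarrow \cdots \rightarrow \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
You can see that $\operatorname{rank}(A^T) = 2$, the same as $\operatorname{rank}(A)$.

We now define what is meant by the null space of a general $m \times n$ matrix.

Definition 4.81: Null Space, or Kernel, of A
The null space of a matrix $A$, also referred to as the kernel of $A$, is defined as follows.
$$\ker(A) = \left\{ \vec{x} : A\vec{x} = \vec{0} \right\}$$
It is also referred to quite often using the notation $N(A)$, the $N$ signifying null space.

To find $\ker(A)$ one must solve the system of equations $A\vec{x} = \vec{0}$. This is a familiar procedure.

Similarly, we can discuss the image of $A$, denoted by $\operatorname{im}(A)$. The image of $A$ consists of the vectors of $\mathbb{R}^m$ which get hit by $A$. The formal definition is as follows.

Definition 4.82: Image of A
The image of $A$, written $\operatorname{im}(A)$, is given by
$$\operatorname{im}(A) = \left\{ A\vec{x} : \vec{x} \in \mathbb{R}^n \right\}$$

Consider $A$ as a mapping from $\mathbb{R}^n$ to $\mathbb{R}^m$ whose action is given by multiplication. The following diagram displays this scenario.
$$\ker(A) \subseteq \mathbb{R}^n \xrightarrow{\;A\;} \mathbb{R}^m \supseteq \operatorname{im}(A)$$
As indicated, $\operatorname{im}(A)$ is a subset of $\mathbb{R}^m$ while $\ker(A)$ is a subset of $\mathbb{R}^n$.

Finding $\ker(A)$ is not new! There is just some new terminology being used. $\ker(A)$ is simply the solution set of the system $A\vec{x} = \vec{0}$.

Consider the following example.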

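Example 4.83: Null Space, or Kernel of A

Let
$$A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & -1 & 1 \\ 2 & 3 & 3 \end{bmatrix}$$
Find $\ker(A)$.

Solution. In order to find $\ker(A)$, we need to solve the equation $A\vec{x} = \vec{0}$. This is the usual procedure of writing the augmented matrix, finding the reduced row-echelon form and then the solution. The augmented matrix and corresponding reduced row-echelon form are
$$\left[ \begin{array}{ccc|c} 1 & 2 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 2 & 3 & 3 & 0 \end{array} \right] \rightarrow \cdots \rightarrow \left[ \begin{array}{ccc|c} 1 & 0 & 3 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right]$$
The third column is not a pivot column, and therefore the solution will contain a parameter. The solution to the system $A\vec{x} = \vec{0}$ is given by
$$\left\{ \begin{bmatrix} -3t \\ t \\ t \end{bmatrix} : t \in \mathbb{R} \right\}$$
which can be written as
$$\left\{ t \begin{bmatrix} -3 \\ 1 \\ 1 \end{bmatrix} : t \in \mathbb{R} \right\}$$
Therefore, the null space of $A$ is all multiples of this vector, which we can write as
$$\ker(A) = \operatorname{span}\left\{ \begin{bmatrix} -3 \\ 1 \\ 1 \end{bmatrix} \right\}$$

Here is a larger example, but the method is entirely similar.

Example 4.84: Null Space, or Kernel of A
Let
$$A = \begin{bmatrix} 1 & 2 & 1 & 0 & 1 \\ 2 & -1 & 1 & 3 & 0 \\ 3 & 1 & 2 & 3 & 1 \\ 4 & -2 & 2 & 6 & 0 \end{bmatrix}$$
Find the null space of $A$.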

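Solution. To find the null space, we need to solve the equation $A\vec{x} = \vec{0}$. The augmented matrix and corresponding reduced row-echelon form are given by
$$\left[ \begin{array}{ccccc|c} 1 & 2 & 1 & 0 & 1 & 0 \\ 2 & -1 & 1 & 3 & 0 & 0 \\ 3 & 1 & 2 & 3 & 1 & 0 \\ 4 & -2 & 2 & 6 & 0 & 0 \end{array} \right] \rightarrow \cdots \rightarrow \left[ \begin{array}{ccccc|c} 1 & 0 & \frac{3}{5} & \frac{6}{5} & \frac{1}{5} & 0 \\ 0 & 1 & \frac{1}{5} & -\frac{3}{5} & \frac{2}{5} & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{array} \right]$$
It follows that the first two columns are pivot columns, and the next three correspond to parameters. Therefore, $\ker(A)$ is given by
$$\left\{ \begin{bmatrix} -\frac{3}{5}s - \frac{6}{5}t - \frac{1}{5}r \\ -\frac{1}{5}s + \frac{3}{5}t - \frac{2}{5}r \\ s \\ t \\ r \end{bmatrix} : s, t, r \in \mathbb{R} \right\}.$$
We write this in the form
$$\left\{ s \begin{bmatrix} -\frac{3}{5} \\ -\frac{1}{5} \\ 1 \\ 0 \\ 0 \end{bmatrix} + t \begin{bmatrix} -\frac{6}{5} \\ \frac{3}{5} \\ 0 \\ 1 \\ 0 \end{bmatrix} + r \begin{bmatrix} -\frac{1}{5} \\ -\frac{2}{5} \\ 0 \\ 0 \\ 1 \end{bmatrix} : s, t, r \in \mathbb{R} \right\}.$$
In other words, the null space of this matrix equals the span of the three vectors above. Thus
$$\ker(A) = \operatorname{span}\left\{ \begin{bmatrix} -\frac{3}{5} \\ -\frac{1}{5} \\ 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} -\frac{6}{5} \\ \frac{3}{5} \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} -\frac{1}{5} \\ -\frac{2}{5} \\ 0 \\ 0 \\ 1 \end{bmatrix} \right\}$$

Notice also that the three vectors above are linearly independent and so the dimension of $\ker(A)$ is 3. The following is true in general. The number of free variables equals the dimension of the null space while the number of basic variables equals the number of pivot columns, which is the rank.

Before we proceed to an important theorem, we first define what is meant by the nullity of a matrix.

Definition 4.85: Nullity
The dimension of the null space of a matrix is called the nullity, denoted $\operatorname{null}(A)$.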

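We can now state an important theorem.

Theorem 4.86: Rank and Nullity
Let $A$ be an $m \times n$ matrix. Then $\operatorname{rank}(A) + \operatorname{null}(A) = n$.

Consider the following example, which we first explored above in Example 4.83.

Example 4.87: Rank and Nullity
Let
$$A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & -1 & 1 \\ 2 & 3 & 3 \end{bmatrix}$$
Find $\operatorname{rank}(A)$ and $\operatorname{null}(A)$.

Solution. In the above Example 4.83 we determined that the reduced row-echelon form of the augmented matrix for $A\vec{x} = \vec{0}$ is given by
$$\left[ \begin{array}{ccc|c} 1 & 0 & 3 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right]$$
Therefore the rank of $A$ is 2. We also determined that the kernel of $A$ is given by
$$\ker(A) = \operatorname{span}\left\{ \begin{bmatrix} -3 \\ 1 \\ 1 \end{bmatrix} \right\}$$
Therefore the nullity of $A$ is 1. It follows from Theorem 4.86 that $\operatorname{rank}(A) + \operatorname{null}(A) = 2 + 1 = 3$, which is the number of columns of $A$.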
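The procedure used in Examples 4.83 through 4.87 (row reduce, read off the pivot columns, and build one kernel vector per free column) can be sketched as follows. This is an illustrative sketch rather than part of the original text: the function names `rref` and `kernel_basis` are hypothetical, and exact `Fraction` arithmetic is an assumption chosen so that the fractions match hand computation.

```python
from fractions import Fraction

def rref(rows):
    """Return (R, pivots): the reduced row-echelon form and its pivot columns."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column, so it is a free column
        m[r], m[p] = m[p], m[r]
        piv = m[r][c]
        m[r] = [x / piv for x in m[r]]  # scale so the pivot entry is 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    return m, pivots

def kernel_basis(rows):
    """One basis vector of ker(A) for each free (non-pivot) column."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)  # set this free variable to 1, the others to 0
        for row_idx, pc in enumerate(pivots):
            v[pc] = -R[row_idx][free]  # solve for the basic variables
        basis.append(v)
    return basis

A = [[1, 2, 1], [0, -1, 1], [2, 3, 3]]  # the matrix of Example 4.87
R, pivots = rref(A)
basis = kernel_basis(A)
print("rank =", len(pivots), "nullity =", len(basis))
print("kernel basis:", basis)
```

The rank is the number of pivot columns and the nullity is the number of basis vectors returned, so their sum is the number of columns, as Theorem 4.86 states.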

4.10.6. Exercises

1. Suppose $\{\vec{x}_1, \dots, \vec{x}_k\}$ is a set of vectors from $\mathbb{R}^n$. Show that $\vec{0}$ is in $\operatorname{span}\{\vec{x}_1, \dots, \vec{x}_k\}$.

Is S a subspace of R3 ? If so, explain why, give a basis for the subspace and find itsdimension.

18. Consider the set of vectors S given by

2u + 6v + 7w

3u 9v 12wS=

2u + 6v + 6w

u + 3v + 3w

: u, v, w R .

Is $S$ a subspace of $\mathbb{R}^4$? If so, explain why, give a basis for the subspace and find its dimension.

19. Consider the set of vectors S given by

2u + v

S = 6v 3u + 3w : u, v, w R .

3v 6u + 3w

Is this set of vectors a subspace of R3 ? If so, explain why, give a basis for the subspaceand find its dimension.

20. Consider the vectors of the form

2u + v + 7w

u 2v + w : u, v, w R .

6v 6w

Is this set of vectors a subspace of R3 ? If so, explain why, give a basis for the subspaceand find its dimension.

21. Consider the vectors of the form

3u + v + 11w

18u + 6v + 66w : u, v, w R .

28u + 8v + 100w

Is this set of vectors a subspace of R3 ? If so, explain why, give a basis for the subspaceand find its dimension.

22. Consider the vectors of the form

3u + v

2w 4u: u, v, w R .

2w 2v 8u

Is this set of vectors a subspace of R3 ? If so, explain why, give a basis for the subspaceand find its dimension.

23. Consider the set of vectors S given by

u+v+w

2u + 2v + 4w u+v+w

: u, v, w R .

Is S a subspace of R4 ? If so, explain why, give a basis for the subspace and find itsdimension.

24. Consider the set of vectors S given by

3u 3w : u, v, w R .

8u 4v + 4w

Is $S$ a subspace of $\mathbb{R}^3$? If so, explain why, give a basis for the subspace and find its dimension.

25. If you have 5 vectors in $\mathbb{R}^5$ and the vectors are linearly independent, can it always be concluded they span $\mathbb{R}^5$? Explain.

26. If you have 6 vectors in $\mathbb{R}^5$, is it possible they are linearly independent? Explain.

27. Suppose $A$ is an $m \times n$ matrix and $\{\vec{w}_1, \dots, \vec{w}_k\}$ is a linearly independent set of vectors in $A(\mathbb{R}^n) \subseteq \mathbb{R}^m$. Now suppose $A\vec{z}_i = \vec{w}_i$. Show $\{\vec{z}_1, \dots, \vec{z}_k\}$ is also independent.

28. Suppose $V, W$ are subspaces of $\mathbb{R}^n$. Let $V \cap W$ be all vectors which are in both $V$ and $W$. Show that $V \cap W$ is a subspace also.

29. Suppose $V$ and $W$ both have dimension equal to 7 and they are subspaces of $\mathbb{R}^{10}$. What are the possibilities for the dimension of $V \cap W$? Hint: Remember that a linearly independent set can be extended to form a basis.

30. Suppose $V$ has dimension $p$ and $W$ has dimension $q$ and they are each contained in a subspace $U$ which has dimension equal to $n$, where $n > \max(p, q)$. What are the possibilities for the dimension of $V \cap W$? Hint: Remember that a linearly independent set can be extended to form a basis.

31. Suppose $A$ is an $m \times n$ matrix and $B$ is an $n \times p$ matrix. Show that
$$\dim(\ker(AB)) \leq \dim(\ker(A)) + \dim(\ker(B)).$$
Hint: Consider the subspace $B(\mathbb{R}^p) \cap \ker(A)$ and suppose a basis for this subspace is $\{\vec{w}_1, \dots, \vec{w}_k\}$. Now suppose $\{\vec{u}_1, \dots, \vec{u}_r\}$ is a basis for $\ker(B)$. Let $\{\vec{z}_1, \dots, \vec{z}_k\}$ be such that $B\vec{z}_i = \vec{w}_i$ and argue that
$$\ker(AB) \subseteq \operatorname{span}\{\vec{u}_1, \dots, \vec{u}_r, \vec{z}_1, \dots, \vec{z}_k\}.$$

32. Show that if $A$ is an $m \times n$ matrix, then $\ker(A)$ is a subspace of $\mathbb{R}^n$.

33. Find the rank of the following matrix. Also find a basis for the row and column spaces.

1 3

11

3091313 1

203708

31 1 1 2 10

34. Find the rank of the following matrix. Also find a basis for the row and column spaces.

1 30 2 7 3 3 91 7 23 8

1 31 3 9 2 1 3 1 1 5 4208

35. Find the rank of the following matrix. Also find a basis for the row and column spaces.

10 30 7 0 31 100 23 0

11 41 7 0 1 1 2 2 9 136. Find the rank of the following matrix. Also find a basis for the row and column spaces.

10 3 31 10

11 4 1 1 237. Find the rank of the following matrix. Also find a basis for the row and column spaces.

00 101 123 2 18

122 1 11 1 2 211138. Find the rank of the following matrix. Also find a basis for the row and column spaces.

1030 31 100

11 21 1 12 239. Find ker (A) for the following matrices.

2 3(a) A =4 6

1 0 13 (b) A = 1 13 21

2 40(c) A = 3 6 2 1 2 2

2 135 2012

(d) A = 64 5 6 02 4 6


4.11 Orthogonality and the Gram-Schmidt Process

Outcomes
A. Determine if a given set is orthogonal or orthonormal.
B. Determine if a given matrix is orthogonal.
C. Given a linearly independent set, use the Gram-Schmidt Process to find corresponding orthogonal and orthonormal sets.
D. Find the orthogonal projection of a vector onto a subspace.
E. Find the least squares approximation for a collection of points.

4.11.1. Orthogonal and Orthonormal Sets

In this section, we examine what it means for vectors (and sets of vectors) to be orthogonal and orthonormal. First, it is necessary to review some important concepts. You may recall the definitions for the span of a set of vectors and a linearly independent set of vectors. We include the definitions and examples here for convenience.

Definition 4.88: Span of a Set of Vectors and Subspace
The collection of all linear combinations of a set of vectors $\{\vec{u}_1, \dots, \vec{u}_k\}$ in $\mathbb{R}^n$ is known as the span of these vectors and is written as $\operatorname{span}\{\vec{u}_1, \dots, \vec{u}_k\}$. We call a collection of the form $\operatorname{span}\{\vec{u}_1, \dots, \vec{u}_k\}$ a subspace of $\mathbb{R}^n$.

Consider the following example.

Example 4.89: Span of Vectors
Describe the span of the vectors $\vec{u} = \begin{bmatrix} 1 & 1 & 0 \end{bmatrix}^T$ and $\vec{v} = \begin{bmatrix} 3 & 2 & 0 \end{bmatrix}^T \in \mathbb{R}^3$.

Solution. You can see that any linear combination of the vectors $\vec{u}$ and $\vec{v}$ yields a vector $\begin{bmatrix} x & y & 0 \end{bmatrix}^T$ in the $XY$-plane.

Moreover every vector in the $XY$-plane is in fact such a linear combination of the vectors $\vec{u}$ and $\vec{v}$. That's because
$$\begin{bmatrix} x \\ y \\ 0 \end{bmatrix} = (-2x + 3y) \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + (x - y) \begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix}$$
Thus $\operatorname{span}\{\vec{u}, \vec{v}\}$ is precisely the $XY$-plane.

The span of a set of vectors in $\mathbb{R}^n$ is what we call a subspace of $\mathbb{R}^n$. A subspace $W$ is characterized by the feature that any linear combination of vectors of $W$ is again a vector contained in $W$.

Another important property of sets of vectors is called linear independence.

Definition 4.90: Linearly Independent Set of Vectors
A set of non-zero vectors $\{\vec{u}_1, \dots, \vec{u}_k\}$ in $\mathbb{R}^n$ is said to be linearly independent if no vector in that set is in the span of the other vectors of that set.

Here is an example.

Example 4.91: Linearly Independent Vectors
Consider vectors $\vec{u} = \begin{bmatrix} 1 & 1 & 0 \end{bmatrix}^T$, $\vec{v} = \begin{bmatrix} 3 & 2 & 0 \end{bmatrix}^T$, and $\vec{w} = \begin{bmatrix} 4 & 5 & 0 \end{bmatrix}^T \in \mathbb{R}^3$. Verify whether the set $\{\vec{u}, \vec{v}, \vec{w}\}$ is linearly independent.

Solution. We already verified in Example 4.89 that $\operatorname{span}\{\vec{u}, \vec{v}\}$ is the $XY$-plane. Since $\vec{w}$ is clearly also in the $XY$-plane, the set $\{\vec{u}, \vec{v}, \vec{w}\}$ is not linearly independent.

In terms of spanning, a set of vectors is linearly independent if it does not contain unnecessary vectors. In the previous example you can see that the vector $\vec{w}$ does not help to span any new vector not already in the span of the other two vectors. However, you can verify that the set $\{\vec{u}, \vec{v}\}$ is linearly independent, since you will not get the $XY$-plane as the span of a single vector.

We can also determine if a set of vectors is linearly independent by examining linear combinations. A set of vectors is linearly independent if and only if whenever a linear combination of these vectors equals zero, it follows that all the coefficients equal zero. It is a good exercise to verify this equivalence, and this latter condition is often used as the (equivalent) definition of linear independence.

If a subspace is spanned by a linearly independent set of vectors, then we say that this set is a basis for the subspace.


Definition 4.92: Basis of a Subspace

Let $V$ be a subspace of $\mathbb{R}^n$. Then $\{\vec{u}_1, \dots, \vec{u}_k\}$ is a basis for $V$ if the following two conditions hold.
1. $\operatorname{span}\{\vec{u}_1, \dots, \vec{u}_k\} = V$
2. $\{\vec{u}_1, \dots, \vec{u}_k\}$ is linearly independent

Thus the set of vectors $\{\vec{u}, \vec{v}\}$ from Example 4.91 is a basis for the $XY$-plane in $\mathbb{R}^3$ since it is both linearly independent and spans the $XY$-plane.

We can now discuss what is meant by an orthogonal set of vectors. We saw in a previous section (see Proposition 4.34) that two vectors $\vec{u}$ and $\vec{v}$ are orthogonal if $\vec{u} \cdot \vec{v} = 0$. This idea can be extended to a set of vectors as follows.

Definition 4.93: Orthogonal Set of Vectors
Let $\{\vec{u}_1, \vec{u}_2, \dots, \vec{u}_m\}$ be a set of vectors in $\mathbb{R}^n$. Then this set is called an orthogonal set if the following conditions hold:
1. $\vec{u}_i \cdot \vec{u}_j = 0$ for all $i \neq j$
2. $\vec{u}_i \neq \vec{0}$ for all $i$

If we have an orthogonal set of vectors and normalize each vector so they have length 1, the resulting set is called an orthonormal set of vectors. They can be described as follows.

Definition 4.94: Orthonormal Set of Vectors
A set of vectors $\{\vec{w}_1, \dots, \vec{w}_m\}$ is said to be an orthonormal set if
$$\vec{w}_i \cdot \vec{w}_j = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$

Note that all orthonormal sets are orthogonal, but the reverse is not necessarily true since the vectors may not be normalized. In order to normalize the vectors, we simply need to divide each by its length.

If an orthogonal set is a basis for a subspace, we call this an orthogonal basis. Similarly, if an orthonormal set is a basis, we call this an orthonormal basis.


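Example 4.95: Orthonormal Set

Consider the set of vectors given by
$$\{\vec{u}_1, \vec{u}_2\} = \left\{ \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \begin{bmatrix} -1 \\ 1 \end{bmatrix} \right\}$$
Show that it is an orthogonal set of vectors but not an orthonormal one. Find the corresponding orthonormal set.

Solution. One easily verifies that $\vec{u}_1 \cdot \vec{u}_2 = 0$, so $\{\vec{u}_1, \vec{u}_2\}$ is an orthogonal set of vectors. On the other hand one can compute that $\|\vec{u}_1\| = \|\vec{u}_2\| = \sqrt{2} \neq 1$ and thus it is not an orthonormal set.

Thus to find a corresponding orthonormal set, we simply need to normalize each vector. We will write $\{\vec{w}_1, \vec{w}_2\}$ for the corresponding orthonormal set. Then,
$$\vec{w}_1 = \frac{1}{\|\vec{u}_1\|} \vec{u}_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$$
Similarly,
$$\vec{w}_2 = \frac{1}{\|\vec{u}_2\|} \vec{u}_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$$
Therefore the corresponding orthonormal set is
$$\{\vec{w}_1, \vec{w}_2\} = \left\{ \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}, \begin{bmatrix} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \right\}$$
You can verify that this set is orthonormal.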
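The normalization step in Example 4.95 is easy to check numerically. The snippet below is a small sketch, not part of the original text. The helper names `dot` and `normalize` are hypothetical, and because it uses floating-point arithmetic the computed lengths are only approximately 1.

```python
import math

def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    """Divide a nonzero vector by its length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

u1, u2 = [1, 1], [-1, 1]
w1, w2 = normalize(u1), normalize(u2)

print("u1 . u2 =", dot(u1, u2))                # orthogonality of the original set
print("|w1|^2 =", dot(w1, w1), "|w2|^2 =", dot(w2, w2))
print("w1 . w2 =", dot(w1, w2))
```

Dividing by the length does not change the direction of a vector, which is why the normalized vectors remain orthogonal.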

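Consider an orthonormal set of vectors in $\mathbb{R}^n$, written $\{\vec{w}_1, \dots, \vec{w}_k\}$ with $k \leq n$. The span of these vectors is a subspace $W$ of $\mathbb{R}^n$. If we could show that this orthonormal set is also linearly independent, we would have a basis of $W$. We will show this in the next theorem.

Theorem 4.96
Let $\{\vec{w}_1, \vec{w}_2, \dots, \vec{w}_k\}$ be an orthonormal set of vectors in $\mathbb{R}^n$. Then this set is linearly independent and forms a basis for the subspace $W = \operatorname{span}\{\vec{w}_1, \vec{w}_2, \dots, \vec{w}_k\}$.

Proof. To show the set is linearly independent, suppose a linear combination of these vectors equals $\vec{0}$, say
$$a_1 \vec{w}_1 + a_2 \vec{w}_2 + \cdots + a_k \vec{w}_k = \vec{0}$$
for scalars $a_i$. We need to show that all $a_i = 0$. Taking the dot product of each side with $\vec{w}_1$ gives
$$a_1 (\vec{w}_1 \cdot \vec{w}_1) + a_2 (\vec{w}_2 \cdot \vec{w}_1) + \cdots + a_k (\vec{w}_k \cdot \vec{w}_1) = 0$$
Since the set is orthogonal, $\vec{w}_j \cdot \vec{w}_1 = 0$ for $j \neq 1$, which leaves $a_1 \|\vec{w}_1\|^2 = 0$. Since the set is orthonormal, we know that $\|\vec{w}_1\|^2 = 1$. It follows that $a_1 = 0$. We can continue in this fashion for the rest of the vectors in the set, and determine that $a_i = 0$ for all $i = 1, 2, \dots, k$. Therefore the set $\{\vec{w}_1, \vec{w}_2, \dots, \vec{w}_k\}$ is linearly independent.

Finally, since $W = \operatorname{span}\{\vec{w}_1, \vec{w}_2, \dots, \vec{w}_k\}$, the set of vectors also spans $W$ and therefore forms a basis of $W$.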

4.11.2. Orthogonal Matrices

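Recall that the process to find the inverse of a matrix was often cumbersome. In contrast, it was very easy to take the transpose of a matrix. Luckily, for some special matrices, the transpose equals the inverse. When an $n \times n$ matrix has all real entries and its transpose equals its inverse, the matrix is called an orthogonal matrix.

The precise definition is as follows.

Definition 4.97: Orthogonal Matrices
A real $n \times n$ matrix $U$ is called an orthogonal matrix if $UU^T = U^T U = I$.

Note that by Theorem 2.63 it suffices to verify only one of these equalities, $UU^T = I$ or $U^T U = I$.

Consider the following example.

Example 4.98: Orthogonal Matrix

Show the matrix
$$U = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix}$$
is orthogonal.

Solution. All we need to do is verify (one of the equations from) the requirements of Definition 4.97.
$$UU^T = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$
Since $UU^T = I$, this matrix is orthogonal.

Example 4.99: Orthogonal Matrix

Let $U = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{bmatrix}$. Is $U$ orthogonal?

Solution. Again the answer is yes and this can be verified simply by showing that $U^T U = I$:
$$U^T U = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{bmatrix}^T \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & -1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$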
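The check carried out in Examples 4.98 and 4.99 can be automated by forming $U^T U$ and comparing it with the identity. The following is a minimal sketch, not part of the original text. The names `transpose`, `matmul`, and `is_orthogonal` are hypothetical helpers, and the tolerance `tol` is an assumption needed because entries such as $1/\sqrt{2}$ are stored as floating-point numbers.

```python
import math

def transpose(m):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def is_orthogonal(u, tol=1e-12):
    """True when u is square and U^T U equals the identity up to tol."""
    n = len(u)
    if any(len(row) != n for row in u):
        return False
    p = matmul(transpose(u), u)
    return all(abs(p[i][j] - (1 if i == j else 0)) <= tol
               for i in range(n) for j in range(n))

s = 1 / math.sqrt(2)
print(is_orthogonal([[s, s], [s, -s]]))                       # Example 4.98
print(is_orthogonal([[1, 0, 0], [0, 0, -1], [0, -1, 0]]))     # Example 4.99
print(is_orthogonal([[1, 1], [0, 1]]))                        # columns not orthonormal
```

Only one of the two products needs to be tested, in line with the note following Definition 4.97.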

When we say that $U$ is orthogonal, we are saying that
$$\sum_j u_{ij} u^T_{jk} = \sum_j u_{ij} u_{kj} = \delta_{ik}$$
In words, the product of the $i$th row of $U$ with the $k$th row gives 1 if $i = k$ and 0 if $i \neq k$. The same is true of the columns because $U^T U = I$ also. Therefore,
$$\sum_j u^T_{ij} u_{jk} = \sum_j u_{ji} u_{jk} = \delta_{ik}$$
which says that the product of one column with another column gives 1 if the two columns are the same and 0 if the two columns are different.

More succinctly, this states that if $\vec{u}_1, \dots, \vec{u}_n$ are the columns of $U$, an orthogonal matrix, then
$$\vec{u}_i \cdot \vec{u}_j = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$

We will say that the columns form an orthonormal set of vectors, and similarly for the rows. Thus a matrix is orthogonal if its rows (or columns) form an orthonormal set of vectors. Notice that the convention is to call such a matrix orthogonal rather than orthonormal (although this may make more sense!).

Proposition 4.100: Orthonormal Basis
The rows of an $n \times n$ orthogonal matrix form an orthonormal basis of $\mathbb{R}^n$. Further, any orthonormal basis of $\mathbb{R}^n$ can be used to construct an $n \times n$ orthogonal matrix.

Proof. Recall from Theorem 4.96 that an orthonormal set is linearly independent and forms a basis for its span. Since the rows of an $n \times n$ orthogonal matrix form an orthonormal set, they must be linearly independent. Now we have $n$ linearly independent vectors, and it follows that their span equals $\mathbb{R}^n$. Therefore these vectors form an orthonormal basis for $\mathbb{R}^n$.

Suppose we have an orthonormal basis for $\mathbb{R}^n$. Since the basis will contain $n$ vectors, these can be used to construct an $n \times n$ matrix, with each vector becoming a row. Therefore the matrix is composed of orthonormal rows, which by our above discussion means that the matrix is orthogonal.

Consider the following proposition.

Proposition 4.101: Determinant of Orthogonal Matrices
Suppose $U$ is an orthogonal matrix. Then $\det(U) = \pm 1$.

Proof. This result follows from the properties of determinants. Recall that for any matrix $A$, $\det(A^T) = \det(A)$. Consider
$$(\det(U))^2 = \det(U^T)\det(U) = \det(U^T U) = \det(I) = 1$$
Therefore $(\det(U))^2 = 1$ and it follows that $\det(U) = \pm 1$.

Orthogonal matrices are divided into two classes, proper and improper. The proper orthogonal matrices are those whose determinant equals 1 and the improper ones are those whose determinant equals $-1$. The reason for the distinction is that the improper orthogonal matrices are sometimes considered to have no physical significance. These matrices cause a change in orientation which would correspond to material passing through itself in a non-physical manner. Thus in considering which coordinate systems must be considered in certain applications, you only need to consider those which are related by a proper orthogonal transformation. Geometrically, the linear transformations determined by the proper orthogonal matrices correspond to the composition of rotations.

4.11.3. Gram-Schmidt Process

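The Gram-Schmidt process is an algorithm to transform a set of vectors into an orthonormal set generating the same collection of linear combinations (see Definition 1.32).

The goal of the Gram-Schmidt process is to take a linearly independent set of vectors and transform it into an orthonormal set with the same span. The first objective is to construct an orthogonal set of vectors with the same span, since from there an orthonormal set can be obtained by simply dividing each vector by its length.

Algorithm 4.102: Gram-Schmidt Process
Let $\{\vec{u}_1, \dots, \vec{u}_n\}$ be a set of linearly independent vectors in $\mathbb{R}^n$.

I: Construct a new set of vectors $\{\vec{v}_1, \dots, \vec{v}_n\}$ as follows:
$$\begin{aligned} \vec{v}_1 &= \vec{u}_1 \\ \vec{v}_2 &= \vec{u}_2 - \frac{\vec{u}_2 \cdot \vec{v}_1}{\|\vec{v}_1\|^2} \vec{v}_1 \\ \vec{v}_3 &= \vec{u}_3 - \frac{\vec{u}_3 \cdot \vec{v}_1}{\|\vec{v}_1\|^2} \vec{v}_1 - \frac{\vec{u}_3 \cdot \vec{v}_2}{\|\vec{v}_2\|^2} \vec{v}_2 \\ &\;\;\vdots \\ \vec{v}_n &= \vec{u}_n - \frac{\vec{u}_n \cdot \vec{v}_1}{\|\vec{v}_1\|^2} \vec{v}_1 - \cdots - \frac{\vec{u}_n \cdot \vec{v}_{n-1}}{\|\vec{v}_{n-1}\|^2} \vec{v}_{n-1} \end{aligned}$$

II: Now let $\vec{w}_i = \dfrac{\vec{v}_i}{\|\vec{v}_i\|}$ for $i = 1, \dots, n$.

Then $\{\vec{v}_1, \dots, \vec{v}_n\}$ is an orthogonal set, $\{\vec{w}_1, \dots, \vec{w}_n\}$ is an orthonormal set, and both have the same span as $\{\vec{u}_1, \dots, \vec{u}_n\}$.

Proof. To show that $\{\vec{v}_1, \dots, \vec{v}_n\}$ is an orthogonal set, let
$$a_2 = \frac{\vec{u}_2 \cdot \vec{v}_1}{\|\vec{v}_1\|^2}$$
Then
$$\vec{v}_1 \cdot \vec{v}_2 = \vec{v}_1 \cdot (\vec{u}_2 - a_2 \vec{v}_1) = \vec{v}_1 \cdot \vec{u}_2 - a_2 \|\vec{v}_1\|^2 = \vec{v}_1 \cdot \vec{u}_2 - \frac{\vec{u}_2 \cdot \vec{v}_1}{\|\vec{v}_1\|^2} \|\vec{v}_1\|^2 = 0$$
Now that you have shown that $\{\vec{v}_1, \vec{v}_2\}$ is orthogonal, use the same method as above to show that $\{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ is also orthogonal, and so on.

Then in a similar fashion you show that $\operatorname{span}\{\vec{u}_1, \dots, \vec{u}_n\} = \operatorname{span}\{\vec{v}_1, \dots, \vec{v}_n\}$.

Finally, defining $\vec{w}_i = \dfrac{\vec{v}_i}{\|\vec{v}_i\|}$ for $i = 1, \dots, n$ does not affect orthogonality and yields vectors of length 1, hence an orthonormal set. You can also observe that it does not affect the span either, and the proof is complete.

Consider the following example.

Example 4.103: Find Orthonormal Set with Same Span
Consider the set of vectors $\{\vec{u}_1, \vec{u}_2\}$ given as in Example 4.89. That is
$$\vec{u}_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad \vec{u}_2 = \begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix} \in \mathbb{R}^3$$
Find an orthonormal set of vectors $\{\vec{w}_1, \vec{w}_2\}$ having the same span.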

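Solution. We already remarked that the set of vectors $\{\vec{u}_1, \vec{u}_2\}$ is linearly independent, so we can proceed with the Gram-Schmidt algorithm:
$$\vec{v}_1 = \vec{u}_1 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$$
$$\vec{v}_2 = \vec{u}_2 - \frac{\vec{u}_2 \cdot \vec{v}_1}{\|\vec{v}_1\|^2} \vec{v}_1 = \begin{bmatrix} 3 \\ 2 \\ 0 \end{bmatrix} - \frac{5}{2} \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} \\ -\frac{1}{2} \\ 0 \end{bmatrix}$$
Now simply let
$$\vec{w}_1 = \frac{\vec{v}_1}{\|\vec{v}_1\|} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \end{bmatrix}, \qquad \vec{w}_2 = \frac{\vec{v}_2}{\|\vec{v}_2\|} = \begin{bmatrix} \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} \\ 0 \end{bmatrix}$$
You can verify that $\{\vec{w}_1, \vec{w}_2\}$ is an orthonormal set of vectors having the same span as $\{\vec{u}_1, \vec{u}_2\}$, namely the $XY$-plane.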
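The two stages of the Gram-Schmidt Process can be sketched in a few lines of code. This is an illustrative sketch under stated assumptions, not part of the original text. The function names `gram_schmidt` and `normalize` are hypothetical, the input vectors are assumed to be linearly independent (otherwise a zero vector appears and the division fails), and stage I uses exact `Fraction` arithmetic while stage II uses floating-point square roots.

```python
from fractions import Fraction
import math

def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Stage I: return an orthogonal list with the same span as the input."""
    ortho = []
    for u in vectors:
        u = [Fraction(x) for x in u]
        v = list(u)
        # Subtract the component of u along each previously built vector.
        for b in ortho:
            c = dot(u, b) / dot(b, b)
            v = [vi - c * bi for vi, bi in zip(v, b)]
        ortho.append(v)
    return ortho

def normalize(v):
    """Stage II: divide a nonzero vector by its length (floating point)."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

v = gram_schmidt([[1, 1, 0], [3, 2, 0]])   # the vectors of Example 4.103
w = [normalize(x) for x in v]
print(v)
print(w)
```

The coefficient `c` is computed from the original vector `u`, matching the formulas written with $\vec{u}_k \cdot \vec{v}_j$ in the example above.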

4.11.4. Orthogonal Projections

An important use of the Gram-Schmidt Process is in orthogonal projections, the focus of this section.

You may recall that a subspace of $\mathbb{R}^n$ is a set of vectors which contains the zero vector, and is closed under addition and scalar multiplication. Let's call such a subspace $W$. In particular, a plane in $\mathbb{R}^n$ which contains the origin, $(0, 0, \dots, 0)$, is a subspace of $\mathbb{R}^n$.

Suppose a point $Y$ in $\mathbb{R}^n$ is not contained in $W$. What point $Z$ in $W$ is closest to $Y$? Using the Gram-Schmidt Process, we can find such a point. Let $\vec{y}, \vec{z}$ represent the position vectors of the points $Y$ and $Z$ respectively, with $\vec{y} - \vec{z}$ representing the vector connecting the two points $Y$ and $Z$. It will follow that if $Z$ is the point on $W$ closest to $Y$, then $\vec{y} - \vec{z}$ will be perpendicular to $W$; in other words, $\vec{y} - \vec{z}$ is orthogonal to $W$ (and to every vector contained in $W$) as in the following diagram.

[Figure: the point $Y$ with position vector $\vec{y}$, the point $Z$ in $W$ with position vector $\vec{z}$, and the vector $\vec{y} - \vec{z}$ perpendicular to $W$.]

The vector $\vec{z}$ is called the orthogonal projection of $\vec{y}$ on $W$. The definition is given as follows.

Definition 4.104: Orthogonal Projection

Let $W$ be a subspace of $\mathbb{R}^n$, and $Y$ be any point in $\mathbb{R}^n$. Then the orthogonal projection of $Y$ onto $W$ is given by
$$\vec{z} = \operatorname{proj}_W(\vec{y}) = \frac{\vec{y} \cdot \vec{w}_1}{\|\vec{w}_1\|^2} \vec{w}_1 + \frac{\vec{y} \cdot \vec{w}_2}{\|\vec{w}_2\|^2} \vec{w}_2 + \cdots + \frac{\vec{y} \cdot \vec{w}_m}{\|\vec{w}_m\|^2} \vec{w}_m$$
where $\{\vec{w}_1, \vec{w}_2, \dots, \vec{w}_m\}$ is any orthogonal basis of $W$.

Therefore, in order to find the orthogonal projection, we must first find an orthogonal basis for the subspace. Note that one could use an orthonormal basis, but it is not necessary in this case since, as you can see above, the normalization of each vector is included in the formula for the projection.

Before we explore this further through an example, we show that the orthogonal projection yields a point $Z$ (the point whose position vector is the vector $\vec{z}$ above) which is the point of $W$ closest to $Y$.

Theorem 4.105: Approximation Theorem
Let $W$ be a subspace of $\mathbb{R}^n$ and $Y$ any point in $\mathbb{R}^n$. Let $Z$ be the point whose position vector is the orthogonal projection of $Y$ onto $W$. Then, $Z$ is the point in $W$ closest to $Y$.

Proof. To show that $Z$ is the point in $W$ closest to $Y$, we wish to show that $\|\vec{y} - \vec{z}_1\| > \|\vec{y} - \vec{z}\|$ for all $\vec{z}_1 \neq \vec{z}$ in $W$. We begin by writing $\vec{y} - \vec{z}_1 = (\vec{y} - \vec{z}) + (\vec{z} - \vec{z}_1)$. Now, the vector $\vec{y} - \vec{z}$ is orthogonal to $W$, and $\vec{z} - \vec{z}_1$ is contained in $W$. Therefore these vectors are orthogonal to each other. By the Pythagorean Theorem, we have that
$$\|\vec{y} - \vec{z}_1\|^2 = \|\vec{y} - \vec{z}\|^2 + \|\vec{z} - \vec{z}_1\|^2 > \|\vec{y} - \vec{z}\|^2$$
This follows because $\vec{z} \neq \vec{z}_1$ so $\|\vec{z} - \vec{z}_1\|^2 > 0$.

Hence, $\|\vec{y} - \vec{z}_1\|^2 > \|\vec{y} - \vec{z}\|^2$. Taking the square root of each side, we obtain the desired result.

Consider the following example.

Example 4.106: Orthogonal Projection
Let $W$ be a plane given by points satisfying $a - 2b + c = 0$. Find the point in $W$ closest to the point $Y = (1, 0, 3)$.

Solution. We must first find an orthogonal basis for $W$. Notice that $W$ is characterized by all points $(a, b, c)$ where $c = 2b - a$. In other words,
$$W = \left\{ \begin{bmatrix} a \\ b \\ 2b - a \end{bmatrix} : a, b \in \mathbb{R} \right\}$$

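We can write $W$ as
$$W = \operatorname{span}\{\vec{u}_1, \vec{u}_2\} = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} \right\}$$
Notice that this spanning set is a basis of $W$ as it is linearly independent. We will use the Gram-Schmidt Process to convert this to an orthogonal basis, $\{\vec{w}_1, \vec{w}_2\}$. In this case, it is only necessary to find an orthogonal basis, and it is not required that it be orthonormal.
$$\vec{w}_1 = \vec{u}_1 = \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}$$
$$\vec{w}_2 = \vec{u}_2 - \frac{\vec{u}_2 \cdot \vec{w}_1}{\|\vec{w}_1\|^2} \vec{w}_1 = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} - \left( \frac{-2}{2} \right) \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$$
Therefore an orthogonal basis of $W$ is
$$\{\vec{w}_1, \vec{w}_2\} = \left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} \right\}$$
We can now use this basis to find the orthogonal projection of the point $Y = (1, 0, 3)$ on the subspace $W$. We will write the position vector $\vec{y}$ of $Y$ as $\vec{y} = \begin{bmatrix} 1 \\ 0 \\ 3 \end{bmatrix}$. Using Definition 4.104, we continue as follows:
$$\vec{z} = \operatorname{proj}_W(\vec{y}) = \frac{\vec{y} \cdot \vec{w}_1}{\|\vec{w}_1\|^2} \vec{w}_1 + \frac{\vec{y} \cdot \vec{w}_2}{\|\vec{w}_2\|^2} \vec{w}_2 = \left( \frac{-2}{2} \right) \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix} + \left( \frac{4}{3} \right) \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{3} \\ \frac{4}{3} \\ \frac{7}{3} \end{bmatrix}$$
Therefore the point on $W$ closest to the point $(1, 0, 3)$ is $\left( \frac{1}{3}, \frac{4}{3}, \frac{7}{3} \right)$.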
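The projection formula of Definition 4.104 translates directly into code once an orthogonal basis is available. Below is a minimal sketch, not part of the original text. The function name `project` is hypothetical, and the basis passed in is assumed to be orthogonal already (for instance produced by the Gram-Schmidt Process), since the formula is not valid otherwise.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))

def project(y, orthogonal_basis):
    """Orthogonal projection of y onto the span of an orthogonal basis."""
    y = [Fraction(x) for x in y]
    z = [Fraction(0)] * len(y)
    for w in orthogonal_basis:
        c = dot(y, w) / dot(w, w)   # the coefficient (y . w) / ||w||^2
        z = [zi + c * wi for zi, wi in zip(z, w)]
    return z

basis = [[1, 0, -1], [1, 1, 1]]     # orthogonal basis of the plane a - 2b + c = 0
y = [1, 0, 3]
z = project(y, basis)
residual = [yi - zi for yi, zi in zip(y, z)]

print(z)
print([dot(residual, w) for w in basis])   # residual is orthogonal to each basis vector
```

The residual $\vec{y} - \vec{z}$ having zero dot product with every basis vector is exactly the perpendicularity property used in the proof of Theorem 4.105.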

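Recall that the vector $\vec{y} - \vec{z}$ is perpendicular (orthogonal) to all the vectors contained in the plane $W$. Using a basis for $W$, we can in fact find all such vectors which are perpendicular to $W$. We call this set of vectors the orthogonal complement of $W$ and denote it $W^\perp$.

Definition 4.107: Orthogonal Complement
Let $W$ be a subspace of $\mathbb{R}^n$. Then the orthogonal complement of $W$, written $W^\perp$, is the set of all vectors $\vec{x}$ such that $\vec{x} \cdot \vec{z} = 0$ for all vectors $\vec{z}$ in $W$.
$$W^\perp = \{ \vec{x} \in \mathbb{R}^n \text{ such that } \vec{x} \cdot \vec{z} = 0 \text{ for all } \vec{z} \in W \}$$

In the next example, we will look at how to find $W^\perp$.

Example 4.108: Orthogonal Complement
Let $W$ be a plane given by points satisfying $a - 2b + c = 0$. Find the orthogonal complement of $W$.

Solution. From Example 4.106 we know that we can write $W$ as
$$W = \operatorname{span}\{\vec{u}_1, \vec{u}_2\} = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ 0 \\ -1 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 2 \end{bmatrix} \right\}$$
In order to find $W^\perp$, we need to find all $\vec{x}$ which are orthogonal to every vector in this span. It suffices that $\vec{x} = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}^T$ be orthogonal to $\vec{u}_1$ and $\vec{u}_2$, which gives the equations $x_1 - x_3 = 0$ and $x_2 + 2x_3 = 0$. Setting $x_3 = t$ gives $x_1 = t$ and $x_2 = -2t$, so
$$W^\perp = \operatorname{span}\left\{ \begin{bmatrix} 1 \\ -2 \\ 1 \end{bmatrix} \right\}$$

Recall that $Z$ is the point in $W$ closest to the point $Y$. Notice that since the origin is contained in $W$, the position vector $\vec{z}$ of $Z$ is contained in $W$. The vector $\vec{y} - \vec{z}$ is in $W^\perp$. Now, let $Z_1$ be any other point in $W$ not equal to $Z$. Then it follows that the distance between $Y$ and $Z$ is shorter than that between $Y$ and $Z_1$ for all $Z_1$. These results are summarized in the following important theorem.

Theorem 4.109: Orthogonal Projection
Let $W$ be a subspace of $\mathbb{R}^n$, $Y$ be any point in $\mathbb{R}^n$, and let $Z$ be the point in $W$ closest to $Y$. Then,
1. The position vector $\vec{z}$ of the point $Z$ is given by $\vec{z} = \operatorname{proj}_W(\vec{y})$
2. $\vec{z} \in W$ and $\vec{y} - \vec{z} \in W^\perp$
3. $|Y - Z| < |Y - Z_1|$ for all $Z_1 \neq Z \in W$

We conclude this section with a final example.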

223

Example 4.110: Vector Written as a Sum of Two Vectors

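Let $W$ be the subspace given by
\[
W = \mathrm{span}\left\{ \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \\ 2 \end{bmatrix} \right\}
\]
and let $Y = (1, 2, 3, 4)$. Find the point $Z$ in $W$ closest to $Y$, and moreover write $\vec{y}$ as the sum of a vector in $W$ and a vector in $W^{\perp}$.

Solution. From Theorem 4.105, the point $Z$ in $W$ closest to $Y$ is given by $\vec{z} = \mathrm{proj}_W(\vec{y})$. Notice that since the above vectors already give an orthogonal basis for $W$, we have:
\[
\begin{aligned}
\vec{z} &= \mathrm{proj}_W(\vec{y}) \\
&= \left( \frac{\vec{y} \cdot \vec{w}_1}{\|\vec{w}_1\|^2} \right) \vec{w}_1 + \left( \frac{\vec{y} \cdot \vec{w}_2}{\|\vec{w}_2\|^2} \right) \vec{w}_2 \\
&= \left( \frac{4}{2} \right) \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} + \left( \frac{10}{5} \right) \begin{bmatrix} 0 \\ 1 \\ 0 \\ 2 \end{bmatrix} \\
&= \begin{bmatrix} 2 \\ 2 \\ 2 \\ 4 \end{bmatrix}
\end{aligned}
\]

Therefore the point in $W$ closest to $Y$ is $Z = (2, 2, 2, 4)$.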

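Now, we need to write $\vec{y}$ as the sum of a vector in $W$ and a vector in $W^{\perp}$. This can easily be done as follows:
\[
\vec{y} = \vec{z} + (\vec{y} - \vec{z})
\]
since $\vec{z}$ is in $W$ and as we have seen $\vec{y} - \vec{z}$ is in $W^{\perp}$.

The vector $\vec{y} - \vec{z}$ is given by
\[
\vec{y} - \vec{z} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} - \begin{bmatrix} 2 \\ 2 \\ 2 \\ 4 \end{bmatrix} = \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix}
\]
Therefore, we can write $\vec{y}$ as
\[
\begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 2 \\ 4 \end{bmatrix} + \begin{bmatrix} -1 \\ 0 \\ 1 \\ 0 \end{bmatrix}
\]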
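The decomposition $\vec{y} = \vec{z} + (\vec{y} - \vec{z})$ can be verified with a short computation. This is a sketch under the assumption that $W$ is spanned by $(1, 0, 1, 0)$ and $(0, 1, 0, 2)$ and that $\vec{y} = (1, 2, 3, 4)$, as read off from the computation in Example 4.110; the function name `decompose` is hypothetical.

```python
from fractions import Fraction

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def decompose(orthogonal_basis, y):
    """Return (z, r): z is the projection of y onto W, r = y - z lies in W-perp.

    Assumes the basis vectors are nonzero and mutually orthogonal.
    """
    z = [Fraction(0)] * len(y)
    for w in orthogonal_basis:
        coeff = Fraction(dot(y, w), dot(w, w))
        z = [zi + coeff * wi for zi, wi in zip(z, w)]
    r = [yi - zi for yi, zi in zip(y, z)]
    return z, r

basis = [[1, 0, 1, 0], [0, 1, 0, 2]]
y = [1, 2, 3, 4]
z, r = decompose(basis, y)

print(z, r)
print([dot(r, w) for w in basis])  # zero against every basis vector of W
```

Checking that `r` has zero dot product with each basis vector is enough to place it in $W^{\perp}$, because every vector of $W$ is a linear combination of those basis vectors.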

4.11.5. Least Squares Approximation

It should not be surprising to hear that many problems do not have a perfect solution, and in these cases the objective is always to try to do the best possible. For example what does one do if there are no solutions to a system of linear equations $A\vec{x} = \vec{b}$? It turns out that what we do is find $\vec{x}$ such that $A\vec{x}$ is as close to $\vec{b}$ as possible. A very important technique that follows from orthogonal projections is that of the least squares approximation, which allows us to do exactly that.

We begin with a lemma.

Lemma 4.111: Matrix Image Subspace
Let $A$ be an $m \times n$ matrix and let $A(\mathbb{R}^n)$ denote the set of vectors in $\mathbb{R}^m$ which are of the form $A\vec{x}$ for some $\vec{x} \in \mathbb{R}^n$. Then $A(\mathbb{R}^n)$ is a subspace of $\mathbb{R}^m$.

Proof. Let $A\vec{x}$ and $A\vec{y}$ be two vectors of $A(\mathbb{R}^n)$. It suffices to verify that if $a, b$ are scalars, then $aA\vec{x} + bA\vec{y}$ is also in $A(\mathbb{R}^n)$. But $aA\vec{x} + bA\vec{y} = A(a\vec{x} + b\vec{y})$ because $A$ is linear. Since $a\vec{x} + b\vec{y}$ is a vector in $\mathbb{R}^n$, it follows that $A(a\vec{x} + b\vec{y})$ is in $A(\mathbb{R}^n)$ as required.

The following theorem is a rewording of Theorem 4.109 using the subspace $W = A(\mathbb{R}^n)$ and gives the equivalence of an orthogonality condition with a minimization condition. The following picture illustrates this orthogonality condition and geometric meaning of this theorem.

Lemma 4.113: Transpose and Dot Product

The next corollary gives the technique of least squares.

Corollary 4.114: Least Squares and Normal Equation
A specific value of $\vec{x}$ which solves the problem of Theorem 4.112 is obtained by solving the equation
\[
A^T A \vec{x} = A^T \vec{y}
\]
Furthermore, there always exists a solution to this system of equations.

Proof. For $\vec{x}$ the minimizer of Theorem 4.112, $(\vec{y} - A\vec{x}) \cdot A\vec{u} = 0$ for all $\vec{u} \in \mathbb{R}^n$ and from Lemma 4.113, this is the same as saying
\[
A^T(\vec{y} - A\vec{x}) \cdot \vec{u} = 0
\]
for all $\vec{u} \in \mathbb{R}^n$. This implies
\[
A^T \vec{y} - A^T A \vec{x} = \vec{0}.
\]
Therefore, there is a solution to the equation of this corollary, and it solves the minimization problem of Theorem 4.112.

Note that $\vec{x}$ might not be unique but $A\vec{x}$, the closest point of $A(\mathbb{R}^n)$ to $\vec{y}$, is unique as was shown in the above argument.

An important application of Corollary 4.114 is the problem of finding the least squares regression line in statistics. Suppose you are given points in the $xy$ plane
\[
\{(x_1, y_1), (x_2, y_2), \cdots, (x_n, y_n)\}
\]
and you would like to find constants $m$ and $b$ such that the line $\vec{y} = m\vec{x} + b$ goes through all these points. Of course this will be impossible in general. Therefore, we try to find $m, b$ such that the line will be as close as possible. The desired system is
\[
\begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{bmatrix} \begin{bmatrix} m \\ b \end{bmatrix}
\]
which is of the form $\vec{y} = A\vec{x}$. It is desired to choose $m$ and $b$ to make

The least squares regression line for the set of data points is:
\[
\vec{y} = \vec{x} + .8
\]
One could use this line to approximate other values for the data. For example for $x = 6$ one could use $y(6) = 6 + .8 = 6.8$ as an approximate value for the data.

The following diagram shows the data points and the corresponding regression line.

[Figure: the data points and the regression line plotted in the $xy$ plane.]
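The normal equations $A^T A \vec{x} = A^T \vec{y}$ for a line reduce to a $2 \times 2$ system that can be solved directly. The sketch below is illustrative only: the function name `fit_line` is hypothetical, and the five data points are an assumption chosen so that the fit reproduces the line $y = x + .8$ quoted above; they are not taken from the surviving text.

```python
from fractions import Fraction

def fit_line(points):
    """Least squares line y = m*x + b via the normal equations.

    With rows (x_i, 1) in A, the system A^T A [m, b]^T = A^T y reads
        [sxx  sx] [m]   [sxy]
        [sx   n ] [b] = [sy ]
    and is solved here by Cramer's rule.
    """
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("all x values coincide; the line is not determined")
    m = Fraction(sxy * n - sx * sy, det)
    b = Fraction(sxx * sy - sx * sxy, det)
    return m, b

# Assumed sample data (not from the source text).
data = [(0, 1), (1, 2), (2, 2), (3, 4), (4, 5)]
m, b = fit_line(data)
print(m, b)        # slope and intercept
print(m * 6 + b)   # approximate value at x = 6
```

Cramer's rule is used only because the system is $2 \times 2$; for curves with more coefficients one would solve $A^T A \vec{x} = A^T \vec{y}$ by row reduction instead.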

One could clearly do a least squares fit for curves of the form $y = ax^2 + bx + c$ in the same way. In this case you want to solve as well as possible for $a$, $b$, and $c$ the system
\[
\begin{bmatrix} x_1^2 & x_1 & 1 \\ \vdots & \vdots & \vdots \\ x_n^2 & x_n & 1 \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}
\]
and one would use the same technique as above. Many other similar problems are important, including many in higher dimensions, and they are all solved the same way.

4.11.6. Exercises

1. Here are some matrices. Label according to whether they are symmetric, skew symmetric, or orthogonal.

1 0(a)

01212

12


1 2 3(b) 2 14 3 47

0 2 30 4 (c) 2340

2. For $U$ an orthogonal matrix, explain why $\|U\vec{x}\| = \|\vec{x}\|$ for any vector $\vec{x}$. Next explain why if $U$ is an $n \times n$ matrix with the property that $\|U\vec{x}\| = \|\vec{x}\|$ for all vectors $\vec{x}$, then $U$ must be orthogonal. Thus the orthogonal matrices are exactly those which preserve length.
3. Suppose $U$ is an orthogonal $n \times n$ matrix. Explain why $\mathrm{rank}(U) = n$.
4. Fill in the missing entries to make the matrix orthogonal.
1 1 1

12

63

5. Fill in the missing entries to make the matrix orthogonal.

2122

23 2 6

306. Fill in the missing entries to make the matrix orthogonal. 1

253 2

45157. Find an orthonormal basis for the span of each of the following sets of vectors.

371

4 , 1 , 7 (a)001

3111(b) 0 , 0 , 1 427

357(c) 0 , 0 , 1 4101229

8. Using the Gram Schmidt process find an orthonormal basis for the following span:

Outcomes
A. Apply the concepts of vectors in $\mathbb{R}^n$ to the applications of physics and work.

4.12.1. Vectors and Physics

Suppose you push on something. Then, your push is made up of two components, how hard you push and the direction you push. This illustrates the concept of force.

Definition 4.116: Force
Force is a vector. The magnitude of this vector is a measure of how hard it is pushing. It is measured in units such as Newtons or pounds or tons. The direction of this vector is the direction in which the push is taking place.

Vectors are used to model force and other physical vectors like velocity. As with all vectors, a vector modeling force has two essential ingredients, its magnitude and its direction.

Recall the special vectors which point along the coordinate axes. These are given by
\[
\vec{e}_i = \begin{bmatrix} 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \end{bmatrix}^T
\]
where the 1 is in the $i$th slot and there are zeros in all the other spaces. The direction of $\vec{e}_i$ is referred to as the $i$th direction.

Consider the following picture which illustrates the case of $\mathbb{R}^3$. Recall that in $\mathbb{R}^3$, we may refer to these vectors as $\vec{i}$, $\vec{j}$, and $\vec{k}$.

[Figure: the vectors $\vec{e}_1$, $\vec{e}_2$, $\vec{e}_3$ along the coordinate axes of $\mathbb{R}^3$.]

Given a vector $\vec{u} = \begin{bmatrix} u_1 & \cdots & u_n \end{bmatrix}^T$, it follows that
\[
\vec{u} = u_1 \vec{e}_1 + \cdots + u_n \vec{e}_n = \sum_{i=1}^{n} u_i \vec{e}_i
\]

What does addition of vectors mean physically? Suppose two forces are applied to some object. Each of these would be represented by a force vector and the two forces acting together would yield an overall force acting on the object which would also be a force vector known as the resultant. Suppose the two vectors are $\vec{u} = \sum_{i=1}^{n} u_i \vec{e}_i$ and $\vec{v} = \sum_{i=1}^{n} v_i \vec{e}_i$. Then the vector $\vec{u}$ involves a component in the $i$th direction given by $u_i \vec{e}_i$, while the component in the $i$th direction of $\vec{v}$ is $v_i \vec{e}_i$. Then the vector $\vec{u} + \vec{v}$ should have a component in the $i$th

Thus the addition of vectors according to the rules of addition in $\mathbb{R}^n$ which were presented earlier, yields the appropriate vector which duplicates the cumulative effect of all the vectors in the sum.

Consider now some examples of vector addition.

Example 4.117: The Resultant of Three Forces
There are three ropes attached to a car and three people pull on these ropes. The first exerts a force of $\vec{F}_1 = 2\vec{i} + 3\vec{j} - 2\vec{k}$ Newtons, the second exerts a force of $\vec{F}_2 = 3\vec{i} + 5\vec{j} + \vec{k}$ Newtons and the third exerts a force of $5\vec{i} - \vec{j} + 2\vec{k}$ Newtons. Find the total force in the direction of $\vec{i}$.

Solution. To find the total force, we add the vectors as described above. This is given by
\[
\begin{aligned}
(2\vec{i} + 3\vec{j} - 2\vec{k}) + (3\vec{i} + 5\vec{j} + \vec{k}) + (5\vec{i} - \vec{j} + 2\vec{k}) &= (2 + 3 + 5)\vec{i} + (3 + 5 + (-1))\vec{j} + (-2 + 1 + 2)\vec{k} \\
&= 10\vec{i} + 7\vec{j} + \vec{k}
\end{aligned}
\]
Hence, the total force is $10\vec{i} + 7\vec{j} + \vec{k}$ Newtons. Therefore, the force in the $\vec{i}$ direction is 10 Newtons.

Consider another example.

Example 4.118: Finding a Vector from Geometric Description
An airplane flies North East at 100 miles per hour. Write this as a vector.

Solution. A picture of this situation follows.

Therefore, we need to find the vector $\vec{u}$ which has length 100 and direction as shown in this diagram. We can consider the vector $\vec{u}$ as the hypotenuse of a right triangle having equal sides, since the direction of $\vec{u}$ corresponds with the $45^{\circ}$ line. The sides, corresponding to the $\vec{i}$ and $\vec{j}$ directions, should be each of length $100/\sqrt{2}$. Therefore, the vector is given by
\[
\vec{u} = \frac{100}{\sqrt{2}} \vec{i} + \frac{100}{\sqrt{2}} \vec{j} = \begin{bmatrix} \frac{100}{\sqrt{2}} & \frac{100}{\sqrt{2}} \end{bmatrix}^T
\]

This example also motivates the concept of velocity, defined below.

Definition 4.119: Speed and Velocity
The speed of an object is a measure of how fast it is going. It is measured in units of length per unit time. For example, miles per hour, kilometers per minute, feet per second. The velocity is a vector having the speed as the magnitude but also specifying the direction.

Thus the velocity vector in the above example is $\frac{100}{\sqrt{2}} \vec{i} + \frac{100}{\sqrt{2}} \vec{j}$, while the speed is 100 miles per hour.

Consider the following example.

Example 4.120: Position From Velocity and Time
The velocity of an airplane is $100\vec{i} + \vec{j} + \vec{k}$ measured in kilometers per hour and at a certain instant of time its position is $(1, 2, 1)$. Find the position of this airplane one minute later.

Solution. Here imagine a Cartesian coordinate system in which the third component is altitude and the first and second components are measured on a line from West to East and a line from South to North.

Consider the vector $\begin{bmatrix} 1 & 2 & 1 \end{bmatrix}^T$, which is the initial position vector of the airplane. As the plane moves, the position vector changes according to the velocity vector. After one minute (considered as $\frac{1}{60}$ of an hour) the airplane has moved in the $\vec{i}$ direction a distance of $100 \times \frac{1}{60} = \frac{5}{3}$ kilometer. In the $\vec{j}$ direction it has moved $\frac{1}{60}$ kilometer during this same time, while it moves $\frac{1}{60}$ kilometer in the $\vec{k}$ direction. Therefore, the new displacement vector for the airplane is
\[
\begin{bmatrix} 1 & 2 & 1 \end{bmatrix}^T + \begin{bmatrix} \frac{5}{3} & \frac{1}{60} & \frac{1}{60} \end{bmatrix}^T = \begin{bmatrix} \frac{8}{3} & \frac{121}{60} & \frac{61}{60} \end{bmatrix}^T
\]

Now consider an example which involves combining two velocities.

Example 4.121: Sum of Two Velocities
A certain river is one half kilometer wide with a current flowing at 4 kilometers per hour from East to West. A man swims directly toward the opposite shore from the South bank of the river at a speed of 3 kilometers per hour. How far down the river does he find himself when he has swum across? How far does he end up swimming?

Solution. Consider the following picture which demonstrates the above scenario.

[Figure: the swimmer's velocity of 3 across the river and the current's velocity of 4 along the river.]

First we want to know the total time of the swim across the river. The velocity in the direction across the river is 3 kilometers per hour, and the river is $\frac{1}{2}$ kilometer wide. It follows the trip takes $1/6$ hour or 10 minutes.

Now, we can compute how far downstream he will end up. Since the river runs at a rate of 4 kilometers per hour, and the trip takes $1/6$ hour, the distance traveled downstream is given by $4 \left( \frac{1}{6} \right) = \frac{2}{3}$ kilometers.

The distance traveled by the swimmer is given by the hypotenuse of a right triangle. The two arms of the triangle are given by the distance across the river, $\frac{1}{2}$ km, and the distance traveled downstream, $\frac{2}{3}$ km. Then, using the Pythagorean Theorem, we can calculate the total distance $d$ traveled.
\[
d = \sqrt{\left( \frac{2}{3} \right)^2 + \left( \frac{1}{2} \right)^2} = \frac{5}{6} \text{ km}
\]
Therefore, the swimmer travels a total distance of $\frac{5}{6}$ kilometers.
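The same answer follows from adding the two velocities as vectors and scaling by the crossing time. The sketch below assumes a coordinate system that is not stated in the example: the positive $x$ axis points East and the positive $y$ axis points North, so the current contributes $-4$ in $x$ and the swimmer contributes $3$ in $y$.

```python
from fractions import Fraction
import math

width = Fraction(1, 2)   # km, from the South bank to the North bank
velocity = (-4, 3)       # km/h: current toward the West, swimmer toward the North

# Only the northward component carries the swimmer across the river.
time = width / velocity[1]

# Displacement = time * velocity, component by component.
displacement = tuple(time * v for v in velocity)

# Length of the displacement vector is the distance actually swum.
distance = math.hypot(*(float(d) for d in displacement))

print(time)           # hours
print(displacement)   # km West (negative x) and km North
print(distance)       # km
```

The downstream drift is the magnitude of the first component of `displacement`, and the total distance is the length of the whole vector.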

4.12.2. Work

The mathematical concept of work is an application of vectors in $\mathbb{R}^n$. The physical concept of work differs from the notion of work employed in ordinary conversation. For example, suppose you were to slide a 150 pound weight off a table which is three feet high and shuffle along the floor for 50 yards, keeping the height always three feet and then deposit this weight on another three foot high table. The physical concept of work would indicate that the force exerted by your arms did no work during this project. The reason for this definition is that even though your arms exerted considerable force on the weight, the direction of motion was at right angles to the force they exerted. The only part of a force which does work in the sense of physics is the component of the force in the direction of motion.

Work is defined to be the magnitude of the component of this force times the distance over which it acts, when the component of force points in the direction of motion. In the case where the force points in exactly the opposite direction of motion work is given by $(-1)$ times the magnitude of this component times the distance. Thus the work done by a force on an object as the object moves from one point to another is a measure of the extent to which the force contributes to the motion. This is illustrated in the following picture in the case where the given force contributes to the motion.

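[Figure: a force $\vec{F}$ applied to an object moving from $P$ to $Q$, shown together with its components $\vec{F}_{||}$ and $\vec{F}_{\perp}$.]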

Recall that for any vector $\vec{u}$ in $\mathbb{R}^n$, we can write $\vec{u}$ as a sum of two vectors, as in
\[
\vec{u} = \vec{u}_{||} + \vec{u}_{\perp}
\]
For any force $\vec{F}$, we can write this force as the sum of a vector in the direction of the motion and a vector perpendicular to the motion. In other words,
\[
\vec{F} = \vec{F}_{||} + \vec{F}_{\perp}
\]
In the above picture the force $\vec{F}$ is applied to an object which moves on the straight line from $P$ to $Q$. There are two vectors shown, $\vec{F}_{||}$ and $\vec{F}_{\perp}$, and the picture is intended to indicate that when you add these two vectors you get $\vec{F}$. In other words, $\vec{F} = \vec{F}_{||} + \vec{F}_{\perp}$. Notice that $\vec{F}_{||}$ acts in the direction of motion and $\vec{F}_{\perp}$ acts perpendicular to the direction of motion. Only $\vec{F}_{||}$ contributes to the work done by $\vec{F}$ on the object as it moves from $P$ to $Q$. $\vec{F}_{||}$ is called the component of the force in the direction of motion. From trigonometry, you see the magnitude of $\vec{F}_{||}$ should equal $\|\vec{F}\| \, |\cos \theta|$. Thus, since $\vec{F}_{||}$ points in the direction of the vector from $P$ to $Q$, the total work done should equal
\[
\|\vec{F}\| \|\overrightarrow{PQ}\| \cos \theta = \|\vec{F}\| \|\vec{q} - \vec{p}\| \cos \theta
\]
Now, suppose the included angle had been obtuse. Then the work done by the force $\vec{F}$ on the object would have been negative because $\vec{F}_{||}$ would point in $-1$ times the direction of the motion. In this case, $\cos \theta$ would also be negative and so it is still the case that the work done would be given by the above formula. Thus from the geometric description of the dot product given above, the work equals
\[
\|\vec{F}\| \|\vec{q} - \vec{p}\| \cos \theta = \vec{F} \cdot (\vec{q} - \vec{p})
\]
This explains the following definition.

Definition 4.122: Work Done on an Object by a Force
Let $\vec{F}$ be a force acting on an object which moves from the point $P$ to the point $Q$, which have position vectors given by $\vec{p}$ and $\vec{q}$ respectively. Then the work done on the object by the given force equals $\vec{F} \cdot (\vec{q} - \vec{p})$.

Consider the following example.

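Example 4.123: Finding Work
Let $\vec{F} = \begin{bmatrix} 2 & 7 & -3 \end{bmatrix}^T$ Newtons. Find the work done by this force in moving from the point $(1, 2, 3)$ to the point $(-9, -3, 4)$ along the straight line segment joining these points where distances are measured in meters.

Solution. First, compute the vector $\vec{q} - \vec{p}$, given by
\[
\begin{bmatrix} -9 & -3 & 4 \end{bmatrix}^T - \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}^T = \begin{bmatrix} -10 & -5 & 1 \end{bmatrix}^T
\]
According to Definition 4.122 the work done is
\[
\begin{aligned}
\begin{bmatrix} 2 & 7 & -3 \end{bmatrix}^T \cdot \begin{bmatrix} -10 & -5 & 1 \end{bmatrix}^T &= -20 + (-35) + (-3) \\
&= -58 \text{ Newton meters}
\end{aligned}
\]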

Note that if the force had been given in pounds and the distance had been given in feet,the units on the work would have been foot pounds. In general, work has units equal tounits of a force times units of a length. Recall that 1 Newton meter is equal to 1 Joule. Alsonotice that the work done by the force can be negative as in the above example.
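Definition 4.122 translates directly into a dot product computation. The following minimal sketch uses the force and endpoints of Example 4.123; the function names `dot` and `work` are illustrative choices rather than notation from the text.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def work(force, p, q):
    """Work done by a constant force on an object moving from p to q."""
    displacement = [qi - pi for pi, qi in zip(p, q)]  # q - p
    return dot(force, displacement)

force = [2, 7, -3]    # Newtons
p = [1, 2, 3]         # starting point, meters
q = [-9, -3, 4]       # end point, meters

print(work(force, p, q))  # Newton meters (Joules); negative here
```

A negative result means the component of the force along the motion points opposite to the direction of travel.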

4.12.3. Exercises

1. The wind blows from the South at 20 kilometers per hour and an airplane which flies at 600 kilometers per hour in still air is heading East. Find the velocity of the airplane and its location after two hours.
2. The wind blows from the West at 30 kilometers per hour and an airplane which flies at 400 kilometers per hour in still air is heading North East. Find the velocity of the airplane and its position after two hours.
3. The wind blows from the North at 10 kilometers per hour. An airplane which flies at 300 kilometers per hour in still air is supposed to go to the point whose coordinates are at $(100, 100)$. In what direction should the airplane fly?

314. Three forces act on an object. Two are 1 and 3 Newtons. Find the third14force if the object is not to move.

62

5. Three forces act on an object. Two are 3 and 1 Newtons. Find the third3 3

7force if the total force on the object is to be 1 .3236

6. A river flows West at the rate of $b$ miles per hour. A boat can move at the rate of 8 miles per hour. Find the smallest value of $b$ such that it is not possible for the boat to proceed directly across the river.
7. The wind blows from West to East at a speed of 50 miles per hour and an airplane which travels at 400 miles per hour in still air is heading North West. What is the velocity of the airplane relative to the ground? What is the component of this velocity in the direction North?
8. The wind blows from West to East at a speed of 60 miles per hour and an airplane travels at 100 miles per hour in still air. How many degrees West of North should the airplane head in order to travel exactly North?
9. The wind blows from West to East at a speed of 50 miles per hour and an airplane which travels at 400 miles per hour in still air is heading somewhat West of North so that, with the wind, it is flying due North. It uses 30.0 gallons of gas every hour. If it has to travel 600.0 miles due North, how much gas will it use in flying to its destination?
10. An airplane is flying due North at 500.0 miles per hour but it is not actually going due North because there is a wind which is pushing the airplane due East at 40.0 miles per hour. After one hour, the plane starts flying $30^{\circ}$ East of North. Assuming the plane starts at $(0, 0)$, where is it after 2 hours? Let North be the direction of the positive $y$ axis and let East be the direction of the positive $x$ axis.
11. City A is located at the origin $(0, 0)$ while city B is located at $(300, 500)$ where distances are in miles. An airplane flies at 250 miles per hour in still air. This airplane wants to fly from city A to city B but the wind is blowing in the direction of the positive $y$ axis at a speed of 50 miles per hour. Find a unit vector such that if the plane heads in this direction, it will end up at city B having flown the shortest possible distance. How long will it take to get there?
12. A certain river is one half mile wide with a current flowing at 3.0 miles per hour from East to West. A man takes a boat directly toward the opposite shore from the South bank of the river at a speed of 5.0 miles per hour. How far down the river does he find himself when he has crossed? How far does he end up traveling?
13. A certain river is one half mile wide with a current flowing at 2 miles per hour from East to West. A man can swim at 3 miles per hour in still water. In what direction should he swim in order to travel directly across the river? What would the answer to this problem be if the river flowed at 3 miles per hour and the man could swim only at the rate of 2 miles per hour?
14. Three forces are applied to a point which does not move. Two of the forces are $2\vec{i} + 2\vec{j} - 6\vec{k}$ Newtons and $8\vec{i} + 8\vec{j} + 3\vec{k}$ Newtons. Find the third force.

15. The total force acting on an object is to be $4\vec{i} + 2\vec{j} - 3\vec{k}$ Newtons. A force of $3\vec{i} - 1\vec{j} + 8\vec{k}$ Newtons is being applied. What other force should be applied to achieve the desired total force?
16. A bird flies from its nest 8 km in the direction $65^{\circ}$ north of east where it stops to rest on a tree. It then flies 1 km in the direction due southeast and lands atop a telephone pole. Place an $xy$ coordinate system so that the origin is the bird's nest, and the positive $x$ axis points east and the positive $y$ axis points north. Find the displacement vector from the nest to the telephone pole.
17. If $\vec{F}$ is a force and $\vec{D}$ is a vector, show $\mathrm{proj}_{\vec{D}}(\vec{F}) = \left( \|\vec{F}\| \cos \theta \right) \vec{u}$ where $\vec{u}$ is the unit vector in the direction of $\vec{D}$, where $\vec{u} = \vec{D}/\|\vec{D}\|$ and $\theta$ is the included angle between the two vectors, $\vec{F}$ and $\vec{D}$. $\|\vec{F}\| \cos \theta$ is sometimes called the component of the force, $\vec{F}$ in the direction, $\vec{D}$.
18. A boy drags a sled for 100 feet along the ground by pulling on a rope which is 20 degrees from the horizontal with a force of 40 pounds. How much work does this force do?
19. A girl drags a sled for 200 feet along the ground by pulling on a rope which is 30 degrees from the horizontal with a force of 20 pounds. How much work does this force do?
20. A large dog drags a sled for 300 feet along the ground by pulling on a rope which is 45 degrees from the horizontal with a force of 20 pounds. How much work does this force do?
21. How much work does it take to slide a crate 20 meters along a loading dock by pulling on it with a 200 Newton force at an angle of $30^{\circ}$ from the horizontal? Express your answer in Newton meters.
22. An object moves 10 meters in the direction of $\vec{j}$. There are two forces acting on this object, $\vec{F}_1 = \vec{i} + \vec{j} + 2\vec{k}$, and $\vec{F}_2 = 5\vec{i} + 2\vec{j} - 6\vec{k}$. Find the total work done on the object by the two forces. Hint: You can take the work done by the resultant of the two forces or you can add the work done by each force. Why?
23. An object moves 10 meters in the direction of $\vec{j} + \vec{i}$. There are two forces acting on this object, $\vec{F}_1 = \vec{i} + 2\vec{j} + 2\vec{k}$, and $\vec{F}_2 = 5\vec{i} + 2\vec{j} - 6\vec{k}$. Find the total work done on the object by the two forces. Hint: You can take the work done by the resultant of the two forces or you can add the work done by each force. Why?
24. An object moves 20 meters in the direction of $\vec{k} + \vec{j}$. There are two forces acting on this object, $\vec{F}_1 = \vec{i} + \vec{j} + 2\vec{k}$, and $\vec{F}_2 = \vec{i} + 2\vec{j} - 6\vec{k}$. Find the total work done on the object by the two forces. Hint: You can take the work done by the resultant of the two forces or you can add the work done by each force.

5. Linear Transformations

5.1 Linear Transformations

Outcomes
A. Understand the definition of a linear transformation, and that all linear transformations are determined by matrix multiplication.

Recall that when we multiply an $m \times n$ matrix by an $n \times 1$ column vector, the result is an $m \times 1$ column vector. In this section we will discuss how, through matrix multiplication, an $m \times n$ matrix transforms an $n \times 1$ column vector into an $m \times 1$ column vector.

Recall that the $n \times 1$ vector given by
\[
\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]
is said to belong to $\mathbb{R}^n$, which is the set of all $n \times 1$ vectors. In this section, we will discuss transformations of vectors in $\mathbb{R}^n$.

Consider the following example.

Example 5.1: A Function Which Transforms Vectors

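Consider the matrix $A = \begin{bmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \end{bmatrix}$. Show that by matrix multiplication $A$ transforms vectors in $\mathbb{R}^3$ into vectors in $\mathbb{R}^2$.

Solution. First, recall that vectors in $\mathbb{R}^3$ are vectors of size $3 \times 1$, while vectors in $\mathbb{R}^2$ are of size $2 \times 1$. If we multiply $A$, which is a $2 \times 3$ matrix, by a $3 \times 1$ vector, the result will be a $2 \times 1$ vector. This is what we mean when we say that $A$ transforms vectors.

Now, for $\begin{bmatrix} x \\ y \\ z \end{bmatrix}$ in $\mathbb{R}^3$, multiply on the left by the given matrix to obtain the new vector. This product looks like
\[
\begin{bmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} x + 2y \\ 2x + y \end{bmatrix}
\]
The resulting product is a $2 \times 1$ vector which is determined by the choice of $x$ and $y$. Here are some numerical examples.
\[
\begin{bmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 5 \\ 4 \end{bmatrix}
\]
Here, the vector $\begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ in $\mathbb{R}^3$ was transformed by the matrix into the vector $\begin{bmatrix} 5 \\ 4 \end{bmatrix}$ in $\mathbb{R}^2$.

Here is another example:
\[
\begin{bmatrix} 1 & 2 & 0 \\ 2 & 1 & 0 \end{bmatrix} \begin{bmatrix} 10 \\ 5 \\ -3 \end{bmatrix} = \begin{bmatrix} 20 \\ 25 \end{bmatrix}
\]
The idea is to define a function which takes vectors in $\mathbb{R}^3$ and delivers new vectors in $\mathbb{R}^2$. In this case, that function is multiplication by the matrix $A$.

Let $T$ denote such a function. The notation $T : \mathbb{R}^n \mapsto \mathbb{R}^m$ means that the function $T$ transforms vectors in $\mathbb{R}^n$ into vectors in $\mathbb{R}^m$. The notation $T(\vec{x})$ means the transformation $T$ applied to the vector $\vec{x}$. The above example demonstrated a transformation achieved by matrix multiplication. In this case, we often write
\[
T_A(\vec{x}) = A\vec{x}
\]
Therefore, $T_A$ is the transformation determined by the matrix $A$. In this case we say that $T$ is a matrix transformation.

Recall the properties of matrix multiplication. The pertinent property here is 2.6 which states that for $k$ and $p$ scalars,
\[
A(kB + pC) = kAB + pAC
\]
In particular, for $A$ an $m \times n$ matrix and $B$ and $C$, $n \times 1$ vectors in $\mathbb{R}^n$, this formula holds.

In other words, this means that matrix multiplication gives an example of a linear transformation, which we will now define.

Definition 5.2: Linear Transformation
Let $T : \mathbb{R}^n \mapsto \mathbb{R}^m$ be a function, where for each $\vec{x} \in \mathbb{R}^n$, $T(\vec{x}) \in \mathbb{R}^m$. Then $T$ is a linear transformation if whenever $k, p$ are scalars and $\vec{x}_1$ and $\vec{x}_2$ are vectors in $\mathbb{R}^n$ ($n \times 1$ vectors),
\[
T(k\vec{x}_1 + p\vec{x}_2) = kT(\vec{x}_1) + pT(\vec{x}_2)
\]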
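The defining identity of Definition 5.2 can be spot-checked for the matrix transformation of Example 5.1. This is a sketch only: the helper `matvec`, the scalars `k`, `p` and the second test vector are illustrative choices, and a numerical check on particular vectors illustrates linearity without proving it.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2, 0],
     [2, 1, 0]]

def T(x):
    """The matrix transformation T_A(x) = A x from R^3 to R^2."""
    return matvec(A, x)

x1, x2 = [1, 2, 3], [10, 5, -3]   # sample vectors chosen for illustration
k, p = 2, -3                      # arbitrary scalars

lhs = T([k * a + p * b for a, b in zip(x1, x2)])         # T(k x1 + p x2)
rhs = [k * a + p * b for a, b in zip(T(x1), T(x2))]      # k T(x1) + p T(x2)

print(T(x1), T(x2))
print(lhs, rhs)
```

Both sides agree for every choice of vectors and scalars because matrix multiplication distributes over linear combinations, which is property 2.6 quoted above.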

We began this section by discussing matrix transformations, where multiplication by a matrix transforms vectors. These matrix transformations are in fact linear transformations.

Theorem 5.3: Matrix Transformations are Linear Transformations
Let $T : \mathbb{R}^n \mapsto \mathbb{R}^m$ be a transformation defined by $T(\vec{x}) = A\vec{x}$. Then $T$ is a linear transformation.

It turns out that every linear transformation can be expressed as a matrix transformation, and thus linear transformations are exactly the same as matrix transformations. This will be the content of the next section.

5.1.1. Exercises

1. Show the map $T : \mathbb{R}^n \mapsto \mathbb{R}^m$ defined by $T(\vec{x}) = A\vec{x}$ where $A$ is an $m \times n$ matrix and $\vec{x}$ is an $n \times 1$ column vector is a linear transformation.
2. Show that the function $T_{\vec{v}}$ defined by $T_{\vec{v}}(\vec{w}) = \vec{w} - \mathrm{proj}_{\vec{v}}(\vec{w})$ is also a linear transformation.
3. Let $\vec{u}$ be a fixed vector. The function $T_{\vec{u}}$ defined by $T_{\vec{u}}\vec{v} = \vec{u} + \vec{v}$ has the effect of translating all vectors by adding $\vec{u} \neq \vec{0}$. Show this is not a linear transformation. Explain why it is not possible to represent $T_{\vec{u}}$ in $\mathbb{R}^3$ by multiplying by a $3 \times 3$ matrix.

5.2 The Matrix of a Linear Transformation

Outcomes
A. Find the matrix of a linear transformation and determine the action on a vector in $\mathbb{R}^n$.

In the above examples, the action of the linear transformations was to multiply by a matrix. It turns out that this is always the case for linear transformations. If $T$ is any linear transformation which maps $\mathbb{R}^n$ to $\mathbb{R}^m$, there is always an $m \times n$ matrix $A$ with the property that
\[
T(\vec{x}) = A\vec{x} \tag{5.1}
\]
for all $\vec{x} \in \mathbb{R}^n$.

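Theorem 5.4: Matrix of a Linear Transformation
Let $T : \mathbb{R}^n \mapsto \mathbb{R}^m$ be a linear transformation. Then we can find a matrix $A$ such that $T(\vec{x}) = A\vec{x}$. In this case, we say that $T$ is determined or induced by the matrix $A$.

Here is why. Suppose $T : \mathbb{R}^n \mapsto \mathbb{R}^m$ is a linear transformation and you want to find the matrix defined by this linear transformation as described in 5.1. Note that
\[
\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1 \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix} + x_2 \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix} + \cdots + x_n \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix} = \sum_{i=1}^{n} x_i \vec{e}_i
\]
where $\vec{e}_i$ is the $i$th column of $I_n$, that is the $n \times 1$ vector which has zeros in every slot but the $i$th and a 1 in this slot.

Then since $T$ is linear,
\[
\begin{aligned}
T(\vec{x}) &= \sum_{i=1}^{n} x_i T(\vec{e}_i) \\
&= \begin{bmatrix} | & & | \\ T(\vec{e}_1) & \cdots & T(\vec{e}_n) \\ | & & | \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} \\
&= A \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}
\end{aligned}
\]
Therefore, the desired matrix is obtained from constructing the $i$th column as $T(\vec{e}_i)$. We state this formally as the following theorem.

Theorem 5.5: Matrix of a Linear Transformation
Let $T : \mathbb{R}^n \mapsto \mathbb{R}^m$ be a linear transformation. Then the matrix $A$ satisfying $T(\vec{x}) = A\vec{x}$ is given by
\[
A = \begin{bmatrix} | & & | \\ T(\vec{e}_1) & \cdots & T(\vec{e}_n) \\ | & & | \end{bmatrix}
\]
where $\vec{e}_i$ is the $i$th column of $I_n$, and then $T(\vec{e}_i)$ is the $i$th column of $A$.

The following Corollary is an essential result.

Corollary 5.6: Matrix and Linear Transformation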

A transformation T is a linear transformation if and only if it is a matrix transformation.
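The construction in Theorem 5.5 can be sketched in a few lines: apply the transformation to each standard basis vector and use the results as columns. The names `standard_basis` and `matrix_of` are hypothetical, and the map `T` below is an assumed example of a linear map from $\mathbb{R}^3$ to $\mathbb{R}^2$ used only for illustration.

```python
def standard_basis(n):
    """The vectors e_1, ..., e_n, i.e. the columns of the identity I_n."""
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]

def matrix_of(T, n):
    """Matrix (as a list of rows) whose i-th column is T(e_i).

    This only represents T faithfully when T is linear.
    """
    columns = [T(e) for e in standard_basis(n)]
    return [list(row) for row in zip(*columns)]  # transpose columns into rows

def T(x):
    # Assumed linear map R^3 -> R^2, for illustration only.
    return [x[0] + 2 * x[1], 2 * x[0] + x[1]]

A = matrix_of(T, 3)
print(A)
```

For a function that is not linear the same procedure still returns a matrix, but multiplication by that matrix will not reproduce the function, which is one way to read the "only if" direction of Corollary 5.6.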


Consider the following example.

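Example 5.7: The Matrix of a Linear Transformation
Suppose $T$ is a linear transformation, $T : \mathbb{R}^3 \to \mathbb{R}^2$ where
\[
T \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad T \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 9 \\ -3 \end{bmatrix}, \quad T \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}
\]
Find the matrix $A$ of $T$ such that $T(\vec{x}) = A\vec{x}$ for all $\vec{x}$.

Solution. By Theorem 5.5 we construct $A$ as follows:
\[
A = \begin{bmatrix} | & & | \\ T(\vec{e}_1) & \cdots & T(\vec{e}_n) \\ | & & | \end{bmatrix}
\]
In this case, $A$ will be a $2 \times 3$ matrix, so we need to find $T(\vec{e}_1)$, $T(\vec{e}_2)$, and $T(\vec{e}_3)$. Luckily, we have been given these values so we can fill in $A$ as needed, using these vectors as the columns of $A$. Hence,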

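\[
A = \begin{bmatrix} 1 & 9 & 1 \\ 2 & -3 & 1 \end{bmatrix}
\]
In this example, we were given the resulting vectors of $T(\vec{e}_1)$, $T(\vec{e}_2)$, and $T(\vec{e}_3)$. Constructing the matrix $A$ was simple, as we could simply use these vectors as the columns of $A$.

The next example shows how to find $A$ when we are not given the $T(\vec{e}_i)$ so clearly.

Example 5.8: The Matrix of Linear Transformation: Inconveniently Defined
Suppose $T$ is known to be a linear transformation, $T : \mathbb{R}^2 \to \mathbb{R}^2$ and
\[
T \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad T \begin{bmatrix} 0 \\ -1 \end{bmatrix} = \begin{bmatrix} 3 \\ 2 \end{bmatrix}
\]
Find the matrix $A$ of the transformation $T$.

Solution. By Theorem 5.5 to find this matrix, we need to determine the action of $T$ on $\vec{e}_1$ and $\vec{e}_2$. In Example 5.7, we were given these resulting vectors. However, in this example, we have been given $T$ of two different vectors. How can we find out the action of $T$ on $\vec{e}_1$ and $\vec{e}_2$? In particular for $\vec{e}_1$, suppose there exist $x$ and $y$ such that
\[
\begin{bmatrix} 1 \\ 0 \end{bmatrix} = x \begin{bmatrix} 1 \\ 1 \end{bmatrix} + y \begin{bmatrix} 0 \\ -1 \end{bmatrix} \tag{5.2}
\]
Then, since $T$ is linear,
\[
T \begin{bmatrix} 1 \\ 0 \end{bmatrix} = x T \begin{bmatrix} 1 \\ 1 \end{bmatrix} + y T \begin{bmatrix} 0 \\ -1 \end{bmatrix}
\]
Substituting in values, this sum becomes
\[
T \begin{bmatrix} 1 \\ 0 \end{bmatrix} = x \begin{bmatrix} 1 \\ 2 \end{bmatrix} + y \begin{bmatrix} 3 \\ 2 \end{bmatrix} \tag{5.3}
\]
Therefore, if we know the values of $x$ and $y$ which satisfy 5.2, we can substitute these into equation 5.3. By doing so, we find $T(\vec{e}_1)$ which is the first column of the matrix $A$.

We proceed to find $x$ and $y$. We do so by solving 5.2, which can be done by solving the system
\[
\begin{aligned}
x &= 1 \\
x - y &= 0
\end{aligned}
\]
We see that $x = 1$ and $y = 1$ is the solution to this system. Substituting these values into equation 5.3, we have
\[
T \begin{bmatrix} 1 \\ 0 \end{bmatrix} = 1 \begin{bmatrix} 1 \\ 2 \end{bmatrix} + 1 \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix} + \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \end{bmatrix}
\]
Therefore $\begin{bmatrix} 4 \\ 4 \end{bmatrix}$ is the first column of $A$.

Computing the second column is done in the same way, and is left as an exercise.

The resulting matrix $A$ is given by
\[
A = \begin{bmatrix} 4 & -3 \\ 4 & -2 \end{bmatrix}
\]
This example illustrates a very long procedure for finding the matrix $A$. While this method is reliable and will always result in the correct matrix $A$, the following procedure provides an alternative method.

Recall that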

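Then the matrix of $T$ must be of the form
\[
\begin{bmatrix} \vec{b}_1 & \cdots & \vec{b}_n \end{bmatrix} \begin{bmatrix} \vec{a}_1 & \cdots & \vec{a}_n \end{bmatrix}^{-1}
\]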
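As a quick check of this formula, it can be applied to the data of Example 5.8, where $T$ sends $(1, 1)$ to $(1, 2)$ and $(0, -1)$ to $(3, 2)$. The sketch below is restricted to the $2 \times 2$ case, and the helper names `inverse_2x2` and `matmul` are illustrative, not from the text.

```python
from fractions import Fraction

def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as a list of rows."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

# Columns of A_in are the input vectors (1, 1) and (0, -1).
A_in = [[1, 0],
        [1, -1]]
# Columns of B_out are their images (1, 2) and (3, 2).
B_out = [[1, 3],
         [2, 2]]

matrix_of_T = matmul(B_out, inverse_2x2(A_in))
print(matrix_of_T)
```

The result can be compared with the matrix found column by column in Example 5.8. The formula requires the input vectors to be linearly independent, since otherwise the inverse does not exist.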

We will illustrate this procedure in the following example. You may also find it useful to work through Example 5.8 using this procedure.

Example 5.10: Matrix of a Linear Transformation Given Inconveniently
Suppose $T : \mathbb{R}^3 \to \mathbb{R}^3$ is a linear transformation and

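Recall the dot product discussed earlier. Consider the map $\vec{v} \mapsto \mathrm{proj}_{\vec{u}}(\vec{v})$. It turns out that this map is linear, a result which follows from the properties of the dot product. This is shown as follows.
\[
\begin{aligned}
\mathrm{proj}_{\vec{u}}(k\vec{v} + p\vec{w}) &= \left( \frac{(k\vec{v} + p\vec{w}) \cdot \vec{u}}{\vec{u} \cdot \vec{u}} \right) \vec{u} \\
&= k \left( \frac{\vec{v} \cdot \vec{u}}{\vec{u} \cdot \vec{u}} \right) \vec{u} + p \left( \frac{\vec{w} \cdot \vec{u}}{\vec{u} \cdot \vec{u}} \right) \vec{u} \\
&= k \, \mathrm{proj}_{\vec{u}}(\vec{v}) + p \, \mathrm{proj}_{\vec{u}}(\vec{w})
\end{aligned}
\]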

Consider the following example.

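Example 5.11: Matrix of a Projection Map
Let $\vec{u} = \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}$ and let $T$ be the projection map $T : \mathbb{R}^3 \mapsto \mathbb{R}^3$ defined by $T(\vec{v}) = \mathrm{proj}_{\vec{u}}(\vec{v})$ for any $\vec{v} \in \mathbb{R}^3$.
1. Does this transformation come from multiplication by a matrix?
2. If so, what is the matrix?

Solution.
1. First, we have just seen that $T(\vec{v}) = \mathrm{proj}_{\vec{u}}(\vec{v})$ is linear. Therefore by Theorem 5.4, we can find a matrix $A$ such that $T(\vec{x}) = A\vec{x}$.
2. The columns of the matrix for $T$ are defined above as $T(\vec{e}_i)$. It follows that $T(\vec{e}_i) = \mathrm{proj}_{\vec{u}}(\vec{e}_i)$ gives the $i$th column of the desired matrix. Therefore, we need to find
\[
\mathrm{proj}_{\vec{u}}(\vec{e}_i) = \left( \frac{\vec{e}_i \cdot \vec{u}}{\vec{u} \cdot \vec{u}} \right) \vec{u}
\]
For the given vector $\vec{u}$, this implies the columns of the desired matrix are
\[
\frac{1}{14} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad \frac{2}{14} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}, \quad \frac{3}{14} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix}
\]
which you can verify using Definition 4.37. Hence the matrix of $T$ is
\[
\frac{1}{14} \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 6 \\ 3 & 6 & 9 \end{bmatrix}
\]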
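The columns computed above follow a simple pattern: the entry in row $i$ and column $j$ is $u_i u_j / (\vec{u} \cdot \vec{u})$. A minimal sketch, assuming $\vec{u} = (1, 2, 3)$ as in Example 5.11 and using the hypothetical helper names `projection_matrix` and `matvec`:

```python
from fractions import Fraction

def projection_matrix(u):
    """Matrix of v -> proj_u(v); column j is proj_u(e_j) = (u_j / u.u) u."""
    uu = sum(c * c for c in u)
    if uu == 0:
        raise ValueError("cannot project onto the zero vector")
    return [[Fraction(ui * uj, uu) for uj in u] for ui in u]

def matvec(A, x):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

u = [1, 2, 3]
P = projection_matrix(u)

for row in P:
    print(row)
print(matvec(P, u))           # projecting u onto itself returns u
print(matvec(P, [3, 0, -1]))  # a vector orthogonal to u projects to zero
```

The two checks at the end reflect what a projection should do: vectors along $\vec{u}$ are left unchanged and vectors orthogonal to $\vec{u}$ are sent to the zero vector.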

5.2.1. Exercises

1. Consider the following functions which map $\mathbb{R}^n$ to $\mathbb{R}^n$.
(a) $T$ multiplies the $j$th component of $\vec{x}$ by a nonzero number $b$.
(b) $T$ replaces the $i$th component of $\vec{x}$ with $b$ times the $j$th component added to the $i$th component.
(c) $T$ switches the $i$th and $j$th components.
Show these functions are linear transformations and describe their matrices $A$ such that $T(\vec{x}) = A\vec{x}$.
2. You are given a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ and you know that
\[
T(A_i) = B_i
\]
where $\begin{bmatrix} A_1 & \cdots & A_n \end{bmatrix}^{-1}$ exists. Show that the matrix of $T$ is of the form
\[
\begin{bmatrix} B_1 & \cdots & B_n \end{bmatrix} \begin{bmatrix} A_1 & \cdots & A_n \end{bmatrix}^{-1}
\]

3. Suppose T is a linear transformation such

1T 2 6

1T 1 5

0T 1 2

that

4. Suppose T is a linear transformation such

1T 1 8

1T 0 6

0T 1 3

that

1

5= 1 3

1

1 =5

5= 3 2

Find the matrix of T . That is find A such that T (~x) = A~x.

1= 3 1

2= 4 1

6= 1 1

Find the matrix of T . That is find A such that T (~x) = A~x.


5. Suppose T is a linear transformation such

1T 3 7

1T 2 6

0T 1 2

that

6. Suppose T is a linear transformation such

1T 1 7

1T 0 6

0T 1 2

that

7. Suppose T is a linear transformation such

12 T18

1T 1 15

0T 1 4

that

3= 1 3

1= 3 3

5= 3 3

Find the matrix of T . That is find A such that T (~x) = A~x.

3= 3 3

1

2 =3

1= 3 1

Find the matrix of T . That is find A such that T (~x) = A~x.

5= 2 5

3= 3 5

2= 5 2

Find the matrix of T . That is find A such that T (~x) = A~x.

8. Consider the following functions $T : \mathbb{R}^3 \to \mathbb{R}^2$. Show that each is a linear transformation and determine for each the matrix $A$ such that $T(\vec{x}) = A\vec{x}$.

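Solution. Using the third property in Theorem 5.12, we can find $T\begin{bmatrix} -7 \\ 3 \\ -9 \end{bmatrix}$ by writing $\begin{bmatrix} -7 \\ 3 \\ -9 \end{bmatrix}$ as a linear combination of $\begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix}$ and $\begin{bmatrix} 4 \\ 0 \\ 5 \end{bmatrix}$.

Therefore we want to find $a, b \in \mathbb{R}$ such that
\[ \begin{bmatrix} -7 \\ 3 \\ -9 \end{bmatrix} = a \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix} + b \begin{bmatrix} 4 \\ 0 \\ 5 \end{bmatrix} \]
The necessary augmented matrix and resulting reduced row-echelon form are given by:
\[ \left[\begin{array}{rr|r} 1 & 4 & -7 \\ 3 & 0 & 3 \\ 1 & 5 & -9 \end{array}\right] \rightarrow \cdots \rightarrow \left[\begin{array}{rr|r} 1 & 0 & 1 \\ 0 & 1 & -2 \\ 0 & 0 & 0 \end{array}\right] \]
Hence $a = 1$, $b = -2$ and
\[ \begin{bmatrix} -7 \\ 3 \\ -9 \end{bmatrix} = 1 \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix} + (-2) \begin{bmatrix} 4 \\ 0 \\ 5 \end{bmatrix} \]
Now, using the third property above, we have
\[ T\begin{bmatrix} -7 \\ 3 \\ -9 \end{bmatrix} = T\left( 1 \begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix} + (-2) \begin{bmatrix} 4 \\ 0 \\ 5 \end{bmatrix} \right) = T\begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix} - 2\,T\begin{bmatrix} 4 \\ 0 \\ 5 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \\ 0 \\ -2 \end{bmatrix} - 2 \begin{bmatrix} 4 \\ 5 \\ -1 \\ 5 \end{bmatrix} = \begin{bmatrix} -4 \\ -6 \\ 2 \\ -12 \end{bmatrix} \]
Therefore, $T\begin{bmatrix} -7 \\ 3 \\ -9 \end{bmatrix} = \begin{bmatrix} -4 \\ -6 \\ 2 \\ -12 \end{bmatrix}$.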
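The computation in this solution can be retraced numerically. The sketch below is only an illustration: the helper `combine` is a hypothetical name, and the values of $T$ on the two given vectors are taken as read from the worked solution, so they are an assumption of this sketch.

```python
def combine(a, x, b, y):
    """Return the linear combination a*x + b*y of two equal-length lists."""
    return [a * xi + b * yi for xi, yi in zip(x, y)]

# The two vectors on which T is known, and the assumed values of T on them.
v1, v2 = [1, 3, 1], [4, 0, 5]
T_v1, T_v2 = [4, 4, 0, -2], [4, 5, -1, 5]

# Coefficients read off from the reduced row-echelon form.
a, b = 1, -2

# First confirm that a*v1 + b*v2 really is the target vector.
target = combine(a, v1, b, v2)
print(target)

# Linearity then gives T(target) = a*T(v1) + b*T(v2).
T_target = combine(a, T_v1, b, T_v2)
print(T_target)
```

Only the linearity of $T$ is used here; the matrix of $T$ itself is never needed.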

Suppose two linear transformations act on the same vector $\vec{x}$, first the transformation $T$ and then a second transformation given by $S$. We can find the composite transformation that results from applying both transformations.

Definition 5.14: Composition of Linear Transformations

Let $T : \mathbb{R}^k \mapsto \mathbb{R}^n$ and $S : \mathbb{R}^n \mapsto \mathbb{R}^m$ be linear transformations. Then the composite of $S$ and $T$ is
\[ S \circ T : \mathbb{R}^k \mapsto \mathbb{R}^m \]
The action of $S \circ T$ is given by
\[ (S \circ T)(\vec{x}) = S(T(\vec{x})) \text{ for all } \vec{x} \in \mathbb{R}^k \]

Notice that the resulting vector will be in $\mathbb{R}^m$. Be careful to observe the order of transformations. We write $S \circ T$ but apply the transformation $T$ first, followed by $S$.

Theorem 5.15: Composition of Transformations

Let $T : \mathbb{R}^k \mapsto \mathbb{R}^n$ and $S : \mathbb{R}^n \mapsto \mathbb{R}^m$ be linear transformations such that $T$ is induced by the matrix $A$ and $S$ is induced by the matrix $B$. Then $S \circ T$ is a linear transformation which is induced by the matrix $BA$.

Consider the following example.

Example 5.16: Composition of Transformations

Let $T$ be a linear transformation induced by the matrix

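To check, first determine $T(\vec{x})$:
\[ \begin{bmatrix} 1 & 2 \\ 2 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 4 \end{bmatrix} = \begin{bmatrix} 9 \\ 2 \end{bmatrix} \]
Then, compute $S(T(\vec{x}))$ as follows:
\[ \begin{bmatrix} 2 & 3 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 9 \\ 2 \end{bmatrix} = \begin{bmatrix} 24 \\ 2 \end{bmatrix} \]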
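The check in this example can be repeated in a few lines of Python. This is a minimal sketch under stated assumptions: `mat_vec` and `mat_mul` are hypothetical helpers, and the matrices $A$ (for $T$), $B$ (for $S$) and the vector $\vec{x}$ are the ones appearing in the check of the example. It compares applying $T$ and then $S$ with a single multiplication by $BA$.

```python
def mat_vec(M, x):
    """Multiply the matrix M (a list of rows) by the vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def mat_mul(B, A):
    """Matrix product BA, with both matrices given as lists of rows."""
    columns_of_A = list(zip(*A))
    return [[sum(b * a for b, a in zip(row, col)) for col in columns_of_A]
            for row in B]

A = [[1, 2], [2, 0]]   # assumed matrix inducing T
B = [[2, 3], [0, 1]]   # assumed matrix inducing S
x = [1, 4]

two_steps = mat_vec(B, mat_vec(A, x))   # S(T(x)): apply T first, then S
BA = mat_mul(B, A)                      # matrix of the composite S o T
one_step = mat_vec(BA, x)

print(two_steps, one_step)
```

Note that the product is taken as $BA$, not $AB$, matching the order in Theorem 5.15.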

Consider a composite transformation $S \circ T$, and suppose that this transformation acted such that $(S \circ T)(\vec{x}) = \vec{x}$. That is, the transformation $S$ took the vector $T(\vec{x})$ and returned it to $\vec{x}$. In this case, $S$ and $T$ are inverses of each other. Consider the following definition.

Definition 5.17: Inverse of a Transformation

Let $T : \mathbb{R}^n \mapsto \mathbb{R}^n$ and $S : \mathbb{R}^n \mapsto \mathbb{R}^n$ be linear transformations. Suppose that for each $\vec{x} \in \mathbb{R}^n$,
\[ (S \circ T)(\vec{x}) = \vec{x} \]
and
\[ (T \circ S)(\vec{x}) = \vec{x} \]
Then, $S$ is called an inverse of $T$ and $T$ is called an inverse of $S$. Geometrically, they reverse the action of each other.

The following theorem is crucial, as it claims that the above inverse transformations are unique.

Theorem 5.18: Inverse of a Transformation

Let $T : \mathbb{R}^n \mapsto \mathbb{R}^n$ be a linear transformation induced by the matrix $A$. Then $T$ has an inverse transformation if and only if the matrix $A$ is invertible. In this case, the inverse transformation is unique and denoted $T^{-1} : \mathbb{R}^n \mapsto \mathbb{R}^n$. $T^{-1}$ is induced by the matrix $A^{-1}$.

Consider the following example.

Example 5.19: Inverse of a Transformation

Let $T : \mathbb{R}^2 \mapsto \mathbb{R}^2$ be a linear transformation induced by the matrix
\[ A = \begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix} \]
Show that $T^{-1}$ exists and find the matrix $B$ which it is induced by.

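Solution. Since the matrix $A$ is invertible (its determinant is $2 \cdot 4 - 3 \cdot 3 = -1 \neq 0$), it follows that the transformation $T$ is invertible. Therefore, $T^{-1}$ exists.

You can verify that $A^{-1}$ is given by:
\[ A^{-1} = \begin{bmatrix} -4 & 3 \\ 3 & -2 \end{bmatrix} \]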
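One way to carry out the suggested verification is sketched below in Python. The helper `inverse_2x2` is a hypothetical name that uses the standard adjugate formula for a $2 \times 2$ matrix; it is one possible check, not the method prescribed by the text. The sketch confirms that the candidate inverse undoes $A$ in both orders, as Definition 5.17 requires.

```python
from fractions import Fraction

def inverse_2x2(M):
    """Inverse of a 2x2 matrix via the adjugate formula (hypothetical helper)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def mat_mul(X, Y):
    """Matrix product XY, with both matrices given as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]

A = [[2, 3], [3, 4]]
B = inverse_2x2(A)
identity = [[1, 0], [0, 1]]

print(B)
print(mat_mul(A, B) == identity, mat_mul(B, A) == identity)
```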

7. Let T be a linear transformation and suppose T

4. Find the matrix of T 1 .3

12

09=, T18

5.4 Special Linear Transformations in R2

Outcomes

A. Find the matrix of rotations and reflections in $\mathbb{R}^2$ and determine the action of each on a vector in $\mathbb{R}^2$.

In this section, we will examine some special examples of linear transformations in $\mathbb{R}^2$ including rotations and reflections. We will use the geometric descriptions of vector addition and scalar multiplication discussed earlier to show that a rotation of vectors through an angle and reflection of a vector across a line are examples of linear transformations.

First, consider the rotation of a vector through an angle. Such a rotation would achieve something like the following if applied to each vector from $(0, 0)$ to the point in the picture corresponding to the person shown standing upright.

More generally, denote a transformation given by a rotation by T . Why is such a transformation linear? Consider the following picture which illustrates a rotation. Let ~u, ~v denotevectors.

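[Figure: the vectors $\vec{u}$, $\vec{v}$, and $\vec{u} + \vec{v}$ together with their rotated images $T(\vec{u})$, $T(\vec{v})$, and $T(\vec{u}) + T(\vec{v})$.]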

Let's consider how to obtain $T(\vec{u} + \vec{v})$. Simply, you add $T(\vec{u})$ and $T(\vec{v})$. Here is why. If you add $T(\vec{u})$ to $T(\vec{v})$ you get the diagonal of the parallelogram determined by $T(\vec{u})$ and $T(\vec{v})$, as this action is our usual vector addition. Now, suppose we first add $\vec{u}$ and $\vec{v}$, and then apply the transformation $T$ to $\vec{u} + \vec{v}$. Hence, we find $T(\vec{u} + \vec{v})$. As shown in the diagram, this will result in the same vector. In other words, $T(\vec{u} + \vec{v}) = T(\vec{u}) + T(\vec{v})$.

This is because the rotation preserves all angles between the vectors as well as their lengths. In particular, it preserves the shape of this parallelogram. Thus both $T(\vec{u}) + T(\vec{v})$ and $T(\vec{u} + \vec{v})$ give the same vector. It follows that $T$ distributes across addition of the vectors of $\mathbb{R}^2$.

Similarly, if $k$ is a scalar, it follows that $T(k\vec{u}) = kT(\vec{u})$. Thus rotations are an example of a linear transformation by Definition 5.2.

The following theorem gives the matrix of a linear transformation which rotates all vectors through an angle of $\theta$.

Theorem 5.20: Rotation

Let $R_\theta : \mathbb{R}^2 \to \mathbb{R}^2$ be a linear transformation given by rotating vectors through an angle of $\theta$. Then the matrix $A$ of $R_\theta$ is given by
\[ \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]

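From Theorem 5.5, we need to find $R_\theta(\vec{e}_1)$ and $R_\theta(\vec{e}_2)$, and use these as the columns of the matrix $A$ of $R_\theta$. We can use $\cos\theta$ and $\sin\theta$ to find the coordinates of $R_\theta(\vec{e}_1)$ as shown in the above picture. The coordinates of $R_\theta(\vec{e}_2)$ also follow from trigonometry. Thus
\[ R_\theta(\vec{e}_1) = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}, \quad R_\theta(\vec{e}_2) = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix} \]
Therefore, from Theorem 5.5,
\[ A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]

We can also prove this algebraically without the use of the above picture. The definition of $(\cos(\theta), \sin(\theta))$ is as the coordinates of the point of $R_\theta(\vec{e}_1)$. Now the point of the vector $\vec{e}_2$ is exactly $\pi/2$ further along the unit circle from the point of $\vec{e}_1$, and therefore after rotation through an angle of $\theta$ the coordinates $x$ and $y$ of the point of $R_\theta(\vec{e}_2)$ are given by
\[ (x, y) = (\cos(\theta + \pi/2), \sin(\theta + \pi/2)) = (-\sin\theta, \cos\theta) \]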
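The rotation matrix of Theorem 5.20 is easy to experiment with numerically. The following Python sketch is an illustration only: `rotation_matrix` and `apply` are hypothetical helper names, angles are in radians, and results are floating-point approximations. It rotates the sample vector $(1, -2)$ through $\pi/2$ and checks that the columns of the matrix are the images of $\vec{e}_1$ and $\vec{e}_2$.

```python
import math

def rotation_matrix(theta):
    """Matrix of the rotation through theta radians, as a list of rows."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s,  c]]

def apply(M, x):
    """Multiply the matrix M by the vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

R = rotation_matrix(math.pi / 2)

# Columns of R are the images of the standard basis vectors.
image_e1 = apply(R, [1, 0])   # approximately (0, 1)
image_e2 = apply(R, [0, 1])   # approximately (-1, 0)

# Rotate a sample vector through a quarter turn.
y = apply(R, [1, -2])
print([round(t, 12) for t in y])
```

Because `math.cos(math.pi / 2)` is a tiny nonzero float rather than exactly zero, comparisons should use a tolerance instead of exact equality.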

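Consider the following example: find the matrix of $R_{\pi/2}$, the rotation through an angle of $\pi/2$, and then find $R_{\pi/2}(\vec{x})$ where $\vec{x} = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$.

Solution. By Theorem 5.20, the matrix of $R_{\pi/2}$ is given by
\[ \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix} = \begin{bmatrix} \cos(\pi/2) & -\sin(\pi/2) \\ \sin(\pi/2) & \cos(\pi/2) \end{bmatrix} = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \]
To find $R_{\pi/2}(\vec{x})$, we multiply the matrix of $R_{\pi/2}$ by $\vec{x}$ as follows
\[ \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ -2 \end{bmatrix} = \begin{bmatrix} 2 \\ 1 \end{bmatrix} \]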

We now look at an example of a linear transformation involving two angles.

Example 5.22: The Rotation Matrix of the Sum of Two Angles

Find the matrix of the linear transformation which is obtained by first rotating all vectors through an angle of $\phi$ and then through an angle $\theta$. Hence the linear transformation rotates all vectors through an angle of $\theta + \phi$.

Solution. Let $R_{\theta+\phi}$ denote the linear transformation which rotates every vector through an angle of $\theta + \phi$. Then to obtain $R_{\theta+\phi}$, we first apply $R_\phi$ and then $R_\theta$ where $R_\phi$ is the linear transformation which rotates through an angle of $\phi$ and $R_\theta$ is the linear transformation which rotates through an angle of $\theta$. Denoting the corresponding matrices by $A_{\theta+\phi}$, $A_\phi$, and $A_\theta$, it follows that for every $\vec{u}$
\[ R_{\theta+\phi}(\vec{u}) = A_{\theta+\phi}\vec{u} = A_\theta A_\phi \vec{u} = R_\theta R_\phi(\vec{u}) \]
Notice the order of the matrices here!

Consequently, you must have
\[ A_{\theta+\phi} = \begin{bmatrix} \cos(\theta+\phi) & -\sin(\theta+\phi) \\ \sin(\theta+\phi) & \cos(\theta+\phi) \end{bmatrix} = \begin{bmatrix} \cos\theta\cos\phi - \sin\theta\sin\phi & -\cos\theta\sin\phi - \sin\theta\cos\phi \\ \sin\theta\cos\phi + \cos\theta\sin\phi & \cos\theta\cos\phi - \sin\theta\sin\phi \end{bmatrix} = A_\theta A_\phi \]
Don't these look familiar? They are the usual trigonometric identities for the sum of two angles derived here using linear algebra concepts.

Here we have focused on rotations in two dimensions. However, you can consider rotations and other geometric concepts in any number of dimensions. This is one of the major advantages of linear algebra. You can break down a difficult geometrical procedure into small steps, each corresponding to multiplication by an appropriate matrix. Then by multiplying the matrices, you can obtain a single matrix which can give you numerical information on the results of applying the given sequence of simple procedures.

Linear transformations which reflect vectors across a line are a second important type of transformations in $\mathbb{R}^2$. Consider the following theorem.

Theorem 5.23: Reflection

Let $Q_m : \mathbb{R}^2 \to \mathbb{R}^2$ be a linear transformation given by reflecting vectors over the line $y = mx$. Then the matrix of $Q_m$ is given by
\[ \frac{1}{1 + m^2} \begin{bmatrix} 1 - m^2 & 2m \\ 2m & m^2 - 1 \end{bmatrix} \]
Consider the following example.

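Example 5.24: Reflection in $\mathbb{R}^2$

Let $Q_2 : \mathbb{R}^2 \to \mathbb{R}^2$ denote reflection over the line $y = 2x$. Then $Q_2$ is a linear transformation. Find the matrix of $Q_2$. Then, find $Q_2(\vec{x})$ where $\vec{x} = \begin{bmatrix} 1 \\ -2 \end{bmatrix}$.

Solution. By Theorem 5.23, the matrix of $Q_2$ is given by
\[ \frac{1}{1 + m^2} \begin{bmatrix} 1 - m^2 & 2m \\ 2m & m^2 - 1 \end{bmatrix} = \frac{1}{1 + (2)^2} \begin{bmatrix} 1 - (2)^2 & 2(2) \\ 2(2) & (2)^2 - 1 \end{bmatrix} = \frac{1}{5} \begin{bmatrix} -3 & 4 \\ 4 & 3 \end{bmatrix} \]
To find $Q_2(\vec{x})$ we multiply $\vec{x}$ by the matrix of $Q_2$ as follows:
\[ \frac{1}{5} \begin{bmatrix} -3 & 4 \\ 4 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ -2 \end{bmatrix} = \begin{bmatrix} -\frac{11}{5} \\ -\frac{2}{5} \end{bmatrix} \]

Consider the following example which incorporates a reflection as well as a rotation of vectors.

Example 5.25: Rotation Followed by a Reflection

Find the matrix of the linear transformation which is obtained by first rotating all vectors through an angle of $\pi/6$ and then reflecting through the $x$ axis.

Solution. By Theorem 5.20, the matrix of the transformation which involves rotating through an angle of $\pi/6$ is
\[ \begin{bmatrix} \cos(\pi/6) & -\sin(\pi/6) \\ \sin(\pi/6) & \cos(\pi/6) \end{bmatrix} = \begin{bmatrix} \frac{1}{2}\sqrt{3} & -\frac{1}{2} \\ \frac{1}{2} & \frac{1}{2}\sqrt{3} \end{bmatrix} \]
Reflecting across the $x$ axis is the same action as reflecting vectors over the line $y = mx$ with $m = 0$. By Theorem 5.23, the matrix for the transformation which reflects all vectors through the $x$ axis is
\[ \frac{1}{1 + m^2} \begin{bmatrix} 1 - m^2 & 2m \\ 2m & m^2 - 1 \end{bmatrix} = \frac{1}{1 + (0)^2} \begin{bmatrix} 1 - (0)^2 & 2(0) \\ 2(0) & (0)^2 - 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \]
Therefore, the matrix of the linear transformation which first rotates through $\pi/6$ and then reflects through the $x$ axis is given by
\[ \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} \frac{1}{2}\sqrt{3} & -\frac{1}{2} \\ \frac{1}{2} & \frac{1}{2}\sqrt{3} \end{bmatrix} = \begin{bmatrix} \frac{1}{2}\sqrt{3} & -\frac{1}{2} \\ -\frac{1}{2} & -\frac{1}{2}\sqrt{3} \end{bmatrix} \]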
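The reflection formula of Theorem 5.23 can be checked with exact arithmetic. In the sketch below, `reflection_matrix`, `apply`, and `mat_mul` are hypothetical helper names introduced for illustration. It builds $Q_2$, reflects the sample vector $(1, -2)$, confirms that reflecting twice returns every vector to where it started, and recovers the $x$ axis reflection from $m = 0$.

```python
from fractions import Fraction

def reflection_matrix(m):
    """Matrix of the reflection over the line y = m x, using exact fractions."""
    m = Fraction(m)
    d = 1 + m * m
    return [[(1 - m * m) / d, 2 * m / d],
            [2 * m / d, (m * m - 1) / d]]

def apply(M, x):
    """Multiply the matrix M by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

def mat_mul(X, Y):
    """Matrix product XY, with both matrices given as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]

Q2 = reflection_matrix(2)
image = apply(Q2, [1, -2])
print([str(t) for t in image])

# A reflection is its own inverse, so Q2 * Q2 should be the identity.
print(mat_mul(Q2, Q2) == [[1, 0], [0, 1]])

# The x axis is the line y = 0 x.
x_axis = reflection_matrix(0)
print(x_axis == [[1, 0], [0, -1]])
```

A vector lying on the line, such as $(1, 2)$ for $m = 2$, is left unchanged by the reflection, which gives another quick sanity check.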

5.4.1. Exercises

1. Find the matrix for the linear transformation which rotates every vector in $\mathbb{R}^2$ through an angle of $\pi/3$.
2. Find the matrix for the linear transformation which rotates every vector in $\mathbb{R}^2$ through an angle of $\pi/4$.
3. Find the matrix for the linear transformation which rotates every vector in $\mathbb{R}^2$ through an angle of $-\pi/3$.
4. Find the matrix for the linear transformation which rotates every vector in $\mathbb{R}^2$ through an angle of $2\pi/3$.
5. Find the matrix for the linear transformation which rotates every vector in $\mathbb{R}^2$ through an angle of $\pi/12$. Hint: Note that $\pi/12 = \pi/3 - \pi/4$.
6. Find the matrix for the linear transformation which rotates every vector in $\mathbb{R}^2$ through an angle of $2\pi/3$ and then reflects across the $x$ axis.
7. Find the matrix for the linear transformation which rotates every vector in $\mathbb{R}^2$ through an angle of $\pi/3$ and then reflects across the $x$ axis.
8. Find the matrix for the linear transformation which rotates every vector in $\mathbb{R}^2$ through an angle of $\pi/4$ and then reflects across the $x$ axis.
9. Find the matrix for the linear transformation which rotates every vector in $\mathbb{R}^2$ through an angle of $\pi/6$ and then reflects across the $x$ axis followed by a reflection across the $y$ axis.
10. Find the matrix for the linear transformation which reflects every vector in $\mathbb{R}^2$ across the $x$ axis and then rotates every vector through an angle of $\pi/4$.
11. Find the matrix for the linear transformation which reflects every vector in $\mathbb{R}^2$ across the $y$ axis and then rotates every vector through an angle of $\pi/4$.
12. Find the matrix for the linear transformation which reflects every vector in $\mathbb{R}^2$ across the $x$ axis and then rotates every vector through an angle of $\pi/6$.
13. Find the matrix for the linear transformation which reflects every vector in $\mathbb{R}^2$ across the $y$ axis and then rotates every vector through an angle of $\pi/6$.
14. Find the matrix for the linear transformation which rotates every vector in $\mathbb{R}^2$ through an angle of $5\pi/12$. Hint: Note that $5\pi/12 = 2\pi/3 - \pi/4$.
15. Find the matrix of the linear transformation which rotates every vector in $\mathbb{R}^3$ counterclockwise about the $z$ axis when viewed from the positive $z$ axis through an angle of $30^\circ$ and then reflects through the $xy$ plane.
16. Let $\vec{u} = \begin{bmatrix} a \\ b \end{bmatrix}$ be a unit vector in $\mathbb{R}^2$. Find the matrix which reflects all vectors across this vector, as shown in the following picture.

[Figure: the vector $\vec{u}$.]

Hint: Notice that $\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}$ for some $\theta$. First rotate through $-\theta$. Next reflect through the $x$ axis. Finally rotate through $\theta$.

5.5 Linear Transformations which are One To One or Onto

Outcomes

A. Determine if a linear transformation is onto or one to one.

Let $T : \mathbb{R}^n \mapsto \mathbb{R}^m$ be a linear transformation. We define the range or image of $T$ as the set of vectors of $\mathbb{R}^m$ which are of the form $T(\vec{x})$ (equivalently, $A\vec{x}$) for some $\vec{x} \in \mathbb{R}^n$. It is common to write $T\mathbb{R}^n$, $T(\mathbb{R}^n)$, or $\mathrm{Im}(T)$ to denote these vectors.

Lemma 5.26: Range of a Matrix Transformation

Let $A$ be an $m \times n$ matrix where $A_1, \cdots, A_n$ denote the columns of $A$. Then, for a vector $\vec{x} = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$ in $\mathbb{R}^n$,
\[ A\vec{x} = \sum_{k=1}^{n} x_k A_k \]
Therefore, $A(\mathbb{R}^n)$ is the collection of all linear combinations of the columns of $A$.

Proof. This follows from the definition of matrix multiplication in Definition 2.13.

This section is devoted to studying two important characterizations of linear transformations, called one to one and onto. We define them now.
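Lemma 5.26 can be seen in action with a small computation. In the Python sketch below, the matrix and vector are made-up illustrative values (they do not come from the text), and `mat_vec` and `column_combination` are hypothetical helper names. The first function computes $A\vec{x}$ row by row; the second forms $\sum_k x_k A_k$ from the columns; the lemma says the two always agree.

```python
def mat_vec(A, x):
    """Compute A x row by row."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def column_combination(A, x):
    """Compute x_1 A_1 + ... + x_n A_n, where A_k is the k-th column of A."""
    m = len(A)
    result = [0] * m
    for k, x_k in enumerate(x):
        column_k = [A[i][k] for i in range(m)]
        result = [r + x_k * c for r, c in zip(result, column_k)]
    return result

# Illustrative 3 x 2 example, chosen only for this sketch.
A = [[1, 2],
     [3, 4],
     [5, 6]]
x = [2, -1]

by_rows = mat_vec(A, x)
by_columns = column_combination(A, x)
print(by_rows, by_columns)
```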

Definition 5.27: One to One

Suppose $\vec{x}_1$ and $\vec{x}_2$ are vectors in $\mathbb{R}^n$. A linear transformation $T : \mathbb{R}^n \mapsto \mathbb{R}^m$ is called one to one (often written as $1 - 1$) if whenever $\vec{x}_1 \neq \vec{x}_2$ it follows that:
\[ T(\vec{x}_1) \neq T(\vec{x}_2) \]
Equivalently, if $T(\vec{x}_1) = T(\vec{x}_2)$, then $\vec{x}_1 = \vec{x}_2$. Thus, $T$ is $1 - 1$ if it never takes two different vectors to the same vector.

The second important characterization is called onto.

Definition 5.28: Onto

Let $T : \mathbb{R}^n \mapsto \mathbb{R}^m$ be a linear transformation. Then $T$ is called onto if whenever $\vec{x}_2 \in \mathbb{R}^m$ there exists $\vec{x}_1 \in \mathbb{R}^n$ such that $T(\vec{x}_1) = \vec{x}_2$.

We often call a linear transformation which is one-to-one an injection. Similarly, a linear transformation which is onto is often called a surjection.

The following proposition is an important result.

Proposition 5.29: One to One

Let $T_A : \mathbb{R}^n \mapsto \mathbb{R}^m$ be a linear transformation induced by the $m \times n$ matrix $A$. Then $T_A$ is one to one if and only if $T_A(\vec{x}) = \vec{0}$ implies $\vec{x} = \vec{0}$.

Proof. First note that we can rewrite the statement "$T_A(\vec{x}) = \vec{0}$ implies $\vec{x} = \vec{0}$" in terms of the matrix $A$ as "$A\vec{x} = \vec{0}$ implies $\vec{x} = \vec{0}$". Therefore we can prove this theorem using $A$.

We need to prove two things here. First, we will prove that if $A$ is one to one, then $A\vec{x} = \vec{0}$ implies that $\vec{x} = \vec{0}$. Second, we will show that if $A\vec{x} = \vec{0}$ implies that $\vec{x} = \vec{0}$, then it follows that $A$ is one to one.

First note that $A\vec{0} = A(\vec{0} + \vec{0}) = A\vec{0} + A\vec{0}$ and so $A\vec{0} = \vec{0}$.

Now suppose $A$ is one to one and $A\vec{x} = \vec{0}$. We need to show that this implies $\vec{x} = \vec{0}$. Since $A$ is one to one, by Definition 5.27 $A$ can only map one vector to the zero vector $\vec{0}$. Now $A\vec{x} = \vec{0}$ and $A\vec{0} = \vec{0}$, so it follows that $\vec{x} = \vec{0}$. Thus if $A$ is one to one and $A\vec{x} = \vec{0}$, then $\vec{x} = \vec{0}$.

Next assume that $A\vec{x} = \vec{0}$ implies $\vec{x} = \vec{0}$. We need to show that $A$ is one to one. Suppose $A\vec{x} = A\vec{y}$. Then $A\vec{x} - A\vec{y} = \vec{0}$. Hence $A\vec{x} - A\vec{y} = A(\vec{x} - \vec{y}) = \vec{0}$. However, we have assumed that $A\vec{x} = \vec{0}$ implies $\vec{x} = \vec{0}$. This means that whenever $A$ times a vector equals $\vec{0}$, that vector is also equal to $\vec{0}$. Therefore, $\vec{x} - \vec{y} = \vec{0}$ and so $\vec{x} = \vec{y}$. Thus $A$ is one to one by Definition 5.27.

Note that this proposition says that if $A = \begin{bmatrix} A_1 & \cdots & A_n \end{bmatrix}$ then $A$ is one to one if and only if whenever
\[ \vec{0} = \sum_{k=1}^{n} c_k A_k \]
it follows that each scalar $c_k = 0$.

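We will now take a look at an example of a one to one and onto linear transformation.

Example 5.30: A One to One and Onto Linear Transformation

Suppose
\[ T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} \]
Then, $T : \mathbb{R}^2 \to \mathbb{R}^2$ is a linear transformation. Is $T$ onto? Is it one to one?

Solution. Recall that because $T$ can be expressed as matrix multiplication, we know that $T$ is a linear transformation. We will start by looking at onto. So suppose $\begin{bmatrix} a \\ b \end{bmatrix} \in \mathbb{R}^2$. Does there exist $\begin{bmatrix} x \\ y \end{bmatrix} \in \mathbb{R}^2$ such that $T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a \\ b \end{bmatrix}$? If so, then since $\begin{bmatrix} a \\ b \end{bmatrix}$ is an arbitrary vector in $\mathbb{R}^2$, it will follow that $T$ is onto.

This question is familiar to you. It is asking whether there is a solution to the equation
\[ \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a \\ b \end{bmatrix} \]
This is the same thing as asking for a solution to the following system of equations.
\[ \begin{array}{c} x + y = a \\ x + 2y = b \end{array} \]
Set up the augmented matrix and row reduce.
\[ \left[\begin{array}{rr|r} 1 & 1 & a \\ 1 & 2 & b \end{array}\right] \rightarrow \left[\begin{array}{rr|r} 1 & 0 & 2a - b \\ 0 & 1 & b - a \end{array}\right] \tag{5.4} \]
You can see from this point that the system has a solution. Therefore, we have shown that for any $a, b$, there is a $\begin{bmatrix} x \\ y \end{bmatrix}$ such that $T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a \\ b \end{bmatrix}$. Thus $T$ is onto.

Now we want to know if $T$ is one to one. By Proposition 5.29 it is enough to show that $A\vec{x} = \vec{0}$ implies $\vec{x} = \vec{0}$. Consider the system $A\vec{x} = \vec{0}$ given by:
\[ \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]
This is the same as the system given by
\[ \begin{array}{c} x + y = 0 \\ x + 2y = 0 \end{array} \]
We need to show that the solution to this system is $x = 0$ and $y = 0$. By setting up the augmented matrix and row reducing, we end up with
\[ \left[\begin{array}{rr|r} 1 & 0 & 0 \\ 0 & 1 & 0 \end{array}\right] \]
This tells us that $x = 0$ and $y = 0$. Returning to the original system, this says that if
\[ \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]
then
\[ \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]
In other words, $A\vec{x} = \vec{0}$ implies that $\vec{x} = \vec{0}$. By Proposition 5.29, $A$ is one to one, and so $T$ is also one to one.

We also could have seen that $T$ is one to one from our above solution for onto. By looking at the matrix given by (5.4), you can see that there is a unique solution given by $x = 2a - b$ and $y = b - a$. Therefore, there is only one vector, specifically $\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2a - b \\ b - a \end{bmatrix}$ such that $T\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} a \\ b \end{bmatrix}$. Hence by Definition 5.27, $T$ is one to one.

Consider the following important definition.

Definition 5.31: Isomorphism

Let $T : \mathbb{R}^n \mapsto \mathbb{R}^m$ be a linear transformation. Then $T$ is called an isomorphism if it is both one to one and onto.

The above Example 5.30 demonstrated that the given transformation $T$ is both one to one and onto. We can now say that this transformation is an isomorphism.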
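The conclusions of Example 5.30 can be spot-checked numerically. In the sketch below, `T` encodes the transformation of the example and `solve` encodes the solution $x = 2a - b$, $y = b - a$ read off from (5.4); both function names are hypothetical. A finite set of test points cannot prove onto or one to one, so this is a sanity check of the algebra, not a replacement for the argument.

```python
def T(x, y):
    """The transformation of Example 5.30: multiply (x, y) by [[1, 1], [1, 2]]."""
    return (x + y, x + 2 * y)

def solve(a, b):
    """Solution of T(x, y) = (a, b), read off from the row reduction (5.4)."""
    return (2 * a - b, b - a)

samples = [(0, 0), (1, 0), (0, 1), (3, -5), (7, 2)]

# Onto: every target (a, b) is hit by the vector solve(a, b).
onto_ok = all(T(*solve(a, b)) == (a, b) for a, b in samples)

# One to one: solving recovers the original input, so no two inputs collide.
one_to_one_ok = all(solve(*T(x, y)) == (x, y) for x, y in samples)

print(onto_ok, one_to_one_ok)
```

The two checks together say that `solve` acts as the inverse of `T` on the sampled points, which is what being an isomorphism amounts to here.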

5.5.1. Exercises1. Let T be a linear transformation given by

x2 1T=y0 1Is T one to one? Is T onto?2. Let T be a linear transformation given byTIs T one to one? Is T onto?

xy

1 2= 2 1 1 4


3. Let T be a linear transformation given by

2 01x=T1 2 1yIs T one to one? Is T onto?4. Let T be a linear transformation given by

1 3 5x2 = 2 0Ty2 4 6Is T one to one? Is T onto?

5. Give an example of a $3 \times 2$ matrix with the property that the linear transformation determined by this matrix is one to one but not onto.
6. Suppose $A$ is an $m \times n$ matrix in which $m \leq n$. Suppose also that the rank of $A$ equals $m$. Show that the transformation $T$ determined by $A$ maps $\mathbb{R}^n$ onto $\mathbb{R}^m$. Hint: The vectors $\vec{e}_1, \cdots, \vec{e}_m$ occur as columns in the reduced row-echelon form for $A$.
7. Suppose $A$ is an $m \times n$ matrix in which $m \geq n$. Suppose also that the rank of $A$ equals $n$. Show that $A$ is one to one. Hint: If not, there exists a nonzero vector $\vec{x}$ such that $A\vec{x} = \vec{0}$, and this implies at least one column of $A$ is a linear combination of the others. Show this would require the rank to be less than $n$.
8. Explain why an $n \times n$ matrix $A$ is both one to one and onto if and only if its rank is $n$.

5.6 The General Solution of a Linear System

Outcomes

A. Use linear transformations to determine the particular solution and general solution to a system of equations.
B. Find the kernel of a linear transformation.

Recall the definition of a linear transformation discussed above. $T$ is a linear transformation if whenever $\vec{x}, \vec{y}$ are vectors and $k, p$ are scalars,
\[ T(k\vec{x} + p\vec{y}) = kT(\vec{x}) + pT(\vec{y}) \]
Thus linear transformations distribute across addition and pass scalars to the outside.

It turns out that we can use linear transformations to solve linear systems of equations. Indeed given a system of linear equations of the form $A\vec{x} = \vec{b}$, one may rephrase this as $T(\vec{x}) = \vec{b}$ where $T$ is the linear transformation $T_A$ induced by the coefficient matrix $A$. With this in mind consider the following definition.

Definition 5.32: Particular Solution of a System of Equations

Suppose a linear system of equations can be written in the form
\[ T(\vec{x}) = \vec{b} \]
If $T(\vec{x}_p) = \vec{b}$, then $\vec{x}_p$ is called a particular solution of the linear system.

Recall that a system is called homogeneous if every equation in the system is equal to $0$. Suppose we represent a homogeneous system of equations by $T(\vec{x}) = \vec{0}$. It turns out that the $\vec{x}$ for which $T(\vec{x}) = \vec{0}$ are part of a special set called the null space of $T$. We may also refer to the null space as the kernel of $T$, and we write $\ker(T)$.

Consider the following definition.

Definition 5.33: Null Space or Kernel of a Linear Transformation

Let $T$ be a linear transformation. Define
\[ \ker(T) = \left\{ \vec{x} : T(\vec{x}) = \vec{0} \right\} \]
The kernel, $\ker(T)$, consists of the set of all vectors $\vec{x}$ for which $T(\vec{x}) = \vec{0}$. This is also called the null space of $T$.

We may also refer to the kernel of $T$ as the solution space of the equation $T(\vec{x}) = \vec{0}$.

Consider the following example.

Example 5.34: The Kernel of the Derivative

Let $\frac{d}{dx}$ denote the linear transformation defined on $f$, the functions which are defined on $\mathbb{R}$ and have a continuous derivative. Find $\ker\left(\frac{d}{dx}\right)$.

Solution. The example asks for functions $f$ with the property that $\frac{df}{dx} = 0$. As you may know from calculus, these functions are the constant functions. Thus $\ker\left(\frac{d}{dx}\right)$ is the set of constant functions.

Definition 5.33 states that $\ker(T)$ is the set of solutions to the equation
\[ T(\vec{x}) = \vec{0} \]
Since we can write $T(\vec{x})$ as $A\vec{x}$, you have been solving such equations for quite some time.

We have spent a lot of time finding solutions to systems of equations in general, as well as homogeneous systems. Suppose we look at a system given by $A\vec{x} = \vec{b}$, and consider the related homogeneous system. By this, we mean that we replace $\vec{b}$ by $\vec{0}$ and look at $A\vec{x} = \vec{0}$. It turns out that there is a very important relationship between the solutions of the original system and the solutions of the associated homogeneous system. In the following theorem, we use linear transformations to denote a system of equations. Remember that $T(\vec{x}) = A\vec{x}$.

Theorem 5.35: Particular Solution and General Solution

Suppose $\vec{x}_p$ is a solution to the linear system given by
\[ T(\vec{x}) = \vec{b} \]
Then if $\vec{y}$ is any other solution to $T(\vec{x}) = \vec{b}$, there exists $\vec{x}_0 \in \ker(T)$ such that
\[ \vec{y} = \vec{x}_p + \vec{x}_0 \]
Hence, every solution to the linear system can be written as a sum of a particular solution, $\vec{x}_p$, and a solution $\vec{x}_0$ to the associated homogeneous system given by $T(\vec{x}) = \vec{0}$.

Proof. Consider $\vec{y} - \vec{x}_p = \vec{y} + (-1)\vec{x}_p$. Then $T(\vec{y} - \vec{x}_p) = T(\vec{y}) - T(\vec{x}_p)$. Since $\vec{y}$ and $\vec{x}_p$ are both solutions to the system, it follows that $T(\vec{y}) = \vec{b}$ and $T(\vec{x}_p) = \vec{b}$.

Hence, $T(\vec{y}) - T(\vec{x}_p) = \vec{b} - \vec{b} = \vec{0}$. Let $\vec{x}_0 = \vec{y} - \vec{x}_p$. Then, $T(\vec{x}_0) = \vec{0}$ so $\vec{x}_0$ is a solution to the associated homogeneous system and so is in $\ker(T)$.

Sometimes people remember the above theorem in the following form. The solutions to the system $T(\vec{x}) = \vec{b}$ are given by $\vec{x}_p + \ker(T)$ where $\vec{x}_p$ is a particular solution to $T(\vec{x}) = \vec{b}$.

So far, we have been speaking about the kernel or null space of a linear transformation $T$. However, we know that every linear transformation $T$ is determined by some matrix $A$. Therefore, we can also speak about the null space of a matrix. Consider the following example.

Example 5.36: The Null Space of a Matrix

Let
\[ A = \begin{bmatrix} 1 & 2 & 3 & 0 \\ 2 & 1 & 1 & 2 \\ 4 & 5 & 7 & 2 \end{bmatrix} \]
Find $\ker(A)$. Equivalently, find the solutions to the system of equations $A\vec{x} = \vec{0}$.

Solution. We are asked to find

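Consider the following example.

Example 5.37: A General Solution

The general solution of a linear system of equations is the set of all possible solutions. Find the general solution to the linear system,
\[ \begin{bmatrix} 1 & 2 & 3 & 0 \\ 2 & 1 & 1 & 2 \\ 4 & 5 & 7 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} 9 \\ 7 \\ 25 \end{bmatrix} \]
given that $\begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \\ 2 \\ 1 \end{bmatrix}$ is one solution.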
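Theorem 5.35 can be illustrated on this system with a short Python sketch. The two kernel vectors `n1` and `n2` below were obtained for this sketch by row reducing the coefficient matrix by hand and clearing denominators; they are an assumption of the illustration and may be scaled differently from any basis printed in the text. The helper names `mat_vec` and `general_solution` are likewise hypothetical.

```python
def mat_vec(A, x):
    """Multiply the matrix A (a list of rows) by the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 2, 3, 0],
     [2, 1, 1, 2],
     [4, 5, 7, 2]]
b = [9, 7, 25]

x_p = [1, 1, 2, 1]     # the particular solution given in the example

# Assumed kernel vectors (derived by hand for this sketch): A n1 = A n2 = 0.
n1 = [1, -5, 3, 0]
n2 = [-4, 2, 0, 3]

def general_solution(s, t):
    """x_p + s*n1 + t*n2, a solution of A x = b for any scalars s and t."""
    return [p + s * u + t * v for p, u, v in zip(x_p, n1, n2)]

print(mat_vec(A, x_p))
print(mat_vec(A, n1), mat_vec(A, n2))
for s, t in [(0, 0), (1, 0), (0, 1), (2, -3)]:
    print((s, t), mat_vec(A, general_solution(s, t)))
```

Every choice of `s` and `t` should reproduce the same right-hand side, since the kernel part contributes nothing to $A\vec{x}$.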

Solution. Note the matrix of this system is the same as the matrix in Example 5.36. Therefore, from Theorem 5.35, you will obtain all solutions to the above linear system by addinga particular solution ~xp to the solutions of the associated homogeneous system, ~x. Oneparticular solution is given above by

1 1 2x1 1 2 0 y = 2 3 4 4z47. Write the solution set of the following system as a linear combination of vectors

0 1 2x0 1

0 1y = 0 1 2 5z08. Using Problem 7 find the general solution to the following linear system.

0 1 2x1 10 1 y = 1 1 2 5z19. Write the solution set of the following

10 1 1 1 1

3 1 333 0

system

2 3

as a linear combination of vectors

0x 0 y = z 0 0w

10. Using Problem 9 find the general solution to the following linear system.

1x10 1 1 1 1 1 0 y 2

3 1 3 2 z = 4 3w33 0 311. Write the solution set of the following

1 1 0 2 1 1

1 0 10 0 0

system as a linear combination of vectors

0x1

2 y 0

=1 z 0 0w0


12. Using Problem 11 find the general

11 21

100 1

solution to the following linear system.

2x0 1

1 2 y r = 1

3 z1 10w1 1

13. Write the solution set of the following

11 0 1 1 1

31 133 0

system

2 3

as a linear combination of vectors

0x 0 y = z 0 0w

14. Using Problem 13 find the general solution to the following linear system.

1x11 0 1 1 1 1 0 y 2 =

31 1 2 z 4 3w33 0 315. Write the solution set of the following

11 0 21 1

10 10 1 116. Using Problem 15 find the general

11 21

100 1

system

1 1

as a linear combination of vectors

0x

y 0

=z 0 0w

solution to the following linear system.

2x0 1

1 2 y = 1 1 1 z 3 1w1 1

17. Suppose $A\vec{x} = \vec{b}$ has a solution. Explain why the solution is unique precisely when $A\vec{x} = \vec{0}$ has only the trivial solution.


6. Complex Numbers

6.1 Complex Numbers

Outcomes

A. Understand the geometric significance of a complex number as a point in the plane.
B. Prove algebraic properties of addition and multiplication of complex numbers, and apply these properties. Understand the action of taking the conjugate of a complex number.
C. Understand the absolute value of a complex number and how to find it as well as its geometric significance.

Although very powerful, the real numbers are inadequate to solve equations such as $x^2 + 1 = 0$, and this is where complex numbers come in. We define the number $i$ as the imaginary number such that $i^2 = -1$, and define complex numbers as those of the form $z = a + bi$ where $a$ and $b$ are real numbers. We call this the standard form, or Cartesian form, of the complex number $z$. Then, we refer to $a$ as the real part of $z$, and $b$ as the imaginary part of $z$. It turns out that such numbers not only solve the above equation, but in fact every polynomial of degree at least 1 with complex coefficients has a root among them. This property, called the Fundamental Theorem of Algebra, is sometimes referred to by saying $\mathbb{C}$ is algebraically closed. Gauss is usually credited with giving a proof of this theorem in 1797 but many others worked on it and the first completely correct proof was due to Argand in 1806.

Just as a real number can be considered as a point on the line, a complex number $z = a + bi$ can be considered as a point $(a, b)$ in the plane whose $x$ coordinate is $a$ and whose $y$ coordinate is $b$. For example, in the following picture, the point $z = 3 + 2i$ can be represented as the point in the plane with coordinates $(3, 2)$.

[Figure: the point $z = (3, 2) = 3 + 2i$ plotted in the plane.]

Theorem 6.3: Properties of Multiplication of Complex Numbers

Let $z$, $w$ and $v$ be complex numbers. Then, the following properties of multiplication hold.

• Commutative Law for Multiplication: $zw = wz$
• Associative Law for Multiplication: $(zw)v = z(wv)$
• Multiplicative Identity: $1z = z$
• Existence of Multiplicative Inverse: For each $z \neq 0$, there exists $z^{-1}$ such that $zz^{-1} = 1$
• Distributive Law: $z(w + v) = zw + zv$

You may wish to verify some of these statements. The real numbers also satisfy the above axioms, and in general any mathematical structure which satisfies these axioms is called a field. There are many other fields, in particular even finite ones particularly useful for cryptography, and the reason for specifying these axioms is that linear algebra is all about fields and we can do just about anything in this subject using any field. Here, though, the fields of most interest will be the familiar field of real numbers, denoted as $\mathbb{R}$, and the field of complex numbers, denoted as $\mathbb{C}$.

An important construction regarding complex numbers is the complex conjugate denoted by a horizontal line above the number, $\overline{z}$. It is defined as follows.

Definition 6.4: Conjugate of a Complex Number

Let $z = a + bi$ be a complex number. Then the conjugate of $z$, written $\overline{z}$, is given by
\[ \overline{a + bi} = a - bi \]
Geometrically, the action of the conjugate is to reflect a given complex number across the $x$ axis. Algebraically, it changes the sign on the imaginary part of the complex number. Therefore, for a real number $a$, $\overline{a} = a$.

Example 6.7: Division of Complex Numbers

Interestingly every nonzero complex number $a + bi$ has a unique multiplicative inverse. In other words, for a nonzero complex number $z$, there exists a number $z^{-1}$ (or $\frac{1}{z}$) so that $zz^{-1} = 1$. Note that $z = a + bi$ is nonzero exactly when $a^2 + b^2 \neq 0$, and its inverse can be written in standard form as defined now.

Definition 6.8: Inverse of a Complex Number

Let $z = a + bi$ be a complex number. Then the multiplicative inverse of $z$, written $z^{-1}$, exists if and only if $a^2 + b^2 \neq 0$ and is given by
\[ z^{-1} = \frac{1}{a + bi} = \frac{1}{a + bi} \times \frac{a - bi}{a - bi} = \frac{a - bi}{a^2 + b^2} = \frac{a}{a^2 + b^2} - i\frac{b}{a^2 + b^2} \]
Note that we may write $z^{-1}$ as $\frac{1}{z}$. Both notations represent the multiplicative inverse of the complex number $z$. Consider now an example.

Example 6.9: Inverse of a Complex Number

Consider the complex number $z = 2 + 6i$. Then $z^{-1}$ is defined, and
\[ \frac{1}{z} = \frac{1}{2 + 6i} = \frac{1}{2 + 6i} \times \frac{2 - 6i}{2 - 6i} = \frac{2 - 6i}{2^2 + 6^2} = \frac{2 - 6i}{40} = \frac{1}{20} - \frac{3}{20}i \]
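The formula of Definition 6.8 translates directly into code. The sketch below represents a complex number $a + bi$ as a pair `(a, b)` of exact rationals; the function names `inverse` and `multiply` are hypothetical helpers for illustration. It computes the inverse of $2 + 6i$ and confirms that the product with the original number is $1$.

```python
from fractions import Fraction

def inverse(a, b):
    """Inverse of a + bi as a pair (real part, imaginary part), per Definition 6.8."""
    n = a * a + b * b
    if n == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return (Fraction(a, n), Fraction(-b, n))

def multiply(z, w):
    """Product of two complex numbers given as (real, imaginary) pairs."""
    (a, b), (c, d) = z, w
    return (a * c - b * d, a * d + b * c)

inv = inverse(2, 6)
print(inv)

# z * z^{-1} should be 1 + 0i.
print(multiply((2, 6), inv))
```

Python's built-in `complex` type gives the same value in floating point, but the exact fractions make the comparison with the hand computation cleaner.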

Another important construction of complex numbers is that of the absolute value, also called the modulus. Consider the following definition.

Definition 6.10: Absolute Value

The absolute value, or modulus, of a complex number $z = a + bi$, denoted $|z|$, is defined as follows.
\[ |a + bi| = \sqrt{a^2 + b^2} \]

\[ = (a + c)^2 + (b + d)^2 = a^2 + c^2 + 2ac + 2bd + b^2 + d^2 \]

Taking the square root, we have that

and so by the first form of the inequality we get both:

\[ |z| \leq |z - w| + |w|, \quad |w| \leq |z - w| + |z| \]
Hence, both $|z| - |w|$ and $|w| - |z|$ are no larger than $|z - w|$. This proves the second version because $\big| |z| - |w| \big|$ is one of $|z| - |w|$ or $|w| - |z|$.

With this definition, it is important to note the following. You may wish to take the time to verify this remark.

between the point in the plane determined by the ordered pair $(a, b)$ and the ordered pair $(c, d)$ equals $|z - w|$ where $z$ and $w$ are as just described.

For example, consider the distance between $(2, 5)$ and $(1, 8)$. Letting $z = 2 + 5i$ and $w = 1 + 8i$, $z - w = 1 - 3i$, $(z - w)\overline{(z - w)} = (1 - 3i)(1 + 3i) = 10$ so $|z - w| = \sqrt{10}$.

Recall that we refer to $z = a + bi$ as the standard form of the complex number. In the next section, we examine another form in which we can express the complex number.

2. Let z = 1 4i. Compute the following.

4. If z is a complex number, show there exists a complex number w with |w| = 1 andwz = |z| .

5. If $z, w$ are complex numbers prove $\overline{zw} = \overline{z}\,\overline{w}$ and then show by induction that $\overline{z_1 \cdots z_m} = \overline{z_1} \cdots \overline{z_m}$. Also verify that $\overline{\sum_{k=1}^{m} z_k} = \sum_{k=1}^{m} \overline{z_k}$. In words this says the conjugate of a product equals the product of the conjugates and the conjugate of a sum equals the sum of the conjugates.
6. Suppose $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ where all the $a_k$ are real numbers. Suppose also that $p(z) = 0$ for some $z \in \mathbb{C}$. Show it follows that $p(\overline{z}) = 0$ also.
7. I claim that $1 = -1$. Here is why.
\[ -1 = i^2 = \sqrt{-1}\sqrt{-1} = \sqrt{(-1)^2} = \sqrt{1} = 1 \]
This is clearly a remarkable result but is there something wrong with it? If so, what is wrong?

6.2 Polar Form

Outcomes

A. Convert a complex number from standard form to polar form, and from polar form to standard form.

In the previous section, we identified a complex number $z = a + bi$ with a point $(a, b)$ in the coordinate plane. There is another form in which we can express the same number, called the polar form. The polar form is the focus of this section. It will turn out to be very useful if not crucial for certain calculations as we shall soon see.

Suppose $z = a + bi$ is a nonzero complex number, and let $r = \sqrt{a^2 + b^2} = |z|$. Recall that $r$ is the modulus of $z$. Note first that
\[ \left( \frac{a}{r} \right)^2 + \left( \frac{b}{r} \right)^2 = \frac{a^2 + b^2}{r^2} = 1 \]
and so $\left( \frac{a}{r}, \frac{b}{r} \right)$ is a point on the unit circle. Therefore, there exists an angle $\theta$ (in radians) such that
\[ \cos\theta = \frac{a}{r}, \quad \sin\theta = \frac{b}{r} \]
In other words $\theta$ is an angle such that $a = r\cos\theta$ and $b = r\sin\theta$, that is $\theta = \cos^{-1}(a/r)$ and $\theta = \sin^{-1}(b/r)$. We call this angle $\theta$ the argument of $z$.

We often speak of the principal argument of $z$. This is the unique angle $\theta \in (-\pi, \pi]$ such that
\[ \cos\theta = \frac{a}{r}, \quad \sin\theta = \frac{b}{r} \]
The polar form of the complex number $z = a + bi = r(\cos\theta + i\sin\theta)$ is for convenience written as:
\[ z = re^{i\theta} \]
where $\theta$ is the argument of $z$.

Definition 6.12: Polar Form of a Complex Number

Let $z = a + bi$ be a complex number. Then the polar form of $z$ is written as
\[ z = re^{i\theta} \]
where $r = \sqrt{a^2 + b^2}$ and $\theta$ is the argument of $z$.

When given $z = re^{i\theta}$, the identity $e^{i\theta} = \cos\theta + i\sin\theta$ will convert $z$ back to standard form. Here we think of $e^{i\theta}$ as a short cut for $\cos\theta + i\sin\theta$. This is all we will need in this course, but in reality $e^{i\theta}$ can be considered as the complex equivalent of the exponential function where this turns out to be a true equality.

[Figure: the complex number $z = a + bi = re^{i\theta}$ plotted in the plane, with $r = \sqrt{a^2 + b^2}$.]

Thus we can convert any complex number in the standard (Cartesian) form $z = a + bi$ into its polar form. Consider the following example.

Example 6.13: Standard to Polar Form
Let $z = 2 + 2i$ be a complex number. Write $z$ in the polar form
\[ z = re^{i\theta} \]

Solution. First, find $r$. By the above discussion, $r = \sqrt{a^2 + b^2} = |z|$. Therefore,
\[ r = \sqrt{2^2 + 2^2} = \sqrt{8} = 2\sqrt{2} \]
Now, to find $\theta$, we plot the point $(2, 2)$ and find the angle from the positive $x$ axis to the line between this point and the origin. In this case, $\theta = 45^\circ = \frac{\pi}{4}$. That is, we found the unique angle $\theta$ such that $\theta = \cos^{-1}(1/\sqrt{2})$ and $\theta = \sin^{-1}(1/\sqrt{2})$.

Note that in polar form, we always express angles in radians, not degrees.

Hence, we can write $z$ as

\[ z = 2\sqrt{2}\, e^{i\frac{\pi}{4}} \]

Notice that the standard and polar forms are completely equivalent. That is, not only can we transform a complex number from standard form to its polar form, we can also take a complex number in polar form and convert it back to standard form.

Example 6.14: Polar to Standard Form
Let $z = 2e^{2\pi i/3}$. Write $z$ in the standard form
\[ z = a + bi \]

Solution. Let $z = 2e^{2\pi i/3}$ be the polar form of a complex number. Recall that $e^{i\theta} = \cos \theta + i \sin \theta$. Therefore, using standard values of $\sin$ and $\cos$ we get:
\[ z = 2e^{i2\pi/3} = 2\left( \cos(2\pi/3) + i \sin(2\pi/3) \right) = 2\left( -\frac{1}{2} + i\frac{\sqrt{3}}{2} \right) = -1 + \sqrt{3}\, i \]
which is the standard form of this complex number.

You can always verify your answer by converting it back to polar form and ensuring you reach the original answer.
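The two conversions in Examples 6.13 and 6.14 can be checked numerically. The following is a minimal sketch using Python's standard `cmath` and `math` modules; the helper names `to_polar` and `to_standard` are introduced here for illustration and are not part of the text.

```python
import cmath
import math

def to_polar(z):
    """Return (r, theta) with z = r*e^{i*theta}; theta is the principal argument."""
    return abs(z), cmath.phase(z)

def to_standard(r, theta):
    """Return a + bi for z = r*e^{i*theta}, using e^{i*theta} = cos(theta) + i*sin(theta)."""
    return complex(r * math.cos(theta), r * math.sin(theta))

# Example 6.13: z = 2 + 2i
r, theta = to_polar(2 + 2j)
print(r, theta)      # modulus and argument (in radians)

# Example 6.14: z = 2e^{2*pi*i/3}
z = to_standard(2, 2 * math.pi / 3)
print(z)
```

Because floating point arithmetic is used, the printed values agree with $2\sqrt{2}$, $\pi/4$ and $-1 + \sqrt{3}\,i$ only up to rounding.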

6.2.1. Exercises

1. Let $z = 3 + 3i$ be a complex number written in standard form. Convert $z$ to polar form, and write it in the form $z = re^{i\theta}$.
2. Let $z = 2i$ be a complex number written in standard form. Convert $z$ to polar form, and write it in the form $z = re^{i\theta}$.
3. Let $z = 4e^{\frac{2\pi}{3} i}$ be a complex number written in polar form. Convert $z$ to standard form, and write it in the form $z = a + bi$.
4. Let $z = 1e^{\frac{\pi}{6} i}$ be a complex number written in polar form. Convert $z$ to standard form, and write it in the form $z = a + bi$.
5. If $z$ and $w$ are two complex numbers and the polar form of $z$ involves the angle $\theta$ while the polar form of $w$ involves the angle $\phi$, show that in the polar form for $zw$ the angle involved is $\theta + \phi$.


6.3 Roots of Complex Numbers

Outcomes

A. Understand De Moivre's theorem and be able to use it to find the roots of a complex number.

A fundamental identity is the formula of De Moivre with which we begin this section.

Theorem 6.15: De Moivre's Theorem
For any positive integer $n$, we have
\[ \left( e^{i\theta} \right)^n = e^{in\theta} \]

Since the cosine and sine are periodic of period $2\pi$, there are exactly $k$ distinct numbers which result from this formula.

The procedure for finding the $k$ $k$th roots of $z \in \mathbb{C}$ is as follows.


Procedure 6.17: Finding Roots of a Complex Number

Let $w$ be a nonzero complex number. We wish to find the $n$th roots of $w$, that is all $z$ such that $z^n = w$.

There are $n$ distinct $n$th roots and they can be found as follows:

1. Express both $z$ and $w$ in polar form $z = re^{i\theta}$, $w = se^{i\phi}$. Then $z^n = w$ becomes:
\[ \left( re^{i\theta} \right)^n = r^n e^{in\theta} = se^{i\phi} \]
We need to solve for $r$ and $\theta$.

2. Solve the following two equations:
\[ r^n = s, \qquad e^{in\theta} = e^{i\phi} \qquad (6.1) \]

3. The solutions to $r^n = s$ are given by $r = \sqrt[n]{s}$.

4. The solutions to $e^{in\theta} = e^{i\phi}$ are given by:
\[ n\theta = \phi + 2\pi\ell, \text{ for } \ell = 0, 1, 2, \cdots, n - 1 \]
or
\[ \theta = \frac{\phi}{n} + \frac{2}{n}\pi\ell, \text{ for } \ell = 0, 1, 2, \cdots, n - 1 \]

5. Using the solutions $r, \theta$ to the equations given in (6.1), construct the $n$th roots of the form $z = re^{i\theta}$.

Notice that once the roots are obtained in the final step, they can then be converted to standard form if necessary. Let's consider an example of this concept. Note that according to Corollary 6.16, there are exactly 3 cube roots of a complex number.

Example 6.18: Finding Cube Roots
Find the three cube roots of $i$. In other words find all $z$ such that $z^3 = i$.

Solution. First, convert each number to polar form: $z = re^{i\theta}$ and $i = 1e^{i\pi/2}$. The equation now becomes
\[ \left( re^{i\theta} \right)^3 = r^3 e^{3i\theta} = 1e^{i\pi/2} \]
Therefore, the two equations that we need to solve are $r^3 = 1$ and $3i\theta = i\pi/2$. Given that $r \in \mathbb{R}$ and $r^3 = 1$ it follows that $r = 1$.

Solving the second equation is as follows. First divide by $i$. Then, since the argument of $i$ is not unique we write $3\theta = \pi/2 + 2\pi\ell$ for $\ell = 0, 1, 2$.
\[ 3\theta = \pi/2 + 2\pi\ell \text{ for } \ell = 0, 1, 2 \]
\[ \theta = \pi/6 + \frac{2}{3}\pi\ell \text{ for } \ell = 0, 1, 2 \]
For $\ell = 0$:
\[ \theta = \pi/6 + \frac{2}{3}\pi(0) = \pi/6 \]
For $\ell = 1$:
\[ \theta = \pi/6 + \frac{2}{3}\pi(1) = \frac{5}{6}\pi \]
For $\ell = 2$:
\[ \theta = \pi/6 + \frac{2}{3}\pi(2) = \frac{3}{2}\pi \]
Therefore, the three roots are given by
\[ 1e^{i\pi/6}, \quad 1e^{i\frac{5}{6}\pi}, \quad 1e^{i\frac{3}{2}\pi} \]
Written in standard form, these roots are, respectively,

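\[ \frac{\sqrt{3}}{2} + i\frac{1}{2}, \quad -\frac{\sqrt{3}}{2} + i\frac{1}{2}, \quad -i \]

The ability to find $k$th roots can also be used to factor some polynomials.

Example 6.19: Solving a Polynomial Equation
Factor the polynomial $x^3 - 27$.

Solution. First find the cube roots of 27. By the above procedure, these cube roots are $3$, $3\left( -\frac{1}{2} + i\frac{\sqrt{3}}{2} \right)$, and $3\left( -\frac{1}{2} - i\frac{\sqrt{3}}{2} \right)$. You may wish to verify this using the above steps.

Therefore,
\[ x^3 - 27 = (x - 3)\left( x - 3\left( -\frac{1}{2} + i\frac{\sqrt{3}}{2} \right) \right)\left( x - 3\left( -\frac{1}{2} - i\frac{\sqrt{3}}{2} \right) \right) \]
Note also
\[ \left( x - 3\left( -\frac{1}{2} + i\frac{\sqrt{3}}{2} \right) \right)\left( x - 3\left( -\frac{1}{2} - i\frac{\sqrt{3}}{2} \right) \right) = x^2 + 3x + 9 \]
and so
\[ x^3 - 27 = (x - 3)\left( x^2 + 3x + 9 \right) \]
where the quadratic polynomial $x^2 + 3x + 9$ cannot be factored without using complex numbers.

Note that even though the polynomial $x^3 - 27$ has all real coefficients, it has some complex zeros, $3\left( -\frac{1}{2} + i\frac{\sqrt{3}}{2} \right)$ and $3\left( -\frac{1}{2} - i\frac{\sqrt{3}}{2} \right)$. These zeros are complex conjugates of each other. It is always the case that if a polynomial has real coefficients and a complex root, it will also have a root equal to the complex conjugate.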
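The steps of Procedure 6.17 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `nth_roots` is invented here, `cmath.polar` and `cmath.rect` from the standard library handle the polar conversions, and results are floating point approximations.

```python
import cmath
import math

def nth_roots(w, n):
    """Return the n distinct nth roots of a nonzero complex number w."""
    s, phi = cmath.polar(w)            # step 1: w = s*e^{i*phi}
    r = s ** (1.0 / n)                 # step 3: r is the real nth root of s
    # step 4: theta = phi/n + (2*pi/n)*l for l = 0, 1, ..., n-1
    return [cmath.rect(r, phi / n + 2 * math.pi * l / n) for l in range(n)]

# Example 6.18: the three cube roots of i
cube_roots_of_i = nth_roots(1j, 3)
for z in cube_roots_of_i:
    print(z, z ** 3)                   # each cube should be close to i

# Example 6.19: the three cube roots of 27
cube_roots_of_27 = nth_roots(27, 3)
```

Cubing each returned value and comparing with the original number is a quick way to confirm a hand computation.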

6.3.1. Exercises

1. Give the complete solution to $x^4 + 16 = 0$.
2. Find the complex cube roots of $-8$.
3. Find the four fourth roots of $-16$.
4. De Moivre's theorem says $[r(\cos t + i \sin t)]^n = r^n(\cos nt + i \sin nt)$ for $n$ a positive integer. Does this formula continue to hold for all integers $n$, even negative integers? Explain.
5. Factor $x^3 + 8$ as a product of linear factors. Hint: Use the result of 2.
6. Write $x^3 + 27$ in the form $(x + 3)(x^2 + ax + b)$ where $x^2 + ax + b$ cannot be factored any more using only real numbers.
7. Completely factor $x^4 + 16$ as a product of linear factors. Hint: Use the result of 3.
8. Factor $x^4 + 16$ as the product of two quadratic polynomials each of which cannot be factored further without using complex numbers.
9. If $n$ is an integer, is it always true that $(\cos \theta - i \sin \theta)^n = \cos(n\theta) - i \sin(n\theta)$? Explain.
10. Suppose $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$ is a polynomial and it has $n$ zeros,
\[ z_1, z_2, \cdots, z_n \]
listed according to multiplicity. ($z$ is a root of multiplicity $m$ if the polynomial $f(x) = (x - z)^m$ divides $p(x)$ but $(x - z) f(x)$ does not.) Show that
\[ p(x) = a_n (x - z_1)(x - z_2) \cdots (x - z_n) \]

6.4 The Quadratic Formula

Outcomes

A. Use the Quadratic Formula to find the complex roots of a quadratic equation.

The roots (or solutions) of a quadratic equation $ax^2 + bx + c = 0$ where $a, b, c$ are real numbers are given by the familiar quadratic formula
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \]
When working with real numbers, we cannot evaluate this formula if $b^2 - 4ac < 0$. However, complex numbers allow us to find square roots of negative numbers, and the quadratic formula remains valid for finding roots of the corresponding quadratic equation. In this case there are exactly two distinct (complex) square roots of $b^2 - 4ac$, which are $i\sqrt{4ac - b^2}$ and $-i\sqrt{4ac - b^2}$.

Here is an example.

Example 6.20: Solutions to Quadratic Equation
Find the solutions to $x^2 + 2x + 5 = 0$.

Solution. In terms of the quadratic equation above, $a = 1$, $b = 2$, and $c = 5$. Therefore, we can use the quadratic formula with these values, which becomes
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{-2 \pm \sqrt{(2)^2 - 4(1)(5)}}{2(1)} \]
Solving this equation, we see that the solutions are given by
\[ x = \frac{-2 \pm \sqrt{4 - 20}}{2} = \frac{-2 \pm 4i}{2} = -1 \pm 2i \]
We can verify that these are solutions of the original equation. We will show $x = -1 + 2i$ and leave $x = -1 - 2i$ as an exercise.
\[ x^2 + 2x + 5 = (-1 + 2i)^2 + 2(-1 + 2i) + 5 = 1 - 4i - 4 - 2 + 4i + 5 = 0 \]
Hence $x = -1 + 2i$ is a solution.

What if the coefficients of the quadratic equation are actually complex numbers? Does the formula hold even in this case? The answer is yes. This is a hint on how to do Problem 4 below, a special case of the fundamental theorem of algebra, and an ingredient in the proof of some versions of this theorem.

Consider the following example.

Example 6.21: Solutions to Quadratic Equation
Find the solutions to $x^2 - 2ix - 5 = 0$.

Solution. In terms of the quadratic equation above, $a = 1$, $b = -2i$, and $c = -5$. Therefore, we can use the quadratic formula with these values, which becomes
\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{2i \pm \sqrt{(-2i)^2 - 4(1)(-5)}}{2(1)} \]
Solving this equation, we see that the solutions are given by
\[ x = \frac{2i \pm \sqrt{-4 + 20}}{2} = \frac{2i \pm 4}{2} = i \pm 2 \]
We can verify that these are solutions of the original equation. We will show $x = i + 2$ and leave $x = i - 2$ as an exercise.
\[ x^2 - 2ix - 5 = (i + 2)^2 - 2i(i + 2) - 5 = -1 + 4i + 4 + 2 - 4i - 5 = 0 \]
Hence $x = i + 2$ is a solution.

We conclude this section by stating an essential theorem.

Theorem 6.22: The Fundamental Theorem of Algebra
Any polynomial of degree at least 1 with complex coefficients has a root which is a complex number.
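The quadratic formula with real or complex coefficients can be tried out directly. The sketch below assumes Python's standard `cmath.sqrt`, which returns one complex square root of its argument; the function name `quadratic_roots` is chosen here for illustration.

```python
import cmath

def quadratic_roots(a, b, c):
    """Roots of a*x^2 + b*x + c = 0 for real or complex a, b, c with a != 0."""
    d = cmath.sqrt(b * b - 4 * a * c)   # one of the two square roots of b^2 - 4ac
    return (-b + d) / (2 * a), (-b - d) / (2 * a)

roots_620 = quadratic_roots(1, 2, 5)        # Example 6.20: x^2 + 2x + 5 = 0
roots_621 = quadratic_roots(1, -2j, -5)     # Example 6.21: x^2 - 2ix - 5 = 0
print(roots_620)
print(roots_621)

# Substitute each root back into the equation of Example 6.21.
for x in roots_621:
    print(x * x - 2j * x - 5)               # should be (close to) zero
```

Substituting the roots back, as in the last loop, mirrors the verification step carried out by hand in the examples.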

6.4.1. Exercises

1. Show that $1 + i$, $2 + i$ are the only two roots to
\[ p(x) = x^2 - (3 + 2i)x + (1 + 3i) \]
Hence complex zeros do not necessarily come in conjugate pairs if the coefficients of the equation are not real.
2. Give the solutions to the following quadratic equations having real coefficients.
(a) $x^2 - 2x + 2 = 0$

(c) $4x^2 + (4 + 4i)x + 1 + 2i = 0$
(d) $x^2 - 4ix - 5 = 0$
(e) $3x^2 + (1 - i)x + 3i = 0$

4. Prove the fundamental theorem of algebra for quadratic polynomials having coefficients in $\mathbb{C}$. That is, show that an equation of the form
\[ ax^2 + bx + c = 0 \]
where $a, b, c$ are complex numbers, $a \neq 0$, has a complex solution. Hint: Consider the fact, noted earlier, that the expressions given from the quadratic formula do in fact serve as solutions.


7. Spectral Theory

7.1 Eigenvalues and Eigenvectors of a Matrix

Outcomes

A. Describe eigenvalues geometrically and algebraically.
B. Find eigenvalues and eigenvectors for a square matrix.

Spectral Theory refers to the study of eigenvalues and eigenvectors of a matrix. It is of fundamental importance in many areas and is the subject of our study for this chapter.

7.1.1. Definition of Eigenvectors and Eigenvalues

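In this section, we will work with the entire set of complex numbers, denoted by $\mathbb{C}$. Recall that the real numbers, $\mathbb{R}$, are contained in the complex numbers, so the discussions in this section apply to both real and complex numbers.

To illustrate the idea behind what will be discussed, consider the following example.

Example 7.1: Eigenvectors and Eigenvalues
Let
\[ A = \begin{bmatrix} 0 & 5 & -10 \\ 0 & 22 & 16 \\ 0 & -9 & -2 \end{bmatrix} \]
Compute the product $AX$ for
\[ X = \begin{bmatrix} -5 \\ -4 \\ 3 \end{bmatrix}, \quad X = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \]
What do you notice about $AX$ in each of these products?

Solution. First, compute $AX$ for
\[ X = \begin{bmatrix} -5 \\ -4 \\ 3 \end{bmatrix} \]
This product is given by
\[ AX = \begin{bmatrix} 0 & 5 & -10 \\ 0 & 22 & 16 \\ 0 & -9 & -2 \end{bmatrix} \begin{bmatrix} -5 \\ -4 \\ 3 \end{bmatrix} = \begin{bmatrix} -50 \\ -40 \\ 30 \end{bmatrix} = 10 \begin{bmatrix} -5 \\ -4 \\ 3 \end{bmatrix} \]
In this case, the product $AX$ resulted in a vector which is equal to 10 times the vector $X$. In other words, $AX = 10X$.

Let's see what happens in the next product. Compute $AX$ for the vector
\[ X = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \]
This product is given by
\[ AX = \begin{bmatrix} 0 & 5 & -10 \\ 0 & 22 & 16 \\ 0 & -9 & -2 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} = 0 \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \]
In this case, the product $AX$ resulted in a vector equal to 0 times the vector $X$, $AX = 0X$.

Perhaps this matrix is such that $AX$ results in $kX$, for every vector $X$. However, consider
\[ \begin{bmatrix} 0 & 5 & -10 \\ 0 & 22 & 16 \\ 0 & -9 & -2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -5 \\ 38 \\ -11 \end{bmatrix} \]
In this case, $AX$ did not result in a vector of the form $kX$ for some scalar $k$.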
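The three products in Example 7.1 can be reproduced with a short script. This is a sketch in plain Python, with the matrix entered as a list of rows; the helper names `mat_vec` and `scalar_multiple` are assumptions made for this illustration.

```python
from fractions import Fraction

def mat_vec(A, X):
    """Multiply the matrix A (a list of rows) by the column vector X."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

def scalar_multiple(A, X):
    """Return k if AX = kX for the nonzero vector X, and None otherwise."""
    AX = mat_vec(A, X)
    i = next(j for j, x in enumerate(X) if x != 0)   # X is assumed nonzero
    k = Fraction(AX[i], X[i])                        # the only possible value of k
    return k if all(y == k * x for y, x in zip(AX, X)) else None

A = [[0, 5, -10], [0, 22, 16], [0, -9, -2]]
print(scalar_multiple(A, [-5, -4, 3]))   # an eigenvector: prints the scalar k
print(scalar_multiple(A, [1, 0, 0]))     # also an eigenvector, with k = 0
print(scalar_multiple(A, [1, 1, 1]))     # not of the form kX: prints None
```

Note that a returned value of 0 is a legitimate scalar, so the result should be compared against `None` rather than tested for truthiness.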

There is something special about the first two products calculated in Example 7.1. Notice that for each, $AX = kX$ where $k$ is some scalar. When this equation holds for some $X$ and $k$, we call the scalar $k$ an eigenvalue of $A$. We often use the special symbol $\lambda$ instead of $k$ when referring to eigenvalues. In Example 7.1, the values 10 and 0 are eigenvalues for the matrix $A$ and we can label these as $\lambda_1 = 10$ and $\lambda_2 = 0$.

When $AX = \lambda X$ for some $X \neq 0$, we call such an $X$ an eigenvector of the matrix $A$. The eigenvectors of $A$ are associated to an eigenvalue. Hence, if $\lambda_1$ is an eigenvalue of $A$ and $AX = \lambda_1 X$, we can label this eigenvector as $X_1$. Note again that in order to be an eigenvector, $X$ must be nonzero.

There is also a geometric significance to eigenvectors. When you have a nonzero vector which, when multiplied by a matrix, results in another vector which is parallel to the first or equal to 0, this vector is called an eigenvector of the matrix. This is the meaning when the vectors are in $\mathbb{R}^n$.

The formal definition of eigenvalues and eigenvectors is as follows.

Definition 7.2: Eigenvalues and Eigenvectors

Let $A$ be an $n \times n$ matrix and let $X \in \mathbb{C}^n$ be a nonzero vector for which
\[ AX = \lambda X \qquad (7.1) \]
for some scalar $\lambda$. Then $\lambda$ is called an eigenvalue of the matrix $A$ and $X$ is called an eigenvector of $A$ associated with $\lambda$, or a $\lambda$-eigenvector of $A$.

The set of all eigenvalues of an $n \times n$ matrix $A$ is denoted by $\sigma(A)$ and is referred to as the spectrum of $A$.

The eigenvectors of a matrix $A$ are those vectors $X$ for which multiplication by $A$ results in a vector in the same direction or opposite direction to $X$. Since the zero vector 0 has no direction this would make no sense for the zero vector. As noted above, 0 is never allowed to be an eigenvector.

Let's look at eigenvectors in more detail. Suppose $X$ satisfies 7.1. Then
\[ AX - \lambda X = 0 \]
or
\[ (A - \lambda I) X = 0 \]
for some $X \neq 0$. Equivalently you could write $(\lambda I - A) X = 0$, which is more commonly used. Hence, when we are looking for eigenvectors, we are looking for nontrivial solutions to this homogeneous system of equations!

Recall that the solutions to a homogeneous system of equations consist of basic solutions, and the linear combinations of those basic solutions. In this context, we call the basic solutions of the equation $(\lambda I - A) X = 0$ basic eigenvectors. It follows that any (nonzero) linear combination of basic eigenvectors is again an eigenvector.

Suppose the matrix $(\lambda I - A)$ is invertible, so that $(\lambda I - A)^{-1}$ exists. Then the following equation would be true.
\[ X = IX = \left( (\lambda I - A)^{-1} (\lambda I - A) \right) X = (\lambda I - A)^{-1} \left( (\lambda I - A) X \right) = (\lambda I - A)^{-1} 0 = 0 \]
This claims that $X = 0$. However, we have required that $X \neq 0$. Therefore $(\lambda I - A)$ cannot have an inverse!

Recall from Theorem 3.33 that if a matrix is not invertible, then its determinant is equal to 0. Therefore we can conclude that
\[ \det(\lambda I - A) = 0 \qquad (7.2) \]
Note that this is equivalent to $\det(A - \lambda I) = 0$.

The expression $\det(xI - A)$ is a polynomial (in the variable $x$) called the characteristic polynomial of $A$, and $\det(xI - A) = 0$ is called the characteristic equation. For this reason we may also refer to the eigenvalues of $A$ as characteristic values, but the former is often used for historical reasons.

The following theorem claims that the roots of the characteristic polynomial are the eigenvalues of $A$. Thus when 7.2 holds, $A$ has a nonzero eigenvector.

Theorem 7.3: The Existence of an Eigenvector
Let $A$ be an $n \times n$ matrix and suppose $\det(\lambda I - A) = 0$ for some $\lambda \in \mathbb{C}$. Then $\lambda$ is an eigenvalue of $A$ and thus there exists a nonzero vector $X \in \mathbb{C}^n$ such that $AX = \lambda X$.

Proof. For $A$ an $n \times n$ matrix, the method of Laplace Expansion demonstrates that $\det(\lambda I - A)$ is a polynomial of degree $n$. As such, the equation 7.2 has a solution $\lambda \in \mathbb{C}$ by the Fundamental Theorem of Algebra. The fact that $\lambda$ is an eigenvalue follows from Theorem 3.33 and is left as an exercise.

7.1.2. Finding Eigenvectors and Eigenvalues

Now that eigenvalues and eigenvectors have been defined, we will study how to find them for a matrix $A$.

First, consider the following definition.

Definition 7.4: Multiplicity of an Eigenvalue
Let $A$ be an $n \times n$ matrix with characteristic polynomial given by $\det(xI - A)$. Then, the multiplicity of an eigenvalue $\lambda$ of $A$ is the number of times $\lambda$ occurs as a root of that characteristic polynomial.

For example, suppose the characteristic polynomial of $A$ is given by $(x - 2)^2$. Solving for the roots of this polynomial, we set $(x - 2)^2 = 0$ and solve for $x$. We find that $\lambda = 2$ is a root that occurs twice. Hence, in this case, $\lambda = 2$ is an eigenvalue of $A$ of multiplicity equal to 2.

We will now look at how to find the eigenvalues and eigenvectors for a matrix $A$ in detail. The steps used are summarized in the following procedure.


Procedure 7.5: Finding Eigenvalues and Eigenvectors

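Let $A$ be an $n \times n$ matrix.

1. First, find the eigenvalues $\lambda$ of $A$ by solving the equation $\det(xI - A) = 0$.
2. For each $\lambda$, find the basic eigenvectors $X \neq 0$ by finding the basic solutions to $(\lambda I - A) X = 0$.

To verify your work, make sure that $AX = \lambda X$ for each $\lambda$ and associated eigenvector $X$.

We will explore these steps further in the following example.

Example 7.6: Find the Eigenvalues and Eigenvectors
Let $A = \begin{bmatrix} -5 & 2 \\ -7 & 4 \end{bmatrix}$. Find its eigenvalues and eigenvectors.

Solution. We will use Procedure 7.5. First we find the eigenvalues of $A$ by solving the equation
\[ \det(xI - A) = 0 \]
This gives
\[ \det\left( x \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \begin{bmatrix} -5 & 2 \\ -7 & 4 \end{bmatrix} \right) = 0 \]
\[ \det \begin{bmatrix} x + 5 & -2 \\ 7 & x - 4 \end{bmatrix} = 0 \]
Computing the determinant as usual, the result is
\[ x^2 + x - 6 = 0 \]
Solving this equation, we find that $\lambda_1 = 2$ and $\lambda_2 = -3$.

Now we need to find the basic eigenvectors for each $\lambda$. First we will find the eigenvectors for $\lambda_1 = 2$. We wish to find all vectors $X \neq 0$ such that $AX = 2X$. These are the solutions to $(2I - A)X = 0$.
\[ \left( 2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \begin{bmatrix} -5 & 2 \\ -7 & 4 \end{bmatrix} \right) \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]
\[ \begin{bmatrix} 7 & -2 \\ 7 & -2 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]
The augmented matrix for this system and corresponding reduced row-echelon form are given by
\[ \left[ \begin{array}{rr|r} 7 & -2 & 0 \\ 7 & -2 & 0 \end{array} \right] \rightarrow \cdots \rightarrow \left[ \begin{array}{rr|r} 1 & -\frac{2}{7} & 0 \\ 0 & 0 & 0 \end{array} \right] \]
The solution is any vector of the form
\[ \begin{bmatrix} \frac{2}{7}s \\ s \end{bmatrix} = s \begin{bmatrix} \frac{2}{7} \\ 1 \end{bmatrix} \]
Multiplying this vector by 7 we obtain a simpler description for the solution to this system, given by
\[ t \begin{bmatrix} 2 \\ 7 \end{bmatrix} \]
This gives the basic eigenvector for $\lambda_1 = 2$ as
\[ \begin{bmatrix} 2 \\ 7 \end{bmatrix} \]
To check, we verify that $AX = 2X$ for this basic eigenvector.
\[ \begin{bmatrix} -5 & 2 \\ -7 & 4 \end{bmatrix} \begin{bmatrix} 2 \\ 7 \end{bmatrix} = \begin{bmatrix} 4 \\ 14 \end{bmatrix} = 2 \begin{bmatrix} 2 \\ 7 \end{bmatrix} \]
This is what we wanted, so we know this basic eigenvector is correct.

Next we will repeat this process to find the basic eigenvector for $\lambda_2 = -3$. We wish to find all vectors $X \neq 0$ such that $AX = -3X$. These are the solutions to $((-3)I - A)X = 0$.
\[ \left( (-3) \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \begin{bmatrix} -5 & 2 \\ -7 & 4 \end{bmatrix} \right) \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]
\[ \begin{bmatrix} 2 & -2 \\ 7 & -7 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \]
The augmented matrix for this system and corresponding reduced row-echelon form are given by
\[ \left[ \begin{array}{rr|r} 2 & -2 & 0 \\ 7 & -7 & 0 \end{array} \right] \rightarrow \cdots \rightarrow \left[ \begin{array}{rr|r} 1 & -1 & 0 \\ 0 & 0 & 0 \end{array} \right] \]
The solution is any vector of the form
\[ \begin{bmatrix} s \\ s \end{bmatrix} = s \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]
This gives the basic eigenvector for $\lambda_2 = -3$ as
\[ \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]
To check, we verify that $AX = -3X$ for this basic eigenvector.
\[ \begin{bmatrix} -5 & 2 \\ -7 & 4 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} -3 \\ -3 \end{bmatrix} = -3 \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]
This is what we wanted, so we know this basic eigenvector is correct.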
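For a $2 \times 2$ matrix, the characteristic polynomial is $x^2 - (a + d)x + (ad - bc)$, so step 1 of Procedure 7.5 reduces to the quadratic formula. The sketch below applies this to the matrix of Example 7.6 and then checks the two basic eigenvectors. The function names are illustrative, and the code assumes the eigenvalues are real.

```python
import math

def eigenvalues_2x2(A):
    """Real eigenvalues of a 2x2 matrix from det(xI - A) = x^2 - tr(A)x + det(A)."""
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    root = math.sqrt(tr * tr - 4 * det)   # assumes a nonnegative discriminant
    return (tr + root) / 2, (tr - root) / 2

def mat_vec(A, X):
    """Multiply the matrix A (a list of rows) by the column vector X."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

A = [[-5, 2], [-7, 4]]
lam1, lam2 = eigenvalues_2x2(A)
print(lam1, lam2)

# Verify AX = lambda X for the basic eigenvectors found in the example.
print(mat_vec(A, [2, 7]))   # compare with 2 * [2, 7]
print(mat_vec(A, [1, 1]))   # compare with -3 * [1, 1]
```

For larger matrices the characteristic polynomial has higher degree, so this shortcut no longer applies and the general procedure is needed.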


The following is an example using Procedure 7.5 for a $3 \times 3$ matrix.

Example 7.7: Find the Eigenvalues and Eigenvectors
Find the eigenvalues and eigenvectors for the matrix
\[ A = \begin{bmatrix} 5 & -10 & -5 \\ 2 & 14 & 2 \\ -4 & -8 & 6 \end{bmatrix} \]

Solution. We will use Procedure 7.5. First we need to find the eigenvalues of $A$. Recall that they are the solutions of the equation
\[ \det(xI - A) = 0 \]
In this case the equation is
\[ \det\left( x \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \begin{bmatrix} 5 & -10 & -5 \\ 2 & 14 & 2 \\ -4 & -8 & 6 \end{bmatrix} \right) = 0 \]
which becomes
\[ \det \begin{bmatrix} x - 5 & 10 & 5 \\ -2 & x - 14 & -2 \\ 4 & 8 & x - 6 \end{bmatrix} = 0 \]
Using Laplace Expansion, compute this determinant and simplify. The result is the following equation.
\[ (x - 5)\left( x^2 - 20x + 100 \right) = 0 \]
Solving this equation, we find that the eigenvalues are $\lambda_1 = 5$, $\lambda_2 = 10$ and $\lambda_3 = 10$. Notice that 10 is a root of multiplicity two due to
\[ x^2 - 20x + 100 = (x - 10)^2 \]
Therefore, $\lambda_2 = 10$ is an eigenvalue of multiplicity two.

Now that we have found the eigenvalues for $A$, we can compute the eigenvectors.

First we will find the basic eigenvectors for $\lambda_1 = 5$. In other words, we want to find all nonzero vectors $X$ so that $AX = 5X$. This requires that we solve the equation $(5I - A) X = 0$ for $X$ as follows.

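\[ \left( 5 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \begin{bmatrix} 5 & -10 & -5 \\ 2 & 14 & 2 \\ -4 & -8 & 6 \end{bmatrix} \right) \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]
That is you need to find the solution to
\[ \begin{bmatrix} 0 & 10 & 5 \\ -2 & -9 & -2 \\ 4 & 8 & -1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]
By now this is a familiar problem. You set up the augmented matrix and row reduce to get the solution. Thus the matrix you must row reduce is
\[ \left[ \begin{array}{rrr|r} 0 & 10 & 5 & 0 \\ -2 & -9 & -2 & 0 \\ 4 & 8 & -1 & 0 \end{array} \right] \]
The reduced row-echelon form is
\[ \left[ \begin{array}{rrr|r} 1 & 0 & -\frac{5}{4} & 0 \\ 0 & 1 & \frac{1}{2} & 0 \\ 0 & 0 & 0 & 0 \end{array} \right] \]
and so the solution is any vector of the form
\[ \begin{bmatrix} \frac{5}{4}s \\ -\frac{1}{2}s \\ s \end{bmatrix} = s \begin{bmatrix} \frac{5}{4} \\ -\frac{1}{2} \\ 1 \end{bmatrix} \]
where $s \in \mathbb{R}$. If we multiply this vector by 4, we obtain a simpler description for the solution to this system, as given by
\[ t \begin{bmatrix} 5 \\ -2 \\ 4 \end{bmatrix} \qquad (7.3) \]
where $t \in \mathbb{R}$. Here, the basic eigenvector is given by
\[ X_1 = \begin{bmatrix} 5 \\ -2 \\ 4 \end{bmatrix} \]
Notice that we cannot let $t = 0$ here, because this would result in the zero vector and eigenvectors are never equal to 0! Other than this value, every other choice of $t$ in 7.3 results in an eigenvector.

It is a good idea to check your work! To do so, we will take the original matrix and multiply by the basic eigenvector $X_1$. We check to see if we get $5X_1$.
\[ \begin{bmatrix} 5 & -10 & -5 \\ 2 & 14 & 2 \\ -4 & -8 & 6 \end{bmatrix} \begin{bmatrix} 5 \\ -2 \\ 4 \end{bmatrix} = \begin{bmatrix} 25 \\ -10 \\ 20 \end{bmatrix} = 5 \begin{bmatrix} 5 \\ -2 \\ 4 \end{bmatrix} \]
This is what we wanted, so we know that our calculations were correct.

Next we will find the basic eigenvectors for $\lambda_2, \lambda_3 = 10$. These vectors are the basic solutions to the equation,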

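\[ \left( 10 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \begin{bmatrix} 5 & -10 & -5 \\ 2 & 14 & 2 \\ -4 & -8 & 6 \end{bmatrix} \right) \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]
That is you must find the solutions to
\[ \begin{bmatrix} 5 & 10 & 5 \\ -2 & -4 & -2 \\ 4 & 8 & 4 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]
Consider the augmented matrix
\[ \left[ \begin{array}{rrr|r} 5 & 10 & 5 & 0 \\ -2 & -4 & -2 & 0 \\ 4 & 8 & 4 & 0 \end{array} \right] \]
The reduced row-echelon form for this matrix is
\[ \left[ \begin{array}{rrr|r} 1 & 2 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right] \]
and so the eigenvectors are of the form
\[ \begin{bmatrix} -2s - t \\ s \\ t \end{bmatrix} = s \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \]
Note that you can't pick $t$ and $s$ both equal to zero because this would result in the zero vector and eigenvectors are never equal to zero.

Here, there are two basic eigenvectors, given by
\[ X_2 = \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}, \quad X_3 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \]
Taking any (nonzero) linear combination of $X_2$ and $X_3$ will also result in an eigenvector for the eigenvalue $\lambda = 10$. As in the case for $\lambda = 5$, always check your work! For the basic eigenvector $X_3$, we can check $AX_3 = 10X_3$ as follows.
\[ \begin{bmatrix} 5 & -10 & -5 \\ 2 & 14 & 2 \\ -4 & -8 & 6 \end{bmatrix} \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} -10 \\ 0 \\ 10 \end{bmatrix} = 10 \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \]
This is what we wanted. Checking the other basic eigenvector, $X_2$, is left as an exercise.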
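The claim that every nonzero linear combination of the basic eigenvectors for $\lambda = 10$ is again an eigenvector can be spot-checked numerically. This is a sketch in plain Python; the helper names and the particular values of $s$ and $t$ are arbitrary choices made for illustration.

```python
def mat_vec(A, X):
    """Multiply the matrix A (a list of rows) by the column vector X."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

def is_eigenvector(A, X, lam):
    """True when X is nonzero and AX equals lam times X."""
    return any(x != 0 for x in X) and mat_vec(A, X) == [lam * x for x in X]

A = [[5, -10, -5], [2, 14, 2], [-4, -8, 6]]
X2, X3 = [-2, 1, 0], [-1, 0, 1]

def combo(s, t):
    """The linear combination s*X2 + t*X3."""
    return [s * a + t * b for a, b in zip(X2, X3)]

for s, t in [(1, 0), (0, 1), (3, -2), (-1, 5)]:
    print(s, t, is_eigenvector(A, combo(s, t), 10))

print(is_eigenvector(A, combo(0, 0), 10))   # the zero vector is excluded
```

A handful of test values does not prove the statement, but a failed check would immediately reveal an arithmetic slip in the basic eigenvectors.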

It is important to remember that for any eigenvector $X$, $X \neq 0$. However, it is possible to have eigenvalues equal to zero. This is illustrated in the following example.

Example 7.8: A Zero Eigenvalue
Let
\[ A = \begin{bmatrix} 2 & 2 & -2 \\ 1 & 3 & -1 \\ -1 & 1 & 1 \end{bmatrix} \]
Find the eigenvalues and eigenvectors of $A$.

Solution. First we find the eigenvalues of $A$. We will do so using Definition 7.2.

In order to find the eigenvalues of $A$, we solve the following equation.
\[ \det(xI - A) = \det \begin{bmatrix} x - 2 & -2 & 2 \\ -1 & x - 3 & 1 \\ 1 & -1 & x - 1 \end{bmatrix} = 0 \]
This reduces to $x^3 - 6x^2 + 8x = 0$. You can verify that the solutions are $\lambda_1 = 0$, $\lambda_2 = 2$, $\lambda_3 = 4$. Notice that while eigenvectors can never equal 0, it is possible to have an eigenvalue equal to 0.

Now we will find the basic eigenvectors. For $\lambda_1 = 0$, we need to solve the equation $(0I - A) X = 0$. This equation becomes $-AX = 0$, and so the augmented matrix for finding the solutions is given by
\[ \left[ \begin{array}{rrr|r} -2 & -2 & 2 & 0 \\ -1 & -3 & 1 & 0 \\ 1 & -1 & -1 & 0 \end{array} \right] \]
The reduced row-echelon form is
\[ \left[ \begin{array}{rrr|r} 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right] \]
Therefore, the eigenvectors are of the form $t \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}$ where $t \neq 0$ and the basic eigenvector is given by
\[ X_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \]
We can verify that this eigenvector is correct by checking that the equation $AX_1 = 0X_1$ holds. The product $AX_1$ is given by
\[ AX_1 = \begin{bmatrix} 2 & 2 & -2 \\ 1 & 3 & -1 \\ -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]
This clearly equals $0X_1$, so the equation holds. Hence, $AX_1 = 0X_1$ and so 0 is an eigenvalue of $A$.

Computing the other basic eigenvectors is left as an exercise.
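The eigenvalues claimed in Example 7.8 can be checked by evaluating $\det(xI - A)$ at each candidate. The sketch below does this with a hand-written $3 \times 3$ determinant; `det3` and `char_poly_at` are names introduced here for illustration. It also shows that a zero eigenvalue goes together with $\det(A) = 0$.

```python
def det3(M):
    """Determinant of a 3x3 matrix by Laplace expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def char_poly_at(A, x):
    """Evaluate det(xI - A) at the number x for a 3x3 matrix A."""
    return det3([[(x if r == c else 0) - A[r][c] for c in range(3)]
                 for r in range(3)])

A = [[2, 2, -2], [1, 3, -1], [-1, 1, 1]]
values = {x: char_poly_at(A, x) for x in (0, 1, 2, 4)}
print(values)      # zero exactly at the eigenvalues among these test points
print(det3(A))     # zero, consistent with 0 being an eigenvalue
```

The test point $x = 1$ is included only to show a value that is not a root of the characteristic polynomial.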

In the following sections, we examine ways to simplify this process of finding eigenvalues and eigenvectors by using properties of special types of matrices.

7.1.3. Eigenvalues and Eigenvectors for Special Types of Matrices

There are three special kinds of matrices which we can use to simplify the process of finding eigenvalues and eigenvectors. Throughout this section, we will discuss similar matrices, elementary matrices, as well as triangular matrices.

We begin with a definition.

Definition 7.9: Similar Matrices
Let $A$ and $B$ be $n \times n$ matrices. Suppose there exists an invertible matrix $P$ such that
\[ A = P^{-1}BP \]
Then $A$ and $B$ are called similar matrices.

It turns out that we can use the concept of similar matrices to help us find the eigenvalues of matrices. Consider the following lemma.

Lemma 7.10: Similar Matrices and Eigenvalues
Let $A$ and $B$ be similar matrices, so that $A = P^{-1}BP$ where $A, B$ are $n \times n$ matrices and $P$ is invertible. Then $A, B$ have the same eigenvalues.

Proof. We need to show that if $A = P^{-1}BP$, then $A$ and $B$ have the same eigenvalues. Suppose $A = P^{-1}BP$ and $\lambda$ is an eigenvalue of $A$, that is $AX = \lambda X$ for some $X \neq 0$. Then
\[ P^{-1}BPX = \lambda X \]
and so
\[ BPX = \lambda PX \]
Since $P$ is one to one and $X \neq 0$, it follows that $PX \neq 0$. Here, $PX$ plays the role of the eigenvector in this equation. Thus $\lambda$ is also an eigenvalue of $B$. One can similarly verify that any eigenvalue of $B$ is also an eigenvalue of $A$, and thus both matrices have the same eigenvalues as desired.

Note that the converse does not hold in general: two matrices with the same eigenvalues need not be similar.

Note that this proof also demonstrates that the eigenvectors of $A$ and $B$ will (generally) be different. We see in the proof that $AX = \lambda X$, while $B(PX) = \lambda(PX)$. Therefore, for an eigenvalue $\lambda$, $A$ will have the eigenvector $X$ while $B$ will have the eigenvector $PX$.

The second special type of matrices we discuss in this section is elementary matrices. Recall from Definition 2.43 that an elementary matrix $E$ is obtained by applying one row operation to the identity matrix.

It is possible to use elementary matrices to simplify a matrix before searching for its eigenvalues and eigenvectors. This is illustrated in the following example.


Example 7.11: Simplify Using Elementary Matrices

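Find the eigenvalues for the matrix
\[ A = \begin{bmatrix} 33 & 105 & 105 \\ 10 & 28 & 30 \\ -20 & -60 & -62 \end{bmatrix} \]

Solution. This matrix has big numbers and therefore we would like to simplify as much as possible before computing the eigenvalues.

We will do so using row operations. First, add 2 times the second row to the third row. To do so, left multiply $A$ by $E(2, 2)$. Then right multiply $A$ by the inverse of $E(2, 2)$ as illustrated.
\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix} \begin{bmatrix} 33 & 105 & 105 \\ 10 & 28 & 30 \\ -20 & -60 & -62 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -2 & 1 \end{bmatrix} = \begin{bmatrix} 33 & -105 & 105 \\ 10 & -32 & 30 \\ 0 & 0 & -2 \end{bmatrix} \]
By Lemma 7.10, the resulting matrix has the same eigenvalues as $A$ where here, the matrix $E(2, 2)$ plays the role of $P$.

We do this step again, as follows. In this step, we use the elementary matrix obtained by adding $-3$ times the second row to the first row.
\[ \begin{bmatrix} 1 & -3 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 33 & -105 & 105 \\ 10 & -32 & 30 \\ 0 & 0 & -2 \end{bmatrix} \begin{bmatrix} 1 & 3 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 3 & 0 & 15 \\ 10 & -2 & 30 \\ 0 & 0 & -2 \end{bmatrix} \qquad (7.4) \]
Again by Lemma 7.10, this resulting matrix has the same eigenvalues as $A$. At this point, we can easily find the eigenvalues. Let
\[ B = \begin{bmatrix} 3 & 0 & 15 \\ 10 & -2 & 30 \\ 0 & 0 & -2 \end{bmatrix} \]
Then, we find the eigenvalues of $B$ (and therefore of $A$) by solving the equation $\det(xI - B) = 0$. You should verify that this equation becomes
\[ (x + 2)(x + 2)(x - 3) = 0 \]
Solving this equation results in eigenvalues of $\lambda_1 = -2$, $\lambda_2 = -2$, and $\lambda_3 = 3$. Therefore, these are also the eigenvalues of $A$.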

Through using elementary matrices, we were able to create a matrix for which finding the eigenvalues was easier than for $A$. At this point, you could go back to the original matrix $A$ and solve $(\lambda I - A) X = 0$ to obtain the eigenvectors of $A$.

Notice that when you multiply on the right by an elementary matrix, you are doing the column operation defined by the elementary matrix. In 7.4 multiplication by the elementary matrix on the right merely involves taking three times the first column and adding to the second. Thus, without referring to the elementary matrices, the transition to the new matrix in 7.4 can be illustrated by
\[ \begin{bmatrix} 33 & -105 & 105 \\ 10 & -32 & 30 \\ 0 & 0 & -2 \end{bmatrix} \rightarrow \begin{bmatrix} 3 & -9 & 15 \\ 10 & -32 & 30 \\ 0 & 0 & -2 \end{bmatrix} \rightarrow \begin{bmatrix} 3 & 0 & 15 \\ 10 & -2 & 30 \\ 0 & 0 & -2 \end{bmatrix} \]
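The two similarity steps of Example 7.11 can be replayed with integer matrix products. The sketch below is an illustration only: the variable names `E1`, `E2` and their inverses are labels chosen here for the two elementary matrices and are not notation from the text.

```python
def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[33, 105, 105], [10, 28, 30], [-20, -60, -62]]

# Add 2 times row 2 to row 3, and the inverse operation.
E1 = [[1, 0, 0], [0, 1, 0], [0, 2, 1]]
E1_inv = [[1, 0, 0], [0, 1, 0], [0, -2, 1]]

# Add -3 times row 2 to row 1, and the inverse operation.
E2 = [[1, -3, 0], [0, 1, 0], [0, 0, 1]]
E2_inv = [[1, 3, 0], [0, 1, 0], [0, 0, 1]]

step1 = mat_mul(mat_mul(E1, A), E1_inv)
B = mat_mul(mat_mul(E2, step1), E2_inv)
print(step1)
print(B)

# Similar matrices share their trace (the sum of the eigenvalues).
trace = lambda M: sum(M[i][i] for i in range(len(M)))
print(trace(A), trace(B))
```

Comparing traces is a cheap consistency check: since similar matrices have the same eigenvalues, a mismatch would signal an arithmetic error in one of the steps.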

The third special type of matrix we will consider in this section is the triangular matrix. Recall Definition 3.12 which states that an upper (lower) triangular matrix contains all zeros below (above) the main diagonal. Remember that finding the determinant of a triangular matrix is a simple procedure of taking the product of the entries on the main diagonal. It turns out that there is also a simple way to find the eigenvalues of a triangular matrix.

In the next example we will demonstrate that the eigenvalues of a triangular matrix are the entries on the main diagonal.

Example 7.12:

Solution. We need to solve the equation det (xI A) = 0 as follows

$\lambda_1 = 1$, $\lambda_2 = 4$ and $\lambda_3 = 6$. Thus the eigenvalues are the entries on the main diagonal of the original matrix.

The same result is true for lower triangular matrices. For any triangular matrix, the eigenvalues are equal to the entries on the main diagonal. To find the eigenvectors of a triangular matrix, we use the usual procedure.

In the next section, we explore an important process involving the eigenvalues and eigenvectors of a matrix.
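To see why the eigenvalues of a triangular matrix are its diagonal entries, it may help to write out the characteristic polynomial for a general $3 \times 3$ upper triangular matrix. The entries $a_{ij}$ below are generic symbols introduced for this sketch, not the matrix of a particular example.

```latex
\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix},
\qquad
\det(xI - A) = \det \begin{bmatrix} x - a_{11} & -a_{12} & -a_{13} \\ 0 & x - a_{22} & -a_{23} \\ 0 & 0 & x - a_{33} \end{bmatrix}
= (x - a_{11})(x - a_{22})(x - a_{33})
\]
```

Since $xI - A$ is itself upper triangular, its determinant is the product of its diagonal entries, and the roots of this product are exactly $a_{11}$, $a_{22}$ and $a_{33}$.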

7.1.4. Exercises

1. If $A$ is an invertible $n \times n$ matrix, compare the eigenvalues of $A$ and $A^{-1}$. More generally, for $m$ an arbitrary integer, compare the eigenvalues of $A$ and $A^m$.
2. If $A$ is an $n \times n$ matrix and $c$ is a nonzero constant, compare the eigenvalues of $A$ and $cA$.

X is an eigenvector of B. Show that then AX must also be an eigenvector for B.
4. Suppose $A$ is an $n \times n$ matrix and it satisfies $A^m = A$ for some $m$ a positive integer larger than 1. Show that if $\lambda$ is an eigenvalue of $A$ then $|\lambda|$ equals either 0 or 1.
5. Show that if $AX = \lambda X$ and $AY = \lambda Y$, then whenever $k, p$ are scalars,
\[ A(kX + pY) = \lambda(kX + pY) \]
Does this imply that $kX + pY$ is an eigenvector? Explain.
6. Suppose $A$ is a $3 \times 3$ matrix and the following information is available.

00A 1 = 0 1 11

11A 1 = 2 1 11

22A 3 = 2 3 22

1Find A 4 .3

7. Suppose A is a 3 3 matrix and the following information is available.

11A 2 = 1 2 22

11A 1 = 0 1 11

11A 4 = 2 4 33

3Find A 4 .3


8. Suppose A is a 3 3 matrix and the following information is available.

00A 1 = 2 1 11

11A 1 = 1 1 11

33A 5 = 3 5 44

2Find A 3 .3

9. Find the eigenvalues and eigenvectors of the matrix

6 92 12 00 0 2 31 4One eigenvalue is 2.

10. Find the eigenvalues and eigenvectors of the matrix

2 17 6 000 193One eigenvalue is 1.

11. Find the eigenvalues and eigenvectors of the matrix

928 2 6 2 82 5One eigenvalue is 3.

12. Find the eigenvalues and eigenvectors of the matrix

676 16 2 21 4 264 17One eigenvalue is 2.


13. Find the eigenvalues and eigenvectors of the matrix

352 8 11 4 10113One eigenvalue is -3.

14. If $A$ is the matrix of a linear transformation which rotates all vectors in $\mathbb{R}^2$ through $60^\circ$, explain why $A$ cannot have any real eigenvalues. Is there an angle such that rotation through this angle would have a real eigenvalue? What eigenvalues would be obtainable in this way?
15. Let $A$ be the $2 \times 2$ matrix of the linear transformation which rotates all vectors in $\mathbb{R}^2$ through an angle of $\theta$. For which values of $\theta$ does $A$ have a real eigenvalue?
16. Is it possible for a nonzero matrix to have only 0 as an eigenvalue?
17. Let $A$ be the $2 \times 2$ matrix of the linear transformation which rotates all vectors in $\mathbb{R}^2$ through an angle of $\theta$. For which values of $\theta$ does $A$ have a real eigenvalue?
18. Let $T$ be the linear transformation which reflects vectors about the $x$ axis. Find a matrix for $T$ and then find its eigenvalues and eigenvectors.
19. Let $T$ be the linear transformation which rotates all vectors in $\mathbb{R}^2$ counterclockwise through an angle of $\pi/2$. Find a matrix of $T$ and then find eigenvalues and eigenvectors.
20. Let $T$ be the linear transformation which reflects all vectors in $\mathbb{R}^3$ through the $xy$ plane. Find a matrix for $T$ and then obtain its eigenvalues and eigenvectors.

7.2 Diagonalization

Outcomes

A. Determine when it is possible to diagonalize a matrix.
B. When possible, diagonalize a matrix.

We begin this section by recalling Definition 7.9 of similar matrices. Recall that if $A, B$ are two $n \times n$ matrices, then they are similar if and only if there exists an invertible matrix $P$ such that
\[ A = P^{-1}BP \]
The following are important properties of similar matrices.

Proposition 7.13: Properties of Similarity

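Define for $n \times n$ matrices $A$, $B$ and $C$ the relation $A \sim B$ if $A$ is similar to $B$. Then
• $A \sim A$
• If $A \sim B$ then $B \sim A$
• If $A \sim B$ and $B \sim C$ then $A \sim C$

Proof. It is clear that $A \sim A$, taking $P = I$.

Now, if $A \sim B$, then for some $P$ invertible,
\[ A = P^{-1}BP \]
and so
\[ PAP^{-1} = B \]
But then
\[ \left( P^{-1} \right)^{-1} A P^{-1} = B \]
which shows that $B \sim A$ by Definition 7.9.

Now suppose $A \sim B$ and $B \sim C$. Then there exist invertible matrices $P, Q$ such that
\[ A = P^{-1}BP, \quad B = Q^{-1}CQ \]
Then,
\[ A = P^{-1}\left( Q^{-1}CQ \right)P = (QP)^{-1} C (QP) \]
showing that $A$ is similar to $C$ by Definition 7.9.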

When a matrix is similar to a diagonal matrix, the matrix is said to be diagonalizable.

We define a diagonal matrix $D$ as a matrix containing a zero in every entry except those on the main diagonal. More precisely, if $d_{ij}$ is the $ij$th entry of a diagonal matrix $D$, then $d_{ij} = 0$ unless $i = j$. Such matrices look like the following.
\[ D = \begin{bmatrix} * & & 0 \\ & \ddots & \\ 0 & & * \end{bmatrix} \]
where $*$ is a number which might not be zero.

The following is the formal definition of a diagonalizable matrix.

Definition 7.14: Diagonalizable
Let $A$ be an $n \times n$ matrix. Then $A$ is said to be diagonalizable if there exists an invertible matrix $P$ such that
\[ P^{-1}AP = D \]
where $D$ is a diagonal matrix.


Notice that the above equation can be rearranged as $A = PDP^{-1}$. Suppose we wanted to compute $A^{100}$. By diagonalizing $A$ first it suffices to then compute $\left( PDP^{-1} \right)^{100}$, which reduces to $PD^{100}P^{-1}$. This last computation is much simpler than $A^{100}$. While this process is described in detail later, it provides motivation for diagonalization.
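The saving promised by $A^{100} = PD^{100}P^{-1}$ can be illustrated on a small case. The sketch below is an assumption-laden example rather than the method described later in the text: it reuses the $2 \times 2$ matrix of Example 7.6, whose basic eigenvectors $(2, 7)$ and $(1, 1)$ form the columns of $P$, computes $P^{-1}$ by hand, and uses exact fractions to compare against repeated multiplication.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def mat_pow(A, n):
    """A^n for n >= 1 by repeated multiplication (the slow way)."""
    result = A
    for _ in range(n - 1):
        result = mat_mul(result, A)
    return result

def power_by_diagonalization(P, d, P_inv, n):
    """P D^n P^{-1}, where D is diagonal with the entries of d on its diagonal."""
    Dn = [[d[i] ** n if i == j else 0 for j in range(len(d))]
          for i in range(len(d))]
    return mat_mul(mat_mul(P, Dn), P_inv)

A = [[-5, 2], [-7, 4]]
P = [[2, 1], [7, 1]]                               # columns are eigenvectors
d = [2, -3]                                        # matching eigenvalues
P_inv = [[F(-1, 5), F(1, 5)], [F(7, 5), F(-2, 5)]]

fast = power_by_diagonalization(P, d, P_inv, 100)
slow = mat_pow(A, 100)
print(fast == slow)
```

Only the two scalars $2^{100}$ and $(-3)^{100}$ need to be raised to a power in the diagonalized form, which is the source of the simplification.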

7.2.1. Diagonalizing a Matrix

The most important theorem about diagonalizability is the following major result.

Theorem 7.15: Eigenvectors and Diagonalizable Matrices
An $n \times n$ matrix $A$ is diagonalizable if and only if there is an invertible matrix $P$ given by
\[ P = \begin{bmatrix} X_1 & X_2 & \cdots & X_n \end{bmatrix} \]
where the $X_k$ are eigenvectors of $A$.

Moreover if $A$ is diagonalizable, the corresponding eigenvalues of $A$ are the diagonal entries of the diagonal matrix $D$.

Proof. Suppose $P$ is given as above as an invertible matrix whose columns are eigenvectors of $A$. Then $P^{-1}$ is of the form

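where the columns are the $X_k$ and
\[ D = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} \]
Then
\[ AP = PD = \begin{bmatrix} X_1 & X_2 & \cdots & X_n \end{bmatrix} \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} \]
and so
\[ \begin{bmatrix} AX_1 & AX_2 & \cdots & AX_n \end{bmatrix} = \begin{bmatrix} \lambda_1 X_1 & \lambda_2 X_2 & \cdots & \lambda_n X_n \end{bmatrix} \]
showing the $X_k$ are eigenvectors of $A$ and the $\lambda_k$ are eigenvalues.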

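We demonstrate this concept in the next example. Note that not only are the columns of the matrix $P$ formed by eigenvectors, but $P$ must be invertible, so its columns must be linearly independent eigenvectors. We achieve this by using basic eigenvectors for the columns of $P$.

Example 7.16: Diagonalize a Matrix
Let
\[ A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 4 & -1 \\ -2 & -4 & 4 \end{bmatrix} \]
Find an invertible matrix $P$ and a diagonal matrix $D$ such that $P^{-1} A P = D$.

Solution. By Theorem 7.15 we use the eigenvectors of $A$ as the columns of $P$, and the corresponding eigenvalues of $A$ as the diagonal entries of $D$.
First, we will find the eigenvalues of $A$. To do so, we solve $\det (xI - A) = 0$ as follows.
\[ \det \left( x \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \begin{bmatrix} 2 & 0 & 0 \\ 1 & 4 & -1 \\ -2 & -4 & 4 \end{bmatrix} \right) = 0 \]
This computation is left as an exercise, and you should verify that the eigenvalues are $\lambda_1 = 2$, $\lambda_2 = 2$, and $\lambda_3 = 6$.

Next, we need to find the eigenvectors. We first find the eigenvectors for $\lambda_1, \lambda_2 = 2$. Solving $(2I - A) X = 0$, we find that the eigenvectors are
\[ t \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} + s \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \]
where $t, s$ are scalars. Hence there are two basic eigenvectors which are given by
\[ X_1 = \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}, \quad X_2 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} \]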

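You can verify that the basic eigenvector for $\lambda_3 = 6$ is $X_3 = \begin{bmatrix} 0 \\ 1 \\ -2 \end{bmatrix}$.
Then, we construct the matrix $P$ as follows.
\[ P = \begin{bmatrix} X_1 & X_2 & X_3 \end{bmatrix} = \begin{bmatrix} -2 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & -2 \end{bmatrix} \]
That is, the columns of $P$ are the basic eigenvectors of $A$. Then, you can verify that
\[ P^{-1} = \begin{bmatrix} -\frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\ \frac{1}{2} & 1 & \frac{1}{2} \\ \frac{1}{4} & \frac{1}{2} & -\frac{1}{4} \end{bmatrix} \]
Thus,
\[ P^{-1} A P = \begin{bmatrix} -\frac{1}{4} & \frac{1}{2} & \frac{1}{4} \\ \frac{1}{2} & 1 & \frac{1}{2} \\ \frac{1}{4} & \frac{1}{2} & -\frac{1}{4} \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 1 & 4 & -1 \\ -2 & -4 & 4 \end{bmatrix} \begin{bmatrix} -2 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & -2 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 6 \end{bmatrix} \]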
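The verification in Example 7.16 can also be carried out mechanically. The following is a minimal sketch in Python using exact fractions; the helper name `matmul` is chosen here for illustration and is not part of the text.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

# A and the matrix P of basic eigenvectors from Example 7.16.
A = [[2, 0, 0], [1, 4, -1], [-2, -4, 4]]
P = [[-2, 1, 0], [1, 0, 1], [0, 1, -2]]

# The inverse of P as stated in the example, kept exact with fractions.
P_inv = [[F(-1, 4), F(1, 2), F(1, 4)],
         [F(1, 2), F(1), F(1, 2)],
         [F(1, 4), F(1, 2), F(-1, 4)]]

identity_check = matmul(P_inv, P)   # should be the 3 x 3 identity
D = matmul(matmul(P_inv, A), P)     # should be diagonal with entries 2, 2, 6
print(D)
```

Swapping two columns of `P` (and the matching rows of `P_inv`) permutes the diagonal entries of `D` in the same way, which is why the order of eigenvalues must follow the order of eigenvectors.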

You can see that the result here is a diagonal matrix where the entries on the main diagonal are the eigenvalues of $A$. We expected this based on Theorem 7.15. Notice that eigenvalues on the main diagonal must be in the same order as the corresponding eigenvectors in $P$.

It is possible that a matrix $A$ cannot be diagonalized. In other words, we cannot find an invertible matrix $P$ so that $P^{-1} A P = D$.
Consider the following example.

Example 7.17: A Matrix which cannot be Diagonalized
Let
\[ A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} \]
If possible, find an invertible matrix $P$ and a diagonal matrix $D$ so that $P^{-1} A P = D$.

Solution. Since $\det (xI - A) = (x - 1)^2$, the only eigenvalue is $\lambda = 1$, of multiplicity two. Solving $(I - A) X = 0$ shows that the eigenvectors are of the form $t \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, so the only basic eigenvector is $X_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$.

In this case, the matrix $A$ has one eigenvalue of multiplicity two, but only one basic eigenvector. In order to diagonalize $A$, we need to construct an invertible $2 \times 2$ matrix $P$. However, because $A$ only has one basic eigenvector, we cannot construct this $P$. Notice that if we were to use $X_1$ as both columns of $P$, $P$ would not be invertible. For this reason, we cannot repeat eigenvectors in $P$.
Hence this matrix cannot be diagonalized.

Recall Definition 7.4 of the multiplicity of an eigenvalue. It turns out that we can determine when a matrix is diagonalizable based on the multiplicity of its eigenvalues. In order for $A$ to be diagonalizable, the number of basic eigenvectors associated with each eigenvalue must equal the multiplicity of that eigenvalue. In Example 7.17, $A$ had one eigenvalue $\lambda = 1$ of multiplicity 2. However, there was only one basic eigenvector associated with this eigenvalue. Therefore, we can see that $A$ is not diagonalizable.
We summarize this in the following theorem.

Theorem 7.18: Diagonalizability Condition
An $n \times n$ matrix $A$ is diagonalizable exactly when, for every eigenvalue of $A$, the number of basic eigenvectors associated with that eigenvalue equals the multiplicity of that eigenvalue.

You may wonder if there is a need to find $P^{-1}$, since we can use Theorem 7.15 to construct $P$ and $D$. We will see this is needed to compute high powers of matrices, which is one of the major applications of diagonalizability.
Before we do so, we first discuss complex eigenvalues.
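The condition in Theorem 7.18 can be checked by counting free variables in $(\lambda I - A)X = 0$. The sketch below is one possible way to do this with exact arithmetic; the function names and the $2 \times 2$ matrix `J` are hypothetical illustrations and do not come from the text, while `A` is the matrix of Example 7.16.

```python
from fractions import Fraction

def nullity(M):
    """Number of free variables of M X = 0, found by Gaussian elimination."""
    rows = [[Fraction(v) for v in row] for row in M]
    n_cols = len(rows[0])
    rank = 0
    for col in range(n_cols):
        # Look for a row at or below position `rank` with a non-zero entry here.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return n_cols - rank

def basic_eigenvector_count(A, lam):
    """Number of basic eigenvectors of A for the eigenvalue lam."""
    n = len(A)
    shifted = [[(lam if i == j else 0) - A[i][j] for j in range(n)]
               for i in range(n)]
    return nullity(shifted)

A = [[2, 0, 0], [1, 4, -1], [-2, -4, 4]]  # Example 7.16: eigenvalues 2, 2, 6
J = [[3, 1], [0, 3]]                      # hypothetical: eigenvalue 3 twice

count_2 = basic_eigenvector_count(A, 2)   # multiplicity 2
count_6 = basic_eigenvector_count(A, 6)   # multiplicity 1
count_J = basic_eigenvector_count(J, 3)   # multiplicity 2
print(count_2, count_6, count_J)
```

For `A` each count matches the multiplicity, so `A` is diagonalizable; for `J` the count falls short of the multiplicity, so `J` is not.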

7.2.2. Complex Eigenvalues

In some applications, a matrix may have eigenvalues which are complex numbers. For example, this often occurs in differential equations. These questions are approached in the same way as above.
Consider the following example.

Example 7.19: A Real Matrix with Complex Eigenvalues
Let

As usual, be sure to check your answers! To verify, we check that $A X_3 = (2 - i) X_3$.

Notice that in Example 7.19, two of the eigenvalues were given by $\lambda_2 = 2 + i$ and $\lambda_3 = 2 - i$. You may recall that these two complex numbers are conjugates. It turns out that whenever a matrix containing real entries has a complex eigenvalue $\lambda$, it also has an eigenvalue equal to $\overline{\lambda}$, the conjugate of $\lambda$.

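7.2.3. Exercises

1. Find the eigenvalues and eigenvectors of the matrix

7. Suppose $A$ is an $n \times n$ matrix and let $V$ be an eigenvector such that $AV = \lambda V$. Also suppose the characteristic polynomial of $A$ is
\[ \det (xI - A) = x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0 \]
Explain why
\[ \left( A^n + a_{n-1} A^{n-1} + \cdots + a_1 A + a_0 I \right) V = 0 \]
If $A$ is diagonalizable, give a proof of the Cayley Hamilton theorem based on this. This theorem says $A$ satisfies its characteristic equation,
\[ A^n + a_{n-1} A^{n-1} + \cdots + a_1 A + a_0 I = 0 \]

8. Suppose the characteristic polynomial of an $n \times n$ matrix $A$ is $1 - x^n$. Find $A^{mn}$ where $m$ is an integer.

9. Find the eigenvalues and eigenvectors of the matrix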

15 247 65 1 5876 20

One eigenvalue is 2. Diagonalize if possible. Hint: This one has some complex eigenvalues.

10. Find the eigenvalues and eigenvectors of the matrix

15 256 1323 4 91 155 30

One eigenvalue is 2. Diagonalize if possible. Hint: This one has some complex eigenvalues.

11. Find the eigenvalues and eigenvectors of the matrix

11 124

817 4 428 3

One eigenvalue is 1. Diagonalize if possible. Hint: This one has some complex eigenvalues.

12. Find the eigenvalues and eigenvectors of the matrix

14 125 62 1 6951 21

One eigenvalue is 3. Diagonalize if possible. Hint: This one has some complex eigenvalues.

13. Suppose $A$ is an $n \times n$ matrix consisting entirely of real entries but $a + ib$ is a complex eigenvalue having the eigenvector $X + iY$. Here $X$ and $Y$ are real vectors. Show that then $a - ib$ is also an eigenvalue with the eigenvector $X - iY$. Hint: You should remember that the conjugate of a product of complex numbers equals the product of the conjugates. Here $a + ib$ is a complex number whose conjugate equals $a - ib$.

7.3 Applications of Spectral Theory

Outcomes
A. Use diagonalization to find a high power of a matrix.
B. Use diagonalization to solve dynamical systems.

7.3.1. Raising a Matrix to a High Power

Suppose we have a matrix $A$ and we want to find $A^{50}$. One could try to multiply 50 copies of $A$ together, but this is computationally extremely intensive (try it!). However diagonalization allows us to compute high powers of a matrix relatively easily. Suppose $A$ is diagonalizable, so that $P^{-1} A P = D$. We can rearrange this equation to write $A = P D P^{-1}$.
Now, consider $A^2$. Since $A = P D P^{-1}$, it follows that
\[ A^2 = \left( P D P^{-1} \right)^2 = P D P^{-1} P D P^{-1} = P D^2 P^{-1} \]
Similarly,
\[ A^3 = \left( P D P^{-1} \right)^3 = P D P^{-1} P D P^{-1} P D P^{-1} = P D^3 P^{-1} \]
In general,
\[ A^n = \left( P D P^{-1} \right)^n = P D^n P^{-1} \]
Therefore, we have reduced the problem to finding $D^n$. In order to compute $D^n$, then because $D$ is diagonal we only need to raise every entry on the main diagonal of $D$ to the power of $n$.
Through this method, we can compute large powers of matrices. Consider the following example.

Example 7.20: Raising a Matrix to a High Power

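Let $A = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 0 \\ -1 & -1 & 1 \end{bmatrix}$. Find $A^{50}$.

Solution. We will first diagonalize $A$. The steps are left as an exercise and you may wish to verify that the eigenvalues of $A$ are $\lambda_1 = 1$, $\lambda_2 = 1$, and $\lambda_3 = 2$.
The basic eigenvectors corresponding to $\lambda_1, \lambda_2 = 1$ are
\[ X_1 = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \quad X_2 = \begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} \]
The basic eigenvector corresponding to $\lambda_3 = 2$ is
\[ X_3 = \begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} \]
Now we construct $P$ by using the basic eigenvectors of $A$ as the columns of $P$. Thus
\[ P = \begin{bmatrix} X_1 & X_2 & X_3 \end{bmatrix} = \begin{bmatrix} 0 & -1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \]
Then also
\[ P^{-1} = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{bmatrix} \]
which you may wish to verify.
Then,
\[ P^{-1} A P = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{bmatrix} \begin{bmatrix} 2 & 1 & 0 \\ 0 & 1 & 0 \\ -1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 0 & -1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} = D \]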

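Now it follows by rearranging the equation that
\[ A = P D P^{-1} = \begin{bmatrix} 0 & -1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{bmatrix} \]
Therefore,
\[ A^{50} = P D^{50} P^{-1} = \begin{bmatrix} 0 & -1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}^{50} \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{bmatrix} \]
By our discussion above, $D^{50}$ is found as follows.
\[ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix}^{50} = \begin{bmatrix} 1^{50} & 0 & 0 \\ 0 & 1^{50} & 0 \\ 0 & 0 & 2^{50} \end{bmatrix} \]
It follows that
\[ A^{50} = \begin{bmatrix} 0 & -1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1^{50} & 0 & 0 \\ 0 & 1^{50} & 0 \\ 0 & 0 & 2^{50} \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ -1 & -1 & 0 \end{bmatrix} = \begin{bmatrix} 2^{50} & -1 + 2^{50} & 0 \\ 0 & 1 & 0 \\ 1 - 2^{50} & 1 - 2^{50} & 1 \end{bmatrix} \]
Through diagonalization, we can efficiently compute a high power of $A$. Without this, we would be forced to multiply this by hand!
The next section explores another interesting application of diagonalization.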
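As a check on Example 7.20, the sketch below computes $A^{50}$ twice in Python: once as $P D^{50} P^{-1}$ and once by repeated multiplication. Python integers are exact, so the two results can be compared directly. The helper `matmul` and the variable names are illustrative choices.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

# Matrices from Example 7.20.
A = [[2, 1, 0], [0, 1, 0], [-1, -1, 1]]
P = [[0, -1, -1], [0, 1, 0], [1, 0, 1]]
P_inv = [[1, 1, 1], [0, 1, 0], [-1, -1, 0]]

n = 50
# D^n only requires raising the diagonal entries 1, 1, 2 to the power n.
D_n = [[1 ** n, 0, 0], [0, 1 ** n, 0], [0, 0, 2 ** n]]
via_diagonalization = matmul(matmul(P, D_n), P_inv)

# Brute force: start from the identity and multiply by A a total of n times.
brute_force = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
for _ in range(n):
    brute_force = matmul(brute_force, A)

# The closed form stated at the end of the example.
stated = [[2 ** 50, 2 ** 50 - 1, 0], [0, 1, 0], [1 - 2 ** 50, 1 - 2 ** 50, 1]]
print(via_diagonalization == brute_force == stated)
```

The diagonalization route uses two matrix products regardless of `n`, while the brute-force loop needs `n` of them.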

7.3.2. Raising a Symmetric Matrix to a High Power

We already have seen how to use matrix diagonalization to compute powers of matrices. This requires computing eigenvalues of the matrix $A$, and finding an invertible matrix of eigenvectors $P$ such that $P^{-1} A P$ is diagonal. In this section we will see that if the matrix $A$ is symmetric (see Definition 2.29), then we can actually find such a matrix $P$ that is an orthogonal matrix of eigenvectors. Thus $P^{-1}$ is simply its transpose $P^T$, and $P^T A P$ is diagonal. When this happens we say that $A$ is orthogonally diagonalizable.
In fact this happens if and only if $A$ is a symmetric matrix as shown in the following important theorem.

Theorem 7.21: Principal Axis Theorem
The following conditions are equivalent for an $n \times n$ matrix $A$:
1. $A$ is symmetric.
2. $A$ has an orthonormal set of $n$ eigenvectors.
3. $A$ is orthogonally diagonalizable.

Proof. The complete proof is beyond this course, but to give an idea assume that $A$ has an orthonormal set of eigenvectors, and let $P$ consist of these eigenvectors as columns. Then $P^{-1} = P^T$, and $P^T A P = D$, a diagonal matrix. But then $A = P D P^T$, and
\[ A^T = (P D P^T)^T = (P^T)^T D^T P^T = P D P^T = A \]
so $A$ is symmetric.
Now given a symmetric matrix $A$, one shows that eigenvectors corresponding to different eigenvalues are always orthogonal. So it suffices to apply the Gram-Schmidt process on the set of basic eigenvectors of each eigenvalue to obtain an orthonormal set of eigenvectors.

We demonstrate this in the following example.

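Solution. In this case, verify that the eigenvalues are 2 and 1. First we will find an eigenvector for the eigenvalue 2. This involves row reducing the following augmented matrix.
\[ \left[ \begin{array}{ccc|c} 2 - 1 & 0 & 0 & 0 \\ 0 & 2 - \frac{3}{2} & -\frac{1}{2} & 0 \\ 0 & -\frac{1}{2} & 2 - \frac{3}{2} & 0 \end{array} \right] \]
The reduced row-echelon form is
\[ \left[ \begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right] \]
and so an eigenvector is
\[ \begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix} \]
Finally to obtain an eigenvector of length one (unit eigenvector) we simply divide this vector by its length to yield:
\[ \begin{bmatrix} 0 \\ \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \]
Next consider the case of the eigenvalue 1. To obtain basic eigenvectors, the matrix which needs to be row reduced in this case is
\[ \left[ \begin{array}{ccc|c} 1 - 1 & 0 & 0 & 0 \\ 0 & 1 - \frac{3}{2} & -\frac{1}{2} & 0 \\ 0 & -\frac{1}{2} & 1 - \frac{3}{2} & 0 \end{array} \right] \]
The reduced row-echelon form is
\[ \left[ \begin{array}{ccc|c} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right] \]
Therefore, the eigenvectors are of the form
\[ \begin{bmatrix} s \\ -t \\ t \end{bmatrix} \]
Note that all these vectors are automatically orthogonal to eigenvectors corresponding to the first eigenvalue. This follows from the fact that $A$ is symmetric, as mentioned earlier.
We obtain basic eigenvectors
\[ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \text{ and } \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} \]
Since they are themselves orthogonal (by luck here) we do not need to use the Gram-Schmidt process and instead simply normalize these vectors to obtain
\[ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} \text{ and } \begin{bmatrix} 0 \\ -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix} \]
An orthogonal matrix $P$ to orthogonally diagonalize $A$ is then obtained by letting these basic vectors be the columns.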
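The orthonormal eigenvectors found above can be checked numerically. In the sketch below, the matrix `A` is read off from the augmented matrices in the solution (entries $1$, $\frac{3}{2}$ and $\frac{1}{2}$), and the order of the columns of `P` is a choice made here for illustration; a different order only permutes the diagonal of the result.

```python
import math

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def transpose(M):
    return [list(col) for col in zip(*M)]

def is_close(M, N, tol=1e-12):
    """Entrywise comparison that tolerates floating point rounding."""
    return all(abs(m - n) <= tol
               for row_m, row_n in zip(M, N)
               for m, n in zip(row_m, row_n))

s = 1 / math.sqrt(2)
A = [[1, 0, 0], [0, 1.5, 0.5], [0, 0.5, 1.5]]

# Columns: unit eigenvectors for the eigenvalues 2, 1, 1 (in that order).
P = [[0, 1, 0],
     [s, 0, -s],
     [s, 0, s]]

PtP = matmul(transpose(P), P)              # identity if P is orthogonal
D = matmul(matmul(transpose(P), A), P)     # diagonal with entries 2, 1, 1
print(is_close(D, [[2, 0, 0], [0, 1, 0], [0, 0, 1]]))
```

No matrix inverse is computed anywhere: because `P` is orthogonal, its transpose plays the role of $P^{-1}$.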

7.3.3. Markov Matrices

Some applications of great importance feature a special type of matrix. Matrices in which the entries are non-negative numbers and each column sums to one are called Markov matrices. An important application of Markov matrices is in population migration, as illustrated in the following definition.

Definition 7.24: Migration Matrices
Let $n$ locations be denoted by the numbers $1, 2, \cdots, n$. Suppose it is the case that each year the proportion of residents in location $j$ which move to location $i$ is $a_{ij}$. Also suppose no one leaves for, or arrives from, anywhere outside these $n$ locations. This last assumption requires $\sum_i a_{ij} = 1$, and means that the matrix $A$, such that $A = [a_{ij}]$, is a Markov matrix. In this context, $A$ is also called a migration matrix.

Consider the following example which demonstrates this situation.

Example 7.25: Migration Matrix

Let $A$ be the matrix given by
\[ A = \begin{bmatrix} .4 & .2 \\ .6 & .8 \end{bmatrix} \]
Verify that $A$ is a Markov matrix and describe the entries of $A$ in terms of population migration.

Solution. The columns of $A$ consist of non-negative numbers which sum to 1. Hence, $A$ is a Markov matrix.
Now, consider the entries $a_{ij}$ of $A$ in terms of population. The entry $a_{11} = .4$ is the proportion of residents in location one which stay in location one in a given time period. Entry $a_{21} = .6$ is the proportion of residents in location 1 which move to location 2 in the same time period. Entry $a_{12} = .2$ is the proportion of residents in location 2 which move to location 1. Finally, entry $a_{22} = .8$ is the proportion of residents in location 2 which stay in location 2 in this time period.
Considered as a Markov matrix, these numbers are usually identified with probabilities. Hence, we can say that the probability that a resident of location one will stay in location one in the time period is .4.

Observe that in Example 7.25 if there were initially, say, 15 thousand people in location 1 and 10 thousand in location 2, then there would be $.4 \times 15 + .2 \times 10 = 8$ thousand people in location 1 the following year, and similarly there would be $.6 \times 15 + .8 \times 10 = 17$ thousand people in location 2 the following year.

More generally let $X_n = [x_{1n} \cdots x_{mn}]^T$ where $x_{in}$ is the population of location $i$ at time period $n$. We call $X_n$ the state vector at period $n$. In particular, we call $X_0$ the initial state vector. Letting $A$ be the migration matrix, we compute the population in each location $i$ one time period later by $A X_n$. In order to find the population of location $i$ after $k$ years, we compute the $i^{th}$ component of $A^k X_0$. This discussion is summarized in the following theorem.

Theorem 7.26: State Vector
Let $A$ be the migration matrix of a population and let $X_n$ be the vector whose entries give the population of each location at time period $n$. Then $X_n$ is the state vector at period $n$ and it follows that
\[ X_{n+1} = A X_n \]

The sum of the entries of $X_n$ will equal the sum of the entries of the initial vector $X_0$. Since the columns of $A$ sum to 1, this sum is preserved for every multiplication by $A$ as demonstrated below.
\[ \sum_i \sum_j a_{ij} x_j = \sum_j x_j \left( \sum_i a_{ij} \right) = \sum_j x_j \]
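The update $X_{n+1} = A X_n$ and the preservation of the total population can be tried out on the numbers from Example 7.25. This is a small sketch with exact fractions; the function name `step` is an illustrative choice.

```python
from fractions import Fraction as F

def step(A, X):
    """One time period of migration: returns the state vector A X."""
    return [sum(a * x for a, x in zip(row, X)) for row in A]

# Migration matrix of Example 7.25, written with exact fractions.
A = [[F(4, 10), F(2, 10)],
     [F(6, 10), F(8, 10)]]

# Each column must sum to 1 for A to be a Markov matrix.
column_sums = [sum(col) for col in zip(*A)]

X0 = [15, 10]        # populations in thousands
X1 = step(A, X0)     # populations one year later
print(X1, sum(X1) == sum(X0))
```

Calling `step` repeatedly produces $X_2, X_3, \ldots$ in turn, and the total stays the same at every stage.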

Consider the following example.


Example 7.27: Using a Migration Matrix

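Consider the migration matrix
\[ A = \begin{bmatrix} .6 & 0 & .1 \\ .2 & .8 & 0 \\ .2 & .2 & .9 \end{bmatrix} \]
for locations 1, 2, and 3. Suppose initially there are 100 residents in location 1, 200 in location 2 and 400 in location 3. Find the population in the three locations after 1, 2, and 10 units of time.

Solution. Using Theorem 7.26 we can find the population in each location using the equation $X_{n+1} = A X_n$. For the population after 1 unit, we calculate $X_1 = A X_0$ as follows.
\[ X_1 = A X_0 \]
\[ \begin{bmatrix} x_{11} \\ x_{21} \\ x_{31} \end{bmatrix} = \begin{bmatrix} .6 & 0 & .1 \\ .2 & .8 & 0 \\ .2 & .2 & .9 \end{bmatrix} \begin{bmatrix} 100 \\ 200 \\ 400 \end{bmatrix} = \begin{bmatrix} 100 \\ 180 \\ 420 \end{bmatrix} \]
Therefore after one time period, location 1 has 100 residents, location 2 has 180, and location 3 has 420. Notice that the total population is unchanged; it simply migrates within the given locations. We find the populations after two time periods in the same way.
\[ X_2 = A X_1 \]
\[ \begin{bmatrix} x_{12} \\ x_{22} \\ x_{32} \end{bmatrix} = \begin{bmatrix} .6 & 0 & .1 \\ .2 & .8 & 0 \\ .2 & .2 & .9 \end{bmatrix} \begin{bmatrix} 100 \\ 180 \\ 420 \end{bmatrix} = \begin{bmatrix} 102 \\ 164 \\ 434 \end{bmatrix} \]
We could progress in this manner to find the populations after 10 time periods. However from our above discussion, we can simply calculate $(A^n X_0)_i$, where $n$ denotes the number of time periods which have passed. Therefore, we compute the populations in each location after 10 units of time as follows.
\[ X_{10} = A^{10} X_0 \]
\[ \begin{bmatrix} x_{1\,10} \\ x_{2\,10} \\ x_{3\,10} \end{bmatrix} = \begin{bmatrix} .6 & 0 & .1 \\ .2 & .8 & 0 \\ .2 & .2 & .9 \end{bmatrix}^{10} \begin{bmatrix} 100 \\ 200 \\ 400 \end{bmatrix} = \begin{bmatrix} 115.08582922 \\ 120.13067244 \\ 464.78349834 \end{bmatrix} \]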

Since we are speaking about populations, we would need to round these numbers to provide a logical answer. Therefore, we can say that after 10 units of time, there will be 115 residents in location one, 120 in location two, and 465 in location three.

Suppose we wish to know how many residents will be in a certain location after a very long time. It turns out that if some power of the migration matrix has all positive entries, then there is a vector $X_s$ such that $A^n X_0$ approaches $X_s$ as $n$ becomes very large. Hence as more time passes and $n$ increases, $A^n X_0$ will become closer to the vector $X_s$.

Consider Theorem 7.26. Let $n$ increase so that $X_n$ approaches $X_s$. As $X_n$ becomes closer to $X_s$, so too does $X_{n+1}$. For sufficiently large $n$, the statement $X_{n+1} = A X_n$ can be written as $X_s = A X_s$.
This discussion motivates the following theorem.

Theorem 7.28: Steady State Vector
Let $A$ be a migration matrix. Then there exists a steady state vector written $X_s$ such that
\[ X_s = A X_s \]
where $X_s$ has positive entries which have the same sum as the entries of $X_0$.
As $n$ increases, the state vectors $X_n$ will approach $X_s$.

Note that the condition in Theorem 7.28 can be written as $(I - A) X_s = 0$, representing a homogeneous system of equations.
Consider the following example. Notice that it is the same example as Example 7.27 but here it will involve a longer time frame.

Example 7.29: Populations over the Long Run
Consider the migration matrix
\[ A = \begin{bmatrix} .6 & 0 & .1 \\ .2 & .8 & 0 \\ .2 & .2 & .9 \end{bmatrix} \]
for locations 1, 2, and 3. Suppose initially there are 100 residents in location 1, 200 in location 2 and 400 in location 3. Find the population in the three locations after a long time.

Solution. By Theorem 7.28 the steady state vector $X_s$ can be found by solving the system $(I - A) X_s = 0$.
Thus we need to find a solution to
\[ \left( \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \begin{bmatrix} .6 & 0 & .1 \\ .2 & .8 & 0 \\ .2 & .2 & .9 \end{bmatrix} \right) \begin{bmatrix} x_{1s} \\ x_{2s} \\ x_{3s} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]
The augmented matrix and the resulting reduced row-echelon form are given by
\[ \left[ \begin{array}{ccc|c} 0.4 & 0 & -0.1 & 0 \\ -0.2 & 0.2 & 0 & 0 \\ -0.2 & -0.2 & 0.1 & 0 \end{array} \right] \rightarrow \cdots \rightarrow \left[ \begin{array}{ccc|c} 1 & 0 & -0.25 & 0 \\ 0 & 1 & -0.25 & 0 \\ 0 & 0 & 0 & 0 \end{array} \right] \]
Therefore, the solutions are of the form
\[ t \begin{bmatrix} 0.25 \\ 0.25 \\ 1 \end{bmatrix} \]
It remains to choose $t$ so that the entries have the same sum as the entries of $X_0$, that is, $0.25t + 0.25t + t = 100 + 200 + 400$. Solving gives $t = \frac{1400}{3}$. Therefore the population in the long run is given by
\[ \frac{1400}{3} \begin{bmatrix} 0.25 \\ 0.25 \\ 1 \end{bmatrix} \approx \begin{bmatrix} 116.67 \\ 116.67 \\ 466.67 \end{bmatrix} \]
Again, because we are working with populations, these values need to be rounded, here in such a way that the total remains 700. The steady state vector $X_s$ is given by
\[ \begin{bmatrix} 117 \\ 117 \\ 466 \end{bmatrix} \]
We can see that the numbers we calculated in Example 7.27 for the populations after the 10th unit of time are not far from the long term values.
Consider another example.

Example 7.30: Populations After a Long Time
Suppose a migration matrix is given by

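\[ A = \begin{bmatrix} \frac{1}{5} & \frac{1}{2} & \frac{1}{5} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{2} \\ \frac{11}{20} & \frac{1}{4} & \frac{3}{10} \end{bmatrix} \]
Find the comparison between the populations in the three locations after a long time.

Solution. In order to compare the populations in the long term, we want to find the steady state vector $X_s$. Solve
\[ \left( \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} - \begin{bmatrix} \frac{1}{5} & \frac{1}{2} & \frac{1}{5} \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{2} \\ \frac{11}{20} & \frac{1}{4} & \frac{3}{10} \end{bmatrix} \right) \begin{bmatrix} x_{1s} \\ x_{2s} \\ x_{3s} \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix} \]
The augmented matrix and the resulting reduced row-echelon form are given by
\[ \left[ \begin{array}{ccc|c} \frac{4}{5} & -\frac{1}{2} & -\frac{1}{5} & 0 \\ -\frac{1}{4} & \frac{3}{4} & -\frac{1}{2} & 0 \\ -\frac{11}{20} & -\frac{1}{4} & \frac{7}{10} & 0 \end{array} \right] \rightarrow \cdots \rightarrow \left[ \begin{array}{ccc|c} 1 & 0 & -\frac{16}{19} & 0 \\ 0 & 1 & -\frac{18}{19} & 0 \\ 0 & 0 & 0 & 0 \end{array} \right] \]
and so an eigenvector is
\[ \begin{bmatrix} 16 \\ 18 \\ 19 \end{bmatrix} \]
Therefore, the ratio of the population in location 2 to that in location 1 is $\frac{18}{16}$. The ratio of the population in location 3 to that in location 2 is $\frac{19}{18}$.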
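The steady state computations in Examples 7.29 and 7.30 both solve $(I - A)X_s = 0$ and then scale the solution. The sketch below does this with exact fractions. It assumes the solution space is one-dimensional and raises an error otherwise; the function name `steady_state` and the variable names are illustrative choices, not taken from the text.

```python
from fractions import Fraction as F

def steady_state(A, total):
    """Solve (I - A) X = 0 and scale X so that its entries add up to `total`.

    Assumes exactly one free variable, as in Examples 7.29 and 7.30.
    """
    n = len(A)
    M = [[F(int(i == j)) - F(A[i][j]) for j in range(n)] for i in range(n)]
    pivots = []
    row = 0
    for col in range(n):
        p = next((r for r in range(row, n) if M[r][col] != 0), None)
        if p is None:
            continue
        M[row], M[p] = M[p], M[row]
        M[row] = [v / M[row][col] for v in M[row]]      # make the pivot 1
        for r in range(n):
            if r != row and M[r][col] != 0:
                M[r] = [a - M[r][col] * b for a, b in zip(M[r], M[row])]
        pivots.append(col)
        row += 1
    free = [c for c in range(n) if c not in pivots]
    if len(free) != 1:
        raise ValueError("expected exactly one free variable")
    f = free[0]
    X = [F(0)] * n
    X[f] = F(1)                       # set the free variable to 1
    for r, c in enumerate(pivots):
        X[c] = -M[r][f]               # read the pivot variables off the RREF
    scale = F(total) / sum(X)
    return [scale * x for x in X]

A_729 = [[F(6, 10), F(0), F(1, 10)],
         [F(2, 10), F(8, 10), F(0)],
         [F(2, 10), F(2, 10), F(9, 10)]]
A_730 = [[F(1, 5), F(1, 2), F(1, 5)],
         [F(1, 4), F(1, 4), F(1, 2)],
         [F(11, 20), F(1, 4), F(3, 10)]]

long_run_729 = steady_state(A_729, 700)   # total population 100 + 200 + 400
ratios_730 = steady_state(A_730, 53)      # 53 = 16 + 18 + 19 gives whole numbers
print([float(x) for x in long_run_729], ratios_730)
```

Only the total population enters the computation; how the 700 residents were initially distributed among the locations does not affect the long run answer.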

Eigenvalues of Markov Matrices

The following is an important proposition.

Proposition 7.31: Eigenvalues of a Migration Matrix
Let $A = [a_{ij}]$ be a migration matrix. Then 1 is always an eigenvalue for $A$.

Proof. Remember that the determinant of a matrix always equals that of its transpose. Therefore,
\[ \det (xI - A) = \det \left( (xI - A)^T \right) = \det \left( xI - A^T \right) \]
because $I^T = I$. Thus the characteristic equation for $A$ is the same as the characteristic equation for $A^T$. Consequently, $A$ and $A^T$ have the same eigenvalues. We will show that 1 is an eigenvalue for $A^T$ and then it will follow that 1 is an eigenvalue for $A$.
Remember that for a migration matrix, $\sum_i a_{ij} = 1$. Therefore, if $A^T = [b_{ij}]$ with $b_{ij} = a_{ji}$, it follows that
\[ \sum_j b_{ij} = \sum_j a_{ji} = 1 \]
Therefore, from matrix multiplication,
\[ A^T \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} = \begin{bmatrix} \sum_j b_{1j} \\ \vdots \\ \sum_j b_{nj} \end{bmatrix} = \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix} \]
Notice that this shows that $\begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}$ is an eigenvector for $A^T$ corresponding to the eigenvalue $\lambda = 1$. As explained above, this shows that $\lambda = 1$ is an eigenvalue for $A$ because $A$ and $A^T$ have the same eigenvalues.
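Both steps of this proof can be observed on the migration matrix from Example 7.27. The sketch below checks that $A^T$ fixes the vector of ones and that $\det(1 \cdot I - A) = 0$; the helper `det3` is an illustrative cofactor expansion for $3 \times 3$ matrices only.

```python
from fractions import Fraction as F

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Migration matrix of Example 7.27 with exact fractions.
A = [[F(6, 10), F(0), F(1, 10)],
     [F(2, 10), F(8, 10), F(0)],
     [F(2, 10), F(2, 10), F(9, 10)]]

# Each row of A^T is a column of A, so A^T applied to the vector of ones
# simply returns the column sums of A.
A_T = [list(col) for col in zip(*A)]
ones = [1, 1, 1]
A_T_ones = [sum(a * x for a, x in zip(row, ones)) for row in A_T]

# 1 is an eigenvalue of A exactly when det(1*I - A) = 0.
I_minus_A = [[(1 if i == j else 0) - A[i][j] for j in range(3)] for i in range(3)]
det_value = det3(I_minus_A)
print(A_T_ones, det_value)
```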

7.3.4. Dynamical Systems

The migration matrices discussed above give an example of a discrete dynamical system. We call them discrete because they involve discrete values taken at a sequence of points rather than on a continuous interval of time.

An example of a situation which can be studied in this way is a predator prey model. Consider the following model where $x$ is the number of prey and $y$ the number of predators in a certain area at a certain time. These are functions of $n \in \mathbb{N}$ where $n = 1, 2, \cdots$ are the ends of intervals of time which may be of interest in the problem. In other words, $x(n)$ is the number of prey at the end of the $n^{th}$ interval of time. An example of this situation may be modeled by the following equation
\[ \begin{bmatrix} x(n+1) \\ y(n+1) \end{bmatrix} = \begin{bmatrix} 2 & -3 \\ 1 & 4 \end{bmatrix} \begin{bmatrix} x(n) \\ y(n) \end{bmatrix} \]
This says that from time period $n$ to $n + 1$, $x$ increases if there are more $x$ and decreases as there are more $y$. In the context of this example, this means that as the number of predators increases, the number of prey decreases. As for $y$, it increases if there are more $y$ and also if there are more $x$.
This is an example of a matrix recurrence which we define now.

Definition 7.32: Matrix Recurrence
Suppose a dynamical system is given by
\[ x_{n+1} = a x_n + b y_n \]
\[ y_{n+1} = c x_n + d y_n \]
This system can be expressed as $V_{n+1} = A V_n$ where $V_n = \begin{bmatrix} x_n \\ y_n \end{bmatrix}$ and $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$.

In this section, we will examine how to find solutions to a dynamical system given certain initial conditions. This process involves several concepts previously studied, including matrix diagonalization and Markov matrices. The procedure is given as follows. Recall that when diagonalized, we can write $A^n = P D^n P^{-1}$.

Procedure 7.33: Solving a Dynamical System
Suppose a dynamical system is given by
\[ x_{n+1} = a x_n + b y_n \]
\[ y_{n+1} = c x_n + d y_n \]
Given initial conditions $x_0$ and $y_0$, the solutions to the system are found as follows:
1. Express the dynamical system in the form $V_{n+1} = A V_n$.
2. Diagonalize $A$ to be written as $A = P D P^{-1}$.
3. Then $V_n = P D^n P^{-1} V_0$ where $V_0$ is the vector containing the initial conditions.
4. If given specific values for $n$, substitute into this equation. Otherwise, find a general solution for $n$.

We will now consider an example in detail.

Example 7.34: Solutions of a Discrete Dynamical System
Suppose a dynamical system is given by
\[ x_{n+1} = 1.5 x_n - 0.5 y_n \]
\[ y_{n+1} = 1.0 x_n \]
Express this system as a matrix recurrence and find solutions to the dynamical system for initial conditions $x_0 = 20$, $y_0 = 10$.

Solution. First, we express the system as a matrix recurrence.
\[ V_{n+1} = A V_n \]
\[ \begin{bmatrix} x(n+1) \\ y(n+1) \end{bmatrix} = \begin{bmatrix} 1.5 & -0.5 \\ 1.0 & 0 \end{bmatrix} \begin{bmatrix} x(n) \\ y(n) \end{bmatrix} \]
Then
\[ A = \begin{bmatrix} 1.5 & -0.5 \\ 1.0 & 0 \end{bmatrix} \]
You can verify that the eigenvalues of $A$ are 1 and .5. By diagonalizing, we can write $A$ in the form
\[ P D P^{-1} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & .5 \end{bmatrix} \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} \]

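Now given an initial condition
\[ V_0 = \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \]
the solution to the dynamical system is given by
\[ V_n = P D^n P^{-1} V_0 \]
\[ \begin{bmatrix} x(n) \\ y(n) \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & .5 \end{bmatrix}^n \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \]
\[ = \begin{bmatrix} 1 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & (.5)^n \end{bmatrix} \begin{bmatrix} 2 & -1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} \]
\[ = \begin{bmatrix} y_0 \left( (.5)^n - 1 \right) - x_0 \left( (.5)^n - 2 \right) \\ y_0 \left( 2 (.5)^n - 1 \right) - x_0 \left( 2 (.5)^n - 2 \right) \end{bmatrix} \]
If we let $n$ become arbitrarily large, this vector approaches
\[ \begin{bmatrix} 2 x_0 - y_0 \\ 2 x_0 - y_0 \end{bmatrix} \]
Thus for large $n$,
\[ \begin{bmatrix} x(n) \\ y(n) \end{bmatrix} \approx \begin{bmatrix} 2 x_0 - y_0 \\ 2 x_0 - y_0 \end{bmatrix} \]
Now suppose the initial condition is given by
\[ \begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = \begin{bmatrix} 20 \\ 10 \end{bmatrix} \]
Then, we can find solutions for various values of $n$. Here are the solutions for values of $n$ between 1 and 5.
\[ n = 1: \begin{bmatrix} 25.0 \\ 20.0 \end{bmatrix}, \quad n = 2: \begin{bmatrix} 27.5 \\ 25.0 \end{bmatrix}, \quad n = 3: \begin{bmatrix} 28.75 \\ 27.5 \end{bmatrix} \]
\[ n = 4: \begin{bmatrix} 29.375 \\ 28.75 \end{bmatrix}, \quad n = 5: \begin{bmatrix} 29.688 \\ 29.375 \end{bmatrix} \]
Notice that as $n$ increases, we approach the vector given by
\[ \begin{bmatrix} 2 x_0 - y_0 \\ 2 x_0 - y_0 \end{bmatrix} = \begin{bmatrix} 2(20) - 10 \\ 2(20) - 10 \end{bmatrix} = \begin{bmatrix} 30 \\ 30 \end{bmatrix} \]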
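The recurrence of Example 7.34 can be iterated directly and compared with the closed form obtained from $V_n = P D^n P^{-1} V_0$. The following sketch uses exact fractions so that the two agree exactly; the function names are illustrative choices.

```python
from fractions import Fraction as F

def iterate(x0, y0, n):
    """Apply x_{k+1} = 1.5 x_k - 0.5 y_k, y_{k+1} = x_k a total of n times."""
    x, y = F(x0), F(y0)
    for _ in range(n):
        x, y = F(3, 2) * x - F(1, 2) * y, x
    return x, y

def closed_form(x0, y0, n):
    """Entries of P D^n P^{-1} V_0, multiplied out as in the example."""
    r = F(1, 2) ** n
    x = y0 * (r - 1) - x0 * (r - 2)
    y = y0 * (2 * r - 1) - x0 * (2 * r - 2)
    return x, y

for n in range(1, 6):
    x, y = iterate(20, 10, n)
    print(n, float(x), float(y))

limit = (2 * 20 - 10, 2 * 20 - 10)   # the vector approached as n grows
```

The printed values reproduce the table for $n = 1, \ldots, 5$, with 29.6875 appearing where the text rounds to 29.688.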

The following example demonstrates another system which exhibits some interesting behavior. When we graph the solutions, it is possible for the ordered pairs to spiral around the origin.

Example 7.35: Finding Solutions to a Dynamical System
Suppose a dynamical system is of the form

Suppose the initial condition is

Then one obtains the following sequence of values which are graphed below by letting $n = 1, 2, \cdots, 20$

In this picture, the dots are the values and the dashed line is to help to picture what is happening.
These points are getting gradually closer to the origin, but they are circling the origin in the clockwise direction as they do so. As $n$ increases, the vector $\begin{bmatrix} x(n) \\ y(n) \end{bmatrix}$ approaches $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$.

This type of behavior along with complex eigenvalues is typical of the deviations from an equilibrium point in the Lotka Volterra system of differential equations which is a famous model for predator-prey interactions. These differential equations are given by
\[ x' = x (a - by) \]
\[ y' = -y (c - dx) \]
where $a, b, c, d$ are positive constants. For example, you might have $x$ be the population of moose and $y$ the population of wolves on an island.

Note that these equations make logical sense. The top says that the rate at which the moose population increases would be $ax$ if there were no predators $y$. However, this is modified by multiplying instead by $(a - by)$ because if there are predators, these will militate against the population of moose. The more predators there are, the more pronounced is this effect. As to the predator equation, you can see that the equations predict that if there are many prey around, then the rate of growth of the predators would seem to be high. However, this is modified by the term $-cy$ because if there are many predators, there would be competition for the available food supply and this would tend to decrease $y$.

The behavior near an equilibrium point, which is a point where the right side of the differential equations equals zero, is of great interest. In this case, the equilibrium point is
\[ x = \frac{c}{d}, \quad y = \frac{a}{b} \]
Then one defines new variables, again written $x$ and $y$, by replacing $x$ with $x + \frac{c}{d}$ and $y$ with $y + \frac{a}{b}$, so that the new variables measure the deviation from the equilibrium point.
In terms of these new variables, the differential equations become
\[ x' = \left( x + \frac{c}{d} \right) \left( a - b \left( y + \frac{a}{b} \right) \right) \]
\[ y' = -\left( y + \frac{a}{b} \right) \left( c - d \left( x + \frac{c}{d} \right) \right) \]
Multiplying out the right sides yields
\[ x' = -bxy - b \frac{c}{d} y \]
\[ y' = dxy + \frac{a}{b} dx \]
The interest is for $x, y$ small and so these equations are essentially equal to
\[ x' = -b \frac{c}{d} y, \quad y' = \frac{a}{b} dx \]
Replace $x'$ with the difference quotient $\frac{x(t+h) - x(t)}{h}$ where $h$ is a small positive number and $y'$ with a similar difference quotient. For example one could have $h$ correspond to one day or even one hour. Thus, for $h$ small enough, the following would seem to be a good approximation to the differential equations.
\[ x(t + h) = x(t) - hb \frac{c}{d} y \]
\[ y(t + h) = y(t) + h \frac{a}{b} dx \]
Let $1, 2, 3, \cdots$ denote the ends of discrete intervals of time having length $h$ chosen above. Then the above equations take the form
\[ \begin{bmatrix} x(n+1) \\ y(n+1) \end{bmatrix} = \begin{bmatrix} 1 & -\frac{hbc}{d} \\ \frac{had}{b} & 1 \end{bmatrix} \begin{bmatrix} x(n) \\ y(n) \end{bmatrix} \]
Note that the eigenvalues of this matrix are always complex.
We are not interested in time intervals of length $h$ for $h$ very small. Instead, we are interested in much longer lengths of time. Thus, replacing the time interval with $mh$,
\[ \begin{bmatrix} x(n+m) \\ y(n+m) \end{bmatrix} = \begin{bmatrix} 1 & -\frac{hbc}{d} \\ \frac{had}{b} & 1 \end{bmatrix}^m \begin{bmatrix} x(n) \\ y(n) \end{bmatrix} \]
For example, if $m = 2$, you would have
\[ \begin{bmatrix} x(n+2) \\ y(n+2) \end{bmatrix} = \begin{bmatrix} 1 - ach^2 & -2b \frac{c}{d} h \\ 2 \frac{a}{b} dh & 1 - ach^2 \end{bmatrix} \begin{bmatrix} x(n) \\ y(n) \end{bmatrix} \]
Note that most of the time, the eigenvalues of the new matrix will be complex.
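The claims about the one-step matrix and its square can be spot-checked numerically. In the sketch below the values of $a, b, c, d, h$ are hypothetical sample values chosen only for illustration. A $2 \times 2$ real matrix has non-real eigenvalues exactly when its characteristic polynomial has negative discriminant, which is the test used here.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def discriminant(T):
    """Discriminant of x^2 - trace(T) x + det(T) for a 2 x 2 matrix T."""
    trace = T[0][0] + T[1][1]
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    return trace * trace - 4 * det

# Hypothetical positive constants and a small time step.
a, b, c, d, h = F(2), F(3), F(5), F(7), F(1, 100)

# One-step matrix of the discretized system.
M = [[1, -h * b * c / d],
     [h * a * d / b, 1]]

M2 = matmul(M, M)
# The two-step matrix as written in the text.
stated_M2 = [[1 - a * c * h ** 2, -2 * b * (c / d) * h],
             [2 * (a / b) * d * h, 1 - a * c * h ** 2]]

print(M2 == stated_M2, discriminant(M) < 0, discriminant(M2) < 0)
```

For the one-step matrix the discriminant works out to $-4ach^2$, which is negative for all positive constants, in agreement with the statement that its eigenvalues are always complex.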

You can also notice that the upper right corner will be negative by considering higher powers of the matrix. Thus letting $1, 2, 3, \cdots$ denote the ends of discrete intervals of time, the desired discrete dynamical system is of the form
\[ \begin{bmatrix} x(n+1) \\ y(n+1) \end{bmatrix} = \begin{bmatrix} a & -b \\ c & d \end{bmatrix} \begin{bmatrix} x(n) \\ y(n) \end{bmatrix} \]
where $a, b, c, d$ are positive constants and the matrix will likely have complex eigenvalues because it is a power of a matrix which has complex eigenvalues.

You can see from the above discussion that if the eigenvalues of the matrix used to define the dynamical system are less than 1 in absolute value, then the origin is stable in the sense that as $n \to \infty$, the solution converges to the origin. If either eigenvalue is larger than 1 in absolute value, then the solutions to the dynamical system will usually be unbounded, unless the initial condition is chosen very carefully. The next example exhibits the case where one eigenvalue is larger than 1 and the other is smaller than 1.
The following example demonstrates a familiar concept as a dynamical system.

Example 7.36: The Fibonacci Sequence
The Fibonacci sequence is the sequence given by
\[ 1, 1, 2, 3, 5, \cdots \]
which is defined recursively in the form
\[ x(0) = 1 = x(1), \quad x(n+2) = x(n+1) + x(n) \]
Show how the Fibonacci Sequence can be considered a dynamical system.

Solution. This sequence is extremely important in the study of reproducing rabbits. It can be considered as a dynamical system as follows. Let $y(n) = x(n+1)$. Then the above recurrence relation can be written as
\[ \begin{bmatrix} x(n+1) \\ y(n+1) \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x(n) \\ y(n) \end{bmatrix}, \quad \begin{bmatrix} x(0) \\ y(0) \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \]
Let
\[ A = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} \]
The eigenvalues of the matrix $A$ are $\lambda_1 = \frac{1}{2} - \frac{1}{2}\sqrt{5}$ and $\lambda_2 = \frac{1}{2}\sqrt{5} + \frac{1}{2}$. The corresponding eigenvectors are, respectively,
\[ X_1 = \begin{bmatrix} -\frac{1}{2}\sqrt{5} - \frac{1}{2} \\ 1 \end{bmatrix}, \quad X_2 = \begin{bmatrix} \frac{1}{2}\sqrt{5} - \frac{1}{2} \\ 1 \end{bmatrix} \]

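You can see from a short computation that one of the eigenvalues is smaller than 1 in absolute value while the other is larger than 1 in absolute value. Now, diagonalizing $A$ with $P = \begin{bmatrix} X_1 & X_2 \end{bmatrix}$ gives us
\[ \begin{bmatrix} -\frac{1}{2}\sqrt{5} - \frac{1}{2} & \frac{1}{2}\sqrt{5} - \frac{1}{2} \\ 1 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} -\frac{1}{2}\sqrt{5} - \frac{1}{2} & \frac{1}{2}\sqrt{5} - \frac{1}{2} \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{2} - \frac{1}{2}\sqrt{5} & 0 \\ 0 & \frac{1}{2}\sqrt{5} + \frac{1}{2} \end{bmatrix} \]
Then it follows that for a given initial condition, the solution to this dynamical system is of the form
\[ \begin{bmatrix} x(n) \\ y(n) \end{bmatrix} = P \begin{bmatrix} \left( \frac{1}{2} - \frac{1}{2}\sqrt{5} \right)^n & 0 \\ 0 & \left( \frac{1}{2}\sqrt{5} + \frac{1}{2} \right)^n \end{bmatrix} P^{-1} \begin{bmatrix} x(0) \\ y(0) \end{bmatrix} \]
There is so much more that can be said about dynamical systems. It is a major topic of study in differential equations and what is given above is just an introduction.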
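The Fibonacci system of Example 7.36 can be computed in two ways: by iterating the matrix recurrence, and by expanding the initial condition in terms of eigenvectors. The sketch below takes the eigenvectors in the rescaled form $[1, \lambda]^T$, which is a convenience chosen here rather than the form used in the example; the coefficients `c1` and `c2` are then fixed by the initial condition $x(0) = y(0) = 1$. The second function uses floating point and rounds, so it is only reliable for moderate $n$.

```python
import math

def fib_by_recurrence(n):
    """x(n) from [x; y] -> A [x; y] with A = [[0, 1], [1, 1]] and x(0) = y(0) = 1."""
    x, y = 1, 1
    for _ in range(n):
        x, y = y, x + y
    return x

SQRT5 = math.sqrt(5)
LAM1 = (1 - SQRT5) / 2   # eigenvalue smaller than 1 in absolute value
LAM2 = (1 + SQRT5) / 2   # eigenvalue larger than 1 in absolute value

def fib_by_eigenvalues(n):
    """x(n) = c1 * LAM1**n + c2 * LAM2**n, where [1; 1] = c1 [1; LAM1] + c2 [1; LAM2]."""
    c1 = -LAM1 / SQRT5
    c2 = LAM2 / SQRT5
    return round(c1 * LAM1 ** n + c2 * LAM2 ** n)

first_terms = [fib_by_recurrence(n) for n in range(8)]
print(first_terms, fib_by_eigenvalues(30) == fib_by_recurrence(30))
```

Because $|\lambda_1| < 1$, the term involving `LAM1` shrinks toward zero, so for large $n$ the sequence grows essentially like a multiple of $\lambda_2^n$.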

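7.3.5. Exercises

1. Let $A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$. Diagonalize $A$ to find $A^{10}$.

2. Let $A = \begin{bmatrix} 1 & 4 & 1 \\ 0 & 2 & 5 \\ 0 & 0 & 5 \end{bmatrix}$. Diagonalize $A$ to find $A^{50}$.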

3. Let $A = \begin{bmatrix} 1 & -2 & -1 \\ 2 & -1 & 1 \\ -2 & 3 & 1 \end{bmatrix}$. Diagonalize $A$ to find $A^{100}$.

4. The following is a Markov (migration) matrix for three locations
\[ \begin{bmatrix} \frac{7}{10} & \frac{1}{9} & \frac{1}{5} \\ \frac{1}{10} & \frac{7}{9} & \frac{2}{5} \\ \frac{1}{5} & \frac{1}{9} & \frac{2}{5} \end{bmatrix} \]
(a) Initially, there are 90 people in location 1, 81 in location 2, and 85 in location 3. How many are in each location after one time period?
(b) The total number of individuals in the migration process is 256. After a long time, how many are in each location?

5. The following is a Markov (migration) matrix for three locations

\[ \begin{bmatrix} \frac{1}{5} & \frac{1}{5} & \frac{2}{5} \\ \frac{2}{5} & \frac{2}{5} & \frac{1}{5} \\ \frac{2}{5} & \frac{2}{5} & \frac{2}{5} \end{bmatrix} \]
(a) Initially, there are 130 individuals in location 1, 300 in location 2, and 70 in location 3. How many are in each location after two time periods?
(b) The total number of individuals in the migration process is 500. After a long time, how many are in each location?

6. The following is a Markov (migration) matrix for three locations

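\[ \begin{bmatrix} \frac{3}{10} & \frac{3}{8} & \frac{1}{3} \\ \frac{1}{10} & \frac{3}{8} & \frac{1}{3} \\ \frac{3}{5} & \frac{1}{4} & \frac{1}{3} \end{bmatrix} \]
The total number of individuals in the migration process is 480. After a long time, how many are in each location?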

7. The following is a Markov (migration) matrix for three locations
\[ \begin{bmatrix} \frac{3}{10} & \frac{1}{3} & \frac{1}{5} \\ \frac{3}{10} & \frac{1}{3} & \frac{7}{10} \\ \frac{2}{5} & \frac{1}{3} & \frac{1}{10} \end{bmatrix} \]
The total number of individuals in the migration process is 1155. After a long time, how many are in each location?

8. The following is a Markov (migration) matrix for three locations

\[ \begin{bmatrix} \frac{2}{5} & \frac{1}{10} & \frac{1}{8} \\ \frac{3}{10} & \frac{2}{5} & \frac{5}{8} \\ \frac{3}{10} & \frac{1}{2} & \frac{1}{4} \end{bmatrix} \]
The total number of individuals in the migration process is 704. After a long time, how many are in each location?

9. You own a trailer rental company in a large city and you have four locations, one in the South East, one in the North East, one in the North West, and one in the South West. Denote these locations by SE, NE, NW, and SW respectively. Suppose that the following table is observed to take place.
\[ \begin{array}{c|cccc} & SE & NE & NW & SW \\ \hline SE & \frac{1}{3} & \frac{1}{10} & \frac{1}{10} & \frac{1}{5} \\ NE & \frac{1}{3} & \frac{7}{10} & \frac{1}{5} & \frac{1}{10} \\ NW & \frac{2}{9} & \frac{1}{10} & \frac{3}{5} & \frac{1}{5} \\ SW & \frac{1}{9} & \frac{1}{10} & \frac{1}{10} & \frac{1}{2} \end{array} \]
In this table, the probability that a trailer starting at NE ends in NW is 1/10, the probability that a trailer starting at SW ends in NW is 1/5, and so forth. Approximately how many will you have in each location after a long time if the total number of trailers is 413?

10. You own a trailer rental company in a large city and you have four locations, one in the South East, one in the North East, one in the North West, and one in the South West. Denote these locations by SE, NE, NW, and SW respectively. Suppose that the

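following table is observed to take place.
\[ \begin{array}{c|cccc} & SE & NE & NW & SW \\ \hline SE & \frac{1}{7} & \frac{1}{4} & \frac{1}{10} & \frac{1}{5} \\ NE & \frac{2}{7} & \frac{1}{4} & \frac{1}{5} & \frac{1}{10} \\ NW & \frac{1}{7} & \frac{1}{4} & \frac{3}{5} & \frac{1}{5} \\ SW & \frac{3}{7} & \frac{1}{4} & \frac{1}{10} & \frac{1}{2} \end{array} \]
In this table, the probability that a trailer starting at NE ends in NW is 1/10, the probability that a trailer starting at SW ends in NW is 1/5, and so forth. Approximately how many will you have in each location after a long time if the total number of trailers is 1469?

11. The following table describes the transition probabilities between the states rainy, partly cloudy and sunny. The symbol p.c. indicates partly cloudy. Thus if it starts off p.c. it ends up sunny the next day with probability $\frac{1}{5}$. If it starts off sunny, it ends up sunny the next day with probability $\frac{2}{5}$ and so forth.
\[ \begin{array}{c|ccc} & \text{rains} & \text{sunny} & \text{p.c.} \\ \hline \text{rains} & \frac{1}{5} & \frac{1}{5} & \frac{1}{3} \\ \text{sunny} & \frac{1}{5} & \frac{2}{5} & \frac{1}{3} \\ \text{p.c.} & \frac{3}{5} & \frac{2}{5} & \frac{1}{3} \end{array} \]
Given this information, what are the probabilities that a given day is rainy, sunny, or partly cloudy?

12. The following table describes the transition probabilities between the states rainy, partly cloudy and sunny. The symbol p.c. indicates partly cloudy. Thus if it starts off p.c. it ends up sunny the next day with probability $\frac{1}{10}$. If it starts off sunny, it ends up sunny the next day with probability $\frac{2}{5}$ and so forth.
\[ \begin{array}{c|ccc} & \text{rains} & \text{sunny} & \text{p.c.} \\ \hline \text{rains} & \frac{1}{5} & \frac{1}{5} & \frac{1}{3} \\ \text{sunny} & \frac{1}{10} & \frac{2}{5} & \frac{4}{9} \\ \text{p.c.} & \frac{7}{10} & \frac{2}{5} & \frac{2}{9} \end{array} \]
Given this information, what are the probabilities that a given day is rainy, sunny, or partly cloudy?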

13. You own a trailer rental company in a large city and you have four locations, one inthe South East, one in the North East, one in the North West, and one in the SouthWest. Denote these locations by SE,NE,NW, and SW respectively. Suppose that thefollowing table is observed to take place.SE

NE

NW

SW

SE

511

110

110

15

NE

111

710

15

110

NW

211

110

35

15

SW

311

110

110

12

In this table, the probability that a trailer starting at NE ends in NW is 1/10, the probability that a trailer starting at SW ends in NW is 1/5, and so forth. Approximately how many will you have in each location after a long time if the total number of trailers is 407?

14. The University of Poohbah offers three degree programs, scouting education (SE), dance appreciation (DA), and engineering (E). It has been determined that the probabilities of transferring from one program to another are as in the following table.

        SE    DA    E
SE      .8    .1    .3
DA      .1    .7    .5
E       .1    .2    .2

where the number indicates the probability of transferring from the top program to the program on the left. Thus the probability of going from DA to E is .2. Find the probability that a student is enrolled in the various programs.

15. In the city of Nabal, there are three political persuasions, republicans (R), democrats (D), and neither one (N). The following table shows the transition probabilities between the political parties, the top row being the initial political party and the side row being the political affiliation the following year.

        R      D      N
R       1/5    1/6    2/7
D       1/5    1/3    4/7
N       3/5    1/2    1/7

Find the probabilities that a person will be identified with the various political persuasions. Which party will end up being most important?

16. The following table describes the transition probabilities between the states rainy, partly cloudy and sunny. The symbol p.c. indicates partly cloudy. Thus if it starts off p.c. it ends up sunny the next day with probability 1/5. If it starts off sunny, it ends up sunny the next day with probability 2/7 and so forth.

        rains   sunny   p.c.
rains   1/5     2/7     5/9
sunny   1/5     2/7     1/3
p.c.    3/5     3/7     1/9

Given this information, what are the probabilities that a given day is rainy, sunny, or partly cloudy?

7.4 Orthogonality

7.4.1. Orthogonal Diagonalization

We begin this section by recalling some important definitions. Recall from Definition 4.93 that non-zero vectors are called orthogonal if their dot product equals 0. A set is orthonormal if it is orthogonal and each vector is a unit vector.

An orthogonal matrix $U$, from Definition 4.97, is one in which $UU^T = I$. In other words, the transpose of an orthogonal matrix is equal to its inverse. A key characteristic of orthogonal matrices, which will be essential in this section, is that the columns of an orthogonal matrix form an orthonormal set.

We now recall another important definition.

Definition 7.37: Symmetric and Skew Symmetric Matrices
A real $n \times n$ matrix $A$ is symmetric if $A^T = A$. If $A = -A^T$, then $A$ is called skew symmetric.

Before proving an essential theorem, we first examine the following lemma which will be used below.

Lemma 7.38: The Dot Product
Let $A = [a_{ij}]$ be a real symmetric $n \times n$ matrix, and let $\vec{x}, \vec{y} \in \mathbb{R}^n$. Then
\[ A\vec{x} \cdot \vec{y} = \vec{x} \cdot A\vec{y} \]

Proof. This result follows from the definition of the dot product together with properties of matrix multiplication, as follows:
\[
A\vec{x} \cdot \vec{y} = \sum_{k,l} a_{kl} x_l y_k = \sum_{k,l} (a_{lk})^T x_l y_k = \vec{x} \cdot A^T \vec{y} = \vec{x} \cdot A\vec{y}
\]
The last step follows from $A^T = A$, since $A$ is symmetric.

We can now prove that the eigenvalues of a real symmetric matrix are real numbers. Consider the following important theorem.

Theorem 7.39: Orthogonal Eigenvectors
Let $A$ be a real symmetric matrix. Then the eigenvalues of $A$ are real numbers and eigenvectors corresponding to distinct eigenvalues are orthogonal.

Proof. Recall that for a complex number $a + ib$, the complex conjugate, denoted by $\overline{a + ib}$, is given by $\overline{a + ib} = a - ib$. The notation $\overline{\vec{x}}$ will denote the vector which has every entry replaced by its complex conjugate.

Suppose $A$ is a real symmetric matrix and $A\vec{x} = \lambda\vec{x}$. Then
\[
\overline{\lambda}\, \overline{\vec{x}}^T \vec{x} = \overline{(A\vec{x})}^T \vec{x} = \overline{\vec{x}}^T A^T \vec{x} = \overline{\vec{x}}^T A \vec{x} = \lambda\, \overline{\vec{x}}^T \vec{x}
\]
Dividing by $\overline{\vec{x}}^T \vec{x}$ on both sides yields $\overline{\lambda} = \lambda$, which says $\lambda$ is real. To do this, we need to ensure that $\overline{\vec{x}}^T \vec{x} \neq 0$. Notice that $\overline{\vec{x}}^T \vec{x} = 0$ if and only if $\vec{x} = \vec{0}$. Since we chose $\vec{x}$ such that $A\vec{x} = \lambda\vec{x}$, $\vec{x}$ is an eigenvector and therefore must be nonzero.

Now suppose $A$ is real symmetric and $A\vec{x} = \lambda\vec{x}$, $A\vec{y} = \mu\vec{y}$ where $\mu \neq \lambda$. Then since $A$ is symmetric, it follows from Lemma 7.38 about the dot product that
\[
\lambda\, \vec{x} \cdot \vec{y} = A\vec{x} \cdot \vec{y} = \vec{x} \cdot A\vec{y} = \vec{x} \cdot \mu\vec{y} = \mu\, \vec{x} \cdot \vec{y}
\]
Hence $(\lambda - \mu)\, \vec{x} \cdot \vec{y} = 0$. It follows that, since $\lambda - \mu \neq 0$, it must be that $\vec{x} \cdot \vec{y} = 0$. Therefore the eigenvectors form an orthogonal set.

The following theorem is proved in a similar manner.

Theorem 7.40: Eigenvalues of Skew Symmetric Matrix
The eigenvalues of a real skew symmetric matrix are either equal to 0 or are pure imaginary numbers.

Proof. First, note that if $A = 0$ is the zero matrix, then $A$ is skew symmetric and has eigenvalues equal to 0.

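Suppose $A = -A^T$ so $A$ is skew symmetric and $A\vec{x} = \lambda\vec{x}$. Then
\[
\overline{\lambda}\, \overline{\vec{x}}^T \vec{x} = \overline{(A\vec{x})}^T \vec{x} = \overline{\vec{x}}^T A^T \vec{x} = -\overline{\vec{x}}^T A \vec{x} = -\lambda\, \overline{\vec{x}}^T \vec{x}
\]
and so, dividing by $\overline{\vec{x}}^T \vec{x}$ as before, $\overline{\lambda} = -\lambda$. Letting $\lambda = a + ib$, this means $a - ib = -a - ib$ and so $a = 0$. Thus $\lambda$ is pure imaginary.

Consider the following example.

Example 7.41: Eigenvalues of a Skew Symmetric Matrix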

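Let $A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$. Find its eigenvalues.

Solution. First notice that $A$ is skew symmetric. By Theorem 7.40, the eigenvalues will either equal 0 or be pure imaginary. The eigenvalues of $A$ are obtained by solving the usual equation
\[
\det(xI - A) = \det \begin{bmatrix} x & 1 \\ -1 & x \end{bmatrix} = x^2 + 1 = 0
\]
Hence the eigenvalues are $\pm i$, which are pure imaginary.

Consider the following example.

Let $A = \begin{bmatrix} 1 & 2 \\ 2 & 3 \end{bmatrix}$. Find its eigenvalues.

Solution. First, notice that $A$ is symmetric. By Theorem 7.39, the eigenvalues will all be real. The eigenvalues of $A$ are obtained by solving the usual equation
\[
\det(xI - A) = \det \begin{bmatrix} x - 1 & -2 \\ -2 & x - 3 \end{bmatrix} = x^2 - 4x - 1 = 0
\]
The eigenvalues are given by $\lambda_1 = 2 + \sqrt{5}$ and $\lambda_2 = 2 - \sqrt{5}$, which are both real.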
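The calculation above can be checked numerically. The following is a minimal sketch, assuming a general symmetric matrix $\begin{bmatrix} a & b \\ b & d \end{bmatrix}$; the function name `symmetric_2x2_eigenvalues` is a hypothetical helper introduced only for this illustration. It also shows why the roots are real: the discriminant of the characteristic polynomial is a sum of squares.

```python
import math

def symmetric_2x2_eigenvalues(a, b, d):
    """Eigenvalues of [[a, b], [b, d]], the roots of x^2 - (a + d)x + (ad - b^2)."""
    trace = a + d
    # The discriminant trace^2 - 4*det simplifies to (a - d)^2 + 4b^2,
    # a sum of squares, so the square root below is always real.
    discriminant = (a - d) ** 2 + 4 * b * b
    root = math.sqrt(discriminant)
    return (trace + root) / 2, (trace - root) / 2

# The symmetric matrix from the example above: [[1, 2], [2, 3]]
lam1, lam2 = symmetric_2x2_eigenvalues(1, 2, 3)
print(lam1, lam2)  # numerically close to 2 + sqrt(5) and 2 - sqrt(5)
```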

Recall that a diagonal matrix $D = [d_{ij}]$ is one in which $d_{ij} = 0$ whenever $i \neq j$. In other words, all numbers not on the main diagonal are equal to zero.

Consider the following important theorem.

Theorem 7.43: Orthogonal Diagonalization
Let $A$ be a real symmetric matrix. Then there exists an orthogonal matrix $U$ such that
\[ U^T A U = D \]
where $D$ is a diagonal matrix. Moreover, the diagonal entries of $D$ are the eigenvalues of $A$.

We can use this theorem to diagonalize a symmetric matrix, using orthogonal matrices. Consider the following corollary.

Corollary 7.44: Orthonormal Set of Eigenvectors
If $A$ is a real $n \times n$ symmetric matrix, then there exists an orthonormal set of eigenvectors, $\{\vec{u}_1, \cdots, \vec{u}_n\}$.

Proof. Since $A$ is symmetric, then by Theorem 7.43, there exists an orthogonal matrix $U$ such that $U^T A U = D$, a diagonal matrix whose diagonal entries are the eigenvalues of $A$. Therefore, since $A$ is symmetric and all the matrices are real,
\[
\overline{D} = \overline{D^T} = \overline{U^T A^T U} = U^T A^T U = U^T A U = D
\]
showing $D$ is real because each entry of $D$ equals its complex conjugate.

Now let
\[
U = \begin{bmatrix} \vec{u}_1 & \vec{u}_2 & \cdots & \vec{u}_n \end{bmatrix}
\]
where the $\vec{u}_i$ denote the columns of $U$ and
\[
D = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}
\]
The equation $U^T A U = D$ implies $AU = UD$ and
\[
AU = \begin{bmatrix} A\vec{u}_1 & A\vec{u}_2 & \cdots & A\vec{u}_n \end{bmatrix}
= \begin{bmatrix} \lambda_1\vec{u}_1 & \lambda_2\vec{u}_2 & \cdots & \lambda_n\vec{u}_n \end{bmatrix} = UD
\]
where the entries denote the columns of $AU$ and $UD$ respectively. Therefore, $A\vec{u}_i = \lambda_i \vec{u}_i$. Since the matrix $U$ is orthogonal, the $ij^{th}$ entry of $U^T U$ equals $\delta_{ij}$ and so
\[
\delta_{ij} = \vec{u}_i^T \vec{u}_j = \vec{u}_i \cdot \vec{u}_j
\]
This proves the corollary because it shows the vectors $\{\vec{u}_i\}$ form an orthonormal set.

Example 7.45: Find an Orthonormal Set of Eigenvectors
Find an orthonormal set of eigenvectors for the symmetric matrix
\[
A = \begin{bmatrix} 17 & -2 & -2 \\ -2 & 6 & 4 \\ -2 & 4 & 6 \end{bmatrix}
\]

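Solution. Recall Procedure 7.5 for finding the eigenvalues and eigenvectors of a matrix. You can verify that the eigenvalues are 18, 9, 2. First find the eigenvector for 18 by solving the equation $(18I - A)X = 0$. The appropriate augmented matrix is given by
\[
\left[\begin{array}{ccc|c} 18 - 17 & 2 & 2 & 0 \\ 2 & 18 - 6 & -4 & 0 \\ 2 & -4 & 18 - 6 & 0 \end{array}\right]
\]
The reduced row-echelon form is
\[
\left[\begin{array}{ccc|c} 1 & 0 & 4 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]
\]
Therefore an eigenvector is
\[
\begin{bmatrix} -4 \\ 1 \\ 1 \end{bmatrix}
\]
Next find the eigenvector for $\lambda = 9$. The augmented matrix and resulting reduced row-echelon form are
\[
\left[\begin{array}{ccc|c} 9 - 17 & 2 & 2 & 0 \\ 2 & 9 - 6 & -4 & 0 \\ 2 & -4 & 9 - 6 & 0 \end{array}\right]
\rightarrow \cdots \rightarrow
\left[\begin{array}{ccc|c} 1 & 0 & -\frac{1}{2} & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]
\]
Thus an eigenvector for $\lambda = 9$ is
\[
\begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}
\]
Finally find an eigenvector for $\lambda = 2$. The appropriate augmented matrix and reduced row-echelon form are
\[
\left[\begin{array}{ccc|c} 2 - 17 & 2 & 2 & 0 \\ 2 & 2 - 6 & -4 & 0 \\ 2 & -4 & 2 - 6 & 0 \end{array}\right]
\rightarrow \cdots \rightarrow
\left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]
\]
Thus an eigenvector for $\lambda = 2$ is
\[
\begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix}
\]
The set of eigenvectors for $A$ is given by
\[
\left\{ \begin{bmatrix} -4 \\ 1 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} \right\}
\]
You can verify that these eigenvectors form an orthogonal set. By dividing each eigenvector by its magnitude, we obtain an orthonormal set:
\[
\left\{ \frac{1}{\sqrt{18}} \begin{bmatrix} -4 \\ 1 \\ 1 \end{bmatrix}, \frac{1}{3} \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}, \frac{1}{\sqrt{2}} \begin{bmatrix} 0 \\ -1 \\ 1 \end{bmatrix} \right\}
\]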

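Consider the following example.

Example 7.46: Repeated Eigenvalues
Find an orthonormal set of three eigenvectors for the matrix
\[
A = \begin{bmatrix} 10 & 2 & 2 \\ 2 & 13 & 4 \\ 2 & 4 & 13 \end{bmatrix}
\]
Solution. You can verify that the eigenvalues of $A$ are 9 (with multiplicity two) and 18 (with multiplicity one). Consider the eigenvectors corresponding to $\lambda = 9$. The appropriate augmented matrix and reduced row-echelon form are given by
\[
\left[\begin{array}{ccc|c} 9 - 10 & -2 & -2 & 0 \\ -2 & 9 - 13 & -4 & 0 \\ -2 & -4 & 9 - 13 & 0 \end{array}\right]
\rightarrow \cdots \rightarrow
\left[\begin{array}{ccc|c} 1 & 2 & 2 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]
\]
and so eigenvectors are of the form
\[
\begin{bmatrix} -2y - 2z \\ y \\ z \end{bmatrix}
\]
We need to find two of these which are orthogonal. Let one be given by setting $z = 0$ and $y = 1$, giving $\begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}$. An eigenvector orthogonal to this one must satisfy
\[
\begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix} \cdot \begin{bmatrix} -2y - 2z \\ y \\ z \end{bmatrix} = 5y + 4z = 0
\]
The values $y = -4$ and $z = 5$ satisfy this equation, giving another eigenvector corresponding to $\lambda = 9$ as
\[
\begin{bmatrix} -2(-4) - 2(5) \\ -4 \\ 5 \end{bmatrix} = \begin{bmatrix} -2 \\ -4 \\ 5 \end{bmatrix}
\]
Next find the eigenvector for $\lambda = 18$. The augmented matrix and the resulting reduced row-echelon form are given by
\[
\left[\begin{array}{ccc|c} 18 - 10 & -2 & -2 & 0 \\ -2 & 18 - 13 & -4 & 0 \\ -2 & -4 & 18 - 13 & 0 \end{array}\right]
\rightarrow \cdots \rightarrow
\left[\begin{array}{ccc|c} 1 & 0 & -\frac{1}{2} & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]
\]
and so an eigenvector is
\[
\begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}
\]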

Dividing each eigenvector by its length, the orthonormal set is
\[
\left\{ \frac{1}{\sqrt{5}} \begin{bmatrix} -2 \\ 1 \\ 0 \end{bmatrix}, \frac{\sqrt{5}}{15} \begin{bmatrix} -2 \\ -4 \\ 5 \end{bmatrix}, \frac{1}{3} \begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix} \right\}
\]
In the above solution, the repeated eigenvalue implies that there would have been many other orthonormal bases which could have been obtained. While we chose to take $z = 0, y = 1$, we could just as easily have taken $y = 0$ or even $y = z = 1$. Any such change would have resulted in a different orthonormal set.

Recall the following definition.

Definition 7.47: Diagonalizable
An $n \times n$ matrix $A$ is said to be non defective or diagonalizable if there exists an invertible matrix $P$ such that $P^{-1}AP = D$ where $D$ is a diagonal matrix.

As indicated in Theorem 7.43, if $A$ is a real symmetric matrix, there exists an orthogonal matrix $U$ such that $U^T A U = D$ where $D$ is a diagonal matrix. Therefore, every symmetric matrix is diagonalizable because if $U$ is an orthogonal matrix, it is invertible and its inverse is $U^T$. In this case, we say that $A$ is orthogonally diagonalizable. In the following example, this orthogonal matrix $U$ will be found.

Example 7.48: Diagonalize a Symmetric Matrix
Let $A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{3}{2} & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{3}{2} \end{bmatrix}$. Find an orthogonal matrix $U$ such that $U^T A U$ is a diagonal matrix.

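Solution. In this case, the eigenvalues are 2 (with multiplicity one) and 1 (with multiplicity two). First we will find an eigenvector for the eigenvalue 2. The appropriate augmented matrix and resulting reduced row-echelon form are given by
\[
\left[\begin{array}{ccc|c} 2 - 1 & 0 & 0 & 0 \\ 0 & 2 - \frac{3}{2} & -\frac{1}{2} & 0 \\ 0 & -\frac{1}{2} & 2 - \frac{3}{2} & 0 \end{array}\right]
\rightarrow \cdots \rightarrow
\left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]
\]
and so an eigenvector is
\[
\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}
\]
However, it is desired that the eigenvectors be unit vectors and so dividing this vector by its length gives
\[
\begin{bmatrix} 0 \\ \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}
\]
Next find the eigenvectors corresponding to the eigenvalue equal to 1. The appropriate augmented matrix and resulting reduced row-echelon form are given by:
\[
\left[\begin{array}{ccc|c} 1 - 1 & 0 & 0 & 0 \\ 0 & 1 - \frac{3}{2} & -\frac{1}{2} & 0 \\ 0 & -\frac{1}{2} & 1 - \frac{3}{2} & 0 \end{array}\right]
\rightarrow \cdots \rightarrow
\left[\begin{array}{ccc|c} 0 & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{array}\right]
\]
Therefore, the eigenvectors are of the form
\[
\begin{bmatrix} s \\ -t \\ t \end{bmatrix}
\]
Two of these which are orthonormal are $\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, choosing $s = 1$ and $t = 0$, and $\begin{bmatrix} 0 \\ -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \end{bmatrix}$, letting $s = 0$, $t = 1$ and normalizing the resulting vector.

To obtain the desired orthogonal matrix, we let the orthonormal eigenvectors computed above be the columns.
\[
U = \begin{bmatrix} 0 & 1 & 0 \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{bmatrix}
\]
To verify, compute $U^T A U$ as follows:
\[
U^T A U =
\begin{bmatrix} 0 & -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ 1 & 0 & 0 \\ 0 & \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & \frac{3}{2} & \frac{1}{2} \\ 0 & \frac{1}{2} & \frac{3}{2} \end{bmatrix}
\begin{bmatrix} 0 & 1 & 0 \\ -\frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & 0 & \frac{1}{\sqrt{2}} \end{bmatrix}
= \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} = D
\]
the desired diagonal matrix. Notice that the eigenvectors, which construct the columns of $U$, are in the same order as the eigenvalues in $D$.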
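The verification $U^T A U = D$ can also be carried out numerically. This is a small sketch under the assumption that $A$ has rows $(1, 0, 0)$, $(0, 3/2, 1/2)$, $(0, 1/2, 3/2)$ and that the columns of $U$ are the orthonormal eigenvectors $(0, -1/\sqrt{2}, 1/\sqrt{2})$, $(1, 0, 0)$, $(0, 1/\sqrt{2}, 1/\sqrt{2})$. The helpers `transpose` and `matmul` are illustrative names, written with plain lists.

```python
import math

s = 1 / math.sqrt(2)

# Assumed matrices for this example (see the lead-in above)
A = [[1, 0, 0],
     [0, 1.5, 0.5],
     [0, 0.5, 1.5]]
U = [[0, 1, 0],
     [-s, 0, s],
     [s, 0, s]]

def transpose(M):
    """Swap rows and columns."""
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    """Matrix product X Y for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

# D should be (up to rounding) the diagonal matrix with entries 1, 1, 2,
# in the same order as the eigenvector columns of U.
D = matmul(matmul(transpose(U), A), U)
for row in D:
    print([round(entry, 12) for entry in row])
```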

7.4.2. Positive Definite Matrices

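7.4.3. QR Factorization

In this section, a reliable factorization of matrices is studied. Called the QR factorization of a matrix, it always exists. While much can be said about the QR factorization, this section will be limited to real matrices. Therefore we assume the dot product used below is the usual dot product. We begin with a definition.

Definition 7.49: QR Factorization
Let $A$ be a real $m \times n$ matrix. Then a QR factorization of $A$ consists of two matrices, $Q$ orthogonal and $R$ upper triangular, such that $A = QR$.

The following theorem claims that such a factorization exists.

Theorem 7.50: Existence of QR Factorization
Let $A$ be any real $m \times n$ matrix with linearly independent columns. Then there exists an orthogonal matrix $Q$ and an upper triangular matrix $R$ having non-negative entries on the main diagonal such that
\[ A = QR \]
The procedure for obtaining the QR factorization for any matrix $A$ is as follows.

Procedure 7.51: QR Factorization
Let $A$ be an $m \times n$ matrix given by $A = \begin{bmatrix} A_1 & A_2 & \cdots & A_n \end{bmatrix}$ where the $A_i$ are the linearly independent columns of $A$.

1. Apply the Gram-Schmidt Process to the columns of $A$, writing $B_i$ for the resulting columns.

2. Normalize the $B_i$, to find $C_i = \frac{1}{\|B_i\|} B_i$.

3. Construct the orthogonal matrix $Q$ as $Q = \begin{bmatrix} C_1 & C_2 & \cdots & C_n \end{bmatrix}$.

4. Construct the upper triangular matrix $R$ as
\[
R = \begin{bmatrix}
\|B_1\| & A_2 \cdot C_1 & A_3 \cdot C_1 & \cdots & A_n \cdot C_1 \\
0 & \|B_2\| & A_3 \cdot C_2 & \cdots & A_n \cdot C_2 \\
0 & 0 & \|B_3\| & \cdots & A_n \cdot C_3 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & \|B_n\|
\end{bmatrix}
\]

5. Finally, write $A = QR$ where $Q$ is the orthogonal matrix and $R$ is the upper triangular matrix obtained above.

Notice that $Q$ is an orthogonal matrix as the $C_i$ form an orthonormal set. Since $\|B_i\| > 0$ for all $i$ (the $B_i$ are nonzero because the columns of $A$ are linearly independent), it follows that $R$ is an upper triangular matrix with positive entries on the main diagonal.

Consider the following example.

Example 7.52: Finding a QR Factorization
Let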

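\[
A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}
\]
Find an orthogonal matrix $Q$ and upper triangular matrix $R$ such that $A = QR$.

Solution. First, observe that $A_1$, $A_2$, the columns of $A$, are linearly independent. Therefore we can use the Gram-Schmidt Process to create a corresponding orthogonal set $\{B_1, B_2\}$ as follows:
\[
B_1 = A_1 = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}
\]
\[
B_2 = A_2 - \frac{A_2 \cdot B_1}{\|B_1\|^2} B_1
= \begin{bmatrix} 2 \\ 1 \\ 0 \end{bmatrix} - \frac{2}{2} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}
= \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}
\]
Normalize each vector to create the set $\{C_1, C_2\}$ as follows:
\[
C_1 = \frac{1}{\|B_1\|} B_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}
\qquad
C_2 = \frac{1}{\|B_2\|} B_2 = \frac{1}{\sqrt{3}} \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix}
\]
Now construct the orthogonal matrix $Q$ as
\[
Q = \begin{bmatrix} C_1 & C_2 \end{bmatrix}
= \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} \\ 0 & \frac{1}{\sqrt{3}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} \end{bmatrix}
\]
Finally, construct the upper triangular matrix $R$ as
\[
R = \begin{bmatrix} \|B_1\| & A_2 \cdot C_1 \\ 0 & \|B_2\| \end{bmatrix}
= \begin{bmatrix} \sqrt{2} & \sqrt{2} \\ 0 & \sqrt{3} \end{bmatrix}
\]
It is left to the reader to verify that $A = QR$.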
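The verification left to the reader can be sketched in code. The following is an illustrative implementation of the Gram-Schmidt steps used in this example, not a library routine: the function name `qr_from_columns` and the choice to store a matrix as a list of its columns are assumptions made for this sketch. It assumes the columns are linearly independent, so no $B_i$ has zero length.

```python
import math

def dot(u, v):
    """Usual dot product."""
    return sum(a * b for a, b in zip(u, v))

def qr_from_columns(columns):
    """Gram-Schmidt QR. Returns (C, R): C lists the orthonormal columns C_i of Q,
    and R is upper triangular, stored as a list of rows."""
    B = []
    for a in columns:
        b = list(a)
        for prev in B:
            # Subtract the projection of the original column onto each earlier B_i.
            coeff = dot(a, prev) / dot(prev, prev)
            b = [bi - coeff * p for bi, p in zip(b, prev)]
        B.append(b)
    C = [[x / math.sqrt(dot(b, b)) for x in b] for b in B]
    n = len(columns)
    # Entry (i, j) of R is A_j . C_i for j >= i; the diagonal equals ||B_i||.
    R = [[dot(columns[j], C[i]) if j >= i else 0.0 for j in range(n)]
         for i in range(n)]
    return C, R

# Columns A_1 = (1, 0, 1) and A_2 = (2, 1, 0) of the matrix in this example
A_columns = [[1, 0, 1], [2, 1, 0]]
C, R = qr_from_columns(A_columns)

# Column j of the product QR is the combination sum_i R[i][j] * C_i.
rebuilt = [[sum(R[i][j] * C[i][k] for i in range(len(C))) for k in range(3)]
           for j in range(len(A_columns))]
print(R)
print(rebuilt)
```

The rebuilt columns should agree with the columns of $A$ up to rounding, which is the statement $A = QR$.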

The QR Factorization and Eigenvalues

The QR factorization of a matrix has a very useful application. It turns out that it can be used repeatedly to estimate the eigenvalues of a matrix. Consider the following procedure.

Procedure 7.53: Using the QR Factorization to Estimate Eigenvalues
Let $A$ be an invertible matrix. Define the matrices $A_1, A_2, \cdots$ as follows:

1. $A_1 = A$ factored as $A_1 = Q_1 R_1$

2. $A_2 = R_1 Q_1$ factored as $A_2 = Q_2 R_2$

3. $A_3 = R_2 Q_2$ factored as $A_3 = Q_3 R_3$

Continue in this manner, where in general $A_k = Q_k R_k$ and $A_{k+1} = R_k Q_k$.

Each $A_{k+1} = R_k Q_k = Q_k^T A_k Q_k$ is similar to $A$. In many cases (for example, when the eigenvalues of $A$ are real with distinct absolute values), this sequence of $A_i$ converges to an upper triangular matrix. The eigenvalues of $A$ can then be approximated by the entries on the main diagonal of this upper triangular matrix.
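The iteration in Procedure 7.53 can be sketched as follows. This is a minimal illustration under stated assumptions, not a production eigenvalue routine: the QR step is done by Gram-Schmidt on a square invertible matrix, the names `qr_decompose`, `matmul` and `qr_eigenvalue_estimates` are hypothetical, the iteration count of 50 is an arbitrary choice, and the test matrix is the symmetric matrix with rows $(1, 2)$ and $(2, 3)$, whose eigenvalues $2 \pm \sqrt{5}$ have distinct absolute values.

```python
import math

def qr_decompose(A):
    """QR factorization of a square matrix with independent columns (Gram-Schmidt)."""
    n = len(A)
    cols = [[A[i][j] for i in range(n)] for j in range(n)]
    ortho = []  # orthonormal vectors C_1, C_2, ...
    for a in cols:
        b = a[:]
        for c in ortho:
            proj = sum(x * y for x, y in zip(a, c))
            b = [bi - proj * ci for bi, ci in zip(b, c)]
        norm = math.sqrt(sum(x * x for x in b))
        ortho.append([x / norm for x in b])
    Q = [[ortho[j][i] for j in range(n)] for i in range(n)]
    R = [[sum(ortho[i][k] * cols[j][k] for k in range(n)) if j >= i else 0.0
          for j in range(n)] for i in range(n)]
    return Q, R

def matmul(X, Y):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def qr_eigenvalue_estimates(A, iterations=50):
    """Repeat A_k = Q_k R_k, A_{k+1} = R_k Q_k and return the final diagonal."""
    Ak = [row[:] for row in A]
    for _ in range(iterations):
        Q, R = qr_decompose(Ak)
        Ak = matmul(R, Q)
    return [Ak[i][i] for i in range(len(Ak))]

estimates = qr_eigenvalue_estimates([[1.0, 2.0], [2.0, 3.0]])
print(estimates)  # diagonal entries approximating 2 + sqrt(5) and 2 - sqrt(5)
```

Convergence is not guaranteed for every invertible matrix, so the diagonal should be treated as an estimate whose quality depends on the matrix.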

7.4.4. Quadratic Forms

One of the applications of orthogonal diagonalization is that of quadratic forms and graphs of level curves of a quadratic form. This section has to do with rotation of axes so that with respect to the new axes, the graph of the level curve of a quadratic form is oriented parallel to the coordinate axes. This makes it much easier to understand. For example, we all know that $x_1^2 + x_2^2 = 1$ represents the equation in two variables whose graph in $\mathbb{R}^2$ is a circle of radius 1. But how do we know what the graph of the equation $5x_1^2 + 4x_1x_2 + 3x_2^2 = 1$ represents?

We first formally define what is meant by a quadratic form. In this section we will work with only real quadratic forms, which means that the coefficients will all be real numbers.

Definition 7.54: Quadratic Form
A quadratic form is a polynomial of degree two in $n$ variables $x_1, x_2, \cdots, x_n$, written as a linear combination of $x_i^2$ terms and $x_i x_j$ terms.

Consider the quadratic form $q = a_{11}x_1^2 + a_{22}x_2^2 + \cdots + a_{nn}x_n^2 + a_{12}x_1x_2 + \cdots$. We can write $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$ as the vector whose entries are the variables contained in the quadratic form.

Similarly, let $A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}$ be the matrix whose entries are the coefficients of $x_i^2$ and $x_i x_j$ from $q$. Note that the matrix $A$ is not unique, and we will consider this further in the example below. Using this matrix $A$, the quadratic form can be written as $q = \vec{x}^T A \vec{x}$.
\[
\begin{aligned}
q &= \vec{x}^T A \vec{x} \\
&= \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \\
&= \begin{bmatrix} x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\ a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\ \vdots \\ a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n \end{bmatrix} \\
&= a_{11}x_1^2 + a_{22}x_2^2 + \cdots + a_{nn}x_n^2 + (a_{12} + a_{21})x_1x_2 + \cdots
\end{aligned}
\]

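Let's explore how to find this matrix $A$. Consider the following example.

Example 7.55: Matrix of a Quadratic Form
Let a quadratic form $q$ be given by
\[ q = 6x_1^2 + 4x_1x_2 + 3x_2^2 \]
Write $q$ in the form $\vec{x}^T A \vec{x}$.

Solution. Let $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$. Expanding the product gives $\vec{x}^T A \vec{x} = a_{11}x_1^2 + (a_{21} + a_{12})x_1x_2 + a_{22}x_2^2$. Comparing this with $q$, we need $a_{11} = 6$, $a_{22} = 3$ and $a_{21} + a_{12} = 4$.

This demonstrates that the matrix $A$ is not unique, as there are several correct solutions to $a_{21} + a_{12} = 4$. However, we will always choose the coefficients such that $a_{21} = a_{12} = \frac{1}{2}(a_{21} + a_{12})$. This results in $a_{21} = a_{12} = 2$. This choice is key, as it will ensure that $A$ turns out to be a symmetric matrix.

Hence,
\[
A = \begin{bmatrix} 6 & 2 \\ 2 & 3 \end{bmatrix}
\]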

The above procedure for choosing $A$ to be symmetric applies for any quadratic form $q$. We will always choose coefficients such that $a_{ij} = a_{ji}$.

We now turn our attention to the focus of this section. Our goal is to start with a quadratic form $q$ as given above and find a way to rewrite it to eliminate the $x_i x_j$ terms. This is done through a change of variables. In other words, we wish to find $y_i$ such that
\[
q = d_{11}y_1^2 + d_{22}y_2^2 + \cdots + d_{nn}y_n^2
\]
Letting $\vec{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$ and $D = [d_{ij}]$, we can write $q = \vec{y}^T D \vec{y}$ where $D$ is the matrix of coefficients from $q$. There is something special about this matrix $D$ that is crucial. Since no $y_i y_j$ terms exist in $q$, it follows that $d_{ij} = 0$ for all $i \neq j$. Therefore, $D$ is a diagonal matrix. Through this change of variables, we find the principal axes $y_1, y_2, \cdots, y_n$ of the quadratic form.

This discussion sets the stage for the following essential theorem.

Theorem 7.56: Diagonalizing a Quadratic Form
Let $q$ be a quadratic form in the variables $x_1, \cdots, x_n$. It follows that $q$ can be written in the form $q = \vec{x}^T A \vec{x}$ where
\[
\vec{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
\]
and $A = [a_{ij}]$ is the symmetric matrix of coefficients of $q$.

New variables $y_1, y_2, \cdots, y_n$ can be found such that $q = \vec{y}^T D \vec{y}$ where
\[
\vec{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}
\]
and $D = [d_{ij}]$ is a diagonal matrix. The matrix $D$ contains the eigenvalues of $A$ and is found by orthogonally diagonalizing $A$.

While not a formal proof, the following discussion should convince you that the above theorem holds. Let $q$ be a quadratic form in the variables $x_1, \cdots, x_n$. Then, $q$ can be written in the form $q = \vec{x}^T A \vec{x}$ for a symmetric matrix $A$. By Theorem 7.43 we can orthogonally diagonalize the matrix $A$ such that $U^T A U = D$ for an orthogonal matrix $U$ and diagonal matrix $D$.
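The remaining step of this discussion can be written out as a short derivation. It assumes the new variables are defined by $\vec{y} = U^T \vec{x}$, so that $\vec{x} = U\vec{y}$ because $UU^T = I$; this matches the relation $\vec{x} = U\vec{y}$ used in the procedure for diagonalizing a quadratic form.

```latex
\begin{aligned}
q &= \vec{x}^T A \vec{x} \\
  &= (U\vec{y})^T A (U\vec{y}) \\
  &= \vec{y}^T \left( U^T A U \right) \vec{y} \\
  &= \vec{y}^T D \vec{y} \\
  &= d_{11} y_1^2 + d_{22} y_2^2 + \cdots + d_{nn} y_n^2
\end{aligned}
```

The last line has no $y_i y_j$ cross terms precisely because $D$ is diagonal.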

The following procedure details the steps for the change of variables given in the above theorem.

Procedure 7.57: Diagonalizing a Quadratic Form
Let $q$ be a quadratic form in the variables $x_1, \cdots, x_n$ given by
\[
q = a_{11}x_1^2 + a_{22}x_2^2 + \cdots + a_{nn}x_n^2 + a_{12}x_1x_2 + \cdots
\]
Then, $q$ can be written as $q = d_{11}y_1^2 + \cdots + d_{nn}y_n^2$ as follows:

1. Write $q = \vec{x}^T A \vec{x}$ for a symmetric matrix $A$.

2. Orthogonally diagonalize $A$ to be written as $U^T A U = D$ for an orthogonal matrix $U$ and diagonal matrix $D$.

3. Write $\vec{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$. Then, $\vec{x} = U\vec{y}$.

4. The quadratic form $q$ will now be given by
\[
q = d_{11}y_1^2 + \cdots + d_{nn}y_n^2 = \vec{y}^T D \vec{y}
\]
where $D = [d_{ij}]$ is the diagonal matrix found by orthogonally diagonalizing $A$.

Consider the following example.

Example 7.58: Choosing New Axes to Simplify a Quadratic Form
Consider the following level curve
\[ 6x_1^2 + 4x_1x_2 + 3x_2^2 = 7 \]
shown in the following graph.

[Figure: graph of the level curve with axes $x_1$ and $x_2$]

Use a change of variables to choose new axes such that the ellipse is oriented parallel to the new coordinate axes. In other words, use a change of variables to rewrite $q$ to eliminate the $x_1x_2$ term.

Solution. Notice that the level curve is given by $q = 7$ for $q = 6x_1^2 + 4x_1x_2 + 3x_2^2$. This is the same quadratic form that we examined earlier in Example 7.55. Therefore we know that we can write $q = \vec{x}^T A \vec{x}$ for the matrix
\[
A = \begin{bmatrix} 6 & 2 \\ 2 & 3 \end{bmatrix}
\]
Now we want to orthogonally diagonalize $A$ to write $U^T A U = D$ for an orthogonal matrix $U$ and diagonal matrix $D$. The details are left to the reader, and you can verify that the resulting matrices are
\[
U = \begin{bmatrix} \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \end{bmatrix}
\qquad
D = \begin{bmatrix} 7 & 0 \\ 0 & 2 \end{bmatrix}
\]
Next we write $\vec{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$. It follows that $\vec{x} = U\vec{y}$.

We can now express the quadratic form $q$ in terms of $y$, using the entries from $D$ as coefficients as follows:
\[
q = d_{11}y_1^2 + d_{22}y_2^2 = 7y_1^2 + 2y_2^2
\]
Hence the level curve can be written $7y_1^2 + 2y_2^2 = 7$. The graph of this equation is given by:

[Figure: graph of the level curve with axes $y_1$ and $y_2$]

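The change of variables results in new axes such that with respect to the new axes, the ellipse is oriented parallel to the coordinate axes. These are called the principal axes of the quadratic form.

The following is another example of diagonalizing a quadratic form.

Example 7.59: Choosing New Axes to Simplify a Quadratic Form
Consider the level curve
\[ 5x_1^2 - 6x_1x_2 + 5x_2^2 = 8 \]
shown in the following graph.

[Figure: graph of the level curve with axes $x_1$ and $x_2$]

Use a change of variables to choose new axes such that the ellipse is oriented parallel to the new coordinate axes. In other words, use a change of variables to rewrite $q$ to eliminate the $x_1x_2$ term.

Solution. First, express the level curve as $\vec{x}^T A \vec{x}$ where $\vec{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$ and $A$ is symmetric. Let $A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}$. Then $q = \vec{x}^T A \vec{x}$ is given by
\[
q = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
= a_{11}x_1^2 + (a_{12} + a_{21})x_1x_2 + a_{22}x_2^2
\]
Equating this to the given description for $q$, we have
\[
5x_1^2 - 6x_1x_2 + 5x_2^2 = a_{11}x_1^2 + (a_{12} + a_{21})x_1x_2 + a_{22}x_2^2
\]
This implies that $a_{11} = 5$, $a_{22} = 5$ and in order for $A$ to be symmetric, $a_{12} = a_{21} = \frac{1}{2}(a_{12} + a_{21}) = -3$. The result is $A = \begin{bmatrix} 5 & -3 \\ -3 & 5 \end{bmatrix}$. We can write $q = \vec{x}^T A \vec{x}$ as
\[
\begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 5 & -3 \\ -3 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = 8
\]
Next, orthogonally diagonalize the matrix $A$ to write $U^T A U = D$. The details are left to the reader and the necessary matrices are given by
\[
U = \begin{bmatrix} \frac{1}{2}\sqrt{2} & \frac{1}{2}\sqrt{2} \\ \frac{1}{2}\sqrt{2} & -\frac{1}{2}\sqrt{2} \end{bmatrix}
\qquad
D = \begin{bmatrix} 2 & 0 \\ 0 & 8 \end{bmatrix}
\]
Write $\vec{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}$, such that $\vec{x} = U\vec{y}$. Then it follows that $q$ is given by
\[
q = d_{11}y_1^2 + d_{22}y_2^2 = 2y_1^2 + 8y_2^2
\]
Therefore the level curve can be written as $2y_1^2 + 8y_2^2 = 8$.

This is an ellipse which is parallel to the coordinate axes. Its graph is of the form

[Figure: graph of the level curve with axes $y_1$ and $y_2$]

Thus this change of variables chooses new axes such that with respect to these new axes, the ellipse is oriented parallel to the coordinate axes.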
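The change of variables in this example can be sanity-checked numerically. The sketch below assumes the orthogonal matrix $U$ has columns $\frac{1}{\sqrt{2}}(1, 1)$ and $\frac{1}{\sqrt{2}}(1, -1)$, which are unit eigenvectors of the matrix with rows $(5, -3)$ and $(-3, 5)$ for the eigenvalues 2 and 8. It samples points on $2y_1^2 + 8y_2^2 = 8$, maps them through $\vec{x} = U\vec{y}$, and measures how far they are from satisfying $5x_1^2 - 6x_1x_2 + 5x_2^2 = 8$. The function names are illustrative.

```python
import math

s = math.sqrt(2) / 2
U = [[s, s],
     [s, -s]]  # assumed orthogonal matrix; columns are unit eigenvectors

def q_in_x(x1, x2):
    """The quadratic form in the original variables."""
    return 5 * x1 ** 2 - 6 * x1 * x2 + 5 * x2 ** 2

def q_in_y(y1, y2):
    """The diagonalized quadratic form in the new variables."""
    return 2 * y1 ** 2 + 8 * y2 ** 2

def to_x(y1, y2):
    """Apply the change of variables x = U y."""
    return (U[0][0] * y1 + U[0][1] * y2, U[1][0] * y1 + U[1][1] * y2)

max_gap = 0.0
for k in range(12):
    t = 2 * math.pi * k / 12
    # (2 cos t, sin t) lies on the ellipse 2 y1^2 + 8 y2^2 = 8.
    y1, y2 = 2 * math.cos(t), math.sin(t)
    x1, x2 = to_x(y1, y2)
    max_gap = max(max_gap, abs(q_in_x(x1, x2) - 8), abs(q_in_y(y1, y2) - 8))

print(max_gap)  # a value at rounding-error scale supports the change of variables
```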

7.4.5. Exercises

1. Find the eigenvalues and an orthonormal basis of eigenvectors for $A$.
\[
A = \begin{bmatrix} 11 & -1 & -4 \\ -1 & 11 & -4 \\ -4 & -4 & 14 \end{bmatrix}
\]
Hint: Two eigenvalues are 12 and 18.

2. Find the eigenvalues and an orthonormal basis of eigenvectors for $A$.
\[
A = \begin{bmatrix} 4 & 1 & -2 \\ 1 & 4 & -2 \\ -2 & -2 & 7 \end{bmatrix}
\]
Hint: One eigenvalue is 3.

3. Find the eigenvalues and an orthonormal basis of eigenvectors for A. Diagonalize A by

9. Find the eigenvalues and an orthonormal basis of eigenvectors for A. Diagonalize A by

Hint: The eigenvalues are 0, 2, 2 where 2 is listed twice because it is a root of multiplicity 2.

10. Find the eigenvalues and an orthonormal basis of eigenvectors for $A$. Diagonalize $A$ by finding an orthogonal matrix $U$ and a diagonal matrix $D$ such that $U^T A U = D$.

A=

1

13 26

13 66

16

3 232

112

2 6

3 6

12 61216

12

Hint: The eigenvalues are 2, 1, 0.

11. Find the eigenvalues and an orthonormal basis of eigenvectors for the matrix

13

16

3 2

1 3

2A= 6 3 2

7 3 6 1 2 61812

Hint: The eigenvalues are 1, 2, 2.


7 183 6

1 122 6 56

12. Find the eigenvalues and an orthonormal basis of eigenvectors for the matrix

A=

1 51 6 5 105 12

71

6 15 6 555

195 15 6 1010

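Hint: The eigenvalues are 1, 2, 1 where 1 is listed twice because it has multiplicity 2 as a zero of the characteristic equation.

13. Explain why a matrix $A$ is symmetric if and only if there exists an orthogonal matrix $U$ such that $A = U^T D U$ for $D$ a diagonal matrix.

14. Show that if $A$ is a real symmetric matrix and $\lambda$ and $\mu$ are two different eigenvalues, then if $\vec{x}$ is an eigenvector for $\lambda$ and $\vec{y}$ is an eigenvector for $\mu$, then $\vec{x} \cdot \vec{y} = 0$. Also all eigenvalues are real. Supply reasons for each step in the following argument. First
\[
\lambda \vec{x}^T \overline{\vec{x}} = (A\vec{x})^T \overline{\vec{x}} = \vec{x}^T A \overline{\vec{x}} = \vec{x}^T \overline{A\vec{x}} = \vec{x}^T \overline{\lambda}\, \overline{\vec{x}} = \overline{\lambda}\, \vec{x}^T \overline{\vec{x}}
\]
and so $\lambda = \overline{\lambda}$. This shows that all eigenvalues are real. It follows all the eigenvectors are real. Why? Now let $\vec{x}$, $\vec{y}$, $\mu$ and $\lambda$ be given as above.
\[
\lambda (\vec{x} \cdot \vec{y}) = \lambda\vec{x} \cdot \vec{y} = A\vec{x} \cdot \vec{y} = \vec{x} \cdot A\vec{y} = \vec{x} \cdot \mu\vec{y} = \overline{\mu} (\vec{x} \cdot \vec{y}) = \mu (\vec{x} \cdot \vec{y})
\]
and so
\[
(\lambda - \mu)\, \vec{x} \cdot \vec{y} = 0
\]
Why does it follow that $\vec{x} \cdot \vec{y} = 0$?

15. Using the Gram Schmidt process or the QR factorization, find an orthonormal basis for the following span: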

21 1

span2 , 1 , 0

130

16. Using the Gram Schmidt process or the QR factorization, find an orthonormal basisfor the following span:

1 21

012

span = 1 , 3 , 0

110


17. A quadratic form in three variables is an expression of the form $a_1x^2 + a_2y^2 + a_3z^2 + a_4xy + a_5xz + a_6yz$. Show that every such quadratic form may be written as
\[
\begin{bmatrix} x & y & z \end{bmatrix} A \begin{bmatrix} x \\ y \\ z \end{bmatrix}
\]
where $A$ is a symmetric matrix.

18. Given a quadratic form in three variables, $x$, $y$, and $z$, show there exists an orthogonal matrix $U$ and variables $x'$, $y'$, $z'$ such that
\[
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = U \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}
\]
with the property that in terms of the new variables, the quadratic form is
\[
\lambda_1 (x')^2 + \lambda_2 (y')^2 + \lambda_3 (z')^2
\]
where the numbers $\lambda_1$, $\lambda_2$, and $\lambda_3$ are the eigenvalues of the matrix $A$ in Problem 17.

19. Consider the quadratic form $q$ given by $q = 3x_1^2 - 12x_1x_2 - 2x_2^2$.
(a) Write $q$ in the form $\vec{x}^T A \vec{x}$ for an appropriate symmetric matrix $A$.
(b) Use a change of variables to rewrite $q$ to eliminate the $x_1x_2$ term.

20. Consider the quadratic form $q$ given by $q = 2x_1^2 + 2x_1x_2 - 2x_2^2$.
(a) Write $q$ in the form $\vec{x}^T A \vec{x}$ for an appropriate symmetric matrix $A$.
(b) Use a change of variables to rewrite $q$ to eliminate the $x_1x_2$ term.

21. Consider the quadratic form $q$ given by $q = 7x_1^2 + 6x_1x_2 - x_2^2$.
(a) Write $q$ in the form $\vec{x}^T A \vec{x}$ for an appropriate symmetric matrix $A$.
(b) Use a change of variables to rewrite $q$ to eliminate the $x_1x_2$ term.

8. Some Curvilinear Coordinate Systems

8.1 Polar Coordinates and Polar Graphs

Outcomes
A. Understand polar coordinates.
B. Convert points between Cartesian and polar coordinates.

You have likely encountered the Cartesian coordinate system in many aspects of mathematics. There is an alternative way to represent points in space, called polar coordinates. The idea is suggested in the following picture.

[Figure: a point labelled $(x, y)$ and $(r, \theta)$, at distance $r$ from the origin]

Consider the point above, which would be specified as $(x, y)$ in Cartesian coordinates. We can also specify this point using polar coordinates, which we write as $(r, \theta)$. The number $r$ is the distance from the origin $(0, 0)$ to the point, while $\theta$ is the angle shown between the positive $x$ axis and the line from the origin to the point. In this way, the point can be specified in polar coordinates as $(r, \theta)$.

Now suppose we are given an ordered pair $(r, \theta)$ where $r$ and $\theta$ are real numbers. We want to determine the point specified by this ordered pair. We can use $\theta$ to identify a ray from the origin as follows. Let the ray pass from $(0, 0)$ through the point $(\cos\theta, \sin\theta)$ as shown.

[Figure: the ray from the origin through the point $(\cos(\theta), \sin(\theta))$]

The ray is identified on the graph as the line from the origin, through the point $(\cos(\theta), \sin(\theta))$. Now if $r > 0$, go a distance equal to $r$ in the direction of the displayed arrow starting at $(0, 0)$. If $r < 0$, move in the opposite direction a distance of $|r|$. This is the point determined by $(r, \theta)$.

It is common to assume that $\theta$ is in the interval $[0, 2\pi)$ and $r > 0$. In this case, there is a very simple relationship between the Cartesian and polar coordinates, given by
\[
x = r\cos(\theta), \quad y = r\sin(\theta) \tag{8.1}
\]
These equations demonstrate how to find the Cartesian coordinates when we are given the polar coordinates of a point. They can also be used to find the polar coordinates when we know $(x, y)$. A simpler way to do this is to use the following equations:
\[
r = \sqrt{x^2 + y^2}, \quad \tan(\theta) = \frac{y}{x} \tag{8.2}
\]
In the next example, we look at how to find the Cartesian coordinates of a point specified by polar coordinates.

Example 8.1: Finding Cartesian Coordinates
The polar coordinates of a point in the plane are $(5, \pi/6)$. Find the Cartesian coordinates of this point.

Solution. The point is specified by the polar coordinates $(5, \pi/6)$. Therefore $r = 5$ and $\theta = \pi/6$. From 8.1
\[
x = r\cos(\theta) = 5\cos\left(\frac{\pi}{6}\right) = \frac{5}{2}\sqrt{3}
\]
\[
y = r\sin(\theta) = 5\sin\left(\frac{\pi}{6}\right) = \frac{5}{2}
\]
Thus the Cartesian coordinates are $\left(\frac{5}{2}\sqrt{3}, \frac{5}{2}\right)$. The point is shown in the below graph.

[Figure: the point $\left(\frac{5}{2}\sqrt{3}, \frac{5}{2}\right)$]

Consider the following example of the case where $r < 0$.

Example 8.2: Finding Cartesian Coordinates
The polar coordinates of a point in the plane are $(-5, \pi/6)$. Find the Cartesian coordinates.

Solution. For the point specified by the polar coordinates $(-5, \pi/6)$, $r = -5$, and $\theta = \pi/6$. From 8.1
\[
x = r\cos(\theta) = -5\cos\left(\frac{\pi}{6}\right) = -\frac{5}{2}\sqrt{3}
\]
\[
y = r\sin(\theta) = -5\sin\left(\frac{\pi}{6}\right) = -\frac{5}{2}
\]
Thus the Cartesian coordinates are $\left(-\frac{5}{2}\sqrt{3}, -\frac{5}{2}\right)$. The point is shown in the following graph.

[Figure: the point $\left(-\frac{5}{2}\sqrt{3}, -\frac{5}{2}\right)$]

Recall from the previous example that for the point specified by $(5, \pi/6)$, the Cartesian coordinates are $\left(\frac{5}{2}\sqrt{3}, \frac{5}{2}\right)$. Notice that in this example, by multiplying $r$ by $-1$, the resulting Cartesian coordinates are also multiplied by $-1$.

The following picture exhibits both points in the above two examples to emphasize how they are just on opposite sides of $(0, 0)$ but at the same distance from $(0, 0)$.

[Figure: the points $\left(\frac{5}{2}\sqrt{3}, \frac{5}{2}\right)$ and $\left(-\frac{5}{2}\sqrt{3}, -\frac{5}{2}\right)$ on opposite sides of the origin]
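Equations 8.1 and 8.2 translate directly into code. The sketch below is illustrative, with hypothetical function names. For the inverse direction it uses `math.atan2` rather than $\tan(\theta) = y/x$ alone, because the tangent by itself does not determine the quadrant; the result is reduced to an angle in $[0, 2\pi)$ with $r \geq 0$.

```python
import math

def polar_to_cartesian(r, theta):
    """Equation 8.1: x = r cos(theta), y = r sin(theta). Works for r < 0 too."""
    return r * math.cos(theta), r * math.sin(theta)

def cartesian_to_polar(x, y):
    """Equation 8.2 with the quadrant resolved: r >= 0 and theta in [0, 2*pi)."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x) % (2 * math.pi)
    return r, theta

# The two points from the examples above: r = 5 and r = -5 at theta = pi/6
print(polar_to_cartesian(5, math.pi / 6))
print(polar_to_cartesian(-5, math.pi / 6))

# Converting back gives r = 5 in both cases; the second angle is pi/6 + pi.
print(cartesian_to_polar(*polar_to_cartesian(-5, math.pi / 6)))
```

Converting the point with $r = -5$ back returns $r = 5$ and the angle $7\pi/6$, a different pair of polar coordinates for the same point.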

In the next two examples, we look at how to convert Cartesian coordinates to polar coordinates.

Example 8.3: Finding Polar Coordinates
Suppose the Cartesian coordinates of a point are $(3, 4)$. Find a pair of polar coordinates which correspond to this point.

Solution. Using equation 8.2, $r = \sqrt{3^2 + 4^2} = 5$. It remains to identify the angle $\theta$ between the positive $x$ axis and the line from the origin to the point. Since both the $x$ and $y$ values are positive, the point is in the first quadrant. Therefore, $\theta$ is between 0 and $\pi/2$. Using this and 8.2, we have to solve:
\[
\tan(\theta) = \frac{4}{3}
\]
Conversely, we can use equation 8.1 as follows:
\[
3 = 5\cos(\theta), \quad 4 = 5\sin(\theta)
\]
Either way, $\theta = \arctan(4/3)$, which is approximately $0.927$ radians.

Example 8.4: Finding Polar Coordinates
Suppose the Cartesian coordinates of a point are $\left(-\sqrt{3}, 1\right)$. Find the polar coordinates which correspond to this point.

Solution. Given the point $\left(-\sqrt{3}, 1\right)$,
\[
r = \sqrt{1^2 + (-\sqrt{3})^2} = \sqrt{1 + 3} = 2
\]
In this case, the point is in the second quadrant since the $x$ value is negative and the $y$ value is positive. Therefore, $\theta$ will be between $\pi/2$ and $\pi$. Solving the equations
\[
-\sqrt{3} = 2\cos(\theta), \quad 1 = 2\sin(\theta)
\]
we find that $\theta = 5\pi/6$. Hence the polar coordinates for this point are $(2, 5\pi/6)$.

Consider this example. Suppose we used $r = -2$ and $\theta = 2\pi - (\pi/6) = 11\pi/6$. These coordinates specify the same point as above. Observe that there are infinitely many ways to identify this particular point with polar coordinates. In fact, every point can be represented with polar coordinates in infinitely many ways. Because of this, it will usually be the case that $\theta$ is confined to lie in some interval of length $2\pi$ and $r > 0$, for real numbers $r$ and $\theta$.

Just as with Cartesian coordinates, it is possible to use relations between the polar coordinates to specify points in the plane. The process of sketching the graphs of these relations is very similar to that used to sketch graphs of functions in Cartesian coordinates. Consider a relation between polar coordinates of the form $r = f(\theta)$. To graph such a relation, first make a table of the form

$\theta$        $r$
$\theta_1$      $f(\theta_1)$
$\theta_2$      $f(\theta_2)$
...             ...

Graph the resulting points and connect them with a curve. The following picture illustrates how to begin this process.

[Figure: points on the rays at angles $\theta_1$ and $\theta_2$]

To find the point in the plane corresponding to the ordered pair $(f(\theta), \theta)$, we follow the same process as when finding the point corresponding to $(r, \theta)$.

Consider the following example of this procedure, incorporating computer software.

Example 8.5: Graphing a Polar Equation
Graph the polar equation $r = 1 + \cos\theta$.

Solution. We will use the computer software Maple to complete this example. The command which produces the polar graph of the above equation is:

> plot(1+cos(t),t= 0..2*Pi,coords=polar).

Here we use $t$ to represent the variable $\theta$ for convenience. The command tells Maple that $r$ is given by $1 + \cos(t)$ and that $t \in [0, 2\pi]$.

The above graph makes sense when considered in terms of trigonometric functions. Suppose $\theta = 0$, $r = 2$ and let $\theta$ increase to $\pi/2$. As $\theta$ increases, $\cos\theta$ decreases to 0. Thus the line from the origin to the point on the curve should get shorter as $\theta$ goes from 0 to $\pi/2$. As $\theta$ goes from $\pi/2$ to $\pi$, $\cos\theta$ decreases, eventually equaling $-1$ at $\theta = \pi$. Thus $r = 0$ at this point. This scenario is depicted in the above graph, which shows a function called a cardioid.

The following picture illustrates the above procedure for obtaining the polar graph of $r = 1 + \cos(\theta)$. In this picture, the concentric circles correspond to values of $r$ while the rays from the origin correspond to the angles which are shown on the picture. The dot on the ray corresponding to the angle $\pi/6$ is located at a distance of $r = 1 + \cos(\pi/6)$ from the origin. The dot on the ray corresponding to the angle $\pi/3$ is located at a distance of $r = 1 + \cos(\pi/3)$ from the origin and so forth. The polar graph is obtained by connecting such points with a smooth curve, with the result being the figure shown above.
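The table-building step behind such a graph can be sketched without plotting software. The code below is an illustration only: the helper name `polar_table` and the choice of sampling every $\pi/6$ are assumptions for this sketch. It tabulates $r = 1 + \cos\theta$ together with the Cartesian point obtained from $x = r\cos\theta$, $y = r\sin\theta$.

```python
import math

def polar_table(f, thetas):
    """Return rows (theta, r, x, y) for the polar relation r = f(theta)."""
    rows = []
    for theta in thetas:
        r = f(theta)
        rows.append((theta, r, r * math.cos(theta), r * math.sin(theta)))
    return rows

# Sample the cardioid r = 1 + cos(theta) every pi/6 from 0 to 2*pi.
thetas = [k * math.pi / 6 for k in range(13)]
table = polar_table(lambda t: 1 + math.cos(t), thetas)

for theta, r, x, y in table:
    print(f"theta={theta:6.3f}  r={r:6.3f}  (x, y)=({x:7.3f}, {y:7.3f})")
```

The printed values of $r$ shrink from 2 at $\theta = 0$ to 0 at $\theta = \pi$ and then grow back to 2, matching the description of the cardioid above.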


Consider another example of constructing a polar graph.

Example 8.6: A Polar Graph
Graph $r = 1 + 2\cos\theta$ for $\theta \in [0, 2\pi]$.

Solution.

To see the way this is graphed, consider the following picture. First the indicated pointswere graphed and then the curve was drawn to connect the points. When done by a computer,many more points are used to create a more accurate picture.Consider first the following table of points.

/6/3 /2 5/6

3+1 211 3 1

4/3 7/601 3

5/32

Note how some entries in the table have r < 0. To graph these points, simply move inthe opposite direction. These types of points are responsible for the small loop on the insideof the larger loop in the graph.


The process of constructing these graphs can be greatly facilitated by computer software. However, the use of such software should not replace understanding the steps involved.

The next example shows the graph for the equation $r = 3 + \sin\left(\frac{7\theta}{6}\right)$. For complicated polar graphs, computer software is used to facilitate the process.

Example 8.7: A Polar Graph
Graph $r = 3 + \sin\left(\frac{7\theta}{6}\right)$ for $\theta \in [0, 14\pi]$.

Solution.

The next example shows another situation in which r can be negative.


Example 8.8: A Polar Graph: Negative r

Graph $r = 3\sin(4\theta)$ for $\theta \in [0, 2\pi]$.

Solution.

We conclude this section with an interesting graph of a simple polar equation.

Example 8.9: The Graph of a Spiral
Graph $r = \theta$ for $\theta \in [0, 2\pi]$.

Solution. The graph of this polar equation is a spiral. This is the case because as $\theta$ increases, so does $r$.

In the next section, we will look at two ways of generalizing polar coordinates to three dimensions.

(a) $r = 1 - \sin(2\theta)$, $\theta \in [0, 2\pi]$

(b) $r = \sin(4\theta)$, $\theta \in [0, 2\pi]$

(c) $r = \cos(3\theta) + \sin(2\theta)$, $\theta \in [0, 2\pi]$

(d) $r = \theta$, $\theta \in [0, 15]$

5. Graph the polar equation $r = 1 + \sin\theta$ for $\theta \in [0, 2\pi]$.

6. Graph the polar equation $r = 2 + \sin\theta$ for $\theta \in [0, 2\pi]$.

7. Graph the polar equation $r = 1 + 2\sin\theta$ for $\theta \in [0, 2\pi]$.

8. Graph the polar equation $r = 2 + \sin(2\theta)$ for $\theta \in [0, 2\pi]$.

9. Graph the polar equation $r = 1 + \sin(2\theta)$ for $\theta \in [0, 2\pi]$.

10. Graph the polar equation $r = 1 + \sin(3\theta)$ for $\theta \in [0, 2\pi]$.

11. Describe how to solve for $r$ and $\theta$ in terms of $x$ and $y$ in polar coordinates.

12. This problem deals with parabolas, ellipses, and hyperbolas and their equations. Let $l, e > 0$ and consider
$$r = \frac{l}{1 \pm e\cos\theta}$$
Show that if $e = 0$, the graph of this equation gives a circle. Show that if $0 < e < 1$, the graph is an ellipse, if $e = 1$ it is a parabola and if $e > 1$, it is a hyperbola.

8.2 Spherical and Cylindrical Coordinates

Outcomes
A. Understand cylindrical and spherical coordinates.
B. Convert points between Cartesian, cylindrical, and spherical coordinates.

Spherical and cylindrical coordinates are two generalizations of polar coordinates to three dimensions. We will first look at cylindrical coordinates.

When moving from polar coordinates in two dimensions to cylindrical coordinates in three dimensions, we use the polar coordinates in the $xy$ plane and add a $z$ coordinate. For this reason, we use the notation $(r, \theta, z)$ to express cylindrical coordinates. The relationship between Cartesian coordinates $(x, y, z)$ and cylindrical coordinates $(r, \theta, z)$ is given by
$$\begin{aligned}
x &= r\cos(\theta) \\
y &= r\sin(\theta) \\
z &= z
\end{aligned}$$

where $r \geq 0$, $\theta \in [0, 2\pi)$, and $z$ is simply the Cartesian coordinate. Notice that $x$ and $y$ are defined as the usual polar coordinates in the $xy$-plane. Recall that $r$ is defined as the length of the ray from the origin to the point $(x, y, 0)$, while $\theta$ is the angle between the positive $x$-axis and this same ray.

To illustrate this coordinate system, consider the following two pictures. In the first of these, both $r$ and $z$ are known. The cylinder corresponds to a given value for $r$. A useful way to think of $r$ is as the distance between a point in three dimensions and the $z$-axis. Every point on the cylinder shown is at the same distance from the $z$-axis. Giving a value for $z$ results in a horizontal circle, or cross section of the cylinder at the given height on the $z$ axis (shown below as a black line on the cylinder). In the second picture, the point is specified completely by also knowing $\theta$ as shown.

[Figure: two panels. In the first, $r$ and $z$ are known; in the second, $r$, $\theta$ and $z$ are known.]
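The conversion between Cartesian and cylindrical coordinates can be sketched in Python as follows. The function names are hypothetical, and the inverse direction uses `math.atan2` to pick the angle, which is one reasonable way to land in $[0, 2\pi)$ rather than a method prescribed by the text.

```python
import math


def cylindrical_to_cartesian(r, theta, z):
    """Apply x = r cos(theta), y = r sin(theta), z = z."""
    return (r * math.cos(theta), r * math.sin(theta), z)


def cartesian_to_cylindrical(x, y, z):
    """Recover (r, theta, z) with r >= 0 and theta reduced modulo 2*pi.

    On the z-axis (x = y = 0) the angle is not unique; atan2 returns 0 there.
    """
    r = math.hypot(x, y)                       # distance from the z-axis
    theta = math.atan2(y, x) % (2 * math.pi)   # shift negative angles up by 2*pi
    return (r, theta, z)


if __name__ == "__main__":
    point = cylindrical_to_cartesian(2.0, 3 * math.pi / 4, -2.0)
    print(point)
    print(cartesian_to_cylindrical(*point))
```

Converting a point to Cartesian coordinates and back should return the starting triple, up to floating-point rounding.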

Every point of three dimensional space other than the $z$ axis has unique cylindrical coordinates. Of course there are infinitely many cylindrical coordinates for the origin and for the $z$-axis. Any $\theta$ will work if $r = 0$ and $z$ is given.

Consider now spherical coordinates, the second generalization of polar form in three dimensions. For a point $(x, y, z)$ in three dimensional space, the spherical coordinates are defined as follows.

$\rho$: the length of the ray from the origin to the point
$\theta$: the angle between the positive $x$-axis and the ray from the origin to the point $(x, y, 0)$
$\phi$: the angle between the positive $z$-axis and the ray from the origin to the point of interest

The spherical coordinates are determined by $(\rho, \phi, \theta)$. The relation between these and the Cartesian coordinates $(x, y, z)$ for a point are as follows.
$$\begin{aligned}
x &= \rho\sin(\phi)\cos(\theta), & \phi &\in [0, \pi] \\
y &= \rho\sin(\phi)\sin(\theta), & \theta &\in [0, 2\pi) \\
z &= \rho\cos\phi, & \rho &\geq 0
\end{aligned}$$

Consider the pictures below. The first illustrates the surface when $\rho$ is known, which is a sphere of radius $\rho$. The second picture corresponds to knowing both $\rho$ and $\phi$, which results in a circle about the $z$-axis. Suppose the first picture demonstrates a graph of the Earth. Then the circle in the second picture would correspond to a particular latitude.

[Figure: two panels. In the first, $\rho$ is known; in the second, $\rho$ and $\phi$ are known.]

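Giving the third coordinate, $\theta$, completely specifies the point of interest. This is demonstrated in the following picture. If the latitude corresponds to $\phi$, then we can think of $\theta$ as the longitude.

[Figure: $\rho$, $\phi$ and $\theta$ are known.]

The following picture summarizes the geometric meaning of the three coordinate systems.

[Figure: a single point labeled $(\rho, \phi, \theta)$, $(r, \theta, z)$ and $(x, y, z)$, together with its projection $(x, y, 0)$ in the $xy$-plane at distance $r$ from the origin.]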
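The spherical coordinate formulas can likewise be sketched in Python. The function names below are hypothetical, the argument order follows $(\rho, \phi, \theta)$, and the choice of angles at the origin (both set to 0) is an arbitrary convention adopted only for this sketch.

```python
import math


def spherical_to_cartesian(rho, phi, theta):
    """Apply x = rho sin(phi) cos(theta), y = rho sin(phi) sin(theta), z = rho cos(phi)."""
    return (rho * math.sin(phi) * math.cos(theta),
            rho * math.sin(phi) * math.sin(theta),
            rho * math.cos(phi))


def cartesian_to_spherical(x, y, z):
    """Recover (rho, phi, theta) with rho >= 0 and phi in [0, pi]."""
    rho = math.sqrt(x * x + y * y + z * z)
    if rho == 0.0:
        return (0.0, 0.0, 0.0)  # angles are not unique at the origin
    # Clamp guards against rounding pushing z / rho slightly outside [-1, 1].
    phi = math.acos(max(-1.0, min(1.0, z / rho)))
    theta = math.atan2(y, x) % (2 * math.pi)
    return (rho, phi, theta)


def spherical_to_cylindrical(rho, phi, theta):
    """Cylindrical (r, theta, z) of the same point: r = rho sin(phi), z = rho cos(phi)."""
    return (rho * math.sin(phi), theta, rho * math.cos(phi))


if __name__ == "__main__":
    xyz = spherical_to_cartesian(4.0, math.pi / 4, 5 * math.pi / 6)
    print(xyz)
    print(cartesian_to_spherical(*xyz))
    print(spherical_to_cylindrical(4.0, math.pi / 4, 5 * math.pi / 6))
```

The cylindrical radius returned by `spherical_to_cylindrical` should agree with the distance of $(x, y, 0)$ from the origin.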

Therefore, we can represent the same point in three ways, using Cartesian coordinates, $(x, y, z)$, cylindrical coordinates, $(r, \theta, z)$, and spherical coordinates $(\rho, \phi, \theta)$.

Using this picture to review, call the point of interest $P$ for convenience. The Cartesian coordinates for $P$ are $(x, y, z)$. Then $\rho$ is the distance between the origin and the point $P$. The angle between the positive $z$ axis and the line between the origin and $P$ is denoted by $\phi$. Then $\theta$ is the angle between the positive $x$ axis and the line joining the origin to the point $(x, y, 0)$ as shown. This gives the spherical coordinates, $(\rho, \phi, \theta)$. Given the line from the origin to $(x, y, 0)$, $r = \rho\sin(\phi)$ is the length of this line. Thus $r$ and $\theta$ determine a point in the $xy$-plane. In other words, $r$ and $\theta$ are the usual polar coordinates and $r \geq 0$ and $\theta \in [0, 2\pi)$. Letting $z$ denote the usual $z$ coordinate of a point in three dimensions, $(r, \theta, z)$ are the cylindrical coordinates of $P$.

The relation between spherical and cylindrical coordinates is that $r = \rho\sin(\phi)$ and the $\theta$ is the same as the $\theta$ of cylindrical and polar coordinates.

We will now consider some examples.

Example 8.10: Describing a Surface in Spherical Coordinates
Express the surface $z = \frac{1}{\sqrt{3}}\sqrt{x^2 + y^2}$ in spherical coordinates.

Solution. We will use the equations from above:
$$\begin{aligned}
x &= \rho\sin(\phi)\cos(\theta), & \phi &\in [0, \pi] \\
y &= \rho\sin(\phi)\sin(\theta), & \theta &\in [0, 2\pi) \\
z &= \rho\cos\phi, & \rho &\geq 0
\end{aligned}$$

To express the surface in spherical coordinates, we substitute these expressions into the equation. This is done as follows:
$$\rho\cos(\phi) = \frac{1}{\sqrt{3}}\sqrt{(\rho\sin(\phi)\cos(\theta))^2 + (\rho\sin(\phi)\sin(\theta))^2} = \frac{\sqrt{3}}{3}\rho\sin(\phi).$$
This reduces to
$$\tan(\phi) = \sqrt{3}$$
and so $\phi = \pi/3$.

Example 8.11: Describing a Surface in Spherical Coordinates
Express the surface $y = x$ in terms of spherical coordinates.

Solution. Using the same procedure as the previous example, this says $\rho\sin(\phi)\sin(\theta) = \rho\sin(\phi)\cos(\theta)$. Simplifying, $\sin(\theta) = \cos(\theta)$, which you could also write $\tan(\theta) = 1$.

We conclude this section with an example of how to describe a surface using cylindrical coordinates.

8.2.1. Exercises

1. The following are the cylindrical coordinates of points, $(r, \theta, z)$. Find the Cartesian and spherical coordinates of each point.

(a) 5, 5, 36

(b) 3, 3 , 4

(c) 4, 2,13

(d) 2, 3, 24

,1(e) 3, 32

11(f) 8, 6 , 112. The following are the Cartesian coordinates of points, (x, y, z). Find the cylindricaland spherical coordinates of these points.

(a) 25 2, 52 2, 3

(b) 23 , 23 3, 2

(c) 52 2, 52 2, 11

(d) 52 , 52 3, 23

(e) 3, 1, 5

(f) 23 , 32 3, 7

(g)2, 6, 2 2

(h) 12 3, 32 , 1

(i) 34 2, 34 2, 32 3375

(j) 3, 1, 2 3

(k) 14 2, 14 6, 12 2

3. The following are spherical coordinates of points in the form $(\rho, \phi, \theta)$. Find the Cartesian and cylindrical coordinates of each point.

(a) 4, 4 , 56

(b) 2, 3 , 23

5 3(c) 3, 6 , 2

(d) 4, 2 , 74

(e) 4, 2,3 6

, 5(f) 4, 343

4. Describe the surface $\phi = \pi/4$ in Cartesian coordinates, where $\phi$ is the polar angle in spherical coordinates.

5. Describe the surface $\theta = \pi/4$ in spherical coordinates, where $\theta$ is the angle measured from the positive $x$ axis.

6. Describe the surface $r = 5$ in Cartesian coordinates, where $r$ is one of the cylindrical coordinates.

7. Describe the surface $\rho = 4$ in Cartesian coordinates, where $\rho$ is the distance to the origin.

8. Give the cone described by $z = \sqrt{x^2 + y^2}$ in cylindrical coordinates and in spherical coordinates.

9. The following are described in Cartesian coordinates. Rewrite them in terms of spherical coordinates.
(a) $z = x^2 + y^2$.
(b) $x^2 - y^2 = 1$.

Outcomes
A. Develop the abstract concept of a vector space through axioms.
B. Deduce basic properties of vector spaces.
C. Use the vector space axioms to determine if a set and its operations constitute a vector space.

In this section we consider the idea of an abstract vector space. A vector space is something which has two operations satisfying the following vector space axioms. In the following definition we define two operations: vector addition, denoted by $+$, and scalar multiplication, denoted by placing the scalar next to the vector. A vector space need not have the usual operations, and for this reason the operations will always be given in the definition of the vector space. The axioms below for addition (written $+$) and scalar multiplication must hold however addition and scalar multiplication are defined for the vector space.

It is important to note that we have seen much of this content before, in terms of $\mathbb{R}^n$. We will prove in this section that $\mathbb{R}^n$ is an example of a vector space and therefore all discussions in this chapter will pertain to $\mathbb{R}^n$. While it may be useful to consider all concepts of this chapter in terms of $\mathbb{R}^n$, it is also important to understand that these concepts apply to all vector spaces.

In the following definition, we will choose scalars $a, b$ to be real numbers and are thus dealing with real vector spaces. However, we could also choose scalars which are complex numbers. In this case, we would call the vector space $V$ complex.


Definition 9.1: Vector Space Axioms

A vector space $V$ is a set of vectors with two operations defined, addition and scalar multiplication. Let $\vec{v}, \vec{w}, \vec{z}$ be vectors in $V$. Then they satisfy the following axioms of addition:

• Closed under Addition: If $\vec{v}, \vec{w}$ are in $V$, then $\vec{v} + \vec{w}$ is also in $V$.
• The Commutative Law of Addition: $\vec{v} + \vec{w} = \vec{w} + \vec{v}$
• The Associative Law of Addition: $(\vec{v} + \vec{w}) + \vec{z} = \vec{v} + (\vec{w} + \vec{z})$
• The Existence of an Additive Identity: $\vec{v} + \vec{0} = \vec{v}$
• The Existence of an Additive Inverse: $\vec{v} + (-\vec{v}) = \vec{0}$

Let $a, b \in \mathbb{R}$. The following axioms apply to the operation of scalar multiplication.

• Closed under Scalar Multiplication: If $a$ is a real number, and $\vec{v}$ is in $V$, then $a\vec{v}$ is in $V$.
• $a(\vec{v} + \vec{w}) = a\vec{v} + a\vec{w}$
• $(a + b)\vec{v} = a\vec{v} + b\vec{v}$
• $a(b\vec{v}) = (ab)\vec{v}$
• $1\vec{v} = \vec{v}$
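As an informal aid, the equational axioms can be spot-checked on sample data. The sketch below is an assumption-laden illustration, not a proof: it tests only the samples supplied, it does not test closure, and it uses $(-1)\vec{v}$ as the candidate additive inverse. The name `check_axioms` and the tuple representation of vectors are hypothetical choices made here.

```python
from fractions import Fraction
from itertools import product


def add(v, w):
    """Usual entrywise addition of tuples."""
    return tuple(a + b for a, b in zip(v, w))


def scale(a, v):
    """Usual entrywise scalar multiplication."""
    return tuple(a * x for x in v)


def check_axioms(vectors, scalars, add, scale, zero):
    """Return the names of axioms violated by the samples (empty list if none)."""
    failed = set()
    for v, w, z in product(vectors, repeat=3):
        if add(v, w) != add(w, v):
            failed.add("commutative")
        if add(add(v, w), z) != add(v, add(w, z)):
            failed.add("associative")
    for v in vectors:
        if add(v, zero) != v:
            failed.add("additive identity")
        if add(v, scale(-1, v)) != zero:  # candidate inverse is (-1)v
            failed.add("additive inverse")
        if scale(1, v) != v:
            failed.add("1v = v")
    for a, b in product(scalars, repeat=2):
        for v, w in product(vectors, repeat=2):
            if scale(a, add(v, w)) != add(scale(a, v), scale(a, w)):
                failed.add("a(v + w) = av + aw")
            if scale(a + b, v) != add(scale(a, v), scale(b, v)):
                failed.add("(a + b)v = av + bv")
            if scale(a, scale(b, v)) != scale(a * b, v):
                failed.add("a(bv) = (ab)v")
    return sorted(failed)


if __name__ == "__main__":
    samples = [(Fraction(1), Fraction(2), Fraction(-3)),
               (Fraction(0), Fraction(1, 2), Fraction(4)),
               (Fraction(-1), Fraction(0), Fraction(7, 3))]
    scalars = [Fraction(2), Fraction(-1, 3)]
    print(check_axioms(samples, scalars, add, scale, (0, 0, 0)))
    # An unusual "addition" that returns its first argument fails several axioms.
    print(check_axioms(samples, scalars, lambda v, w: v, scale, (0, 0, 0)))
```

Exact `Fraction` arithmetic is used so that equality tests are not disturbed by floating-point rounding.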

Consider the following example, in which we prove that Rn is in fact a vector space.


Example 9.2: $\mathbb{R}^n$
$\mathbb{R}^n$, under the usual operations of vector addition and scalar multiplication, is a vector space.

Solution. To show that $\mathbb{R}^n$ is a vector space, we need to show that the above axioms hold. Let $\vec{x}, \vec{y}, \vec{z}$ be vectors in $\mathbb{R}^n$. We first prove the axioms for vector addition.

To show that $\mathbb{R}^n$ is closed under addition, we must show that for two vectors in $\mathbb{R}^n$ their sum is also in $\mathbb{R}^n$. The sum $\vec{x} + \vec{y}$ is given by:
$$\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}$$

The sum is a vector with $n$ entries, showing that it is in $\mathbb{R}^n$. Hence $\mathbb{R}^n$ is closed under vector addition.

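To show that addition is commutative, consider the following:
$$\vec{x} + \vec{y} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix} = \begin{bmatrix} y_1 + x_1 \\ y_2 + x_2 \\ \vdots \\ y_n + x_n \end{bmatrix} = \vec{y} + \vec{x}$$
Hence addition of vectors in $\mathbb{R}^n$ is commutative.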

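We will show that addition of vectors in $\mathbb{R}^n$ is associative in a similar way.
$$\begin{aligned}
(\vec{x} + \vec{y}) + \vec{z} &= \left( \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \right) + \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} \\
&= \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix} + \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} \\
&= \begin{bmatrix} (x_1 + y_1) + z_1 \\ (x_2 + y_2) + z_2 \\ \vdots \\ (x_n + y_n) + z_n \end{bmatrix} \\
&= \begin{bmatrix} x_1 + (y_1 + z_1) \\ x_2 + (y_2 + z_2) \\ \vdots \\ x_n + (y_n + z_n) \end{bmatrix} \\
&= \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 + z_1 \\ y_2 + z_2 \\ \vdots \\ y_n + z_n \end{bmatrix} \\
&= \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \left( \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} + \begin{bmatrix} z_1 \\ z_2 \\ \vdots \\ z_n \end{bmatrix} \right) \\
&= \vec{x} + (\vec{y} + \vec{z})
\end{aligned}$$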

Hence addition of vectors is associative.

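Next, we show the existence of an additive identity. Let $\vec{0} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}$.
$$\vec{x} + \vec{0} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} x_1 + 0 \\ x_2 + 0 \\ \vdots \\ x_n + 0 \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \vec{x}$$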

Hence the zero vector ~0 is an additive identity.

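Next, we prove the existence of an additive inverse. Let $-\vec{x} = \begin{bmatrix} -x_1 \\ -x_2 \\ \vdots \\ -x_n \end{bmatrix}$.
$$\vec{x} + (-\vec{x}) = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} -x_1 \\ -x_2 \\ \vdots \\ -x_n \end{bmatrix} = \begin{bmatrix} x_1 - x_1 \\ x_2 - x_2 \\ \vdots \\ x_n - x_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \vec{0}$$

Hence $-\vec{x}$ is an additive inverse.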

We now need to prove the axioms related to scalar multiplication. Let $a, b$ be real numbers and let $\vec{x}, \vec{y}$ be vectors in $\mathbb{R}^n$.

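We first show that $\mathbb{R}^n$ is closed under scalar multiplication. To do so, we show that $a\vec{x}$ is also a vector with $n$ entries.
$$a\vec{x} = a \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} ax_1 \\ ax_2 \\ \vdots \\ ax_n \end{bmatrix}$$

The vector $a\vec{x}$ is again a vector with $n$ entries, showing that $\mathbb{R}^n$ is closed under scalar multiplication.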

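We wish to show that $a(\vec{x} + \vec{y}) = a\vec{x} + a\vec{y}$.
$$\begin{aligned}
a(\vec{x} + \vec{y}) &= a \left( \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} \right) \\
&= a \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix} \\
&= \begin{bmatrix} a(x_1 + y_1) \\ a(x_2 + y_2) \\ \vdots \\ a(x_n + y_n) \end{bmatrix} \\
&= \begin{bmatrix} ax_1 + ay_1 \\ ax_2 + ay_2 \\ \vdots \\ ax_n + ay_n \end{bmatrix} \\
&= \begin{bmatrix} ax_1 \\ ax_2 \\ \vdots \\ ax_n \end{bmatrix} + \begin{bmatrix} ay_1 \\ ay_2 \\ \vdots \\ ay_n \end{bmatrix} \\
&= a\vec{x} + a\vec{y}
\end{aligned}$$

Next, we wish to show that $(a + b)\vec{x} = a\vec{x} + b\vec{x}$.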

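$$\begin{aligned}
(a + b)\vec{x} &= (a + b) \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \\
&= \begin{bmatrix} (a + b)x_1 \\ (a + b)x_2 \\ \vdots \\ (a + b)x_n \end{bmatrix} \\
&= \begin{bmatrix} ax_1 + bx_1 \\ ax_2 + bx_2 \\ \vdots \\ ax_n + bx_n \end{bmatrix} \\
&= \begin{bmatrix} ax_1 \\ ax_2 \\ \vdots \\ ax_n \end{bmatrix} + \begin{bmatrix} bx_1 \\ bx_2 \\ \vdots \\ bx_n \end{bmatrix} \\
&= a\vec{x} + b\vec{x}
\end{aligned}$$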

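We wish to show that $a(b\vec{x}) = (ab)\vec{x}$.
$$\begin{aligned}
a(b\vec{x}) &= a \left( b \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \right) \\
&= a \begin{bmatrix} bx_1 \\ bx_2 \\ \vdots \\ bx_n \end{bmatrix} \\
&= \begin{bmatrix} a(bx_1) \\ a(bx_2) \\ \vdots \\ a(bx_n) \end{bmatrix} \\
&= \begin{bmatrix} (ab)x_1 \\ (ab)x_2 \\ \vdots \\ (ab)x_n \end{bmatrix} \\
&= (ab) \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \\
&= (ab)\vec{x}
\end{aligned}$$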

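Finally, we need to show that $1\vec{x} = \vec{x}$.
$$1\vec{x} = 1 \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} 1x_1 \\ 1x_2 \\ \vdots \\ 1x_n \end{bmatrix} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \vec{x}$$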

By the above proofs, it is clear that $\mathbb{R}^n$ satisfies the vector space axioms. Hence, $\mathbb{R}^n$ is a vector space under the usual operations of vector addition and scalar multiplication.

We now consider some examples of vector spaces.

Example 9.3: Vector Space of Polynomials
Let $P_2$ be the set of all polynomials of at most degree 2 as well as the zero polynomial. Define addition to be the standard addition of polynomials, and scalar multiplication the usual multiplication of a polynomial by a number. Then $P_2$ is a vector space.

Solution. We can write $P_2$ explicitly as
$$P_2 = \left\{ a_2x^2 + a_1x + a_0 \mid a_i \in \mathbb{R} \text{ for all } i \right\}$$

To show that $P_2$ is a vector space, we verify the axioms. Let $p(x), q(x), r(x)$ be polynomials in $P_2$ and let $a, b, c$ be real numbers. Write $p(x) = p_0 + p_1x + p_2x^2$, $q(x) = q_0 + q_1x + q_2x^2$, and $r(x) = r_0 + r_1x + r_2x^2$.

We first prove that addition of polynomials in $P_2$ is closed. For two polynomials in $P_2$ we need to show that their sum is also a polynomial in $P_2$. From the definition of $P_2$, a polynomial is contained in $P_2$ if it is of degree at most 2 or the zero polynomial.
$$\begin{aligned}
p(x) + q(x) &= p_2x^2 + p_1x + p_0 + q_2x^2 + q_1x + q_0 \\
&= (p_2 + q_2)x^2 + (p_1 + q_1)x + (p_0 + q_0)
\end{aligned}$$
The sum is a polynomial of degree at most 2, or the zero polynomial, and therefore is in $P_2$. It follows that $P_2$ is closed under addition.

We need to show that addition is commutative, that is $p(x) + q(x) = q(x) + p(x)$.
$$\begin{aligned}
p(x) + q(x) &= p_2x^2 + p_1x + p_0 + q_2x^2 + q_1x + q_0 \\
&= (p_2 + q_2)x^2 + (p_1 + q_1)x + (p_0 + q_0) \\
&= (q_2 + p_2)x^2 + (q_1 + p_1)x + (q_0 + p_0) \\
&= q_2x^2 + q_1x + q_0 + p_2x^2 + p_1x + p_0 \\
&= q(x) + p(x)
\end{aligned}$$

The next axiom which needs to be verified is a(bp(x)) = (ab)p(x).

Finally, we show that $1p(x) = p(x)$.
$$\begin{aligned}
1p(x) &= 1\left(p_2x^2 + p_1x + p_0\right) \\
&= 1p_2x^2 + 1p_1x + 1p_0 \\
&= p_2x^2 + p_1x + p_0 \\
&= p(x)
\end{aligned}$$

Since the above axioms hold, we know that $P_2$ as described above is a vector space.

Another important example of a vector space is the set of all matrices of the same size.

Example 9.4: Vector Space of Matrices
Let $M_{2,3}$ be the set of all $2 \times 3$ matrices. Using the usual operations of matrix addition and scalar multiplication, show that $M_{2,3}$ is a vector space.

Solution. Let $A, B$ be $2 \times 3$ matrices in $M_{2,3}$. We first prove the axioms for addition.

In order to prove that $M_{2,3}$ is closed under matrix addition, we show that the sum $A + B$ is in $M_{2,3}$. This means showing that $A + B$ is a $2 \times 3$ matrix.
$$\begin{aligned}
A + B &= \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{bmatrix} \\
&= \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12} & a_{13} + b_{13} \\ a_{21} + b_{21} & a_{22} + b_{22} & a_{23} + b_{23} \end{bmatrix}
\end{aligned}$$
You can see that the sum is a $2 \times 3$ matrix, so it is in $M_{2,3}$. It follows that $M_{2,3}$ is closed under matrix addition.

The remaining axioms regarding matrix addition follow from properties of matrix addition, found in Proposition 2.7. Therefore $M_{2,3}$ satisfies the axioms of matrix addition.

We now turn our attention to the axioms regarding scalar multiplication. Let $A, B$ be matrices in $M_{2,3}$ and let $c$ be a real number.

We first show that $M_{2,3}$ is closed under scalar multiplication. That is, we show that $cA$ is a $2 \times 3$ matrix.
$$cA = c \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} = \begin{bmatrix} ca_{11} & ca_{12} & ca_{13} \\ ca_{21} & ca_{22} & ca_{23} \end{bmatrix}$$
This is a $2 \times 3$ matrix in $M_{2,3}$ which proves that the set is closed under scalar multiplication.

The remaining axioms of scalar multiplication follow from properties of scalar multiplication of matrices, from Proposition 2.10. Therefore $M_{2,3}$ satisfies the axioms of scalar multiplication.

In conclusion, $M_{2,3}$ satisfies the required axioms and is a vector space.

While here we proved that the set of all $2 \times 3$ matrices is a vector space, there is nothing special about this choice of matrix size. In fact if we instead consider $M_{m,n}$, the set of all $m \times n$ matrices, then $M_{m,n}$ is a vector space under the operations of matrix addition and scalar multiplication.

We now examine an example of a set that does not satisfy all of the above axioms, and is therefore not a vector space.

Example 9.5: Not a Vector Space
Let $V$ denote the set of $2 \times 3$ matrices. Let addition in $V$ be defined by $A + B = A$ for matrices $A, B$ in $V$. Let scalar multiplication in $V$ be the usual scalar multiplication of matrices. Show that $V$ is not a vector space.

Solution. In order to show that $V$ is not a vector space, it suffices to find only one axiom which is not satisfied. We will begin by examining the axioms for addition until one is found which does not hold. Let $A, B$ be matrices in $V$.

We first want to check if addition is closed. Consider $A + B$. By the definition of addition in the example, we have that $A + B = A$. Since $A$ is a $2 \times 3$ matrix, it follows that the sum $A + B$ is in $V$, and $V$ is closed under addition.

We now wish to check if addition is commutative. That is, we want to check if $A + B = B + A$ for all choices of $A$ and $B$ in $V$. From the definition of addition, we have that $A + B = A$ and $B + A = B$. Therefore, we can find $A, B$ in $V$ such that these sums are not equal. One example is

(a (f + g)) (x) = a (f + g) (x) = a (f (x) + g (x))

((ab) f ) (x) = (ab) f (x) = a (bf (x)) = (a (bf )) (x)

Finally (1f ) (x) = 1f (x) = f (x) so 1f = f .

It follows that V satisfies all the required axioms and is a vector space.
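The pointwise style of argument used above for functions can be mirrored in code. The sketch below assumes the operations $(f + g)(x) = f(x) + g(x)$ and $(af)(x) = a f(x)$ on real valued functions; the names `f_add` and `f_scale` are hypothetical, and checking a handful of sample inputs illustrates the identities without proving them.

```python
def f_add(f, g):
    """Return the function f + g, defined pointwise."""
    return lambda x: f(x) + g(x)


def f_scale(a, f):
    """Return the function af, defined pointwise."""
    return lambda x: a * f(x)


if __name__ == "__main__":
    f = lambda x: x * x        # sample function f(x) = x^2
    g = lambda x: 3 * x + 1    # sample function g(x) = 3x + 1
    a, b = 2, -5

    left = f_scale(a, f_add(f, g))                # a(f + g)
    right = f_add(f_scale(a, f), f_scale(a, g))   # af + ag
    for x in range(-3, 4):
        print(x, left(x), right(x))

    # (ab)f and a(bf) agree pointwise as well.
    print(all(f_scale(a * b, f)(x) == f_scale(a, f_scale(b, f))(x)
              for x in range(-3, 4)))
```

Integer inputs keep the arithmetic exact, so the two columns printed for each $x$ match exactly.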

We conclude this section with the following important theorem.

Theorem 9.7: Uniqueness
In any vector space, the following are true:
1. $\vec{0}$, the additive identity, is unique
2. $-\vec{x}$, the additive inverse, is unique
3. $0\vec{x} = \vec{0}$ for all vectors $\vec{x}$
4. $(-1)\vec{x} = -\vec{x}$ for all vectors $\vec{x}$

Proof.

1. When we say that the additive identity, $\vec{0}$, is unique, we mean that if a vector acts like the additive identity, then it is the additive identity. To prove this uniqueness, we want to show that another vector which acts like the additive identity is actually equal to $\vec{0}$.

Suppose $\vec{0}'$ is also an additive identity. Then,
$$\vec{0} + \vec{0}' = \vec{0}$$
Now, for $\vec{0}$ the additive identity given above in the axioms, we have that
$$\vec{0}' + \vec{0} = \vec{0}'$$
So by commutativity:
$$\vec{0}' = \vec{0}' + \vec{0} = \vec{0} + \vec{0}' = \vec{0}$$
This says that if a vector acts like an additive identity (such as $\vec{0}'$), it in fact equals $\vec{0}$. This proves the uniqueness of $\vec{0}$.

2. When we say that the additive inverse, $-\vec{x}$, is unique, we mean that if a vector acts like the additive inverse, then it is the additive inverse. Suppose that $\vec{y}$ acts like an additive inverse:
$$\vec{x} + \vec{y} = \vec{0}$$
Then the following holds:
$$\vec{y} = \vec{0} + \vec{y} = (-\vec{x} + \vec{x}) + \vec{y} = -\vec{x} + (\vec{x} + \vec{y}) = -\vec{x} + \vec{0} = -\vec{x}$$
Thus if $\vec{y}$ acts like the additive inverse, it is equal to the additive inverse $-\vec{x}$. This proves the uniqueness of $-\vec{x}$.

3. This statement claims that for all vectors $\vec{x}$, scalar multiplication by 0 equals the zero vector $\vec{0}$. Consider the following, using the fact that we can write $0 = 0 + 0$:
$$0\vec{x} = (0 + 0)\vec{x} = 0\vec{x} + 0\vec{x}$$
We use a small trick here: add $-0\vec{x}$ to both sides. This gives
$$\begin{aligned}
0\vec{x} + (-0\vec{x}) &= 0\vec{x} + 0\vec{x} + (-0\vec{x}) \\
\vec{0} &= 0\vec{x} + \vec{0} \\
\vec{0} &= 0\vec{x}
\end{aligned}$$
This proves that scalar multiplication of any vector by 0 results in the zero vector $\vec{0}$.

4. Finally, we wish to show that scalar multiplication of $-1$ and any vector $\vec{x}$ results in the additive inverse of that vector, $-\vec{x}$. Recall from 2. above that the additive inverse is unique. Consider the following:
$$\begin{aligned}
(-1)\vec{x} + \vec{x} &= (-1)\vec{x} + 1\vec{x} \\
&= (-1 + 1)\vec{x} \\
&= 0\vec{x} \\
&= \vec{0}
\end{aligned}$$
By the uniqueness of the additive inverse shown earlier, any vector which acts like the additive inverse must be equal to the additive inverse. It follows that $(-1)\vec{x} = -\vec{x}$.

9.1.1. Exercises

1. Let $X$ consist of the real valued functions which are defined on an interval $[a, b]$. For $f, g \in X$, $f + g$ is the name of the function which satisfies $(f + g)(x) = f(x) + g(x)$. For $s$ a real number, $(sf)(x) = s(f(x))$. Show this is a vector space.

2. Consider functions defined on $\{1, 2, \cdots, n\}$ having values in $\mathbb{R}$. Explain how, if $V$ is the set of all such functions, $V$ can be considered as $\mathbb{R}^n$.

3. Let the vectors be polynomials of degree no more than 3. Show that with the usual definitions of scalar multiplication and addition wherein, for $p(x)$ a polynomial, $(ap)(x) = ap(x)$ and for $p, q$ polynomials $(p + q)(x) = p(x) + q(x)$, this is a vector space.

9.2 Subspaces

Outcomes
A. Determine if a vector is within a given span.
B. Utilize the subspace test to determine if a set is a subspace of a given vector space.

In this section we will examine the concepts of spanning and subspaces introduced earlier in terms of $\mathbb{R}^n$. Here, we will discuss these concepts in terms of abstract vector spaces.

Consider the following definition.

Definition 9.8: Subset
Let $X$ and $Y$ be two sets. If all elements of $X$ are also elements of $Y$ then we say that $X$ is a subset of $Y$ and we write
$$X \subseteq Y$$
In particular, we often speak of subsets of a vector space, such as $X \subseteq V$. By this we mean that every element in the set $X$ is contained in the vector space $V$.


Definition 9.9: Span of Vectors

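Let $\{\vec{v}_1, \cdots, \vec{v}_n\} \subseteq V$. Then
$$\mathrm{span}\{\vec{v}_1, \cdots, \vec{v}_n\} = \left\{ \sum_{i=1}^{n} c_i\vec{v}_i : c_i \in \mathbb{R} \right\}$$

When we say that a vector $\vec{w}$ is in $\mathrm{span}\{\vec{v}_1, \cdots, \vec{v}_n\}$ we mean that $\vec{w}$ can be written as a linear combination of the $\vec{v}_i$.

Consider the following example.

Example 9.10: Matrix Span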

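Let $A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$, $B = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$. Determine if $A$ and $B$ are in
$$\mathrm{span}\{M_1, M_2\} = \mathrm{span}\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}$$

Solution. First consider $A$. We want to see if scalars $s, t$ can be found such that $A = sM_1 + tM_2$.
$$\begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} = s \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + t \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$$
The solution to this equation is given by
$$\begin{aligned}
1 &= s \\
2 &= t
\end{aligned}$$
and it follows that $A$ is in $\mathrm{span}\{M_1, M_2\}$.

Now consider $B$. Again we write $B = sM_1 + tM_2$ and see if a solution can be found for $s, t$.
$$\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} = s \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + t \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$$

Clearly no values of $s$ and $t$ can be found such that this equation holds, since the right side always has zero off-diagonal entries. Therefore $B$ is not in $\mathrm{span}\{M_1, M_2\}$.

Consider another example.

Example 9.11: Polynomial Span
Show that $p(x) = 7x^2 + 4x - 3$ is in $\mathrm{span}\{4x^2 + x, x^2 - 2x + 3\}$.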


Solution. To show that $p(x)$ is in the given span, we need to show that it can be written as a linear combination of polynomials in the span. Suppose scalars $a, b$ existed such that
$$7x^2 + 4x - 3 = a(4x^2 + x) + b(x^2 - 2x + 3)$$
If this linear combination were to hold, the following would be true:
$$\begin{aligned}
4a + b &= 7 \\
a - 2b &= 4 \\
3b &= -3
\end{aligned}$$
You can verify that $a = 2, b = -1$ satisfies this system of equations. This means that we can write $p(x)$ as follows:
$$7x^2 + 4x - 3 = 2(4x^2 + x) - (x^2 - 2x + 3)$$
Hence $p(x)$ is in the given span.

Consider the following example.

Example 9.12: Spanning Set
Let $S = \{x^2 + 1, x - 2, 2x^2 - x\}$. Show that $S$ is a spanning set for $P_2$, the set of all polynomials of degree at most 2.

Solution. Let $p(x) = ax^2 + bx + c$ be an arbitrary polynomial in $P_2$. To show that $S$ is a spanning set, it suffices to show that $p(x)$ can be written as a linear combination of the elements of $S$. In other words, can we find $r, s, t$ such that:
$$p(x) = ax^2 + bx + c = r(x^2 + 1) + s(x - 2) + t(2x^2 - x)$$
If a solution $r, s, t$ can be found, then this shows that any such polynomial $p(x)$ can be written as a linear combination of the above polynomials and $S$ is a spanning set.
$$\begin{aligned}
ax^2 + bx + c &= r(x^2 + 1) + s(x - 2) + t(2x^2 - x) \\
&= rx^2 + r + sx - 2s + 2tx^2 - tx \\
&= (r + 2t)x^2 + (s - t)x + (r - 2s)
\end{aligned}$$
For this to be true, the following must hold:
$$\begin{aligned}
a &= r + 2t \\
b &= s - t \\
c &= r - 2s
\end{aligned}$$
To check that a solution exists, set up the augmented matrix and row reduce:
$$\left[\begin{array}{ccc|c} 1 & 0 & 2 & a \\ 0 & 1 & -1 & b \\ 1 & -2 & 0 & c \end{array}\right] \rightarrow \cdots \rightarrow \left[\begin{array}{ccc|c} 1 & 0 & 0 & \frac{1}{2}a + b + \frac{1}{2}c \\ 0 & 1 & 0 & \frac{1}{4}a + \frac{1}{2}b - \frac{1}{4}c \\ 0 & 0 & 1 & \frac{1}{4}a - \frac{1}{2}b - \frac{1}{4}c \end{array}\right]$$

Clearly a solution exists for any choice of $a, b, c$. Hence $S$ is a spanning set for $P_2$.

Consider now the definition of a subspace.

Definition 9.13: Subspace
Let $V$ be a vector space. A subset $W \subseteq V$ is said to be a subspace of $V$ if $a\vec{x} + b\vec{y} \in W$ whenever $a, b \in \mathbb{R}$ and $\vec{x}, \vec{y} \in W$.

The span of a set of vectors as described in Definition 9.9 is an example of a subspace. The following fundamental result says that subspaces are subsets of a vector space which are themselves vector spaces.

Theorem 9.14: Subspaces are Vector Spaces
Let $W$ be a nonempty collection of vectors in $V$, a vector space. Then $W$ is a subspace if and only if $W$ is itself a vector space having the same operations as those defined on $V$.

Proof. Suppose first that $W$ is a subspace. It is obvious that all the algebraic laws hold on $W$ because it is a subset of $V$ and they hold on $V$. Thus $\vec{u} + \vec{v} = \vec{v} + \vec{u}$ along with the other axioms. Does $W$ contain $\vec{0}$? Yes because it contains $0\vec{u} = \vec{0}$. See Theorem 9.7.

Are the operations of $V$ defined on $W$? That is, when you add vectors of $W$ do you get a vector in $W$? When you multiply a vector in $W$ by a scalar, do you get a vector in $W$? Yes. This is contained in the definition. Does every vector in $W$ have an additive inverse? Yes by Theorem 9.7 because $-\vec{v} = (-1)\vec{v}$ which is given to be in $W$ provided $\vec{v} \in W$.

Next suppose $W$ is a vector space. Then by definition, it is closed with respect to linear combinations. Hence it is a subspace.

When determining spanning sets the following theorem proves useful.

Theorem 9.15: Spanning Set
Let $S \subseteq V$ for a vector space $V$ and suppose $S = \mathrm{span}\{\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n\}$. Let $R \subseteq V$ be a subspace such that $\vec{v}_1, \vec{v}_2, \cdots, \vec{v}_n \in R$. Then it follows that $S \subseteq R$.

In other words, this theorem claims that any subspace that contains a set of vectors must also contain the span of these vectors.

The following example will show that two spans, described differently, can in fact be equal.

Example 9.16: Equal Span
Let $p(x), q(x)$ be polynomials and suppose $U = \mathrm{span}\{2p(x) - q(x), p(x) + 3q(x)\}$ and $W = \mathrm{span}\{p(x), q(x)\}$. Show that $U = W$.

Solution. We will use Theorem 9.15 to show that $U \subseteq W$ and $W \subseteq U$. It will then follow that $U = W$.

1. $U \subseteq W$

Notice that $2p(x) - q(x)$ and $p(x) + 3q(x)$ are both in $W = \mathrm{span}\{p(x), q(x)\}$. Then by Theorem 9.15 $W$ must contain the span of these polynomials and so $U \subseteq W$.

2. $W \subseteq U$

Notice that
$$\begin{aligned}
p(x) &= \frac{3}{7}\left(2p(x) - q(x)\right) + \frac{1}{7}\left(p(x) + 3q(x)\right) \\
q(x) &= -\frac{1}{7}\left(2p(x) - q(x)\right) + \frac{2}{7}\left(p(x) + 3q(x)\right)
\end{aligned}$$
Hence $p(x), q(x)$ are in $\mathrm{span}\{2p(x) - q(x), p(x) + 3q(x)\}$. By Theorem 9.15 $U$ must contain the span of these polynomials and so $W \subseteq U$.

To prove that a set is a vector space, one must verify each of the axioms given in Definition 9.1. This is a cumbersome task, and therefore a shorter procedure is used to verify a subspace.

Procedure 9.17: Subspace Test
Suppose $W$ is a subset of a vector space $V$. Then $W$ is a subspace of $V$ if the following three conditions hold, using the operations of $V$:
1. The additive identity $\vec{0}$ of $V$ is contained in $W$.
2. For any vectors $\vec{w}_1, \vec{w}_2$ in $W$, $\vec{w}_1 + \vec{w}_2$ is also in $W$.
3. For any vector $\vec{w}_1$ in $W$ and scalar $a$, the product $a\vec{w}_1$ is also in $W$.

Therefore it suffices to prove these three steps to show that a set is a subspace.

Consider the following example.

Example 9.18: Improper Subspaces
Let $V$ be an arbitrary vector space. Then $V$ is a subspace of itself. Similarly, the set $\{\vec{0}\}$ containing only the zero vector is also a subspace.

Solution. Using the subspace test in Procedure 9.17 we can show that $V$ and $\{\vec{0}\}$ are subspaces of $V$.

The conditions are clearly satisfied by $V$, since $V$ satisfies the vector space axioms. Therefore $V$ is a subspace.

Let's consider the set $\{\vec{0}\}$.

1. The vector $\vec{0}$ is clearly contained in $\{\vec{0}\}$, so the first condition is satisfied.

2. Let $\vec{w}_1, \vec{w}_2$ be in $\{\vec{0}\}$. Then $\vec{w}_1 = \vec{0}$ and $\vec{w}_2 = \vec{0}$ and so
$$\vec{w}_1 + \vec{w}_2 = \vec{0} + \vec{0} = \vec{0}$$
It follows that the sum is contained in $\{\vec{0}\}$ and the second condition is satisfied.

3. Let $\vec{w}_1$ be in $\{\vec{0}\}$ and let $a$ be an arbitrary scalar. Then
$$a\vec{w}_1 = a\vec{0} = \vec{0}$$
Hence the product is contained in $\{\vec{0}\}$ and the third condition is satisfied.

It follows that $\{\vec{0}\}$ is a subspace of $V$.

The two subspaces described above are called improper subspaces. Any subspace of a vector space $V$ which is not equal to $V$ or $\{\vec{0}\}$ is called a proper subspace.

Consider another example.

Example 9.19: Subspace of Polynomials
Let $P_2$ be the vector space of polynomials of degree two or less. Let $W \subseteq P_2$ be all polynomials of degree two or less which have 1 as a root. Show that $W$ is a subspace of $P_2$.

Solution. First, express $W$ as follows:

$$W = \left\{ p(x) = ax^2 + bx + c,\ a, b, c \in \mathbb{R} \mid p(1) = 0 \right\}$$

We need to show that W satisfies the three conditions of Procedure 9.17.

1. The zero polynomial $0(x)$ satisfies $0(1) = 0$, so $0(x)$ is contained in $W$.

2. Let $p(x), q(x)$ be polynomials in $W$. It follows that $p(1) = 0$ and $q(1) = 0$. Now consider $p(x) + q(x)$. Let $r(x)$ represent this sum.
$$\begin{aligned}
r(1) &= p(1) + q(1) \\
&= 0 + 0 \\
&= 0
\end{aligned}$$
Therefore the sum is also in $W$ and the second condition is satisfied.


3. Let $p(x)$ be a polynomial in $W$ and let $a$ be a scalar. It follows that $p(1) = 0$. Consider the product $ap(x)$.
$$\begin{aligned}
ap(1) &= a(0) \\
&= 0
\end{aligned}$$
Therefore the product is in $W$ and the third condition is satisfied.

It follows that $W$ is a subspace of $P_2$.
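The three conditions of the subspace test can be illustrated on sample polynomials with 1 as a root. This sketch represents $ax^2 + bx + c$ by the tuple `(a, b, c)`, which is a representation chosen here for convenience; the helper names are hypothetical, and testing a few samples illustrates the argument rather than replacing it.

```python
from fractions import Fraction


def evaluate(p, x):
    """Evaluate the polynomial a x^2 + b x + c stored as the tuple (a, b, c)."""
    a, b, c = p
    return a * x * x + b * x + c


def in_W(p):
    """Membership test for W: polynomials of degree at most 2 with p(1) = 0."""
    return evaluate(p, 1) == 0


def poly_add(p, q):
    """Coefficientwise sum of two polynomials."""
    return tuple(s + t for s, t in zip(p, q))


def poly_scale(k, p):
    """Multiply every coefficient by the scalar k."""
    return tuple(k * s for s in p)


if __name__ == "__main__":
    zero = (0, 0, 0)
    p = (1, 0, -1)    # x^2 - 1 = (x - 1)(x + 1)
    q = (1, -3, 2)    # x^2 - 3x + 2 = (x - 1)(x - 2)
    print(in_W(zero))                              # condition 1
    print(in_W(poly_add(p, q)))                    # condition 2
    print(in_W(poly_scale(Fraction(-7, 2), p)))    # condition 3
    print(in_W((1, 0, 0)))                         # x^2 does not have 1 as a root
```

The last line shows a polynomial of $P_2$ that lies outside $W$, so $W$ is not all of $P_2$.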

9.2.1. Exercises

1. Let $V$ be a vector space and suppose $\{\vec{x}_1, \cdots, \vec{x}_k\}$ is a set of vectors in $V$. Show that $\vec{0}$ is in $\mathrm{span}\{\vec{x}_1, \cdots, \vec{x}_k\}$.

2. Determine if $p(x) = 4x^2 - x$ is in the span given by

10. Let W be a subset of P3 given by

11. Let W be a subset of P3 given by

9.3 Linear Independence and Bases

Outcomes
A. Determine if a set is linearly independent.
B. Extend a linearly independent set and shrink a spanning set to a basis of a given vector space.

In this section, we will again explore concepts introduced earlier in terms of $\mathbb{R}^n$ and extend them to apply to abstract vector spaces.

Definition 9.20: Linear Independence
Let $V$ be a vector space. If $\{\vec{v}_1, \cdots, \vec{v}_n\} \subseteq V$, then it is linearly independent if
$$\sum_{i=1}^{n} a_i\vec{v}_i = \vec{0} \text{ implies } a_1 = \cdots = a_n = 0$$
where the $a_i$ are real numbers.

The set of vectors is called linearly dependent if it is not linearly independent.

Example 9.21: Linear Independence
Let $S \subseteq P_2$ be a set of polynomials given by
$$S = \left\{ x^2 + 2x - 1,\ 2x^2 - x + 3 \right\}$$
Determine if $S$ is linearly independent.


Solution. To determine if this set S is linearly independent, we write

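We know that the set $S = \{\vec{u}, \vec{v}, \vec{w}\}$ is linearly independent, which implies that the coefficients in the last line of this equation must all equal 0. In other words:
$$\begin{aligned}
2a + \tfrac{1}{2}c &= 0 \\
b + 3c &= 0 \\
-a + b &= 0
\end{aligned}$$
The augmented matrix and resulting reduced row-echelon form are given by:
$$\left[\begin{array}{ccc|c} 2 & 0 & \frac{1}{2} & 0 \\ 0 & 1 & 3 & 0 \\ -1 & 1 & 0 & 0 \end{array}\right] \rightarrow \cdots \rightarrow \left[\begin{array}{ccc|c} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{array}\right]$$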

Hence the solution is a = b = c = 0 and the set is linearly independent.
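The conclusion can be cross-checked by computing a determinant, since a square homogeneous system has only the trivial solution exactly when its coefficient matrix has nonzero determinant. The sketch below assumes the system above reads $2a + \frac{1}{2}c = 0$, $b + 3c = 0$, $-a + b = 0$; this reading of the coefficients, and the helper names `det3` and `has_only_trivial_solution`, are assumptions made for illustration.

```python
from fractions import Fraction


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def has_only_trivial_solution(m):
    """A square homogeneous system has only the zero solution iff det != 0."""
    return det3(m) != 0


if __name__ == "__main__":
    # Rows correspond to the three equations; columns to the unknowns a, b, c.
    coefficients = [
        [Fraction(2), Fraction(0), Fraction(1, 2)],
        [Fraction(0), Fraction(1), Fraction(3)],
        [Fraction(-1), Fraction(1), Fraction(0)],
    ]
    print(det3(coefficients))
    print(has_only_trivial_solution(coefficients))
```

A nonzero determinant agrees with the row reduction: the only solution is $a = b = c = 0$.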

Consider the span of a linearly independent set of vectors. Suppose we take a vector which is not in this span and add it to the set. The following lemma claims that the resulting set is still linearly independent.

Lemma 9.23: Adding to a Linearly Independent Set
Suppose $\vec{v} \notin \mathrm{span}\{\vec{u}_1, \cdots, \vec{u}_k\}$ and $\{\vec{u}_1, \cdots, \vec{u}_k\}$ is linearly independent. Then the set
$$\{\vec{u}_1, \cdots, \vec{u}_k, \vec{v}\}$$
is also linearly independent.

Proof. Suppose $\sum_{i=1}^{k} c_i\vec{u}_i + d\vec{v} = \vec{0}$. It is required to verify that each $c_i = 0$ and that $d = 0$. But if $d \neq 0$, then you can solve for $\vec{v}$ as a linear combination of the vectors $\{\vec{u}_1, \cdots, \vec{u}_k\}$,
$$\vec{v} = -\sum_{i=1}^{k} \left(\frac{c_i}{d}\right)\vec{u}_i$$
contrary to the assumption that $\vec{v}$ is not in the span of the $\vec{u}_i$. Therefore, $d = 0$. But then $\sum_{i=1}^{k} c_i\vec{u}_i = \vec{0}$ and the linear independence of $\{\vec{u}_1, \cdots, \vec{u}_k\}$ implies each $c_i = 0$ also.

Consider the following example.


Example 9.24: Adding to a Linearly Independent Set

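Let $S \subseteq M_{22}$ be a linearly independent set given by
$$S = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \right\}$$
Show that the set $R \subseteq M_{22}$ given by
$$R = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \right\}$$
is also linearly independent.

Solution. Instead of writing a linear combination of the matrices which equals 0 and showing that the coefficients must equal 0, we can instead use Lemma 9.23.

To do so, we show that
$$\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \notin \mathrm{span}\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \right\}$$
Write
$$\begin{aligned}
\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} &= a \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} + b \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \\
&= \begin{bmatrix} a & 0 \\ 0 & 0 \end{bmatrix} + \begin{bmatrix} 0 & b \\ 0 & 0 \end{bmatrix} \\
&= \begin{bmatrix} a & b \\ 0 & 0 \end{bmatrix}
\end{aligned}$$

Clearly there are no possible $a, b$ to make this equation true, since the lower left entry on the right is always 0. Hence the new matrix does not lie in the span of the matrices in $S$. By Lemma 9.23, $R$ is also linearly independent.

Recall the definition of basis, considered now in the context of vector spaces.

Definition 9.25: Basis
Let $V$ be a vector space. Then $\{\vec{v}_1, \cdots, \vec{v}_n\}$ is called a basis for $V$ if the following conditions hold.
1. $\mathrm{span}\{\vec{v}_1, \cdots, \vec{v}_n\} = V$
2. $\{\vec{v}_1, \cdots, \vec{v}_n\}$ is linearly independent

Consider the following example.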


Example 9.26: Polynomials of Degree Two
Let $P_2$ be the set of polynomials of degree no more than 2. We can write $P_2 = \mathrm{span}\{x^2, x, 1\}$. Is $\{x^2, x, 1\}$ a basis for $P_2$?

Solution. It can be verified that $P_2$ is a vector space defined under the usual addition and scalar multiplication of polynomials.

Now, since $P_2 = \mathrm{span}\{x^2, x, 1\}$, the set $\{x^2, x, 1\}$ is a basis if it is linearly independent. Suppose then that
\[
ax^2 + bx + c = 0x^2 + 0x + 0
\]
where $a, b, c$ are real numbers. It is clear that this can only occur if $a = b = c = 0$. Hence the set is linearly independent and forms a basis of $P_2$.

The next theorem is an essential result in linear algebra and is called the exchange theorem.

Theorem 9.27: Exchange Theorem
Let $\{\vec{x}_1, \cdots, \vec{x}_r\}$ be a linearly independent set of vectors such that each $\vec{x}_i$ is contained in $\mathrm{span}\{\vec{y}_1, \cdots, \vec{y}_s\}$. Then $r \leq s$.

Proof. The proof will proceed as follows. First, we set up the necessary steps for the proof. Next, we will assume that $r > s$ and show that this leads to a contradiction, thus requiring that $r \leq s$.

Define $\mathrm{span}\{\vec{y}_1, \cdots, \vec{y}_s\} = V$. Since each $\vec{x}_i$ is in $\mathrm{span}\{\vec{y}_1, \cdots, \vec{y}_s\}$, it follows there exist scalars $c_1, \cdots, c_s$ such that
\[
\vec{x}_1 = \sum_{i=1}^{s} c_i \vec{y}_i \tag{9.1}
\]
Note that not all of these scalars $c_i$ can equal zero. Suppose that all the $c_i = 0$. Then it would follow that $\vec{x}_1 = \vec{0}$ and so $\{\vec{x}_1, \cdots, \vec{x}_r\}$ would not be linearly independent. Indeed, if $\vec{x}_1 = \vec{0}$, then $1\vec{x}_1 + \sum_{i=2}^{r} 0\vec{x}_i = \vec{x}_1 = \vec{0}$ and so there would exist a nontrivial linear combination of the vectors $\{\vec{x}_1, \cdots, \vec{x}_r\}$ which equals zero. Therefore at least one $c_i$ is nonzero.

Say $c_k \neq 0$. Then solve (9.1) for $\vec{y}_k$ and obtain

Not all the $d_j$ can equal zero because if this were so, it would follow that $\{\vec{x}_1, \cdots, \vec{x}_r\}$ would be a linearly dependent set because one of the vectors would equal a linear combination of the others. Therefore, (9.2) can be solved for one of the $\vec{z}_i$, say $\vec{z}_k$, in terms of $\vec{x}_{l+1}$ and the other $\vec{z}_i$ and just as in the above argument, replace that $\vec{z}_i$ with $\vec{x}_{l+1}$ to obtain

$d_j \vec{v}_j = \vec{0}$ if and only if

Now since $\{\vec{u}_1, \cdots, \vec{u}_m\}$ is independent, this happens if and only if
\[
\sum_{j=1}^{n} c_{ij} d_j = 0, \quad i = 1, 2, \cdots, m.
\]
However, this is a system of $m$ equations in $n$ variables, $d_1, \cdots, d_n$, and $m < n$. Therefore, there exists a solution to this system of equations in which not all the $d_j$ are equal to zero. Recall why this is so. The augmented matrix for the system is of the form $\left[\, C \mid \vec{0} \,\right]$ where $C$ is a matrix which has more columns than rows. Therefore, there are free variables and hence nonzero solutions to the system of equations. However, such a solution gives a nontrivial linear combination of the $\vec{v}_j$ equal to $\vec{0}$, which contradicts the linear independence of $\{\vec{v}_1, \cdots, \vec{v}_n\}$. Similarly it cannot happen that $m > n$.

Given the result of the previous corollary, the following definition follows.

Definition 9.29: Dimension
A vector space $V$ is of dimension $n$ if it has a basis consisting of $n$ vectors.

Notice that the dimension is well defined by Corollary 9.28. It is assumed here that $n < \infty$ and therefore such a vector space is said to be finite dimensional.

Example 9.30: Dimension of a Vector Space
Let $P_2$ be the set of all polynomials of degree at most 2. Find the dimension of $P_2$.

Solution. If we can find a basis of $P_2$ then the number of vectors in the basis will give the dimension. Recall from Example 9.26 that a basis of $P_2$ is given by
\[
S = \{x^2, x, 1\}
\]
There are three polynomials in $S$ and hence the dimension of $P_2$ is three.

It is important to note that a basis for a vector space is not unique. A vector space can have many bases. Consider the following example.

Example 9.31: A Different Basis for Polynomials of Degree Two
Let $P_2$ be the polynomials of degree no more than 2. Is $\{x^2 + x + 1,\ 2x + 1,\ 3x^2 + 1\}$ a basis for $P_2$?

Solution. Suppose these vectors are linearly independent but do not form a spanning set for $P_2$. Then by Lemma 9.23, we could find a fourth polynomial in $P_2$ to create a new linearly independent set containing four polynomials. However this would imply that we could find a basis of $P_2$ of more than three polynomials. This contradicts the result of Example 9.30 in which we determined the dimension of $P_2$ is three. Therefore if these vectors are linearly independent they must also form a spanning set and thus a basis for $P_2$.

Suppose then that
\[
\begin{aligned}
a\left(x^2 + x + 1\right) + b\left(2x + 1\right) + c\left(3x^2 + 1\right) &= 0 \\
(a + 3c)x^2 + (a + 2b)x + (a + b + c) &= 0
\end{aligned}
\]
We know that $\{1, x, x^2\}$ is linearly independent, and so it follows that
\[
\begin{aligned}
a + 3c &= 0 \\
a + 2b &= 0 \\
a + b + c &= 0
\end{aligned}
\]
and there is only one solution to this system of equations, $a = b = c = 0$. Therefore, these are linearly independent and form a basis for $P_2$.

Consider the following theorem.

Theorem 9.32: Every Subspace has a Basis
Let $V$ be a nonzero subspace of a finite dimensional vector space $W$. Suppose $W$ has dimension $n$. Then $V$ has a basis with no more than $n$ vectors.

Proof. Let $\vec{v}_1 \in V$ where $\vec{v}_1 \neq \vec{0}$. If $\mathrm{span}\{\vec{v}_1\} = V$, then it follows that $\{\vec{v}_1\}$ is a basis for $V$. Otherwise, there exists $\vec{v}_2 \in V$ which is not in $\mathrm{span}\{\vec{v}_1\}$. By Lemma 9.23, $\{\vec{v}_1, \vec{v}_2\}$ is a linearly independent set of vectors. If $\mathrm{span}\{\vec{v}_1, \vec{v}_2\} = V$, then $\{\vec{v}_1, \vec{v}_2\}$ is a basis for $V$ and we are done. If $\mathrm{span}\{\vec{v}_1, \vec{v}_2\} \neq V$, then there exists $\vec{v}_3 \notin \mathrm{span}\{\vec{v}_1, \vec{v}_2\}$ and $\{\vec{v}_1, \vec{v}_2, \vec{v}_3\}$ is a larger linearly independent set of vectors. Continuing this way, the process must stop before $n + 1$ steps because if not, it would be possible to obtain $n + 1$ linearly independent vectors contrary to the exchange theorem, Theorem 9.27.

The following theorem claims that a spanning set of a vector space $V$ can be shrunk down to a basis of $V$. Similarly, a linearly independent set within $V$ can be enlarged to create a basis of $V$.


Theorem 9.33: Basis of V
If $V = \mathrm{span}\{\vec{u}_1, \cdots, \vec{u}_n\}$ is a vector space, then some subset of $\{\vec{u}_1, \cdots, \vec{u}_n\}$ is a basis for $V$. Also, if $\{\vec{u}_1, \cdots, \vec{u}_k\} \subseteq V$ is linearly independent and the vector space is finite dimensional, then the set $\{\vec{u}_1, \cdots, \vec{u}_k\}$ can be enlarged to obtain a basis of $V$.

Proof. Let
\[
S = \{E \subseteq \{\vec{u}_1, \cdots, \vec{u}_n\} \text{ such that } \mathrm{span}\{E\} = V\}.
\]
For $E \in S$, let $|E|$ denote the number of elements of $E$. Let
\[
m = \min\{|E| \text{ such that } E \in S\}.
\]
Thus there exist vectors
\[
\{\vec{v}_1, \cdots, \vec{v}_m\} \subseteq \{\vec{u}_1, \cdots, \vec{u}_n\}
\]
such that
\[
\mathrm{span}\{\vec{v}_1, \cdots, \vec{v}_m\} = V
\]
and $m$ is as small as possible for this to happen. If this set is linearly independent, it follows it is a basis for $V$ and the theorem is proved. On the other hand, if the set is not linearly independent, then there exist scalars $c_1, \cdots, c_m$ such that
\[
\vec{0} = \sum_{i=1}^{m} c_i \vec{v}_i
\]
and not all the $c_i$ are equal to zero. Suppose $c_k \neq 0$. Then solve for the vector $\vec{v}_k$ in terms of the other vectors. Consequently,
\[
V = \mathrm{span}\{\vec{v}_1, \cdots, \vec{v}_{k-1}, \vec{v}_{k+1}, \cdots, \vec{v}_m\}
\]
contradicting the definition of $m$. This proves the first part of the theorem.

To obtain the second part, begin with $\{\vec{u}_1, \cdots, \vec{u}_k\}$ and suppose a basis for $V$ is
\[
\{\vec{v}_1, \cdots, \vec{v}_n\}
\]
If
\[
\mathrm{span}\{\vec{u}_1, \cdots, \vec{u}_k\} = V,
\]
then $k = n$. If not, there exists a vector
\[
\vec{u}_{k+1} \notin \mathrm{span}\{\vec{u}_1, \cdots, \vec{u}_k\}
\]
Then from Lemma 9.23, $\{\vec{u}_1, \cdots, \vec{u}_k, \vec{u}_{k+1}\}$ is also linearly independent. Continue adding vectors in this way until $n$ linearly independent vectors have been obtained. Then
\[
\mathrm{span}\{\vec{u}_1, \cdots, \vec{u}_n\} = V
\]
because if it did not do so, there would exist $\vec{u}_{n+1}$ as just described and $\{\vec{u}_1, \cdots, \vec{u}_{n+1}\}$ would be a linearly independent set of vectors having $n + 1$ elements. Since $\{\vec{v}_1, \cdots, \vec{v}_n\}$ is a basis, and in particular spans $V$, this would contradict Theorem 9.27. Therefore, this list is a basis.

Consider the following example.

Example 9.34: Shrinking a Spanning Set
Consider the set $S \subseteq P_2$ given by

\[
S = \{1, x, x^2, x^2 + 1\}
\]
Show that $S$ spans $P_2$, then remove vectors from $S$ until it creates a basis.

Solution. First we need to show that $S$ spans $P_2$. Let $ax^2 + bx + c$ be an arbitrary polynomial in $P_2$. Write
\[
ax^2 + bx + c = r(1) + s(x) + t(x^2) + u(x^2 + 1)
\]
Then,
\[
\begin{aligned}
ax^2 + bx + c &= r(1) + s(x) + t(x^2) + u(x^2 + 1) \\
&= (t + u)x^2 + s(x) + (r + u)
\end{aligned}
\]
It follows that
\[
\begin{aligned}
a &= t + u \\
b &= s \\
c &= r + u
\end{aligned}
\]
Clearly a solution exists for all $a, b, c$ (for example $u = 0$, $t = a$, $s = b$, $r = c$) and so $S$ is a spanning set for $P_2$. By Theorem 9.33, some subset of $S$ is a basis for $P_2$.

Recall that a basis must be both a spanning set and a linearly independent set. Therefore we must remove a vector from $S$ keeping this in mind. Suppose we remove $x$ from $S$. The resulting set would be $\{1, x^2, x^2 + 1\}$. This set is clearly linearly dependent (and also does not span $P_2$) and so is not a basis.

Suppose we remove $x^2 + 1$ from $S$. The resulting set is $\{1, x, x^2\}$ which is both linearly independent and spans $P_2$. Hence this is a basis for $P_2$. Note that removing any one of $1$, $x^2$, or $x^2 + 1$ will result in a basis.

Recall Example 9.24 in which we added a matrix to a linearly independent set to create a larger linearly independent set. By Theorem 9.33 we can extend a linearly independent set to a basis.

Example 9.35: Adding to a Linearly Independent Set
Let $S \subseteq M_{22}$ be a linearly independent set given by
\[
S = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \right\}
\]
Enlarge $S$ to a basis of $M_{22}$.

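Solution. Recall from the solution of Example 9.24 that the set $R \subseteq M_{22}$ given by
\[
R = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix} \right\}
\]
is also linearly independent. However this set is still not a basis for $M_{22}$ as it is not a spanning set. In particular, $\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}$ is not in $\mathrm{span}\, R$. Therefore, this matrix can be added to the set by Lemma 9.23 to obtain a new linearly independent set given by
\[
\left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}
\]
This set is linearly independent and spans $M_{22}$, so it is a basis of $M_{22}$.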
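The two halves of Theorem 9.33 can be tried out on coordinate vectors. The sketch below is an illustration only, not the method of the proof: it uses a greedy pass based on rank, whereas the proof argues with a minimal spanning subset. It assumes that a polynomial $c + bx + ax^2$ is stored as the triple (c, b, a) and that a $2 \times 2$ matrix is stored as its four entries read row by row. The function names `rank`, `shrink_to_basis` and `enlarge_to_basis` are hypothetical helpers introduced here.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of coordinate vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def shrink_to_basis(spanning):
    """Keep a vector only if it is not in the span of the vectors already kept."""
    kept = []
    for v in spanning:
        if rank(kept + [v]) > len(kept):
            kept.append(v)
    return kept


def enlarge_to_basis(independent, candidates):
    """Append each candidate that lies outside the current span (as in Lemma 9.23)."""
    basis = list(independent)
    for v in candidates:
        if rank(basis + [v]) > len(basis):
            basis.append(v)
    return basis


# Example 9.34: S = {1, x, x^2, x^2 + 1} as triples (constant, x, x^2).
basis_p2 = shrink_to_basis([(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 0, 1)])

# Example 9.35: S in M22, extended using the four standard matrices as candidates.
s_m22 = [(1, 0, 0, 0), (0, 1, 0, 0)]
standard = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
basis_m22 = enlarge_to_basis(s_m22, standard)

# Example 9.31: {x^2 + x + 1, 2x + 1, 3x^2 + 1} has rank 3, so it is a basis of P2.
rank_example_931 = rank([(1, 1, 1), (1, 2, 0), (1, 0, 3)])

print(basis_p2, basis_m22, rank_example_931)
```

The greedy pass in `shrink_to_basis` depends on the order of the input: here it drops $x^2 + 1$, matching the second choice discussed in Example 9.34, but listing the polynomials in a different order would keep a different basis.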

9.3.1. Exercises

1. Determine if the following set is linearly independent. If it is linearly dependent, write one vector as a linear combination of the other vectors in the set.
\[
\{x + 1,\ x^2 + 2,\ x^2 - x - 3\}
\]

2. Determine if the following set is linearly independent. If it is linearly dependent, write one vector as a linear combination of the other vectors in the set.
\[
\{x^2 + x,\ 2x^2 - 4x - 6,\ 2x - 2\}
\]

3. Determine if the following set is linearly independent. If it is linearly dependent, write one vector as a linear combination of the other vectors in the set.
\[
\left\{ \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 7 & 2 \\ 2 & 3 \end{bmatrix}, \begin{bmatrix} 4 & 0 \\ 1 & 2 \end{bmatrix} \right\}
\]

4. Determine if the following set is linearly independent. If it is linearly dependent, write one vector as a linear combination of the other vectors in the set.
\[
\left\{ \begin{bmatrix} 0 & 0 \\ 1 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 1 \end{bmatrix}, \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right\}
\]

5. If you have 5 vectors in $\mathbb{R}^5$ and the vectors are linearly independent, can it always be concluded they span $\mathbb{R}^5$?

6. If you have 6 vectors in $\mathbb{R}^5$, is it possible they are linearly independent? Explain.

7. Let $P_3$ be the polynomials of degree no more than 3. Determine which of the following are bases for this vector space.

linearly independent on an interval [s, t] if

c1c2c3c4

d1d2

d3 d4

9. Let the field of scalars be $\mathbb{Q}$, the rational numbers, and let the vectors be of the form $a + b\sqrt{2}$ where $a, b$ are rational numbers. Show that this collection of vectors is a vector space with field of scalars $\mathbb{Q}$ and give a basis for this vector space.

10. Suppose $V$ is a finite dimensional vector space. Based on the exchange theorem above, it was shown that any two bases have the same number of vectors in them. Give a different proof of this fact using the earlier material in the book. Hint: Suppose $\{\vec{x}_1, \cdots, \vec{x}_n\}$ and $\{\vec{y}_1, \cdots, \vec{y}_m\}$ are two bases with $m < n$. Then define $\phi : \mathbb{R}^n \to V$, $\psi : \mathbb{R}^m \to V$ by
\[
\phi(\vec{a}) = \sum_{k=1}^{n} a_k \vec{x}_k, \qquad \psi(\vec{b}) = \sum_{j=1}^{m} b_j \vec{y}_j
\]
Consider the linear transformation $\psi^{-1} \circ \phi$. Argue it is a one to one and onto mapping from $\mathbb{R}^n$ to $\mathbb{R}^m$. Now consider a matrix of this linear transformation and its row reduced echelon form.


A. Some Prerequisite Topics

The topics presented in this section are important concepts in mathematics and therefore should be examined.

A.1 Sets and Set Notation

A set is a collection of things called elements. For example $\{1, 2, 3, 8\}$ would be a set consisting of the elements 1, 2, 3, and 8. To indicate that 3 is an element of $\{1, 2, 3, 8\}$, it is customary to write $3 \in \{1, 2, 3, 8\}$. We can also indicate when an element is not in a set, by writing $9 \notin \{1, 2, 3, 8\}$, which says that 9 is not an element of $\{1, 2, 3, 8\}$. Sometimes a rule specifies a set. For example you could specify a set as all integers larger than 2. This would be written as $S = \{x \in \mathbb{Z} : x > 2\}$. This notation says: $S$ is the set of all integers, $x$, such that $x > 2$.

Suppose $A$ and $B$ are sets with the property that every element of $A$ is an element of $B$. Then we say that $A$ is a subset of $B$. For example, $\{1, 2, 3, 8\}$ is a subset of $\{1, 2, 3, 4, 5, 8\}$. In symbols, we write $\{1, 2, 3, 8\} \subseteq \{1, 2, 3, 4, 5, 8\}$. It is sometimes said that "$A$ is contained in $B$" or even "$B$ contains $A$". The same statement about the two sets may also be written as $\{1, 2, 3, 4, 5, 8\} \supseteq \{1, 2, 3, 8\}$.

We can also talk about the union of two sets, which we write as $A \cup B$. This is the set consisting of everything which is an element of at least one of the sets, $A$ or $B$. As an example of the union of two sets, consider $\{1, 2, 3, 8\} \cup \{3, 4, 7, 8\} = \{1, 2, 3, 4, 7, 8\}$. This set is made up of the numbers which are in at least one of the two sets.

In general
\[
A \cup B = \{x : x \in A \text{ or } x \in B\}
\]
Notice that an element which is in both $A$ and $B$ is also in the union, as well as elements which are in only one of $A$ or $B$.

Another important set is the intersection of two sets $A$ and $B$, written $A \cap B$. This set consists of everything which is in both of the sets. Thus $\{1, 2, 3, 8\} \cap \{3, 4, 7, 8\} = \{3, 8\}$ because 3 and 8 are those elements the two sets have in common. In general,
\[
A \cap B = \{x : x \in A \text{ and } x \in B\}
\]
If $A$ and $B$ are two sets, $A \setminus B$ denotes the set of things which are in $A$ but not in $B$. Thus
\[
A \setminus B = \{x \in A : x \notin B\}
\]
For example, if $A = \{1, 2, 3, 8\}$ and $B = \{3, 4, 7, 8\}$, then $A \setminus B = \{1, 2, 3, 8\} \setminus \{3, 4, 7, 8\} = \{1, 2\}$.
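The set operations above map directly onto Python's built-in `set` type, which gives a quick way to experiment with them. The following is a small sketch using the same sets $A$ and $B$ as in the examples. The name `S_window` is introduced here as an assumption: a Python set must be finite, so it holds only the part of $\{x \in \mathbb{Z} : x > 2\}$ that falls in an arbitrarily chosen window of integers.

```python
A = {1, 2, 3, 8}
B = {3, 4, 7, 8}

union = A | B          # elements in at least one of A, B
intersection = A & B   # elements in both A and B
difference = A - B     # elements of A that are not in B

is_member = 3 in A                    # 3 is an element of A
is_not_member = 9 not in A            # 9 is not an element of A
is_subset = A <= {1, 2, 3, 4, 5, 8}   # A is a subset of {1, 2, 3, 4, 5, 8}

# Set-builder notation {x in Z : x > 2}, restricted to the integers -5..5
# because a Python set cannot hold infinitely many elements.
S_window = {x for x in range(-5, 6) if x > 2}

print(sorted(union), sorted(intersection), sorted(difference))
```

Python sets are unordered, which is why the results are sorted before printing.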

A special set which is very important in mathematics is the empty set denoted by $\emptyset$. The empty set, $\emptyset$, is defined as the set which has no elements in it. It follows that the empty set is a subset of every set. This is true because if it were not so, there would have to exist a set $A$, such that $\emptyset$ has something in it which is not in $A$. However, $\emptyset$ has nothing in it and so it must be that $\emptyset \subseteq A$.

We can also use brackets to denote sets which are intervals of numbers. Let $a$ and $b$ be real numbers. Then
\[
\begin{aligned}
[a, b] &= \{x \in \mathbb{R} : a \leq x \leq b\} \\
[a, b) &= \{x \in \mathbb{R} : a \leq x < b\} \\
(a, b) &= \{x \in \mathbb{R} : a < x < b\} \\
(a, b] &= \{x \in \mathbb{R} : a < x \leq b\} \\
[a, \infty) &= \{x \in \mathbb{R} : x \geq a\} \\
(-\infty, a] &= \{x \in \mathbb{R} : x \leq a\}
\end{aligned}
\]
These sorts of sets of real numbers are called intervals. The two points $a$ and $b$ are called endpoints, or bounds, of the interval. In particular, $a$ is the lower bound while $b$ is the upper bound of the above intervals, where applicable. Other intervals such as $(-\infty, b)$ are defined by analogy to what was just explained. In general, the curved parenthesis, (, indicates the end point is not included in the interval, while the square parenthesis, [, indicates this end point is included. The reason that there will always be a curved parenthesis next to $\infty$ or $-\infty$ is that these are not real numbers and cannot be included in the interval in the way a real number can.

To illustrate the use of this notation relative to intervals consider three examples of inequalities. Their solutions will be written in the interval notation just described.

Example A.1: Solving an Inequality
Solve the inequality $2x + 4 \leq x - 8$.

Solution. We need to find $x$ such that $2x + 4 \leq x - 8$. Solving for $x$, we see that $x \leq -12$ is the answer. This is written in terms of an interval as $(-\infty, -12]$.

Consider the following example.

Example A.2: Solving an Inequality
Solve the inequality $(x + 1)(2x - 3) \geq 0$.

Solution. We need to find $x$ such that $(x + 1)(2x - 3) \geq 0$. The solution is given by $x \leq -1$ or $x \geq \frac{3}{2}$. Therefore, any $x$ which lies in either of these intervals is a solution. In terms of set notation this is denoted by $(-\infty, -1] \cup [\frac{3}{2}, \infty)$.

Consider one last example.

Example A.3: Solving an Inequality
Solve the inequality $x(x + 2) \geq -4$.

Solution. This inequality is true for any value of $x$ where $x$ is a real number. We can write the solution as $\mathbb{R}$ or $(-\infty, \infty)$.

In the next section, we examine another important mathematical concept.
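The three interval answers above can be spot-checked on sample points. The sketch below assumes the inequalities are read as $2x + 4 \leq x - 8$, $(x + 1)(2x - 3) \geq 0$ and $x(x + 2) \geq -4$. The grid of sample points and the helper names are arbitrary choices made here, and agreement on finitely many points is a sanity check rather than a proof.

```python
from fractions import Fraction


def in_solution_a1(x):
    """Membership in (-inf, -12], the answer to Example A.1."""
    return x <= -12


def in_solution_a2(x):
    """Membership in (-inf, -1] U [3/2, inf), the answer to Example A.2."""
    return x <= -1 or x >= Fraction(3, 2)


# Sample points: multiples of 1/4 between -20 and 20 (an arbitrary test grid).
samples = [Fraction(k, 4) for k in range(-80, 81)]

# Each inequality should hold exactly on the claimed solution set.
a1_agrees = all((2 * x + 4 <= x - 8) == in_solution_a1(x) for x in samples)
a2_agrees = all(((x + 1) * (2 * x - 3) >= 0) == in_solution_a2(x) for x in samples)
# Example A.3 claims the inequality holds for every real x.
a3_always = all(x * (x + 2) >= -4 for x in samples)

print(a1_agrees, a2_agrees, a3_always)
```

The grid includes the endpoints $-12$, $-1$ and $\frac{3}{2}$, so the check also exercises the choice of square brackets at those endpoints.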

A.2 Well Ordering and Induction

We begin this section with some important notation. Summation notation, written $\sum_{i=1}^{j} i$, represents a sum. Here, $i$ is called the index of the sum, and we add one term for each value of the index, from $i = 1$ up to $i = j$. For example,
\[
\sum_{i=1}^{j} i = 1 + 2 + \cdots + j
\]
Another example:
\[
a_{11} + a_{12} + a_{13} = \sum_{i=1}^{3} a_{1i}
\]
The following notation is a specific use of summation notation.

Notation A.4: Summation Notation
Let $a_{ij}$ be real numbers, and suppose $1 \leq i \leq r$ while $1 \leq j \leq s$. These numbers can be listed in a rectangular array as given by
\[
\begin{array}{cccc}
a_{11} & a_{12} & \cdots & a_{1s} \\
a_{21} & a_{22} & \cdots & a_{2s} \\
\vdots & \vdots & & \vdots \\
a_{r1} & a_{r2} & \cdots & a_{rs}
\end{array}
\]
Then $\sum_{j=1}^{s} \sum_{i=1}^{r} a_{ij}$ means to first sum the numbers in each column (using $i$ as the index) and then to add the sums which result (using $j$ as the index). Similarly, $\sum_{i=1}^{r} \sum_{j=1}^{s} a_{ij}$ means to sum the numbers in each row (using $j$ as the index) and then to add the sums which result (using $i$ as the index).

Notice that since addition is commutative, $\sum_{j=1}^{s} \sum_{i=1}^{r} a_{ij} = \sum_{i=1}^{r} \sum_{j=1}^{s} a_{ij}$.

We now consider the main concept of this section. Mathematical induction and well ordering are two extremely important principles in math. They are often used to prove significant things which would be hard to prove otherwise.

Definition A.5: Well Ordered
A set is well ordered if every nonempty subset $S$ contains a smallest element $z$ having the property that $z \leq x$ for all $x \in S$.

In particular, the set of natural numbers defined as
\[
\mathbb{N} = \{1, 2, \cdots\}
\]
is well ordered.

Consider the following proposition.

Proposition A.6: Well Ordered Sets
Any set of integers larger than a given number is well ordered.

This proposition claims that if a set of integers has a lower bound which is a real number, then this set is well ordered.

Further, this proposition implies the principle of mathematical induction. The symbol $\mathbb{Z}$ denotes the set of all integers. Note that if $a$ is an integer, then there are no integers between $a$ and $a + 1$.

Theorem A.7: Mathematical Induction
A set $S \subseteq \mathbb{Z}$, having the property that $a \in S$ and $n + 1 \in S$ whenever $n \in S$, contains all integers $x \in \mathbb{Z}$ such that $x \geq a$.

Proof. Let $T$ consist of all integers larger than or equal to $a$ which are not in $S$. The theorem will be proved if $T = \emptyset$. If $T \neq \emptyset$ then by the well ordering principle, there would have to exist a smallest element of $T$, denoted as $b$. It must be the case that $b > a$ since by definition, $a \notin T$. Thus $b \geq a + 1$, and so $b - 1 \geq a$ and $b - 1 \notin S$ because if $b - 1 \in S$, then $b - 1 + 1 = b \in S$ by the assumed property of $S$. Therefore, $b - 1 \in T$ which contradicts the choice of $b$ as the smallest element of $T$. ($b - 1$ is smaller.) Since a contradiction is obtained by assuming $T \neq \emptyset$, it must be the case that $T = \emptyset$ and this says that every integer at least as large as $a$ is also in $S$.

Mathematical induction is a very useful device for proving theorems about the integers. The procedure is as follows.


Procedure A.8: Proof by Mathematical Induction
Suppose $S_n$ is a statement which is a function of the number $n$, for $n = 1, 2, \cdots$, and we wish to show that $S_n$ is true for all $n \geq 1$. To do so using mathematical induction, use the following steps.
1. Base Case: Show $S_1$ is true.
2. Assume $S_n$ is true for some $n$, which is the induction hypothesis. Then, using this assumption, show that $S_{n+1}$ is true.
Proving these two steps shows that $S_n$ is true for all $n = 1, 2, \cdots$.

We can use this procedure to solve the following examples.

Example A.9: Proving by Induction
Prove by induction that
\[
\sum_{k=1}^{n} k^2 = \frac{n(n + 1)(2n + 1)}{6}.
\]
Solution. By Procedure A.8, we first need to show that this statement is true for $n = 1$. When $n = 1$, the statement says that
\[
\sum_{k=1}^{1} k^2 = \frac{1(1 + 1)(2(1) + 1)}{6} = \frac{6}{6} = 1
\]

The sum on the left hand side also equals 1, so this equation is true for $n = 1$.

Now suppose this formula is valid for some $n \geq 1$ where $n$ is an integer. Hence, the following equation is true.
\[
\sum_{k=1}^{n} k^2 = \frac{n(n + 1)(2n + 1)}{6} \tag{1.1}
\]
We want to show that this is true for $n + 1$.

Suppose we add $(n + 1)^2$ to both sides of equation (1.1).
\[
\begin{aligned}
\sum_{k=1}^{n+1} k^2 &= \sum_{k=1}^{n} k^2 + (n + 1)^2 \\
&= \frac{n(n + 1)(2n + 1)}{6} + (n + 1)^2
\end{aligned}
\]
The step going from the first to the second line is based on the assumption that the formula is true for $n$. Now simplify the expression in the second line,
\[
\frac{n(n + 1)(2n + 1)}{6} + (n + 1)^2
\]
This equals
\[
(n + 1)\left(\frac{n(2n + 1)}{6} + (n + 1)\right)
\]
and
\[
\frac{n(2n + 1)}{6} + (n + 1) = \frac{6(n + 1) + 2n^2 + n}{6} = \frac{(n + 2)(2n + 3)}{6}
\]
Therefore,
\[
\sum_{k=1}^{n+1} k^2 = \frac{(n + 1)(n + 2)(2n + 3)}{6} = \frac{(n + 1)((n + 1) + 1)(2(n + 1) + 1)}{6}
\]
showing the formula holds for $n + 1$ whenever it holds for $n$. This proves the formula by mathematical induction. In other words, this formula is true for all $n = 1, 2, \cdots$.

Consider another example.

Example A.10: Proving an Inequality by Induction
Show that for all $n \in \mathbb{N}$,
\[
\frac{1}{2} \cdot \frac{3}{4} \cdots \frac{2n - 1}{2n} < \frac{1}{\sqrt{2n + 1}}.
\]

Solution. Again we will use the procedure given in Procedure A.8 to prove that this statement is true for all $n$. Suppose $n = 1$. Then the statement says
\[
\frac{1}{2} < \frac{1}{\sqrt{3}}
\]
which is true.

Suppose then that the inequality holds for $n$. In other words,
\[
\frac{1}{2} \cdot \frac{3}{4} \cdots \frac{2n - 1}{2n} < \frac{1}{\sqrt{2n + 1}}
\]
is true.

Now multiply both sides of this inequality by $\frac{2n + 1}{2n + 2}$. This yields
\[
\frac{1}{2} \cdot \frac{3}{4} \cdots \frac{2n - 1}{2n} \cdot \frac{2n + 1}{2n + 2} < \frac{1}{\sqrt{2n + 1}} \cdot \frac{2n + 1}{2n + 2} = \frac{\sqrt{2n + 1}}{2n + 2}
\]
The theorem will be proved if this last expression is less than $\frac{1}{\sqrt{2n + 3}}$. This happens if and only if
\[
\left(\frac{1}{\sqrt{2n + 3}}\right)^2 = \frac{1}{2n + 3} > \frac{2n + 1}{(2n + 2)^2}
\]
which occurs if and only if $(2n + 2)^2 > (2n + 3)(2n + 1)$ and this is clearly true which may be seen from expanding both sides. This proves the inequality.
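As a numerical companion to Example A.10, the inequality can be tested exactly for a range of $n$. The sketch below assumes the statement reads $\frac{1}{2} \cdot \frac{3}{4} \cdots \frac{2n-1}{2n} < \frac{1}{\sqrt{2n+1}}$. To avoid floating point square roots it uses the equivalent form $P^2 (2n + 1) < 1$, which is valid because both sides are positive. The function names and the upper limit of 200 are choices made here for illustration.

```python
from fractions import Fraction


def odd_even_product(n):
    """Return (1/2)(3/4)...((2n-1)/(2n)) as an exact fraction."""
    product = Fraction(1)
    for k in range(1, n + 1):
        product *= Fraction(2 * k - 1, 2 * k)
    return product


def inequality_holds(n):
    """Test P < 1/sqrt(2n+1) via the equivalent exact form P^2 * (2n+1) < 1."""
    p = odd_even_product(n)
    return p * p * (2 * n + 1) < 1


# Check the inequality for n = 1, ..., 200.
checked = all(inequality_holds(n) for n in range(1, 201))
print(checked)
```

A finite check like this does not replace the induction argument. It only confirms that the statement being proved is plausible before the proof is attempted.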

Let's review the process just used. If $S$ is the set of integers at least as large as 1 for which the formula holds, the first step was to show $1 \in S$ and then that whenever $n \in S$, it follows $n + 1 \in S$. Therefore, by the principle of mathematical induction, $S$ contains $[1, \infty) \cap \mathbb{Z}$, all positive integers. In doing an inductive proof of this sort, the set $S$ is normally not mentioned. One just verifies the steps above.
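The two steps of Procedure A.8 can be mirrored in a short program for the formula of Example A.9. The sketch below is only a spot check under stated assumptions: it tests the base case, then tests the inductive step (adding $(n + 1)^2$ to the closed form for $n$ gives the closed form for $n + 1$) for finitely many $n$. The function names and the ranges are introduced here for illustration.

```python
def closed_form(n):
    """Right-hand side n(n+1)(2n+1)/6; the numerator is always divisible by 6."""
    return n * (n + 1) * (2 * n + 1) // 6


def base_case():
    """S_1: the sum of k^2 for k = 1 equals the closed form at n = 1."""
    return sum(k * k for k in range(1, 2)) == closed_form(1)


def inductive_step(n):
    """Adding (n+1)^2 to the formula for n should give the formula for n + 1."""
    return closed_form(n) + (n + 1) ** 2 == closed_form(n + 1)


base_ok = base_case()
steps_ok = all(inductive_step(n) for n in range(1, 1001))
# Independent comparison against the sum computed term by term.
direct_ok = all(
    sum(k * k for k in range(1, n + 1)) == closed_form(n) for n in range(1, 101)
)
print(base_ok, steps_ok, direct_ok)
```

The program can only test the inductive step for the values of $n$ it is given, whereas the algebra in Example A.9 establishes it for every $n$ at once. That difference is exactly what the principle of mathematical induction supplies.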