The prior subsection has many descriptions of solution sets. They all fit a pattern. They have a vector that is a particular solution of the system added to an unrestricted combination of some other vectors. The solution set from Example 2.13 illustrates.

The combination is unrestricted in that w{\displaystyle w} and u{\displaystyle u} can be any real numbers— there is no condition like "such that 2w−u=0{\displaystyle 2w-u=0}" that would restrict which pairs w,u{\displaystyle w,u} can be used to form combinations.

That example shows an infinite solution set conforming to the pattern. We can think of the other two kinds of solution sets as also fitting the same pattern. A one-element solution set fits in that it has a particular solution, and the unrestricted combination part is a trivial sum (that is, instead of being a combination of two vectors, as above, or a combination of one vector, it is a combination of no vectors). A zero-element solution set fits the pattern since there is no particular solution, and so the set of sums of that form is empty.

We will show that the examples from the prior subsection are representative, in that the description pattern discussed above holds for every solution set.

Theorem 3.1

For any linear system there are vectors β→1{\displaystyle {\vec {\beta }}_{1}}, ..., β→k{\displaystyle {\vec {\beta }}_{k}} such that the solution set can be described as

where p→{\displaystyle {\vec {p}}} is any particular solution, and where the system has k{\displaystyle k} free variables.

This description has two parts, the particular solution p→{\displaystyle {\vec {p}}} and also the unrestricted linear combination of the β→{\displaystyle {\vec {\beta }}}'s. We shall prove the theorem in two corresponding parts, with two lemmas.

We will focus first on the unrestricted combination part. To do that, we consider systems that have the vector of zeroes as one of the particular solutions, so that p→+c1β→1+⋯+ckβ→k{\displaystyle {\vec {p}}+c_{1}{\vec {\beta }}_{1}+\dots +c_{k}{\vec {\beta }}_{k}} can be shortened to c1β→1+⋯+ckβ→k{\displaystyle c_{1}{\vec {\beta }}_{1}+\dots +c_{k}{\vec {\beta }}_{k}}.

Definition 3.2

A linear equation is homogeneous if it has a constant of zero, that is, if it can be put in the form a1x1+a2x2+⋯+anxn=0{\displaystyle a_{1}x_{1}+a_{2}x_{2}+\,\cdots \,+a_{n}x_{n}=0}.

(These are "homogeneous" because all of the terms involve the same power of their variable— the first power— including a "0x0{\displaystyle 0x_{0}}" that we can imagine is on the right side.)

Obviously the two reductions go in the same way. We can study how linear systems are reduced by instead studying how the associated homogeneous systems are reduced.

Studying the associated homogeneous system has a great advantage over studying the original system. Nonhomogeneous systems can be inconsistent. But a homogeneous system must be consistent since there is always at least one solution, the vector of zeros.

Definition 3.4

A column or row vector of all zeros is a zero vector, denoted 0→{\displaystyle {\vec {0}}}.

There are many different zero vectors, e.g., the one-tall zero vector, the two-tall zero vector, etc. Nonetheless, people often refer to "the" zero vector, expecting that the size of the one being discussed will be clear from the context.

has many vectors besides the zero vector (if we interpret w{\displaystyle w} as a number of molecules then solutions make sense only when w{\displaystyle w} is a nonnegative multiple of 3{\displaystyle 3}).

We now have the terminology to prove the two parts of Theorem 3.1. The first lemma deals with unrestricted combinations.

Lemma 3.7

For any homogeneous linear system there exist vectors β→1{\displaystyle {\vec {\beta }}_{1}}, ..., β→k{\displaystyle {\vec {\beta }}_{k}} such that the solution set of the system is

We next perform back-substitution to express each variable in terms of the free variable z{\displaystyle z}. Working from the bottom up, we get first that w{\displaystyle w} is 0⋅z{\displaystyle 0\cdot z}, next that y{\displaystyle y} is (1/3)⋅z{\displaystyle (1/3)\cdot z}, and then substituting those two into the top equation x+2((1/3)z)−z+2(0)=0{\displaystyle x+2((1/3)z)-z+2(0)=0} gives x=(1/3)⋅z{\displaystyle x=(1/3)\cdot z}. So, back substitution gives a parametrization of the solution set by starting at the bottom equation and using the free variables as the parameters to work row-by-row to the top. The proof below follows this pattern.

Comment: That is, this proof just does a verification of the bookkeeping in back substitution to show that we haven't overlooked any obscure cases where this procedure fails, say, by leading to a division by zero. So this argument, while quite detailed, doesn't give us any new insights. Nevertheless, we have written it out for two reasons. The first reason is that we need the result— the computational procedure that we employ must be verified to work as promised. The second reason is that the row-by-row nature of back substitution leads to a proof that uses the technique of mathematical induction.[1] This is an important, and non-obvious, proof technique that we shall use a number of times in this book. Doing an induction argument here gives us a chance to see one in a setting where the proof material is easy to follow, and so the technique can be studied. Readers who are unfamiliar with induction arguments should be sure to master this one and the ones later in this chapter before going on to the second chapter.

Proof

First use Gauss' method to reduce the homogeneous system to echelon form. We will show that each leading variable can be expressed in terms of free variables. That will finish the argument because then we can use those free variables as the parameters. That is, the β→{\displaystyle {\vec {\beta }}}'s are the vectors of coefficients of the free variables (as in Example 3.6, where the solution is x=(1/3)w{\displaystyle x=(1/3)w}, y=w{\displaystyle y=w}, z=(1/3)w{\displaystyle z=(1/3)w}, and w=w{\displaystyle w=w}).

We will proceed by mathematical induction, which has two steps. The base step of the argument will be to focus on the bottom-most non-"0=0{\displaystyle 0=0}" equation and write its leading variable in terms of the free variables. The inductive step of the argument will be to argue that if we can express the leading variables from the bottom t{\displaystyle t} rows in terms of free variables, then we can express the leading variable of the next row up— the t+1{\displaystyle t+1}-th row up from the bottom— in terms of free variables. With those two steps, the theorem will be proved because by the base step it is true for the bottom equation, and by the inductive step the fact that it is true for the bottom equation shows that it is true for the next one up, and then another application of the inductive step implies it is true for third equation up, etc.

For the base step, consider the bottom-most non-"0=0{\displaystyle 0=0}" equation (the case where all the equations are "0=0{\displaystyle 0=0}" is trivial). We call that the m{\displaystyle m}-th row:

where am,ℓm≠0{\displaystyle a_{m,\ell _{m}}\neq 0}. (The notation here has "ℓ{\displaystyle \ell }" stand for "leading", so am,ℓm{\displaystyle a_{m,\ell _{m}}} means "the coefficient, from the row m{\displaystyle m} of the variable leading row m{\displaystyle m}".) Either there are variables in this equation other than the leading one xℓm{\displaystyle x_{\ell _{m}}} or else there are not. If there are other variables xℓm+1{\displaystyle x_{\ell _{m}+1}}, etc., then they must be free variables because this is the bottom non-"0=0{\displaystyle 0=0}" row. Move them to the right and divide by am,ℓm{\displaystyle a_{m,\ell _{m}}}

to express this leading variable in terms of free variables. If there are no free variables in this equation then xℓm=0{\displaystyle x_{\ell _{m}}=0} (see the "tricky point" noted following this proof).

For the inductive step, we assume that for the m{\displaystyle m}-th equation, and for the (m−1){\displaystyle (m-1)}-th equation, ..., and for the (m−t){\displaystyle (m-t)}-th equation, we can express the leading variable in terms of free variables (where 0≤t<m{\displaystyle 0\leq t<m}). To prove that the same is true for the next equation up, the (m−(t+1)){\displaystyle (m-(t+1))}-th equation, we take each variable that leads in a lower-down equation xℓm,…,xℓm−t{\displaystyle x_{\ell _{m}},\ldots ,x_{\ell _{m-t}}} and substitute its expression in terms of free variables. The result has the form

where am−(t+1),ℓm−(t+1)≠0{\displaystyle a_{m-(t+1),\ell _{m-(t+1)}}\neq 0}. We move the free variables to the right-hand side and divide by am−(t+1),ℓm−(t+1){\displaystyle a_{m-(t+1),\ell _{m-(t+1)}}}, to end with xℓm−(t+1){\displaystyle x_{\ell _{m-(t+1)}}} expressed in terms of free variables.

Because we have shown both the base step and the inductive step, by the principle of mathematical induction the proposition is true.

We say that the set {c1β→1+⋯+ckβ→k|c1,…,ck∈R}{\displaystyle \{c_{1}{\vec {\beta }}_{1}+\cdots +c_{k}{\vec {\beta }}_{k}\,{\big |}\,c_{1},\ldots ,c_{k}\in \mathbb {R} \}} is generated by or spanned by the set of vectors {β→1,…,β→k}{\displaystyle \{{{\vec {\beta }}_{1}},\ldots ,{{\vec {\beta }}_{k}}\}}. There is a tricky point to this definition. If a homogeneous system has a unique solution, the zero vector, then we say the solution set is generated by the empty set of vectors. This fits with the pattern of the other solution sets: in the proof above the solution set is derived by taking the c{\displaystyle c}'s to be the free variables and if there is a unique solution then there are no free variables.

This proof incidentally shows, as discussed after Example 2.4, that solution sets can always be parametrized using the free variables.

We will show mutual set inclusion, that any solution to the system is in the above set and that anything in the set is a solution to the system.[2]

For set inclusion the first way, that if a vector solves the system then it is in the set described above, assume that s→{\displaystyle {\vec {s}}} solves the system. Then s→−p→{\displaystyle {\vec {s}}-{\vec {p}}} solves the associated homogeneous system since for each equation index i{\displaystyle i},

For set inclusion the other way, take a vector of the form p→+h→{\displaystyle {\vec {p}}+{\vec {h}}}, where p→{\displaystyle {\vec {p}}} solves the system and h→{\displaystyle {\vec {h}}} solves the associated homogeneous system, and note that it solves the given system: for any equation index i{\displaystyle i},

As the theorem states, and as discussed at the start of this subsection, in this single-solution case the general solution results from taking the particular solution and adding to it the unique solution of the associated homogeneous system.

Example 3.10

Also discussed there is that the case where the general solution set is empty fits the "General=Particular+Homogeneous{\displaystyle {\text{General}}={\text{Particular}}+{\text{Homogeneous}}}" pattern. This system illustrates. Gauss' method

However, because no particular solution of the original system exists, the general solution set is empty— there are no vectors of the form p→+h→{\displaystyle {\vec {p}}+{\vec {h}}} because there are no p→{\displaystyle {\vec {p}}\,}'s.

Corollary 3.11

Solution sets of linear systems are either empty, have one element, or have infinitely many elements.

Proof

We've seen examples of all three happening so we need only prove that those are the only possibilities.

First, notice a homogeneous system with at least one non-0→{\displaystyle {\vec {0}}} solution v→{\displaystyle {\vec {v}}} has infinitely many solutions because the set of multiples sv→{\displaystyle s{\vec {v}}} is infinite— if s≠1{\displaystyle s\neq 1} then sv→−v→=(s−1)v→{\displaystyle s{\vec {v}}-{\vec {v}}=(s-1){\vec {v}}} is easily seen to be non-0→{\displaystyle {\vec {0}}}, and so sv→≠v→{\displaystyle s{\vec {v}}\neq {\vec {v}}}.

is either empty (if there is no particular solution p→{\displaystyle {\vec {p}}}), or has one element (if there is a p→{\displaystyle {\vec {p}}} and the homogeneous system has the unique solution 0→{\displaystyle {\vec {0}}}), or is infinite (if there is a p→{\displaystyle {\vec {p}}} and the homogeneous system has a non-0→{\displaystyle {\vec {0}}} solution, and thus by the prior paragraph has infinitely many solutions).

This table summarizes the factors affecting the size of a general solution.

number of solutions of theassociated homogeneous system

one

infinitely many

particularsolutionexists?

yes

unique
solution

infinitely many
solutions

no

no
solutions

no
solutions

The factor on the top of the table is the simpler one. When we perform Gauss' method on a linear system, ignoring the constants on the right side and so paying attention only to the coefficients on the left-hand side, we either end with every variable leading some row or else we find that some variable does not lead a row, that is, that some variable is free. (Of course, "ignoring the constants on the right" is formalized by considering the associated homogeneous system. We are simply putting aside for the moment the possibility of a contradictory equation.)

A nice insight into the factor on the top of this table at work comes from considering the case of a system having the same number of equations as variables. This system will have a solution, and the solution will be unique, if and only if it reduces to an echelon form system where every variable leads its row, which will happen if and only if the associated homogeneous system has a unique solution. Thus, the question of uniqueness of solution is especially interesting when the system has the same number of equations as variables.

Definition 3.12

A square matrix is nonsingular if it is the matrix of coefficients of a homogeneous system with a unique solution. It is singular otherwise, that is, if it is the matrix of coefficients of a homogeneous system with infinitely many solutions.

We have made the distinction in the definition because a system (with the same number of equations as variables) behaves in one of two ways, depending on whether its matrix of coefficients is nonsingular or singular. A system where the matrix of coefficients is nonsingular has a unique solution for any constants on the right side: for instance, Gauss' method shows that this system

has the unique solution x=b−2a{\displaystyle x=b-2a} and y=(3a−b)/2{\displaystyle y=(3a-b)/2}. On the other hand, a system where the matrix of coefficients is singular never has a unique solution— it has either no solutions or else has infinitely many, as with these.

Thus, "singular" can be thought of as connoting "troublesome", or at least "not ideal".

The above table has two factors. We have already considered the factor along the top: we can tell which column a given linear system goes in solely by considering the system's left-hand side— the constants on the right-hand side play no role in this factor. The table's other factor, determining whether a particular solution exists, is tougher. Consider these two

with the same left sides but different right sides. Obviously, the first has a solution while the second does not, so here the constants on the right side decide if the system has a solution. We could conjecture that the left side of a linear system determines the number of solutions while the right side determines if solutions exist, but that guess is not correct. Compare these two systems

with the same right sides but different left sides. The first has a solution but the second does not. Thus the constants on the right side of a system don't decide alone whether a solution exists; rather, it depends on some interaction between the left and right sides.

For some intuition about that interaction, consider this system with one of the coefficients left as the parameter c{\displaystyle c}.

If c=2{\displaystyle c=2} this system has no solution because the left-hand side has the third row as a sum of the first two, while the right-hand does not. If c≠2{\displaystyle c\neq 2} this system has a unique solution (try it with c=1{\displaystyle c=1}). For a system to have a solution, if one row of the matrix of coefficients on the left is a linear combination of other rows, then on the right the constant from that row must be the same combination of constants from the same rows.

More intuition about the interaction comes from studying linear combinations. That will be our focus in the second chapter, after we finish the study of Gauss' method itself in the rest of this chapter.

Prove that any linear system with a nonsingular matrix of coefficients has a solution, and that the solution is unique.

Problem 9

To tell the whole truth, there is another tricky point to the proof of Lemma 3.7. What happens if there are no non-"0=0{\displaystyle 0=0}" equations? (There aren't any more tricky points after this one.)

This exercise is recommended for all readers.

Problem 10

Prove that if s→{\displaystyle {\vec {s}}} and t→{\displaystyle {\vec {t}}} satisfy a homogeneous system then so do these vectors.

What's wrong with: "These three show that if a homogeneous system has one solution then it has many solutions— any multiple of a solution is another solution, and any sum of solutions is a solution also— so there are no homogeneous systems with exactly one solution."?

Problem 11

Prove that if a system with only rational coefficients and constants has a solution then it has at least one all-rational solution. Must it have infinitely many?