
Mathematical Physics Theory 2003

Martin Plenio
Imperial College London
Version May 27, 2003
Office: Blackett 622
Contents
1 Sets, numbers and the concept of infinity
  1.1 Sets
  1.2 From counting to infinities
  1.3 Basic logic notation
  1.4 Sequences, completeness and uncountable sets
  1.5 Series
    1.5.1 Absolute convergence
    1.5.2 Methods to enhance the speed of convergence of series
  1.6 Complex numbers

2 Functions of real variables
  2.1 More about sets
  2.2 The basic definition of a function
  2.3 Continuity
    2.3.1 Functions of many variables
  2.4 Convexity I
  2.5 Differentiation
    2.5.1 Convexity II and its application to inequalities
    2.5.2 Minimization of convex functions on convex sets
    2.5.3 Newton's method
  2.6 Integration
    2.6.1 Riemann integration
    2.6.2 The integral comparison criterion
    2.6.3 Interchanging limits

3 Vectors and Matrices
  3.1 Vectors
  3.2 Matrices
  3.3 Eigenvalues, eigenvectors, singular values
  3.4 Functions of matrices
  3.5 Markov processes

4 Entropy, disorder and information
  4.1 Quantifying classical information
  4.2 Elements of the theory of majorization
Introduction
In this course I will provide you with a range of mathematical ideas
and tools. It should be clear from the beginning, however, that I do
not only want to present you with recipes that you learn by heart and
then apply. Of course you should simply know certain techniques and
concepts. But, much more important than this, I would like to imprint
on you a mathematical way of thinking. This means that you should
be able to think logically and be able to create proofs for hunches that
you may have. These proofs have to be written out in an impeccable
logical sequence that holds up to the standards of proof that we have in
theoretical physics. However, proofs in theoretical physics sometimes
do not quite conform to the strict requirements of mathematics, in that
certain things may be taken for granted by a physicist that a mathematician
would endeavour to prove. These are the situations where the
physicist animatedly waves his or her hands to describe some fluffy, not
really strict, argument. This is always dangerous even for the mathematically
competent physicist, but even more so for the physicist whose
understanding of mathematics is less well developed. There are many
potential pitfalls, and usually one only learns to handwave correctly
once one has made a lot of strict proofs in an area. Only then does one
have the experience and feeling to see the right results without proving
them strictly. Unfortunately, in England it is rare that physics students
actually have to take mathematics lectures, and as a consequence there
is the danger that they are never exposed to stringent mathematical
thinking. To make matters worse, you are then often encouraged to
wave your hands. This approach, in my opinion, has a high propensity
for leading to disaster and to the most dangerous kind of half-knowledge,
which is the one where the owner thinks he knows but actually doesn't
know.¹
This course aims to expose you a little bit more to mathematical
thinking. Unfortunately, the time that is available for this course is not
sufficient to go into very much depth in the various subjects that I am
¹ Less dangerous are people who do not know something properly but are aware
of this. Even less dangerous are people who know something and are aware of it.
The greatest danger for them is to generalize the idea that they know things to
other situations where they haven't got a clue.
going to present. However, what I will try to do is to avoid handwaving,
or to use it only to make a mathematical proof or definition more
intuitive. I will not quite achieve the stringency of mathematicians, as
this would require too much depth and time, but I will suggest literature
that goes all the way, and I suggest you have a look at some of it.

The downside of this approach is that it is not easy and may appear
dry in places. But I am sure that it is worth it, as I feel that my
own exposure to many lectures in pure mathematics (taught by mathematicians)
at my old university (I studied in Göttingen) has greatly
benefited my work in theoretical physics.
Apart from an appreciation of the beauty of some parts of mathematics,
it has helped me to learn to make strict proofs and to develop
a reasonably reliable red alert light in my head that flashes when I
am waving my hands too vigorously. In the instances where it fails, I will
inevitably be told off by colleagues and PhD students who have developed
their own alert devices and unveil these mistakes with their critical
questions. This brings me to the last point. Feel free to ask questions
before, after and during the lectures. Don't worry that some of your
questions may turn out to be not terribly deep. It doesn't matter. I
do not worry either about making a fool of myself in front of my PhD
students by asking 'silly' questions.

OK, that's it. I hope that you will enjoy the lectures.
Chapter 1

Sets, numbers and the concept of infinity
When we describe Nature using mathematics we generally use
natural, rational, real and complex numbers (and maybe quaternions)
and manipulate them according to general principles. In this chapter
I would like to introduce all these numbers, starting from a little bit of
set theory to develop the natural numbers, and then go over to the rational
numbers etc. In the process I will introduce ideas such as sequences and
convergence which are of great use in mathematics and physics.
But not only that, these ideas will also force us to consider such weird
things as infinities and the completeness of numbers. So let us start the
journey.
1.1 Sets
In the following I will introduce some very basic concepts of set theory.
I do this for two reasons. Firstly, set theory can be used to form the
basis for mathematics, and in particular arithmetic, and it provides a
definition of what the natural numbers are. Secondly, it will allow me
to give you a glimpse of a rather intriguing concept, namely that of
infinity. In this first part of the lecture I will not go into all the depth
of detail, simply because this would require a full lecture course in itself.
If you would like to know more about set theory and the theoretical
foundation of the natural numbers, then I wholeheartedly recommend that
you have a look at the book 'Naive Set Theory' by Paul R. Halmos, which
explains the basic ideas of set theory in a very narrative, yet precise,
style.
In mathematics, and therefore also in physics, we describe properties
of assemblies of entities. Usually we call these assemblies simply 'sets'
and their constituent parts we call 'elements'. To put it slightly differently,
the mathematician and philosopher Bolzano defines a set as 'an
embodiment of the idea or concept which we conceive when we regard
the arrangement of its parts as a matter of indifference'. So, a set is a
collection of things, for example

S₁ = {△, □, ♦, ○}.   (1.1)

I have chosen these geometric objects to make clear that the elements of a
set can take any form. They do not need to be numbers at all; in fact,
they may even be sets themselves. Note that I have not yet defined
any properties of sets, so it is not strictly clear that the above assembly
is actually something that mathematicians call a set, and indeed
we will soon realize that not every collection of elements is admissible
as a set.¹ There is some shorthand notation for saying that '♦ is an
element of the set S₁', namely

♦ ∈ S₁.   (1.2)

If '♦ is not an element of the set S₁' we write

♦ ∉ S₁.   (1.3)
The first property of sets that one should define is that of equality
of two sets.

Definition 1 Two sets A and B are equal, A = B, if they contain the
same elements.

This is of course a natural definition, but it should nevertheless be
made. Of course, you would like to have a rule that allows you to build
new sets from a given set. We will need to define which sorts of rules
are acceptable. You may think that this is trivial, but later on I will
show you that in fact it isn't.

¹ Rest assured, however, that S₁ is indeed a set, even according to mathematicians.
Definition 2 (Axiom of selection) For any given set A and any condition
S(x) there is a set B whose elements are exactly those x which
are from A and for which S(x) is true.

Example: Take A = {1, 2, 3, ...} to be the set of natural numbers. Furthermore,
take the condition S(x) to be the condition 'x is even'. Then we
obtain the new set B as

B = {x ∈ A | S(x)} = {x ∈ A | x is even} = {2, 4, 6, 8, ...}.   (1.4)
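The axiom of selection has a direct analogue in programming: a set comprehension builds a new set from a given set and a condition. Here is a minimal sketch in Python; the finite set A below is only an illustrative stand-in for the infinite set {1, 2, 3, ...}.

```python
# Axiom of selection, informally: B = {x in A | S(x)}.
# A is a finite stand-in for the natural numbers; S(x) is "x is even".
A = set(range(1, 21))
B = {x for x in A if x % 2 == 0}
print(sorted(B))  # [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
```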
From this definition and the assumption that there exists at least
one set we can now build new sets. Indeed, the simplest set that you
can think of is the empty set, which is the set that has no elements in
it. This is usually denoted as Ø. Now let us assume that at least one
set A exists. Then we can define the empty set a bit more formally by
writing

Ø = {x ∈ A | x ≠ x}.   (1.5)

Obviously no element can be in the so-defined set, so it is empty. Note
that the set that contains the empty set, ie {Ø}, obviously contains an
element, namely the empty set itself, and therefore we have Ø ≠ {Ø}.
So far this all looks a little bit like kindergarten maths. Well, let us
look again at the definition of the empty set, eq. (1.5). In this definition
I took great care to assume the existence of the set A. Surely, any old
set will do here. Indeed, you could think that I do not need to assume
the existence of this set A. Surely, x can be anything you like and there
should be no restriction on it. Yes? Well, no. Let us see what happens
when we make no restriction at all on x, or in other words, we assume
that there is a set that contains everything.

If you don't spot the problem with this immediately, you are in
good company. Indeed, in the early days of set theory mathematicians
made exactly this assumption without worrying too much. Unfortunately,
we will now see that it would in fact lead to catastrophic
structural failure of the theory, ie it would be contradictory. This does
not show up in the definition of the empty set but in the following
example. Let us define the set

D = {x | x ∉ x}.   (1.6)
Note that now I allow myself to choose any x that I like and test whether
the condition x ∉ x is satisfied or not. This may look a bit weird at
first, as we are treating the elements of the set as sets themselves.
But that's not unnatural. After all, you could imagine the set
of all those sets that contain exactly one element. But the real trouble
starts when you try to decide the question whether D ∈ D or D ∉ D.
One of the two assumptions should be correct. So, let's check them then.
If D ∈ D then D has to satisfy the defining condition for being a member
of D, namely D ∉ D, but that's a contradiction. Equally, when we
assume that D ∉ D, then D satisfies the condition for being in D and
should therefore be a member of D. Another contradiction. So what
has happened? Obviously something has gone wrong, but what? Well,
it turns out that we made a mistake that all mathematicians had made
in the early years of the development of set theory. Namely, we took
something for granted which is not quite so natural. In this case it was
the assumption that there is a set that contains everything. Such a set
does not exist, meaning that it leads to logical contradictions. So, what
we should write is the following:

D = {x ∈ A | x ∉ x}   (1.7)

for some set A. Now we can run the same argument as above. If we
try to assume that D ∈ D then we obtain a contradiction. So, as a
consequence D ∉ D, and we realize that for every set there is something
that is not included in it. In other words, there is no set that
contains everything, ie the 'set of everything' does not exist. It took
people many years to realize this, and indeed it was Bertrand Russell
in 1901, and independently Zermelo, who found the above example and
communicated it to Frege, who was just finishing off a book in which
he used set theory as the logical basis for mathematics. Large parts of
his work were invalidated by this paradox, and it took mathematicians
quite a while to put things right (see Halmos for details).

This first taste of the non-triviality of set theory teaches you two
things. Firstly, don't take things for granted in mathematics without
checking, and of course it should also convince you that set theory is
not quite so trivial. Now let us continue to define properties of sets.
Firstly, let us define the notion of subsets.
Definition 3 A set A is called a subset of a set B, written A ⊂ B, if for
all x ∈ A we have that x ∈ B.

Clearly you would like to combine different sets with each other. Of
course this can be done, and the idea is essentially based on the axiom
of selection. There is a range of operations that you can perform, and
I present three of the most basic and important ones.
1. Union of sets: The union of two sets A and B is also a set, which
is written A ∪ B. It is defined as

   A ∪ B = {x | x ∈ A or x ∈ B}.   (1.8)

Figure 1.1: The Venn diagram that illustrates the union of two sets A
and B.

2. Intersection of two sets: The intersection of two sets A and B is
also a set, written A ∩ B. It is defined as

   A ∩ B = {x | x ∈ A and x ∈ B}.   (1.9)

3. Difference between sets: The difference between two sets A and
B is also a set, written A \ B. It is defined as

   A \ B = {x | x ∈ A and x ∉ B}.   (1.10)

Figure 1.2: The Venn diagram that illustrates the intersection of two
sets A and B.

Figure 1.3: The Venn diagram that illustrates the subtraction of two
sets A and B.
Note that in all these definitions I make a statement of the form '...
the union of two sets is also a set ...', which is a statement of existence.
Again this is necessary, as it helps us to define which sets are admissible
and which ones aren't. After all, we do not know the properties of sets;
we have to define them in a way that appears natural. The above
definitions appear natural, but nevertheless they are definitions and not
fundamental truths. Before you dismiss this as a triviality you should
remember that a little bit earlier we discovered that the
set that contains everything does not exist.

To familiarize you with the operations of forming the 'union,
intersection and subtraction' of sets, let me give you a range of examples.

1. A ∪ Ø = A
   Proof: We have to check that x ∈ A ∪ Ø ⇔ x ∈ A. This can be
   seen via
   (i) x ∈ A ∪ Ø ⇒ x ∈ A or x ∈ Ø ⇒ x ∈ A
   (ii) x ∈ A ⇒ x ∈ A or x ∈ Ø ⇒ x ∈ A ∪ Ø
   So we satisfy both criteria for equality of two sets and we are
   finished.
2. A ∩ Ø = Ø

3. A ∪ B = B ∪ A

4. A ∩ B = B ∩ A

5. For any pair of sets we have A ⊂ A ∪ B

6. If A ⊂ B and B ⊂ A then A = B.

7. A ∪ (B ∩ A) = A
   Proof: We have to check that x ∈ A ∪ (B ∩ A) ⇔ x ∈ A. This
   follows from
   (i) x ∈ A ∪ (B ∩ A) ⇔ x ∈ A or (x ∈ A and x ∈ B). Clearly,
   this implies either that x ∈ A or that we have (x ∈ A and x ∈ B),
   which again implies x ∈ A. Therefore x ∈ A ∪ (B ∩ A) ⇒ x ∈ A.
   As a consequence we have A ∪ (B ∩ A) ⊂ A.
   (ii) On the other hand we have for any set C that A ⊂ A ∪ C.
   In particular, if we set C = (B ∩ A) we have A ⊂ A ∪ (B ∩ A).

8. A ∩ (B ∪ A) = A

9. (A ∪ B) \ C = (A \ C) ∪ (B \ C)

10. (A ∩ B) \ C = (A \ C) ∩ (B \ C)

11. A ∪ (B ∪ C) = (A ∪ B) ∪ C

12. A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

13. A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

14. A \ (B ∪ C) = (A \ B) ∩ (A \ C)

15. A \ (B ∩ C) = (A \ B) ∪ (A \ C)

16. A \ (B \ C) = (A \ B) ∪ (A ∩ C)

As an additional exercise, try to visualize the proofs by drawing Venn
diagrams that show the equality as well.
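Identities like these can also be checked mechanically on finite examples. A small Python sketch verifying identities 14 and 16 on sample sets (the particular sets are of course just an illustration, not a proof):

```python
# Sample finite sets; Python's set operators |, & and - are
# union, intersection and difference, matching the identities above.
A, B, C = {1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6}

# Identity 14: A \ (B ∪ C) = (A \ B) ∩ (A \ C)
assert A - (B | C) == (A - B) & (A - C)

# Identity 16: A \ (B \ C) = (A \ B) ∪ (A ∩ C)
assert A - (B - C) == (A - B) | (A & C)

print("both identities hold on this example")
```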
Now, as promised, I introduce the natural numbers from the concept
of sets. Again we have to ask ourselves what the basic properties of the
natural numbers are, and we would like to use them to make a construction
that we then call 'natural numbers'. Well, the main property of the natural
numbers is that we use them for counting, ie for every number there is
a successor and, of course, there is a smallest (first) natural number.
We will define the natural numbers starting from only a single set,
namely the empty set, and we build the remaining numbers via
the union of sets. We start by defining

0 = Ø.   (1.11)

Furthermore, we say that the successor x⁺ of a number x is defined as

x⁺ = x ∪ {x}.   (1.12)

For example

1 ≡ 0⁺ = 0 ∪ {0} = {0} = {Ø}   (1.13)
2 ≡ 1⁺ = 1 ∪ {1} = {0, 1} = {Ø, {Ø}}   (1.14)
3 ≡ 2⁺ = 2 ∪ {2} = {0, 1, 2}   (1.15)
...

The set of all the so-defined numbers is then given by

ω = {0, 1, 2, ...}.   (1.16)

This is the smallest set that contains, with every element, also its
successor. The size of this set is the natural definition for infinity, as it
clearly does not contain finitely many elements. Note, however, that we
actually have to define that this is an admissible set. After all, so far we
have really only dealt with sets that have finitely many elements, and
what tells you that you are not running into problems or contradictions
when you consider sets with infinitely many elements? The axiom that the set
ω is admissible is sometimes called the axiom of infinity in set theory,
for obvious reasons.
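This construction can be played out concretely: representing sets as Python frozensets, the successor rule x⁺ = x ∪ {x} generates 0, 1, 2, ... from the empty set alone. A minimal sketch (the function name is mine, not part of the lecture):

```python
def successor(x):
    # x+ = x ∪ {x}
    return x | frozenset([x])

zero = frozenset()          # 0 = Ø
one = successor(zero)       # {Ø}
two = successor(one)        # {Ø, {Ø}}
three = successor(two)      # {Ø, {Ø}, {Ø, {Ø}}}

# The number n, so constructed, is a set with exactly n elements,
# containing all of its predecessors.
assert len(three) == 3
assert zero in three and one in three and two in three
```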
1.2 From counting to infinities
In the previous section I introduced the natural numbers by formalizing
the idea of counting, and we even got a first idea of how to define
an infinite number. Now let us assume that the set of natural numbers
is given, and let us call it

N = {1, 2, 3, 4, ...}.   (1.17)

Note that from now on I exclude 0 from the set of natural numbers.
I do this because in the following I will talk about counting, and very
few of you will count 'zero, one, two, ...' but rather 'one, two, three,
...'. In a moment we will see that this does not change the concept of
infinity, ie I will show you in a moment that infinity and infinity
+ 1 are just the same.²
In order to study these questions carefully, we need to be able to
compare the sizes of sets, and in particular to compare the size of a set
to the size of ω, which is also commonly known as counting. How does this
go? Let us consider a set A whose size we wish to compare with N.
We count by picking the first element of N and associating with it an
element of the set A, namely a_1. Then we pick the next element of
N, namely 2, and associate with it one of the remaining elements of
A, ie an element of A \ {a_1}.³ What we are doing here is to make
a one-to-one correspondence between the two sets in such a way that
no element of either set is left out. Mathematicians say that we
construct a bijective map between the two sets. What is this? Here is
the formal definition.
Definition 4 A map f that maps a set A to a set B, ie to every element
x ∈ A it associates an element f(x) ∈ B, is called bijective if

(i) the map f is injective, ie for any pair x, y ∈ A with x ≠ y we have
that f(x) ≠ f(y),

and

(ii) the map f is surjective, ie for any z ∈ B there is an x ∈ A
such that f(x) = z.

² The natural numbers are the most basic concept of number and are often regarded
as the purest mathematical concept, or, to quote Leopold Kronecker (the one from
the Kronecker symbol and many other things), 'Die ganzen Zahlen hat der liebe
Gott gemacht, alles andere ist Menschenwerk' ('God made the integers; all else
is the work of man'), implying the imperfection of any concept different from the
natural numbers.

³ Note that this prescription somehow requires that the two sets are ordered;
indeed, this is essentially the requirement that each subset of the set contains a
'smallest' element. As a consequence, one can order the elements according to their
size.
Now we can give the formal definition that two sets have the same size,
namely

Definition 5 Two sets A and B are said to have the same size, A ∼ B,
if there is a bijective map f between them.
We say that A is at most as large as B, A ≼ B, if there is an injective
map f from A to B.
We say that A is at least as large as B, B ≼ A, if there is a surjective
map f from A to B.

The last two statements can also be formulated in a slightly different
way. If a set A has the same size as a proper subset of B, ie a set that
does not contain all the elements of B, then we say it is of smaller or
equal extent⁴ and vice versa, and write A ≼ B.
Now let us get a better grip on the concept of infinity by first saying
that the size of the set N is infinite. We will denote the size of the set of
natural numbers by ℵ₀ (the letter ℵ is Hebrew and is pronounced 'aleph').
Now we can start to consider other sets and compare their size to that
of N, ie we will count these other sets. This approach will then allow
us to find out some basic properties of infinities such as ℵ₀, and it will
furthermore help us later on to illuminate the question whether there
are different degrees of infinity. Below I write down a few sets.

S₁ = {n ∈ N | n ≠ n}   (1.18)
S₂ = {n ∈ N | ∃ natural number m such that 120 = mn}   (1.19)
S₃ = {n ∈ N | ∃ natural number m such that m² = n}   (1.20)

Clearly, the first set is actually the empty set Ø and has 0 elements.
The second set S₂ = {1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 24, 30, 40, 60, 120}
has 16 elements, which is meant to say that it has the same number of
elements as the set {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}. Both
sets are finite. We count them by picking the first natural number, 1,
and associating with it one element of the set. Then we take number 2
and associate with it another element of the set. We continue with this
until we have completed the numbering. For the set S₂, for example,
this means

S₂    N
1     1
2     2
3     3
4     4
5     5
6     6
8     7
10    8
12    9
15    10
20    11
24    12
30    13
40    14
60    15
120   16

The procedure that we have presented here is an example of counting,
as I explained at the end of the last section.

⁴ Note that we do not conclude that the set A is truly smaller than B. Indeed, in
a moment we will see that this is not necessarily true.
Note that I did not make any requirement concerning the size of
the two sets. They can be finite or infinite. An example of an infinite
set is S₃. Let us see how its size compares to that of N. To this end
we try to establish a one-to-one correspondence between the elements
with the following table
S₃    N
1     1
4     2
9     3
16    4
25    5
36    6
49    7
64    8
...   ...

Here we see that to every element of S₃ we can associate one element of
N in a one-to-one manner without leaving out a single element of either
set. More formally, we define the map f from N to S₃ via f(n) = n².
Clearly this map is injective on N, because for m ≠ n we have m² ≠ n².
So N ≼ S₃. We could now proceed to show that the map is also
surjective, to show that the two sets have the same size, but here I
proceed slightly differently. I rather show that we can define a map
g from S₃ to N via g(x) = +√x, where x ∈ S₃. Again, it is clear
straightaway that this map is injective, so that we have S₃ ≼ N. Given
that we have N ≼ S₃ and S₃ ≼ N, the two sets have the
same size, ie S₃ ∼ N.⁵
Any set for which we can find such an association between the elements
of the set and those of the natural numbers we call countable.
Sometimes we will see that we do not need all the elements of N, or
more precisely we need only a finite number of them. That is when
we say that the set is finitely countable. The sets S₁ and S₂ are of
that type. The set S₃ is different, however. Here we really need all
the elements of N. For such a set we say that it is countably infinite,
and its size is denoted by ℵ₀. The way I counted the elements of S₃
is by writing a table, but it is more convincing to write an analytical
connection between the natural numbers and the elements of S₃. If I
call the element of S₃ that I wish to correspond to the natural number
n by a_n, then I find a_n = n². Now it becomes clear that there is a
one-to-one correspondence between the elements of S₃ and N.

⁵ Note that in a more stringent formulation of set theory than the one I presented
here, this line of argument actually needs to be proven; it is called the theorem
of Schröder and Bernstein. Namely, two sets A and B have the same size exactly if
A ≼ B and B ≼ A.
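The two injections used here, f(n) = n² and g(x) = +√x, can be checked numerically on the first few squares; a small sketch of my own to confirm that g undoes f:

```python
import math

def f(n):
    # f: N → S3, f(n) = n², injective since m ≠ n implies m² ≠ n² on N
    return n * n

def g(x):
    # g: S3 → N, g(x) = +√x (the integer square root is exact on S3)
    return math.isqrt(x)

squares = [f(n) for n in range(1, 11)]   # first elements of S3
assert squares == [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
assert [g(x) for x in squares] == list(range(1, 11))  # g ∘ f = identity
```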
Quite obviously, finitely countable sets are rather boring. Therefore,
let us explore a few more examples of infinite sets to learn some of the
computational rules for infinities. In fact, let us consider the set N
and let us add a single element to it, namely 0. The new set is then
N₀ = N ∪ {0}. Now the natural expectation is that the second set N₀
has more elements than N. Certainly, it does not have fewer elements,
ie N ≼ N₀. But does it really have more elements? To decide this
question, let us count them. If there is a one-to-one correspondence
between N₀ and N then they have the same number of elements. Let us
assume the elements of the two sets are nicely ordered, with a_n ∈ N and
b_n ∈ N₀. Clearly, a_1 = 1 and b_1 = 0. In general we have a_n = b_n + 1,
so that indeed we have a one-to-one correspondence and the sets have
the same (infinite) number of elements. In an equation, this statement
reads

1 + ℵ₀ = ℵ₀.   (1.21)
As another example, let me introduce the integers, which form the set

Z = {..., −3, −2, −1, 0, 1, 2, 3, ...}.   (1.22)

Can you find a way to enumerate the integers, ie can you show that
this set is again just as large as the set of natural numbers? The
solution in tabulated form is

Z     N
0     1
1     2
−1    3
2     4
−2    5
3     6
−3    7
4     8
−4    9
5     10
−5    11
...   ...

from which it is self-evident that the two sets are equal in size. Drawing
the table is a first step towards a proper proof, which amounts to writing
down the rule for associating natural numbers one-to-one with integers.
Formally we find

a_1 = 0,  a_{2n} = n,  a_{2n+1} = −n,   (1.23)

ie we have defined a sequence that runs through all the integers.⁶ So we
have a one-to-one mapping between the set of integers and the natural
numbers.
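The rule (1.23), and the compact form a_m = (−1)^m ⌊m/2⌋ mentioned in the footnote, can be verified directly; a short sketch of mine:

```python
def a(m):
    # a_m = (-1)^m * floor(m/2): runs through 0, 1, -1, 2, -2, 3, -3, ...
    return (-1) ** m * (m // 2)

# a_1 = 0, a_{2n} = n, a_{2n+1} = -n, as in eq. (1.23)
assert [a(m) for m in range(1, 10)] == [0, 1, -1, 2, -2, 3, -3, 4, -4]
```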
As we doubled the set and added one more element (the 0), this
corresponds to the equation

2ℵ₀ + 1 = ℵ₀.   (1.24)

Now, let us make the set of natural numbers smaller by leaving out
every second element, ie let us consider the set of even numbers

S_even = {n | ∃ m ∈ N such that n = 2m}.⁷   (1.25)

Quite clearly S_even ⊂ N, so you would say that it has fewer elements than
the natural numbers. So let us count the set S_even.
S_even   N
2        1
4        2
6        3
8        4
10       5
12       6
14       7
16       8
18       9
20       10
22       11
...      ...

So we see that there is a one-to-one correspondence for each of the
elements, and as a consequence both sets have the same number of
elements. You should turn this into an arithmetic expression, namely
n → a_n = 2n. This can be summed up in the equation

ℵ₀ / 2 = ℵ₀.   (1.26)

In general, multiplying or dividing ℵ₀ by a finite natural number and
adding a finite amount gives as a result ℵ₀ again.

⁶ I could have written this in a more compact form as a_m = (−1)^m ⌊m/2⌋, where
the special brackets ⌊x⌋ mean that one chooses the largest integer not larger than x.
For example ⌊1.5⌋ = 1 and ⌊−1.2⌋ = −2.

⁷ The symbol ∃ means 'there exists'.
Now we know that multiplying ℵ₀ by a finite number, we get ℵ₀
again, so no change. A less trivial question is that of the result of

ℵ₀ · ℵ₀ = ?   (1.27)
To make sense of this equation we first have to make sense of what it
means to multiply by ℵ₀. If we multiply by 2, we add for every existing
element of the set another one. To see a systematic way of doing this,
let us consider sets of the form

T₁ = {(1, 1), (2, 1), (3, 1), (4, 1), ...},   (1.28)

where the objects in the round brackets such as (3, 1) form a single
element of the set, and the set is continued following the obvious rule.
Obviously this is an infinite set and it has the same size as the natural
numbers, ie N ∼ T₁. Now let us double the set. One way to do this is

T₂ = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (4, 1), (4, 2), ...}.   (1.29)

This set has the size 2ℵ₀. Now let us write down the set which has
the size ℵ₀². This is quite intuitively

T∞ = { (1, 1)  (1, 2)  (1, 3)  ...
       (2, 1)  (2, 2)  (2, 3)  ...
       (3, 1)  (3, 2)  (3, 3)  ...
         ...     ...     ...      },   (1.30)

where I have ordered the elements in a quadratic scheme to make it
more intuitive. Of course, this does not affect the set itself. So, the
big question is now whether this set is still the same size as the natural
numbers. The answer is yes, because you can still enumerate the set
T∞. You do this in the following way
T∞       N
(1,1)    1
(2,1)    2
(1,2)    3
(3,1)    4
(2,2)    5
(1,3)    6
(4,1)    7
(3,2)    8
(2,3)    9
(1,4)    10
...      ...

ie you first enumerate the elements (k, l) with k + l = 2, then you
enumerate the elements with k + l = 3, and so on. This works because
for k + l = r there are r − 1 elements, ie a finite number. More precisely,
we make the association

(k, l) ↔ ½ (k + l − 2)(k + l − 1) + l.   (1.31)

Exercise: Prove the analytical form of this correspondence and
convince yourself that it is indeed a one-to-one correspondence.
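The association (1.31) can be tested by generating the diagonals k + l = 2, 3, 4, ... in the order used in the table and checking that the formula produces the indices 1, 2, 3, .... A sketch (the function and variable names are mine):

```python
def pair_index(k, l):
    # (k, l) ↦ (k + l - 2)(k + l - 1)/2 + l, eq. (1.31)
    return (k + l - 2) * (k + l - 1) // 2 + l

# Walk the diagonals k + l = r for r = 2, ..., 6, with k descending,
# exactly as in the enumeration table above.
pairs = [(k, r - k) for r in range(2, 7) for k in range(r - 1, 0, -1)]
assert [pair_index(k, l) for k, l in pairs] == list(range(1, len(pairs) + 1))
```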
As a consequence we have

ℵ₀ · ℵ₀ = ℵ₀.   (1.32)
Now, the set T∞ that I introduced here looks like a big array of
elements (k, l), and this is surely one way of looking at it. But there is
another way. Let me make the correspondence

(k, l) ↔ k/l,   (1.33)

where the right-hand side is a rational number. So we see that the set

Q⁺ = { 1/1  1/2  1/3  ...
       2/1  2/2  2/3  ...
       3/1  3/2  3/3  ...
        ...  ...  ...     }   (1.34)

is also the set of the positive rational numbers. Note that you may say
that this is not exactly the set of rational numbers, because for example
we have 4/2 = 2/1, and as a consequence there may be a case for identifying
the two fractions 4/2 and 2/1. This would define a set that is slightly
different, namely the set of all fractions where numerator and denominator
have no common factor. This is of course possible, but it does not tell us
anything new, because clearly that set is no larger than Q⁺, for which we
have already verified that it has the same size as N. As both sets are
infinite, they are both countable.
Now you could start to ask the question whether there is a different
degree of infinity, ie is there a number that is larger than ℵ₀? This
question is equivalent to the question whether there is a set that cannot
be enumerated. This is not a straightforward question at all, and in
fact 150 years ago it was believed that all infinities are equal. It took
a brilliant mathematician, Georg Cantor, to realize that infinities are
really more complicated than one would have thought at first. Indeed,
there are sets that are truly bigger than the natural numbers, and we
will get to know them soon.
1.3 Basic logic notation
Many mathematical expressions will be written using logical symbols, so
it is useful to meet them early on. They have the advantage that they
abbreviate statements and (after you have got used to them) make them easier to
comprehend than statements that are written in words. Firstly, consider
the following
• ∀x means 'for all x'
• ∃x means 'there exists an x'
• s.t. means 'such that'
So for example the statement
'For all ε > 0 there is an n_0 such that for all n > n_0 we have |a_n − a| ≤ ε'
then reads

∀ε > 0 : ∃ n_0 s.t. ∀n > n_0 : |a_n − a| ≤ ε.
There is also the logical negation ¬, which takes a logical statement
and negates it, so if statement x is true then ¬x is false and vice versa.
Some examples will clarify this notation:
• ¬(x > 0) = (x ≤ 0)
• ¬(∀x : f(x) > 0) = ∃x : f(x) ≤ 0
• ¬(∃x : f(x) < 0) = ∀x : f(x) ≥ 0
• ¬(∀ε > 0 : ∃n_0 s.t. ∀n > n_0 : |a_n − a| ≤ ε) = ∃ε > 0 s.t. ∀n_0 : ∃n > n_0 s.t. |a_n − a| > ε
Further logical operations are
• x ∧ y means ’x and y’
• x ∨ y means ’x or y’
1.4 Sequences, completeness and uncountable sets
In the previous section I introduced a first idea of the concept of
functions and gave a specific example, namely that of a function
from the natural numbers N into the rational numbers via the association
n → a_n. Functions that map the natural numbers into another set of
quantities, such as the rational numbers, are also called sequences. In
fact, the concept of a sequence is older than that of functions and
has a life and significance of its own. In the following I would like
to study sequences for two reasons. Firstly, because they show up
in physics all the time. In fact, usually your theory, be it classical
mechanics, thermodynamics or quantum mechanics, will predict a value
for a measurable quantity to, at least in principle, arbitrary precision.
In your experiment however you will always have some measurement
uncertainty, ie you will get some value together with some error bars
around it. For example your experiment may tell you that the value is
1.45 ± 0.05. So you make the best guess that the true value is m_1 = 1.45.
Then you improve your apparatus and you are able to measure more
precisely, and you find 1.415 ± 0.005. Now you write down m_2 = 1.415.
This continues with every improvement of your apparatus and you get
an arbitrarily long sequence of numbers m_1, m_2, m_3, . . . which ideally
should converge to some speciﬁc value m. These intuitive ideas are
made precise in this section. Apart from their practical motivation
sequences also have a more abstract importance. In a natural way they
force upon us the idea of real numbers, and can indeed be used for their
deﬁnition.
Note that for the moment, when I am speaking about 'a number'
I mean to say 'a rational number'! Remember that so far we do not
know any other type of numbers. Nevertheless, I will formulate the
definitions, lemmas and propositions as far as possible without
mentioning the restriction to rational numbers (unless of course they
only apply to rational numbers).
Definition 6 A sequence, denoted by {a_n}_{n=1,...}, associates with every
natural number n another number a_n.
Question: Would you gain anything new if you would pick the
indices n in the sequence from the rational numbers rather than the
natural numbers?
The deﬁnition implies automatically that a sequence contains in-
ﬁnitely many elements. Nevertheless also ﬁnite sequences of numbers
ﬁt into this deﬁnition when we assume that all the undeﬁned elements
actually take the value zero. Again, ﬁnite sequences are rather easy
to deal with, but also rather boring. Inﬁnite sequences, or simply se-
quences as I will call them from now on, however, have inﬁnitely many
non-zero elements and have non-trivial properties.
A particularly important place is taken by sequences whose values
become more and more limited the larger the index n becomes. An
example is

{a_n = 1/n}_{n=1,2,...} (1.35)
The larger n becomes, the smaller the value of a_n, ie a_n approaches 0
arbitrarily closely. Such a sequence we would call convergent to 0. Note
however that, although the elements a_n come closer and closer to zero,
they are never equal to zero in this case. So zero is not an element of the
sequence but it is arbitrarily well approximated by the elements of the
sequence. In fact, this property that one can use sequences of numbers
sequence. In fact, this property that one can use sequences of numbers
to approximate some other numbers is one of the most important uses
of sequences. Many problems in physics have solutions that cannot
be expressed in neat and tidy analytical expressions, but one can only
provide more and more precise approximations to them. These ideas
are captured in the following formal deﬁnition
Definition 7 A sequence of numbers, denoted by {a_n}_{n=1,...}, is called
convergent when there is a number a such that for every ε > 0 there is
an n_0 such that for all n > n_0 we have |a_n − a| ≤ ε. The number a is
also often written as a = lim_{n→∞} a_n. A sequence that does not converge
is called a divergent sequence.
Examples (expand some of them):
(a) a_n = 1/n is convergent to 0
(b) a_n = n/(n+1) is convergent to 1
(c) and loads more examples for them to practice
(d) What do you say to a_n = n? Is it convergent or not? What could
be the limit of this sequence?
Lemma 8 The convergence properties of a sequence remain unaﬀected
when we change a ﬁnite number of its elements.
Proof: Consider the sequence {a_n}_{n=1,...} and another sequence {a'_n}_{n=1,...}
that differs from it only in finitely many places. This implies that there
is an ñ such that for all n > ñ we have that a_n = a'_n. Whenever we
have a statement
'∀ε > 0 there is an n_0 such that for all n > n_0 we have …'
where n_0 < ñ, we can replace it by
'∀ε > 0 there is an n_0 such that for all n > ñ we have …'
This confirms that convergence is unaffected by changing finitely many
elements. This concludes the proof.
Let us define the convergence of sequences in a slightly different
way. This concept is called a Cauchy sequence.
Definition 9 A sequence of numbers, denoted by {a_n}_{n=1,...}, is called
a Cauchy sequence when for every ε > 0 there is an n_0 such that for
all m, n > n_0 we have |a_n − a_m| < ε.
This looks very close to the concept of convergence, so now let us
prove the following
Theorem 10 A convergent sequence of numbers is a Cauchy sequence.
Proof: As the sequence {a_n}_{n=1,...} is convergent, there is an a
such that for every ε > 0 there is an n_0 such that for all n > n_0 we have
|a_n − a| ≤ ε/2. Then for all m, n > n_0 the triangle inequality gives

|a_n − a_m| ≤ |a_n − a| + |a − a_m| ≤ ε/2 + ε/2 = ε. (1.36)

This is just the definition of a Cauchy sequence and therefore the proof
is finished.
Now you would also expect that any Cauchy sequence is convergent.
This is a very reasonable conjecture, but nevertheless I would
like to investigate it a little bit, because I will show you that it is almost
correct, but not quite. Luckily it fails for an interesting reason. Indeed,
the failure of this idea will lead us to the concept of real numbers and
the idea of completeness. Let me consider the following sequence, which
is defined as: Let x_n be the largest rational number such that 10^n x_n is
a natural number and x_n^2 ≤ 2. The first few elements of the sequence
are

x_0 = 1, x_1 = 1.4, x_2 = 1.41, x_3 = 1.414, x_4 = 1.4142, . . . (1.37)
Exercise: Check that these first few elements are correct. We could
have defined the sequence slightly differently, namely: Let x_n be the largest
rational number with at most n digits after the decimal point such that
x_n^2 ≤ 2. Now it is clear that the sequence increases monotonically, ie
x_{n+1} ≥ x_n for all n. This sequence is also a Cauchy sequence because
for all n > m ≥ n_0 we have

|x_n − x_m| ≤ 10^{−m} (1.38)
Therefore, for every ε > 0 we choose n_0 to be the smallest integer that
is larger than log_10(1/ε) = −log_10 ε. Then for all m, n ≥ n_0 we have

|x_n − x_m| ≤ 10^{−m} ≤ ε (1.39)
so that the sequence x_n is indeed a Cauchy sequence. Clearly, by the
construction of the sequence it approximates the number √2 closer and
closer. So, you would say that this sequence converges to √2. But
stop! So far we only know rational numbers. So, we had better ask ourselves
whether √2 is actually a rational number. Let us check by trying

k/l = √2 (1.40)
where we have assumed that k and l have no common factor. If they
had one, then we could simply cancel this factor. Ok, now square
both sides to find

k^2 / l^2 = 2 (1.41)

Now multiply by l^2 and we have

k^2 = 2l^2 (1.42)

So, clearly the right-hand side can be divided by 2. So the left-hand
side must be divisible by 2, ie even, as well. This implies that k is
even, simply because the product of two odd numbers is odd again. So,
k = 2q and we find

4q^2 = 2l^2 ⇒ 2q^2 = l^2 (1.43)

and now we conclude that l must be even as well. But then both k and
l are even and therefore they have a common factor, in contradiction
to the original assumption that they do not have a common factor.
Therefore √2 is not rational!
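The decimal sequence (1.37) can be generated exactly with integer arithmetic: x_n = m/10^n with the largest integer m such that m^2 ≤ 2·10^{2n}. A minimal Python sketch (names are mine):

```python
from math import isqrt

def x(n):
    """x_n: the largest rational with at most n digits after the decimal
    point such that x_n**2 <= 2, via the integer square root of 2*10**(2n)."""
    return isqrt(2 * 10 ** (2 * n)) / 10 ** n

# x(0), x(1), ..., x(4) reproduce 1, 1.4, 1.41, 1.414, 1.4142
```

Although every x(n) is rational, the values approach the irrational number √2, which illustrates why the limit forces us beyond the rationals.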
As a consequence, while you would like to say that our sequence
x_n converges to √2 as its limiting element, we now realize that √2 is
not a (rational) number. The phenomenon that we have encountered
here is that to be able to say that every Cauchy sequence converges
we need another requirement, namely that the limiting element is an
admissible number. We now realize that the rational numbers alone
are not sufficient for that! The fact that there are Cauchy sequences
of rational numbers that do not converge to a rational number is
called the incompleteness of the rational numbers. Indeed, this tells
us that there is something else than simply rational numbers. We
have to extend the set of rational numbers by the limiting elements of
all Cauchy sequences of rational numbers. The resulting set is what
we call the real numbers. Does this coincide with our everyday idea
of real numbers as having a potentially infinite sequence of decimal
digits? Yes, indeed, because every such number can be approximated
arbitrarily well by a sequence where x_n is the number that coincides
with our real number in the first n decimal digits. Then, the larger n
becomes, the closer x_n comes to the real number x.
Definition 11 We define the set of real numbers as the set of limiting
points of Cauchy sequences of rational numbers.
As a consequence we have the following
Lemma 12 Every Cauchy sequence of real numbers converges to a real
number.
Real numbers that are not rational are called irrational, and a simple
example that I have shown to you is √2. The above definition shows
that sequences of rational numbers are sufficient to approximate every
real number, a fact that can also be formulated in the form: the rational
numbers lie dense in the real numbers.
Definition 13 A set of numbers A lies dense in another set B when
for every point x ∈ B and every ε > 0 there is a y ∈ A such that
|x − y| ≤ ε.
Now we have seen the mathematical signiﬁcance of the real num-
bers. They are there to make sure that every Cauchy sequence con-
verges. But now you may ask whether we can conﬁrm experimentally
that some physical quantity takes the value of a real number. Well,
the answer is that we simply cannot, because every experiment will
have a finite precision. So, since arbitrarily close to an irrational number
there is a rational number, we cannot really decide whether the physical
quantity is irrational or rational. The only thing that we can do when
we increase our experimental precision is to rule out more and more
rational numbers.
Before we discuss more practically useful properties and rules for
sequences, let me brieﬂy come back to the concept of inﬁnity. The
simple question is: How many real numbers are there? Are the irrational
numbers a rather rare occurrence, or are they typical, or even the
majority? To answer questions of this type we have to count the irrational
numbers as well as the real numbers. I want to show you that
one cannot count real numbers, ie one cannot put them into one-to-one
correspondence with natural numbers. The proof of this statement will
be by contradiction. Let me assume that I can count the real numbers.
That would imply that I can make a list which may look like this
1 ↔ x_1 = 0.123452524752572 . . .
2 ↔ x_2 = 0.245209542095424 . . .
3 ↔ x_3 = 0.765987652752756 . . .
4 ↔ x_4 = 0.298752845295874 . . .
⋮      ⋮
Now I construct a real number that is definitively not on the list. This
number is x = 0.8542 . . . and is constructed by taking the first decimal
digit of x_1 and subtracting it from 9, the second digit of x_2 and subtracting
it from 9, and so on. Clearly this number is different from any
other number on the list, as the n-th digit of x_n is different from the
n-th digit of x. Therefore we have constructed a contradiction to the
assumption that we can make a list of all the real numbers, and the size
of the set of real numbers, which we denote from now on by ℵ_1, is truly
bigger than that of the natural numbers, ie we have ℵ_1 > ℵ_0.
Therefore the size of the set of real numbers is truly larger than
that of the natural numbers. In fact, one can show that in some well-defined
sense the size of the set of real numbers is 2^{ℵ_0}, which equals
ℵ_1. For more details please consult the book by P. R. Halmos. You
could of course wonder whether there is a degree of infinity which lies
in between the two numbers ℵ_0 and ℵ_1 = 2^{ℵ_0}. This question is called
the continuum hypothesis. The answer to this question is extremely
tricky and rather surprising. Cantor, who introduced set theory and
studied the concept of infinity, tried to decide this question for many
years but he was unsuccessful. It took about 80 years before the full
solution to this question was found by results of Kurt Gödel and Paul
Cohen. The proven answer is: From the standard axioms of set theory
the question cannot be decided in principle. In other words: Both
the assumption that there is no number between ℵ_0 and ℵ_1 = 2^{ℵ_0}
and the assumption that there is a degree of infinity between ℵ_0 and
ℵ_1 = 2^{ℵ_0} are fully compatible with the other axioms of set theory. This
was not the answer that people had expected, but it is not entirely
unusual in mathematics. A similar status is held by the parallel axiom
in Euclidean geometry. Mathematicians tried in vain for
nearly 2000 years to prove it from the other axioms of Euclidean geometry. It was
only in the 19th century that it was realized that this axiom is simply
independent of the others and therefore cannot be proven either.
In fact, there are non-Euclidean geometries which take an important
place in physics, as they form the basis of Einstein's theory of general
relativity. So they are real and not just mathematicians' fun. However,
some mathematicians are not happy with this situation and there are
still thoughts spent on how to find another intuitive axiom for set theory
that would allow one to decide the question. After all, there does not seem
to be any physical system on which we can simply make an experiment
to see whether there is a different degree of infinity, while this was
the case for geometry. But what would happen if we found such an
extra axiom that allowed us to decide this question? Would it mean that
every other theorem can be proven or disproven with this stronger set
of axioms? That would be nice, but unfortunately it is known that,
as long as there are no contradictions in the set of axioms, you will
always be able to find further statements that are independent of those
axioms, ie they can neither be proven nor disproven within these axioms.
Does this have any bearing on physics? Well, you know that there are
always people who tell you that the final theory of all physics is around
the corner (Hawking is an example). But now you have to start to
worry, because if in any system of axioms for mathematics there are
statements that can neither be proven nor disproven, how can there be
a final theory of physics? After all, physics is based on mathematics,
and as a consequence for every theory that we formulate there should
be statements that we can neither prove nor disprove mathematically.
Therefore we would have to test them experimentally to see which one
is true and then extend our theory to cover it. This appears to
be a process which goes ad infinitum. So, it seems that we physicists
will never be out of a job. I hope that these things make you think a
little bit. Clearly, if you want to make scientific statements about this
question you will have to be much more rigorous than I was here. In
any case, so far I haven't seen that any of the 'last theory of physics'
guys has actually presented a satisfactory answer to this.
For the classwork: 1) Show that the Cantor dust is uncountable
but not dense in the set of real numbers. (Triadic expansion without
1's.)
2) The set of all infinite sequences made up of digits 0, 1, . . . , 9 is
uncountable, as it is the same as the set of real numbers.
From now on, let us move on to practical aspects of sequences.
Now that we oﬃcially know the real numbers, when I say number then
I mean real number unless otherwise stated.
I want to give you a few basic properties of sequences that are
useful to decide whether a sequence is converging or not. Obviously,
there are many sequences that are very weird and wild. The digits
of π for example show absolutely no pattern whatsoever, and in fact
pass every random number test that has been used on them. But of
course they are not random, as each one of them can be determined
analytically. But the digits of π have one property. They have values
0, 1, 2, 3, 4, 5, 6, 7, 8, 9 and therefore they are bounded. This gives rise
to the idea of a bounded sequence
Definition 14 A sequence of numbers, denoted by {a_n}_{n=1,...}, is called
bounded from above (below) if there is a number r_above (r_below) such
that for all n we have a_n < r_above (a_n > r_below). A sequence is called
bounded if it is bounded both from below and above.
Exercises: (a) If {a_n}_{n=1,...} is bounded from below then {−a_n}_{n=1,...}
is bounded from above.
(b) If {a_n}_{n=1,...} is bounded then {a_n^2}_{n=1,...} is also bounded.
(c) Given a sequence {a_n}_{n=1,...}, when is the sequence {1/a_n}_{n=1,...}
bounded?
Why is the concept of boundedness useful? The reason is that it is
quite closely connected to convergence. Firstly we have
Lemma 15 Every convergent sequence is bounded.
Proof: Convergent means that there is an a such that, for example for
ε = 1, there is an n_0 such that for all n > n_0 we have |a_n − a| ≤ 1. This
means that for n > n_0 we have a + 1 ≥ a_n ≥ a − 1. Taking into account the
remaining elements of the sequence a_k for k = 1, . . . , n_0 we find that

max{a_1, . . . , a_{n_0}, a + 1} ≥ a_n ≥ min{a_1, . . . , a_{n_0}, a − 1}. (1.44)
Conversely however, not every bounded sequence is convergent. A
simple example is the sequence with elements a_n = (−1)^n, which is
clearly bounded but does not converge. But from this sequence we
see that if we pick every second element and form the new sequence
b_n = a_{2n}, then we have a converging (in fact constant) sequence. Is
this a general phenomenon, that from a bounded sequence I can take a
subsequence that is convergent? The answer is yes. Let us first define
properly what a subsequence is
Definition 16 Let (a_n)_{n∈N} be a sequence and

n_0 < n_1 < n_2 < . . .

a strictly increasing sequence of natural numbers; then the sequence
(b_n)_{n∈N} with b_i = a_{n_i} is called a subsequence of (a_n)_{n∈N}.
Examples: (i) Consider the sequence a_n = n of natural
numbers. A subsequence would be b_n = a_{2n} = 2n, ie the sequence of
even numbers.
Now we can formulate
Theorem 17 A bounded sequence always possesses a convergent subsequence.
Proof: As the sequence f_n is bounded, it lies in an interval [−a, a] for
some value a. Now we will define two sequences {a_i} and {b_i}. We start
both with the values a_1 = −a and b_1 = a, ie the lower and upper boundary
of our interval. Now we split the interval [a_1, b_1] into two equal halves.
As the sequence has infinitely many elements, at least one of the two
halves of the original interval must contain infinitely many elements. If
this is the interval [a_1, (a_1 + b_1)/2] then we define a_2 = a_1 and
b_2 = (a_1 + b_1)/2. If it is the interval [(a_1 + b_1)/2, b_1] then we define
a_2 = (a_1 + b_1)/2 and b_2 = b_1. We
then cut the new interval [a_2, b_2] into two halves and so on. Continue
this procedure to construct the sequences (a_n)_{n∈N} and (b_n)_{n∈N}. Each of
the intervals [a_k, b_k] contains infinitely many elements of the sequence, and we choose from
each of them a single value c_k which equals an element of the sequence
(f_n)_{n∈N}, taking care to pick these elements with strictly increasing indices
(which is possible, as each interval contains infinitely many of them). The
resulting sequence (c_n)_{n∈N} is a Cauchy sequence. This
is so because for all n > n_0 we have that c_n ∈ [a_{n_0}, b_{n_0}] and therefore
for n, m > n_0 we have |c_n − c_m| ≤ b_{n_0} − a_{n_0} = 2a · 2^{1−n_0}. So clearly for
any ε > 0 I can find an n_0 such that one has for all n, m > n_0 that
|c_n − c_m| < ε. Therefore the sequence (c_n)_{n∈N} is convergent. This
concludes the proof.
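The interval-halving in this proof is easy to simulate. The Python sketch below is my own illustration: with a finite sample one can only keep the half holding at least as many of the sampled elements as the other, a stand-in for "infinitely many". It homes in on an accumulation point of the bounded, divergent sequence a_n = (−1)^n:

```python
def accumulation_point(samples, a, steps=60):
    """Repeatedly halve [lo, hi], keeping a half that still contains
    'many' sample elements, mirroring the bisection of Theorem 17."""
    lo, hi = -a, a
    for _ in range(steps):
        mid = (lo + hi) / 2
        lower = [v for v in samples if lo <= v <= mid]
        upper = [v for v in samples if mid < v <= hi]
        if len(lower) >= len(upper):
            hi, samples = mid, lower
        else:
            lo, samples = mid, upper
    return (lo + hi) / 2

samples = [(-1) ** n for n in range(1, 101)]  # bounded but divergent
# accumulation_point(samples, 1.0) returns a value very close to -1
```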
The limit of a subsequence (b_n)_{n∈N} of (a_n)_{n∈N} is called an accumulation
point of the sequence (a_n)_{n∈N}.
Now something really useful if you want to avoid fiddling with ε's
in the proof of the convergence of a sequence.
Lemma 18 A monotonically increasing (decreasing) sequence (a_n)_{n∈N}
that is bounded from above (below) converges.
Proof: I just deal with the increasing sequence, which is automatically
bounded from below and by assumption also from above. Therefore the
sequence is bounded and possesses a subsequence (b_n)_{n∈N} that converges
to a value a. The elements of this subsequence are given by b_k = a_{n_k}
with an increasing sequence n_k. Now we show that the sequence as
a whole also converges to a. Because the subsequence (b_n)_{n∈N} converges,
we have that for every ε there is a k_0 such that for all k > k_0 we have
|b_k − a| ≤ ε. As the sequence (a_n)_{n∈N} is increasing, we then have that
for all n > n_k we have |a_n − a| ≤ |b_k − a| ≤ ε. This concludes the proof.
To see how much easier life has become with this theorem, let us
consider some of the previous examples.
Examples:
(a) a_n = 1/n is convergent because it is bounded from below and is
monotonically decreasing
(b) a_n = n/(n+1) is convergent because it is monotonically increasing and
bounded from above by 1
(c) a_n = n does not converge. Although it
is monotonically increasing, it is not bounded from above and
as a consequence it cannot converge. Remember: a converging
sequence is bounded.
I am always going on about bounded sequences. Does this mean
that unbounded sequences are useless? Well, not really. In fact, there
are quite a few natural phenomena that lead to unbounded sequences.
Example: In the Middle Ages mathematics in Europe was not in a
very good state. However, here and there were some smart people who
nevertheless did work on mathematics. One of them was Leonardo da
Pisa, who is widely known as Fibonacci (which means blockhead and,
given that his head had a perfectly natural shape, probably refers to his
character). He travelled to the Far East and brought a lot of mathematical
ideas back to Europe, and he also studied mathematical problems
himself. One particular problem he considered carries his name and
pops up in surprisingly many aspects of mathematics, physics and even
biology. In 1202 Fibonacci considered the following problem: Assume
we have pairs of rabbits. After they are born, they take one
season to grow up, and from then on, every season they produce
a new pair of rabbits. If we neglect that they die, how many
rabbits will there be after n seasons? We start with one pair born in
the first season, ie F_1 = 1. They have to grow up, so no children yet,
ie F_2 = 1. In the next season they have kids, so there is one more
pair, ie F_3 = 2. The new pair won't have children yet but the old one
continues to reproduce, ie F_4 = 3. In fact, you can now see that the
sequence is built in general as

F_{n+2} = F_{n+1} + F_n (1.45)
ie the population of generation n + 1 increases by the population of
generation n as they start to have children now. Quite clearly this
sequence is unbounded, but nevertheless it is interesting to know how
it develops. To get an idea, let us consider the slightly simpler case
G_{n+1} = G_n + G_n = 2G_n. Clearly G_n = 2^n G_0, so the sequence grows
exponentially. So let us make an ansatz for the Fibonacci sequence,
namely try F_n = q^n, and insert it into the definition to find the quadratic
equation

q^2 = q + 1 (1.46)
which has the solutions

q_1 = (1 + √5)/2 and q_2 = (1 − √5)/2. (1.47)

So the general solution of the Fibonacci sequence is then a linear combination
of the basic solutions, ie

F_n = α_1 q_1^n + α_2 q_2^n, (1.48)

and we determine the constants α_1 and α_2 from the first few elements
of the Fibonacci sequence, such as F_1 = 1 and F_2 = 1, to find

α_1 = −α_2 = 1/√5. (1.49)

Inserting all this we find

F_n = (q_1^n − (−q_1)^{−n}) / √5. (1.50)
Check this for a few terms to convince yourself that it is indeed correct.
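Such a check is quickly done numerically; a throwaway Python sketch comparing the recursion (1.45) with the closed form (1.50), with q_1 as above:

```python
def fib(n):
    """F_n from the recursion F_{n+2} = F_{n+1} + F_n with F_1 = F_2 = 1."""
    a, b = 1, 1
    for _ in range(n - 1):
        a, b = b, a + b
    return a

def fib_closed(n):
    """F_n from the closed form (1.50); rounding removes the tiny
    floating-point error."""
    q1 = (1 + 5 ** 0.5) / 2
    return round((q1 ** n - (-q1) ** (-n)) / 5 ** 0.5)

# both give 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ...; e.g. fib(10) == 55
```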
It is remarkable how often Fibonacci numbers are encountered in
nature. For example they govern the numbers of leaves, petals and seed
grains of many plants.
Now an example of an important generalization of recursions
of the Fibonacci type.
Example: You all know fractions such as 2/3, which are really a
very simple concept. However, there is a slightly trickier concept that
is closely related to both sequences and fractions. This is the idea of
continued fractions, which look like

b_0 + 1/(b_1 + 1/(b_2 + 1/(b_3 + ...))) (1.51)

and are usually abbreviated by [b_0; b_1, b_2, b_3, . . .]. Usually, such a fraction
has infinitely many terms and you calculate it by building the
sequence of partial fractions that break off after the k-th term, ie

A_k / B_k = [b_0; b_1, . . . , b_k] (1.52)
where the sequence of numerators obeys

A_k = b_k A_{k−1} + A_{k−2} (1.53)

with A_0 = b_0, A_{−1} = 1 and A_{−2} = 0, and the sequence of denominators
obeys

B_k = b_k B_{k−1} + B_{k−2} (1.54)

with B_0 = 1 and B_{−1} = 0. (Comment: That these are the right sequences
requires proof, and this will be done either in the lecture or the classworks.)
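The two recursions are easy to run. The sketch below is my own helper, seeded with A_{−2} = 0, A_{−1} = 1 and B_{−2} = 1, B_{−1} = 0 so that A_0 = b_0 and B_0 = 1 come out as stated; it computes the partial fractions A_k/B_k for given b_k:

```python
def convergents(b):
    """Partial fractions A_k/B_k of [b0; b1, b2, ...] via
    A_k = b_k A_{k-1} + A_{k-2} and B_k = b_k B_{k-1} + B_{k-2}."""
    A2, A1 = 0, 1  # A_{-2}, A_{-1}
    B2, B1 = 1, 0  # B_{-2}, B_{-1}
    out = []
    for bk in b:
        A, B = bk * A1 + A2, bk * B1 + B2
        out.append((A, B))
        A2, A1, B2, B1 = A1, A, B1, B
    return out

# convergents([2, 4, 1, 5, 1, 12]) ends in (964, 437),
# and its third entry is (11, 5), ie the approximation 11/5
```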
A first application of continued fractions is that they allow us to find
good approximations to given fractions. For example

964/437 = [2; 4, 1, 5, 1, 12] (1.55)

If I now want to approximate this fraction by one with a smaller denominator,
then I can for example break off the continued fraction after
the second term. This will give [2; 4, 1] = 11/5, which is correct to within
3 parts in 10^3. Not bad, is it?
Research project:^8 Try to find a method by which one can
obtain the continued fraction expansion of any rational number.
Another example is the following slight variation of continued fractions:

tan z = z/(1 − z^2/(3 − z^2/(5 − . . .))). (1.56)
^8 I call this a research project because you really need to explore the problem or
search the literature. And of course I will not give you the answer for quite a while.
That's just like real research, where nobody knows the answer until you find it.
36CHAPTER 1. SETS, NUMBERS ANDTHE CONCEPT OF INFINITY
Now taking the second-order approximation, we find

tan z ≈ z (15 − z^2)/(15 − 6z^2) (1.57)

which is really surprisingly good. In fact, for z = π/4 this is about 0.9998
instead of the exact value 1. In stark contrast to that, the power series
expansion with three terms, tan z ≈ z + z^3/3 + 2z^5/15, makes an error that
is roughly 60 times larger. The advantage of the continued fraction is that it
uses power series both in numerator and denominator and can therefore
reproduce functional behaviour much better.
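The comparison is easily reproduced numerically; this sketch (illustrative, using Python's math module) evaluates the second-order truncation of (1.56) at z = π/4:

```python
from math import pi, tan

def tan_cf(z):
    """Second-order truncation of (1.56):
    tan z ~ z / (1 - z**2 / (3 - z**2 / 5)) = z (15 - z**2)/(15 - 6 z**2)."""
    return z / (1 - z ** 2 / (3 - z ** 2 / 5))

# tan_cf(pi / 4) is about 0.9998, against the exact value tan(pi/4) = 1
```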
As a curiosity let me tell you that the continued fraction [1; 1, 1, . . .],
which is about the simplest continued fraction you can think of, has
a particular importance in many areas, including art. Its value is the
'golden ratio' g = (1 + √5)/2. In art it is recognized that, for example, rectangles
where the long and the short side have lengths whose ratio
equals g are perceived as particularly nice and well balanced. Indeed,
in paintings of great masters you can sometimes see that they
have used this ratio (probably subconsciously). For an example see the
delightful book 'Number theory in science and communication' by M. R.
Schroeder, published by Springer.
Classwork: Consider an electrical network made up of resistances in
the way shown in the figure.
Figure 1.4: This is a caption.
Now let us state some of the laws by which to manipulate convergent
sequences.
Theorem 19 Given the convergent sequences (a_n)_{n∈N} and (b_n)_{n∈N} with
limits a and b respectively, as well as real numbers λ and µ:
1. lim_{n→∞} (a_n + b_n) = lim_{n→∞} a_n + lim_{n→∞} b_n
2. lim_{n→∞} (a_n b_n) = (lim_{n→∞} a_n)(lim_{n→∞} b_n)
3. lim_{n→∞} (a_n / b_n) = lim_{n→∞} a_n / lim_{n→∞} b_n, provided b ≠ 0
4. lim_{n→∞} λ a_n = λ a
5. If a_n ≤ b_n for all n then lim_{n→∞} a_n ≤ lim_{n→∞} b_n; but note that
a_n < b_n for all n does not imply lim_{n→∞} a_n < lim_{n→∞} b_n
Proofs: Exercise!
These rules apply to converging sequences, but we can also make
a few rules for sequences (a_n)_{n∈N} that diverge to infinity. For example
lim_{n→∞} a_n = ∞ ⇒ lim_{n→∞} λ a_n = ∞ for λ > 0, which represents the
rule λ · ∞ = ∞. A tricky point is the case λ = 0, ie what happens
to expressions such as 0 · ∞? Can one make sense of them? Well, yes,
one can. Given another sequence (b_n)_{n∈N} which converges to 0, we
can consider what happens to lim_{n→∞} a_n b_n. The answer is however not
unique. It very much depends on the individual sequences (a_n)_{n∈N} and
(b_n)_{n∈N}. This can be seen with some simple examples:
• a_n = n and b_n = 1/n: then we have lim_{n→∞} a_n b_n = 1
• a_n = n and b_n = 1/n^2: then we have lim_{n→∞} a_n b_n = 0
• a_n = n and b_n = 1/√n: then we have lim_{n→∞} a_n b_n = ∞
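Evaluating the products at a single large n already suggests the three different outcomes; a crude numerical probe (not a proof, of course):

```python
n = 10 ** 8

p1 = n * (1 / n)         # a_n b_n with b_n = 1/n: stays near 1
p2 = n * (1 / n ** 2)    # b_n = 1/n**2: tends to 0
p3 = n * (1 / n ** 0.5)  # b_n = 1/sqrt(n): grows without bound
```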
So you can see that anything can happen and that great care is needed
when one is dealing with sequences that do not converge. Soon we will
see even more examples of the counterintuitive features of infinite sequences.
As I have said already, sequences are useful for approximating numbers
such as square roots by rationals. Of course there are methods
that converge slowly and require a lot of effort to compute the successive
elements of the sequence, and others that work very efficiently.
Previously I constructed a sequence for you that converges to
√2. But the construction was rather clumsy from a numerical point of
view (it was however very useful, as one could immediately see that it is
a Cauchy sequence). Now I will show you a very efficient way of computing
roots of numbers, ie we compute a^{1/k} for any k ∈ N efficiently.
First I will show you how to compute √a and then, as an exercise,
you will show how to do this for a^{1/k} with any k ∈ N. Let us define the
following sequence
x_0 > 0 (1.58)
x_{n+1} = (1/2)(x_n + a/x_n). (1.59)

For a = 2 and x_0 = 1 the first few elements of this sequence have the
form

x_0 = 1, x_1 = 3/2, x_2 = 17/12, x_3 = 577/408, x_4 = 665857/470832, . . . (1.60)
Just by checking the values you can see that they come very close to
√2. Indeed,

x_0/√2 ≈ 0.70710678118654752440084436210485 (1.61)
x_1/√2 ≈ 1.0606601717798212866012665431573 (1.62)
x_2/√2 ≈ 1.0017346066809423262345295129819 (1.63)
x_3/√2 ≈ 1.0000015018250929450472725415061 (1.64)
x_4/√2 ≈ 1.0000000000011277376112350571288. (1.65)
As you can see, after only 4 steps the relative difference between x_4 and
√2 is roughly 10^{−12}, which is rather good. Even better, it seems that in
every step we roughly double the number of correct digits. This looks
like a rather useful scheme! So, I would like to prove to you analytically
that the method does indeed converge to √2, and I will also show you
that the number of valid digits is roughly doubled in every step of the
iteration.
Proof: First we prove convergence. The proof proceeds in a number of small steps.

1. For all k we have x_k > 0, which one can see by induction: x_k > 0 implies x_{k+1} = (1/2)(x_k + a/x_k) > 0. As x_0 > 0, all subsequent x_k > 0.

2. For all k ≥ 1 we have x_k^2 ≥ a. To see this consider
x_k^2 − a = (1/4)(x_{k−1} − a/x_{k−1})^2 ≥ 0.

3. For all k ≥ 1 the sequence is monotonically decreasing, because by steps 1 and 2
x_k − x_{k+1} = (1/2)(x_k − a/x_k) = (x_k^2 − a)/(2x_k) ≥ 0.

4. The sequence (x_k) with k ≥ 1 is monotonically decreasing and, by step 2, bounded from below by √a > 0, so it converges to some limit x ≥ √a. Taking the limit on both sides of the recursion yields x = (1/2)(x + a/x).

The unique positive solution of this equation for x is indeed x = √a. This concludes the proof.
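The scheme is only two lines of code. A minimal sketch of my own (the function name is my choice, not from the text) that reproduces the sequence (1.60) and its rapid convergence:

```python
def sqrt_heron(a, x0=1.0, steps=5):
    """Iterate x_{n+1} = (x_n + a/x_n) / 2, eq. (1.59), and return all iterates."""
    xs = [x0]
    for _ in range(steps):
        xs.append(0.5 * (xs[-1] + a / xs[-1]))
    return xs

xs = sqrt_heron(2.0, x0=1.0, steps=5)
print(xs)  # 1, 3/2, 17/12, 577/408, ... as in eq. (1.60)
```

Already the fifth iterate squares to 2 within double precision, consistent with the doubling of correct digits in every step.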
A hard exercise for the daring: Prove that for any k > 1 the sequence
x_0 > 0                                      (1.66)
x_{n+1} = (1/2)(x_n + a/x_n^{k−1})           (1.67)
converges to
lim_{n→∞} x_n = a^{1/k}.                     (1.68)
Hints: The tricky point here is that the sequence is alternating and not simply monotonically increasing. You need to show that if x_1 < a^{1/k} the subsequence x_{2n} is monotonically decreasing while the subsequence x_{2n+1} is monotonically increasing. Then you show that they are bounded, so that each of them converges; but as the whole sequence also converges you have total convergence. For k = 3 it helps to realize that for x < a^{1/3} one has
(x − a^{1/3})^3 + a^{1/3}(x − a^{1/3})(x − 2a^{1/3}) < 0.
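Although the proof is left as the exercise, one can at least watch the oscillating approach numerically for k = 3 (a sketch of my own, not a proof; names are mine):

```python
def kth_root_iter(a, k, x0, steps=60):
    """Iterate x_{n+1} = (x_n + a/x_n^(k-1)) / 2 from eq. (1.67)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(0.5 * (xs[-1] + a / xs[-1] ** (k - 1)))
    return xs

xs = kth_root_iter(2.0, 3, x0=1.0)
print(xs[:6])   # the iterates alternate around 2**(1/3) = 1.2599...
print(xs[-1])
```

Printing the first few iterates shows the even- and odd-indexed subsequences closing in on a^{1/k} from opposite sides, exactly as the hint describes.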
Classwork: Investigate the behaviour of the sequence
x_0 < 0                                      (1.69)
x_{n+1} = (1/2)(x_n + a/x_n)                 (1.70)
for a > 0 and show that it converges to lim_{n→∞} x_n = −√a.
Exercise: Investigate the behaviour of the sequence
x_0 > 0                                      (1.71)
x_{n+1} = (1/2)(x_n − a/x_n)                 (1.72)
for a > 0. What do you observe? We will come back to this sequence again when we study complex numbers.
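A quick numerical look (a sketch of my own, taking a = 1 and starting from the golden ratio; the chapter comes back to why this happens later) shows the iterates jumping around without ever settling:

```python
def jump_iter(a, x0, steps=8):
    """Iterate x_{n+1} = (x_n - a/x_n) / 2 from eq. (1.72)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(0.5 * (xs[-1] - a / xs[-1]))
    return xs

g = (1 + 5 ** 0.5) / 2        # golden ratio; note 1/g = g - 1
print(jump_iter(1.0, g))      # 0.5, -0.75, 0.2916..., -1.5684..., no convergence in sight
```

The first step is exactly 1/2 because 1/g = g − 1; after that the values wander erratically.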
1.5 Series
An important special case of sequences is that of infinite sums, which are also called series. Often in physics you will be confronted with systems which we approximate as infinite; examples occur in condensed matter physics, thermodynamics and elsewhere. If one wishes to determine quantities such as the energy of such a system, one will have to add infinitely many terms. As usual one does this by adding finitely many terms and then taking the limit of a larger and larger number of terms in the sum. In that way one arrives at a limit, which is then the value of the infinite sum. Let us make this definition precise.
Definition 20 Given a sequence (a_n)_{n=1,...}. The sequence of partial sums
s_n = Σ_{k=1}^n a_k                          (1.73)
is called a series and is denoted by Σ_{k=1}^∞ a_k. In the case of convergence the limit will also be denoted by s = Σ_{k=1}^∞ a_k.
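To make Definition 20 concrete, here is a small sketch of my own (the example series and all names are my choice, not from the text) that builds the sequence of partial sums for a_k = 1/k²; Euler famously identified the limit of this particular series as π²/6:

```python
import math

def partial_sums(terms):
    """Yield the partial sums s_n = a_1 + ... + a_n of a term sequence."""
    s = 0.0
    for a in terms:
        s += a
        yield s

# The partial sums of a_k = 1/k^2 creep up towards pi^2/6 = 1.6449...
s_n = list(partial_sums(1 / k ** 2 for k in range(1, 10001)))
print(s_n[-1])
```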
As a series is a special case of a sequence, the definitions of convergence, Cauchy sequence and boundedness carry over in a trivial fashion. For example:

Definition 21 A series Σ_{k=1}^∞ a_k is a Cauchy series when ∀ε > 0 ∃n_0 s.t. for all n > m with m, n > n_0 we have
|Σ_{i=m}^n a_i| ≤ ε.                         (1.74)
Definition 22 A series Σ_{i=1}^∞ a_i is called convergent if ∀ε > 0 ∃n_0 s.t. ∀n > n_0
|Σ_{i=n}^∞ a_i| ≤ ε.                         (1.75)
Examples: Consider the sum
Σ_{i=1}^∞ 1/i.
This sum diverges, as can be seen by grouping the terms conveniently:
Σ_{i=1}^∞ 1/i = 1 + 1/2 + (1/3 + 1/4) + (1/5 + 1/6 + 1/7 + 1/8) + . . . ≥ 1 + 1/2 + 1/2 + 1/2 + . . . ,
since each bracket adds up to at least 1/2, so the partial sums grow beyond all bounds. So, let us formulate some useful lemmas.
Lemma 23 If the series Σ_{i=1}^∞ a_i converges, then the sequence (a_n)_{n=1,...} converges to zero.
Proof: As the series converges, it is a Cauchy sequence. Then for every ε > 0 there is an n_0 such that for n ≥ m > n_0 we have
|Σ_{i=m}^n a_i| ≤ ε.                         (1.81)
In particular we then have for m = n > n_0 that
|a_n| = |Σ_{i=n}^n a_i| ≤ ε                  (1.82)
which implies that the sequence (a_n)_{n=1,...} converges to zero. This finishes the proof.
This lemma can be used in the following way: check whether the sequence (a_n)_{n=1,...} converges to zero. If it does not, then this implies that the corresponding series Σ_{i=1}^∞ a_i does not converge either.
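Beware that the converse of Lemma 23 is false: the terms 1/n converge to zero and yet, as shown above, the harmonic series diverges. A quick numerical illustration of my own:

```python
import math

# The terms 1/n tend to zero, but the partial sums keep growing like ln(n).
for n in (10 ** 2, 10 ** 4, 10 ** 6):
    s = sum(1.0 / k for k in range(1, n + 1))
    print(n, round(s, 4), round(math.log(n), 4))
```

The partial sums track ln(n) plus a constant (the Euler–Mascheroni constant 0.5772...), so they never settle.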
Lemma 24 The convergence of the series Σ_{i=1}^∞ |a_i| implies the convergence of the series Σ_{i=1}^∞ a_i.
Proof: By the triangular inequality we have
ε ≥ Σ_{i=m}^n |a_i| ≥ |Σ_{i=m}^n a_i|,
so if Σ_{i=1}^∞ |a_i| is a Cauchy series, so is Σ_{i=1}^∞ a_i, and both converge. This finishes the proof.
More generally we can say

Lemma 25 (Comparison criterion) Let Σ_{n=1}^∞ c_n be a convergent series where all the c_n are positive. Then any other series Σ_{n=1}^∞ a_n for which we have |a_n| ≤ c_n converges absolutely.
Proof: Because Σ_{n=1}^∞ c_n converges, it is a Cauchy sequence. Then for any ε > 0 there is an n_0 such that for all m, n > n_0 we have
Σ_{k=m}^n |a_k| ≤ Σ_{k=m}^n c_k = |Σ_{k=m}^n c_k| ≤ ε.    (1.83)
Therefore Σ_{k=1}^∞ |a_k| is also a Cauchy sequence. This completes the proof.
Thus we have a reasonably eﬃcient way to identify some series that
diverge. But we would really like to have some positive statements as
well. Here is one of these
Theorem 26 (Ratio test of D'Alembert) For a series Σ_{i=1}^∞ a_i we form the sequence q_i = |a_{i+1}/a_i| and determine its limit
lim_{n→∞} q_n = q.
The series Σ_{i=1}^∞ a_i converges if q < 1, it diverges if q > 1, and for q = 1 it may either converge or diverge.
Proof: Study first the case q < 1. Then there is an ε > 0 and an n_0 such that for all n > n_0 we have q < 1 − 2ε and |q_n − q| < ε. Then by the triangular inequality we have |q_n| = |q_n − q + q| ≤ |q_n − q| + |q| ≤ 1 − ε. As the convergence of the series is not influenced by changing a finite number of terms (though its value is changed), we can now assume that |q_n| ≤ 1 − ε for all n. This implies that |a_{n+1}| ≤ (1 − ε)|a_n|, so that |a_n| ≤ (1 − ε)^n |a_0|, and we have
Σ_{i=m}^n |a_i| ≤ |a_0| Σ_{i=m}^n (1 − ε)^i = |a_0| [(1 − ε)^m − (1 − ε)^{n+1}]/ε.
For sufficiently large m, n this can be made arbitrarily small, and so we have a Cauchy series; by the previous lemma we have completed the proof of this case.
For the case q > 1 we have an ε > 0 and an n_0 such that for all n > n_0 we have q > 1 + 2ε and |q_n − q| < ε. As a consequence, again of the triangular inequality, we have |q_n| = |q − (q − q_n)| ≥ |q| − |q − q_n| > q − ε > 1 + ε. Therefore we conclude that |a_n| ≥ (1 + ε)^n |a_0|, so that the sequence of the |a_n| does not converge to zero. Then, as a consequence of the above lemma, Σ_{n=1}^∞ a_n cannot converge either. This completes the proof.
For the case q = 1 I need to present one series that diverges and one that converges, to convince you that in this case the quotient criterion is useless. Consider the series
Σ_{n=1}^∞ 1/n
which diverges but has q_n = n/(n + 1), which converges to 1, and
Σ_{n=1}^∞ 1/(n(n + 1)) = 1
which converges but also has q_n = n/(n + 2), which converges to 1 as well. This concludes the proof.
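The three series just discussed can be fed to a small ratio-test probe (a sketch of my own; `ratio` is a name I chose). For 1/n! the ratio tends to 0, while for both series in the q = 1 borderline it creeps up to 1, so the test cannot separate them:

```python
from math import factorial

def ratio(a, n):
    """q_n = |a_{n+1} / a_n| for a term function a(n)."""
    return abs(a(n + 1) / a(n))

for name, a in [("1/n!", lambda n: 1 / factorial(n)),
                ("1/n", lambda n: 1 / n),
                ("1/(n(n+1))", lambda n: 1 / (n * (n + 1)))]:
    print(name, [round(ratio(a, n), 4) for n in (1, 10, 100)])
```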
A further criterion is

Theorem 27 (Root test of Cauchy) For a series Σ_{i=1}^∞ a_i we form the sequence q_i = |a_i|^{1/i} and determine its limit
lim_{i→∞} q_i = q.
The series Σ_{i=1}^∞ a_i converges if q < 1, it diverges if q > 1, and for q = 1 it may either converge or diverge.

Proof: Works analogously to the proof of the previous theorem and is left as an exercise.
1.5.1 Absolute convergence

Up until now we have generally investigated series of the form Σ_{i=1}^∞ |a_i|, and from their convergence we concluded that Σ_{i=1}^∞ a_i must also converge. This is fine, but there are series that cannot be attacked like that. An example is the series
Σ_{n=1}^∞ (−1)^{n+1}/n = ln 2
which converges, but for which the series Σ_{n=1}^∞ 1/n diverges to infinity.

Exercise: Prove convergence (but not the value) of the above series by collecting neighboring terms in pairs to obtain a sum for which you can show convergence more easily. This is a trick to accelerate the convergence of the sum, and later you will learn more of these.
So, we see that the series converges. Surely, as we just grouped the terms in pairs, we may also reorder the terms as we please without affecting the convergence? This may seem natural, but it is definitely wrong. Before I prove this to you, I will first define what I mean by a reordering, and then I will show that for every absolutely convergent series the order of the summation does not play any role.
Definition 28 Given a sequence (a_n)_{n=1,...} and a bijective map τ : N → N, we call the sequence (b_n)_{n=1,...} with b_n = a_{τ(n)} a reordering of the sequence (a_n)_{n=1,...}.
Now we can state

Lemma 29 Let Σ_{i=1}^∞ a_i be an absolutely convergent series with the limit a. Then every reordering is a convergent series with limit a.

Proof: We need to show that for any reordering
lim_{n→∞} Σ_{i=1}^n a_{τ(i)} = a.
Due to the absolute convergence of the series, for every ε > 0 there is an n_0 such that
Σ_{i=n_0}^∞ |a_i| ≤ ε/2.
Now choose N so large that the indices 1, . . . , n_0 − 1 all appear among τ(1), . . . , τ(N). Then for every n > N the partial sum Σ_{i=1}^n a_{τ(i)} differs from Σ_{i=1}^{n_0−1} a_i only by terms a_i with i ≥ n_0, so that
|Σ_{i=1}^n a_{τ(i)} − a| ≤ Σ_{i=n_0}^∞ |a_i| + |Σ_{i=1}^{n_0−1} a_i − a| ≤ ε/2 + ε/2 ≤ ε.
This finishes the proof.
Now we come to the proof that the series Σ_{n=1}^∞ (−1)^{n+1}/n, which is not absolutely convergent, has a reordering that diverges. In fact one can prove that, in general, for any convergent series that is not absolutely convergent and any given c there is a reordering such that the reordered series has the limiting value c. I will not show you the proof of this general statement, but I will show you that there is a reordering of the series Σ_{n=1}^∞ (−1)^{n+1}/n that diverges to infinity. So, remember that the elements of the series are given by a_n = (−1)^{n+1}/n. Let us have a look at the terms with odd index ranging from 2^n + 1 to 2^{n+1} + 1. Then for every n ≥ 1 we have
1/(2^n + 1) + 1/(2^n + 3) + . . . + 1/(2^{n+1} + 1) ≥ 2^{n−1}/2^{n+1} = 1/4.
Therefore we reorder the series in the following way:
1 − 1/2 + 1/3 − 1/4
+ (1/5 + 1/7) − 1/6
+ (1/9 + 1/11 + 1/13 + 1/15) − 1/8
+ . . .
which clearly diverges, as every row gives a contribution that is definitely larger than 1/12. This completes the proof.
Research project: The brave amongst you may try to prove the following statement. Given a convergent series Σ_{i=1}^∞ a_i that is not absolutely convergent, show that for any value c there is a reordering of the series such that Σ_{i=1}^∞ a_{τ(i)} = c. (Note that this is a tough one. I had to do this as a student in my first year myself and I remember that it took me a while.)
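The standard constructive idea behind this statement can at least be played with numerically: greedily take positive terms until the partial sum exceeds the target c, then negative terms until it drops below c, and so on. A sketch of my own for the alternating harmonic series (all names mine; this illustrates the mechanism, it is not the proof):

```python
def rearranged_partial_sum(c, n_terms=100000):
    """Greedily reorder the terms (-1)^(n+1)/n so the partial sums chase c."""
    pos = (1.0 / n for n in range(1, 10 ** 7, 2))     # +1, +1/3, +1/5, ...
    neg = (-1.0 / n for n in range(2, 10 ** 7, 2))    # -1/2, -1/4, -1/6, ...
    s = 0.0
    for _ in range(n_terms):
        s += next(pos) if s <= c else next(neg)
    return s

# The ordinary order sums to ln 2 = 0.6931..., but this reordering homes in on 2.
print(rearranged_partial_sum(2.0))
```

Because the unused terms still tend to zero, the overshoots shrink and the partial sums converge to whatever target was chosen.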
As a consequence we have to be very careful when we are dealing with sums that are not absolutely convergent. You may think that this has little relevance for physical problems, but let us think about the following problem.

Imagine we have a crystal made up of a periodic array of positive and negative charges. To make life simple, we consider this crystal to be one-dimensional, with the charges arranged equally spaced at distance 1 between neighbouring charges (see figure).

Figure 1.5: This is a caption

Generally in solid state physics one assumes that the crystal has infinite extension. Now we would like to compute the potential energy of a single charge, which is just the work that is needed to remove that charge from the crystal. The potential energy is inversely proportional to the distance between two charges and proportional to the product of the two charges. So, if we wish to compute the potential energy of the positive charge sitting at n = 0 we find
Σ_{n=−∞, n≠0}^{∞} (−1)^{n+1}/|n| = 2 ln 2    (1.84)
where the term with n = 0 is excluded and we sum in the order given by the summation index. Now it seems odd that in a physical situation it should matter how we order the terms in the sum. One can try various ways out. One attempt could be to say that the crystal is finite in reality, and for a finite sum it does not matter in which order one sums up the terms. But then one would still have the problem that one would like to make the crystal larger and larger (a real crystal contains
easily 10^{23} charges, which is pretty much infinite for most purposes) and
given that one can make the series diverge to inﬁnity one then expects
that one can make the energy per particle arbitrarily large. That’s not
very encouraging. The better answer is that indeed the fact that one
can change the convergence of the series by changing the order of the
terms reﬂects something physical. The order in which the terms occur
reﬂects the way in which the crystal is formed. Let us consider two
possible ways to build the crystal
1) We start with a particle in position 0. Then we add two particles
with the opposite charge to the ﬁrst one in the positions +1 and −1.
Then we add the particles in the positions +2 and −2 and so on. This
way yields exactly the summation order that we have used to obtain
ln 2 as the value of the sum. However, other ways can be considered.
2) An extreme case would be that one ﬁrst adds all the particles
that have the same charge as the one at the position 0. First we put
the ones at position ±2, then those at position ±4 and so on. Of
course this requires some work as we have to overcome the electrostatic
repulsion. In fact, now we find that completing first the positioning of all the particles with the same charge requires infinitely much work or, in other words, it's not possible.
But of course, as I have told you already that by reordering the sum I can achieve any limit I like, we realize that depending on the way in which I build the crystal, different amounts of work will be required in the limit of very large crystals. So what looked like a mere mathematical curiosity has actually acquired real physical meaning.

In any case, the main point to remember is that in a convergent series that is not absolutely convergent the order of the summation is absolutely crucial and must not be changed!
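To see the order dependence concretely, here is a small sketch of my own that sums the crystal series (1.84) shell by shell, as in building scheme 1), and then in a biased order that places many like charges first; the first ordering settles at 2 ln 2 while the second runs away:

```python
import math

# Building scheme 1): add the shells n = +-1, +-2, ... symmetrically.
def symmetric_sum(shells):
    s = 0.0
    for n in range(1, shells + 1):
        s += 2 * (-1) ** (n + 1) / n      # the terms at +n and -n are equal
    return s

# A biased order: many positive contributions (odd |n|) first, few negative ones (even |n|).
def biased_sum(odd_pairs, even_pairs):
    s = sum(2.0 / n for n in range(1, 2 * odd_pairs, 2))
    s -= sum(2.0 / n for n in range(2, 2 * even_pairs + 1, 2))
    return s

print(symmetric_sum(10 ** 5), 2 * math.log(2))  # symmetric order settles at 2 ln 2
print(biased_sum(10 ** 5, 10))                  # biased order is far away and growing
```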
1.5.2 Methods to enhance the speed of convergence of series
In the previous sections I have introduced convergence criteria for series. Often it is the case that we know from our convergence criteria that a series converges to some finite value, but we do not know the limit. Indeed, it is the standard situation that you cannot give a simple value to the series, such as 2 or ln 2, which you would accept as a nice answer. To be precise, even ln 2 is just some real number for which we have a nice name; if you want its digits you will have to resort to actually working out, term by term, the series it represents until you have the required precision. Of course you would not like to compute too many terms of the sequence to obtain the desired precision. The number of terms that you need to compute depends both on the precision that you want to achieve and on the nature of the sequence. Some sequences converge rather fast, so that you do not need to compute many terms; others converge very slowly. An example of a rapidly converging series is
Σ_{n=0}^∞ 1/n! = e ≅ 2.7182818285 . . .       (1.85)
where it is enough to compute the terms up to n = 10 to get the final value to within 3 · 10^{−8}. An example of a rather slowly converging series is
Σ_{n=1}^∞ (−1)^{n−1}/n = ln 2 ≅ 0.693 . . .    (1.86)
where you need to compute 10^7 terms to obtain the final value to within 5 · 10^{−8}
. So that would be really hard work. So, obviously, you would like to find ways to accelerate the convergence of the sequence. One obvious way to do this is to use the observation that the terms in the sum are alternatingly positive and negative. So why not group them in pairs, so that you only have positive terms? This gives the summation
Σ_{n=1}^∞ (−1)^{n−1}/n = Σ_{n=1}^∞ 1/(2n(2n − 1)) = ln 2 ≅ 0.693 . . .    (1.87)
Now we have a sum that requires 10^7 terms to get to ln 2 to within 2.5 · 10^{−8}. So we gained a little bit, but not really very much. In fact, you still have to compute roughly the same, very large, number of terms. We are really looking for something that requires, instead of N terms, maybe √N terms or some even stronger improvement. In the following I will show you a neat trick to do that. It will not work for all series, but for quite a large number of them. Also, whenever you come into a situation where you may need to accelerate the convergence of a sum, you may remember that there are methods to do so and then you can look them up in books. Ok, now let's start with what is called the Shanks transformation.⁹
What is the main idea behind the Shanks transformation and all the other convergence-enhancing transformations? When you are given a series, the terms in that series are not random; they possess some regularities which may be hidden under some transient behaviour. In a sense the eventual value of the series is hidden by this transient behaviour. But sometimes we can make guesses about the form of the transient behaviour and reduce its impact. In that way we will 'see' the final value more clearly. Let us study these ideas with an example.
Imagine first a particularly simple situation. Consider a series Σ_{n=0}^∞ a_n where the n-th partial sum is given by
Σ_{k=0}^n a_k = s_n = a + αq^n.              (1.88)
Clearly the final value of the sum will depend on the three free parameters a, α and q. We can determine these three values from the three equations
s_n = a + αq^n                               (1.89)
s_{n+1} = a + αq^{n+1}                       (1.90)
s_{n+2} = a + αq^{n+2}.                      (1.91)
Indeed we find that
a = [s_{n+2} s_n − s_{n+1}²] / [s_{n+2} + s_n − 2 s_{n+1}].    (1.92)
So, if we indeed had a sum where the partial sums show the simple behaviour of eq. (1.88), we could determine the final value of the sum from only three terms. The idea is now to
⁹ Shanks was a mathematician who used early computers to determine many thousands of digits of the number π, which he did via series expansions. To get good precision, he had to find methods for speeding up the convergence of those series.
apply this transformation to sums that may have a somewhat more complicated behaviour, and hope that, while we will not get the exact value of the sum, we may accelerate the convergence. This means that instead of the very simple behaviour for the partial sums assumed above we will have
s_n = a(n) + αq^n + βr^n + . . . ,           (1.93)
where a(n) is hopefully some weak n-dependence. Then by the Shanks transformation we get a new sum
S(s_n) = [s_{n+2} s_n − s_{n+1}²] / [s_{n+2} + s_n − 2 s_{n+1}],    (1.94)
for which we hope that it has a faster convergence. It is in fact not easy
to prove when this method works and when it doesn’t (also I simply
do not know these proofs) so what most people do is to just try the
transformation and then look whether the convergence improves. If it
doesn’t then one can try various other convergence enhancing transfor-
mations.
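Formula (1.94) is a one-liner to implement. A minimal sketch (function name mine); applied below to the partial sums of a geometric series, where a single pass is already exact because the partial sums have exactly the form (1.88):

```python
def shanks(s):
    """One Shanks pass: S(s_n) = (s_{n+2} s_n - s_{n+1}^2) / (s_{n+2} + s_n - 2 s_{n+1})."""
    return [(s[n + 2] * s[n] - s[n + 1] ** 2) /
            (s[n + 2] + s[n] - 2 * s[n + 1])
            for n in range(len(s) - 2)]

x = 0.5
s, total = [], 0.0
for k in range(6):
    total += (-x) ** k        # partial sums of the geometric series sum (-x)^k
    s.append(total)
print(shanks(s))              # every entry equals 1/(1+x) = 2/3, up to rounding
```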
Let us now study a few examples to see that sometimes indeed the Shanks transformation leads to excellent improvements in the rate of convergence.

Example 1: Let us consider the simple geometric series
s_n = Σ_{k=0}^n (−x)^k = [1 − (−x)^{n+1}] / [1 − (−x)].    (1.95)
When we apply the Shanks transformation we obtain
S(s_n) = 1/(1 + x)                           (1.96)
which is the exact sum. So here convergence is immediate.
Example 2: Let us consider the slightly more complicated series
Σ_{i=0}^∞ (1 − 1/2^{i+1}) (−x)^i = 1/[(x + 1)(x + 2)].    (1.97)
If we wish to evaluate the original sum for x = 0.99 we need about 1500 terms to get the first 6 decimal digits of the sum. That's a lot of work. Why could the Shanks transformation work in this case? To see this let us write out the partial sums, which are
s_n = Σ_{i=0}^n (1 − 1/2^{i+1}) (−x)^i = 1/[(x + 1)(x + 2)] − (−x)^{n+1}/(x + 1) + (−x/2)^{n+1}/(x + 2).    (1.98)
You can prove this by induction (Exercise). So you can see that the partial sums indeed bear a very close similarity to the behaviour assumed in eq. (1.88). Indeed, for a general behaviour of the form
s_n = a + α_1 q_1^n + α_2 q_2^n,             (1.99)
if we consider 1 ≫ q_1 ≫ q_2 > 0, then we find for the Shanks transformation
S(s_n) = a + [α_1 α_2 (q_1 − q_2)² q_1^n q_2^n] / [α_1 (q_1 − 1)² q_1^n + α_2 (q_2 − 1)² q_2^n]    (1.100)
≅ a + [α_1 α_2 q_1² q_1^n q_2^n] / [α_1 q_1^n + α_2 q_2^n]    (1.101)
≅ a + α_2 q_1² q_2^n.                        (1.102)
Before the Shanks transformation the dominant transient behaviour comes from q_1^n. But after the Shanks transformation we have an approximate transient behaviour which is proportional to q_2^n, which is much weaker. In general one can show that the transient behaviour of the partial sums after the Shanks transformation is much smaller than for the original series. This suggests that the Shanks transformation is improved by successive iteration. Of course you can iterate the Shanks transformation, ie once you have used the Shanks transformation to generate a new series, you can apply the Shanks transformation to that series, and so on.
In the following table I show you the remarkable success of the Shanks transformation for this sum for x = 0.99, for which the exact result is 0.1680644 . . . .

n       s_n          S(s_n)      S(S(s_n))   S(S(S(s_n)))
0 +0.5000000 − − −
1 −0.2425000 0.1554524 − −
2 −0.6150875 0.1736603 0.1679926 −
3 −0.2945678 0.1654309 0.1680796 0.1680642
4 +0.6360096 0.1693366 0.1680609 0.1680644
5 −0.3001213 0.1674421 0.1680652 0.1680644
6 +0.6340036 0.1683706 0.1680642 0.1680644
7 −0.2944209 0.1679133 0.1680645 0.1680644
10 +0.6178369 0.1680827 0.1680644 0.1680644
15 −0.2597995 0.1680639 0.1680644 0.1680644
20 +0.5749627 0.1680644 0.1680644 0.1680644
You can clearly see the impressive enhancement in the rate of conver-
gence.
Example 3: For our last example let us come back to the series for ln 2, ie
Σ_{i=1}^∞ (−1)^{i+1}/i = ln 2.               (1.103)
The numerical value is ln 2 ≅ 0.6931472 . . . and we find

n       s_n          S(s_n)      S(S(s_n))   S(S(S(s_n)))
1 1.0000000 − − −
2 0.5000000 0.7000000 − −
3 0.8333333 0.6904762 0.6932773 −
4 0.5833333 0.6944444 0.6931058 0.6931489
5 0.7833333 0.6924242 0.6931633 0.6931467
6 0.6166667 0.6935897 0.6931399 0.6931474
7 0.7595238 0.6928571 0.6931508 0.6931471
8 0.6345238 0.6933473 0.6931452 0.6931472
15 0.7253719 0.6931138 0.6931473 0.6931472
25 0.7127475 0.6931397 0.6931472 0.6931472
35 0.7072289 0.6931444 0.6931472 0.6931472
Again the improvement in the rate of convergence is really impres-
sive.
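The last column of the table can be reproduced by iterating a Shanks routine (this driver and its names are my own illustration), applied three times to the partial sums of (1.103):

```python
import math

def shanks(s):
    """One Shanks pass over a list of partial sums, eq. (1.94)."""
    return [(s[n + 2] * s[n] - s[n + 1] ** 2) /
            (s[n + 2] + s[n] - 2 * s[n + 1])
            for n in range(len(s) - 2)]

# Partial sums of the alternating harmonic series, eq. (1.103).
s, total = [], 0.0
for i in range(1, 11):
    total += (-1) ** (i + 1) / i
    s.append(total)

for _ in range(3):
    s = shanks(s)
print(s[-1], math.log(2))  # agrees with ln 2 to several digits from only 10 terms
```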
If you want to learn more about other possible methods for enhancing the rate of convergence, you should have a look at the book by C.M. Bender and S.A. Orszag, "Advanced Mathematical Methods for Scientists and Engineers", published by Springer.
1.6 Complex numbers
So far we have got to know the natural numbers, the integers, the rational numbers and the real numbers. But even the set of real numbers is not really sufficient for all purposes. The real numbers are not too difficult to imagine as they are the limiting case of rational numbers, so in some sense they are not so different from the rational numbers. The completeness of the real numbers also tells us that all real Cauchy sequences converge to a real number. However, we have also seen some strange behaviour. For example, I showed you an iterative method for computing the square root of a number by employing the rule x_{n+1} = (1/2)(x_n + a/x_n) to create a sequence that converges to √a for a > 0. Then I suggested to you to try and see what happens when you replace a by −a. Then we have the sequence generated by x_{n+1} = (1/2)(x_n + (−a)/x_n). It turns out that this sequence does not converge, and in fact it jumps around quite wildly. What is the reason for that? If the sequence converged, then for the limiting value x we would have the equation x = (1/2)(x − a/x), which has the 'solution' x = √(−a) for a > 0. But does that make sense? Is there a way in which the square root of a negative number makes sense? Do we have to make sense of it? Indeed, these were questions that mathematicians asked themselves about 500 years ago, and it took about 250 years from the realization that there is a problem to arriving at a satisfactory understanding. Indeed, for many years complex numbers were considered an amusing curiosity with no real significance. Let us see how the story went.¹⁰
¹⁰ Much more than what I am going to tell you now can be found in the excellent book by T. Needham, "Visual Complex Analysis", published by Clarendon Press, which I thoroughly recommend as an introduction to complex analysis.

Commonly the birth of the complex numbers is dated as 1545, with the publication of Cardano's book Ars Magna. This, however, is not quite true, as Girolamo Cardano introduced those numbers but then also dismissed them as useless. Why did he think that? He was certainly
no fool, so he must have had a reason. Amongst other things Cardano studied quadratic equations, ie equations of the form
x² = mx + b.                                 (1.104)
We can very easily write down the solutions of this equation. They are
x_{1/2} = (1/2)(m ± √(m² + 4b)).             (1.105)
That's ok, and all of you know this formula. But what do we do when m² + 4b is negative? Then we have to take the square root of a negative number, and the formula tells us that the solution of the quadratic equation is one of these 'imaginary' numbers. Well, you could accept this and say: ok, these numbers occur, so they must make sense. But Cardano, like most mathematicians of the time, thought more in terms of geometry, and this way of thinking led him to conclude that the complex numbers do not make much sense. For him the quadratic equation simply means that we look for the intersection of a straight line y = mx + b with a parabola y = x².

Figure 1.6: This is a caption

Sometimes this problem
has a solution and sometimes it doesn’t. Well, indeed the existence of
solutions is indicated by a real solution x while the non-existence of
a solution is indicated by the occurrence of roots of negative numbers
58CHAPTER 1. SETS, NUMBERS ANDTHE CONCEPT OF INFINITY
in the expression for x. So, for Cardano the occurrence of these weird
imaginary numbers simply indicated that there is no solution and so he
decided that these numbers had no particular use or meaning. Well, this
was not as unreasonable as it may seem today. So, imaginary numbers
lay dormant for quite a bit longer. Cardano also considered solutions
of more complicated polynomials, namely cubic polynomials. In general they have the form
ay³ + by² + cy + d = 0                       (1.106)
but dividing by a and using the substitution y = x − b/(3a) this can be reduced to the standard form
x³ = 3px + 2q,                               (1.107)
ie again we can easily interpret this equation as the quest for finding an intersection between a cubic curve y = x³ and a straight line y = 3px + 2q.
You can see straight away, by drawing this problem, that it always has a solution (have a look at the figure).

Figure 1.7: This is a caption

Ok, Cardano also derived a formula to solve this problem:
x = ∛(q + √(q² − p³)) + ∛(q − √(q² − p³)),    (1.108)
which is not at all easy to find. Either Cardano didn't realize, or he didn't care anymore after he decided that the imaginary numbers are not very significant, but here a remarkable phenomenon happens. It was realized by Bombelli more than 30 years later. Clearly, when p³ > q² there are again roots of negative numbers. And this time their occurrence is more serious, because they cannot be dismissed with the remark that they correspond to the lack of an actual solution for the intersection of two curves, because we know that there always is such an intersection. Bombelli considered
x³ = 15x + 4                                 (1.109)
which clearly has the solution x = 4, but for which Cardano's formula yields
x = ∛(2 + 11i) + ∛(2 − 11i)                  (1.110)
where I have abbreviated √(−1) with i. Now Bombelli was trying to make sense of the situation and was fighting very hard with this problem. Finally he had a very good idea indeed (he called it a "wild thought", which it certainly was at that time). Perhaps, he thought, the solution x = 4 is indeed equal to x = ∛(2 + 11i) + ∛(2 − 11i). So he made the Ansatz
∛(2 + 11i) = 2 + in,   ∛(2 − 11i) = 2 − in.  (1.111)
Of course he already made some assumption here, namely that the 'real' parts of the numbers are equal, but that seemed quite a decent guess. Furthermore he had to make the very reasonable assumption that one can add imaginary numbers z_1 = a_1 + ib_1 and z_2 = a_2 + ib_2 according to the law
z_1 + z_2 = (a_1 + ib_1) + (a_2 + ib_2) = (a_1 + a_2) + i(b_1 + b_2).    (1.112)
Of course, he also needed to compute (2 + in)³, ie he needed some rules for multiplication. He chose quite naturally
(a_1 + ib_1)(a_2 + ib_2) = a_1 a_2 + i(a_1 b_2 + b_1 a_2) + i² b_1 b_2    (1.113)
and he made the reasonable assumption that i² = −1, given that we defined i as √(−1), and he arrived at
(a_1 + ib_1)(a_2 + ib_2) = a_1 a_2 − b_1 b_2 + i(a_1 b_2 + b_1 a_2).    (1.114)
With these rules he was now able to verify (2 ± i)³ = 2 ± 11i, ie ∛(2 ± 11i) = 2 ± i. Therefore he was able to make sense of imaginary numbers, at least in this situation, and also to motivate the multiplication law for these numbers.
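Bombelli's verification is immediate with Python's built-in complex numbers (an illustration of my own, writing i as `1j`):

```python
# Bombelli's 'wild thought': (2 + i)^3 reproduces 2 + 11i.
z = (2 + 1j) ** 3
print(z)

# Cardano's formula for x^3 = 15x + 4: the two principal cube roots add up to 4.
x = (2 + 11j) ** (1 / 3) + (2 - 11j) ** (1 / 3)
print(x.real, x.imag)  # real part 4, imaginary part 0 up to rounding
```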
For the next 250 years there was not too much progress in the study of complex numbers. One of the reasons was the problem that no-one was really able to visualize these numbers, quite unlike the real numbers, which are points on a line. The real progress started when Wessel and Argand independently realized that one can represent imaginary numbers in a plane, the complex plane. The real numbers form one axis, and the other axis is formed by the real multiples of i. This in itself would not have been big progress if it weren't for the fact that now the laws for addition and multiplication gained a very simple geometrical intuition. Indeed, the addition of two complex numbers is just the addition of the two corresponding vectors in the complex plane. That's fairly straightforward. Less straightforward is the realization that the multiplication of two complex numbers (following Bombelli's rule) is equivalent to obtaining the resulting vector by multiplying the lengths of the two vectors and adding the angles that they make with the real axis. (Prove this as an exercise.) This is very nice indeed. The fact that multiplication is a combined stretching and rotation makes it a bit more plausible how Euler could get the intuition for his famous Euler formula
z = re^{iφ} = r(cos φ + i sin φ)             (1.115)
where r is a real number giving the length of the corresponding vector and φ gives the angle it makes with the x-axis. This formula is really useful in many aspects of mathematics and especially physics.
Example: Verify the law
e^{iθ} + e^{iφ} = 2 cos((θ − φ)/2) e^{i(θ+φ)/2}    (1.116)
both by calculation and with a picture.
Example: Derive cos(3φ) = 4 cos³φ − 3 cos φ using complex numbers.
Example: Let S_n = Σ_{k=1}^n cos((2k − 1)θ). Show that
S_n = sin(2nθ)/(2 sin θ).                    (1.117)
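These identities are easy to spot-check numerically with Python's cmath before attempting the derivations (a sketch of my own; not a proof, and the sample values are arbitrary):

```python
import cmath
import math

theta, phi = 0.7, 0.3

# Eq. (1.116): e^{i theta} + e^{i phi} = 2 cos((theta-phi)/2) e^{i(theta+phi)/2}
lhs = cmath.exp(1j * theta) + cmath.exp(1j * phi)
rhs = 2 * math.cos((theta - phi) / 2) * cmath.exp(1j * (theta + phi) / 2)
print(abs(lhs - rhs))  # tiny, at the level of rounding error

# Eq. (1.117): the sum of cos((2k-1) theta) equals sin(2n theta) / (2 sin theta)
n = 10
s_n = sum(math.cos((2 * k - 1) * theta) for k in range(1, n + 1))
print(s_n - math.sin(2 * n * theta) / (2 * math.sin(theta)))  # tiny as well
```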
Example: Finally let us study again the map
z_{n+1} = (1/2)(z_n + a/z_n)                 (1.118)
which allows us to compute √a numerically. Remember that for a real initial value z_0 the sequence converges to the positive square root √a if z_0 > 0 and to the negative square root −√a if z_0 < 0. Now let us assume, however, that we start with a purely imaginary value z_0 = ir_0. In that case (taking a = 1 for simplicity) we get a recursion relation for z_n = ir_n, which is given by
r_{n+1} = (1/2)(r_n − 1/r_n).
Now if we start with some value of r_0, the sequence of the r_n does not converge at all. Indeed, it jumps back and forth quite wildly. For example, g = (1 + √5)/2 = 1.618 . . . maps to 0.5, −0.75, 0.291666 . . . , −1.56845 . . . , etc. But some r_0 behave differently. An example is r_0 = 1 + √2, which maps to 1, 0, ∞. This seems to be rather wild behaviour altogether, and we have to try to get some handle on the problem to be able to understand what can and cannot happen. To achieve this, we use a rather nice little trick, namely the substitution
r = −cotan(απ) ≡ −cos(απ)/sin(απ).
Now we can insert this into the recursion relation for r_n to find
(1/2)(r_n − 1/r_n) = (1/2)(−cos(α_nπ)/sin(α_nπ) + sin(α_nπ)/cos(α_nπ))
= −[cos²(α_nπ) − sin²(α_nπ)] / [2 cos(α_nπ) sin(α_nπ)]
= −cotan(2α_nπ) = r_{n+1}.
Taking into account that cotan(απ) = cotan((α + 1)π), we find as a consequence a recursion relation for the α_n, which is
α_{n+1} = 2α_n mod 1
where a mod x means that we take the remainder of a upon division by x; here, with x = 1, this means that we only keep the digits of 2α_n after the decimal point.
Now the behaviour of the recursion relation becomes far more transpar-
ent. If we consider α
n
written in binary notation, we see that one step
of the recursion relation amounts to shifting the digits in the binary
representation ’to the left’ by one place. Therefore, we can see that
there are three diﬀerent types of behaviour.
1. If the binary representation of α_0 terminates after n digits, then the recursion relation will lead to α_n = 0 and as a consequence r_n = ∞, ie the recursion diverges.
2. If the binary representation of α_0 is periodic and non-terminating (for example 2/3 = 0.10101010...) with period n, then the recursion relation will lead to a sequence of period n.
3. If the binary representation of α_0 is aperiodic and non-terminating, then the sequence of the α_n is also aperiodic and non-terminating.
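The correspondence between the r-recursion and the binary shift map is easy to check numerically. The following sketch in plain Python (the initial value and the step count are arbitrary choices of mine) iterates both α_{n+1} = 2α_n mod 1 and r_{n+1} = (r_n − 1/r_n)/2 and verifies that r_n = −cotan(α_n π) holds along the orbit:

```python
import math

def shift(alpha):
    # one step of the binary shift map: keep the digits after the binary point
    return (2 * alpha) % 1.0

def r_step(r):
    # the recursion r_{n+1} = (r_n - 1/r_n)/2 from the square-root map
    return 0.5 * (r - 1.0 / r)

alpha = 0.3                             # any start whose orbit avoids 0 and 1/2
r = -1.0 / math.tan(alpha * math.pi)    # r = -cotan(alpha * pi)

for n in range(10):
    alpha = shift(alpha)
    r = r_step(r)
    # the substitution predicts r_n = -cotan(alpha_n * pi) at every step
    assert abs(r - (-1.0 / math.tan(alpha * math.pi))) < 1e-6
```

The assertion inside the loop is exactly the statement that the substitution conjugates the r-recursion to the shift map.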
So, what we observe here, is that a tiny diﬀerence in the initial condi-
tions can lead to a hugely diﬀerent behaviour of the sequence generated
by the recursion relation. This phenomenon, that can be observed for
many recursion relations, is a manifestation of ’chaos’. Indeed, initially
physicists were not aware of this behaviour, but nowadays it is well-
known that chaos reigns in many areas of physics. The dynamics of the
solar system is one example.
So, two almost identical initial conditions diverge, which is a signature of chaos. To quantify the degree of chaos one can then quantify the rate at which two closely spaced points begin to diverge. This is usually done via the so-called Lyapunov exponent, which is defined by

λ = ln(α_{n+1}/α_n)   (1.119)

This tells you by how much the value of α grows in successive steps of the iteration, and it therefore also tells you the rate at which any difference between the initial conditions grows. In the above recursion you find λ = ln 2 = 0.693....
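One can also estimate λ numerically by following two nearby initial values under the shift map and measuring the growth of their separation. A minimal sketch in plain Python (the starting values and the step count are arbitrary choices, kept small enough that the separation never wraps around):

```python
import math

def shift(alpha):
    # doubling map: alpha -> 2*alpha mod 1
    return (2 * alpha) % 1.0

a, b = 0.321, 0.321 + 1e-12   # two almost identical initial conditions
n = 25                        # few enough steps that the separation stays small

for _ in range(n):
    a, b = shift(a), shift(b)

# average exponential growth rate of the separation per step
lam = math.log(abs(b - a) / 1e-12) / n
print(lam)  # close to ln 2 = 0.693...
```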
An even wilder and weirder behaviour can be found for the recursion relation

z_{n+1} = 2z_n/3 + 1/(3 z_n^2)
Chapter 2
Functions of real variables
In the last section I have introduced some of the various concepts of
numbers that are in use in physics and mathematics. You have also
learnt about sequences and series and their properties. We viewed a
sequence just as an ordered list of numbers. But one can also consider a
sequence as a function that maps the natural numbers into the real (or
complex) numbers, namely with each natural number n we associate
exactly one other real (or complex) number, namely the n-th element
of the sequence. So, here a function is a way of enumerating (using the
natural numbers) and ordering a set of real (complex) numbers. See
figure ... for a way to depict such a sequence graphically.
Figure 2.1: This is a caption
The horizontal axis is used to enumerate the natural numbers and the vertical direction
is used to plot the value of the corresponding element of the sequence.
As you can see, there are plenty of gaps, simply because we enumerate
the elements of our sequence by natural numbers. Of course, we could
have also made a more fancy sequence, namely we could have associated
with every rational number another real (or complex) number. This is
generally not what is done, but it is possible, because there is a one-
to-one mapping between the natural and the rational numbers. Such
sequences could also be plotted and they would look as if there were no
gaps, but of course you know that there are, simply because not a single
irrational number is used to enumerate the sequence. Therefore, if you
would like to draw the graph of this sequence, analogously to that of
the sequence enumerated by natural numbers, you could not keep the pen
on the paper all the time, but you would have to lift it up and down to
draw single dots for each of the elements of the sequence. The reason
for that is of course the fact that the rational numbers have lots and lots
of gaps between them, namely the irrational numbers such as √2 etc. Ideally we would like to fill these gaps as well. So, we need sequences that are even more densely enumerated than by the rational numbers. This idea then leads to the concept of functions from real numbers to real (or complex) numbers. This little heuristic argument should indicate to you that those functions are, in a sense, continuous versions of sequences. This innocuous step, however, allows us to introduce new concepts and generalizations of old concepts which make the idea of functions so much more powerful than that of sequences and series. As a bonus we will obtain new tricks to evaluate discrete sums.
2.1 More about sets
In the ﬁrst chapter we have encountered the basic concept of sets as
well as some of the operations that one can do with them. These oper-
ations included the intersection and the union of sets and some more.
We have also considered the concept of limits and we have encountered
the remarkable phenomenon that a sequence of rational numbers may
not converge to any rational number itself. However, any sequence of natural numbers that converges actually converges to a natural number. Therefore convergence and the properties of the limiting value of a sequence within a set can tell you something about the basic structure of a set. We will now distinguish two main types of sets, namely open and closed sets.
Definition 30 A subset C of Ω is called closed when it contains the limit points of all convergent sequences that can be formed from its elements.
A subset O of Ω is called open when it is the complement of a closed set (the complement of O is defined as Ō = {x ∈ Ω | x ∉ O}).
Let us consider some examples from the real numbers to make these
deﬁnitions clear.
Examples: An interval is a subset of the real numbers which can be open, closed or half-open. These three types are defined as follows.
• A closed interval from x_0 to x_1, denoted by I_1 = [x_0, x_1], is an interval that contains all the numbers x which satisfy x_0 ≤ x ≤ x_1. Clearly, any sequence (a_n)_{n=1,...} of points in this interval satisfies x_0 ≤ a_n ≤ x_1. Therefore, the limit a = lim_{n→∞} a_n can neither be larger than x_1 nor smaller than x_0, so x_0 ≤ a ≤ x_1. As a consequence all the limit points of sequences of elements from the interval I_1 lie in I_1.
• An open interval from x_0 to x_1, denoted by I_2 = ]x_0, x_1[, is an interval that contains all the numbers x which satisfy x_0 < x < x_1 but not the points x_0 and x_1. Clearly, this interval is the complement of the set ]−∞, x_0] ∪ [x_1, ∞[, which is a set that contains all the limit points of sequences made up of elements of that set.
• A half-open interval from x_0 to x_1 is denoted by I_3 = ]x_0, x_1] when it contains all the numbers x which satisfy x_0 < x ≤ x_1, and it is denoted by [x_0, x_1[ when it contains all the points with x_0 ≤ x < x_1. This interval is neither open nor closed.
• The points x_0 and x_1 are called boundary points of the interval.
• The set ]x_0, ∞[ = {x | x > x_0} is open because both boundary points x_0 and ∞ are not part of the set.
• The set [x_0, ∞[ = {x | x ≥ x_0} is closed because its complement is open.
• If two sets A and B are open, so is their intersection A ∩ B.
• Any finite intersection of open intervals A_i, denoted by ∩_{i=1}^n A_i, is open.
• An infinite intersection of open intervals A_i, denoted by ∩_{i=1}^∞ A_i = lim_{n→∞} ∩_{i=1}^n A_i, can be open or closed depending on the sequence. For example ∩_{i=1}^∞ ]−1/i, 1/i[ = {0} is closed, while ∩_{i=1}^∞ ]0, i[ = ]0, 1[ is open.
• The empty set Ø is open because there is no point that does not
have a ball around it that lies in the set.
• The set of real numbers R is open since for every point there is a
ball that contains that point and lies entirely in R.
• Now, maybe surprisingly, the sets Ø and R are both closed as
their complements are open.
Exercises: Prove the following statements
• Any ﬁnite intersection of open sets is open.
• The intersection of inﬁnitely many open sets may be open or
closed.
• The intersection of any number of closed sets is a closed set.
Open and closed sets play some role in the deﬁnition of functions
and their properties, which is why I mentioned them here.
2.2 The basic deﬁnition of a function
In the general introduction to this section I have outlined the idea of
functions. I have said that sequences are a mapping of the natural
numbers to the real numbers etc. In the following I would like to
introduce names for many of the basic concepts of functions to allow
us to speak more easily about them.
Definition 31 A real function f : D → I ⊂ R associates with every x ∈ D ⊂ R a unique real number y = f(x) ∈ I.
• The set D ⊂ R is called the domain of the function. We will also write D = dom(f).
• The set I ⊂ R of all possible values that the function can assume, defined by I = {y | ∃x ∈ D such that y = f(x)}, is called the image of the function. We will also write I = im(f).
• The set R, which includes I, is also called the range or codomain of the function.
Examples: 1) The function f : R → R given by f(x) = x has the domain D = R, ie it is defined on all real numbers. Its image is also the set of all real numbers.
2) The function f : R_+ → R given by f(x) = √x has the domain D = R_+, ie it is defined on all non-negative real numbers. Its image is also the set of all non-negative real numbers.
3) The function f : R → R given by f(x) = x^2 has the domain D = R, ie it is defined on all real numbers. Its image, however, is the set of all non-negative real numbers.
The domain of a function can have many forms. It can be countable, such as the natural numbers or the rational numbers, in which case we are taken back to the case of sequences. It can of course also be continuous.
We have now deﬁned functions and so we should learn some basic
operations on functions. Firstly, a function f : dom(f) → im(f) maps
the domain of f to the image of f. Of course, I could then try to apply
another function g : dom(g) → im(g) to the image of f again, see the
ﬁgure. This is only well-deﬁned when im(f) ⊂ dom(g). Then we get a
new function denoted by h = g ◦ f. We have that dom(h) = dom(f).
Figure 2.2: This is a caption
Of course f = g is a possibility and we may iterate the function many times. Then we denote

f^2(x) = f ◦ f(x) = f(f(x))   (2.1)
f^n(x) = f ◦ ... ◦ f(x) = f(f(... f(x) ...))   (n times)   (2.2)
Note the difference between the notation f^n(x), which is the n-fold iterated application of the function f, and the notation (f(x))^n, which is the n-th power of the value f(x). To make the difference clearer, consider f(x) = x^2 and n = 4. Then we have (f(x))^n = x^8 and f^n(x) = x^16. For general n we have (f(x))^n = x^{2n} and f^n(x) = x^{2^n}, which is a clear difference.
Examples: Already in the first chapter you have seen a specific example of an iterated map, namely the one for the computation of the square root, which I now write in functional form

x_{n+1} = f(x_n) = (1/2)(x_n + a/x_n)   (2.3)

so that we have

x_n = f^n(x_0).   (2.4)
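As a quick illustration, a sketch in plain Python (the helper name `iterate` is my own choice) that computes the n-fold iteration f^n(x_0) of the square-root map directly:

```python
def iterate(f, x0, n):
    # compute f^n(x0), the n-fold iterated application of f
    x = x0
    for _ in range(n):
        x = f(x)
    return x

a = 2.0
f = lambda x: 0.5 * (x + a / x)   # the map (2.3) for a = 2

x = iterate(f, 1.0, 8)
print(x)  # converges rapidly to sqrt(2) = 1.41421...
```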
A little later on we will see how, using the concept of functions, we can derive a whole lot of useful iteration procedures to solve equations other than x^2 = a.
Another interesting map is the so-called logistic map, which is defined as

f(x) = ax(1 − x)   (2.5)

and which exhibits chaos as well.
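Whether the iterated logistic map settles down depends strongly on the parameter a. The following sketch contrasts a regular and a chaotic regime (the parameter values and starting points are arbitrary illustrative choices):

```python
def logistic(a, x0, n):
    # iterate the logistic map f(x) = a*x*(1-x) n times
    x = x0
    for _ in range(n):
        x = a * x * (1 - x)
    return x

# For a = 2.5 the orbit converges to the fixed point 1 - 1/a = 0.6 ...
print(logistic(2.5, 0.2, 200))   # -> 0.6 (up to rounding)

# ... while for a = 4 two almost identical starting points end up far apart.
print(logistic(4.0, 0.2, 50), logistic(4.0, 0.2 + 1e-10, 50))
```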
Given a function f we would sometimes like to invert it, ie for a given value from the image of the function f we would like to find out which element from the domain of f led to it: for a y ∈ im(f) we would like to find the x ∈ dom(f) such that y = f(x). The formal definition is

Definition 32 For a one-to-one function f : dom(f) → im(f) we define the inverse function, denoted by f^{-1} : im(f) → dom(f), by

f^{-1}(y) = x   (2.6)

such that x is the unique value that satisfies f(x) = y.
A word of caution is necessary here, as not all functions can be inverted. Indeed, the condition that f is a one-to-one function from dom(f) to im(f) is very important. If the function associated with two different values x_0 and x_1 the same result y = f(x_0) = f(x_1), then we could not invert it, because given y we would be unable to decide on a unique x that ensures f(x) = y.
But iterating functions is not the reason why they are important.
In fact, if it were just for the iteration of functions there would have
been no real need to introduce the concept of functions but we could
have stuck with sequences just as well. The real use of functions comes
from the ideas of continuity, diﬀerentiation, integration etc. This will
be introduced in the next section.
2.3 Continuity
Functions may be very regular but they may also be very irregular.
A ﬁrst property of regularity is that of continuity. In a hand-waving
way this implies, that you can draw the function without ever lifting
the pencil or in other words, there are no jumps in the function. This
deﬁnition is useful for intuition, but not so good when you actually
want to prove something. So, here is a rigorous deﬁnition.
Definition 33 A real function f : D → R is called continuous in the point x_0 if for all ε > 0 there is a δ > 0 such that for all x with |x − x_0| ≤ δ we have that |f(x) − f(x_0)| ≤ ε.
The function f is called continuous in an interval I if it is continuous for all x ∈ I.
This definition has quite some similarities with the limit of sequences. To see this more clearly, let us see how we could define the limit of a function, ie how we would like to give sense to the expression lim_{x→x_0} f(x) = b. For sequences, ie functions f : N → R, we defined the limit lim_{n→∞} f(n) = b by the requirement that for all ε > 0 there has to be an n_0 such that for all n ∈ [n_0, ∞[ we have |f(n) − b| ≤ ε. That means the closer we come into the neighbourhood of ∞, the closer we come to the value b. Therefore a natural definition of convergence for functions is

Definition 34 A real function f : D → R converges to the point b in the limit of x approaching x_0 if for all ε > 0 there is a δ > 0 such that for all x ∈ [x_0 − δ, x_0 + δ] we have that |f(x) − b| ≤ ε.
Now we realize that continuity of the function f in the point x_0 is equivalent to the statement lim_{x→x_0} f(x) = f(x_0). We can visualize these statements nicely in figures such as those in figure ....
Continuous functions are also nicely behaved in the sense that they
do not grow too fast. This is captured in
Lemma 35 A function f that is continuous on D is bounded on every closed interval [x_0, x_1] ⊂ D.
Proof: Assume that f is not bounded on the interval [x_0, x_1]. Then I can split the interval into two equal halves; on at least one half the function will be unbounded. Choose that half and call its midpoint y_1. Cut the interval in half again and identify a half on which the function is unbounded. Call its midpoint y_2. Continue in this way ad infinitum, so that you define a sequence {y_i} that converges to some point y with lim_{y_i→y} f(y_i) = ∞. But f is continuous in y, so the f(y_i) would have to converge to the finite value f(y). This is a contradiction, so the function has to be bounded. This completes the proof.
Figure 2.3: This is a caption
Again, continuity is a mathematical concept which is highly conve-
nient, but it is also a property that you cannot verify experimentally
in a lab. You can just check it to higher and higher precision. If
the function that you are measuring has a jump that is smaller than
your measurement precision, then you cannot see it. Of course, if the
function does indeed have a discontinuity in some point, then you will
eventually be able to verify that there is a jump, once your experimen-
tal precision is suﬃciently high. However, the continuity in a certain
point cannot be proven by measurement. Nevertheless, continuity is a
very useful concept both in theoretical physics and in mathematics.
Examples:
1. The function f(x) = x is continuous in every point x_0. This can be proven easily by starting with any ε > 0 and choosing δ = ε, which is sufficient to check that for all x with |x − x_0| ≤ δ we have |f(x) − f(x_0)| = |x − x_0| ≤ ε.
2. The function f(x) = x^2 is continuous in every point x_0.
Proof: Exercise
3. Given two functions f : D → R and g : D → R that are both continuous on the whole interval D, and let α and β be real numbers. Then
• αf + βg : D → R is also continuous on the interval D.
• f · g : D → R is continuous on the interval D.
• If g has no zeros in D then f/g is also continuous on D.
The proofs for this are an exercise.
4. If f is continuous, then so is the function g(x) = (f(x))^2.
Proof: Exercise.
Theorem 36 Given a continuous function f : [a, b] → R. Then for every y ∈ [f(a), f(b)] there is an x ∈ [a, b] such that y = f(x).
Proof: For y = f(a) and y = f(b) the statement is evidently true, and for f(a) = f(b) it is trivial. Let us assume f(a) < f(b); an analogous proof works for f(a) > f(b). Therefore, let us assume that f(a) < y < f(b). Let us define a sequence of intervals according to the following rule. We start with I_0 = [a, b] ≡ [a_0, b_0]. Now, given an interval I_n = [a_n, b_n] with y ∈ [f(a_n), f(b_n)], we examine the two intervals [a_n, (a_n + b_n)/2] and [(a_n + b_n)/2, b_n]. If y ∈ [f(a_n), f((a_n + b_n)/2)] then I_{n+1} = [a_n, (a_n + b_n)/2] ≡ [a_{n+1}, b_{n+1}]. If y ∈ [f((a_n + b_n)/2), f(b_n)] then I_{n+1} = [(a_n + b_n)/2, b_n] ≡ [a_{n+1}, b_{n+1}]. Therefore we get a monotonically increasing sequence (a_n)_{n=1,...} and a monotonically decreasing sequence (b_n)_{n=1,...}. The sequences are evidently bounded and therefore they converge. As we also have |b_n − a_n| = 2^{−n} |b_0 − a_0|, the limits coincide, ie lim_{n→∞} a_n = lim_{n→∞} b_n ≡ a_∞ ∈ [a, b]. As the function is continuous, we have that lim_{n→∞} f(a_n) = lim_{n→∞} f(b_n) = f(a_∞), and since y ∈ [f(a_n), f(b_n)] for all n, it follows that y = f(a_∞). This finishes the proof.
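The interval-halving argument in this proof is constructive, so it doubles as a numerical method. Below is a minimal sketch in Python (the example function and target value are arbitrary choices for illustration):

```python
def intermediate_value(f, a, b, y, steps=60):
    """Find x in [a, b] with f(x) = y, assuming f is continuous and f(a) <= y <= f(b)."""
    for _ in range(steps):
        m = (a + b) / 2
        # keep the half interval whose endpoint values still bracket y
        if f(a) <= y <= f(m):
            b = m
        else:
            a = m
    return (a + b) / 2

# example: solve x**3 + x = 1 on [0, 1]
x = intermediate_value(lambda t: t**3 + t, 0.0, 1.0, 1.0)
print(x)  # the root of x^3 + x = 1, about 0.6823
```

Each pass through the loop halves |b − a|, mirroring the factor 2^{−n} in the proof.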
2.3.1 Functions of many variables
I have shown you that continuity and taking the limit of a function are closely related. This is all pretty harmless when we are talking about functions of a single variable. But in physics functions can often have more than a single variable. For example, if you have a particle that is moving in a plane, then the absolute value of its velocity will depend on the two variables that specify its position, ie v = v(x, y). For such functions there are some important subtleties involved in carrying out limits. To illustrate this, let me consider the following function

f(x, y) = x^2 / (x^2 + y^2)   (2.7)

which is defined everywhere except for the point (x, y) = (0, 0). Now we could try to find the limit of this function,

lim_{(x,y)→(0,0)} f(x, y) = ?   (2.8)

What do we mean by this limit? A real function of two variables f : D → R converges to b in the limit of (x, y) approaching (x_0, y_0) if for all ε > 0 there is a δ > 0 such that for all |(x, y) − (x_0, y_0)| ≤ δ we have that |f(x, y) − b| ≤ ε. This is just the definition that we gave above for functions of a single variable. The only thing that I should specify is what |(x, y) − (x_0, y_0)| actually means. Well, it is the distance between the two vectors (x, y) and (x_0, y_0), ie we have

|(x, y) − (x_0, y_0)| = √( (x − x_0)^2 + (y − y_0)^2 ).
So, now let us see whether we can possibly have convergence for the function f(x, y) = x^2/(x^2 + y^2). If we choose y = x then we have f(x, x) = 1/2 for any value of x as long as it is unequal to 0. Because |(x, x) − (0, 0)| = √2 |x|, we have in every proximity of (0, 0) points (x, y) where f(x, y) = 1/2. Likewise, for y = 2x we have f(x, 2x) = 1/5 for any value of x as long as it is unequal to 0. Because |(x, 2x) − (0, 0)| = √5 |x|, we have in every proximity of (0, 0) points (x, y) where f(x, y) = 1/5. Therefore the function does not converge towards any fixed value. That in itself would not be so surprising. There are clearly functions that converge and others that don’t. However, let us now first take the limit
in y and then in x. Then we find

lim_{x→0} ( lim_{y→0} x^2/(x^2 + y^2) ) = lim_{x→0} 1 = 1   (2.9)

So, here the limit exists. If we take the limits the other way around, then we find

lim_{y→0} ( lim_{x→0} x^2/(x^2 + y^2) ) = lim_{y→0} 0 = 0   (2.10)
So, we make a very important observation here: limits cannot necessarily be interchanged. Of course there may be compelling physical reasons for choosing a particular ordering of the limits, but in the absence of such reasons one has to take great care with multiple limits. The problem in our example was that the limit lim_{(x,y)→(0,0)} f(x, y) does not exist, and as a consequence taking the limits one by one, first in x and then in y and vice versa, gives different results. However, when we know that the limit lim_{(x,y)→(0,0)} f(x, y) exists, then we can also take the iterated limits. More precisely:
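The order dependence of the iterated limits in (2.9) and (2.10) can be seen numerically by plugging in successively smaller values; the step sizes below are arbitrary choices:

```python
def f(x, y):
    # the function of equation (2.7), defined away from (0, 0)
    return x**2 / (x**2 + y**2)

x = 1e-8
inner_y_first = f(x, 1e-12)   # send y -> 0 first, with x fixed: close to 1
y = 1e-8
inner_x_first = f(1e-12, y)   # send x -> 0 first, with y fixed: close to 0

print(inner_y_first, inner_x_first)  # approximately 1.0 and 0.0
```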
Theorem 37 If lim_{(x,y)→(a,b)} f(x, y) = c and if lim_{y→b} f(x, y) exists, then so does lim_{x→a} ( lim_{y→b} f(x, y) ) and it equals c.
Proof: As the total limit exists, for all ε > 0 there is a δ > 0 such that |f(x, y) − c| ≤ ε for all |x − a| ≤ δ and |y − b| ≤ δ. From the existence of the limit lim_{y→b} f(x, y) it follows that |lim_{y→b} f(x, y) − c| ≤ ε for all |x − a| ≤ δ, and this finishes the proof.
2.4 Convexity I
Now let us study another most useful property of functions, one that is quite closely related to continuity. This is the concept of convex and concave functions, which will later turn out to be very useful for deriving, in a simple way, a number of often-used inequalities.

Definition 38 A function f is called convex on the interval [a, b] when for all intervals [x, y] ⊂ [a, b] and all λ ∈ [0, 1] we have that

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)   (2.11)

A function f is called concave if the function −f is convex.
Geometrically this means that for a convex function, for any pair of points, the function lies below the straight line joining the points (x, f(x)) and (y, f(y)). This is shown in figure ..., where I have plotted two functions. The left function is convex while the function on the right is concave.
Figure 2.4: This is a caption
Examples:
1. The function f(x) = x is evidently both convex and concave.
2. The function f(x) = x^2 is convex. This can be seen from

λf(x) + (1 − λ)f(y) − f(λx + (1 − λ)y) = λx^2 + (1 − λ)y^2 − (λx + (1 − λ)y)^2
                                       = λ(1 − λ)(x − y)^2 ≥ 0

This completes the proof.
3. It is more difficult to prove that the function f(x) = x^{2^n} is convex for all n ∈ N. To see this directly one would have to verify, in particular for λ = 1/2, the inequality

(λx + (1 − λ)y)^{2n} ≤ λx^{2n} + (1 − λ)y^{2n}   (2.12)

This is indeed a useful inequality to know, but maybe not so easy to prove. In the following I will use a useful lemma to solve this problem. In the next subsection, however, we will find a more direct and much easier way of approaching this question.
Lemma 39 Given a convex function f : D → I and another function g which is both convex and monotonically growing on I. Then the composition g ◦ f is also a convex function on D.
Proof: Convexity of f on D means that for all x_0, x_1 ∈ D and λ ∈ [0, 1] we have f(λx_0 + (1 − λ)x_1) ≤ λf(x_0) + (1 − λ)f(x_1). From the monotonicity and convexity of g we then obtain

g ◦ f(λx_0 + (1 − λ)x_1) = g(f(λx_0 + (1 − λ)x_1))
                         ≤ g(λf(x_0) + (1 − λ)f(x_1))
                         ≤ λ g ◦ f(x_0) + (1 − λ) g ◦ f(x_1).

This proves the convexity of g ◦ f.
Now we realize that the composition of f(x) = x^2 with itself is simply f ◦ f(x) = f(f(x)) = f(x^2) = x^4. As the image of f is the non-negative real numbers, on which f is convex and monotonic, we conclude from the lemma that f^2(x) = x^4 is convex as well. By induction we can then conclude that f^n(x) = x^{2^n} is convex.
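Convexity claims like these are easy to spot-check numerically before investing in a proof. A small sketch (the helper name `looks_convex`, the grid size and the tolerance are my own arbitrary choices):

```python
def looks_convex(f, a, b, samples=50):
    # check f(l*x + (1-l)*y) <= l*f(x) + (1-l)*f(y) on a grid of points
    pts = [a + (b - a) * i / samples for i in range(samples + 1)]
    lams = [i / 10 for i in range(11)]
    for x in pts:
        for y in pts:
            for l in lams:
                mid = l * x + (1 - l) * y
                if f(mid) > l * f(x) + (1 - l) * f(y) + 1e-6:
                    return False
    return True

print(looks_convex(lambda t: t**8, -2.0, 2.0))   # x^(2^3): True
print(looks_convex(lambda t: t**3, -2.0, 2.0))   # x^3 is not convex on [-2, 2]: False
```

A `True` from a finite grid is of course only evidence, not a proof.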
4. The function f(x) = ln x is concave. Again the proof of concavity is not that straightforward and is equivalent to a useful inequality. Concavity requires that

ln(λx + (1 − λ)y) ≥ λ ln(x) + (1 − λ) ln(y)   (2.13)

for all λ ∈ [0, 1]. This is equivalent to the following statement: for all positive p and q with 1/p + 1/q = 1 and all positive x and y we have

x^{1/p} y^{1/q} ≤ x/p + y/q   (2.14)

The proof of this inequality is not so straightforward, and again I leave the proof for later.
5. The function f(x) = e^x is convex on all real numbers. One could try to prove this directly by verifying that

e^{λx_0 + (1 − λ)x_1} ≤ λe^{x_0} + (1 − λ)e^{x_1}   (2.15)

but again a useful lemma comes in very handy, because it will allow us to conclude the convexity of e^x from the concavity of ln x. We have

Lemma 40 A monotonically growing and invertible function f : R → R is convex exactly if the inverse function f^{-1} : R → R is concave.
Proof: First note that the inverse of a monotonically growing function is also monotonically growing. This can be seen as follows. Monotonicity is equivalent to

x_1 ≥ x_2 ⇔ f(x_1) ≥ f(x_2)   (2.16)

Now, using y_i = f(x_i), we have for the inverse function x_i = f^{-1}(y_i) and as a consequence

f^{-1}(y_1) ≥ f^{-1}(y_2) ⇔ y_1 ≥ y_2   (2.17)

which confirms the monotonicity of the inverse function f^{-1}.
As f is convex, we have

λf(x_0) + (1 − λ)f(x_1) ≥ f(λx_0 + (1 − λ)x_1)

Denoting again y_i = f(x_i), applying the monotone f^{-1} to both sides and using f^{-1} ◦ f = 1 we find

f^{-1}(λy_0 + (1 − λ)y_1) ≥ f^{-1}(f(λx_0 + (1 − λ)x_1)) = λf^{-1}(y_0) + (1 − λ)f^{-1}(y_1)

which is the condition for the concavity of f^{-1}. The reverse direction of the proof works analogously. This finishes the proof.
As f(x) = e^x is the inverse function of g(x) = ln x, we can use the concavity of g(x) = ln x to conclude the convexity of f(x) = e^x.
Before we go on to the next topic, which will yield as a byproduct a simple way to prove convexity, I would like to show you that the convexity of a function is closely related to its continuity. In fact

Lemma 41 A bounded function f : D → R that is convex on an open interval is continuous on the same interval.
Proof: We will prove this by assuming that the function is discontinuous and concluding that it cannot be convex. To get an idea for the proof, have a look at the figure.
Figure 2.5: This is a caption
The function f(x) has a jump at the position x_0. This is the only way that a bounded convex function can be discontinuous. Now we can see quite clearly that if we draw a line from the point (x_0 − ε, f(x_0 − ε)) to the point (x_0 + ε, f(x_0 + ε)), then some parts of the function will lie above this line, which cannot be true for a convex function. Of course this is not a proof yet, but it gives us the intuition for how to construct one.
Assume that the bounded function f is discontinuous in the point x_0. This means that it has a jump in x_0, which means that the two limits

lim_{µ→0, µ>0} f(x_0 − µ) = b_−   (2.18)
lim_{µ→0, µ>0} f(x_0 + µ) = b_+   (2.19)

are different. For simplicity let us assume b_− < b_+ (an analogous proof works for b_+ < b_−). Now consider the expression

b_+ = lim_{µ→0, µ>0} f(x_0 + 0.1µ)
    = lim_{µ→0, µ>0} f( (1/2)(x_0 − µ) + (1/2)(x_0 + 1.2µ) )
    ≤ lim_{µ→0, µ>0} ( f(x_0 − µ) + f(x_0 + 1.2µ) ) / 2
    = (b_− + b_+)/2

This implies that

b_+ ≤ b_−.   (2.21)

But this is a contradiction because we had assumed that b_− < b_+. This finishes the proof.
2.5 Differentiation
When you walk up a hill you would like to know how steep it is, and as most hills are not uniformly steep, you would like to know how steep the mountain is at every point. In a gross but useful simplification, your path on the mountain can be described by a simple function y = h(x), which I will refer to as the height. Surely a good approximation to the steepness is to determine by how much the height changes when we increase x by some amount ∆x, ie

( h(x + ∆x) − h(x) ) / ∆x.   (2.22)

For any given finite ∆x this is an approximation to the steepness at the point x. Therefore, we define

h′(x) = lim_{∆x→0} ( h(x + ∆x) − h(x) ) / ∆x   (2.23)

as the derivative of the function h at the point x. Clearly, a necessary condition for this limit to exist is that lim_{∆x→0} h(x + ∆x) = h(x), which
is just the statement that the function h is continuous in the point x.
However, this alone is not enough. Indeed, the differentiability of a function is a much stronger condition than the continuity of a function. This can be seen from the following example of a continuous function that is not differentiable in the point x = 0. Let us define the function

f(x) = |x| = { x for x ≥ 0; −x for x ≤ 0 }   (2.24)
which is plotted in figure .... Now let us try to differentiate the function.
Figure 2.6: This is a caption
This is easy for x > 0, where we simply get f′(x) = 1, and likewise for x < 0, where we get f′(x) = −1. At the point x = 0, however, the limit

f′(0) = lim_{∆x→0} ( f(0 + ∆x) − f(0) ) / ∆x   (2.27)

will not converge, because in any interval around ∆x = 0 there are positive and negative values of ∆x for which the expression ( f(x + ∆x) − f(x) )/∆x takes the values 1 and −1 respectively. So, we do not have convergence, and therefore the function cannot be differentiated at this point.
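The failure of the difference quotient to settle down at x = 0 is easy to see numerically (a sketch; the shrinking step sizes are arbitrary choices):

```python
def diff_quotient(f, x, dx):
    # the approximation (2.22) to the slope of f at x, with finite step dx
    return (f(x + dx) - f(x)) / dx

f = abs  # f(x) = |x|

# approaching from the right the quotient is +1, from the left it is -1,
# so the two one-sided limits disagree and f'(0) does not exist
for dx in [0.1, 0.01, 0.001]:
    print(diff_quotient(f, 0.0, dx), diff_quotient(f, 0.0, -dx))
```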
This is an example of a function that is continuous and can be differentiated everywhere except in a single point. There are much weirder functions which are continuous everywhere but cannot be differentiated anywhere! Their construction, however, is also a bit more involved.
Just as there are functions that are continuous but not everywhere differentiable, there are of course also functions that can be differentiated once but not a second time. A simple example can be constructed from the example above, namely the function

f(x) = { (1/2) x^2 for x ≤ 0; −(1/2) x^2 for x ≥ 0 }   (2.28)

which is plotted in figure ....
Figure 2.7: This is a caption
It is an easy exercise to check that at the point x = 0 this function can be differentiated once but not twice. More generally, one can define a function that can be differentiated n times but not n + 1 times.
Theorem 42 (mean value theorem) Given a function f : [a, b] → R that is continuous on [a, b] and differentiable on ]a, b[. Then there exists an x ∈ ]a, b[ such that

f′(x) = ( f(a) − f(b) ) / (a − b)   (2.30)
Proof: Let us first simplify the setting a little bit by considering the function g(x) = f(x) − f(b) − ( f(a) − f(b) )/(a − b) · (x − b). The function f is differentiable exactly if g is differentiable. For the function g we have g(a) = g(b) = 0, and we have that

( f(a) − f(b) ) / (a − b) = f′(x) = g′(x) + ( f(a) − f(b) ) / (a − b)   (2.31)

is equivalent to

g′(x) = 0.   (2.32)

So, it is enough to show that for b > a and a function g : [a, b] → R that is differentiable in the interval ]a, b[ and satisfies g(a) = g(b) = 0, there exists an x ∈ ]a, b[ such that g′(x) = 0. The function g is differentiable and therefore continuous, so it attains a maximum and a minimum on the closed interval [a, b]. If both are zero then g vanishes identically and any x ∈ ]a, b[ will do. Otherwise at least one extremum is attained at an interior point x ∈ ]a, b[, and there, by definition of an extremum of a differentiable function, the derivative vanishes. This finishes the proof.
2.5.1 Convexity II and its application to inequalities
In our first section on convexity we saw that the convexity of a function is closely related to the verification of inequalities that show up in many mathematical and physical problems. However, in those cases we had to know the truth of these inequalities already in order to prove the convexity of a function. To make the idea of convexity useful for proving inequalities, we need independent criteria for establishing the convexity of a function. Once in possession of such criteria, we can turn the logic around and use convex functions to prove inequalities that are really difficult to prove without this tool. To this end we will connect differentiability with the convexity of a function, thereby obtaining a very efficient way to decide the convexity (concavity) of many functions.

Theorem 43 Given a function f that is twice differentiable. Then the function is convex on an interval I exactly if f′′(x) ≥ 0 for all x ∈ I.
Proof: I will not present this proof as it is not too illuminating and rather long. You can find it in most books on Real Analysis.
With this theorem we can now check very easily the convexity (con-
cavity) of a large number of functions.
1. The function f(x) = x^{2n} is convex for all n ∈ ℕ. This is verified very easily by differentiating the function twice. As a by-product we have also verified the correctness of the inequality

((x + y)/2)^{2n} ≤ (x^{2n} + y^{2n})/2 .   (2.33)
2. The function f(x) = ln x is concave on the interval ]0, ∞[. Again this is now almost trivial to check by differentiating the function twice. This also verifies the inequality: For all positive p and q with 1/p + 1/q = 1 and all positive x and y we have

x^{1/p} y^{1/q} ≤ x/p + y/q .   (2.34)
Proof: We use the concavity of the logarithm together with the monotonicity of the exponential function to see

ln(x/p + y/q) ≥ (1/p) ln x + (1/q) ln y   (2.35)

⇒ e^{ln(x/p + y/q)} ≥ e^{(1/p) ln x + (1/q) ln y}   (2.36)

⇒ x/p + y/q ≥ x^{1/p} y^{1/q}   (2.37)
3. The function f(x) = −x log₂ x − (1 − x) log₂(1 − x) is concave on the interval ]0, 1[. This function is also called the entropy function and its concavity is an important physical property that allows its interpretation as a measure of disorder.
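The concavity claims above can be checked numerically. Here is a minimal sketch in plain Python (the helper names are my own, not from the notes) that approximates f″ of the entropy function by a central second difference and confirms it is negative across ]0, 1[:

```python
import math

def entropy(x):
    # Entropy function f(x) = -x log2(x) - (1 - x) log2(1 - x)
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def second_difference(f, x, h=1e-4):
    # Central second difference, approximates f''(x) for small h
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

# By the concave version of Theorem 43, f''(x) < 0 should hold on ]0, 1[;
# we test a grid of sample points away from the end points.
samples = [0.05 * k for k in range(1, 20)]
all_concave = all(second_difference(entropy, x) < 0 for x in samples)
```

The same check with f(x) = x^{2n} or f(x) = ln x reproduces the convexity and concavity statements of the other two examples.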
Exercises:

1) Consider a function f that is convex on the interval I. Prove that for any set of positive numbers p_i that satisfy Σ_{i=1}^n p_i = 1 and for any x_i ∈ I we have

f(Σ_{i=1}^n p_i x_i) ≤ Σ_{i=1}^n p_i f(x_i) .   (2.38)

2) Prove that every function that is bounded and convex on an open interval I is continuous on the same interval.
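The first exercise is Jensen's inequality, which can at least be sanity-checked numerically for a particular convex function. A small sketch in plain Python (the choice f(x) = x² and the random weights are my own illustration, not part of the exercise):

```python
import random

def f(x):
    return x * x  # a convex function

random.seed(0)
n = 5
p = [random.random() for _ in range(n)]
s = sum(p)
p = [pi / s for pi in p]                      # positive weights summing to 1
xs = [random.uniform(-3, 3) for _ in range(n)]

# Jensen: f(sum p_i x_i) <= sum p_i f(x_i), eq. (2.38)
lhs = f(sum(pi * xi for pi, xi in zip(p, xs)))
rhs = sum(pi * f(xi) for pi, xi in zip(p, xs))
jensen_holds = lhs <= rhs
```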
2.5.2 Minimization of convex functions on convex
sets.
Quite often in physics you are faced with the task of finding the absolute minimum of some function. The minimization of the potential energy of a system is an example, which is important to find the equilibrium position of a particle in a potential. This problem is made difficult when there are many local minima. A convenient feature of a convex function on a convex interval is that every local minimum is automatically a global minimum. What is a local minimum? Given a function f we say the function has a relative minimum in the point x₀ if there is an ε > 0 such that for x ∈ [x₀ − ε, x₀ + ε] we have f(x) ≥ f(x₀). The global minimum is attained in the point x_glob such that f(x_glob) ≤ f(x) for all x.
Assume that for the convex function f defined on a convex set there is a local minimum x₁ for which f(x₁) > f(x_glob), ie the local minimum at x₁ is not the global minimum. Then from the fact that f(x₁) is a local minimum and the convexity of f we conclude that for λ sufficiently close to 1

f(x₁) ≤ f(λx₁ + (1 − λ)x_glob) ≤ λf(x₁) + (1 − λ)f(x_glob)   (2.39)

and for λ < 1 we find

f(x₁) ≤ f(x_glob)   (2.40)

which is in contradiction to the assumption that f(x₁) > f(x_glob). Therefore, the function cannot have any local minimum that is larger than the global minimum.
For convex functions there is therefore a very simple method for finding the global minimum, which is called the method of steepest descent. It works in the following way. Start in a randomly chosen point x₀. Determine the gradient of f in that point, ie (df/dx)(x = x₀). Then determine the next point via

x_{n+1} = x_n − ∆ (df/dx)(x = x_n) .

If we have f(x_{n+1}) < f(x_n) then continue. If not, then halve the size of ∆, ie ∆ → ∆/2, and try again.

This method will stop in some relative minimum, namely when (df/dx)(x) = 0. If the function is convex, then such a relative minimum is a global minimum and we have succeeded.

Note that I have described the procedure for a function of a single variable. However, it straightforwardly generalizes to functions of many variables.
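The step-halving descent described above fits in a few lines. This is a minimal sketch in plain Python, assuming a differentiable convex function of one variable; the function and parameter names are my own:

```python
def steepest_descent(f, df, x0, delta=1.0, tol=1e-10, max_iter=10000):
    # Follow the negative gradient; halve the trial step whenever it
    # fails to decrease f, as described in the text.
    x = x0
    for _ in range(max_iter):
        step = delta * df(x)
        while f(x - step) >= f(x) and abs(step) > tol:
            step /= 2
        if abs(step) <= tol:
            break          # df/dx is (numerically) zero: a relative minimum
        x -= step
    return x

# Convex example: f(x) = (x - 3)^2 + 1 has its global minimum at x = 3
xmin = steepest_descent(lambda x: (x - 3) ** 2 + 1,
                        lambda x: 2 * (x - 3), x0=-7.0)
```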
2.5.3 Newton’s method

Now let us briefly come back to a topic that we had considered earlier, namely finding the square root of the number a employing a sequence of numbers of the form

x_{n+1} = (1/2)(x_n + a/x_n) .   (2.41)

How did we come to this sequence in the first place? Well, it is a special case of a more general idea which apparently goes back to Newton and is an application of differentiability that works particularly well on convex functions. Newton asked himself the following question: ”If I have a function f(x) how can I find its zeros, ie those x that satisfy f(x) = 0?”

One possible way to arrive at a general procedure that will do this is the following argument. Given the function f(x) we can expand it in a Taylor series which we break off after the first term,
f(x_{n+1}) ≅ f(x_n) + (x_{n+1} − x_n) f′(x_n) .   (2.42)
Now set the left hand side to 0, ie assume that x_{n+1} is a solution to f(x) = 0. Then solving for x_{n+1} we get

x_{n+1} = x_n − f(x_n)/f′(x_n) .   (2.43)
Why could that work at all? If the function f(x) is linear, ie f(x) = ax, then the above truncated Taylor expansion is actually exact. As a consequence, the iteration converges in a single step because x_{n+1} = 0. So we see that we have got a decent method for linear functions, which is of course not terribly interesting. However, now you can have some trust in the idea that if we are not too far from the solution of f(x) = 0 then the truncated Taylor series is a very good approximation and so the sequence of numbers that we generate from it can be useful. Furthermore, if you look at the Taylor expansion of a function around its zero x you immediately see that in a close neighborhood of the zero the function is to a very good approximation linear because

f(x + ε) = f(x) + εf′(x) + (1/2)ε²f″(x) + … = εf′(x) + (1/2)ε²f″(x) + …   (2.44)
Of course this is not any form of proof but at least it suggests that one should look into this in a bit more detail. I will not show you the proof with all the bells and whistles but will bring you a ’hand-waving’ argument because the full proof for the convergence of the method is rather lengthy. What I am going to show you now is what I dreamed up on a sheet of paper rather quickly and illustrates how one proceeds as a theoretical physicist. Let me assume that I am already quite close to the true zero x which satisfies f(x) = 0, ie x_n = x + ε with a small ε. Furthermore, let me assume that f′(x) ≠ 0. Expanding around x as in eq. (2.44) we have f(x_n) ≅ εf′(x) + (1/2)ε²f″(x) and f′(x_n) ≅ f′(x) + εf″(x), so that the Newton step gives

x_{n+1} − x = ε − [εf′(x) + (1/2)ε²f″(x)] / [f′(x) + εf″(x)] ≅ [f″(x)/(2f′(x))] ε² .   (2.50)

Now we can see that the deviation from the true solution has decreased from ε to some quantity that is of the order of ε². If ε is very small then this means that we have come a lot closer to the true solution of f(x) = 0. Starting from wild hand-waving we have worked our way to reasonable amounts of hand-waving and we can be quite confident that there will be a theorem that one can prove strictly and which ensures the convergence of the Newton method under certain assumptions. This is indeed the case and one example is
indeed the case and one example is
Theorem 44 Let f : [a, b] → ℝ be a twice differentiable convex function with f(a) < 0 and f(b) > 0. Let x₀ ∈ [a, b] be such that f(x₀) > 0; then the Newton sequence

x_{n+1} = x_n − f(x_n)/f′(x_n)   (2.51)
is monotonically decreasing and converges to the solution of f(x) = 0.
Proof: The full and strict proof of this theorem is about two pages
long, so I will leave it out. Again it can be found in most books on
Real Analysis.
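As a numerical illustration of the theorem and of the ε → ε² behaviour discussed above, here is a minimal Newton iteration in plain Python (names mine); for f(x) = x² − a it reduces exactly to the square-root sequence (2.41):

```python
def newton(f, df, x0, n_steps):
    # Newton sequence x_{n+1} = x_n - f(x_n)/f'(x_n), eq. (2.51)
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x - f(x) / df(x))
    return xs

# For f(x) = x^2 - a the step is x - (x^2 - a)/(2x) = (x + a/x)/2, eq. (2.41)
a = 2.0
xs = newton(lambda x: x * x - a, lambda x: 2.0 * x, x0=2.0, n_steps=5)
errors = [abs(x - a ** 0.5) for x in xs]   # roughly squares at every step
```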
Examples and Exercises:

• Verify that the function f(x) = −sin x on the interval [0, π] satisfies the criteria of the above theorem. Write down Newton’s iteration formula. Estimate the convergence rate, ie if you are an ε away from a solution, show what power of ε the next value is away from the true zero.

• Let k be a natural number. Show that x = tan x has exactly one solution in the interval ](k − 1/2)π, (k + 1/2)π[ which we call ξ. Prove that the sequence

x₀ = (k + 1/2)π   (2.52)

x_{n+1} = kπ + arctan x_n   (2.53)

converges to ξ. Determine ξ to a precision of 10⁻⁶ for k = 1, 2, 3.
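A short sketch of the second exercise in plain Python (the function name and tolerance handling are my choices): iterate x_{n+1} = kπ + arctan x_n from x₀ = (k + 1/2)π until successive iterates differ by less than 10⁻⁶.

```python
import math

def xi(k, tol=1e-6, max_iter=10000):
    # Fixed-point iteration of eqs. (2.52)-(2.53); the map has derivative
    # 1/(1 + x^2) < 1 on the interval, so it is a contraction there.
    x = (k + 0.5) * math.pi
    for _ in range(max_iter):
        x_new = k * math.pi + math.atan(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

roots = {k: xi(k) for k in (1, 2, 3)}
```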
2.6 Integration
In this section we will revisit the concept of integration which you
have encountered already in the ﬁrst term in the form of the inverse
of diﬀerentiation. Here I will show you a diﬀerent approach, namely
the concept of Riemann integral. While this will lead us to the same
answers it has two distinct advantages. Firstly, this approach is closely
related to the concept of area, which allows its generalization to the
integration of functions of more than just one variable which is pretty
important in many areas of physics. Secondly, it will turn out that
some functions are not integrable in the standard sense. However, the
approach via Riemann integration actually motivates a more general
approach to integration which is called Lebesgue integration.
2.6.1 Riemann integration
When Riemann thought about integration, he actually looked at it from a rather practical point of view. Let us draw the graph of a positive continuous function f on an interval [a, b]. Now there is a region that is delimited from above by the graph of the function, from below by the interval [a, b] on the x-axis, and on the sides by the vertical lines through the points (x, y) = (a, f(a)) and (x, y) = (b, f(b)). This is shown in figure 2.8.

Figure 2.8: The region between the graph of f and the interval [a, b] whose area A defines the integral.
The region delimited in the way described above will have a certain area A, and the idea is to denote this area as

A = ∫_a^b f(x) dx .   (2.54)
Why could Riemann possibly think that this is a sensible approach in the first place? In particular, it is not clear at all that this ties in nicely with the idea that the integral of a function f is just given by a function F such that (dF/dx)(x) = f(x). You can gain some understanding for this by looking at some simple cases that you know from geometry. Let us consider first a very simple function, namely the constant function

f(x) = c > 0   (2.55)

and draw this function on the interval [a, b]. Clearly, the area between the graph of the function and the x-axis is just a rectangle of width b − a and height c, so that its area is given by c(b − a). It is not quite clear how that connects to differentiation, but we can see that easily when we consider the more general situation in which we draw the function on the interval [a, x] where x is a variable. Then the area between the graph of the curve and the x-axis is given by

F(x) = c(x − a).   (2.56)

It is now straightforward to verify that (dF/dx)(x) = c = f(x). Therefore we can see that in this simple example the two concepts do indeed coincide.
However, what do you do if I give you the example f(x) = sin x on the interval [0, π/2]? How would you prove that the area under the graph of the curve is indeed given by cos 0 − cos(π/2) = 1? As usual the best approach is to reduce the problem to something that we understand very well. In this case, this is the area under a constant function which we have studied above. Instead of a completely constant function let us consider functions that are called piecewise constant, ie a function f(x) and a sequence of values x₀ < x₁ < … < x_n such that f takes the fixed value c_i on the open interval ]x_i, x_{i+1}[. An example is given in figure 2.9. Quite evidently, it makes sense to say that the area under this function on the interval [x₀, x_n] is given by

I = Σ_{k=0}^{n−1} c_k (x_{k+1} − x_k) .   (2.57)
Figure 2.9: A piecewise constant function.
Obviously, only very few functions are piecewise constant and therefore we have to go a step further. Given that we wish to define the integral of a function on an interval as the area between the graph and the x-axis, what we really need to do is to approximate this area. How could we do this? One possibility is to find upper bounds and lower bounds which we successively improve such that eventually they coincide. We will achieve exactly this employing piecewise constant functions.

Consider a function f(x) that is defined on the interval [a, b] and that is bounded both from above and below on that interval, ie there are numbers c and C such that c ≤ f(x) ≤ C for all x ∈ [a, b]. Then we can clearly say that the area A under the graph of f(x) is smaller than C(b − a) and larger than c(b − a). Therefore, we have found an upper and a lower bound on the area A. Of course, these two bounds will not normally coincide, unless the function f(x) is constant. How can we improve on these bounds? One possible approach would be one that you have encountered already on various occasions. Why don’t we split the interval [a, b] into two halves, namely

I₁ = [a, (a + b)/2]  and  I₂ = [(a + b)/2, b] .   (2.58)
For each of these intervals we determine the maximum and the minimum of the function f(x). For the interval I₁ these are c₁ and C₁. For the interval I₂ these are c₂ and C₂. Then we have the new bounds

c₁ (b − a)/2 + c₂ (b − a)/2 ≤ A ≤ C₁ (b − a)/2 + C₂ (b − a)/2   (2.59)

or equivalently

[(c₁ + c₂)/2] (b − a) ≤ A ≤ [(C₁ + C₂)/2] (b − a) .   (2.60)
Now we can continue this procedure by subdividing both intervals into two equal halves. After n successive divisions we end up with 2ⁿ intervals, each of length (b − a)/2ⁿ. On each of these intervals we determine the smallest and the largest value of the function f(x). For the i-th interval these are denoted c_i and C_i so that we end up with the bounds

A_n^{low} = Σ_{i=1}^{2ⁿ} (c_i/2ⁿ)(b − a) ≤ A ≤ Σ_{i=1}^{2ⁿ} (C_i/2ⁿ)(b − a) = A_n^{up} .   (2.61)
In this way the sequence of successive upper bounds A_n^{up} is non-increasing while the sequence of successive lower bounds A_n^{low} is non-decreasing. As both sequences are bounded, they must both converge. Let us denote the limiting values by

lim_{n→∞} A_n^{up} = ∫_a^{*b} f(x) dx ,   lim_{n→∞} A_n^{low} = ∫_{*a}^b f(x) dx .

These two limits are also called the upper and lower integral. The big question is now whether they will actually converge to the same value or not.
Definition: We call a function f(x) integrable on the interval [a, b] if the upper and lower bounds A_n^{up} and A_n^{low} converge to the same value, ie

∫_a^{*b} f(x) dx = ∫_{*a}^b f(x) dx .

In this case we define the definite integral of f(x) on the interval [a, b] by

∫_a^{*b} f(x) dx = ∫_{*a}^b f(x) dx ≡ ∫_a^b f(x) dx
which equals the area between the graph of the curve and the x-axis.
Example I: Let us consider the function f(x) = x on the interval [0, 1]. Let us assume that we have split the interval into 2ⁿ pieces. Then for any interval [x_k, x_{k+1}] we have that the largest value of the function is x_{k+1} while the smallest value is x_k. Therefore we obtain for the area

Σ_{k=0}^{2ⁿ−1} (k/2ⁿ)(1/2ⁿ) ≤ A ≤ Σ_{k=1}^{2ⁿ} (k/2ⁿ)(1/2ⁿ)   (2.62)

from which it follows that

(1/2)(2ⁿ − 1)/2ⁿ ≤ A ≤ (1/2)(2ⁿ + 1)/2ⁿ .   (2.63)

Taking the limit n → ∞ on both sides, we find

1/2 ≤ A ≤ 1/2 ,   (2.64)

so that upper and lower bound converge to the same value and we can write

∫_0^1 x dx = 1/2 .   (2.65)
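The halving construction of Example I is easy to mimic in code. A sketch in plain Python (function names mine), valid as written only for monotonically increasing integrands, where the extrema sit at the interval end points:

```python
def riemann_bounds(f, a, b, n_halvings):
    # Split [a, b] into 2**n_halvings equal pieces and sum the piecewise
    # constant lower and upper approximations, as in eq. (2.61).
    m = 2 ** n_halvings
    h = (b - a) / m
    lower = sum(f(a + k * h) for k in range(m)) * h        # smallest values
    upper = sum(f(a + (k + 1) * h) for k in range(m)) * h  # largest values
    return lower, upper

# lo and up pinch the area A = 1/2 of Example I from below and above
lo, up = riemann_bounds(lambda x: x, 0.0, 1.0, 16)
```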
Example II: You may wonder whether there are functions that are not integrable. Consider the example

f(x) = { 1 for x rational ; 0 for x irrational } .   (2.66)

Clearly, on any interval, no matter how small, this function is bounded from above by 1 and from below by 0, and you cannot improve on these bounds. Therefore, we find that

∫_a^{*b} f(x) dx = b − a > 0 = ∫_{*a}^b f(x) dx .

Therefore the upper and lower integrals are different and this function is not integrable in the sense of Riemann. On the other hand it seems quite reasonable to attribute the value zero to this integral because the function that we are integrating is practically always zero. In fact, if you choose a number randomly, then it will almost always be irrational and as you evaluate the function, you will find it to be zero. What you need to deal with this question is a generalization of Riemann’s approach that deals not only with a decomposition of the integration interval into simple intervals, but into more general sets. Then you have to develop a way to measure the size of the set. This is done in an approach that is called measure theory and is indeed the basic building block in the more general framework developed by Lebesgue. Indeed, as a Lebesgue integral the function considered above becomes integrable and the integral takes the value 0. Fortunately, for many applications in physics you do not need this more advanced theory of integration. However, if you intend to study quantum field theory, then it may be useful.
On the other hand there are large classes of functions that are integrable.
Theorem: Every continuous function on a closed interval [a, b] is integrable.

Proof: For a continuous function we have that lim_{x→x₀} f(x) = f(x₀). This has to be true for any point x₀ ∈ [a, b]. Now we realize that this means that the function cannot vary too wildly. Indeed, consider that we split the interval [a, b] into 2ⁿ equally sized intervals, via repeated halving of the interval. Then for given ε > 0 there is always an n such that in each of these intervals max f(x) − min f(x) ≤ ε. If this were not the case, then there would be a point x₀ which belongs to a sequence of ever smaller intervals, and in each of these intervals the function f(x) would vary by more than ε. That would imply that lim_{x→x₀} f(x) would not exist. This is in contradiction to the continuity of the function.

If we now compute the sequences of upper and lower bounds A_n^{up} and A_n^{low}, we observe that for every ε > 0 there is an n such that A_n^{up} − A_n^{low} ≤ ε(b − a). Therefore,

∫_a^{*b} f(x) dx − ∫_{*a}^b f(x) dx = lim_{n→∞} (A_n^{up} − A_n^{low}) = 0   (2.67)

which implies integrability and completes the proof.
There are also discontinuous functions that are integrable. For example any monotonically increasing bounded function is integrable. Proof = Exercise.

Now let us establish the connection between Riemann integrals and differentiability. To this end, we need to consider the integral of a continuous function f over an interval [a, x] where x is variable, ie we consider the function

F(x) = ∫_a^x f(t) dt .

For a continuous integrand f one finds that F is differentiable with (dF/dx)(x) = f(x). For a discontinuous integrand this can fail. Integrating, for example, the piecewise constant function that takes the value 0 on [0, 1[ and the value 1 on [1, 2] gives

F(x) = { 0 for 0 ≤ x < 1 ; x − 1 for 1 ≤ x ≤ 2 }   (2.72)

Can you differentiate this function? Certainly, you will have a problem at x = 1 as

lim_{∆x→0, ∆x>0} [F(1 + ∆x) − F(1)]/∆x = 1   (2.73)

lim_{∆x→0, ∆x<0} [F(1 + ∆x) − F(1)]/∆x = 0 .   (2.74)

So, obviously they are not the same.
So, obviously they are not the same.
Now let us consider two situations which have not been included
in the above considerations but which nevertheless are very important
in practice. In the discussions above I always considered the situation
in which we wish to integrate a function in a closed and ﬁnite interval
such as [a, b]. However, there may very well be situations in which we
wish to integrate over an inﬁnitely large interval, e.g. [a, ∞[ or even
] −∞, ∞[. How could we deﬁne such integrals? A quite natural way is
to consider these integrals as limits of deﬁnite integrals, ie

∞
a
f(x)dx = lim
R→∞

R
a
f(x)dx, (2.75)

∞
−∞
f(x)dx = lim
R,S→∞

S
−R
f(x)dx (2.76)
where in the second integral the limiting processes have to be carried
out independently. If the limit only exists under the assumption R = S,
then
P −

∞
−∞
f(x)dx = lim
R→∞

R
−R
f(x)dx (2.77)
where the P− in front of the integral indicates the restricted limiting
process and is called principal value.
Whether the left hand side of this equation makes any sense then
simply depends on the existence of the limit on the right hand side.
Quite clearly, for the existence of this limit it will not be suﬃcient
to have a continuous function as the simple example f(x) = c > 0
shows. For such a function the limit does not exist as it diverges to ∞.
Therefore, the function that you integrate must go to zero suﬃciently
rapidly when its argument grows to inﬁnity.
There is another situation in which we will have to define the value of our integrals as a limit. This situation can arise when we wish to integrate a function f(x) on an interval [a, b] on which the function is not bounded. An example would be f(x) = 1/√x on the interval ]0, 1]. This is not a closed interval because the function is not defined in the point x = 0. Again, if we wish to make sense of this integral, the simplest approach is by taking limits, namely by stating

∫_a^b f(x) dx = lim_{R→b, R<b} ∫_a^R f(x) dx ,   (2.78)

∫_a^b f(x) dx = lim_{R→a, R>a} ∫_R^b f(x) dx .   (2.79)
For the example f(x) = 1/√x we can then obtain

∫_0^1 (1/√x) dx = lim_{R→0, R>0} ∫_R^1 (1/√x) dx = lim_{R→0, R>0} (2√1 − 2√R) = 2 .   (2.80)
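The limiting process of eq. (2.80) can be watched numerically. A sketch in plain Python (composite midpoint rule; the helper and its parameters are my own): the integral over [R, 1] approaches 2 as R shrinks, matching 2 − 2√R.

```python
def midpoint_integral(f, a, b, m=20000):
    # Composite midpoint rule on [a, b]
    h = (b - a) / m
    return sum(f(a + (k + 0.5) * h) for k in range(m)) * h

# Shrink the lower limit R toward 0 as in eq. (2.80)
values = []
for R in (1e-1, 1e-2, 1e-3, 1e-4):
    numeric = midpoint_integral(lambda x: x ** -0.5, R, 1.0)
    exact = 2.0 - 2.0 * R ** 0.5
    values.append((R, numeric, exact))
```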
From now on, when we write an integral which is either over an infinite interval or which is over a function that is not defined at the end points, we actually mean to write limits as above. Indeed, there is one further possibility, namely a function that diverges inside the integration interval. An example is given by

f(x) = 1/√|x|   (2.81)

over the interval [−1, 1]. In that case we define

∫_{−1}^1 f(x) dx = lim_{R→0, R>0} [ ∫_{−1}^{−R} f(x) dx + ∫_R^1 f(x) dx ] .   (2.82)
2.6.2 The integral comparison criterion
Now let me use what we have learnt here to derive a very useful method to determine the convergence of some infinite series that are otherwise extremely difficult to decide. In an example I had shown you how to work out the integral of the function f(x) = x. The task was simplified considerably by the fact that f(x) = x is monotonically growing. As a consequence we could determine the largest and smallest value in each interval very easily. They were assumed by the function at the end points of the intervals. Let us see whether we can make use of the monotonicity for general functions. As we would like to learn something about infinite series, we will need to consider functions that are defined on the interval [1, ∞[ and which are monotonically decreasing on that interval, as we could otherwise not hope for convergence in the first place. We also need to assume that the function only takes positive values. Now we know that we can always find a lower and an upper bound on the integral of f on that interval. We simply divide the interval [1, ∞[ into intervals of unit length [n, n + 1] where n runs through the natural numbers. Then we find

Σ_{n=1}^{R−1} f(n) ≥ ∫_1^R f(x) dx ≥ Σ_{n=1}^{R−1} f(n + 1) .   (2.83)
If the limit of the integral for R → ∞ exists, then the series on the right hand side is bounded from above and, because it is growing monotonically, it converges as well. If on the other hand we know for some reason that the series converges in this limit, then we know that the integral is bounded by the left hand side of the inequality and therefore it exists. Therefore we have the following

Integral comparison criterion: Given a function f : [1, ∞[ → ℝ⁺ that is monotonically decreasing, then the series Σ_{n=1}^∞ f(n) converges exactly if ∫_1^∞ f(x) dx exists. If the integral exists then we have

Σ_{n=1}^∞ f(n) ≥ ∫_1^∞ f(x) dx ≥ Σ_{n=2}^∞ f(n) .   (2.84)
This is a very useful criterion indeed, as the following examples will
show you.
1. The series

Σ_{n=1}^∞ 1/n   (2.85)

diverges. Of course you know this already because I have shown you this earlier in the lecture. But let us see it by considering the corresponding integral

lim_{R→∞} ∫_1^R (1/x) dx = lim_{R→∞} ln R = ∞ .   (2.88)

From the fact that the integral does not exist we can, by the integral comparison theorem, conclude that the corresponding series does not converge.
2. Now consider the series

Σ_{n=2}^∞ 1/(n ln n)   (2.89)

to show that it also diverges. This can be seen by considering the corresponding integral

lim_{R→∞} ∫_2^R 1/(x ln x) dx = lim_{R→∞} [ln(ln R) − ln(ln 2)] = ∞ .   (2.92)

From the fact that the integral does not exist we can, by the integral comparison theorem, conclude that the corresponding series does not converge.
3. Now consider

Σ_{n=2}^∞ 1/(n (ln n)²)   (2.93)

to show it converges. This can be seen by considering the corresponding integral

lim_{R→∞} ∫_2^R 1/(x (ln x)²) dx = lim_{R→∞} (1/ln 2 − 1/ln R) = 1/ln 2 .   (2.96)

From the fact that the integral exists we can, by the integral comparison theorem, conclude that the corresponding series converges.
Imagine someone had asked you to prove convergence or divergence of these series without this criterion.
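Partial sums make the contrast tangible. A quick numerical sketch in plain Python (the cut-off N and the explicit bound below are my choices): after 200000 terms the first two series are still creeping upwards without bound, while the third stays below the bound that the criterion provides.

```python
import math

N = 200000

s_harmonic = sum(1.0 / n for n in range(1, N + 1))
s_nlogn = sum(1.0 / (n * math.log(n)) for n in range(2, N + 1))
s_nlogn2 = sum(1.0 / (n * math.log(n) ** 2) for n in range(2, N + 1))

# For the convergent series, the n = 2 term plus the comparison integral
# int_2^inf dx/(x (ln x)^2) = 1/ln 2 gives an explicit upper bound.
bound = 1.0 / (2.0 * math.log(2) ** 2) + 1.0 / math.log(2)
```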
2.6.3 Interchanging Limits

In the lectures so far you have learned a lot about various limits. They showed up in sequences, series, in the definitions of real numbers, continuity, differentiability and the integrability of functions. Sometimes you may encounter two limiting procedures at the same time. An example was the integral with infinite boundaries which we defined for example as

∫_0^∞ f(x) dx := lim_{R→∞} ∫_0^R f(x) dx .   (2.97)

Therefore we first evaluate an integral over a finite interval, which in itself involves a limiting process, and subsequently take the limit R → ∞. Here it is clearly stated in which order to take the limits, but could one perhaps interchange them? This is a question that shows up in many different contexts and often the order of the limits is important. In mathematics it will have to be defined, while in physics it may be imposed by the experimental situation. In the following I want to show you some examples in which it is dangerous to interchange the limits, and I also would like to give you conditions under which it is perfectly acceptable to interchange them, ie when the result does not depend on the order in which the limits are taken.
Let us consider a function of a single variable and see whether problems can arise. Take a really simple example and consider on the interval [0, 1] the functions

f_n(x) = { n x^{n−1} for 0 ≤ x < 1 ; 0 for x = 1 }   (2.98)

Now we find that

lim_{n→∞} f_n(x) = 0 .   (2.99)

So quite clearly the function converges for every value of the argument x to zero. Now let us evaluate the integral of the function f_n(x). We find

∫_0^1 f_n(x) dx = 1 .   (2.100)

Now taking the limit we find

lim_{n→∞} ∫_0^1 f_n(x) dx = 1 ≠ 0 = ∫_0^1 0 dx = ∫_0^1 lim_{n→∞} f_n(x) dx .   (2.101)
So, as a consequence we learn that even for such a harmless function as f_n(x) we cannot interchange the limit in n with the integration.
So what is going on here? When we look at the function f_n(x) as a whole for various values of n, then we see that very near x = 1 the function becomes larger and larger with increasing n, but that the region where the function is large becomes narrower and narrower. In the integration we therefore have an area that becomes narrower and higher in such a way that it always gives a finite contribution. Indeed, in the limit n → ∞ we have an area that corresponds to 0 · ∞ and as you learnt in the earlier lectures this is a tricky quantity.
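This "narrow spike" picture is easy to see numerically. A sketch in plain Python (helper name mine): the pointwise values die out for every fixed x, yet the maximum of f_n near x = 1 grows roughly like n/e.

```python
def f_n(n, x):
    # f_n(x) = n x^(n-1) on [0, 1[ and 0 at x = 1, eq. (2.98);
    # note that int_0^1 n x^(n-1) dx = 1 exactly for every n.
    return 0.0 if x == 1.0 else n * x ** (n - 1)

# Pointwise limit: for each fixed x < 1 the values go to zero ...
pointwise = [f_n(10 ** 6, x) for x in (0.0, 0.5, 0.9, 0.99)]

# ... but the spike near x = 1 keeps growing, so sup |f_n| does not go to 0
sup_values = [f_n(n, 1.0 - 1.0 / n) for n in (10, 100, 1000)]
```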
In summary you observe the interesting phenomenon that although for every fixed x the function f_n(x) converges, there is an increasingly narrow region in which the function becomes arbitrarily large, and it is exactly this behaviour that leads to problems in the interchanging of the limit with the integration. This is the motivation to introduce a stronger notion of convergence than the previously considered notion of pointwise convergence, ie convergence for each fixed x. Indeed, what we would like to have is that the function converges uniformly. Let me give a formal definition.
Definition 45 A sequence of functions f_n(x) defined on an interval I converges uniformly to a function f(x) if

lim_{n→∞} sup_{x∈I} { |f_n(x) − f(x)| } = 0   (2.102)

ie if for all ε > 0 there is an N such that for all n ≥ N and all x ∈ I we have |f_n(x) − f(x)| ≤ ε.

Here sup_{x∈I} f(x) denotes the least upper bound of f(x) on the interval I.
The great thing about uniform convergence is that once you have established it, then you can be relatively relaxed about interchanging the limits. The following statements hold true.

Theorem 46 Assume that in the interval [a, b] the sequence f_n(x) converges uniformly to f(x) and assume that the functions f_n(x) are integrable for any n; then f(x) is also integrable and we have

lim_{n→∞} ∫_a^b f_n(x) dx = ∫_a^b f(x) dx .

Proof: By uniform convergence, for every ε > 0 there is an N such that for all n ≥ N and all x ∈ [a, b] we have f_n(x) − ε ≤ f(x) ≤ f_n(x) + ε, and therefore

∫_a^b f_n(x) dx − ε(b − a) ≤ ∫_a^b f(x) dx ≤ ∫_a^b f_n(x) dx + ε(b − a) .

Now we know from uniform convergence again that in the limit n → ∞ the value of ε can be made arbitrarily small so that we find

lim_{n→∞} ∫_a^b f_n(x) dx ≤ ∫_a^b f(x) dx ≤ lim_{n→∞} ∫_a^b f_n(x) dx

which implies

lim_{n→∞} ∫_a^b f_n(x) dx = ∫_a^b f(x) dx

which was the statement that we wanted to prove.
Therefore we can interchange the limit with the integration in this case. Let us now consider when we can interchange the limits with respect to n and x. Consider the example

f_n(x) = x²/(x² + (1/n)²) .   (2.104)
We find

lim_{x→0} ( lim_{n→∞} x²/(x² + (1/n)²) ) = lim_{x→0} 1 = 1 .   (2.105)

So, here the limit exists. If we take the limits the other way around, then we find

lim_{n→∞} ( lim_{x→0} x²/(x² + (1/n)²) ) = lim_{n→∞} 0 = 0 .   (2.106)
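Numerically the two orders of limits in eqs. (2.105) and (2.106) show up as two different approach paths. A small sketch in plain Python:

```python
def f(n, x):
    # f_n(x) = x^2 / (x^2 + (1/n)^2), eq. (2.104)
    return x * x / (x * x + (1.0 / n) ** 2)

# Fix x != 0 and grow n: the values approach 1 (so lim_x lim_n = 1)
n_first = [f(n, 0.01) for n in (10, 10 ** 3, 10 ** 5)]

# Fix n and shrink x: the values approach 0 (so lim_n lim_x = 0)
x_first = [f(100, x) for x in (0.1, 0.001, 0.00001)]
```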
So, again, we make the very important observation that the limits cannot necessarily be interchanged. Of course there may be compelling physical reasons for choosing a particular ordering of the limits, but in the absence of such reasons one has to take great care with multiple limits. Again the problem lies in the lack of uniform convergence. Indeed, we have

Theorem 47 Given functions f_n(x) defined on the interval I, assume that f_n(x) converges uniformly to f(x) and assume that lim_{x→a} f_n(x) = c_n exists for all n; then also lim_{n→∞} c_n and lim_{x→a} f(x) exist and both limits are equal, or in other words

lim_{x→a} lim_{n→∞} f_n(x) = lim_{n→∞} lim_{x→a} f_n(x) .   (2.107)

Proof: The proof is an exercise.
There is another rather important combination of operations that involves limiting processes. These are differentiation and integration. Often it turns out to be useful to introduce an extra parameter in your integral and differentiate with respect to it. Or perhaps the physical situation that you have depends on various parameters. If for example your function is of the form f(x, t) = tx where t is for example time, then we have that

(d/dt) ∫_0^1 f(x, t) dx = (d/dt)(t/2) = 1/2 = ∫_0^1 x dx = ∫_0^1 (df(x, t)/dt) dx .   (2.109)

So, in this case you can interchange the order of differentiation and integration. But in general this is not the case. Let’s see whether we can guess what is needed. Let us assume a function of two variables f(x, t) and define F(t) = ∫_a^b f(x, t) dx. What we would hope for is

(dF/dt)(t) = lim_{h→0} (1/h) [ ∫_a^b f(x, t + h) dx − ∫_a^b f(x, t) dx ] ≟ ∫_a^b lim_{h→0} [f(x, t + h) − f(x, t)]/h dx ≟ ∫_a^b (df(x, t)/dt) dx .   (2.114)

To be able to say that we have equality all the way through, we need to have that the limit can be taken under the integral, that the function f(x, t) can actually be differentiated in the parameter t, and finally that the result is an integrable function in x (these hoped-for equalities are indicated by ≟). Indeed, one can show the following

Theorem 48 (Leibniz rule) If df(x, t)/dt exists and f(x, t) as well as df(x, t)/dt are continuous in both x and t, then we have

(d/dt) ∫_a^b f(x, t) dx = ∫_a^b (df(x, t)/dt) dx .   (2.115)
Example: It is well known that ∫_{−∞}^∞ e^{−x²} dx = √π. Consider the integral

F_n = ∫_{−∞}^∞ x^{2n} e^{−x²} dx .   (2.116)

One useful way of evaluating this integral is by introducing an extra parameter t such that

F_n(t) = ∫_{−∞}^∞ x^{2n} e^{−tx²} dx = (−1)ⁿ (dⁿ/dtⁿ) ∫_{−∞}^∞ e^{−tx²} dx = (−1)ⁿ (dⁿ/dtⁿ) √(π/t) .

Carrying out the differentiations and setting t = 1 we find

F_n = F_n(1) = √π Π_{k=1}^n (2k − 1)/2 .   (2.120)

This is certainly a lot easier than doing integration by parts n times and illustrates that it may be useful to insert some parameter into an integral, as long as one pays attention to the possibility that one may not be able to interchange limits in some cases.
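One can cross-check the closed form (2.120) by brute-force quadrature. A sketch in plain Python (midpoint rule over a truncated window; all names and parameters are mine): the Gaussian decays so fast that integrating over [−10, 10] captures the result to high accuracy.

```python
import math

def gaussian_moment(n, half_width=10.0, m=100000):
    # Midpoint-rule approximation of int x^(2n) exp(-x^2) dx over
    # [-half_width, half_width]; the tail beyond is negligible.
    h = 2.0 * half_width / m
    total = 0.0
    for k in range(m):
        x = -half_width + (k + 0.5) * h
        total += x ** (2 * n) * math.exp(-x * x)
    return total * h

def closed_form(n):
    # sqrt(pi) * prod_{k=1}^n (2k - 1)/2, eq. (2.120)
    p = 1.0
    for k in range(1, n + 1):
        p *= (2.0 * k - 1.0) / 2.0
    return math.sqrt(math.pi) * p

checks = [(gaussian_moment(n), closed_form(n)) for n in range(4)]
```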
Chapter 3
Vectors and Matrices
In the first part of this chapter I will briefly recap basic notions of vectors and matrices. I will then put them to work when we are talking about Markov processes, entropy and disorder.
3.1 Vectors
In the first term you got to know column vectors with two components such as

x = (x₁, x₂)ᵀ   (3.1)

with some numbers x₁ and x₂, where the superscript T indicates that the vector is to be read as a column. Of course this can also be generalized to any number of components, ie to column vectors with n components such as

x = (x₁, …, x_n)ᵀ .   (3.2)

The laws for the addition of two vectors and for the multiplication of a vector by a number are analogous to those for vectors with 2 components, ie
x + y = (x₁, …, x_n)ᵀ + (y₁, …, y_n)ᵀ = (x₁ + y₁, …, x_n + y_n)ᵀ   (3.3)

and

λx = λ(x₁, …, x_n)ᵀ = (λx₁, …, λx_n)ᵀ .   (3.4)
With these rules you have the usual properties for vectors.

1. We call V a set of vectors, and a is a vector if it is an element of V. Then we have for all a, b, c ∈ V and all numbers λ, µ:

(a) a + b = b + a,
(b) (a + b) + c = a + (b + c),
(c) there is a null vector 0 ∈ V with a + 0 = a,
(d) for every a ∈ V there is a vector −a ∈ V with a + (−a) = 0,
(e) λ(a + b) = λa + λb,
(f) (λ + µ)a = λa + µa,
(g) λ(µa) = (λµ)a and 1 · a = a.

A set V of vectors with these properties is called a vector space. Again it is easy to confirm that the set of real n × n matrices with the rules that we have defined here forms a vector space. Note that we are used to considering matrices as objects acting on vectors, but as we can see here we can also consider them as elements (vectors) of a vector space themselves.
3.2 Matrices
Vectors can be transformed. For example we can stretch them or rotate them. Many operations are possible, but of particular significance in physics are linear operations, which are described by matrices. An example is

\begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} \lambda a_1 \\ \lambda a_2 \end{pmatrix}   (3.6)

which stretches a vector by a factor λ, or

\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} a_2 \\ -a_1 \end{pmatrix}   (3.7)

which rotates a vector by 90 degrees. We call matrices linear maps because they have the property that
M(λx + µy) = λMx + µMy (3.8)
which means that one can ﬁrst add the vectors and then apply the
transformation or one can ﬁrst apply the transformation on the indi-
vidual vectors and then add the results. All this is again true for n
dimensions as well.
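As a minimal numerical illustration of the linearity property (3.8), using the rotation matrix of Eq. (3.7) — the small helper functions below are my own sketch, not notation from the notes:

```python
def apply(M, v):
    """Matrix-vector product, M acting on v."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def add(v, w):
    return [a + b for a, b in zip(v, w)]

def scale(c, v):
    return [c * a for a in v]

M = [[0, 1], [-1, 0]]                    # the rotation matrix from Eq. (3.7)
x, y, lam, mu = [1.0, 2.0], [3.0, -1.0], 2.5, -0.5

# M(lam*x + mu*y) equals lam*M*x + mu*M*y, Eq. (3.8)
lhs = apply(M, add(scale(lam, x), scale(mu, y)))
rhs = add(scale(lam, apply(M, x)), scale(mu, apply(M, y)))
assert lhs == rhs
```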
3.3 Eigenvalues, eigenvectors, singular values

When a matrix is applied to a vector the result is generally a different vector. However, for any matrix there will be some vectors that are special in the sense that upon multiplication with that matrix they will not change their orientation but only their length. Such vectors are called eigenvectors and the stretching factor is called the eigenvalue. Both are determined as solutions of the equation

M x = \lambda x   (3.9)

or equivalently

(M - \lambda \mathbf{1}) x = 0 .   (3.10)

Note that the null vector, ie the vector whose components are all zero, does not qualify as an eigenvector. The way to proceed to compute the eigenvalues is to first solve the characteristic polynomial det(M − λ1) = 0 and then, for every solution λ, compute the vector that solves Mx = λx. Note that usually one normalizes the eigenvectors such that the squares of their components add up to 1.
If for an n × n matrix we can find n linearly independent eigenvectors, then we can form the matrix B whose columns are the various eigenvectors. Let us assume that a matrix M has eigenvalues {λ_i} and corresponding eigenvectors {x_i}; then we have that

B = (x_1, \ldots, x_n)   (3.11)
and we find that

B^{-1} M B = D   (3.12)

where D is a diagonal matrix whose diagonal entries are the eigenvalues of M. This can be seen by applying the left hand side to the various basis vectors

e_i = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}   (3.13)
where only the i-th component of the vector is equal to 1 while all others vanish. Then we have that

B^{-1} M B e_i = B^{-1} M x_i = B^{-1} \lambda_i x_i = \lambda_i e_i .   (3.14)

This is exactly the same action as that of the diagonal matrix D, which together with linearity shows that the actions of the two matrices are the same for all vectors.
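The identity (3.12) can be checked numerically on a small example; the matrix M and its eigensystem below are an illustrative choice of mine, not from the notes:

```python
def matmul(A, B):
    """Product of two matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

M = [[2.0, 1.0], [1.0, 2.0]]            # eigenvalues 3 and 1
B = [[1.0, 1.0], [1.0, -1.0]]           # columns: eigenvectors (1,1), (1,-1)
B_inv = [[0.5, 0.5], [0.5, -0.5]]       # inverse of B

# B^{-1} M B is diagonal with the eigenvalues on the diagonal, Eq. (3.12)
D = matmul(B_inv, matmul(M, B))
assert D == [[3.0, 0.0], [0.0, 1.0]]
```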
3.4 Functions of matrices
Quite frequently it will be necessary to compute a function of a matrix. In some cases it is straightforward to define what is meant by the function of a matrix. For example, if we are given the function f(x) = x^2, then it is clear that for a matrix A we define f(A) = A^2.
More generally however, it may not be quite so clear how to compute
functions of a matrix and indeed, there are two ways which are useful
in diﬀerent regimes. Fortunately they do not contradict each other in
the situations where both can be applied.
Definition 49 Given an operator \hat{A} with eigenvalues a_i and eigenvectors x_i such that there is a matrix B that diagonalizes the matrix A, ie we have A = B D B^{-1} with a diagonal matrix D, and given further a function f : R → R that maps real numbers into real numbers, we define

f(\hat{A}) := B f(D) B^{-1}   (3.15)

where f(D) is the matrix whose diagonal elements are given by f(a_i) when the a_i are the diagonal elements of D.
Example: Given a diagonal matrix

D = \begin{pmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{pmatrix}   (3.16)

then

f(D) = \begin{pmatrix} f(d_1) & 0 & 0 \\ 0 & f(d_2) & 0 \\ 0 & 0 & f(d_3) \end{pmatrix} .   (3.17)
Obviously the above way of deﬁning the function of a matrix is only
good when you can diagonalize the matrix. When this is impossible,
then not all is lost. Indeed, if the function can be expanded into a
Taylor series then there is another way of deﬁning the function of a
matrix.
Definition 50 Given a function f : R → R that can be expanded into a power series

f(z) = \sum_{i=0}^{\infty} f_i z^i   (3.18)

then we define

f(\hat{A}) = \sum_{i=0}^{\infty} f_i A^i .   (3.19)
Note also the definition of the derivative of a function of a matrix.

Definition 51 The derivative of an operator function f(\hat{A}) is defined via g(z) = \frac{df}{dz}(z) as

\frac{d f(\hat{A})}{d \hat{A}} = g(\hat{A}) .   (3.20)
Let us see whether the two definitions Def. 49 and 50 coincide for operators with a complete set of eigenvectors and functions that can be expanded into the power series given in Eq. (3.18):

f(\hat{A}) = \sum_{k=0}^{\infty} f_k A^k = \sum_{k=0}^{\infty} f_k (B D B^{-1})^k = \sum_{k=0}^{\infty} f_k B D^k B^{-1} = B \left( \sum_{k=0}^{\infty} f_k D^k \right) B^{-1} = B f(D) B^{-1} .   (3.21)
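A quick numerical comparison of the two definitions for f = exp on a diagonalizable 2 × 2 matrix. The example matrix is mine, not from the notes, and the sketch assumes that 30 Taylor terms suffice for this matrix (they do, since its eigenvalues are small):

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

M = [[2.0, 1.0], [1.0, 2.0]]           # eigenvalues 3 and 1
B = [[1.0, 1.0], [1.0, -1.0]]
B_inv = [[0.5, 0.5], [0.5, -0.5]]

# Definition 49: f(M) = B f(D) B^{-1}, f applied to the eigenvalues
f_D = [[math.exp(3.0), 0.0], [0.0, math.exp(1.0)]]
via_diag = matmul(B, matmul(f_D, B_inv))

# Definition 50: f(M) = sum_k M^k / k!, the Taylor series of exp
via_series = [[0.0, 0.0], [0.0, 0.0]]
power = [[1.0, 0.0], [0.0, 1.0]]       # M^0 = identity
for k in range(30):
    via_series = [[via_series[i][j] + power[i][j] / math.factorial(k)
                   for j in range(2)] for i in range(2)]
    power = matmul(power, M)

# the two definitions agree, as shown in Eq. (3.21)
for i in range(2):
    for j in range(2):
        assert abs(via_diag[i][j] - via_series[i][j]) < 1e-9
```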
Exercise:
1) Show that for any orthogonal operator \hat{U}, ie an operator that has the property \hat{U}\hat{U}^T = 1 (for a real matrix \hat{U}^\dagger = \hat{U}^T), we have f(\hat{U}^\dagger \hat{A} \hat{U}) = \hat{U}^\dagger f(\hat{A}) \hat{U}.

Proof: We use the fact that \hat{U}\hat{U}^\dagger = 1 to find

f(\hat{U}^\dagger \hat{A} \hat{U}) = \sum_{k=0}^{\infty} f_k (\hat{U}^\dagger \hat{A} \hat{U})^k = \sum_{k=0}^{\infty} f_k \hat{U}^\dagger \hat{A}^k \hat{U} = \hat{U}^\dagger \left( \sum_{k=0}^{\infty} f_k \hat{A}^k \right) \hat{U} = \hat{U}^\dagger f(\hat{A}) \hat{U} .
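The exercise can be spot-checked numerically; the rotation matrix U and the polynomial f(x) = x² + 2x below are illustrative choices of mine:

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def f(A):
    """f(x) = x^2 + 2x applied to a matrix via Definition 50."""
    return add(matmul(A, A), [[2 * x for x in row] for row in A])

t = 0.3
U = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]  # orthogonal
A = [[1.0, 2.0], [0.5, -1.0]]

# f(U^T A U) equals U^T f(A) U since U U^T = 1
lhs = f(matmul(transpose(U), matmul(A, U)))
rhs = matmul(transpose(U), matmul(f(A), U))
for i in range(2):
    for j in range(2):
        assert abs(lhs[i][j] - rhs[i][j]) < 1e-9
```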
3.5 Markov processes
You may have heard a few times already statements like: "Disorder can only increase in time." In the following sections I would like to examine this statement critically. Indeed, I would like to show you when this statement is correct. To this end I will need to define what is meant by disorder and how one can quantify it.
Given are chairs that are numbered from 1 to N and a probability distribution for finding a person on one of those chairs, ie p(1) is the probability that the person is sitting on chair 1. Of course, as for any decent probability distribution, we have to have that \sum_{i=1}^{N} p(i) = 1 and all the p(i) ≥ 0. Now consider the following game. Assume that after a fixed amount of time the person throws a coin; with probability 1/2 he remains in his chair and with probability 1/2 he moves from chair k to the next chair k + 1. If he is in chair N then he moves to chair 1. Given the probability distribution after step n, what is the probability distribution after step n + 1? The answer can be captured in a matrix via

p_{n+1} = \frac{1}{2} \begin{pmatrix} 1 & 0 & \cdots & 0 & 1 \\ 1 & 1 & \ddots & & 0 \\ 0 & 1 & \ddots & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & 1 & 1 \end{pmatrix} p_n   (3.22)
where the vector p_n stands for the column vector containing the probability distribution at step n. In general, for whatever set of rules that I invent, I will find a matrix M such that

p_{n+1} = M p_n .   (3.23)

Of course, not any matrix will do. Indeed, we have to make sure that a probability distribution is mapped again into a probability distribution. In other words, if \sum_{i=1}^{N} p_n(i) = 1 and all the p_n(i) ≥ 0, then also \sum_{i=1}^{N} p_{n+1}(i) = 1 and all the p_{n+1}(i) ≥ 0. As this has to be true for any valid probability distribution p_n, we find the condition that \sum_{i=1}^{N} M_{ij} = 1 for any j because this ensures that
\sum_{i=1}^{N} p_{n+1}(i) = \sum_{i=1}^{N} \sum_{j=1}^{N} M_{ij} p_n(j) = \sum_{j=1}^{N} \sum_{i=1}^{N} M_{ij} p_n(j) = \sum_{j=1}^{N} p_n(j) = 1 .   (3.24)
Definition 52 An N × N matrix M is called a stochastic matrix if all its entries satisfy M_{ij} ≥ 0 and for every j we have that \sum_{i=1}^{N} M_{ij} = 1, ie all the columns of the matrix sum up to one.
Using the uniform probability distribution for N events defined by

e = \frac{1}{N} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix}   (3.25)

we can write this as

M^T e = e .   (3.26)
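A small simulation of the chair-hopping process described above (the helper code is mine, not from the notes): build the matrix, confirm that its columns and rows each sum to one, and iterate Eq. (3.23) to watch p_n approach the uniform distribution e:

```python
N = 5
M = [[0.0] * N for _ in range(N)]
for k in range(N):
    M[k][k] = 0.5                  # stay put with probability 1/2
    M[(k + 1) % N][k] = 0.5        # move to the next chair, cyclically

# columns and rows each sum to one: M is doubly stochastic
assert all(abs(sum(M[i][j] for i in range(N)) - 1.0) < 1e-12 for j in range(N))
assert all(abs(sum(M[i][j] for j in range(N)) - 1.0) < 1e-12 for i in range(N))

p = [1.0] + [0.0] * (N - 1)        # start: person certainly on chair 1
for _ in range(200):
    p = [sum(M[i][j] * p[j] for j in range(N)) for i in range(N)]

# the limiting distribution is uniform
assert all(abs(pi - 1.0 / N) < 1e-9 for pi in p)
```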
Now the big question is, what happens when we consider the limit

\lim_{n \to \infty} p_n ?   (3.27)

Before we look at this question in general, let us consider two special cases that illuminate quite well what can happen.

Example I: To reduce the mathematical complications as much as possible, let us consider probability distributions with just two possible events and consider a doubly stochastic matrix such as

M = \frac{1}{2} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} ,

for which M p_0 = e for every probability distribution p_0, so the limiting distribution is uniform, ie both possibilities 1 and 2 are equally likely.

Example II: Consider instead the stochastic matrix

M = \begin{pmatrix} 1 & 1/2 \\ 0 & 1/2 \end{pmatrix} ,

for which one finds

\lim_{n \to \infty} p_n = \begin{pmatrix} 1 \\ 0 \end{pmatrix}   (3.41)

for any initial distribution. Therefore the limiting distribution is not uniform, ie the two possibilities 1 and 2 are not equally likely.
It is clear that there are matrices M such that the limiting distribution is uniform and there are others where it is not uniform. In particular, in example II we observe that, even if we start with the uniform distribution, we obtain in the limit a non-uniform distribution. In following lectures we will see that the uniform distribution is the most disordered distribution of all, and as a consequence we observe that there are processes in which disorder decreases with increasing n.
The question is now which structure of M ensures that the limiting
distribution is actually uniform. The answer to that question requires
the concept of doubly stochastic matrices.
Definition 53 An N × N matrix M is called a doubly stochastic matrix if for every j we have that \sum_{i=1}^{N} M_{ij} = 1 = \sum_{i=1}^{N} M_{ji}, ie all the columns as well as all of the rows of the matrix sum up to one. In other words, Me = e and M^T e = e.
The fact that the matrix M is doubly stochastic is not sufficient to ensure that the limiting distribution is uniform irrespective of the initial distribution. This becomes obvious from the trivial example

M = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}   (3.42)

for which

\lim_{n \to \infty} p_n = p_0   (3.43)

for any choice of p_0. The problem however is quite clear, namely the matrix M has more than one eigenvector corresponding to the eigenvalue 1. Indeed, we find
Theorem 54 For a doubly stochastic matrix M which has exactly one eigenvector corresponding to the eigenvalue 1, we find that

\lim_{n \to \infty} p_n = e   (3.44)

ie the limiting distribution is uniform.
It is under these conditions that the disorder in the distribution is
increasing. But how do we actually quantify disorder?
Chapter 4
Entropy, disorder and
information
Disorder may be understood as the lack of information, eg quantiﬁed
by the average number of questions that we may have to ask to identify
an object or to ﬁnd it. Therefore let me start by trying to build an
intuitive understanding of the concept of classical information. A more
quantitative approach will be taken in section 4.1, but for the full blown
mathematical apparatus I have to refer you to textbooks, e.g. Cover
and Thomas’ book ’Elements of information theory’.
Imagine that you are holding an object, be it an array of cards,
geometric shapes or a complex molecule and we ask the following ques-
tion: what is the information content of this object? To answer this
question, we introduce another party, say a friend, who shares some
background knowledge with us (e.g. the same language or other sets
of prior agreements that make communication possible at all), but who
does not know the state of the object. We deﬁne the information con-
tent of the object as the size of the set of instructions that our friend
requires to be able to identify the object, or better the state of the
object. For example, assume that the object is a spin-up particle and
that we share with the friend the background knowledge that the spin is
oriented either upwards or downwards along the z direction with equal
probability (see ﬁg. 4.1 for a slightly more involved example). In this
case, the only instruction we need to transmit to another party to let
him recreate the state is whether the state is spin-up ↑ or spin-down ↓.

Figure 4.1: An example of a decision tree. Two binary choices have to be made to identify the shape (triangle or square) and the orientation (horizontal or rotated). In sending with equal probability one of the four objects, one therefore transmits 2 bits of information.

This example shows that in some cases the instruction transmitted
to our friend is just a choice between two alternatives. More generally,
we can reduce a complicated set of instructions to n binary choices. If
that is done we readily get a measure of the information content of the
object by simply counting the number of binary choices. In classical
information theory, a variable which can assume only the values 0 or
1 is called a bit. Instructions to make a binary choice can be given by transmitting 1 to suggest one of the alternatives (say arrow up ↑) and 0 for the other (arrow down ↓).
To sum up, we say that n bits of information can be encoded in
a system when instructions in the form of n binary choices need to
be transmitted to identify or recreate the state of the system. In the
following we will turn this idea into a more precise form.
4.1 Quantifying classical information
In 1948 Shannon developed a rigorous framework for the description of
information and derived an expression for the information content of the
message which depends on the probability of each letter occurring and
results in the Shannon entropy. We will illustrate Shannon’s reasoning
in the context of the example above. Shannon invoked the law of large numbers and stated that, if the message is composed of N letters where N is very large, then the typical messages will be composed of Np_1 1's and Np_0 0's. For simplicity, we assume that N is 8 and that p_1 and p_0 are 1/8 and 7/8 respectively. In this case the typical messages are the 8 possible sequences composed of 8 binary digits of which only one is equal to 1 (see left side of figure 4.2).

Figure 4.2: The idea behind classical data compression. The most likely sequences are relabeled using fewer bits while rare sequences are discarded. The smaller number of bits still allows the reconstruction of the original sequences with very high probability.

As the length of the message increases (i.e. N gets large) the probability of getting a message which
is all 1’s or any other message that diﬀers signiﬁcantly from a typical
sequence is negligible so that we can safely ignore them. But how many
distinct typical messages are there? In the previous example the answer was clear: just 8. In the general case one has to find in how many ways the Np_1 1's can be arranged in a sequence of N letters. Simple combinatorics tells us that the number of distinct typical messages is
\binom{N}{N p_1} = \frac{N!}{(N p_1)! \, (N p_0)!}   (4.1)

and they are all equally likely to occur. Therefore, we can label each of these possible messages by a binary number. If that is done, the number of binary digits I we need to label each typical message is equal to \log_2 \frac{N!}{(N p_1)! \, (N p_0)!}. In the example above each of the 8 typical messages can be labeled by a binary number composed of I = \log_2 8 = 3 digits (see figure 4.2).
(see ﬁgure 4.2). It therefore makes sense that the number I is also
the number of bits encoded in the message, because Alice can unam-
biguously identify the content of each typical message if Bob sends her
the corresponding binary number, provided they share the background
knowledge on the labeling of the typical messages. All other letters
in the original message are really redundant and do not add any in-
formation! When the message is very long almost any message is a
typical one. Therefore, Alice can reconstruct with arbitrary precision
the original N bits message Bob wanted to send her just by receiving I
bits. In the example above, Alice can compress an 8 bits message down
to 3 bits. Though, the eﬃciency of this procedure is limited when the
message is only 8 letters long, because the approximation of considering
only typical sequences is not that good. We leave to the reader to show
that the number of bits I contained in a large N-letter message can in
general be written, after using Stirling’s formula, as
I = -N \left( p_1 \log_2 p_1 + p_0 \log_2 p_0 \right) .   (4.2)

If we plug the numbers 1/8 and 7/8 for p_1 and p_0 respectively into equation 4.2, we find that the information content per symbol I/N when N is very large is approximately 0.5436 bits. On the other hand, when the binary letters 1 and 0 appear with equal probabilities, then compression is not possible, i.e. the message has no redundancy and each letter of the message contains one full bit of information per symbol. These results match nicely the intuitive arguments given above.
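The numbers quoted above can be reproduced directly from Eq. (4.2); the helper function name below is mine:

```python
import math

def binary_entropy(p):
    """Information per symbol I/N from Eq. (4.2) for a binary alphabet."""
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))

# p1 = 1/8, p0 = 7/8 gives about 0.5436 bits per symbol
assert abs(binary_entropy(1.0 / 8.0) - 0.5436) < 1e-4

# equally likely letters carry one full bit per symbol: no compression
assert abs(binary_entropy(0.5) - 1.0) < 1e-12
```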
Equation 4.2 can easily be generalized to an alphabet of n letters ρ_i each occurring with probability p_i. In this case, the average information in bits transmitted per symbol in a message composed of a large number N of letters is given by the Shannon entropy:

\frac{I}{N} = H\{p_i\} = - \sum_{i=1}^{n} p_i \log_2 p_i .   (4.3)

We remark that the information content of a complicated classical system composed of a large number N of subsystems, each of which can be in any of n states occurring with probabilities p_i, is given by N H\{p_i\}.
4.2 Elements of the theory of majorization
In the previous section we have derived the entropy as a sensible mea-
sure for quantifying disorder via the idea of disorder being characterized
by a lack of information. In this section I am going to reﬁne these ideas
somewhat. Instead of comparing two probability distributions only via
their entropy, we are now looking at a whole range of functions of these
probability distributions and comparing those numbers. This approach
runs under the title of the theory of majorization. This approach will
give us a reﬁned picture of disorder of probability distributions. Fur-
thermore, in recent years it has emerged that the theory of majoriza-
tion is of central importance in quantum mechanics and in particular
in quantum information theory.
Let us consider probability distributions that are ordered in descending order, ie p(1) ≥ p(2) ≥ . . . ≥ p(n) and q(1) ≥ q(2) ≥ . . . ≥ q(n). Then we say that p is majorized by q, or in formulas p ≺ q, if for any value of k that satisfies 1 ≤ k ≤ n − 1 we have

\sum_{i=1}^{k} p(i) \le \sum_{i=1}^{k} q(i)   (4.4)

and

\sum_{i=1}^{n} p(i) = \sum_{i=1}^{n} q(i) .   (4.5)

(For probability distributions the last condition is automatic, as both sides equal 1.)
More generally, for two vectors p and q that are not yet in non-increasing
order, one ﬁrst orders them and then checks the conditions above.
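The check just described — sort both distributions in non-increasing order and compare partial sums — can be sketched as a small helper (the function name and tolerance handling are my own choices):

```python
def majorizes(q, p, tol=1e-12):
    """Return True if p is majorized by q, ie p ≺ q."""
    ps = sorted(p, reverse=True)
    qs = sorted(q, reverse=True)
    cp = cq = 0.0
    for a, b in zip(ps, qs):
        cp += a
        cq += b
        if cp > cq + tol:           # a partial sum of p exceeds that of q
            return False
    return abs(cp - cq) < tol       # total sums must agree

# the examples of Eqs. (4.6) and (4.11):
assert not majorizes([0.5, 0.25, 0.25, 0.0], [0.4, 0.4, 0.1, 0.1])
assert majorizes([0.5, 0.5, 0.0], [1/3, 1/3, 1/3])
```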
The relation to disorder is that it will make sense to say that if
x ≺ y then x is more disordered than y.
Examples: Consider the distributions

x = \begin{pmatrix} 0.4 \\ 0.4 \\ 0.1 \\ 0.1 \end{pmatrix} \qquad y = \begin{pmatrix} 0.5 \\ 0.25 \\ 0.25 \\ 0 \end{pmatrix} .   (4.6)
Then it is straightforward to check that we do not have that x ≺ y
because
0.4 ≤ 0.5 (4.7)
0.4 + 0.4 ≥ 0.5 + 0.25 (4.8)
0.4 + 0.4 + 0.1 ≤ 0.5 + 0.25 + 0.25 (4.9)
0.4 + 0.4 + 0.1 + 0.1 = 0.5 + 0.25 + 0.25 (4.10)
The entropies corresponding to the two distributions are different. Indeed, one finds that H(x) = 1.7219 ≥ H(y) = 1.5. As a consequence we observe that the relation ≺ is not a total order, in the sense that sometimes we have neither x ≺ y nor y ≺ x, while we can always say that either H(x) ≥ H(y) or H(x) ≤ H(y).
Consider the distributions

x = \begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \end{pmatrix} \qquad y = \begin{pmatrix} 1/2 \\ 1/2 \\ 0 \end{pmatrix} .   (4.11)

Then it is straightforward to check that x ≺ y because

1/3 \le 1/2   (4.12)
1/3 + 1/3 \le 1/2 + 1/2   (4.13)
1/3 + 1/3 + 1/3 = 1/2 + 1/2 .   (4.14)
Quite clearly, the distribution x would be considered as being more disordered than the distribution y, a viewpoint that is confirmed by computing the entropies H(x) = 1.5850 ≥ H(y) = 1. Indeed, we will soon learn that x ≺ y implies H(x) ≥ H(y).
The fact that the idea of majorization is related to that of disorder is made clearer by the following. Take a probability vector x, interchange two of its elements to obtain x′, and choose any λ ∈ [0, 1]; then

\lambda x + (1 - \lambda) x' \prec x .   (4.15)

Quite clearly, the probability distribution on the left hand side has been obtained from x by jumbling things a bit more, and it can therefore rightfully be said that the left hand side represents a more disordered situation than the right hand side. Instead of proving the above relation directly, I will prove a somewhat more general statement in terms of doubly stochastic matrices which includes, as a special case, the above relation.
Indeed, we have

Theorem 55 Given a doubly stochastic matrix M and the Markov process

p_{k+1} = M p_k   (4.16)

then we have for all values of k that

p_{k+1} \prec p_k   (4.17)
ie in the sense of majorization, the probability distribution becomes
more and more disordered.
Proof: To simplify notation, we will assume in the following that both vectors p_{k+1} and p_k are in non-increasing order, ie p_{k+1}(1) ≥ p_{k+1}(2) ≥ . . . ≥ p_{k+1}(n) as well as p_k(1) ≥ p_k(2) ≥ . . . ≥ p_k(n). If this is not the case, then one can always achieve this by reordering the vectors. This does also lead to a rearrangement of the matrix M, but it does not affect the property that M is doubly stochastic.

In the following we will use the facts that

0 \le \sum_{i=1}^{r} M_{ij} \le 1 \qquad \text{and} \qquad \sum_{j=1}^{n} \sum_{i=1}^{r} M_{ij} = r   (4.18)
which both follow from the double stochasticity of the matrix M. Now consider, writing c_j = \sum_{i=1}^{r} M_{ij},

\sum_{i=1}^{r} p_{k+1}(i) - \sum_{i=1}^{r} p_k(i) = \sum_{i=1}^{r} \sum_{j=1}^{n} M_{ij} p_k(j) - \sum_{i=1}^{r} p_k(i)
= \sum_{j=1}^{n} c_j \left( p_k(j) - p_k(r) \right) - \sum_{j=1}^{r} \left( p_k(j) - p_k(r) \right)
= \sum_{j=1}^{r} (c_j - 1) \left( p_k(j) - p_k(r) \right) + \sum_{j=r+1}^{n} c_j \left( p_k(j) - p_k(r) \right)
\le 0 ,

where in the second line we have used \sum_{j} c_j = r to subtract r p_k(r) from both terms, and in the last step that 0 ≤ c_j ≤ 1 and that p_k(1) ≥ p_k(2) ≥ . . . ≥ p_k(n), so that p_k(j) − p_k(r) ≥ 0 for j ≤ r and p_k(j) − p_k(r) ≤ 0 for j > r.
Therefore we have shown that for any r

\sum_{i=1}^{r} p_{k+1}(i) \le \sum_{i=1}^{r} p_k(i)

and therefore p_{k+1} ≺ p_k. This finishes the proof.
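Theorem 55 can be illustrated numerically: iterate a doubly stochastic matrix and verify p_{k+1} ≺ p_k at every step. The matrix below is an arbitrary doubly stochastic example of mine, not from the notes:

```python
def majorized_by(p, q, tol=1e-12):
    """True if p ≺ q: every partial sum of sorted p is at most that of q."""
    ps, qs = sorted(p, reverse=True), sorted(q, reverse=True)
    cp = cq = 0.0
    for a, b in zip(ps, qs):
        cp, cq = cp + a, cq + b
        if cp > cq + tol:
            return False
    return True

M = [[0.6, 0.3, 0.1],
     [0.3, 0.4, 0.3],
     [0.1, 0.3, 0.6]]              # rows and columns each sum to 1

p = [0.7, 0.2, 0.1]
for _ in range(20):
    p_next = [sum(M[i][j] * p[j] for j in range(3)) for i in range(3)]
    assert majorized_by(p_next, p)   # p_{k+1} ≺ p_k, Eq. (4.17)
    p = p_next
```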
Now we realize that the example I gave above is just a special case of this theorem. If we are given the probability vector x and we interchange the elements j and k to obtain x′, and then form

\lambda x + (1 - \lambda) x'   (4.19)

we can write this as well as

\lambda x + (1 - \lambda) x' = M x   (4.20)

where the matrix M has M_{ii} = 1 for all i from 1 to n except i = j and i = k, together with M_{jj} = M_{kk} = \lambda and M_{jk} = M_{kj} = 1 - \lambda, while all other entries vanish. This matrix is easily verified to be doubly stochastic, so that the above theorem applies and we find

\lambda x + (1 - \lambda) x' = M x \prec x .   (4.21)
Note that indeed, there is an even closer relation between majoriza-
tion and doubly stochastic matrices. In fact, we have x ≺ y exactly if
there is a doubly stochastic matrix such that x = My. The proof of
this statement is a little tedious, but can be found for example in the
books of Horn and Johnson.
So, we have seen that in the sense of majorization a Markov process
governed by a doubly stochastic matrix leads to an increase in disorder
in every step. But of course we would also like to see whether the
concept of majorization and of entropy are actually compatible with
each other, ie if a probability distribution p is more disordered than q
according to majorization we hope that the same is true for the entropy.
The aim of the next two small theorems is exactly the veriﬁcation of
this statement. To this end we will employ some ideas from the theory
of convex functions.
Lemma 56 Given a differentiable concave function f(x), then for x < y we have

f'(y) \le \frac{f(y) - f(x)}{y - x} \le f'(x) .   (4.26)

Proof: By the mean value theorem there is a ξ ∈ (x, y) such that f(y) − f(x) = f'(ξ)(y − x). Since f is concave, its derivative f' is non-increasing, so f'(y) ≤ f'(ξ) ≤ f'(x), which in turn finishes the proof.
Example: This is a very useful property which allows us to find all sorts of inequalities. Indeed, if we consider f(x) = −x \log_2 x then we find f'(x) = −\log_2 x − 1/\ln 2. Then we use the lemma, in the form f'(x) ≥ (f(y) − f(x))/(y − x), to find

-\log_2 x - \frac{1}{\ln 2} \ge \frac{-y \log_2 y + x \log_2 x}{y - x}   (4.27)

which leads to

y \left( \log_2 y - \log_2 x \right) \ge \frac{y - x}{\ln 2}   (4.28)

(which, being equivalent to \ln(y/x) \ge 1 - x/y, in fact holds for all positive x and y). Summing this over a pair of probability distributions p and q, with y = p(i) and x = q(i), the right hand side sums to \sum_i (p(i) - q(i))/\ln 2 = 0, and we therefore find

\sum_i p(i) \left( \log_2 p(i) - \log_2 q(i) \right) \ge 0 .   (4.29)
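Inequality (4.29), the non-negativity of the relative entropy, can be spot-checked on random distribution pairs; this is a sketch of mine using a fixed random seed:

```python
import math
import random

random.seed(0)

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

for _ in range(100):
    # random strictly positive distributions on 5 outcomes
    p = normalize([random.random() + 1e-9 for _ in range(5)])
    q = normalize([random.random() + 1e-9 for _ in range(5)])
    rel_ent = sum(pi * (math.log2(pi) - math.log2(qi))
                  for pi, qi in zip(p, q))
    # Eq. (4.29): always non-negative (up to rounding)
    assert rel_ent >= -1e-12
```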
Now we wish to prove
Lemma 57 Given two probability distributions p and q such that p ≺ q, then we have that H(p) ≥ H(q).

Proof: Consider
\sum_{i=1}^{n} p(i) \log_2 p(i) = p(1) \left( \log_2 p(1) - \log_2 p(2) \right)
+ (p(1) + p(2)) \left( \log_2 p(2) - \log_2 p(3) \right)
+ (p(1) + p(2) + p(3)) \left( \log_2 p(3) - \log_2 p(4) \right)
\vdots
+ (p(1) + \ldots + p(n-1)) \left( \log_2 p(n-1) - \log_2 p(n) \right)
+ (p(1) + \ldots + p(n)) \log_2 p(n)
\le q(1) \left( \log_2 p(1) - \log_2 p(2) \right)
+ (q(1) + q(2)) \left( \log_2 p(2) - \log_2 p(3) \right)
+ (q(1) + q(2) + q(3)) \left( \log_2 p(3) - \log_2 p(4) \right)
\vdots
+ (q(1) + \ldots + q(n-1)) \left( \log_2 p(n-1) - \log_2 p(n) \right)
+ (q(1) + \ldots + q(n)) \log_2 p(n)
= \sum_{i=1}^{n} q(i) \log_2 p(i)
\le \sum_{i=1}^{n} q(i) \log_2 q(i)

where the first inequality holds because the partial sums of p are bounded by those of q (the definition of p ≺ q) while the factors \log_2 p(i) − \log_2 p(i+1) are non-negative for a non-increasing p, and the coefficients multiplying \log_2 p(n) are both equal to 1. The last step used the inequality proven in the example, with the roles of p and q interchanged. Multiplying the resulting inequality \sum_i p(i) \log_2 p(i) \le \sum_i q(i) \log_2 q(i) by −1 gives H(p) ≥ H(q). This finishes the proof.
Now the proof of the final theorem is rather simple.

Theorem 58 Given a doubly stochastic matrix M and the Markov process

p_{k+1} = M p_k   (4.30)

then we have for all values of k that

H(p_{k+1}) \ge H(p_k)   (4.31)
ie in the sense of entropy, the probability distribution becomes more and
more disordered.
Proof: From theorem 55 we find that

p_{k+1} \prec p_k   (4.32)

and employing the above lemma we find

H(p_{k+1}) \ge H(p_k) .   (4.33)

This finishes the proof.
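Theorem 58 in action, as a sketch (the matrix and initial distribution are my own choices): along a doubly stochastic Markov chain the Shannon entropy never decreases, and with a unique eigenvector to the eigenvalue 1 it approaches the maximum log₂ n:

```python
import math

def entropy(p):
    """Shannon entropy in bits, skipping zero entries."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

M = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]            # doubly stochastic

p = [0.9, 0.05, 0.05]
h_prev = entropy(p)
for _ in range(25):
    p = [sum(M[i][j] * p[j] for j in range(3)) for i in range(3)]
    h = entropy(p)
    assert h >= h_prev - 1e-12     # H(p_{k+1}) >= H(p_k), Eq. (4.31)
    h_prev = h

# the limit is uniform, so the entropy tends to its maximum log2(3)
assert abs(h_prev - math.log2(3)) < 1e-6
```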
With the above I have shown you under what circumstances we can expect to obtain a behaviour that corresponds to the second law of thermodynamics, which, very loosely speaking, states that the entropy of an isolated system can never decrease. We have achieved this connection by making heavy use of the ideas of doubly stochastic maps, majorization and convex functions, all of which are very important tools in thermodynamics and, as has been realized very recently, also in quantum mechanics.

If you wish to learn more about majorization, which is a beautiful theory in itself, have a look at the books by Horn and Johnson as well as the one by Marshall and Olkin, 'Inequalities: Theory of Majorization and Its Applications', and perhaps the book by Bhatia on 'Matrix Analysis'.