Looking at various recent examination papers, it has become clear to
me that there is significant confusion between these two questions.
This post is intended to bring some clarity to the situation.

At the start of this post, I will give an example of the confusion as
it appears in exam questions (and probably elsewhere), and clarify
what the two different phrases mean using the above example. I will
then delve more deeply into the mathematics of these two things, going
beyond A-level content, and use some undergraduate analysis to find
equivalent conditions for them in terms of the derivatives of the
functions. It is fine to skip over the technical stuff and just look
at the results (theorems)!

(Exactly the same applies to the use of the term “decreasing”, but for
simplicity we will focus on increasing functions in this post.)

Here is an example of a question (based on a real exam question) which
typifies the confusion.

The equation of a curve is $y=x^3+4x^2-5x$.

Find the set of values of $x$ for which $y$ is an increasing
function of $x$.

If we replace “increasing function” by another familiar A-level term
describing functions, “one-to-one function”, the question becomes:

A function is given by $f(x)=x^3+4x^2-5x$.

Find the set of values of $x$ for which $f(x)$ is a one-to-one
function of $x$.

This is clearly nonsensical, because whether a function is one-to-one
or not is a property of the function as a whole, not a property of
the function values at any particular input value.

Likewise, a function either is or is not an increasing function; it
is a property of the function as a whole.

Informally (and not quite correctly), we can describe the difference
as follows:

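A function is an increasing function if, whenever the input
increases, the function value also increases.

A function is increasing at a point if at that point, the function
has a positive gradient.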

An example which shows that these are not the same is the function
$f(x)=-\dfrac{1}{x}$ for $x\ne0$ shown above. This function is
increasing at every value of $x\ne0$, as the gradient is always
positive. However, it is not an increasing function, because
$f(1)<f(-1)$. If, though, we restricted the domain of the function to
$x>0$, then it would be an increasing function.
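The contrast in this example can be checked numerically. The following is a minimal sketch, not a proof: the helper `gradient` and the sample points are chosen here purely for illustration, and the gradient is only estimated by a central difference.

```python
def f(x):
    # The example function f(x) = -1/x, defined for x != 0.
    return -1 / x

def gradient(x, h=1e-6):
    # Central-difference estimate of f'(x); an approximation only.
    return (f(x + h) - f(x - h)) / (2 * h)

# Sample points on both sides of the "hole" at x = 0 (illustrative choice).
samples = [-3.0, -1.0, -0.5, 0.5, 1.0, 3.0]

# The estimated gradient is positive at every sample point...
gradients_positive = all(gradient(x) > 0 for x in samples)
print(gradients_positive)

# ...yet the function values do not increase from x = -1 to x = 1.
print(f(-1), f(1))
```

So the function passes the "increasing at a point" test at each sampled point, while failing the "increasing function" test on the pair $-1$ and $1$.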

So the above-quoted exam question does not make any sense, just as the
modified version did not: either $y$ is an increasing function of $x$
or it is not. If the question had instead asked “Find the set of
values of $x$ at which $y$ is increasing,” it would have been fine.
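For the reworded question, the usual A-level approach is to find where the gradient $\dfrac{dy}{dx}=3x^2+8x-5$ is positive. Here is a short sketch of that calculation; the names `dy_dx`, `lower` and `upper` are chosen for this illustration only.

```python
import math

def dy_dx(x):
    # Derivative of y = x^3 + 4x^2 - 5x.
    return 3 * x**2 + 8 * x - 5

# Solve 3x^2 + 8x - 5 = 0 with the quadratic formula.
discriminant = 8**2 - 4 * 3 * (-5)          # 64 + 60 = 124
lower = (-8 - math.sqrt(discriminant)) / 6  # (-4 - sqrt(31)) / 3
upper = (-8 + math.sqrt(discriminant)) / 6  # (-4 + sqrt(31)) / 3

print(lower, upper)

# The quadratic has a positive leading coefficient, so the gradient is
# positive outside the two roots and negative between them.
print(dy_dx(-4), dy_dx(0), dy_dx(1))
```

So the gradient is positive for $x<\frac{-4-\sqrt{31}}{3}$ and for $x>\frac{-4+\sqrt{31}}{3}$.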

Incidentally, the idea of increasing and decreasing functions connects
very well with the issue of rearranging inequalities (increasing the
depth of connections within the subject): a function can be applied to
both sides of an inequality without changing the direction of the
inequality if the function is (strictly) increasing; it can be applied
but with a change in the direction of the inequality if the function
is (strictly) decreasing, and if the function is neither, then the
function cannot be applied to the inequality. So we cannot square
both sides of an inequality unless we are restricted to non-negative
values, and we cannot take the reciprocal of an inequality unless we
have the same restriction (and in that case, we must also reverse the
direction of the inequality).
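A few concrete numbers illustrate these rules. This is only a sketch with values picked for illustration, using cubing as an example of a strictly increasing function on the whole real line.

```python
# A pair with a < b where a is negative.
a, b = -3, 2
cube_preserved = a**3 < b**3      # cubing is strictly increasing on the reals
square_preserved = a**2 < b**2    # squaring is not: this compares 9 < 4

# A pair of positive values with c < d.
c, d = 2, 5
square_preserved_positive = c**2 < d**2   # squaring is increasing for non-negative values
reciprocal_reversed = 1 / c > 1 / d       # reciprocal is decreasing for positive values

print(cube_preserved, square_preserved)
print(square_preserved_positive, reciprocal_reversed)
```

The only failure is squaring the pair that includes a negative value, which is exactly the case the restriction to non-negative values rules out.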

It seems reasonable to assert that if a function is an increasing
function, then it will be increasing at every point. There turns out
to be some subtlety to this, as we now delve into a little more
deeply.

A formal definition of increasing

We can give a formal definition of an increasing function. For
example, this definition is from Apostol, Mathematical Analysis, 2nd
ed, p94, and identical definitions appear on the internet:

Definition 1: Let $f$ be a real-valued function whose domain is a
subset $S$ of $\mathbb{R}$.
Then $f$ is said to be an increasing (or nondecreasing)
function if for every pair of points $x$ and $y$ in $S$,
$x<y$ implies $f(x)\le f(y)$.
If $x<y$ implies $f(x)<f(y)$, then $f$ is said to be a strictly
increasing function. (Decreasing functions are similarly defined.)

Note the distinction between increasing and strictly increasing here:
a constant function such as $f(x)=0$ for $x\in\mathbb{R}$ is both an
increasing and decreasing function, though it is not a strictly
increasing function.
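Definition 1 can be turned directly into a check on a finite sample of the domain. The sketch below is an illustration rather than a proof, since it only tests the pairs of points it is given; the function name `is_increasing_on` is invented here.

```python
from itertools import combinations

def is_increasing_on(f, points, strict=False):
    """Test Definition 1 on every pair x < y from a finite sample of the domain."""
    ordered = sorted(set(points))            # distinct points in increasing order
    for x, y in combinations(ordered, 2):    # every pair with x < y
        if strict:
            if not f(x) < f(y):
                return False
        elif not f(x) <= f(y):
            return False
    return True

constant = lambda x: 0
cube = lambda x: x**3
neg_reciprocal = lambda x: -1 / x

print(is_increasing_on(constant, [-2, -1, 0, 1, 2]))               # increasing
print(is_increasing_on(constant, [-2, -1, 0, 1, 2], strict=True))  # but not strictly
print(is_increasing_on(cube, [-2, -1, 0, 1, 2], strict=True))
print(is_increasing_on(neg_reciprocal, [-2, -1, 1, 2]))            # fails across the hole at 0
```

Restricting the sample for $-\dfrac{1}{x}$ to positive points makes the check pass, matching the earlier remark about restricting the domain to $x>0$.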

We could also try to come up with a definition of increasing at a
point. There are no standard definitions of this idea, and the
following proposed definition is certainly beyond A-level in its
formality. It is based on the definition of continuity, which is
about the behaviour of a function “near” to a point.

Definition 2: Let $f$ be a real-valued function whose domain is a
subset $S$ of $\mathbb{R}$. Then $f$ is said to be increasing at
the point $x$ in $S$ if there is some $\delta>0$ such that:

for every $y$ in $S$ with $x<y<x+\delta$, $f(x)\le f(y)$, and
for every $y$ in $S$ with $x-\delta<y<x$, $f(y)\le f(x)$.

If the $\le$ signs are replaced by $<$ signs in these two
inequalities, then $f$ is said to be strictly increasing at $x$.

With this definition, the above exam question (reworded) makes sense,
and the correct final answer is what the examiner would expect. (One
might wonder whether one could make such a local definition of
one-to-one, and indeed, this is done when considering the Inverse
Function and Implicit Function theorems. But that is a story for
another day.)

Using calculus

So far, no calculus has appeared, yet we typically teach our students
to determine whether a function is an increasing function or to find
where it is increasing by differentiating the function. So let us now
consider how we could use calculus to help us.

For us to be able to use calculus, we need to assume that our function
is differentiable throughout $S$. We could then propose the following
theorem:

Theorem 1 (incorrect attempt): Let $f$ be a real-valued continuous
function whose domain is a subset $S$ of $\mathbb{R}$ and is
differentiable at every (interior) point of $S$. Then $f$ is an
increasing function if and only if $f’(x)>0$ for all $x$ in (the
interior of) $S$.

We could try changing this to say that $f$ is a strictly increasing
function, but that fails if the function has a point of inflection.
For example, $f(x)=x^3$ is a strictly increasing function, even though
its derivative is zero at $x=0$.

We could also try changing the condition to say that $f’(x)\ge0$ for
all $x$ in $S$. However, this also fails: if the graph has a
discontinuity, such as the function $f(x)=-\dfrac{1}{x}$ for $x\ne0$
that we looked at before, then it might have $f’(x)>0$ for all $x$ in
$S$, yet not be an increasing function.

This feels more hopeful, though: after all, the only problem now is
the “hole” in the domain $S$. And it turns out that if we restrict
the domain to be an interval (that is, a subset of the reals with no
“holes”), then it will work:

Theorem 1 (correct version): Let $f$ be a real-valued continuous
function whose domain is an interval $I$ of $\mathbb{R}$ and is
differentiable at every point in (the interior of) $I$. Then $f$ is
an increasing function if and only if $f’(x)\ge 0$ for all $x$ in
(the interior of) $I$.

The formal proof of this is found below, and though it is quite
technical, the theorem itself seems clearly true, and school students
could probably be convinced to believe it (at least once it is written
in more student-friendly language).

What can we say, though, about whether a (differentiable) function is
increasing at a point? Using Definition 2 above, we get the
corresponding theorem:

Theorem 2: Let $f$ be a real-valued continuous function whose domain
is an interval $I$ of $\mathbb{R}$ and which is differentiable at
every (interior) point of $I$. Then $f$ is increasing at the
point $x$ in $I$ if and only if there is some $\delta>0$ for which
$f’(y)\ge0$ for all $y$ in (the interior of) $I$ with
$x-\delta<y<x+\delta$.

Why is it not sufficient to just require $f’(x)\ge0$? Well, consider
the functions $f(x)=x^3$ and $f(x)=-x^3$. They both have $f’(0)=0$,
yet the first is increasing (indeed, even strictly increasing) at
$x=0$, while the second is decreasing at $x=0$. And a function such
as $f(x)=x^2$ is neither increasing nor decreasing at $x=0$. So we
really do need to consider a small interval around the point of
interest.
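These three examples can be compared using a sampled version of Definition 2. This is a rough sketch under some assumptions: it tests a single chosen $\delta$ at finitely many points, and it assumes the whole interval $(x-\delta,x+\delta)$ lies in the domain. A failure for one $\delta$ does not by itself show that no smaller $\delta$ works, though for these three functions the behaviour is the same for every $\delta$. The name `is_increasing_at` is invented for this illustration.

```python
def is_increasing_at(f, x, delta=1e-2, n=100, strict=False):
    """Sampled check of Definition 2 at x, for one chosen delta."""
    for k in range(1, n + 1):
        step = delta * k / (n + 1)            # 0 < step < delta
        left, right = f(x - step), f(x + step)
        if strict:
            ok = left < f(x) < right
        else:
            ok = left <= f(x) <= right
        if not ok:
            return False
    return True

cube = lambda x: x**3
neg_cube = lambda x: -x**3
square = lambda x: x**2

print(is_increasing_at(cube, 0, strict=True))   # strictly increasing at 0
print(is_increasing_at(neg_cube, 0))            # not increasing at 0
print(is_increasing_at(square, 0))              # not increasing at 0 either
```

All three functions have derivative zero at the origin, yet only the first passes the check.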

(Theorem 2 could be extended, with care, to more general subsets of
$\mathbb{R}$, as we are only discussing a local property of the
function. But it is not particularly interesting to do so.)

So the question of determining at which points a function is
increasing (or decreasing) is more subtle than it appears: not only
does one have to find where the function has derivative $\ge0$ (and
not just $>0$), but one also has to determine what is happening at
those points where the derivative is zero, as there are different
types of stationary points. (At those points where the derivative is
strictly positive, the function is certainly strictly increasing,
which follows from Theorem 4 below.)

Things get more complicated if we now wish to consider strictly
increasing (or decreasing) functions. There is a relatively weak
theorem which will suffice much of the time:

Theorem 3: Let $f$ be a continuous real-valued function whose domain
is an interval $I$ of $\mathbb{R}$ and which is differentiable at
every (interior) point of $I$. Then if $f’(x)>0$ throughout $I$,
$f$ is a strictly increasing function.

Note that this is a one-directional theorem; $f(x)=x^3$ for
$x\in\mathbb{R}$ is our standard example of a strictly increasing
function which does not have $f’(x)>0$ throughout the domain because
of the point of inflection at the origin. The proof of Theorem 3
follows exactly as that of Theorem 1.

An easy corollary of this is the following (local) theorem:

Theorem 4: Let $f$ be a continuous real-valued function whose domain
is a subset $S$ of $\mathbb{R}$. If $f$ is differentiable at the
point $x$ in the interior of $S$ and $f’(x)>0$, then $f$ is strictly
increasing at $x$.

This is the theorem which is typically used when answering A-level
exam questions such as the one above. Unfortunately, as we see from
our example of $f(x)=x^3$, this too is a one-directional theorem:
every point at which $f’(x)>0$ is a point at which the function is
strictly increasing, but there may be other points where this is the
case but where $f’(x)=0$. (If $f’(x)<0$, then the function is
strictly decreasing at this point, so it cannot be increasing.) The
question of using calculus to determine where a function is
increasing, rather than strictly increasing, is somewhat more
complicated, as we see from Theorem 2 above. But at A-level, the
functions are always nice enough that the only difficulties will be at
the stationary points.

There is actually a necessary and sufficient condition for a function
to be strictly increasing, but this is more subtle:

Theorem 5: Let $f$ be a continuous real-valued function whose domain
is an interval $I$ of $\mathbb{R}$ and which is differentiable at
every interior point of $I$. Then $f$ is strictly increasing on $I$
if and only if $f’(x)\ge0$ throughout $I$ and there is no
non-trivial subinterval $J$ of $I$ with $f’(x)=0$ for all $x$ in the
interior of $J$.

The proof can be found below.

Teaching this topic at A-level

Putting this all together, we see that Theorem 4 is the crucial
theorem for school use. Teaching the meaning of the term “increasing
function” (Definition 1) and a simplified explanation of “increasing
at a point” (Definition 2), along with Theorem 4 should give a good
grounding. It would also be wise to caution that it is a one-way
theorem by comparing and contrasting examples such as $f(x)=x^2$ and
$f(x)=x^3$.

Proofs of Theorems 1 and 5

This technical appendix uses tools from undergraduate analysis. The
proofs of the other three theorems are very similar to these or they
follow immediately from these.

Theorem 1

Let $f$ be a real-valued continuous function whose domain is an
interval $I$ of $\mathbb{R}$ and is differentiable at every point in
the interior of $I$. Then $f$ is an increasing function if and only
if $f’(x)\ge 0$ for all $x$ in the interior of $I$.

Proof

We show first that if $f$ is an increasing function, then $f’(x)\ge0$
for all $x$ in the interior of $I$, and we argue by contradiction.
Assume that $f’(x_0)<0$ for some $x_0$ in the interior of $I$. Using
the definition of derivative, this means that
$\lim\limits_{\substack{x\to x_0\\ x\in
I}}\dfrac{f(x)-f(x_0)}{x-x_0}<0$. So there is some $x_1\in I$ (where
$x_1\ne x_0$) with $\dfrac{f(x_1)-f(x_0)}{x_1-x_0}<0$ (otherwise the limit
would be $\ge0$). If $x_1>x_0$, then multiplying by $x_1-x_0$ gives
$f(x_1)-f(x_0)<0$, so $f(x_1)<f(x_0)$. If $x_1<x_0$, then multiplying
by $x_1-x_0$ gives $f(x_1)-f(x_0)>0$, so $f(x_1)>f(x_0)$. Either way,
this shows that the function is not increasing on $I$, and we have our
desired contradiction. Thus if $f$ is an increasing function, we must
have $f’(x)\ge0$ for all $x$ in the interior of $I$.

Conversely, if $f’(x)\ge0$ for all $x$ in the interior of $I$, then
let $x<y$ be any two points in $I$. Then by the mean-value theorem,
there is some $z$ with $x<z<y$ for which $f(y)-f(x)=f’(z)(y-x)$ (and
note that $z$ lies in the interior of $I$ as $I$ is an interval).
Since $f’(z)\ge0$ by assumption, and $y-x>0$, it follows that
$f(y)-f(x)\ge0$, so $f(x)\le f(y)$. Therefore $f$ is an increasing
function.

Theorem 5

Let $f$ be a continuous real-valued function whose domain is an
interval $I$ of $\mathbb{R}$ and which is differentiable at every
interior point of $I$. Then $f$ is strictly increasing on $I$ if
and only if $f’(x)\ge0$ throughout $I$ and there is no non-trivial
subinterval $J$ of $I$ with $f’(x)=0$ for all $x$ in the interior of
$J$.

Proof

We first prove that if the derivative condition is not met, then $f$
is not strictly increasing on $I$. If $f’(x)<0$ at any point in $I$,
then $f$ is not increasing (by Theorem 1), so it is certainly not
strictly increasing. If $f’(x)\ge0$ throughout $I$ but there is a
non-trivial subinterval $J$ of $I$ with $f’(x)=0$ for all $x$ in the
interior of $J$, then $f$ is constant throughout $J$ (by the
mean-value theorem). In particular, there are $y<z$ in $J$ with
$f(y)=f(z)$, showing that $f$ is not strictly increasing.

Conversely, if $f’(x)\ge0$ throughout $I$, then $f$ is increasing by
Theorem 1. Assume now that there is no non-trivial subinterval $J$ of
$I$ with $f’(x)=0$ for all $x$ in the interior of $J$. But if $f$
were not strictly increasing, then there would be $y<z$ in $I$ with
$f(y)=f(z)$, so $f(x)$ is constant on the interval $y<x<z$. (For if
$f(y)<f(x)$ for some $x$ in this interval, we would have $f(x)>f(z)$,
contradicting $f$ increasing.) Therefore $f’(x)=0$ throughout this
interval, contradicting our assumption. So $f$ must be strictly
increasing.

A visit to Michaela

Julian Gilbey, 2018-07-20
https://blog.d-and-j.org/teaching/2018/07/20/michaela-visit.html

Having recently listened to about 5.5 hours of Craig Barton
interviewing Dani Quinn (part 1 and part 2), the Head of Mathematics
at Michaela Community School, I decided that it was worth visiting
the school to see their principles in action for myself, so last week,
I took to the buses to visit Wembley.

Though my main interest was the maths teaching, I was fascinated by
the whole experience, so that is what I will focus most of my
attention on here. I used to teach in a school (“W”) with a broadly
similar type of intake: it was in an area with many students from
ethnic minorities and many students on free school meals; that school
was also in an area in which there was a grammar school system, so
many of the highest-attaining students in the catchment area attended
the more selective local schools. This gave me an interesting basis
for comparison.

The most obvious thing which struck me was the atmosphere that
Katharine and her staff have established in the school. It was very
purposeful, and the students I met generally seemed happy and to like
the school. They were polite to me, and some were genuinely
interested in talking to me. (Or at least they gave the convincing
impression that they were!) Some students were immensely proud of
what they were doing and showed off their work to me (without my even
asking).

Many have written about the very strictly enforced behaviour policies.
But what I had not expected was the huge warmth pouring forth from the
staff to the students in their lessons, and the humanity pervading the
school. Whilst demerits were regularly given for infringements of the
school’s very strict behaviour policies - generally accompanied by
just a few seconds’ calm explanation of the positive benefits of doing
what was expected or the negative impact the behaviour was having on
others - merits were given even more liberally (and fairly
consistently between lessons) for behaviours the school wants to
encourage, such as good vocal projection when answering a question,
asking good questions and giving good explanations. And these were
always accompanied by brief warm words. This contrasts so
dramatically with my experience at “W”, where though some teachers
managed their classes well, there wasn’t anything close to a
consistent school-wide system at this level of detail. There is
clearly a benefit to be gained from having such a consistently
enforced system throughout the school, though it is tough for
teachers. (Mind you, it is not as tough as teaching in a school where
students throw things at teachers on a semi-regular basis.)

The most challenging class I saw was a small bottom-set year 10 class,
several of whom had already been permanently excluded from one or more
other schools. Yet there they were, behaving and mostly participating
in the lesson, learning and targeting a grade 4 or 5 at GCSE
Mathematics. Wow. At “W”, lower-middle sets were only targeting a
grade D (on the old system, the equivalent of a grade 3 on the new
system), and most of them did not achieve even that. The contrast
could not be greater.

A few things struck me immediately during the day, without even
entering a classroom. The first was the immaculate state of the
building: not a speck of litter to be seen during the course of the
day. This is in stark contrast to most of the schools I’ve taught in
and visited over the years, and vastly different from “W”. The
students have clearly been taught to respect their environment.

Is the school’s approach a good thing? This is a difficult question
for me to answer. I certainly had a sense that the school was
infusing students with British culture (whatever that means), and yet
for students living in the UK, is this not a good thing? It will give
them significant (British) cultural capital on which they will be able
to draw in later life, and which they might well otherwise not gain.

On the other hand, students are constantly being watched, as are
staff: for example, visitors (with DBS certificates) are permitted to
just walk into any lesson, yet the teachers and students generally
didn’t bat an eyelid when I quietly walked in. Yet this significantly
reduces the chances of bullying and destructive behaviour: there are
no “safe spaces” within the school for bullying or other damaging
behaviour to take place without a teacher seeing.

I observed parts of about eight maths lessons during the day (as well
as a smattering of other subjects). My concern was that they would be
very procedural in nature, given the rigidity of the school system.
However, I was pleasantly surprised: while they were clearly
teacher-led, the questioning did include a good mix of knowledge and
deeper understanding questions. For example, in a lesson on
Pythagoras’s Theorem, there were early questions designed to ensure
that the students knew which side was the hypotenuse, and later
questions which required more thought, such as “If I have a triangle
with side lengths 6, 7, 8, can I draw a right-angle here?” (I am not
overly concerned with Year 8 students not clearly distinguishing
between Pythagoras’s Theorem and its converse. Students were spending
enough effort getting to grips with what the question meant.) Some
time was spent working on questions from their workbooks, but this was
far from the majority of the time.

During one of the lessons, students were asked to read out from their
workbooks. I was surprised - though I probably should not have been -
at how difficult they found it to read technical vocabulary; how often
do we ask our students to read a piece of technical material?

There were also opportunities for discussion in pairs; these were
short and effective, and the students were continually encouraged to
use the time productively, as anyone could be picked on to answer a
question after the discussion time.

Dani spoke much more about the planning process and lesson structure
at Michaela in her podcast, which was fascinating, so I won’t say more
about it here.

Returning home and reflecting, I have two big questions about this
model, and in particular with regard to maths. The first (somewhat
mathematics-specific) question is whether students get enough
opportunity to think about any (mathematics) problem for a protracted
period of time. Hearing about other countries’ approaches, I wonder
whether this is a potentially missed opportunity, especially once
behaviour is so well-managed that there is a good learning atmosphere.

The second, more pervasive, question is about the use of streaming
within the school. (“Streaming” means that students are put in groups
which are dependent upon their academic performance across a number
of subjects. They remain in these groups for all of their subjects.
As far as I could tell, it is used in Years 7-9 and possibly in Year
10 as well.) It is very effective for behaviour management, as the
entire class is together the whole time, including at lesson
changeover time. However, I am very unconvinced that it is good for
equity, which is part of the school’s mission. Hearing of the
experiences of primary and secondary schools which have moved away
from ability (or better: attainment) grouping to mixed-attainment
grouping, one has to ask whether this would be better for the majority
of the students within the school, certainly at Key Stage 3 (11-14
year olds) and possibly older too. Teachers’ academic expectations of
students are lower when they are teaching lower-attaining groups, and
I strongly doubt that Michaela’s excellent teachers are any less
affected by this.

And would I consider teaching at Michaela? I’m not sure it would be
the “right” school for me, but I would take it over “W” any day.

Finally, the “family lunch”. The initial poetry reading was like
being at a summer youth camp: the energy, enthusiasm and fun were
palpable. There was quite a buzz in the room during this! And the
discussion over lunch - this time about volunteering, in light of the
outstanding work of volunteers in the Thailand cave rescue - was
fascinating.

It was a pleasure to visit the school, and I thank the staff for being
so open and welcoming. I look forward to hearing of their results,
both academic and beyond, in the years to come.

Small angle approximations - an application

Julian Gilbey, 2018-07-08
https://blog.d-and-j.org/mathematics/teaching/ks5/2018/07/08/small-angles-application.html

I thought a bit more about my previous post on small
angle approximations, and decided it might be helpful to describe an
application of the small angle approximations. While this example
contains non-examinable aspects (at least in single maths A-level),
the context should be fairly familiar (or can easily be demonstrated),
and the mathematics is accessible to single maths students (at least
as a demonstration). It also ties together ideas from mechanics and
pure maths, so is helpful in this regard.

The question is: what is the period of a pendulum?

We can model the pendulum as a thin rod (inextensible and rigid) of
length $L$, freely pivoted about a point $O$, with a single point mass
$P$ of mass $m$ on the end of the rod, as shown here (where $T$ is the
tension in the rod):

The velocity and acceleration of $P$ are as follows, where
$\dot\theta$ means $\dfrac{d\theta}{dt}$ and $\ddot\theta$ means
$\dfrac{d^2\theta}{dt^2}$; a derivation of these can be found at the
end:

We can now apply Newton’s second law (“$F=ma$”) to the situation:
working perpendicular to the rod, this gives
$-mg\sin\theta=mL\ddot\theta$ (the minus sign is because the component
of the force $mg$ is in the opposite direction to the $L\ddot\theta$
on our diagram). Rearranging this, we get the differential equation:

$$\ddot\theta=-\dfrac{g}{L}\sin\theta.$$

Unfortunately, this equation is impossible to solve in terms of simple
functions. But if we assume that the swing of the pendulum is
small, so that $\theta$ is small, then we can approximate
$\sin\theta$ by $\theta$, and our differential equation becomes

$$\ddot\theta=-\dfrac{g}{L}\theta.$$

This differential equation (an example of simple harmonic motion) has
a solution

$$\theta=A\cos\left(\sqrt{\dfrac{g}{L}}\,t\right)$$

(which is easy to check), where $A$ is the amplitude (maximum angle)
of the swing. The period of this swing is $2\pi\sqrt{\dfrac{L}{g}}$,
which is independent of the amplitude and the mass at the end of the
rod! So as long as the swing is relatively small, the period is only
dependent upon the length of the pendulum (and the acceleration due to
gravity), which is likely to be a surprising result the first time it
is met. This would have had great significance for clock-makers in
times gone by.
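To get a feel for how good the small-angle period is, we can integrate the full equation $L\ddot\theta=-g\sin\theta$ numerically and compare. The following is a sketch under stated assumptions: it uses a standard fourth-order Runge-Kutta step, releases the pendulum from rest at the given amplitude, takes $g=9.81$, and estimates the period as four times the time taken to first reach $\theta=0$. The function names are invented for this illustration.

```python
import math

def small_angle_period(length, g=9.81):
    # Period from the small-angle model: 2*pi*sqrt(L/g).
    return 2 * math.pi * math.sqrt(length / g)

def pendulum_period(length, amplitude, g=9.81, dt=1e-4):
    """Estimate the period of L*theta'' = -g*sin(theta), released from rest
    at angle `amplitude` (radians, 0 < amplitude < pi)."""
    def accel(theta):
        return -(g / length) * math.sin(theta)

    theta, omega, t = amplitude, 0.0, 0.0
    while True:
        # One RK4 step for the system theta' = omega, omega' = accel(theta).
        k1t, k1w = omega, accel(theta)
        k2t, k2w = omega + 0.5 * dt * k1w, accel(theta + 0.5 * dt * k1t)
        k3t, k3w = omega + 0.5 * dt * k2w, accel(theta + 0.5 * dt * k2t)
        k4t, k4w = omega + dt * k3w, accel(theta + dt * k3t)
        new_theta = theta + dt * (k1t + 2 * k2t + 2 * k3t + k4t) / 6
        new_omega = omega + dt * (k1w + 2 * k2w + 2 * k3w + k4w) / 6
        if new_theta <= 0.0:
            # Linear interpolation for the first zero crossing,
            # which happens after a quarter of a period.
            fraction = theta / (theta - new_theta)
            return 4 * (t + fraction * dt)
        theta, omega, t = new_theta, new_omega, t + dt

model = small_angle_period(1.0)
print(model)
print(pendulum_period(1.0, 0.1) / model)   # small swing: ratio close to 1
print(pendulum_period(1.0, 1.5) / model)   # large swing: noticeably longer period
```

For a small swing the two periods agree closely, while for a large swing the true period is noticeably longer than the small-angle formula predicts, which is why the "relatively small" condition matters.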

Deriving the formulae for the velocity and acceleration of $P$

We can work out the velocity and acceleration of $P$ in several
different ways. One way is to use coordinates, where $O$ is the
origin, and the vertical line is the $y$-axis. Then when $P$ is at an
angle of $\theta$, it has a position vector of

A unit vector in the direction of $\overrightarrow{OP}$ is

and a unit vector perpendicular to this in the direction of increasing
$\theta$ is

as shown in this diagram:

The velocity of $P$ can be found by differentiating $\mathbf{r}$ with
respect to time, giving:

Then the acceleration can be found by differentiating again (using the
product rule on both of the components of $\dot{\mathbf{r}}$) to
obtain:

These are the components of the velocity and acceleration shown
above.
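For reference, the steps of this derivation can be sketched as follows. The notation here is an assumption made for this sketch: $\theta$ is measured from the downward vertical, the $y$-axis points upwards, and the unit vectors are labelled $\mathbf{e}_r$ (along $\overrightarrow{OP}$) and $\mathbf{e}_\theta$ (perpendicular, in the direction of increasing $\theta$).

$$\begin{align*}
\mathbf{r}&=\begin{pmatrix}L\sin\theta\\-L\cos\theta\end{pmatrix},\qquad
\mathbf{e}_r=\begin{pmatrix}\sin\theta\\-\cos\theta\end{pmatrix},\qquad
\mathbf{e}_\theta=\begin{pmatrix}\cos\theta\\ \sin\theta\end{pmatrix}\\
\dot{\mathbf{r}}&=\begin{pmatrix}L\dot\theta\cos\theta\\L\dot\theta\sin\theta\end{pmatrix}
=L\dot\theta\,\mathbf{e}_\theta\\
\ddot{\mathbf{r}}&=\begin{pmatrix}L\ddot\theta\cos\theta-L\dot\theta^2\sin\theta\\
L\ddot\theta\sin\theta+L\dot\theta^2\cos\theta\end{pmatrix}
=L\ddot\theta\,\mathbf{e}_\theta-L\dot\theta^2\,\mathbf{e}_r
\end{align*}$$

With these conventions, the component of acceleration perpendicular to the rod is $L\ddot\theta$, and the component along the rod is $L\dot\theta^2$ directed towards $O$.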

Without as much rigour, one could observe that the distance of $P$
along the circumference of the circle is given by $L\theta$, so it is
reasonable to suggest that its speed is $L\dot\theta$ (as $L$ is a
constant). Then the acceleration in this direction is plausibly
$L\ddot\theta$, while the radial acceleration - which we are not
interested in for this application - is a result of the velocity
changing direction.

Small angle approximations

Julian Gilbey, 2018-07-05
https://blog.d-and-j.org/mathematics/teaching/ks5/2018/07/05/small-angles.html

At a conference run by the BBO Maths Hub today, Jo Morgan mentioned
that small angle
approximations are a topic recently (re)introduced to the single maths
A-level course, and many teachers may be unfamiliar with it.

During the day and on my journey home, I thought about this topic and
how it connects with other areas of the syllabus. So here are a few
quick thoughts on ways we could think about these approximations. I
hope that this post offers some different perspectives on the topic.

This is a diagram probably familiar from most A-level textbooks (I
don’t have one to hand, unfortunately). We have our familiar unit
circle, and draw a right-angled triangle with angle $\theta$, opposite
$\sin\theta$ and adjacent $\cos\theta$. We also see that the arc
length subtended by the angle $\theta$ is $r\theta=\theta$ as the
radius is 1. (We must be working in radians for this to be correct!)
Already in this diagram, $\sin\theta$ and $\theta$ do not look very
different, so $\sin\theta\approx\theta$. On the other hand,
$\cos\theta$ looks pretty close to $1$, so we have
$\cos\theta\approx1$. Visually, say using GeoGebra, we see that these
approximations get better as $\theta$ gets smaller: the arc and the
half-chord become closer and closer to each other. We can then work
out $\tan\theta=\dfrac{\sin\theta}{\cos\theta}\approx
\dfrac{\theta}{1}=\theta$.

Another way of seeing this approximation to $\tan\theta$ is to draw
the triangle with adjacent equal to $1$:

If we take $\sin\theta\approx\theta$, then we can work out a better
approximation for $\cos\theta$ using the binomial theorem. We have,
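for small $\theta$ (positive or negative):

$$\begin{align*}
\cos\theta&=\sqrt{1-\sin^2\theta}\\
&\approx\sqrt{1-\theta^2}\\
&=(1-\theta^2)^{\frac{1}{2}}\\
&\approx 1-\tfrac{1}{2}\theta^2
\end{align*}$$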

where we have used the first two terms of the binomial expansion on
the last line. So $\cos\theta\approx 1-\frac{1}{2}\theta^2$.

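Another way of obtaining the approximation for $\cos\theta$ is to
relate cos and sin using a double-angle formula:

$$\cos2\theta=1-2\sin^2\theta$$

so

$$\begin{align*}
\cos\theta&=1-2\sin^2\tfrac{1}{2}\theta\\
&\approx1-2\left(\tfrac{1}{2}\theta\right)^2\\
&=1-\tfrac{1}{2}\theta^2
\end{align*}$$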

where we have used $\sin\tfrac{1}{2}\theta\approx\tfrac{1}{2}\theta$
on the second line.

The approximations for $\sin\theta$ and $\tan\theta$ are also closely
related to the shape of their graphs near the origin (though there is
potentially some circular reasoning here - no pun intended!):

We have drawn the graphs of $y=x$ (red), $y=\sin x$ (green) and
$y=\tan x$ (blue). Near the origin, the three graphs look very
similar, so for small $x$, $\sin x\approx x \approx \tan x$.
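A quick numerical comparison shows how quickly these approximations improve as the angle shrinks. This is a small sketch with angles chosen for illustration (in radians); the helper name `approximation_errors` is invented here.

```python
import math

def approximation_errors(theta):
    """Absolute errors of sin(theta) ~ theta, tan(theta) ~ theta
    and cos(theta) ~ 1 - theta^2/2, with theta in radians."""
    sin_error = abs(math.sin(theta) - theta)
    tan_error = abs(math.tan(theta) - theta)
    cos_error = abs(math.cos(theta) - (1 - theta**2 / 2))
    return sin_error, tan_error, cos_error

for theta in (0.5, 0.1, 0.01):
    print(theta, approximation_errors(theta))
```

Each error shrinks rapidly as $\theta$ decreases, in line with what the graphs suggest near the origin.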

This also tells us that at the origin, $\frac{d}{dx}(\sin x)$ and
$\frac{d}{dx}(\tan x)$ equal $1$.

We can also argue in the opposite direction. If we have already
convinced ourselves why the derivative of $\sin x$ is $\cos x$ using a
different approach (for example, by using “Rotating derivatives”),
then we can say that for small values of $x$, the graph of $y=\sin x$
is approximated by the tangent to the graph at $x=0$ (see “A tangent
is…” for more on this point). We can calculate the tangent: since
$\frac{d}{dx}(\sin x)=\cos x$, the gradient at $x=0$ is $\cos 0 = 1$,
and since $\sin 0 = 0$, the tangent passes through the origin, so the
tangent has equation $y=x$. So for small $x$, $\sin x\approx x$.

Dividing fractions

Julian Gilbey, 2018-06-20
https://blog.d-and-j.org/mathematics/teaching/ks2/ks3/2018/06/20/dividing-fractions.html

Why is it that

$$\frac{a}{b}\div\frac{c}{d}=\frac{a}{b}\times\frac{d}{c},$$

or as the rule that students are frequently taught: “turn the second
fraction upside-down and multiply”?

I’ve been inspired to revisit this question after listening to Ed
Southall talking on
Mr Barton’s Maths Podcast,
where he mentioned this question.

In this post I suggest a teaching sequence which might lead to an
understanding of the rule above, as well as a procedural knowledge of
how to perform the rule.

Some comments on a familiar approach

I have seen textbooks and websites explain the rule for division of
fractions by talking about how many times we can fit $\frac{1}{3}$
into $\frac{4}{5}$, say, but that seems to me to be quite challenging:
students have to hold on to several ideas at once, and make sense of
diagrammatic representations at the same time as trying to think about
what division means. It also becomes very hard as the fractions
become more complicated. In my experience, few students develop a
solid understanding through this approach: they either get lost in the
reasoning or they resort to following a rule.

This problem ties in quite neatly with some things I have recently
read, in particular:

Liping Ma’s book “Knowing and teaching elementary mathematics”, in
which US and Chinese teachers’ understanding of this rule is
compared.

John Mighton, the founder of JUMP Math, wrote The end of ignorance;
he observes there that meaningful symbolic manipulation can
precede both an attempt to explain an idea or technique in
everyday terms, and the development of understanding; moreover,
understanding can emerge from the manipulations if examples are
well-chosen and students are given the opportunity to reflect.

An overview of the idea

The calculation $8-5$ means “what number $\square$ makes $\square+5=8$
true?” Similarly, when we write $12\div 3$, we mean “what number
$\square$ makes $\square\times3=12$ true?” This says that division is
the inverse of multiplication. (More precisely, for each non-zero
number $c$, dividing by $c$ is the inverse of multiplying by $c$.)
The same applies to division of fractions:
$\frac{3}{5}\div\frac{2}{3}$ means “what number $\square$ makes
$\square\times\frac{2}{3}=\frac{3}{5}$ true?”

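Once we notice that $\frac{3}{2}\times\frac{2}{3}=1$, we can then
multiply both sides of this equation by $\frac{3}{5}$ to obtain

$$\frac{3}{5}\times\frac{3}{2}\times\frac{2}{3}=\frac{3}{5}.$$

Therefore $\square$ must be $\frac{3}{5}\times\frac{3}{2}$, or

$$\frac{3}{5}\div\frac{2}{3}=\frac{3}{5}\times\frac{3}{2}.$$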

This method will work for any fraction division question, and so these
steps give us our familiar rule: “turn the divisor upside-down and
multiply”.
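As a quick sanity check of the rule, here is a minimal Python sketch using the standard library's `fractions.Fraction` type. The helper name `divide_via_reciprocal` is purely illustrative (it is not from any library); it follows the steps above, first finding the reciprocal of the divisor and then multiplying, and compares the result with Python's own exact division.

```python
from fractions import Fraction


def divide_via_reciprocal(dividend: Fraction, divisor: Fraction) -> Fraction:
    """Divide by 'turning the divisor upside-down and multiplying'."""
    # The reciprocal is the number that multiplies the divisor to give 1.
    reciprocal = Fraction(divisor.denominator, divisor.numerator)
    assert reciprocal * divisor == 1
    return dividend * reciprocal


answer = divide_via_reciprocal(Fraction(3, 5), Fraction(2, 3))
# The answer fills the box in: box x 2/3 = 3/5.
assert answer * Fraction(2, 3) == Fraction(3, 5)
# It also agrees with Python's built-in exact division of fractions.
assert answer == Fraction(3, 5) / Fraction(2, 3)
print(answer)
```

The two assertions mirror the two meanings of division discussed above: the answer is the missing number in the multiplication statement, and it agrees with the usual rule.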

A possible teaching sequence

What follows is a suggestion for how these ideas could be introduced
over a sequence of lessons, which could span several months or even
years. This offers students the chance to revisit the ideas again and
again, thereby reinforcing them, as well as building up stronger
connections and a deeper understanding. In the later steps, I assume
that students can multiply fractions.

Step 1: What is subtraction?

We begin by asking students what other number statements they can
deduce from $3+5=8$. There are many possible answers (such as
$30+50=80$), and here we highlight those obtained by rearranging the
numbers. (These could be encouraged by a question such as “Using only
the numbers 3, 5 and 8, what other number statements can you get from
$3+5=8$?”) Three key statements are:

$$8-5=3,\qquad 8-3=5,\qquad 5+3=8,$$

as well as the same statements written the other way round, such as
$5=8-3$; we won’t mention these reversed statements again here.

The last of these three statements says that addition is
commutative: the order of adding does not matter. The other two say
that subtraction is the inverse of addition: the three problems

$$8-5=\square,\qquad \square+5=8,\qquad 5+\square=8$$

are equivalent, as are similar problems about $8-3=\square$. Making
this connection explicit would be beneficial, especially in relation
to the later parts of this sequence of steps.

Students could then be asked to write statements equivalent to
statements such as $10-3=\square$ to reinforce this idea.

This idea may well have already been introduced via a bar model
approach or using Cuisenaire rods or suchlike.

It is useful to recognise that it doesn’t matter whether we are
working with whole numbers, directed numbers, fractions or whatever:
subtraction always has this meaning, so returning to this idea
periodically will benefit students’ understanding.

Step 2: And what is division?

This is the parallel of Step 1 for multiplication and division. What
can be deduced from $3\times4=12$? This again leads to interesting
points such as why $30\times40=120$ is an incorrect statement, whereas
$30+50=80$ is correct. But for our current purposes, the key
deductions are again those obtained by rearrangement:

$$12\div4=3,\qquad 12\div3=4,\qquad 4\times3=12.$$

As before, we see that multiplication is commutative and that division
is the inverse of multiplication. In particular, this means that
answering the question $12\div4=\square$ is the same as filling in the
missing number in $4\times\square=12$; asking students to make
deductions from $12\div4=\square$, as above, will reinforce this idea.

Step 3: 1 divided by a unit fraction

A key part of this approach is to learn about reciprocals of
fractions. We start with the reciprocals of unit fractions.

For this missing-number problem, I would suggest asking students to
work on this themselves rather than showing them how to do the first
one. (I am assuming that they already know enough about fractions to
work out the answers to these questions.)

Students should spot the pattern. Following this by asking questions
such as $\frac{1}{82}\times\square = 1$ can help them to realise that
they can now do some very complicated-sounding questions, even if they
can’t imagine what $\frac{1}{82}$ of a cake might look like. (I was
reminded of this approach by John Mighton’s book.)

Students should then connect this back to the earlier steps, by asking
them to rearrange $\frac{1}{2}\times2=1$. This will allow students to
(re)discover that $1\div 2=\frac{1}{2}$ (and similarly for the other
statements); this can also be used to reinforce the idea that a
fraction such as $\frac{1}{2}$ just means “1 divided by 2”. (The
division symbol itself suggests this: $\div$ is just a fraction with
dots in place of actual numbers.) Another way of rearranging the
number statement gives $1\div\frac{1}{2}=2$, which could be related to
the “practical” meaning of division: there are 2 halves in a whole.

Step 4: Turning a general fraction into an integer

It might be too big a jump for some students to go straight to
finding the reciprocal of a general fraction, so this step provides a
structured intermediate step, once they are developing some confidence
with the above idea.

Here is a second sequence of missing-number problems:

Once students have worked out answers to these (perhaps with a
few more similar examples added), either ask them to generalise by making up
their own similar examples, or ask superficially harder questions such
as $\frac{74}{133} \times \square = 74$, so that the structure becomes
clear.

Asking students to rearrange these statements once again results in
statements like $2\div3 = \frac{2}{3}$ (further reinforcing the
division idea) and $2\div \frac{2}{3} = 3$.

Step 5: Finding reciprocals

A useful preparatory question before this step would be something
like: “If you know that $96\times 48=4608$, then what is the missing
number in $96\times \square = 2304$?” This recalls the idea that we
can divide the product by 2 by dividing the multiplicand (or
multiplier) by 2. (The use of two-digit numbers is designed to
discourage students from doing a division!)

In this step, we replace the integers on the right-hand sides of the
previous set of questions with 1:

If students cannot work out how to answer the first question, it would
be helpful to remind them of their answer to $\frac{2}{3}\times
\square = 2$. Tying this to the preparatory question above should
help them get to the answer.

Again, students can be invited to generalise at this point, or to
answer a question like the one in the previous step: $\frac{74}{133}
\times \square = 1$. Also, it is helpful to then rearrange these
results; we have $1\div \frac{2}{3} = \frac{3}{2}$, and we are seeing
the first clear case of turning fractions upside-down.

After these, it could be interesting to also revisit unit fractions:
following the same pattern that we have seen, how else could the
answer to $\frac{1}{3}\times \square = 1$ be written, besides as $3$?

Step 6: Dividing fractions

Before working on the full-blown division of fractions, it would be
useful to preface it by another relevant rearranging activity: how can
the number statement $2\times 3\times 4=24$ be rearranged, while
keeping all of the numbers involved the same? This gives rise to a
number of statements, such as:

This may cause some difficulty and lead to some interesting class
discussions.

And now we can build on the ideas developed in Step 5. How could we
complete the following statements?

A prompting question, if needed, is “What is $\frac{2}{3}\times
\frac{3}{2}$?”

And then what about these, where the two squares should be filled in
following the pattern we have just seen?

Once students feel competent at these, ask how they can use these
to work out:

And with this, students have reached a point where the rule for
dividing by a fraction will make some sense: we multiply the
reciprocal of the divisor (so as to get 1 when it is multiplied by the
divisor itself) by the dividend, which is our well-known rule.

Stuart Price noted that the answer to the last part can be obtained
as the answer to part (iii) divided by the answer to part (iv), by the
definition of conditional probability.

But if we think about what’s going on a little further, we will be
able to understand the structure of this problem more and see further
connections.

The first thing to do to make our life a little simpler is to replace
the specific numbers 5.1 and 3.6 with variables, so that the algebraic
structure becomes clearer. So let’s call the means of the two
independent Poisson distributions $\lambda$ and $\mu$. We will stick
with the 5 and 7 for the time being, and generalise those later.

Therefore our problem says that the number of lorry drivers is
distributed as $\mathrm{Po}(\lambda)$, the number of car drivers is
distributed as $\mathrm{Po}(\mu)$ and the total number of drivers is
distributed as $\mathrm{Po}(\lambda+\mu)$. The relevant probabilities
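are then as follows:

$$\begin{align*}
\mathrm{P}(\text{5 lorry drivers}) &= e^{-\lambda}\frac{\lambda^5}{5!},\\
\mathrm{P}(\text{2 car drivers}) &= e^{-\mu}\frac{\mu^2}{2!},\\
\mathrm{P}(\text{7 drivers in total}) &= e^{-(\lambda+\mu)}\frac{(\lambda+\mu)^7}{7!},
\end{align*}$$

so that

$$\mathrm{P}(\text{5 lorry drivers}\mid\text{7 drivers in total})
=\frac{e^{-\lambda}\frac{\lambda^5}{5!}\times e^{-\mu}\frac{\mu^2}{2!}}{e^{-(\lambda+\mu)}\frac{(\lambda+\mu)^7}{7!}}
=\binom{7}{5}\left(\frac{\lambda}{\lambda+\mu}\right)^5\left(\frac{\mu}{\lambda+\mu}\right)^2.$$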

But this is just a binomial probability! It is the probability of 5
successes from 7, where the probability of success is
$\frac{\lambda}{\lambda+\mu}$, which equals the mean number of lorry
drivers divided by the mean number of drivers. It is clear that we
could replace 5 and 7 by any numbers $r$ and $n$ in the above
calculation, to deduce that given that there are $n$ drivers in total,
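the probability that $r$ of them are lorry drivers is

$$\binom{n}{r}\left(\frac{\lambda}{\lambda+\mu}\right)^r\left(\frac{\mu}{\lambda+\mu}\right)^{n-r}.$$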

If we had assumed that the probability that a visiting driver picked
at random is a lorry driver is $\frac{\lambda}{\lambda+\mu}$, then we
would have got the same answer without having to calculate any Poisson
probabilities at all.
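The claim above can be checked numerically. The following is a small Python sketch, not part of the original question: the function names are made up for illustration, and I am assuming that $\lambda=5.1$ is the lorry mean and $\mu=3.6$ the car mean, with $r=5$ and $n=7$. It computes the conditional probability once from the Poisson probabilities and once directly as a binomial probability.

```python
from math import comb, exp, factorial


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for X ~ Po(mean)."""
    return exp(-mean) * mean**k / factorial(k)


def conditional_via_poisson(r: int, n: int, lam: float, mu: float) -> float:
    """P(r lorry drivers | n drivers in total), from the Poisson probabilities."""
    # Independence gives the joint probability of r lorries and n - r cars.
    joint = poisson_pmf(r, lam) * poisson_pmf(n - r, mu)
    # The total number of drivers is Po(lam + mu).
    return joint / poisson_pmf(n, lam + mu)


def conditional_via_binomial(r: int, n: int, lam: float, mu: float) -> float:
    """The same probability, with each driver a lorry driver w.p. lam/(lam+mu)."""
    p = lam / (lam + mu)
    return comb(n, r) * p**r * (1 - p) ** (n - r)


# Assumed values: lam = 5.1 (lorries), mu = 3.6 (cars), r = 5, n = 7.
via_poisson = conditional_via_poisson(5, 7, 5.1, 3.6)
via_binomial = conditional_via_binomial(5, 7, 5.1, 3.6)
print(via_poisson, via_binomial)
assert abs(via_poisson - via_binomial) < 1e-12
```

The two values agree up to floating-point rounding, whichever values of $r$, $n$, $\lambda$ and $\mu$ are used.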

This seems like a reasonable suggestion, but how can we justify it?

One technical way is to say that the binomial probabilities we have
found above prove that this is the case. But this gives little
insight into the reason for it.

A better way is to simply observe that the ratio of the rate of lorry
drivers arriving to the rate of car drivers arriving is $\lambda:\mu$,
so the probability that a particular driver is a lorry driver is
indeed $\frac{\lambda}{\lambda+\mu}$. This might feel a little
problematic, though, as it seems to ignore the probabilistic aspects
involved.

A more careful way of doing this is to think about the behaviour and
meaning of Poisson distributions. The means $\lambda$ and $\mu$ are
for the unit time period of 1 hour. If we had a time period of
$t$ hours, with the same uniform random driver arrivals over the
whole period, then the mean number of lorry and car drivers would be
$t\lambda$ and $t\mu$ respectively, with the distribution of
the number of drivers still being Poisson. A standard thing to do at
this point is to take $t$ to be very small. In this case, the
probability of there being more than one driver arriving in the period
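is negligible, so the probabilities become:

$$\begin{align*}
\mathrm{P}(\text{one lorry driver arrives}) &\approx t\lambda,\\
\mathrm{P}(\text{one car driver arrives}) &\approx t\mu,\\
\mathrm{P}(\text{no driver arrives}) &\approx 1-t\lambda-t\mu.
\end{align*}$$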

Two ways of deriving these probabilities are: (a) calculate the
Poisson probabilities, expanding $e^{-t\lambda}$ and ignoring all
terms involving $t^2$; (b) assume that the number of lorry drivers
arriving is zero or one, then calculate what the probability of one
lorry driver arriving would have to be so that the expected number of
lorry drivers is $t\lambda$. Note that we also ignore the negligible
probability that both a lorry driver and car driver arrive.

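Therefore, in this very short time period of $t$ hours, we have

$$\mathrm{P}(\text{a lorry driver arrives}\mid\text{a driver arrives})
\approx\frac{t\lambda}{t\lambda+t\mu}=\frac{\lambda}{\lambda+\mu}.$$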

This means that whenever a driver arrives, the probability that this
driver is a lorry driver is indeed $\frac{\lambda}{\lambda+\mu}$,
exactly as we wanted.

Incidentally, the small-time-slice thinking also shows why the Poisson
distribution is a good approximation to the binomial distribution:
imagine we are dealing with a time interval and our Poisson
distribution has mean $\lambda$. Divide the whole time interval into
$N$ equal slices, and assume that no slices can have more than one
event. Then each slice has a probability of $\lambda/N$ of having an
event, and the number of events is distributed as
$\mathrm{B}(N,\lambda/N)$. The larger $N$ is, the better the
assumption that no slice can have more than one event becomes, and so
the more closely $\mathrm{B}(N,\lambda/N)$ approximates
$\mathrm{Po}(\lambda)$.
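To see this approximation improving as $N$ grows, here is a short Python sketch. The mean of 2, the range of $k$ compared and the function names are all arbitrary choices of mine for illustration. For several values of $N$, it reports the largest gap between the $\mathrm{B}(N,\lambda/N)$ and $\mathrm{Po}(\lambda)$ probabilities.

```python
from math import comb, exp, factorial


def poisson_pmf(k: int, mean: float) -> float:
    """P(X = k) for X ~ Po(mean)."""
    return exp(-mean) * mean**k / factorial(k)


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def max_gap(mean: float, slices: int, k_max: int = 10) -> float:
    """Largest |B(slices, mean/slices) - Po(mean)| probability gap for k = 0..k_max."""
    p = mean / slices  # probability of an event in any one slice
    return max(
        abs(binomial_pmf(k, slices, p) - poisson_pmf(k, mean))
        for k in range(k_max + 1)
    )


# The gap shrinks as the interval is cut into more (and so thinner) slices.
for slices in (10, 100, 1000, 10000):
    print(slices, max_gap(2.0, slices))
```

Each tenfold increase in the number of slices should shrink the printed gap, in line with the informal argument above.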

This also reminds me of a lovely and surprising probability question
on this topic that I saw on an undergraduate problem set (question
6(ii) on Examples sheet 2
here):

The number of misprints on a page has a Poisson
distribution with parameter $\lambda$, and the numbers on different
pages are independent. A proofreader studies a single page looking
for misprints. She catches each misprint (independently of others)
with probability 1/2. Let $X$ be the number of misprints she catches.
Find $\mathrm{P}(X=k)$. Given that she has found $X=10$ misprints,
what is the distribution of $Y$, the number of misprints she has not
caught? How useful is $X$ in predicting $Y$?

Strong induction and ordinary induction
https://blog.d-and-j.org/mathematics/2018/03/04/induction.html
Julian Gilbey, 2018-03-04T14:20:00+00:00
https://blog.d-and-j.net/mathematics/2018/03/04/induction.html

One of my UKMT Mentoring scheme
mentees was asking me about induction, and we were discussing how
strong induction and ordinary induction are related to each other. In
the end, I wrote this piece, which I’m sharing here for
general interest.

Our aim in this note is to prove the equivalence of “ordinary”
induction and strong induction. For concreteness, let us assume that
we are trying to prove the statement $P(n)$ (which is a statement
about the integer $n$) is true for all $n\ge n_0$, where $n_0$ is some
integer. (Typically we will have $n_0=0$ or $n_0=1$, but not
necessarily.)

For example, we might be trying to prove that the sum of the first $n$
positive integers is $\frac12 n(n+1)$, in which case we could take
$P(n)$ to be the statement $1+2+\cdots+n=\frac12 n(n+1)$ and $n_0=1$.
Or we might be trying to prove some statement about all finite graphs,
in which case $P(n)$ might be “blah is true for all graphs with $n$
vertices” and $n_0=1$ again.

The principle of mathematical induction

$P(n)$ is true for all $n\ge n_0$ if the following two conditions hold:

(a) $P(n_0)$ is true (the base case), and

(b) if $k\ge n_0$ and $P(k)$ is true, then $P(k+1)$ is true (the
induction step).

The principle of strong (mathematical) induction can be useful when
the proof of $P(k)$ depends on more than one smaller case.

The principle of strong induction

$P(n)$ is true for all $n\ge n_0$ if the following conditions hold:

(a) $P(n_0)$ is true (the base case), and

(b) if $k>n_0$ and $P(j)$ is true for all $n_0\le j<k$, then $P(k)$ is
true (the induction step).

For example, if we are trying to prove a result about Fibonacci
numbers, we might use the definition $F_n=F_{n-1}+F_{n-2}$ and have to
make use of properties of two smaller numbers. Or we might be arguing
about graphs with $n$ vertices, and split a graph up into two smaller
graphs with $m$ and $n-m$ vertices; in this case, we may need to
assume that whatever result we are trying to show holds not just for
graphs with $n-1$ vertices but also for graphs with $m$ and $n-m$
vertices for any $1\le m<n$. In cases such as these, this
“stronger” version of induction is very useful.

It turns out that we can actually combine these two conditions into
the single condition:

if $k\ge n_0$ and $P(j)$ is true for all $n_0\le j<k$, then
$P(k)$ is true.

The induction step where $k>n_0$ is exactly as before, and the base
case is where $k=n_0$. In this case, this condition becomes “if
$P(j)$ is true for all $n_0\le j<n_0$, then $P(n_0)$ is true”. But
there is no $j$ with $n_0\le j<n_0$, so it is vacuously true that
$P(j)$ is true for all such $j$, and hence $P(n_0)$ is true. It is
easy to overlook this special vacuous case, though, or to argue about
it incorrectly within a general argument, so it is often wise, in
practice, to handle the base case separately as above.

We are now in a position to prove the equivalence of these two
formulations of induction. We first need to be clear what we mean by
these being equivalent. What we mean is as follows: if we assume that
the principle of mathematical induction is true, then the principle of
strong induction follows from this, and vice versa.

Theorem

The principle of mathematical induction and the principle of strong
induction are equivalent to each other.

Proof

Let us assume first that the principle of strong induction is true,
and aim to prove that the principle of mathematical induction follows
from this.

So let $P(n)$ be a statement, $n_0$ an integer, and assume that $P(n)$
satisfies the conditions for mathematical induction, namely:

(i) $P(n_0)$ is true, and

(ii) if $k\ge n_0$ and $P(k)$ is true, then $P(k+1)$ is true.

We wish to show that $P(n)$ is true for all $n\ge n_0$, and we do this
by showing that it also satisfies the conditions for strong induction.
Now, the base case (i) is the same as the base case (a) for strong
induction on $P(n)$. Furthermore, $P(n)$ satisfies condition (b) for
strong induction, for if $k>n_0$ and $P(j)$ is true for all
$n_0\le j<k$, then in particular $P(k-1)$ is true, so by (ii), it
follows that $P(k)$ is true. (And note that $k-1\ge n_0$.) Thus the
induction step for strong induction also holds, and so by strong
induction, $P(n)$ is true for all $n\ge n_0$, as we required.

We now prove the converse: we assume that the principle of
mathematical induction is true, and aim to prove that the principle of
strong induction follows from this.

So let $P(n)$ be a statement, $n_0$ an integer, and assume that $P(n)$
satisfies the conditions for strong induction, namely:

(i) $P(n_0)$ is true, and

(ii) if $k>n_0$ and $P(j)$ is true for all $n_0\le j<k$, then
$P(k)$ is true.

We wish to show that $P(n)$ is true for all $n\ge n_0$. We define a
new statement $Q(n)$ for $n\ge n_0$, which states “$P(k)$ is true for
all $n_0\le k\le n$”. Then (i) is equivalent to stating that
$Q(n_0)$ is true, and we can rewrite (ii) as: if $k>n_0$ and $Q(k-1)$
is true, then $P(k)$ is true. But if $Q(k-1)$ is true and $P(k)$ is
true, then $Q(k)$ is true (as now $P(j)$ is true for all
$n_0\le j\le k$). So (ii) becomes: if $k>n_0$ and $Q(k-1)$ is true,
then $Q(k)$ is true. If we now replace $k-1$ by $k$, we get: if
$k\ge n_0$ and $Q(k)$ is true, then $Q(k+1)$ is true.

These are now the base case and induction step for the principle of
mathematical induction, and so it follows that $Q(n)$ is true for all
$n\ge n_0$. But if $Q(n)$ is true, then $P(n)$ is true (by the
definition of $Q(n)$), and so $P(n)$ is true for all $n\ge n_0$, as we
required.

This argument shows that the principle of mathematical induction and
the principle of strong induction are equivalent and can be used
interchangeably.

It is also worth noting that these principles are axioms of
arithmetic: it is impossible to “prove” the principle of
mathematical induction or the principle of strong induction, though we
have proven them to be equivalent to each other. More about them can
be found in articles on Peano arithmetic or books on mathematical
logic.

Implicit differentiation I
https://blog.d-and-j.org/mathematics/teaching/2017/02/24/implicit-differentiation.html
Julian Gilbey, 2017-02-24T09:17:18+00:00
https://blog.d-and-j.net/mathematics/teaching/2017/02/24/implicit-differentiation.html

I’ve been thinking about implicit differentiation with my colleagues
recently. How do we teach it (at high school level), and what
subtleties are involved? It started by trying to understand what we
mean by the equation

$$\begin{equation}
\frac{dy}{dx}=1\bigm/\frac{dx}{dy}
\label{eq:recip}
\end{equation}$$

(c) Where would this result be useful to them (besides in artificial
exam questions)?

In this post, I will offer some thoughts on (a) and (b), but I’m still
fairly stuck on (c).

A typical textbook explanation of the formula begins as follows:
“Suppose that $x$ is given as a function of $y$” and then goes on to
give a reasonable-looking explanation involving $\delta x$ and $\delta
y$. Some books draw a sketch to illustrate this, while others just
use algebra.

In a particular commonly-used textbook, a few examples then show how
this can be used when we have $x=f(y)$ for some function $f$. One of
them is $x=y^2$. Here we have $\frac{dx}{dy}=2y$, so
$\frac{dy}{dx}=\frac{1}{2y}$. The textbook notes that although this
could be written as $\frac{dy}{dx}=\frac{1}{2\sqrt{x}}$, it is more
common to leave it as a function of $y$, matching the form of the
original relation.

But if we sketch the graph of $x=y^2$, it becomes clear that this note
is simply incorrect.

Here, if we regard the derivative as $\frac{1}{2\sqrt{x}}$, then at
both $A(4,2)$ and $B(4,-2)$, we would obtain the derivative
$\frac{dy}{dx}=\frac{1}{4}$, which is clearly wrong. However, the
original version $\frac{1}{2y}$ would give derivatives of
$\frac{1}{4}$ at $A$ but $-\frac{1}{4}$ at $B$. (And we can’t fix
things by saying, “Well, the derivative is $\pm\frac{1}{2\sqrt{x}}$”,
because how do we decide which sign to take at any particular value of
$x$?)

So there is something inherently different about the two offered forms
of the derivative: one is given as a function of $y$ and “works”,
while the other is given as a function of $x$ and fails, and it is
clearly because we are given $x$ as a function of $y$, so
$\frac{dx}{dy}$ is a meaningful function of $y$.

Another point to note is that when we write $\frac{dy}{dx}$, we are
thinking of $y$ as a function of $x$, and then asking how the function
$y$ changes as $x$ changes. Therefore, when we write $\frac{dx}{dy}$,
we are thinking of $x$ as a function of $y$ (as it is in our case),
and then asking how $x$ changes as $y$ changes. So the original
equation \eqref{eq:recip} is actually relating the behaviour of $x$ as
a function of $y$ to the behaviour of $y$ as a function of $x$. It is
not even obvious that this makes sense, as we have seen that $y$ may
not be a function of $x$!

There is a theorem from analysis called “The Inverse Function
Theorem” which sheds light on this. I’ll briefly describe that later,
but in our context, it (roughly) tells us the following:

Consider the function $x=f(y)$, and assume that at $(x_0, y_0)$
(where $x_0=f(y_0)$), the derivative $f'(y_0)$ is non-zero. Then we
can restrict the domain of $f$ to an interval containing $y_0$ so
that it becomes invertible with inverse $y=g(x)$, say. Then $g(x)$
is differentiable and we have

$$g'(x)=\frac{1}{f'(y)},$$

where $x=f(y)$ and $y$ lies in this restricted domain.
In other notation, this equation reads

$$\frac{dy}{dx}=1\bigm/\frac{dx}{dy}.$$

So in our case of $x=y^2$, when looking at the point $A(4,2)$, we
could restrict the domain of the function to $1<y<3$ as shown here:

(We could alternatively have restricted to $y>0$, but it makes no
difference to the derivative at $A$.) Then the function is one-to-one
on the domain $1<y<3$, so it has an inverse $y=+\sqrt{x}$ there, and
we have $\frac{dy}{dx}=1\bigm/\frac{dx}{dy}$ as required. And if we
wish, we could write the derivative in terms of $x$ as
$\frac{dy}{dx}=\frac{1}{2\sqrt{x}}$. If, on the other hand, we looked
at the point $B(4,-2)$, then we could restrict the domain to $-3<y<-1$
and find that the inverse function is $y=-\sqrt{x}$. In this case,
then, $\frac{dy}{dx}=-\frac{1}{2\sqrt{x}}$. Finally, at the origin,
we have $\frac{dx}{dy}=0$: the function does not have a local inverse
there, and we do not have a value for $\frac{dy}{dx}$. (There is some
sense in which it is infinite at the origin.)
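The behaviour at $A$ and $B$ can be checked numerically. The sketch below is my own illustration in Python (the function names and the step size `h` are arbitrary choices). It estimates $\frac{dy}{dx}$ on each local inverse branch by a central difference and compares it with $1\bigm/\frac{dx}{dy}=\frac{1}{2y}$.

```python
from math import sqrt


def numerical_derivative(g, x: float, h: float = 1e-6) -> float:
    """Central-difference estimate of g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)


def upper_branch(x: float) -> float:
    """Local inverse of x = y^2 near A(4, 2)."""
    return sqrt(x)


def lower_branch(x: float) -> float:
    """Local inverse of x = y^2 near B(4, -2)."""
    return -sqrt(x)


for name, branch, y0 in (("A", upper_branch, 2.0), ("B", lower_branch, -2.0)):
    x0 = y0**2
    dx_dy = 2 * y0  # derivative of x = y^2 with respect to y
    dy_dx = numerical_derivative(branch, x0)
    # The estimate should match 1/(dx/dy) on the branch through the point.
    print(name, dy_dx, 1 / dx_dy)
```

The sign of the estimate depends on which branch we are on, which is exactly the information that the single expression $\frac{1}{2\sqrt{x}}$ loses.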

How can we explain this subtlety to students?

One way may just be to offer them examples such as the above, and ask
how we can write the derivative $\frac{dy}{dx}$.

A visual argument for the relationship between $\frac{dy}{dx}$ and
$\frac{dx}{dy}$ is the approach the textbook offered, once we
understand that we are talking about functions and their inverses.

An alternative argument, which is more algebraic, is to use the chain
rule: if $y=g(x)$ is the (local) inverse of $x=f(y)$, then we have
$g(f(y))=y$. If we differentiate both sides with respect to $y$, we
obtain

$$g'(f(y))\,f'(y)=1.$$

If we write $x=f(y)$, then this becomes our familiar $g'(x).f'(y)=1$,
or $g'(x)=1/f'(y)$.

(It may also be worth noting that $x=f(y)$ may have an inverse even if
$f'(y_0)=0$; for example, $x=y^3$ has the inverse $y=\sqrt[3]{x}$, but
this is not differentiable at the origin.)

This still doesn’t give a reason for why students might want to use
this result! And of course, any time that we want to find
$\frac{dy}{dx}$ and we are given $x$ as a function of $y$, we can
differentiate both sides with respect to $x$, using implicit
differentiation. And that renders this result somewhat pointless for
school calculus. So any thoughts on why students might find a need
for this would be welcomed!

The Inverse Function Theorem

I mentioned the Inverse Function Theorem earlier. Here’s a statement
of the theorem from Tom Apostol’s “Mathematical Analysis” (2nd
edition).

Theorem 13.6 (The Inverse Function Theorem) Assume
$\mathbf{f}=(f_1,\dots,f_n)\in C'$ (i.e., continuously
differentiable) on an open set $S$ in $\mathbb{R}^n$, and let
$T=\mathbf{f}(S)$. If the Jacobian determinant
$J_{\mathbf{f}}(\mathbf{a})\ne 0$ for some point $\mathbf{a}$ in
$S$, then there are two open sets $X\subseteq S$ and $Y\subseteq T$
and a uniquely determined function $\mathbf{g}$ such that

(a) $\mathbf{a}\in X$ and $\mathbf{f}(\mathbf{a})\in Y$,

(b) $Y=\mathbf{f}(X)$,

(c) $\mathbf{f}$ is one-to-one on $X$,

(d) $\mathbf{g}$ is defined on $Y$, $\mathbf{g}(Y)=X$, and
$\mathbf{g}[\mathbf{f}(\mathbf{x})]=\mathbf{x}$ for every
$\mathbf{x}$ in $X$,

(e) $\mathbf{g}\in C'$ on $Y$.

(I won’t attempt to explain the technical terms here, as this post is
too long already; the internet has much on these for the interested
reader.)

We can apply this theorem to our context. We are dealing initially
with a function $x=f(y)$, so we take $n=1$ and let
$\mathbf{f}=(f_1)=(f)$. Our functions at high school level are almost
all well-behaved (that is, smooth), except perhaps at an occasional
point, so we will just ignore the $C'$ issue and take $S$ to be
the domain of the function $f$ and $T$ to be its range.

The Jacobian determinant for our one-dimensional function $f$ is just
$f'(y)$, so this theorem simplifies to the (less precisely
stated) result we gave above, noting though that the $\mathbf{x}$ of
the theorem is our $y$, and $\mathbf{a}$ is our $y_0$. The
relationship between the derivatives follows from (d) using the chain
rule, as we described above.

Comments on Ellenberg and Gijswijt's capset paper
https://blog.d-and-j.org/mathematics/2017/01/15/capset.html
Julian Gilbey, 2017-01-15T09:17:18+00:00
https://blog.d-and-j.net/mathematics/2017/01/15/capset.html

I recently had the fun of reading
Ellenberg and Gijswijt’s paper on
the capset problem, where they bound the size of a subset of
$\mathbb{F}_q^n$ with no three terms in arithmetic progression by
$c^n$ with $c<q$.

The paper is beautifully written, and amazingly needs only relatively
elementary undergraduate algebra. (It is generalised to the Galois
field $\mathbb{F}_q$, but if we take $q$ to be prime, then even that
is unnecessary to understand the argument.)

I was somewhat stuck on two small points at the start of the proof of
Proposition 4, and thought I would share my realisation of the
argument here for others’ benefit.

The first is the assertion in the first paragraph that “The space $V$
of polynomials in $S_n^d$ vanishing on the complement of $-\gamma A$
has dimension at least $m_d-q^n+|A|$”. For simplicity, write $B$ for
the complement of $-\gamma A$, so $|B|=q^n-|A|$ (assuming that
$\gamma\ne0$). Considering now the evaluation function $e:S_n\to
\mathbb{F}_q^{\mathbb{F}_q^n}$ described before Proposition 2, we can
look at the restriction $e_d$ of $e$ to $S_n^d$, and then take the
restriction of the image of $e_d$ to $B$. In other words, if $p\in
S_n^d$, then $e_d(p)$ is a function $\mathbb{F}_q^n\to\mathbb{F}_q$;
we then take the restriction of this: $e_d(p)|_B$. This composition
$e_d|_B$ therefore gives us a linear map $S_n^d\to\mathbb{F}_q^B$,
from a vector space of dimension $m_d$ to one of dimension $|B|$. The
required space $V$ vanishing on $B$ is the kernel of this linear map,
which therefore has dimension at least $m_d-|B|$, as required.

The second point is the assertion in the next paragraph that if
$|\Sigma|<\dim V$, then there is a non-zero $Q\in V$ vanishing on
$\Sigma$. The argument for this is fairly similar. Let $p_1$, $p_2$,
…, $p_k$ be a basis for $V$, where $k>|\Sigma|$. Then under the
linear isomorphism $e$, the functions on $\mathbb{F}_q^n$ given by
$e(p_1)$, …, $e(p_k)$ are linearly independent. But now restricting
them to functions on $\Sigma$, a space of dimension $|\Sigma|$,
necessarily gives a linear dependence between the restricted functions
(as $k>|\Sigma|$). So this gives a non-trivial linear combination of
these functions which will be zero on $\Sigma$ but is not the zero
function on the whole of $\mathbb{F}_q^n$, as they are linearly
independent in $V$.