This is a very long note. I think you should read it all. But just in case you don’t, here’s a summary of what I’m going to say.

SUMMARY

(a) The students who get the highest grades are the ones who do the most work. Ultimately it is their job to do the work. If you have to make them do it, or if you try to get me to make them do it, then we all need to sit down and have a chat.

(b) But you can get a B grade in Physics with only about 50% of the marks. You’ll need 70% for a B grade in Maths, but the easy modules count equally with the harder ones, so it’s possible to get a B grade overall, despite getting an E in the hardest module. It’s very hard to fail to get into any university at all, and I’m not persuaded that which university it is matters as much as people seem to think.

(c) It’s all about using past exam papers and mark schemes. It is not cheating to use a mark scheme.

WORK

In my 26 years’ experience as a school teacher and tutor, the biggest single predictor of the final grade a student ends up with is how much work they did. My job as a tutor is to help my students understand the material, but ultimate responsibility for doing the necessary work lies with the student.

I’m pretty old school about these things. As we approach April, when I hear stories of students going on holiday, going on nights out, still doing their part-time job, and so on, I roll my eyes. Although I’ll soften this statement a little later, A levels are hard, and you don’t get high grades and get into good universities unless you make sacrifices.

To the parents who are reading this I send a contradictory message. On the one hand, if your son or daughter is still going out every Saturday night and is going skiing at Easter I am going to ask myself “what the hell are these people thinking?” On the other hand, your son or daughter is either 18 or nearly 18 — they need to take some responsibility for themselves.

UNIVERSITY

My old school side is actually pretty comfortable with people failing exams. If they have to be dragged, kicking and screaming, to a desk every day then you have to ask what the point is. Maybe they’ll get into university, but what then? It’s not going to get any easier and you’ve got to let go eventually. At some point they have to decide what they want and understand what this will take. Failing to get into university may well be the right thing. And, if it isn’t, there are plenty of options to turn things round. I’ve seen this happen. More than once. So stop worrying so much.

All this being said, it is actually a lot easier to get into university than many people fear. Universities are motivated by money, and each student is a big old walking bag of cash. Someone will take your kid. And I’m not persuaded that if this someone is not a Russell Group university then the whole thing is a disaster. Many people live happy, productive — and even wealthy, if that’s important to you — lives despite not going to a RG university. Indeed, RG universities are not even necessarily the best ones. And they may not be the best ones for your kid.

I often use the example of my friend Robin. He did poorly in his A levels and went to a university you’ve never even heard of. I did superbly in my A levels and went to Cambridge. He’s now got a PhD in Physics, a high-paying job that involves endless travel around the world with stays in luxury hotels, a happy marriage to his childhood sweetheart, five kids (one of whom is now at Oxford), and a lovely house in the country with an orchard. I live on my own with a dog on a council estate and cry myself to sleep every night wondering where it all went wrong.

REVISION STRATEGY

What does “work” mean? Is there a strategy? I believe there is — and here’s what I suggest.

Professional musicians will tell you that they do not practice for concerts by playing entire pieces over and over again — they focus on individual elements, perfecting each by endless repetition. To an extent this approach may be better employed during the first year and a half of an A level course. But it may still be necessary for some topics during the final revision period.

Whether a student wants to take a topic-based or entire paper-based approach, the key is to use actual past exam questions. I do not at all recommend textbooks or even revision guides. Students will be tested on the type of questions the exam board sets. Only past examples of these provide the most reliable guide to what they’re going to face.

There are two basic issues: what to revise and how to revise.

WHAT TO REVISE

A mistake I think a lot of students (and their teachers) make is to aim to get 100% in the exam. By which I mean, they cover all the topics and try to perfect all of them. For students aiming for an A* this is essential: they will require an average of 90% in (some of) the papers.

But, for the rest, aiming for 100% is not only unnecessary, it may be undesirable. Some topics are harder than others. Some topics are almost guaranteed money in the bank. Others are conceptually very challenging. My belief is that there is an argument that some students may need to identify topics to give up on. Let me try to justify this.

I'll start with Physics. The new Physics A level is clearly harder than the old modular one. But the grade boundaries are surprisingly low. Here are the figures for June 2017

You'll see that in Paper 3 Section BD, which includes tricky material like special relativity, a B grade can be gained with a mark as low as 43%. Even an A grade only requires 55%. That's barely more than half the marks. You can get an A grade in Paper 3 Section A with under half the marks.

Maths is slightly more complicated. A candidate’s actual mark (known as the raw mark), which is out of 72 or 75 (depending on the exam board), is converted into a uniform mark, which is out of 100. The conversion is fiddled each year to give the grade distribution the exam board considers fair compared with previous years. In Edexcel C3, for example, an A grade typically requires a raw mark of around 60 out of 75. But in June 2013 the paper was vastly harder than it had ever been before (the candidates filled the internet with Hitler videos in response) and this dropped to 50 out of 75. Both of these raw marks were translated into 80 uniform marks, which is the fixed boundary for an A grade.

An additional complication arises with Maths. The final overall grade is simply based on the total of the uniform marks obtained in the six modules. The easiest module, C1, counts equally with the hardest ones, like C3 and C4. It is possible to get, say, a B grade overall, despite getting an E grade in one of the hard modules. (I’ve had students who’ve done this.) By now, most students should be aiming for at least 90 in C1 — which is “too many” marks for even an A grade. So their score in C3 or C4 can be commensurately lower.
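The way the totals work can be sketched in a few lines of Python. This is only an illustration: it assumes each of the six modules is out of 100 uniform marks, it takes the 80 (A) and 70 (B) boundaries from the text and assumes 60, 50 and 40 for C, D and E, and the module names S1 and M1 and all of the marks are hypothetical.

```python
# Grade boundaries as a percentage of the uniform marks available.
# 80 for an A and 70 for a B come from the text; 60/50/40 are assumptions.
BOUNDARIES = [("A", 80), ("B", 70), ("C", 60), ("D", 50), ("E", 40)]


def grade(uniform_marks, max_marks):
    """Return the letter grade for a uniform-mark total out of max_marks."""
    percentage = 100 * uniform_marks / max_marks
    for letter, boundary in BOUNDARIES:
        if percentage >= boundary:
            return letter
    return "U"


# Hypothetical student: strong in the easy modules, weak in C4.
modules = {"C1": 92, "C2": 85, "C3": 62, "C4": 45, "S1": 72, "M1": 70}

for name, mark in modules.items():
    print(name, mark, grade(mark, 100))

total = sum(modules.values())
maximum = 100 * len(modules)
print("Overall:", total, "out of", maximum, "->", grade(total, maximum))
```

With these made-up marks the student gets an E in C4 but 426 out of 600 overall, which is a B.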

In other words, a student who wants a B grade overall can, in Physics, do so by getting an average of about 50% across the three papers. Putting it another way, they can get nearly half of the questions wrong — or not even attempt them at all — and they’ll still end up with a B. In Maths, they’ll need 70%, but this can be achieved with very high marks in the easy modules, and much lower marks in the harder ones.

So it seems to me that it is entirely legitimate to focus most of the energy on the topics a student already feels most confident with, and perhaps abandon those topics that they just can’t seem to understand. Perfection is not required — and it is soul-destroying and wasteful of time to pursue it.

HOW TO REVISE

The broad strategy is what I call the “three sheets” approach. The student should have (a) the exam paper (b) a blank sheet of paper and (c) the mark scheme.

They should look at question 1 and ask, “Do I know how to do this?”

If the answer is “yes”, they should do it, and then check (and mark) their answer using the mark scheme.

If the answer is “not sure”, they should look at the mark scheme to see if that gives them enough of a hint to get started. If it does, they should do it, and then check (and mark) their answer using the mark scheme. This is not cheating. Unlike other subjects, it is possible to read a maths or physics question and not be able to write anything at all. This can waste a lot of time, and can be very dispiriting. The mark scheme can provide the spark that allows a student to get started. Without this approach, they may just end up with a whole lot of blank answers, which is of no use to anyone, having sat staring at the wall for an hour and a half not knowing what to do.

If the answer is “no”, they should highlight the question for discussion with me during the next lesson. If the problem is something quick, they should send me a message there and then so that I can deal with the problem as soon as possible, rather than them having to wait a week.

For Physics students, I would add one extra step. Every time they come across a question asking for a factual answer (“What are the two postulates of Special Relativity?”) they should copy down the answer given in the mark scheme word-for-word in a special notebook and learn it off by heart.

Oh, and if my students taking Maths don’t learn the formulas on my website at https://www.profmatt.com/cribsheets/ then I am likely to become violent. There’s just no excuse for it. Ask your kid now what the sine of 30 degrees is. If they don't immediately say “a half”, then they’re just bone idle and I’m not taking the blame when they fail. Which they will. They learnt the alphabet and their times tables when they were younger and sweeter, so they can learn the maths formulas now. It’s not beyond the wit of man.

While I’m addressing the parents, I want to repeat something. Using mark schemes is not only not cheating, it is an essential part of the preparation. Students need to know how papers are marked. They need to understand what the examiners are looking for. If you’re worried that your son or daughter is just going to sit there copying out answers from the mark scheme then, yes, that’s cheating, though it’s not an entirely worthless exercise — I got through A level Physics by copying Andrew Pilbeam’s homework for two years. (It worked out OK. I got the Physics prize for my work on the Helmholtz coils, which I did all by myself, and I got an A grade in the actual A level without copying from him.) But, frankly, if that’s what they’re doing then you’ve got bigger worries. If three months before their A levels they’re still trying to con themselves and you into thinking everything’s OK, then you need to have a long conversation with them.

My final remarks address what happens during the exam itself.

THE EXAM

I have a few rules for exams.

First, if you don’t know what to do, do something. You don’t lose marks for things that are wrong. And no-one reading your answer knows who you are, so there’s no need to be embarrassed if you think you’re writing nonsense. Besides, you can actually get marks for things that are wrong. True story.

Second, if you don’t know what that something is, ask yourself, “what kinds of things do I normally do in this situation?” What’s the topic the question is about? In Maths, the material is split into modules. You should know what topics are in each module. So you must have some idea what’s going on. They’ve given you a “y =” formula? Differentiate it. Factorise it. Shove a number into it. Just do something.

Third, never give up on a question half-way through. If you can’t do part (a) it doesn’t mean you can’t do part (b). Either part (a) is irrelevant to part (b). Or they’ve told you the answer to part (a) so you can use it in part (b). Or (honestly) you can make up an answer for part (a) and use it in part (b).

But do give up on a question if it’s just taking too much time. It’s not worth it. If you’ve written more than about four lines of algebra, you’re probably doing the wrong thing. Give it up. Move on. Remember: you don’t need 100%. Or even close.

There is very rarely an excuse for writing nothing at all. This is especially true in Physics for the wordy questions. Often some of the marks are for stating things which are incredibly obvious. And there are usually more things to say than there are marks awarded — in other words, you can give an incomplete answer and still get full marks. You’ll often see in the mark scheme four or five points you could make, yet you only need to have made three of them to get 100% in the question.

If you have any questions about any of this, I’ll be very happy to discuss it with you.

I was recently invited by Talksport Radio to be their “maths expert” to speak on the topic of coin tossing, in the context of determining who bats first in a cricket match. Here’s a fuller version of what I said.

One of the best-known studies of coin tossing (at least amongst statisticians) was carried out by John Kerrich. He was interned by the Nazis in Belgium during World War II. Being a mathematics lecturer he took the opportunity to carry out a series of experiments in probability, including tossing a coin 10,000 times.

He noted two interesting phenomena.

First, the difference between the number of heads you expect to get and the number of heads you actually get tends to increase as the number of coin tosses increases. In his first hundred tosses, he got 44 heads, which is 6 fewer than you would expect. After ten thousand tosses, he had got 5,067 heads, which is 67 more than you would expect, more than ten times as many.

Second, although the absolute difference increases, the relative difference decreases. In other words, the proportion of heads tossed becomes closer to what you expect as the number of tosses increases. After the first hundred tosses, he’d had 44% heads, which is 6 percentage points below what you’d expect. But after the full ten thousand tosses, he’d had 50.67% heads, which is only 0.67 percentage points above what you’d expect.
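Kerrich's two observations can be checked with a short Python sketch. It uses only the two pairs of figures quoted above; the function name is made up for illustration.

```python
def deviation_from_expected(heads, tosses):
    """Return (absolute, relative) difference from the expected 50% heads.

    absolute is measured in heads, relative in percentage points.
    """
    expected = tosses / 2
    absolute = heads - expected
    relative = 100 * absolute / tosses
    return absolute, relative


# Kerrich's figures as quoted in the text: (heads, tosses).
kerrich = [(44, 100), (5067, 10000)]

for heads, tosses in kerrich:
    absolute, relative = deviation_from_expected(heads, tosses)
    print(f"{tosses} tosses: {absolute:+.0f} heads, {relative:+.2f} percentage points")
```

The absolute gap grows from 6 heads to 67 heads, while the relative gap shrinks from 6 percentage points to 0.67.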

The graph above, which is based on Kerrich’s data, explains why casinos are successful businesses. Consider someone playing blackjack in a casino. The odds of winning any particular hand are roughly 49:50, so it’s almost like tossing a coin. A typical player might play a hundred hands. So he’ll be operating on the left-hand side of the graph. Here wild, unpredictable swings are very typical. This is exciting: you might lose a lot and you might win a lot.

But the casino is playing against many different players and pretty much 24 hours a day. So they are playing thousands of hands and are thus operating on the right-hand side of the graph. Here the trend has clearly established itself. There is little variation: despite the random nature of any given hand, over thousands of hands the pattern is clear – steady, virtually guaranteed winnings for the casino.

So in the short-term, the outcome of a coin toss is inherently unpredictable, but in the long-run the overall outcome is very predictable.

This raises the question: can you consistently toss heads rather than tails? An experiment carried out in 2009 set out to test this. Thirteen otolaryngologists (doctors who specialise in the ear, nose and throat) were asked to toss a coin 300 times each, with the intention of tossing as many heads as possible. All of them tossed more heads than tails: the least being 160 heads (10 more than expected) and the greatest being 203 heads (53 more than expected).

There is of course a natural variation in the number of heads you would actually get if you toss a coin 300 times. You wouldn’t expect exactly 150 heads every single time. But even taking this into account, five of the 13 doctors got results that you would expect to see less than one time in a hundred. Pretty compelling evidence that you can bias the outcome of a coin toss in the way you toss the coin.

Another way of achieving this would be to use a biased coin: one which is weighted so as to give more heads than tails. There was much excitement in 2002 when a group of Polish statisticians claimed that the Belgian one-euro coin was biased in exactly this way. They claimed that in 250 spins (they preferred spinning the coin rather than tossing it) heads came up 140 times, which is 56% of the time. They believed the heavy embossed image of the King of Belgium on the heads side of the coin was responsible for the bias. Sadly, however, a result such as 140 heads out of 250 is not particularly unlikely for a perfectly fair coin. This result was not as significant as that achieved by the otolaryngologists.
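To get a feel for how unlikely these results are, here is a rough Python sketch that computes the exact chance of a fair coin giving at least a certain number of heads. It assumes a one-sided test (only "too many heads" counts), which may not be the test the original studies used.

```python
from math import comb


def prob_at_least(heads, tosses):
    """Exact probability that a fair coin gives `heads` or more heads."""
    favourable = sum(comb(tosses, k) for k in range(heads, tosses + 1))
    return favourable / 2 ** tosses


# Figures quoted in the text: (heads, tosses, description).
cases = [
    (160, 300, "lowest-scoring doctor"),
    (203, 300, "highest-scoring doctor"),
    (140, 250, "Belgian euro spins"),
]

for heads, tosses, label in cases:
    print(f"{label}: P(at least {heads} of {tosses}) = {prob_at_least(heads, tosses):.2g}")
```

The top doctor's 203 heads is vanishingly unlikely for a fair coin tossed fairly, whereas 140 or more heads in 250 spins comes out at a few per cent: unusual, but not startling.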

The mother of one of my students asked me for advice on how many significant figures her son should give in his physics exams. Since it was raining, I decided to answer her question very fully . . .

So far as Maths is concerned, the ruling is that you should give three significant figures unless the question specifies otherwise. In general this is interpreted to mean that an answer is acceptable if it is capable of being rounded to three significant figures, even if it hasn’t actually been so rounded.

Thus if the question asks for the area of a circle of radius 1, then

π, which is exact, is OK

3.14 is OK, since it has been rounded to three significant figures

3.14159 is also OK, since it could be rounded to 3.14, even though it hasn’t been

but 3.1 is not OK, since it cannot be rounded to 3.14
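The Maths rule can be sketched in Python as a check on whether an answer rounds to the same three significant figures as the exact value. This is an illustration of the rule as described above, not an exam board's actual marking procedure, and the function names are invented.

```python
import math


def round_sig(x, figures=3):
    """Round x to the given number of significant figures."""
    return float(f"{x:.{figures - 1}e}")


def acceptable(answer, exact, figures=3):
    """True if the answer rounds to the same value as the exact answer."""
    return round_sig(answer, figures) == round_sig(exact, figures)


# Area of a circle of radius 1: the exact answer is pi.
for answer in (math.pi, 3.14, 3.14159, 3.1):
    verdict = "OK" if acceptable(answer, math.pi) else "not OK"
    print(answer, verdict)
```

Only 3.1 is rejected, because to three significant figures it is 3.10 rather than 3.14.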

So far as Physics is concerned, the ruling is that you cannot give more significant figures in your answer than were given to you in the question, since you cannot magic up accuracy where it didn’t already exist.

No circle has a radius of exactly 1m. If I say a circle has radius 1m, I may mean that it has a radius of 1m to the nearest cm, so that its actual radius could be anywhere between 0.995m and 1.005m, giving an area somewhere between 3.11... and 3.17... square metres. Since these two values only agree to one significant figure, I should strictly speaking only give the area as 3 square metres.
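The range of possible areas can be checked with a few lines of Python. The function name and the choice of a 0.005m half-width (that is, "to the nearest cm") are taken from the example above.

```python
import math


def area_range(radius, half_width):
    """Smallest and largest circle area for a radius known to +/- half_width."""
    low = math.pi * (radius - half_width) ** 2
    high = math.pi * (radius + half_width) ** 2
    return low, high


low, high = area_range(1.0, 0.005)
print(f"Area lies between {low:.4f} and {high:.4f} square metres")
```

The two ends of the range share only their first digit, which is why one significant figure is all that can strictly be claimed.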

Examiners don’t expect this kind of analysis, but it does illustrate the importance of not giving more significant figures in an answer than were available in the data used to calculate it.

Unfortunately, physics examiners don’t seem particularly consistent (even in the same question!) with how they deal with this. The key words appear to be “giving your answer to an appropriate degree of accuracy.” When these words appear in a question (and they don’t always appear) then you need to take particular care not to give too many significant figures.

The difficulty is that it’s not always obvious how accurately information in a question has been given. If you write “1m”, the convention is that this is correct to the nearest whole number. If you’d measured it to the nearest centimetre and found that it came out to 1m, you would write it as 1.00m.

The problem with writing “1m” is that, although this implies you’ve given it to the nearest whole number, you can’t tell how many significant figures you’re working to because, for example, you might have another measurement of 12m in the same set of data. That’s presumably measured to the nearest whole number, too, but it has two significant figures, whereas 1m only has one significant figure to offer! So if you have a rectangle with sides 1m and 12m, how many significant figures are you using? In a sense you’ve contradicted yourself because if the answer is “two significant figures” then you should have written 1.0m, but then that would imply you’ve measured that side more accurately (to the nearest 10cm) than the other side. And the answer can’t be “one significant figure” because you should have written 10m and not 12m.

Don’t even get me started on 10m! Is that one significant figure (because the actual measurement was 12m and you rounded it) or two significant figures (because the actual measurement was 10.2m and you rounded it)? You can’t tell without someone making it explicit.

So the contradictory advice is: in Maths, it’s usually OK to give too many, but in Physics, it’s sometimes wrong to give too many. In Maths it’s usually wrong to give too few, but in Physics it’s better to give too few than too many, unless you get carried away and only give one significant figure, even though one significant figure may be all that you can claim in the real world.

The resolution of the apparent inconsistency is that mathematicians argue that all of their numbers are exact, so that rounding is more for convenience than anything else. Indeed, pure mathematicians always prefer exact answers where possible – for example they'd rather give the area of the circle as π – even though such answers are impractical in the real world.

Physicists argue that no measurement is exact, so that rounding is important so as not to overstate what we really know.

For mathematicians, there exists such a thing as a circle and it has an exact area. For physicists, there is no such thing as a circle, since you cannot construct a perfect one.

The physicists’ approach is actually important in experimental physics. If you take two measurements and use them in a calculation you may end up with two (slightly) different answers. That difference may be telling you that something interesting is happening physically. Or it may simply be because the two initial measurements differed. Even two apparently similar objects will have slightly varying dimensions, for example: not all pound coins are exactly the same. If you are careful with how you deal with significant figures, you can distinguish between these two cases – natural variation and interesting physical phenomenon – and so you can tell if the difference in your calculations is indicative of something physically significant.

The role of Statistics in all this is to help you decide if – for example – two averages are different because of the natural variation in any set of measurements, or because the sizes of the underlying quantities you are measuring are, in fact, different from each other. (One-pound coins vary slightly in mass, but they have a very different mass from two-pound coins.) We call the latter a “significant difference”. Unhelpfully, significant differences are not necessarily significant in practical terms. Statistical significance merely indicates that two things do not have the same value; but the difference between them may be trivial and not of practical importance.

For example, it may be that there is a statistically significant difference between the average global temperature measured in 1995 and the average global temperature measured in 2010. (The BBC reported a rise of 0.19°C, describing it as “significant”.) There will certainly be a difference because each average will necessarily come from a finite number of measurements, and these will necessarily vary because everything varies, up and down, over time. Statisticians may establish that the difference is statistically significant, but all that this will mean is that the true average global temperatures were not the same. (The averages we calculated from our measurements will merely be estimates of the true averages, since to find these we would need infinitely much perfect data.) Of course, the true average global temperatures may not be the same because they differ by 0.19°C – to statisticians this is a difference. In practical terms, this difference may only have a negligible effect on the biological, chemical and physical processes on the planet, and so it would not be regarded as practically significant.

Some commentators are comparing the upcoming general election with the one in 1992. The opinion polls predicted a hung parliament or a Labour victory; in fact the Conservatives won, with a majority of 21. The reason? People lied to opinion pollsters: they were embarrassed to admit their support for the Tories.

This is an example of a more general issue in data collection: data quality.

Another political example would be the notorious Literary Digest poll of 1936. Despite over two million responses the poll wrongly predicted that Governor Landon of Kansas would win the presidency. Unknown pollster George Gallup, using a sample of only fifty thousand, correctly forecast that Franklin Roosevelt would be re-elected; indeed he won by a landslide, taking 46 of the 48 states.

In this case, the flaw had been the people the Literary Digest had polled. Their sample was based in large part on people who owned telephones and people who owned cars – both luxury items in the 1930s. Wealthy people disproportionately support the Republican party, whose candidate was Landon.

What we learn from the Literary Digest poll is that quantity is no guarantee of quality. In this case the poll was flawed by bias.

Another issue that can impact survey sampling is non-response. What do you do if a member of your sample refuses to answer one of your questions?

In 2010 the Office for National Statistics published the results of its research into how many British adults are gay. While the Kinsey Reports of the late 1940s and early 1950s are often used to support the claim that around 10% of the population is gay, the commonly accepted figure more recently has been somewhere in the region of 5-7%.

The ONS data reduces the figure even further, putting it at around 1.5%.

So which is it?

The problem with the ONS survey is that only 96% of respondents produced a ‘valid response’ to the question about their sexuality. In other words, 4% didn’t produce a valid response. What does this mean? Why would someone not produce a valid response? Perhaps because they’re gay and don’t want to admit it to a researcher. (The ONS survey comprised personal interviews.) If all of those 4% of non-respondents were gay then the true proportion of gay people jumps from 1.5% to 5.5%, which is within the previously accepted range. If only half of them were gay, this still increases the proportion to 3.5%, more than doubling the estimate of the total number of gay adults in the UK.
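The effect of the non-respondents can be laid out as a tiny Python sketch. It follows the simplification used above of adding the percentages directly, and the function name and the idea of a "share" parameter are just for illustration.

```python
def adjusted_percentage(reported, non_response, share_gay):
    """Percentage of gay adults if a given share of non-respondents are gay.

    reported and non_response are percentages of all respondents;
    share_gay is a fraction between 0 and 1.
    """
    return reported + non_response * share_gay


# ONS figures quoted in the text: 1.5% reported, 4% gave no valid response.
for share in (0.0, 0.5, 1.0):
    print(f"{share:.0%} of non-respondents gay -> {adjusted_percentage(1.5, 4.0, share)}%")
```

The estimate ranges from 1.5% to 5.5% depending on an assumption nobody can check, which is the heart of the problem.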

Of course, we can’t actually know. All we have is an invalid response and it’s impossible to interpret that. But because the proportion of non-respondents is of the same order of magnitude as the proportion of people identifying as gay (it’s actually more than double the size) we have to treat the 1.5% statistic with considerable caution.

Does it really matter? Well, yes, because social policy is shaped by data. Accurate data helps governments and others decide what legislation to pass, what priorities to make, what funds to distribute. The 5-7% figure was quoted by the Blair government when it was proposing the introduction of civil partnerships. Other parties are interested, too. Advertisers, perhaps lured by the so-called ‘pink pound’, will be very interested to know how large the market for a product targeted at gay people is.

So how can we collect data on sensitive topics such as sexuality? How big a problem is cannabis use amongst teenagers, for example? Do you smoke cannabis? Would you tell a researcher? For the ONS survey, respondents were asked to say ‘stop’ as soon as the interviewer read out the sexual orientation that applied to them. (Someone who is not openly gay might not wish to answer the direct question ‘What is your sexuality?’ with the answer ‘gay’ for fear that someone they know might overhear them. The ONS ‘stop’ method was designed to avoid this problem.) But it still required the respondents to be open about their sexuality to the interviewers. (In the same way that respondents to pollsters for the 1992 election had to ‘admit’ to supporting the Conservative party to someone standing right in front of them.)

The ideal approach is to make the respondent feel absolutely certain that their answer is given in complete confidence. There is, in fact, one way to achieve this in which the respondent can feel certain that no-one at all knows their answer, not even the researchers. It works like this.

Each respondent is given two unsealed envelopes marked heads and tails, a piece of paper with the words yes and no printed on it, a pencil, a coin and a die. The respondent is free to examine the contents of each envelope before proceeding.

The respondent then goes into a private room where he tosses the coin and rolls the die. If the coin comes down heads, he opens the heads envelope; if it comes down tails he opens the tails envelope. Inside the heads envelope is the question ‘Are you gay?’; inside the tails envelope is the question ‘Does the die show an even number?’ The respondent indicates his answer by circling the word yes or the word no on the piece of paper. He then folds it in half so that his response cannot be seen.

Finally, the respondent emerges from the private room and puts the piece of paper into a sealed box along with other similar pieces of paper from other respondents.

This elaborate procedure should ensure that the respondent answers the question honestly. He can feel safe in doing so because no-one can know which question he is answering. If his response is ‘yes’ that could simply mean that he is answering the entirely innocuous question ‘Does the die show an even number?’ Further, there is no way to link his answer to him because it is in a box of identical slips that are just the same as his; indeed, since he didn’t write the word out himself, there’s not even a link via his handwriting to his answer.

But how, then, do we interpret the results if we don’t know which of the two questions has been answered on any given slip?

Probability theory comes to the rescue. On average, about half of the respondents will get a tail when they toss the coin, so half of the respondents will be answering the question about the die. Of these, about half will have got an even number when they rolled the die and these people will answer ‘yes’. So about one quarter of all the respondents will answer ‘yes’ simply because of the die.

If the percentage of ‘yes’ answers is more than one quarter then the excess must be due to respondents answering ‘yes’ to the question ‘Are you gay?’

For example, suppose that there are 100 respondents and 30 of them say ‘yes’. Then (on average) 25 of these are people who got a tail and an even number. The remaining five got a head and are saying ‘yes’ they are gay. Since half of the respondents will have been answering the question ‘Are you gay?’, the proportion of gay people is 5 out of 50, or 10%.
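The calculation in this example can be written as a short Python function. It is a sketch of the reasoning above, assuming a fair coin, a fair die and honest answers; the function name is made up.

```python
def estimate_proportion(yes_count, respondents):
    """Estimate the proportion answering 'yes' to the sensitive question."""
    innocuous_yes = respondents / 4    # expected tails-and-even 'yes' answers
    sensitive_group = respondents / 2  # expected number who tossed heads
    return (yes_count - innocuous_yes) / sensitive_group


print(estimate_proportion(30, 100))  # the worked example: 5 out of 50
```

For 30 'yes' answers out of 100 this gives 0.1, the 10% worked out above.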

Of course those figures are approximate because if you toss a coin 100 times you don’t get exactly 50 heads every time. So to feel confident in the results, you have to ask many more people than 100. The ONS survey was based on interviews with over 450,000 people. Had they used this methodology, their results would carry a far higher degree of confidence.
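A rough simulation in Python shows why the sample size matters. Everything here is hypothetical: it assumes a true proportion of 5%, a fair coin and die, and that every respondent answers honestly.

```python
import random


def simulate_survey(respondents, true_proportion, rng):
    """Run the two-envelope survey once and return the estimated proportion."""
    yes_count = 0
    for _ in range(respondents):
        if rng.random() < 0.5:
            # Heads: the respondent answers the sensitive question.
            if rng.random() < true_proportion:
                yes_count += 1
        else:
            # Tails: the respondent answers the question about the die.
            if rng.randint(1, 6) % 2 == 0:
                yes_count += 1
    # Subtract the expected quarter of die-driven 'yes' answers,
    # then divide by the expected half who got the sensitive question.
    return (yes_count - respondents / 4) / (respondents / 2)


rng = random.Random(2010)  # fixed seed so the run is repeatable
for n in (100, 10_000, 450_000):
    print(n, "respondents ->", round(simulate_survey(n, 0.05, rng), 4))
```

With only 100 respondents the estimate can land a long way from 5% (it can even come out negative); with hundreds of thousands it settles very close to the true figure.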

The prime minister is to be chosen by 15 electors. Each is asked to order their preference – first, second and third choice – from three candidates: Nick Clegg, David Cameron and Ed Miliband.

The results are as follows.

Six vote for Cameron first, Clegg second, Miliband third.

Five vote for Miliband first, Clegg second, Cameron third.

And four vote for Clegg first, Miliband second and Cameron third.

Who wins?

Under the UK’s current system, first-past-the-post, Cameron wins – he was the first choice of more voters (six) than anyone else.

But is this fair? Nine of the fifteen voters not only didn’t place him first, they placed him third.

So who else do we choose? Ten of the voters preferred Clegg to Miliband, and nine of the voters preferred Clegg to Cameron. So Clegg wins.

But is this fair? Looking at the first choices, Cameron won six votes and Miliband won five. Clegg only won four, so let’s eliminate him and have a run-off between Cameron and Miliband. In the run-off, six voters would choose Cameron (because six of them placed him ahead of Miliband) but nine would choose Miliband (because they had placed Miliband ahead of Cameron). So Miliband wins.

Who wins the election depends on the system you choose to run it. You can make anyone be the winner with the ‘right’ system.
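The three counts can be reproduced with a short Python sketch using the fifteen ballots above. The function names are invented, and the "run-off" here is the simple version described in the text: drop whoever has the fewest first choices, then recount.

```python
from collections import Counter

# (number of voters, ranking from first choice to last)
BALLOTS = [
    (6, ["Cameron", "Clegg", "Miliband"]),
    (5, ["Miliband", "Clegg", "Cameron"]),
    (4, ["Clegg", "Miliband", "Cameron"]),
]


def first_choice_totals(ballots, candidates):
    """Count each ballot for its highest-ranked candidate still standing."""
    totals = Counter({c: 0 for c in candidates})
    for count, ranking in ballots:
        top = next(c for c in ranking if c in candidates)
        totals[top] += count
    return totals


def first_past_the_post(ballots):
    candidates = set(ballots[0][1])
    return first_choice_totals(ballots, candidates).most_common(1)[0][0]


def pairwise_winner(ballots):
    """Candidate who beats every rival head-to-head, or None if nobody does."""
    candidates = ballots[0][1]
    for a in candidates:
        if all(
            sum(n for n, r in ballots if r.index(a) < r.index(b))
            > sum(n for n, r in ballots if r.index(b) < r.index(a))
            for b in candidates
            if b != a
        ):
            return a
    return None


def run_off(ballots):
    candidates = set(ballots[0][1])
    totals = first_choice_totals(ballots, candidates)
    loser = min(totals, key=totals.get)  # fewest first choices
    return first_choice_totals(ballots, candidates - {loser}).most_common(1)[0][0]


print("First past the post:", first_past_the_post(BALLOTS))
print("Head-to-head:", pairwise_winner(BALLOTS))
print("Run-off:", run_off(BALLOTS))
```

Same ballots, three systems, three different winners: Cameron, Clegg and Miliband respectively.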

In 1950 Kenneth Arrow published a paper titled A Difficulty in the Concept of Social Welfare. The central result has come to be known as Arrow’s Impossibility theorem. In simple terms it states that in an election with three or more candidates, where voters are asked to rank the candidates, it is impossible to find an overall ranking of the candidates that reflects the preferences of the electorate as a whole. In other words, all electoral systems (of this type) are unfair.

You want to go out for a walk this afternoon, but you’re worried that it might rain. You turn on the television: the forecast is for rain. Should you give up on your walk?

You decide to do a little research. You go to the weather forecaster’s website and discover that they claim a 90% accuracy rate: out of 100 days on which it rained, they predicted it would rain on 90 of those days. Sounds pretty good.

Digging a little deeper you discover that out of 100 days on which it did not rain, they correctly predicted it would be dry on 80 of those days. That’s not too bad, either.

It looks like the forecaster is pretty reliable. You decide to go ahead with your walk but you take an umbrella with you, trusting the forecast of rain.

It’s bright sunshine the whole time! You didn’t need the umbrella at all!

Why?

Because you didn’t use Bayes’ theorem.

You see, it turns out that it rains only 10% of the time where you live. So in 100 days, it rains on 10 of those days. And the weather forecaster, with its 90% accuracy rate, would correctly predict rain on 9 of those 10 days.

However, it doesn’t rain on 90 out of 100 days. But the weather forecaster would wrongly predict that it would rain on 20% of these. So on 18 days the forecast would be for rain when it didn’t actually rain.

In total then, the weather forecaster predicts rain on 9 + 18 = 27 days out of 100. But on only 9 of those days does it actually rain. So the proportion of days on which it rains when the weather forecaster has predicted rain is 9/27, which is only one third. That’s pretty unreliable.

The impressive statistic (“90% accuracy!”) on the weather forecaster’s website was the answer to the following question: “Given that it did in fact rain, what is the probability that the forecast was for rain?”

The problem arose because this question is the wrong way round. What you really want to know is, “Given that the forecast is for rain, what is the probability that it will actually rain?” The statistic here is much less impressive: about 33%.

Why did this happen?

Although the weather forecaster often correctly predicts rain when it actually rains, it doesn’t rain very often, so the number of days on which it rains and on which rain is predicted is small (9 days). And although the weather forecaster rarely predicts rain when it doesn’t rain, there are many days on which it doesn’t rain, so there are many opportunities for an incorrect forecast (18 days out of 100).

Thus a prediction of rain is more often associated with a dry day than with a wet day. And that’s what happened to you today.
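The day-counting argument above can be written as a short Python sketch. The three inputs are the figures from the example (it rains 10% of the time, rain is correctly forecast 90% of the time, and rain is wrongly forecast on 20% of dry days); the function and parameter names are my own.

```python
def chance_of_rain_given_forecast(base_rate, hit_rate, false_alarm_rate, days=100):
    """Proportion of 'rain' forecasts that are followed by actual rain."""
    rainy_days = base_rate * days
    dry_days = days - rainy_days
    correct_rain_forecasts = hit_rate * rainy_days        # 9 days in the example
    wrong_rain_forecasts = false_alarm_rate * dry_days    # 18 days in the example
    all_rain_forecasts = correct_rain_forecasts + wrong_rain_forecasts
    return correct_rain_forecasts / all_rain_forecasts


print(round(chance_of_rain_given_forecast(0.10, 0.90, 0.20), 3))  # 0.333
```

Try changing the base rate to 0.5: the same forecaster suddenly looks far more trustworthy, which is the whole point.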

A similar problem arises in diagnostic testing for diseases: for rain read ‘disease’, for forecast read ‘diagnostic test’. Bayes’ theorem says that the question of interest is “Given that the test is positive, what is the probability that the patient actually has the disease?”

There are two things we wish to avoid. A false positive occurs when a healthy patient is diagnosed as having the disease. (Statisticians creatively call these Type I errors.) A false negative occurs when a patient with the disease is diagnosed as being healthy. (Statisticians creatively call these Type II errors.)

Pregnancy isn’t a disease, but the picture below illustrates the distinction between the two types of error.

The answer to our question – “Given that the test is positive, what is the probability that the patient actually has the disease?” – is the ratio of ‘the number of sick patients who get a positive test result’ to ‘the number of patients (both sick and healthy) who get a positive test result’. (If you like: true positives divided by all positives.)

For this ratio to be high (i.e. for the diagnostic test to be reliable) we need the number of false positives to be very low.

For example if we have 10 true positives and 1 false positive, then the proportion of true positives is 10/11, which is very high. But if we have 10 true positives and 10 false positives, then the proportion is 10/20, which is no better than diagnosis by tossing a coin!

Problems arise when the base rate of the disease amongst people who are tested is low. In a screening programme for a rare disease, even a low rate of false positives will throw up a large number of positive test results, because so many of the people tested will be healthy and a small proportion of a large amount is still a reasonable number of people, all of whom will be wrongly diagnosed. And even if the test is very good at identifying sick people, the actual number of sick people is low (because the disease is rare) so the number of true positives may not be very high. Thus the ratio of true positives to all positives may not be very high, as in my rain example.

School league tables (or, as the Department for Education calls them, School Performance Tables) were published last month, to much complaint from some private schools. Schools such as Eton, Harrow and Winchester scored 0% because they no longer enter their students for GCSEs, preferring International GCSEs, which are thought to be more challenging and therefore better suited to more able pupils.

I’ve long been troubled by league tables, though for a different reason: an A grade is not the same as an A grade.

League tables are based on grades obtained in public examinations. I’m simplifying here, but basically you add up the number of A grades obtained in each school, divide by the number of students in that school and you get an average. Good schools have high averages, bad schools have low averages. Pretty uncontroversial, surely?

I argue that pretty much every step of the process is flawed. For example, adding up the number of A grades. This is only meaningful if all A grades are the same. Does an A grade in Maths mean the same thing as an A grade in Theatre Studies? (If so, what does it mean?)

Let’s try an easier question. Does an A grade in Maths mean the same thing as an A grade in Maths? That's a ridiculous question, surely? Well, no. Different schools use different exam boards for their maths exams. Can we be certain that A grades given by different exam boards mean the same thing? (OCR's exams have always struck me as much harder than Edexcel’s, for example.)

Let’s narrow it down further. Does an A grade in Edexcel’s Maths mean the same thing as an A grade in Edexcel’s Maths? Not necessarily. Students can choose which modules they sit. Are the exams on the Statistics modules directly comparable to those on the Mechanics modules? (Edexcel’s module in Decision Maths is often seen as markedly easier than the other modules, despite counting equally for the final grade.)

Hmm. OK, then. Does an A grade in Edexcel’s Maths mean the same thing as an A grade in Edexcel’s Maths where the modules taken are the same? Not if they’re not taken at the same time. Can we be certain that the C3 exam and the marks obtained in it by candidates are consistent from one exam sitting to another? (Edexcel’s C3 exam in June 2013 was an internet sensation within hours of the end of the exam because it was considered unusually difficult. Edexcel responded by dropping the grade boundaries quite markedly. How precise is that process? How precise could it possibly be?)

Surely I’ll concede that if two students sit the same maths modules set by the same exam board at the same time and both get A grades, then those two A grades are equal?

Nope. An A grade requires an average of 80 marks per module. Or more. One candidate could have got an average of 80. The second could have got an average of 90.

Which makes the second one better? Not necessarily. Maybe the second one got very high marks on the easier modules which boosted his average. C1 is the simplest module, but it counts equally. Very high marks in C1 can make up for low marks in, say, C3. Maybe the second student was sitting some of the modules for the second time, having tried them a year earlier and not done so well.

I think it’s perfectly possible that a student with a B grade is meaningfully better at maths than a student with an A grade. Yet the A grade student will be off to a top-ranked university, and the B grade student will have to settle for his second choice.

But now I have fallen into the league table trap. Top-ranked university. What does that mean? If you can’t even compare A grades in the same subject and be sure that you’re making a meaningful, consistent judgement, how can you compare entire universities and say that some are ‘better’ than others?

I'll bet there are some lecturers at London Metropolitan (ranked bottom of the Guardian’s table) who are better than some lecturers at Cambridge. (Uh oh. Better. What does that mean?) Stephen Hawking was a professor at Cambridge: that didn’t mean you’d be certain to be taught by him or even that you’d ever see him at all. And just because he’s incredibly clever doesn’t mean he’s an incredible teacher. I know I’m not the only person who gave up on A Brief History of Time well before the final chapter.

Malcolm Gladwell agrees with me. He wrote an excellent piece for the New Yorker on the subject of ranking colleges in the USA.

DARPA, the research arm of the Department of Defense in the US, staged the Red Balloon Challenge in 2009. Competitors had to locate ten red weather balloons that had been tethered at random locations across the US.

The intention was not that one person drive around the country with a pair of binoculars. Rather, I might ask all my friends on Facebook to look out for a red balloon and tell me if they saw one. They might then ask all their friends, and so on.

The winning team from MIT found all of the balloons in just nine hours using this type of strategy. But to encourage participation they offered $2,000 to the first person to send them the co-ordinates of a balloon. On its own this may not have been very efficient. So, crucially, they also offered $1,000 to whoever recruited that person to the challenge, and $500 to whoever recruited the recruiter, and so on.

One interesting question mathematically is how much money did the MIT team stand to lose? The Red Balloon Challenge offered a prize of $40,000.

In principle, the sender of a winning set of co-ordinates might have been at the top of a long line of recruiters. Doesn’t this mean that the MIT team risked an enormous payout?

Well, no. Consider a recruitment chain of seven people. The total payout would be:

$2,000+$1,000+$500+$250+$125+$62.50+$31.25 = $3,968.75

The seventh payment of $31.25 is pretty small. If there were more people in the chain, their payments would be even smaller. Nonetheless, lots of small amounts can quickly add up to a large amount.

Suppose there were 17 people in the chain. The total payout would be $3,999.97 (to the nearest cent). The seventeenth person would have got 3¢.

Even so, there could have been many more people in the chain, and might not the total have slowly grown to an unaffordable amount?

Suppose the total amount payable is T and that there are infinitely many people in the chain. Then,

T = 2000 + 1000 + 500 + 250 + ...

Now multiply both sides of this by 2:

2T = 4000 + 2000 + 1000 + 500 + ...

Finally subtract the first equation from the second:

2T – T = 4000 + 2000 + 1000 + 500 + ... – (2000 + 1000 + 500 + ...)

The terms in the brackets cancel out all of the terms at the beginning except the first one. Thus we end up with T = 4000.

So even if there had been infinitely many people in the chain, the total payout would have been $4,000. Since there were ten balloons that gives a grand total of $40,000 which was the value of the prize. MIT were certain to make at least a little profit, provided they actually won the prize. Since the kudos of winning was worth more than the prize, their investment was well worth the risk.
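If you would rather see the totals creep up than trust the algebra, here is a small Python sketch. It uses exact fractions so that no rounding can sneak in; the function name is my own and the $2,000 first payment is the figure from the example.

```python
from fractions import Fraction


def total_payout(chain_length, first_payment=2000):
    """Total paid out for one balloon when each recruiter gets half as much as the last."""
    payment = Fraction(first_payment)
    total = Fraction(0)
    for _ in range(chain_length):
        total += payment
        payment /= 2
    return total


for people in (7, 17, 50):
    print(people, float(total_payout(people)))

# However long the chain, the total stays below 4000.
print(total_payout(500) < 4000)  # True
```

The shortfall from $4,000 is always exactly the size of the last payment made, which is why the total can get as close as you like to $4,000 but never reach it.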

Series such as this one are called geometric series. One of the earliest examples of them was Zeno’s dichotomy paradox. For me to walk 4000 metres, I first have to walk 2000m, i.e. half the total distance. I then have to walk 1000m, i.e. half the remaining distance. Then 500m. Then 250m. And so on. No matter how far I've travelled there's always half the remaining distance left to go. So I have infinitely many stages to complete and will therefore never get to the end of them.

The flaw in the argument is that last sentence. It assumes that the infinitely many stages of the journey will take infinitely long to get through. But they won't.

Suppose I walk at 1m/s. Then the first stage will take me 2000s. The second will take 1000s. The third 500s, and so on. So the total time taken will be T = 2000 + 1000 + 500 + ... We now know that this adds up to 4000. So I can finish the journey in a finite amount of time. (Indeed the 4000 seconds you would expect me to need to walk 4000m at 1m/s.)

The above screenshot shows an interesting point in a sequence of games of roulette. The results of the previous nine spins are shown at the bottom of the screen in reverse order, so black 17 was the most recent.

Above this we see the bets that have been made on the next spin. Note the huge stack of chips on red (circled) compared with the tiny stack betting on black. The gamblers clearly believe that a red is “overdue”, since the last seven spins have all been black. This is an example of the so-called gambler’s fallacy. Probability theory tells us that a black is equally likely on the next spin.

So what did happen next?

Not only was the next spin a black, it was black 17 again. In fact, black 17 had come up three times in the last nine spins.

This is a great example of how clumpy randomness is. People tend to associate randomness with evenness. In the very long run they’re right. In a very large number of spins, you would expect to see black about half the time, and black 17 about one time in 37. But in the short run, you often get clumpy results such as this.

An exercise I like to use with students is to ask them to write down a sequence of 100 random digits generated from their own heads. Two things typically happen. First, they find it surprisingly hard. Initially they write their digits quite quickly, but they soon slow down. This is because they’re thinking: they’re trying to make the digits look random. (Sometimes they just give up and start writing down sequences they already know, like phone numbers.)

But the second thing that happens is that they fail. For example, about one time in ten you would expect the same digit to be repeated. So in a list of 100 random digits, you’d expect about ten repeats. Typically students will generate fewer than this. You’d also expect to see one example (on average) of three of the same digits in a row in a set of 100 random digits. In a class of students, this very rarely happens. Indeed, in a class of 30 students, you’d expect to see about three examples of four identical digits in a row. This never seems to happen because it just doesn’t feel random.
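Those expected numbers are easy to check. The sketch below is in Python and counts a 'run' as any position at which the same digit appears the required number of times in a row (so a run of three also contains two runs of two); that counting convention, and the function names, are my own assumptions.

```python
import random


def expected_runs(length, run_length, digits=10):
    """Expected number of places where run_length identical digits start."""
    windows = length - run_length + 1
    # The first digit is free; each later one must match it (chance 1 in 10).
    return windows * (1 / digits) ** (run_length - 1)


def count_runs(sequence, run_length):
    """Count the places in a sequence where run_length identical items start."""
    return sum(
        len(set(sequence[i:i + run_length])) == 1
        for i in range(len(sequence) - run_length + 1)
    )


print(expected_runs(100, 2))        # repeats per student: about ten
print(expected_runs(100, 3))        # three in a row per student: about one
print(30 * expected_runs(100, 4))   # four in a row per class of 30: about three

# A quick simulation to compare against the first figure.
rng = random.Random(7)
trials = 2000
total = sum(
    count_runs([rng.randrange(10) for _ in range(100)], 2) for _ in range(trials)
)
print(total / trials)
```

Run count_runs on a list of digits written by a student and you can see at a glance how far short of ten repeats they usually fall.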

Of course you know the answer to this: the Sun. (Though often when I ask people they tend to look sheepish and say nothing, for fear of being wrong!)

Actually, it’s not at all obvious which one is bigger: after all, they look the same size. You know which one is bigger because you were told it when you were very young. Which rather spoils the pleasure of trying to work it out. But how do we know the Sun is bigger?

It’s fairly easy to establish that the Sun is at least a bit bigger than the Moon. They look the same size in the sky, but during a solar eclipse the Moon completely covers the face of the Sun. So the Sun must be further away than the Moon and therefore it must be bigger. But how much bigger?

To establish this we need another rare sighting in the sky: have you ever seen the Moon and the Sun in the sky at the same time? It does happen — as the picture above shows. (Indeed, Lewis Carroll wrote about it in the first two verses of The Walrus and the Carpenter.)

But imagine that you can see both the Sun and the Moon in the sky together and that the Moon is a half Moon. What must the geometry of the solar system be at that moment?

The Sun-Moon-Earth forms a giant right-angled triangle, with the Moon at the right-angle. If we can measure the Sun-Earth-Moon angle, then we can use trigonometry to find the relative distances of the Sun and the Moon from Earth. Since the Sun and the Moon look the same size in the sky, these relative distances must also be the relative sizes of the Sun and Moon themselves.

The hard part is measuring the angle. You need it to be exactly a half Moon. You need to measure the angle between the Sun (without looking at it directly), you and the Moon.

But it gets worse: if you’re even slightly wrong with your measurement, the relative distances and sizes change by a surprisingly large amount. This is because the angle is very nearly 90°, and that is because the Sun is really very much farther away than the Moon and very much bigger. In fact the angle is about 89 5/6°, which indicates that the Sun is about 340 times the size of the Moon. But if you measure it as 89 4/6°, you’ll get the Sun being only 170 times the size of the Moon, in other words half the size (and therefore one-eighth of the volume, and one-eighth of the mass). So your measurement needs to be incredibly precise!
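To see just how touchy the answer is, here is a short Python sketch that turns a measured Sun-Earth-Moon angle into a size ratio using the tangent of the angle. The function name is my own. I also print 1/cos of the angle, which is the ratio of the Earth-Sun distance to the Earth-Moon distance; at angles this close to 90° the two are practically the same.

```python
import math


def size_ratio(angle_degrees):
    """Ratio of the Sun's size to the Moon's implied by the measured angle."""
    return math.tan(math.radians(angle_degrees))


for angle in (89 + 5 / 6, 89 + 4 / 6, 89.0, 87.0):
    via_tan = size_ratio(angle)
    via_cos = 1 / math.cos(math.radians(angle))
    print(f"{angle:8.4f} degrees  tan: {via_tan:8.2f}  1/cos: {via_cos:8.2f}")
```

An error of one-sixth of a degree halves the answer, and an error of three degrees shrinks it to around 19.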

Notes

The diagram above is merely intended to indicate the relative positions of the Sun, Moon and Earth. The sizes are not to the same scale (compared with the Earth, the Sun should be vastly bigger than shown and the Moon slightly smaller) and the distances are not in proportion (the Earth-Sun distance is substantially bigger than the Earth-Moon distance – indeed, the triangle should look almost like two parallel lines).

The Sun is, in fact, almost exactly 400 times the size of the Moon, by which I mean its radius is about 400 times the radius of the Moon. This means that its volume is 400×400×400 times the volume of the Moon, which makes it 64 million times as large. (The ratio of the masses of the Sun and the Moon is different, however, because they do not have the same average density.)

The figures of 340 and 170 come from tan(89 5/6°) and tan(89 4/6°), which actually represent the ratio of the Moon-Sun distance to the Moon-Earth distance (practically the same, at these angles, as the ratio of the Earth-Sun distance to the Earth-Moon distance). It’s an incredible coincidence that, although the angle differs by one-sixth of a degree, one tangent is (almost exactly) double the size of the other. (In fact, it’s double to one part in a hundred thousand!)

Why do we use radians? After all, there’s nothing wrong with degrees. Everyone understands degrees. Almost any fraction of a circle is a whole number of degrees. In radians, it seems like every angle is an irrational number. So what’s the point?

The truth is that no-one uses radians to measure angles. We use radians because it makes the graph of sin(x) look nice. If you draw the graph of sin(x) in degrees, it is a virtually flat, featureless graph: the y-axis values vary only from –1 to +1, yet the x-axis values vary from 0° to 360°. But if you draw the graph of sin(x) in radians, the x-axis values vary from 0 to about 6. So the graph has the familiar S-shape.

This is not a trivial point. Look at the graph of sin(x) in radians near to the origin. It looks like a straight line. Specifically, it looks like the line y = x. So, for small values of x, sin(x) is pretty much the same as x – the sine of an angle is equal (almost) to the angle itself, provided the angle is small and is measured in radians.

So what is sin(18°)? Well, 18° is a tenth of 180°, so it’s a tenth of π, i.e. about 0.31. So the sine of 18° is equal to the sine of 0.31 radians. But the sine of an angle in radians is approximately equal to the angle itself. Thus sin(18°) ≈ sin(0.31) ≈ 0.31. Easy!

(If you check this on a calculator, you’ll see it’s correct!)

But how do we find sin(54°)? For angles that large, the sine graph doesn’t look at all like a straight line. So what do we do then? What does the calculator do? (Hint: the graph of sin(x) looks a little bit like a cubic curve between –180° and +180°.)
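Here is a rough Python sketch of both ideas. The first function is the straight-line approximation used for 18°. The second is one cubic that fits the hint, x – x³/6 with x in radians; choosing that particular cubic is my own assumption for illustration, and I am not claiming it is what any real calculator does.

```python
import math


def straight_line_sine(degrees):
    """Small-angle approximation: the sine is roughly the angle in radians."""
    return math.radians(degrees)


def cubic_sine(degrees):
    """A cubic approximation: x - x^3/6, with x in radians."""
    x = math.radians(degrees)
    return x - x**3 / 6


for angle in (18, 54):
    exact = math.sin(math.radians(angle))
    print(
        f"{angle} degrees: line {straight_line_sine(angle):.4f}, "
        f"cubic {cubic_sine(angle):.4f}, sin {exact:.4f}"
    )
```

At 18° the straight line is already within about 0.005 of the true value. At 54° it is out by more than 0.1, while the cubic is still within about 0.006.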

It is highly unusual to have infinity as a term in the middle of a sequence – whatever could it be?

Well, the nth term is the number of convex regular polytopes in n dimensions. Ah! That explains it?

It’s easiest to start with the second term of the sequence. It’s the number of regular polygons that you can make. A regular polygon is a plane figure with straight sides all of equal length and with all its angles equal. The most obvious is perhaps a square. But we additionally require the polygons to be convex: this means that the sides cannot turn in on themselves. The figure below on the left is a convex regular pentagon; the figure on the right is also a regular pentagon, but it is not convex.

It should be fairly obvious that you can construct regular polygons with any number of sides:

So the number of convex regular polygons is infinite. And that’s the second term of the sequence.

What about the others? Let’s look at the third term: 5. This is the number of regular polyhedra. That is, the number of three-dimensional shapes where every face is the same regular polygon and the same number of faces meet at every corner. (Again, we require them to be convex.) The five regular polyhedra are called the Platonic solids, and they are illustrated in the photograph at the top of this blog. From left to right: icosahedron, dodecahedron, cube, octahedron and tetrahedron. It is not possible to construct any other regular polyhedra: the angles won’t fit together to form a closed solid.

In informal language, then, the sequence is the number of regular shapes in one dimension, two dimensions, three dimensions, and so on. The word “polytope” is the generic term that covers polygons (two dimensions), polyhedra (three dimensions), and all the others in higher dimensions. Perhaps surprisingly, it is the two-dimensional world that offers the greatest variety.