The one thing that particularly worries me about the Department of Health, in its various semi-devolved guises, making 40% cuts to non-NHS spending is this: some of the activities I get involved in or advise on rely on accurate data, and they can look beguilingly cheap to cut by falling back on existing data sources, but the devil is in the detail. It is very hard to draw meaningful conclusions from data that were not collected for that purpose, but when the analytical team or their boss is pressed to give a one-line summary to the politicians, it all sounds hunky-dory. The guy holding the purse strings might never know that the simple one-liner is built on flimsy foundations.

I got a bit bored (sorry Mike), and wrote this. It didn’t take long (I tell you that not so much to cover my backside as to celebrate the majesty of R). First, I estimated the probability of a day being rainy if the previous day was dry, and of it being rainy if the previous day was rainy. I couldn’t be bothered with any thinking, so I used approximate Bayesian computation (ABC), which is an incredibly simple and intuitive way of finding parameters that match data. I started with 16 rain days in July and 15 in August, in my home town of Winchester (which is also Swithun’s hood), from here, and that led to probabilities that I plugged into a simulator (it’s only AR1, but I’m no meteorologist). That ran 10,000 years of 40-day periods (with a run-in of ten days first to reach a stable distribution; it’s basically a Markov chain), and not a single one had rain for 40 days.
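For the curious, here is roughly what that looks like, sketched in Python rather than R. Everything here is my own illustration: the function names are made up, and the exact ABC tolerance and fitted probabilities from the original run are not reproduced (here the simulated wet-day count must match the observed count exactly).

```python
import random

def simulate_days(n_days, p_wet_after_dry, p_wet_after_wet, burn_in=10, rng=random):
    """Simulate dry/wet days as a two-state first-order Markov chain.

    A burn-in run lets the chain forget its arbitrary starting state
    before anything is recorded.
    """
    wet = False
    for _ in range(burn_in):
        wet = rng.random() < (p_wet_after_wet if wet else p_wet_after_dry)
    days = []
    for _ in range(n_days):
        wet = rng.random() < (p_wet_after_wet if wet else p_wet_after_dry)
        days.append(wet)
    return days

def abc_fit(target_wet_days, n_days, n_draws=5000, rng=random):
    """Crude rejection ABC: draw both transition probabilities from
    uniform priors and keep the draws whose simulated wet-day count
    matches the observed count exactly.
    """
    accepted = []
    for _ in range(n_draws):
        p_wd, p_ww = rng.random(), rng.random()
        if sum(simulate_days(n_days, p_wd, p_ww, rng=rng)) == target_wet_days:
            accepted.append((p_wd, p_ww))
    return accepted

if __name__ == "__main__":
    # 16 wet days out of 31, as in the July figures for Winchester.
    posterior = abc_fit(16, 31)
    p_wd = sum(p for p, _ in posterior) / len(posterior)
    p_ww = sum(p for _, p in posterior) / len(posterior)
    # 10,000 simulated 40-day periods: how many are wet throughout?
    n_all_wet = sum(all(simulate_days(40, p_wd, p_ww)) for _ in range(10_000))
```

Even with plausible persistence in the weather, the chance of 40 consecutive wet days is vanishingly small, which is why none of the 10,000 simulated periods managed it.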

If you have any interest in using data to improve public services like education or healthcare, whether enthusiastic or sceptical, read this article. The story is a familiar one to me but rarely sees the public eye in such careful detail as it does here. The road to hell, as you know, is paved with dashboards, performance indicators and league tables.

Righton Johnson, a lawyer with Balch & Bingham who sat in on interviews, told me that it became clear that most teachers thought they were committing a victimless crime. “They didn’t see the value in the test, so they didn’t see that they were devaluing the kids by cheating,” she said. Unlike recent cheating scandals at Harvard and at Stuyvesant High School, where privileged students were concerned with their own advancement, those who cheated at Parks were never convinced of the importance of the tests; they viewed the cheating as a door they had to pass through in order to focus on issues that seemed more relevant to their students’ lives.

Lewis said, “I know that sometimes when you’re in the fight, and you’re swinging, you want to win so badly that you don’t recognize where your blows land.”

There have been similar stories in the UK news recently, but you can get it all from Parks, so I suggest just reading this and carrying those cautionary ideas around with you.

One patient – so unlucky as to have also been caught up in a previous terrorist attack – was only reassured after being taken to a bookmaker’s to see how long the odds were of being caught up in a third.

It makes me wonder how statisticians can help provide this sort of information to clinicians caring for people with PTSD and related problems. It is so easy to construct comparisons that are not quite right (horse-riding = ecstasy, for example, or Sally Clark, or lightning = cricket balls) that it is unfair to expect the doctor or psychologist to put them together on demand. Perhaps future generations will have a more advanced grasp of statistics from school, but I wouldn’t count on that being able to overcome pathological distortions of risk perception. There may be a role for quantified-self data here; consider the popularity of pedometers and GPS for knowing how much exercise you really are doing.

When StatsLife collected responses to the BASP p-value ban (see blogs hither and yon), I suggested they contact Ian Hunt, a wise and philosophically minded critical voice in the wilderness of cookbook analysts. He and I hold rather divergent views on deduction and induction and such, but I have the highest respect for his arguments because they are carefully constructed. Alas! He couldn’t send in a response in time, but here it is, reproduced with his kind permission:

How good are the reasons given by the editors of Basic and Applied Social Psychology (BASP) to ban hypothesis tests and p-values?

I argue that BASP’s reasoning for banning “statistical inference” is weak.

First, they (the editors) offer a non sequitur: ban p-values because “the state of the art remains uncertain” (2015 editorial) and “there exists no inferential statistical procedure that has elicited widespread agreement” (2014 editorial). I argue inter-subjective (dis)agreement is not decisive.

Secondly, they imply that good inductive inferences require posterior probabilities. This is contentious, especially since both posteriors and p-values are just deductions.

Thirdly, they plead for larger sample sizes “because as the sample size increases, descriptive statistics become increasingly stable and sampling error is less of a problem.” This is contradictory: the evidence for this claim is best shown by the statistical inferences being banned.

Fourthly, they correctly assume that with a large enough sample size many “significant effects” (or “null rejections” or low p-values or interesting things or whatever) can be identified by looking at canny descriptive statistics and adroitly drawn charts. But I believe p-values ARE descriptive statistics – with which both frequentists and Bayesians can work.
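An aside from me: this point is easy to demonstrate, because a p-value is a deterministic function of the sample, just like a mean or a standard deviation. A minimal sketch in Python (my own illustration, not from Ian or the editorial):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def z_pvalue(sample, mu0):
    """Two-sided p-value for H0: population mean == mu0.

    It is just a summary computed from the sample alone -- a descriptive
    statistic -- whatever inferential gloss one later puts on it.
    """
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    z = (mean - mu0) / sqrt(var / n)
    return 2.0 * (1.0 - phi(abs(z)))
```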

Finally, BASP “welcomes the submission of null effects”. But without tests and concomitant power profiles the evidential value of a “null effect” is unclear.
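To make that concrete (my own illustration again): a reported “null effect” from a small study may simply reflect low power. A quick sketch of the power of a two-sided one-sample z-test, assuming known standard deviation and the usual 5% level:

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def ztest_power(delta, n, sd=1.0, alpha_z=1.96):
    """Power of a two-sided one-sample z-test (known sd) against a true
    mean shift of delta; alpha_z = 1.96 gives the usual 5% level."""
    se = sd / sqrt(n)
    shift = delta / se
    return (1.0 - phi(alpha_z - shift)) + phi(-alpha_z - shift)
```

With delta = 0.2 and sd = 1, power is only about 50% at n = 100 but about 98% at n = 400: the same non-significant result carries very different evidential weight in the two cases.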

BASP’s editors appear to conclude that modern statistics is inductive and akin to “the art of discovery” (as David Hand puts it). Fair enough. But I conclude that careful deductive inferences, in the form of hypothesis tests with clear premisses and verifiable mathematics, still have a role in discovering interesting things.

Now, it would be unfair of me to say anything more on this here but I believe you can hear Ian talking on this very subject at this year’s RSS conference, which is in Exeter. Personally, I’ve never been to Exeter, and I don’t think this is going to be the year for it either, but as Southern towns go, I suspect it’s neither as depressing as Weymouth nor as humourlessly overrated as Salisbury. (That counts as enthusiasm round here.) I recommend the conference to you. It’s just about optimal size and always interestingly diverse.

I have been working on a Stata add-on command to fit Bayesian models using Stan, and this is now out for testing. In this post, I want to introduce it, explain why it’s important, and encourage you all to try it out and give your feedback. I have already used it ‘in anger’ in two substantive projects, predicting student achievement and diagnosing tuberculosis of the eye, and it works fine – at least on my computers.

Stata version 14 includes, for the first time, Bayesian commands providing the Metropolis-Hastings algorithm and the Gibbs sampler. These are the same procedures as in the popular packages BUGS and JAGS. However, there are many situations where they grind along very slowly. The most common cause is a large number of correlated parameters, for example in a Bayesian multilevel model where each cluster gets its own parameter. Broadly speaking, you can picture the Gibbs sampler trying out each parameter one at a time, conditional on the others. This is like trying to move diagonally across a grid of orthogonal streets: it will take a while to get to the other side. Sometimes you can rotate the grid (an orthogonal parameterisation), but sometimes you can’t. What you really need is a different algorithm, which is where Hamiltonian Monte Carlo (HMC) comes in. Radford Neal’s chapter in the Handbook of Markov Chain Monte Carlo provides an excellent overview. Further gains in efficiency came with the No-U-Turn Sampler (NUTS), proposed by Hoffman and Gelman and explained in full in their 2014 paper. This was the starting point for Stan.
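To see the street-grid effect, here is a minimal sketch (in Python, purely for illustration) of a Gibbs sampler on a strongly correlated pair of parameters. With rho = 0.99 the conditional standard deviation is about 0.14, so each axis-aligned move is tiny relative to the marginal spread of 1:

```python
import math
import random

def gibbs_bivariate_normal(rho, n_steps, seed=1):
    """Gibbs sampling of a standard bivariate normal with correlation rho.

    Each update is axis-aligned: x | y ~ N(rho*y, 1 - rho^2), then
    y | x ~ N(rho*x, 1 - rho^2). With rho near 1 the conditional sd is
    tiny, so the chain inches along the diagonal ridge.
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho ** 2)
    x = y = 0.0
    path = []
    for _ in range(n_steps):
        x = rng.gauss(rho * y, cond_sd)  # update x holding y fixed
        y = rng.gauss(rho * x, cond_sd)  # update y holding x fixed
        path.append((x, y))
    return path
```

Rotating the grid corresponds to reparameterising so the conditionals are wide again; HMC instead uses gradient information to move along the ridge directly.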

Stan is collaboratively-built, open-source software to run HMC, NUTS and more. HQ is at Columbia University, but the developers are based all over the world. Personally, I think it’s amazing, and they don’t pay me to say that. It is stable and superfast and tries to help you get the model code right, which is more than can be said for its Gibbs predecessors. There’s a very active and supportive online community on Google Groups. At heart, it’s a C++ library, but you don’t have to tangle with that stuff because there’s a command-line interface and interfaces for R, Python, Julia and Matlab. Now there is one for Stata too.

There is currently only one place where you can get StataStan: https://github.com/stan-dev/statastan/tree/alpha-test, where you can get the main stan.do file and a stan-example.do file. It is in alpha-testing, meaning that we have not run it on every combination of Stata version, flavor and operating system, and we need to be sure there are no fundamental incompatibilities before we move to beta-testing. This is where you come in: please try it out and let us know whether it works, along with the version and flavor of Stata you have (like “12/SE”) and the operating system you’re using, including the version and 32/64-bit if you’re on Windows.

When it goes to beta-testing I’ll add it to my website so you can download it from there inside Stata, and we’ll put links on the Stan homepage. When it passes that, and all the important wish-list items have been incorporated, I’ll send it to the SSC repository. I will update this blog post as we pass each of those hurdles. The latest stable and under-development versions will always be on GitHub, so if you are registered there, you can contribute to StataStan. Don’t be shy: there’s plenty to do.

Now, here’s a toy example of using it, where you have a model file called bernoulli.stan (this is contained in the examples folder when you install Stan) that contains this model:
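For reference, the bernoulli example as shipped in Stan’s examples folder at the time reads as follows (syntax can change between Stan releases, so check your own copy):

```stan
data {
  int<lower=0> N;
  int<lower=0,upper=1> y[N];
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  theta ~ beta(1,1);
  y ~ bernoulli(theta);
}
```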

That means there are two parts to the data: N, the total number of observations, and y, a vector of integers of length N. Our model is that y arose from a Bernoulli process with probability parameter theta, and we put a flat prior on theta, anywhere from zero to one. That prior is specified as a beta distribution in the Stan examples folder, but you could make it even more efficient with a uniform distribution (HMC is more forgiving of uniform priors than M-H/Gibbs). Then, in Stata, you can make some silly data like this:

clear
set obs 10
generate y=0
replace y=1 in 1/2

That’s basically two 1s and eight 0s. OK, now get ready to set the world alight by estimating the probability of success when you’ve just got 2 out of 10 in an experiment. StataStan will pass the variables you specify over to Stan, as well as global macros, so let’s put the total number of observations into a macro called N:
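Something along these lines does the job. Note that the stan option names here are illustrative, reconstructed from the description below; the top of stan.do has the definitive syntax:

```stata
global N=10
* option names below are illustrative -- see stan.do for the real syntax
stan y, modelfile("bernoulli.stan") cmd("path/to/cmdstan") globals("N") load mode
```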

The options for the stan command are explained at the top of the stan.do file and in the GitHub README; soon they will move into a Stata help file and a PDF manual. In this case, we say that we want to send the variable called y; we name the file that contains the model; we point out the location of Stan (you’ll need to change this for your computer); we ask for the global macro called N to be sent to Stan along with the variable y; we ask for the chains of HMC steps to be loaded back into Stata when it’s done; and we ask for the posterior mode as well as the default mean and median.

It’s also possible to specify the model inline, which is to say inside your do-file, so you don’t have to mess around with Stata and a text editor open side-by-side. You can read about the different ways to achieve this in the stan-example.do file on the GitHub page.

Note: on a Windows computer, you won’t see progress in the Stata output window; all the output will appear when it’s done. It’s a Windows thing; what can I say? Compiling even this toy example can take a couple of minutes, so don’t panic if you see nothing happening, but I’ve got a cunning plan to get around this and will add it to StataStan soon.

This interactive visualisation of Human Development Index values, by country and over time, was released last week.

For me, it follows in the mould of The State of Obesity, but is much more transparent in how it is constructed when you look at the source code. That makes it a good exemplar — in fact, perhaps the exemplar currently available — for introducing people to the possibility of making interactive dataviz for their research projects.

That transparency is important, not just for teaching about dataviz, but for the whole community making and innovating interactive data visualisation. Oh for those early days of D3, when nobody was terribly fluent with it, and websites would have all the code right there, easy to read and learn from. Now it is tucked away in obscure links upon links, uglified and mashed up with other JS libraries, familiar to the maker but (probably) not to you. There are obvious commercial pressures to tuck the code away somewhere, and you can actually obtain software to obfuscate it deliberately. At the same time, having everything in one file, hard-coded for the task at hand, may be easy to learn from, but it isn’t good practice in any kind of coding culture, so if you want to be respected by your peers and land that next big job, you’d better tuck it all away in reusable, super-flexible in-house libraries. And yet the very availability of simple D3 code was what kick-started the current dataviz boom. Everyone could learn from everyone else really quickly because everything was open source. I don’t like to think that was a short-lived phase in the early part of the technology’s life cycle, but maybe it was…

Anyway, that’s enough wistful nostalgia (I learnt yesterday that I am the median age for British people, so I am trying not to sound like an old duffer). Here are the things I don’t like about it:

It requires a very wide screen; there’s no responsiveness (remember, your audience may not work in a web design studio with a screen as big as their beard).

Life expectancy gets smoothed while the other variables don’t, which just looks a bit odd.

Why have colors for continents? Doesn’t it distract from the shading? Don’t we already know which one is which?

Why give up on South Sudan, Somalia (which seems to be bunged together with Somaliland and Puntland in one big “hey, it’s somewhere far away, they won’t notice” sort of way) and North Korea? Aren’t these countries’ estimates kind of important in the context, even if they are not very good? Do you really believe the Chinese estimates more than these just because they’re official?

But all in all, these are minor points, and a nice amount of grist for the mill of students thinking about it. I commend it to you.