Sunday, May 25, 2014

I am a listener to BBC's podcast More or Less. In the program Tim Harford looks at data with both humour and determination to find what the numbers mean. Last week he handled a listener question. Does everybody get a significant birthday (20, 30 years etc.) in a weekend. His back of the envelope answer was, yes, we think so. Given the regularity by which days shift over the years, it should happen by 60 latest. If anybody has this at 70, please contact us. This blog post tries to answer by which youngest age anybody born in the previous century has a significant birthday in the weekend and concludes that indeed by 60 latest it should have happened. Not surprising, 20 years has 2/7=28% of days. Somewhat surprising, 30, 40 and 50 are approximately equally probable (3/14=21%).
Additional note. I was not the only one who did this calculation. This weekend he mentioned some other listeners who emailed equivalent results to him.

Sunday, May 18, 2014

As final post on European MEPs voting I wanted to look at the individual MEP. The variables examined are how often present and how often present but not voted. The latter might be a marker of sign in and slope off. The analysis chosen is a hierarchical Bayesian analysis, which should push individual's outcomes towards the population behavior.
As mentioned before, votewatch's data describe how often MEPs voted what in the European Parliament. For each MEP the number of votes, percentages Yes, No, Abstain, number of elections and number of elections not voted. A description of the data can be found in this pdf.

Reading data

Data read as previously. All code shown at end of the post.

Modelling Data

Initially I wanted to use a beta binomial model as the beta distribution is the conjugate prior for the binomial and often seen in the text books. Then I realized the usual approach would be using the logit transformation. The wish to compare the two approaches lead me to adding the probit and finally a different way to calculate the beta binomial. Calculations are done in Jags. In text the models are displayed, end of text is full code. The number of samples is chosen 1000, which is more than needed for interpretation of the data, but with the comparison between methods a few extra samples seemed nice.

Model 1: beta binomial

The model is simple enough not to need much explanation. Priors on a and b are intended to be non-informative.

mbb1 <- function() {

for (i in 1:N) {

nchosen[i] ~ dbin(p[i],nvotes[i])

p[i] ~ dbeta(a,b)

}

a ~ dexp(.001)

b ~ dexp(.001)

}

Model 2: Beta binomial

The thought leading to this specification is that with beta distribution conjugate prior, it should be possible to calculate posteriour distributions for probabilities outside of Jags, thereby saving Jags a bit of time and saving a load of data transfer between Jags and R.

mbb2 <- function() {

for (i in 1:N) {

nchosen[i] ~ dbetabin(a,b,nvotes[i])

}

a ~ dexp(.001)

b ~ dexp(.001)

}

Model 3: Logit

The model is reasonable straigthforward. The awkward names for mean and sd, ap and asd respectively, are chosen thus so samples can be extracted similar to model 1.

mlogit <- function() {

for (i in 1:N) {

nchosen[i] ~ dbin(p[i],nvotes[i])

logit(p[i]) <- lp[i]

lp[i] ~ dnorm(ap,tau)

}

ap ~ dnorm(0,.001)

tau <- pow(asd,-2)

asd ~dunif(0,10)

}

Model 4: Probit

This is just replacing logit() by probit().

mprobit <- mlogit <- function() {

for (i in 1:N) {

nchosen[i] ~ dbin(p[i],nvotes[i])

logit(p[i]) <- lp[i]

lp[i] ~ dnorm(ap,tau)

}

ap ~ dnorm(0,.001)

tau <- pow(asd,-2)

asd ~dunif(0,10)

}

Results

Presence of MEPs

The beta distribution obtained shows that most of the MEPs are signed in. MEPs with more that 20% of the votes not singed in are rare.

The best performing MEPs are shown below for beta binomial 1, logit and probit model are shown below. The results show different MEPs for each calculation. This is an artifact due to using a sampler, these MEPs are way to close to each other for meaningful ordering

beta-binomial

Last.Name First.Name National.Party

463 STAVRAKAKIS Georgios Panhellenic Socialist Movement

218 JEGGLE Elisabeth Christlich Demokratische Union Deutschlands

80 AUDY Jean-Pierre Union pour un Mouvement Populaire

452 DESS Albert Christlich-Soziale Union in Bayern e.V.

273 LISEK Krzysztof Platforma Obywatelska

372 ZEMKE Janusz Władysław Sojusz Lewicy Demokratycznej

Mean p5 p95

463 0.001439957 4.612416e-05 0.004372737

218 0.001450050 3.465712e-05 0.004839997

80 0.001470705 4.752273e-05 0.004617277

452 0.001491458 3.185192e-05 0.005023073

273 0.001497189 4.672592e-05 0.004592358

372 0.001499812 5.112743e-05 0.004861630

logit

Last.Name First.Name Mean p5 p95

57 BACH Georges 0.002845830 0.0005561477 0.006789563

514 LUDVIGSSON Olle 0.002854100 0.0005746037 0.006441176

548 SCHALDEMOSE Christel 0.002882655 0.0006066471 0.006735655

372 ZEMKE Janusz Władysław 0.002925083 0.0006555610 0.006699789

85 SCHWAB Andreas 0.002927801 0.0006034058 0.007039026

273 LISEK Krzysztof 0.002940536 0.0006825662 0.007001949

Probit

Last.Name First.Name Mean p5 p95

687 EHRENHAUSER Martin 0.002929976 0.0006652270 0.006671044

57 BACH Georges 0.002935575 0.0006351852 0.006839898

98 MATULA Iosif 0.002947576 0.0006889951 0.006754586

449 CERCAS Alejandro 0.002953899 0.0006167549 0.007239345

564 BELDER Bastiaan 0.002961148 0.0006210927 0.007109177

184 LANGEN Werner 0.002967294 0.0006372111 0.006767979

The worst MEPs are quite different from each other, the calculations yield similar order

It thus seems that beta binomial have slightly different outcomes than logit and probit, with beta-binomial giving slightly smaller parameter values. The shift for PETROVIĆ JAKOVINA is due to less data. It appears beta binomial had a stronger effect of the prior. Last.Name TotalPossible nNotIn5 PETROVIĆ JAKOVINA 40 1527 VADIM TUDOR 514 179734 MORVAI 514 167

Signed in, not voting

To save duplication, only logit and beta-binomial are shown. It would seem that signing in but not voting is structural, the typical MEP does this a few to 15% of the votes.

The best MEPs in this respect include an independent, which did surprise me. As a Dutch voter, having a member of SGP there does not surprise me, while strong Christianity is not my political flavor, they do have a trustworthy reputation many other politician can be jealous of. In terms of logit against beta-binomial, the logit now has lower numbers, which suggests that the beta-binomial again has again more pull toowards the population.

beta binomial

Last.Name First.Name National.Party

350 BŘEZINA Jan Independent

175 GROSSETÊTE Françoise Union pour un Mouvement Populaire

21 MAZEJ KUKOVIČ Zofija Slovenska demokratska stranka

333 BRATKOWSKI Arkadiusz Tomasz Polskie Stronnictwo Ludowe

564 BELDER Bastiaan Staatkundig Gereformeerde Partij

55 PAPANIKOLAOU Georgios Nea Demokratia

Mean p5 p95

350 0.004855719 0.0012133168 0.01027123

175 0.005019880 0.0011882656 0.01056993

21 0.006102050 0.0007614780 0.01547558

333 0.006392797 0.0007950631 0.01628781

564 0.006590296 0.0021129110 0.01337183

55 0.006678283 0.0022042544 0.01360759

logit

Last.Name First.Name Mean p5 p95

175 GROSSETÊTE Françoise 0.007681981 0.003254561 0.01387197

350 BŘEZINA Jan 0.007820303 0.003207069 0.01421726

55 PAPANIKOLAOU Georgios 0.008982877 0.003933873 0.01604022

452 DESS Albert 0.009006247 0.003923229 0.01594389

376 CUTAŞ George Sabin 0.009011560 0.003966942 0.01585792

237 CANCIAN Antonio 0.009144514 0.004149473 0.01669278

Finally, the worst politicians in this respect. Note that Jerzy BUZEK was president of the EP 2009-2012, Martin Schultz president of EP since 2012, which probably means they were there but did not vote in that role. Comparing beta-binomial to logit, the beta-binomial again pulls a bit stronger to the population.

Sunday, May 11, 2014

Following last week's short examination, I now wanted to drill down a bit more in the voting behaviour as given in data from votewatch.eu on voting of MEPs.Votewatch's Data describe how often MEPs voted what in the European Parliament. For each MEP the number of votes, percentages Yes, No, Abstain, number of elections and number of elections not voted. A description of the data can be found in this pdf.

Data

Compared to last week there is a slight adaptation in reading data. The same preprocessed spreadsheet is used, but next processing has been adapted.

Group data

In this section data per group is analyzed. The analysis results are mean estimates, their dispersion is a function of the group size. The analysis result has three stages

proportion present,

proportion voted conditional on being present,

proportion votes Yes, No, Abstain, conditional on voting.

This means the proportions do not add to 1. There is an ANOVA calculation in the analysis. However, since it is highly significant for all responses, they are not displayed.
The script itself consists of a wrapper around a glm() call. This was the most easy approach when I did think displaying ANOVA would be a good idea. It is retained because it makes the dependent variable used clearly readable.

test1 <- function(response) {

g1 <- glm(response ~ Group,

data=r1,family=binomial)

g0 <- glm(response ~ 1,

data=r1,family=binomial)

print(anova(g1,g0,test='Chisq'))

g1

}

g1 <- test1(with(r1,

cbind(

nNotIn,

nIn=TotalPossible-nNotIn)))

h1 <- test1(with(r1,

cbind(

nNoVote,

nVote=TotalPossible-nNotIn-nNoVote)))

i1 <- test1(with(r1,

cbind(

nYES,

nNotYES=TotalDone-nYES)))

j1 <- test1(with(r1,

cbind(

nNO,

nNotNO=TotalDone-nNO)))

k1 <- test1(with(r1,

cbind(

nAbstain,

nNotAb=TotalDone-nAbstain)))

It would be nice to display the results graphically. For this a prediction variable is made. In addition, the results are merged with decent labels

ug <- unique(subset(r1,,Group))

lpreds <- lapply(list(g1,h1,i1,j1,k1),

function(x) {

predsg <-

as.data.frame(predict.glm(x,ug,type='response',se.fit=TRUE)[-3])

predsg$Group <- ug$Group

predsg$Q <- dimnames(x$model[[1]])[[2]][1]

predsg

})

preds <- do.call(rbind,lpreds)

rel <- data.frame(

Q=c("nNotIn", "nNoVote", "nYES", "nNO", "nAbstain"),

Vote=factor(1:5,labels=c(

'Not Present',

'Not Voted',

'Yes','No','Abstain')))

preds <- merge(preds,rel)

limits <- aes(ymax = fit + 2*se.fit, ymin=fit- 2* se.fit)

p1 <- ggplot(preds, aes(y=fit, x=Group,col=Vote))

p1 + geom_point(stat="identity") +

geom_errorbar(limits, width=0.25) +

ylab('Proportion') +

coord_flip()

One can see that each of the groups in general votes sooner Yes than No.

As I found the result a bit disappointing, I made a mosaic plot where distribution of results are next to each other.

ag <- aggregate(subset(r1,,c(nYES,nNO,nAbstain,nNoVote,nNotIn)),

by=list(Group=r1$Group),

sum)

lag <- reshape(ag,direction='long',

idvar='Group',

times=names(ag[-1]),

timevar='Q',

v.names='Counts',

varying=list(names(ag[-1])))

lag <- merge(lag,rel)

mosaicplot(xtabs(Counts ~ Group + Vote ,data=lag),

color=rainbow(nlevels(lag$Group)),

main='Votes',

las=2,cex=.7)

By Party

At this point, I decided that the first plot would be interesting if displayed by party. However, there are far too many parties. So, to get a bit of order, the focus is on 'interesting' parties. Interesting in this case is anything not S&D, ADLE/ADLE and EPP because these are far too predictable. This leaves still quite some parties, so they have to be ordered by country too. Since there are a number of parties (e.g. independent) which are present in several countries, these to be labelled by country too.

"People for Real, Open and United Democracy / Conservative Party for Democracy and Success"

] <- "People for Real, Open and United Democracy / ..."

The plot is still a bit large. In the top the UK has a number of parties which do know where the No button is. As do Dutch 'Partij voor de Vrijheid' and French 'Mouvement pour la France'. In contrast, 'Front National' is probably better described as Abstained. Do not think this is a right wing thing, the 'Communist Party of Greece' likes the No button too, though not as much as the parties mentioned earlier.

Wiekvoet

Wiekvoet is about R, JAGS, STAN, and any data I have interest in. Topics range from sensometrics, statistics, chemometrics and biostatistics. For comments or suggestions please email me at wiekvoet at xs4all dot nl.