Sunday, November 18, 2012

Secret Santa - again

Based on comments by cellocgw I decided to look at last week's Secret Santa again. This time, the moment a person, whoever that is, draws his/her own name, the drawing starts again at the first person.

Introduction

A group of n persons draws sequentially names for Secret Santa. Each person may not draw his/her own name. If a person draws his/her own name then all names are returned and the drawing starts again. Questions are such as: How often do you draw names?

We can also extract where the redraws occur. In general there is 0.5 redraw because of the first person, 0.4 because of the second, etc. The numbers do not add to 1, they are not chances.

colSums(countstop)/nrep

[1] 0.54276 0.40735 0.31383 0.24834 0.20415

Calculations

The question is, can we achieve the same with a calculation. For this we obtain the chances of various results. For this we build three matrices. All permutations in pp. Continuation of a sequence is in permitted. Finally, redraw contains the person where a person causes a new draw. Trick here is that if person 2 causes a redraw, then no subsequent persons cause a redraw, hence only 2 is marked in redraw.

pp <- randtoolbox::permut(nn)

redraw <- matrix(FALSE,nrow(pp),nn)

permitted <- redraw

redraw[,1] <- pp[,1] ==1

permitted[,1] <- pp[,1]!=1

for(i in 2:nn) {

permitted[,i] <- pp[,i]!=i & permitted[,i-1]

redraw[,i] <- pp[,i] == i & permitted[,i-1]

}

The sequences start like this.

head(pp)

i i i

[1,] 5 4 3 1 2

[2,] 5 4 3 2 1

[3,] 5 4 1 3 2

[4,] 5 4 2 3 1

[5,] 5 4 1 2 3

[6,] 5 4 2 1 3

head(permitted)

[,1] [,2] [,3] [,4] [,5]

[1,] TRUE TRUE FALSE FALSE FALSE

[2,] TRUE TRUE FALSE FALSE FALSE

[3,] TRUE TRUE TRUE TRUE TRUE

[4,] TRUE TRUE TRUE TRUE TRUE

[5,] TRUE TRUE TRUE TRUE TRUE

[6,] TRUE TRUE TRUE TRUE TRUE

head(redraw)

[,1] [,2] [,3] [,4] [,5]

[1,] FALSE FALSE TRUE FALSE FALSE

[2,] FALSE FALSE TRUE FALSE FALSE

[3,] FALSE FALSE FALSE FALSE FALSE

[4,] FALSE FALSE FALSE FALSE FALSE

[5,] FALSE FALSE FALSE FALSE FALSE

[6,] FALSE FALSE FALSE FALSE FALSE

Chance of succes

The chance of a success drawing is the mean of the last column in permitted. Below a comparison with the simulation result. First the observed proportions.

byrow <- as.data.frame(table(rowSums(countstop))/nrep)

head(byrow)

Var1 Freq

1 0 0.36938

2 1 0.23036

3 2 0.14788

4 3 0.09337

5 4 0.05929

6 5 0.03673

Now the matching calculation. The numbers can be calculated easily.

(p.succes <- mean(permitted[,nn]))

[1] 0.3666667

byrow$n <- as.numeric(levels(byrow$Var1)[byrow$Var1])

byrow$p <- sapply(byrow$n,function(x) p.succes*(1-p.succes)^x)

byrow[,c(1,3,4,2)]

Var1 n p Freq

1 0 0 3.666667e-01 0.36938

2 1 1 2.322222e-01 0.23036

3 2 2 1.470741e-01 0.14788

4 3 3 9.314691e-02 0.09337

5 4 4 5.899305e-02 0.05929

6 5 5 3.736226e-02 0.03673

7 6 6 2.366277e-02 0.02220

8 7 7 1.498642e-02 0.01440

9 8 8 9.491398e-03 0.00988

10 9 9 6.011219e-03 0.00603

11 10 10 3.807105e-03 0.00396

12 11 11 2.411167e-03 0.00240

13 12 12 1.527072e-03 0.00168

14 13 13 9.671458e-04 0.00089

15 14 14 6.125256e-04 0.00053

16 15 15 3.879329e-04 0.00041

17 16 16 2.456908e-04 0.00021

18 17 17 1.556042e-04 0.00014

19 18 18 9.854933e-05 0.00012

20 19 19 6.241457e-05 0.00005

21 20 20 3.952923e-05 0.00004

22 21 21 2.503518e-05 0.00003

23 22 22 1.585561e-05 0.00001

24 28 28 1.023239e-06 0.00001

Where fall the redraws

This is actually a more difficult calculation (or I forgot too much probability). Luckily a bit of brute force comes in handy. To reiterate, in general simulated data shows 0.54 redraws because of the first person etc.

colSums(countstop)/nrep

[1] 0.54276 0.40735 0.31383 0.24834 0.20415

So, what happens in a drawing? The outcomes follow from the matrix redraw. There is 20% chance the first person draws 1, 25% chance the second person draws a 2 etc. Finally, as established, the chance is 36% to have a good draw.

(p.onedraw <- c(colSums(redraw)/nrow(redraw),p.succes) )

[1] 0.20000000 0.15000000 0.11666667 0.09166667 0.07500000 0.36666667

The function below takes these numbers and a locator vector to return a data frame with chances, location of fail and success status in column finish

one.draw <- function(status.now,p.now,p.onedraw) {

la <- lapply(1:(nn+1),function(x) {

status <- status.now

if (x>nn) finish=TRUE

else {

status[x] <- status[x] +1

finish=FALSE}

list(status=status,p=p.onedraw[x],finish=finish)

})

res <- as.data.frame(do.call(rbind, lapply(la,function(x) x$status)))

res$p <- sapply(la,function(x) x$p*p.now)

res$finish <- sapply(la,function(x) x$finish)

res

}

status.now <- rep(0,nn)

names(status.now) <- paste('person',1:5,sep='')

od <- one.draw(status.now,1,p.onedraw)

od

person1 person2 person3 person4 person5 p finish

1 1 0 0 0 0 0.20000000 FALSE

2 0 1 0 0 0 0.15000000 FALSE

3 0 0 1 0 0 0.11666667 FALSE

4 0 0 0 1 0 0.09166667 FALSE

5 0 0 0 0 1 0.07500000 FALSE

6 0 0 0 0 0 0.36666667 TRUE

Where the sequence is not finished, the same chances apply again. For this a second function is build. Same outcomes are combined to restrict the number of outcomes.

No comments:

Post a Comment

Wiekvoet

Wiekvoet is about R, JAGS, STAN, and any data I have interest in. Topics range from sensometrics, statistics, chemometrics and biostatistics. For comments or suggestions please email me at wiekvoet at xs4all dot nl.