Monday, March 8, 2010

Suppose I choose two (different) real numbers, by any process I choose. Then I select one at random (p= .5) to show Nick. Nick must guess whether the other is smaller or larger. Being right 50% of the time is easy. Can he do better?

Of course, it wouldn't be an interesting question if the answer was no. Randall Munroe (author of the xkcd webcomic) offers a solution as follows:

1. Call the number you see x. Calculate p(x) = (e^x)/(1 + e^x)2. Draw u, a random uniform(0,1).3. If u < p(x) then guess the hidden number is smaller than the revealed one. Otherwise, guess that it is smaller.

Munroe shows that the probability of guessing correctly is

N = 0.5 * (1-p(A)) + 0.5 * p(B)

where A is the smaller of the two numbers. In practice, this only gains us an advantage when the observed value is between about -20 and 20-- outside that range, calculation of p(x) results in numbers so close to 0 or 1 that no improvement over a 50% rate can be achieved. If I knew Nick were using this method, I could keep him to a 50% rate by always choosing numbers over 1000 or so. He can improve Munroe's solution by using p(x/100000), though at the cost of a poorer improvement near 0.

We can just barely detect the improvement-- but it took 10,000,000 trials to find CI that reliably exclude the possibility of a 50% success rate.

R

We begin by writing a function to calculate p(x/100000).

xkcdprob = function(x) {1/(1 + exp(-1 * x/100000))}

Then we write a function to perform the experiment n times. Here we use the same process for picking the two numbers, choosing them to be Uniform between 1000 and 2000. In the code below we use the ifelse() function (section 1.11.2) to decide which envelope is shown first. The vectorization in R allows us to avoid loops entirely in writing the code.

Subscribe to SAS and R!

Search the SAS and R Blog

The book (second edition, 2014)

Reviews (from the first edition)

"By placing the R and SAS solutions together and by covering a vast array of tasks in one book, Kleinman and Horton have added surprising value and searchability to the information in their book. … a home run, and it is a book I am grateful to have sitting, dust-free, on my shelf."—Robert Alan Greevy, Jr, Teaching of Statistics in the Health Sciences

"I use SAS and R on a daily basis. Each has strengths and weaknesses, and using both of them gives the advantage of being able to do almost anything when it comes to data manipulation, analysis, and graphics. If you use both SAS and R on a regular basis, get this book. If you know one of the packages and are learning the other, you may need more than this book, but get this book, too. "

Charles Heckler, University of Rochester, Technometrics

"Excellent cross-referencing to other topics and end-of-chapter worked examples on the ‘Health evaluation and linkage to primary care’ data set are given with each topic. … users who are proficient in either of the software packages but with the need to use the other will find this book useful."—Frances Denny, Journal of the Royal Statistical Society, Series A