BACHELOR THESIS

Lotteries and Testing the Randomness of the Numbers Drawn

2011

MILAN MRZEK

DECLARATION

I declare that I wrote the bachelor thesis titled Lotteries and Testing the Randomness of the Numbers Drawn on my own and that all materials I used are listed in the bibliography.

In Plzen

author's signature

ACKNOWLEDGEMENT

I would like to thank my supervisor, Ing. Jan Pospil Ph.D., for his suggestions, advice and patient guidance throughout the development of this thesis.


Preface

The subject of this bachelor thesis is lotteries and testing the randomness of the numbers drawn. It derives probability formulas for Lotto games, which are then used to analyse particular Lotto games within the European Union. These games are analysed and compared with respect to the probabilities of their winning categories. The next part of the thesis studies the $\chi^2$ test for testing equidistribution of the sets of balls drawn. Since numbers are drawn without replacement, the $\chi^2$ statistic does not follow the usual $\chi^2$ distribution with $N - 1$ degrees of freedom, where $N$ is the number of imaginary cells. In this case the $\chi^2$ statistic behaves asymptotically as a sum of independent weighted $\chi^2$ random variables. Because of this behaviour, a special method of computing p-values has to be used in order to decide whether the tested sets of balls are drawn with equal probability. This modified $\chi^2$ test is then applied to the available data obtained from lottery companies within the European Union and the results are presented. The thesis also includes an analysis of the discrepancies found in the article by Genest et al. (2002) published in the Journal of the Royal Statistical Society.


Chapter 1

Introduction to Lottery

Lottery is perhaps the most widely known and by far the oldest game of chance. From its very beginnings with slips, woods or the simple drawing of lots to today's most popular form with randomly selected balls, the basic structure, technical procedure and simplicity of this game of chance have been preserved. The English word lottery has its roots in the Dutch word loterij, which is derived from the Dutch noun lot meaning fate. But the roots of the lottery itself can be traced back to the second millennium B.C. There is a reference to a game of chance known as the drawing of wood in an early Chinese collection of poems and songs. In context, this game of chance appears to describe the drawing of lots. The first signs of lottery come from the Han Dynasty between 205 and 187 B.C., where ancient Keno slips were discovered. It is believed that proceeds from these lotteries helped to finance government projects. The first known European lottery, organized during the Roman Empire by the Emperor Augustus Caesar, was likewise used to raise money; in this case the proceeds went towards repairs to the city of Rome. The winners were given prizes in the form of valuable articles.

The first records of lotteries with prizes in the form of money date to 1443-1449 and come from the Low Countries, the historical lands around the low-lying delta of three rivers, the Rhine, Scheldt and Meuse. The Dutch were the first to shift to solely monetary prizes, and they were also the first to base the prizes on the actual odds. Thanks to their popularity, lotteries were often used as a painless form of taxation. An official lottery in England was established during the 16th century to raise money for public repairs, followed by France in the 17th century, where lotteries became one of the main resources for religious congregations in the 18th century.
In colonial America between 1744 and 1776 over two hundred lotteries were sanctioned, and they played a huge role in financing both private and public ventures. Lotteries remain very popular today. The most common are the state and national number lotteries, offered and run by states whose regulations allow this type of game of chance. The popularity of the lottery is partly due to its transparency and simplicity. There is no opponent, no dealer, and no strategy that can affect the course of the game. All the components are clearly visible: the urn with the balls, the shuffling device, the numbers on the player's ticket. The players only choose the numbers, buy a ticket and wait for the numbers to be drawn, which usually happens once or twice a week. The drawings are very commonly broadcast on national TV.

Besides transparency and simplicity, the most important element of the public's fascination with lottery games is the size of the winnings. Lotteries usually offer the highest winnings among the legal gambling games available, which every day attracts players who buy tickets and dream about their numbers matching the winning combination. Winning is mathematically very improbable. The high prizes, especially in the highest winning categories, are of course compensated by low winning probabilities. These probabilities vary nationally and internationally due to different sets of rules or game matrices. Each lottery matrix can be described, and numerical probabilities for each matrix can be found. The probabilities are essentially all there is to the lottery game, since no real strategy for winning the lottery exists.

Some say that the lottery is a tax on people who are bad at mathematics. The following story says something different. In 1992 a group of 28 members organized by the 43-year-old businessman Stefan Klincewicz tried to buy all the possible combinations and thus guarantee a jackpot win, which had reached £1.7 million. At an initial cost of £0.50 per combination, covering all possible combinations in the 6/36 game matrix would cost only £973,896. So the plan was set. The Irish National Lottery noticed an unusually high number of tickets being sold and tried to scupper the plan by limiting the number of tickets any machine could sell, and by turning off the terminals that Klincewicz's team of ticket purchasers was using heavily.
Despite all the company's efforts, Klincewicz's team had the winning numbers on the night. Unfortunately two other winning tickets were sold too, so the group could claim only one-third of the jackpot, or £568,682. But many smaller match-5 and match-4 prizes brought its total winnings to approximately £1,166,000. To avoid similar schemes, the National Lottery changed the game matrix to 6/39 later that year in order to lengthen the jackpot odds (source: www.independent.co.uk).

One subject of mathematical interest connected with the lottery is the so-called lottery problem. Several published articles deal with it, for example by developing a Monte Carlo algorithm that seeks the smallest possible number of tickets guaranteeing at least one winning ticket with m correct matches for any t-subset for the lottery (N, t, m). This particular approach can be found in the paper by Braverman and Gueron [1]. Another article on the lottery problem, written by Füredi, Székely and Zubor [2], contains a proof that 100 tickets are needed to guarantee 2 correct matches in the Hungarian Lottery. The results of this work were further used by Bougard in the article The lotto numbers L(n, 3, p, 2) [3]. The question of the minimum number of tickets needed so that at least one ticket has a particular matching combination is also investigated in the article by Jans and Degraeve [4] and in the article A Lotto Systems Problem [5] written by Russel and Griffiths.

Several strategies for Lotto games are examined, such as playing the numbers that should be due, in the article by Heinze [6], or more generally the proven strategies for Lotto games in the book [7] by Heinze and Riedwyl. Mathematical models for various playing systems are described in the book The Mathematics of Lottery: Odds, Combinations, Systems by Barboianu [8]. Modelling the probability distribution of prize winnings is another topic, described in the article by Baker and McHale [9], which delivers a spin-off result: lottery players may increase the expected value of their tickets by choosing numbers that are less popular with other lottery players.

Another researched subject connecting mathematics and lottery is testing the randomness of the numbers drawn. Some methods for testing the randomness of the balls drawn are described in the articles by Haigh [10] and by Johnson and Klotz [11]. The article written by Genest, Lockhart and Stephens [12] shows one way to test the randomness of the numbers drawn using $\chi^2$ properly, as opposed to the usual approach to this test using $\chi^2$ that can be found, for example, in the book by Woolfson [13]. Another approach, also cited in the article [12], is described in the article written by Joe [14].

The aim of this paper is to study and implement the test of randomness introduced by Genest, Lockhart and Stephens in [12] and to apply it to real data obtained from lottery companies within the European Union. Each lottery game is analysed and compared with respect to the probabilities of its winning categories. In order to analyse the probabilities, we establish a probability space on which to work in Chapter 2. Chapter 3 describes how to derive formulas for calculating probabilities for various types of games. Chapter 4 discusses why to test the randomness of numbers and why to use the method introduced in [12] instead of the usual approach. Chapter 5 shows how to calculate p-values, which differs due to the alternative approach to the testing described in [12].
Chapter 6 contains the analysis of the discrepancies found in the article [12]. The results of the tests on the data obtained from the lottery companies, with commentary on the p-values, can be found together with the analysis of the winning categories in Chapter 7.

Chapter 2

Theory

2.1 Defining the Lottery

The most popular form of lottery uses balls with numbers inscribed on them, and the rules for awarding prizes are based on how many of the player's predicted numbers are among those randomly drawn. Let us define the following parameters:

N   the total number of lottery numbers, i.e. the numbers that can be drawn are {1, ..., N}
k   the number of balls drawn out of the urn without replacement

The whole process of the number lottery game can be described as follows: The player buys a ticket before the draw by marking k predicted numbers on a printed matrix of N numbers on an entry form. The form is scanned electronically and a ticket is printed out and given to the player as a record. Then, on the established date and time, the draw is performed and the k winning numbers are determined. Both the lottery company and the player check the winning numbers against the numbers on the sold tickets. If a ticket matches one of the winning prize categories, the player is awarded a prize according to the category. Another way to pick the numbers these days is to use a lottery number generator; most lottery companies provide this service. In this case the numbers are generated pseudorandomly by a computer.

Each lottery has its own awarding system and numerical parameters. However, we can already distinguish between the various games by referring to them as Lotto k/N. For example, we may now refer to a game where 6 winning numbers ranging from 1 to 49 are drawn without replacement as Lotto 6/49. k/N represents a certain lottery matrix; the most common within the EU is 6/49, but there are also 5/35, 5/50, 5/90, 6/42, 6/45, 6/48, 6/90 and 7/39. See Chapter 7 for descriptions of the EU Lotto games.

2.2 Probability theory

What we are interested in, in the lottery as in every game of chance, is a description of the possible outcomes. In probability theory we call them events, and in our lottery case events are the occurrences of certain numbers or groups of numbers. The machine that performs the drawing generates the outcomes: combinations of k different numbers out of N numbers. We can think of these combinations as the sample space of our experiment, which is drawing k numbers from N numbers without replacement. The sample space is the set of all possible outcomes. All of these outcomes are equally likely to be drawn, which is a necessary condition for our probability model. Let us denote the sample space $\Omega$. This set has $C_N^k$ elements, which is the number of all combinations of k numbers taken out of N.

Game matrix   No. of elements
6/90          622614630
5/90          43949268
7/39          15380937
6/49          13983816
6/48          12271512
6/42          5245786
5/50          2118760
5/35          324632

Table 2.1: Number of elements of $\Omega$ for lottery matrices within EU national lotteries in decreasing order

We consider the field of events $\mathcal{F}$ to be the set of all subsets of the sample space, so this set is also finite. The field of events is suitable for a function P given by the classical definition of probability on a finite field of events with equally possible elementary events. The probability P of an event E is a number expressing the chance that E will occur; in other words, it is the ratio between the number of outcomes favourable to E and the number of equally possible outcomes. On a finite field of events, P is a function $P : \mathcal{F} \to \mathbb{R}$ that satisfies these axioms:

1. $P(E) \geq 0$ for all $E \in \mathcal{F}$
2. $P(\Omega) = 1$
3. $P(E_1 \cup E_2) = P(E_1) + P(E_2)$ for any $E_1, E_2 \in \mathcal{F}$ such that $E_1 \cap E_2 = \emptyset$

With P being a probability function, we have built a probability space $(\Omega, \mathcal{F}, P)$ that provides the basic probability model on which to work.

Taking for example the matrix 5/35, $\Omega = \{(1, 2, 3, 4, 5), (1, 2, 3, 4, 6), \ldots, (31, 32, 33, 34, 35)\}$, that is $C_{35}^5$ elements. We can build similar models for any number of numbers drawn up to k, for predicting events such as drawing various subsets of numbers. With the matrix 5/35 and four numbers already drawn, the sample space of the probability model for the last number would contain 31 elements ($35 - 4$).
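The sizes of the sample spaces listed in Table 2.1 can be checked with a few lines of code. The following is only a minimal sketch in Python (our choice of language for illustration); the names `MATRICES` and `sample_space_size` are hypothetical helpers, not part of any lottery software.

```python
from math import comb

# Game matrices (k, N) from Table 2.1: k balls drawn from N without replacement.
MATRICES = [(6, 90), (5, 90), (7, 39), (6, 49), (6, 48), (6, 42), (5, 50), (5, 35)]


def sample_space_size(k: int, n_balls: int) -> int:
    """Number of unordered k-subsets of {1, ..., n_balls}, i.e. C_N^k."""
    return comb(n_balls, k)


# Map each matrix name such as "6/49" to the size of its sample space.
sizes = {f"{k}/{n}": sample_space_size(k, n) for k, n in MATRICES}

for name, size in sizes.items():
    print(f"{name:>5}  {size:>12,}")
```

Using exact integer arithmetic avoids any rounding in the counts, which matters because the largest matrix already has more than 600 million combinations.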

Chapter 3

Calculations

3.1 Choosing k from N

Let us start with what most people are interested in: what are the chances of winning the lottery? The condition for winning the highest prize is correctly predicting all k numbers drawn from the N numbers in the urn (we play Lotto k/N). Let this be an event $E_k$, so the probability of winning the lottery is $P(E_k)$. We can demonstrate the chance of winning in the following way: Starting with an unmarked matrix on an entry form, there are N numbers we can choose to mark as the first one, and so there is probability $\frac{1}{N}$ of predicting the first number correctly. As soon as we pick the first number there are $N - 1$ numbers left, which means there is probability $\frac{1}{N-1}$ of predicting the second one correctly. Keeping in mind that the drawn balls are not returned, we can see that there are $N(N-1)$ ways to choose the first two numbers. Therefore the probability of predicting the first two numbers correctly would be $\frac{1}{N(N-1)}$. We can continue this way, ending with the last, k-th number, which we mark correctly with probability $\frac{1}{N-k+1}$. This way we get:

$$P(E) = \frac{1}{N(N-1)\cdots(N-k+1)},$$

which can also be written as:

$$P(E) = \frac{1}{N!/(N-k)!}.$$

However, this $P(E)$ is not the probability of winning the lottery; in fact it is an even smaller number than our desired probability, because we are taking into account the order of the numbers, which is not significant in the draw. It does not matter if we mark the last drawn number as our first on the entry form; it will still count as correctly predicted. Therefore we have to divide the denominator by $k!$, which is the number of possible orders in which k numbers could be drawn. Thus the probability of winning the lottery, denoted $P(E_k)$, is:

$$P(E_k) = \frac{1}{\frac{N!}{(N-k)!\,k!}}.$$

What we now have in the denominator is the number $\frac{N!}{(N-k)!\,k!}$, which is the number of all possible combinations of k numbers drawn from N numbers. This number can also be written as $C_N^k$ or, more generally, as:

$$\binom{N}{k} = \frac{N!}{(N-k)!\,k!}.$$

C stands for combinations and $C_N^k$ is the number of ways of picking k unordered outcomes from N possibilities. It is also known as the choice number, read "N choose k", or as a binomial coefficient or combinatorial number. Now let us move on to the other possibilities. When drawing k numbers out of N there is only one way to predict all of them correctly: pick exactly the one unique combination. But for subsets of the k drawn numbers there is more than one combination of k numbers that can match the subset, and therefore a higher probability. We have already established $C_N^k$, which is the number of possible combinations for a group of k numbers taken out of N. As written above, for predicting k numbers out of k correctly there is of course only one unique combination:

$$\binom{k}{k} = \frac{k!}{(k-k)!\,k!} = \frac{k!}{0!\,k!} = 1.$$

But there is more than one combination for $C_k^n$ if $0 < n < k$. Thus for correctly predicting n of the k balls drawn there are $\binom{k}{n}$ ways to do so. Moreover, there are still $k - n$ losing balls, which are drawn from $N - k$ numbers, and these can be chosen in $\binom{N-k}{k-n}$ ways. Therefore there are in total $\binom{k}{n}\binom{N-k}{k-n}$ ways of correctly picking n balls out of a draw containing k numbers.

We can now write a formula for the probability of predicting n numbers matching the k balls drawn:

$$P(E_n) = \frac{\binom{k}{n}\binom{N-k}{k-n}}{\frac{N!}{(N-k)!\,k!}}.$$

The number in the denominator is again $C_N^k$, the number of all possible combinations. If we put $n = k$ we get exactly $P(E_k)$:

$$P(E_{n=k}) = \frac{\binom{k}{k}\binom{N-k}{k-k}}{\frac{N!}{(N-k)!\,k!}} = \frac{1}{N!/\big((N-k)!\,k!\big)}.$$
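The formula for $P(E_n)$ can be evaluated exactly with rational arithmetic. The sketch below is written in Python for illustration only; the function name `match_probability` is hypothetical, and Lotto 6/49 is used purely as an example matrix.

```python
from fractions import Fraction
from math import comb


def match_probability(n: int, k: int, n_balls: int) -> Fraction:
    """P(E_n): exactly n of the player's k numbers are among the k balls drawn."""
    favourable = comb(k, n) * comb(n_balls - k, k - n)
    return Fraction(favourable, comb(n_balls, k))


# Probabilities of matching 0, 1, ..., 6 numbers in Lotto 6/49.
probs_649 = [match_probability(n, 6, 49) for n in range(7)]

for n, p in enumerate(probs_649):
    print(f"match {n}: {float(p):.10f}")
```

Since the events $E_0, \ldots, E_k$ are disjoint and cover the whole sample space, the probabilities must add up to exactly one, which is a convenient sanity check on the formula.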

3.2 Adding bonus numbers

Many lotteries draw an additional bonus number, a bonus ball. There are two kinds of these numbers. Either the bonus balls are drawn from a separate urn from the main lottery or they are drawn from the same urn after the main k numbers were drawn.

3.2.1 Drawing from separate set of balls

Let B be the number of bonus numbers available, l the number of bonus numbers drawn out of B, and m the number of correctly predicted bonus numbers. Let $D_m$ be the event of correctly predicting m of the l drawn bonus numbers. To calculate the probability we use the same scheme as for the main lottery:

$$P(D_m) = \frac{\binom{l}{m}\binom{B-l}{l-m}}{\binom{B}{l}}.$$

For a lottery game with matrix k/N and matrix l/B for the bonus numbers, we can calculate the probability of matching n numbers of the main lottery and m bonus numbers as:

$$P(A_{m,n}) = P(E_n)\,P(D_m).$$

With the above formula we can fully analyse the probabilities of winning in a European lottery called Euro Millions. This lottery uses the main game matrix 5/50, and two additional bonus numbers are drawn from a separate board containing 10 numbers. The results can be viewed in Table 3.1.

When we sum all the probabilities in the above Table 3.1, we obtain the probability of winning anything when buying one lottery ticket. The probability is 4.2%.
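The product formula $P(A_{m,n}) = P(E_n)\,P(D_m)$ can be sketched in code as follows. This is an illustrative Python sketch, not the computation behind Table 3.1: the helper names are hypothetical, the parameters 5/50 and 2/10 are taken from the description above, and no prize categories are assumed, so the sketch does not reproduce the probability of winning any prize.

```python
from fractions import Fraction
from math import comb


def hypergeom(matched: int, drawn: int, total: int) -> Fraction:
    """Probability of matching `matched` of `drawn` numbers taken from `total`."""
    ways = comb(drawn, matched) * comb(total - drawn, drawn - matched)
    return Fraction(ways, comb(total, drawn))


def joint_probability(n: int, m: int, k: int = 5, n_balls: int = 50,
                      l: int = 2, b: int = 10) -> Fraction:
    """P(A_{m,n}) = P(E_n) * P(D_m) for independent main and bonus draws."""
    return hypergeom(n, k, n_balls) * hypergeom(m, l, b)


# All combinations of n main matches (0..5) and m bonus matches (0..2).
table = {(n, m): joint_probability(n, m) for n in range(6) for m in range(3)}

print(float(table[(5, 2)]))  # all five main numbers and both bonus numbers
```

The multiplication is justified because the bonus numbers come from a separate set of balls, so the two draws are independent.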

3.2.2 Drawing a bonus ball after the first k numbers were drawn

The other case is when a bonus ball is drawn from the same urn after the main k numbers were drawn. Games based on this scheme are more common than the previous case. Let us establish a formula for the probability of correctly predicting $n + b$ numbers, where n is the number of correctly predicted numbers among the k main numbers and b is either 0 (the bonus number is not predicted correctly) or 1 (it is). Let this be an event $L_{n+b}$. Of course the probability of matching n numbers plus the bonus number is lower than that of matching n numbers only. We also must not forget that when calculating the probability of winning a category where n numbers must be matched and where a category n+1 also exists, we must omit the combinations including the bonus number from category n, which is in fact $n + 0$.

The number of combinations matching n numbers out of k is $\binom{k}{n}\binom{N-k}{k-n}$. Now consider the case where, after the k winning numbers, the bonus number is drawn out of the remaining $N - k$ numbers. If we want the number of combinations that also match the bonus ball, we have to multiply $\binom{k}{n}\binom{N-k}{k-n}$ by $\frac{k-n}{N-k}$, which is the ratio of combinations that contain the bonus number, and we get:

$$P(L_{n+1}) = \frac{\binom{k}{n}\binom{N-k}{k-n}\,\frac{k-n}{N-k}}{\binom{N}{k}}.$$

In the second case the ratio of the combinations $\binom{k}{n}\binom{N-k}{k-n}$ that do not match the bonus number is:

$$\frac{(N-k)-(k-n)}{N-k},$$

which can also be derived from:

$$1 - \frac{k-n}{N-k}.$$

Viking Lotto draws two bonus numbers after the main k numbers were drawn. These two balls play a role in determining the winner of the second highest category, which requires matching 5 of the six numbers drawn and one of the two bonus balls; see page 31 for the list of winning categories and the probability analysis of the Viking Lotto. Since the player has only one number left with which to match one of the bonus balls, we can use the formula

$$P(L_{n+1}) = \frac{\binom{k}{n}\binom{N-k}{k-n}\,\frac{k-n}{N-k}}{\binom{N}{k}}.$$

We only have to double the chance of matching the bonus ball, since there are two bonus balls among the remaining balls. Thus the probability of winning the second category of the Viking Lotto can be calculated as follows:

$$P(L_{5+1}) = \frac{\binom{6}{5}\binom{42}{1}\,\frac{2(6-5)}{48-6}}{\binom{48}{6}} = 0.0000009779.$$
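The Viking Lotto figure can be reproduced with a short calculation. The following Python sketch is illustrative only; `match_with_bonus` is a hypothetical helper that follows the formula for $P(L_{n+1})$, with the number of bonus balls as an extra parameter.

```python
from fractions import Fraction
from math import comb


def match_with_bonus(n: int, k: int, n_balls: int, bonus_balls: int = 1) -> Fraction:
    """P(L_{n+1}): n main numbers matched plus a bonus ball matched.

    The factor bonus_balls * (k - n) / (n_balls - k) is exact for a single
    bonus ball, and for several bonus balls when k - n == 1 (only one of the
    player's numbers is left to match a bonus ball), as in the Viking Lotto.
    """
    ways_main = comb(k, n) * comb(n_balls - k, k - n)
    bonus_ratio = Fraction(bonus_balls * (k - n), n_balls - k)
    return ways_main * bonus_ratio / comb(n_balls, k)


# Second category of the Viking Lotto: 5 of 6 main numbers plus one of 2 bonus balls.
viking_second = match_with_bonus(n=5, k=6, n_balls=48, bonus_balls=2)
print(float(viking_second))
```

Keeping the result as a fraction until the very end shows that the probability is exactly 12 favourable combinations out of all combinations of the 6/48 matrix.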


Chapter 4

Testing the Randomness of Numbers Drawn

4.1 Randomness

We have already described how the lottery game works in Chapter 2. The k numbers are selected at random from N numbers by a rotating drum that ejects them individually without any human influence. Drawing in this way should guarantee that the k numbers are produced without any bias. Taking as an example the data for the Latvian Latloto game, the first thing we notice in Figure 4.1 is the large range of frequencies. A closer look at Table 4.1 shows that the range is from 162 for number 4 to 216 for number 5. This might lead us to consider whether the selection was flawed in some way. To check whether all the numbers forming the winning combination come up with equal probability, we will use Pearson's standard goodness-of-fit test [12].

Table 4.1: Observed frequency of occurrence of balls 1-35 in the five-number winning combination of the first n = 1320 draws of the Latvian 5/35 Lotto between January 4th, 1997, and December 29th, 2010

(1) 208    (8) 182    (15) 196    (22) 204    (29) 174
(2) 189    (9) 181    (16) 187    (23) 183    (30) 206
(3) 184    (10) 192   (17) 175    (24) 202    (31) 194
(4) 162    (11) 195   (18) 185    (25) 178    (32) 193
(5) 216    (12) 192   (19) 181    (26) 177    (33) 169
(6) 167    (13) 195   (20) 203    (27) 192    (34) 183
(7) 212    (14) 179   (21) 183    (28) 197    (35) 184

[Figure: observed frequency (vertical axis, 0-250) plotted against ball number (horizontal axis, 1-35)]

Figure 4.1: Observed frequency of occurrence of balls 1-35 in the five-number winning combination of the first n = 1320 draws of the Latvian 5/35 Lotto between January 4th, 1997, and December 29th, 2010. The blue line is at level 188.6, which is the mean.

4.2 Pearson's standard goodness-of-fit test

For testing one number at a time, a classic approach is to determine the observed frequency $O_i$ with which the numbers $i = 1, \ldots, N$ occurred among the k winning numbers in n lottery draws, and then to compare these observed counts with the expected counts $E_i$, which we can express as

$$E_i = \frac{nk}{N}.$$

Then we would be ready to use the traditional Pearson statistic

$$\chi^2 = \sum_{i=1}^{N} \frac{(O_i - E_i)^2}{E_i}.$$

In most cases the resulting statistic would be compared with the $\chi^2$ distribution with $N - 1$ degrees of freedom, denoted by $\chi^2_{N-1}$, under the null hypothesis of equiprobability. But in our lottery case the statistic does not follow the usual $\chi^2$ distribution with $N - 1$ degrees of freedom, because the observations, the winning numbers, are not drawn with replacement. Once a number has been selected among the k winning numbers, it does not go back to the drum and thus cannot be chosen again in the same draw; the variability of the standard statistic is thereby reduced.
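As an illustration, the Pearson statistic for single balls can be computed from the Latloto frequencies of Table 4.1. The sketch below is in Python and uses hypothetical names (`OBSERVED`, `pearson_statistic`); it only evaluates the statistic and deliberately does not compare it with any distribution, since the appropriate reference distribution is the subject of the following sections.

```python
# Observed frequencies of balls 1..35 from Table 4.1
# (Latvian Latloto 5/35, first n = 1320 draws).
OBSERVED = [
    208, 189, 184, 162, 216, 167, 212,
    182, 181, 192, 195, 192, 195, 179,
    196, 187, 175, 185, 181, 203, 183,
    204, 183, 202, 178, 177, 192, 197,
    174, 206, 194, 193, 169, 183, 184,
]


def pearson_statistic(observed: list[int], n_draws: int, k: int) -> float:
    """Sum of (O_i - E_i)^2 / E_i with E_i = n * k / N for every ball."""
    n_balls = len(observed)
    expected = n_draws * k / n_balls
    return sum((o - expected) ** 2 / expected for o in observed)


expected_count = 1320 * 5 / len(OBSERVED)  # about 188.6, the mean in Figure 4.1
chi2 = pearson_statistic(OBSERVED, n_draws=1320, k=5)
print(expected_count, chi2)
```

A useful consistency check on the data is that the frequencies add up to $nk = 6600$, because every draw contributes exactly five balls.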

When testing the null hypothesis of equiprobability of subsets of winning numbers of size $c = 1, \ldots, k$, the statistic $\chi^2$ behaves asymptotically as a sum of c independent weighted $\chi^2$ random variables. There are two ways to approach this: either adapt Pearson's $\chi^2$ statistic in such a way that its limiting distribution remains a simple $\chi^2$ distribution, as explained by Joe [14], or use the equation above and find the weights in its asymptotic distribution according to Genest, Lockhart and Stephens [12], which is the approach we use in this paper.

4.3 Asymptotic null distribution for subsets of size c = 1, ..., k

We have already established the formula for calculating the expected counts for $c = 1$. In the same way we are able to test whether all subsets of size $c = 1, \ldots, k$ are drawn with equal probability in n lottery draws among the k numbers chosen from the set of N balls. Let $P_c$ denote the collection of such subsets. We expand the statistic according to Genest, Lockhart and Stephens [12] and may write

$$\chi^2 = \sum_{s \in P_c} \frac{(O_s - E_s)^2}{E_s},$$

where $O_s$ stands for the observed count of the subset $s$ and

$$E_s = n \binom{N-c}{k-c} \Big/ \binom{N}{k}$$

stands for the expected count for the same subset. The expected count for a subset of size c may also be written as

$$e_c = n \binom{k}{c} \Big/ \binom{N}{c}.$$

The idea is that in every draw we have $\binom{k}{c}$ subsets of size c among the k drawn numbers, and we divide these by the number of all possible combinations of size c taken from N numbers. We multiply by n since with n draws the expected count for each subset is n times higher. The equation above, taken from Genest, Lockhart and Stephens [12], is somewhat more general and works with all $\binom{N}{k}$ combinations; thus for every subset of size c we have $\binom{N-c}{k-c}$ combinations out of $\binom{N}{k}$.
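That the two ways of writing the expected count agree can be verified numerically. The Python sketch below uses hypothetical function names; the parameters n = 1798, k = 6 and N = 49 are those of the Canadian Lotto 6/49 data discussed in Chapter 6 and serve only as an example.

```python
from fractions import Fraction
from math import comb


def expected_count(n_draws: int, k: int, n_balls: int, c: int) -> Fraction:
    """e_c = n * C(k, c) / C(N, c)."""
    return Fraction(n_draws * comb(k, c), comb(n_balls, c))


def expected_count_general(n_draws: int, k: int, n_balls: int, c: int) -> Fraction:
    """E_s = n * C(N - c, k - c) / C(N, k), the form based on all k-combinations."""
    return Fraction(n_draws * comb(n_balls - c, k - c), comb(n_balls, k))


for c in range(1, 7):
    e_c = expected_count(1798, 6, 49, c)
    print(c, round(float(e_c), 4))
```

For c = 1 the expression reduces to $nk/N$, the expected count used in Pearson's standard test.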


It is proved in Appendix A of Genest, Lockhart and Stephens [12] that the asymptotic distribution of $\chi^2$ defined above is a linear combination of c independent $\chi^2$ random variables, i.e.

$$\sum_{l=1}^{c} w_l \chi^2_{v_l},$$

where

$$w_l = \binom{k-l}{k-c}\binom{N-c-l}{k-c} \Big/ \binom{N-c}{k-c}$$

and

$$v_l = \binom{N}{l} - \binom{N}{l-1} = \binom{N}{l}\,\frac{N-2l+1}{N-l+1}.$$

When $k = c$, we have $w_1 = \ldots = w_c = 1$ and in this case $\chi^2$ is asymptotically distributed as a $\chi^2$ random variable with $\binom{N}{c} - 1$ degrees of freedom. This is in fact drawing with replacement, because after each k-number winning combination is drawn the balls go back to the drum and are ready for the next draw, which includes all N numbers again; thus there are $\binom{N}{c}$ combinations to be drawn.
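The weights and degrees of freedom are straightforward to tabulate. The following Python sketch assumes that the weights read $w_l = \binom{k-l}{k-c}\binom{N-c-l}{k-c}/\binom{N-c}{k-c}$ and that $v_l = \binom{N}{l} - \binom{N}{l-1}$; the function name `weights_and_dof` is hypothetical.

```python
from fractions import Fraction
from math import comb


def weights_and_dof(c: int, k: int, n_balls: int) -> tuple[list[Fraction], list[int]]:
    """Weights w_l and degrees of freedom v_l for l = 1, ..., c (assumed formulas)."""
    weights, dofs = [], []
    for l in range(1, c + 1):
        numerator = comb(k - l, k - c) * comb(n_balls - c - l, k - c)
        weights.append(Fraction(numerator, comb(n_balls - c, k - c)))
        dofs.append(comb(n_balls, l) - comb(n_balls, l - 1))
    return weights, dofs


# Example: pairs (c = 2) in a 6/49 game.
w, v = weights_and_dof(c=2, k=6, n_balls=49)
print([float(x) for x in w], v)
```

Two special cases provide a check: for c = 1 the single weight is $(N-k)/(N-1)$ with $N - 1$ degrees of freedom, and for c = k all weights equal one while the degrees of freedom add up to $\binom{N}{c} - 1$.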

4.4 Other approaches to testing uniformity

A very popular way of testing the uniformity of frequencies for the N Lotto balls is demonstrated, for example, by Michael M. Woolfson in the book Everyday Probability and Statistics [13]. The test is done on the first n = 1130 draws of the UK lottery, where N = 49 and k = 6. In fact it is the very same $\chi^2$ goodness-of-fit test as introduced in Section 4.2. The test uses the classic formula for the Pearson statistic

$$\chi^2 = \sum_{i=1}^{N} \frac{(O_i - E_i)^2}{E_i}, \quad \text{where } E_i = \frac{nk}{N}.$$

But after the test statistic is obtained, it is compared with the $\chi^2$ table giving probabilities for $N - 1$ degrees of freedom. However, as previously stated, this approach should not be used, since the numbers are drawn without replacement and because of that Pearson's statistic does not follow a simple $\chi^2$ distribution that can be found in tables.

Chapter 5

Computation of p-values

5.1 P-value

One way to decide in statistical significance testing is on the basis of the p-value. The p-value is the probability of obtaining a test statistic at least as extreme as the one that was observed, assuming that the null hypothesis is true. We often reject the null hypothesis if the p-value is less than 0.01 or 0.05; these two numbers are the most common values of the significance level of the test. The significance level is represented by the Greek letter $\alpha$. The significance level of the test determines the probability of an error of the first kind, or type I error. This is the error of rejecting a null hypothesis when it is actually true. In our lottery case for $c = k$, we can compute the p-value as follows:

$$\text{p-value} = P\left(\chi^2_{\binom{N}{c}-1} > x\right),$$

where $x = \chi^2$ is the test statistic obtained by Pearson's formula and $\binom{N}{c} - 1$ is in this case the number of degrees of freedom. This is the probability of getting a more extreme statistic than the one that was observed. We can use Matlab to obtain this number with a simple command:

    P_VALUE=1-chi2cdf(x,degrees_of_freedom)

This is for the case where the statistic follows a simple $\chi^2$ distribution, thus for drawing with replacement. But for $c < k$, where $\chi^2$ behaves asymptotically as a sum of c independent weighted $\chi^2$ random variables, we cannot use this method. To obtain these p-values we will use the method of Imhof [15] instead.

5.2 Method of Imhof

$\chi^2$ as defined by Pearson's classic formula can also be written according to [12] as

$$\chi^2 = \frac{\binom{N}{k}}{\binom{N-c}{k-c}}\, Y_n' Y_n,$$

where

$$Y_n = (O - E)/\sqrt{n}$$

is a random vector of length $\binom{N}{c}$, with O and E being the vectors of the $O_s$ and $E_s$, where $s \in P_c$, the collection of all subsets of size c. The prime symbol indicates the transpose operation. As stated in [12], the null distribution of $Y_n$ is asymptotically normal with mean 0 and covariance matrix $\Sigma$. With the number n of draws approaching $+\infty$, standard results imply that $\chi^2$ converges in distribution to

$$\frac{\binom{N}{k}}{\binom{N-c}{k-c}} \sum_{l=1}^{\binom{N}{c}} \lambda_l Z_l^2,$$

where $Z_l$, $l = 1, \ldots, \binom{N}{c}$, are mutually independent standard normal variables and $\lambda_l$ are the eigenvalues of $\Sigma$. As pointed out in [12], the $\lambda_l$ take only c possible distinct non-zero values $\lambda_l^*$ for $l = 1, \ldots, c$, with multiplicity $v_l$. Consequently the asymptotic distribution of $\chi^2$ is of the form

$$\sum_{l=1}^{c} w_l \chi^2_{v_l}$$

with the weights $w_l$ being

$$w_l = \frac{\binom{N}{k}}{\binom{N-c}{k-c}}\,\lambda_l^*,$$

that is, $w_l = n\lambda_l^*/e_c$ in terms of the expected count $e_c$.

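A formula for calculating the probability

$$P\left(\sum_{r=1}^{m} \lambda_r \chi^2_{h_r} > x\right)$$

is given in the article by Imhof [15], where it appears as (2.1) together with its proof. The formula is

$$P\left(\sum_{r=1}^{m} \lambda_r \chi^2_{2v_r} > x\right) = \sum_{k=1}^{p} \frac{1}{(v_k - 1)!} \left[\frac{\partial^{\,v_k-1}}{\partial\lambda^{\,v_k-1}} F_k(\lambda, x)\right]_{\lambda=\lambda_k},$$

where

$$F_k(\lambda, x) = \lambda^{n-1} \exp\{-x/(2\lambda)\} \prod_{r=1,\; r \neq k}^{m} (\lambda - \lambda_r)^{-v_r}.$$

Here $h_r = 2v_r$ ($r = 1, \ldots, m$), $n = \sum_{r=1}^{m} v_r$, and p is such that $\lambda_1 > \lambda_2 > \ldots > \lambda_p > 0 > \lambda_{p+1} > \ldots > \lambda_m$. The formula is very convenient to use when all $v_k$ are small, but with large $v_k$, as in our lottery case, it becomes very unstable due to the corresponding derivatives of $F_k(\lambda, x)$ and the large factorials. It also requires the degrees of freedom to be even. We will therefore use the numerically more convenient formula (3.2) from the next section of the article [15] instead:

$$P(Q > x) = \frac{1}{2} + \frac{1}{\pi} \int_0^{\infty} \frac{\sin\theta(u)}{u\,\rho(u)}\,du, \tag{5.1}$$

where

$$\theta(u) = \frac{1}{2}\sum_{r=1}^{m}\left[h_r \tan^{-1}(\lambda_r u) + \delta_r^2 \lambda_r u\,(1 + \lambda_r^2 u^2)^{-1}\right] - \frac{1}{2}xu,$$

$$\rho(u) = \prod_{r=1}^{m} (1 + \lambda_r^2 u^2)^{\frac{1}{4}h_r} \exp\left\{\frac{1}{2}\sum_{r=1}^{m} \frac{(\delta_r \lambda_r u)^2}{1 + \lambda_r^2 u^2}\right\}$$

and

$$Q = \sum_{r=1}^{m} \lambda_r \chi^2_{h_r;\,\delta_r^2}.$$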

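$\delta_r^2$ is the non-centrality parameter. In our case

$$Q = \sum_{l=1}^{c} w_l \chi^2_{v_l}$$

and the non-centrality parameter $\delta_r^2 = 0$, which simplifies the $\theta(u)$ and $\rho(u)$ functions in (5.1); thus we have

$$P\left(\sum_{l=1}^{c} w_l \chi^2_{v_l} > x\right) = \frac{1}{2} + \frac{1}{\pi}\int_0^{\infty} \frac{\sin\theta(u)}{u\,\rho(u)}\,du,$$

where

$$\theta(u) = \frac{1}{2}\sum_{l=1}^{c}\left[v_l \tan^{-1}(w_l u)\right] - \frac{1}{2}xu, \qquad \rho(u) = \prod_{l=1}^{c} (1 + w_l^2 u^2)^{\frac{1}{4}v_l}.$$

The function $u\rho(u)$ increases monotonically towards $+\infty$, therefore the integration in formula (5.1) can be carried out over a small finite range $0 \leq u \leq U$ only. In our case choosing $U = 1$ was sufficient.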
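The simplified formula lends itself to direct numerical integration. The sketch below is a minimal Python illustration and not the Matlab implementation used for the results in this thesis: the function name `imhof_pvalue`, the midpoint rule, the upper limit and the number of steps are all assumptions chosen for simplicity. The generous default upper limit is used so that the same code also works for small degrees of freedom, where $u\rho(u)$ grows slowly.

```python
from math import atan, exp, log, pi, sin


def imhof_pvalue(x: float, weights: list[float], dofs: list[int],
                 upper: float = 50.0, steps: int = 200_000) -> float:
    """Approximate P(sum_l w_l * chi2_{v_l} > x) for central chi-square terms.

    Integrates sin(theta(u)) / (u * rho(u)) over (0, upper) with the midpoint
    rule, which never evaluates the integrand at u = 0.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        theta = 0.5 * sum(v * atan(w * u) for w, v in zip(weights, dofs)) - 0.5 * x * u
        # Work with log(rho) so that large degrees of freedom cannot overflow.
        log_rho = 0.25 * sum(v * log(1.0 + (w * u) ** 2) for w, v in zip(weights, dofs))
        total += sin(theta) * exp(-log_rho) / u
    return 0.5 + total * h / pi


# Example: c = 1 for a 6/49 game, weight (N - k)/(N - 1) = 43/48 with 48
# degrees of freedom, evaluated at the statistic 54.34 from Table 6.1.
p_single = imhof_pvalue(54.34, [43 / 48], [48])
print(round(p_single, 3))
```

A convenient way to validate such an implementation is to set all weights to one and use an even number of degrees of freedom, because the tail probability of a simple $\chi^2$ distribution then has a closed form to compare against.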


Chapter 6

Discrepancies in the $\chi^2$ and the Lottery Article

While working with the article [12] we found some discrepancies in Tables 1 and 2 of the article, on pages 252 and 253. The test of randomness is demonstrated there on the data for Canada's Lotto 6/49. However, the test statistics for c = 5 in Table 1 and for c = 6 in Table 2 do not correspond to the p-values stated in the tables. The p-values are calculated by the method of Imhof using formula (5.1). However, if we put x = 1906878, the original value of the test statistic for c = 5 from Table 1 in the article, into the formula, we obtain a p-value equal to one half. The function

$$f(u) = \frac{\sin\theta(u)}{u\,\rho(u)},$$

where

$$\theta(u) = \frac{1}{2}\sum_{l=1}^{c}\left[v_l \tan^{-1}(w_l u)\right] - \frac{1}{2}xu, \qquad \rho(u) = \prod_{l=1}^{c} (1 + w_l^2 u^2)^{\frac{1}{4}v_l},$$

can be seen in Figure 6.1. Similarly, the same function for c = 6 from Table 2, where x = 13983809, is portrayed in Figure 6.2.

[Figure: f(u) plotted against u]

Figure 6.1: A graph of $f(u) = \frac{\sin\theta(u)}{u\rho(u)}$ for c = 5, k = 6, x = 1906878

[Figure: f(u) plotted against u]

Figure 6.2: A graph of $f(u) = \frac{\sin\theta(u)}{u\rho(u)}$ for c = 6, k = 7, x = 13983809

Note that the integrals of both of these functions are very small numbers. Moreover, according to the formula

$$P(Q > x) = \frac{1}{2} + \frac{1}{\pi} \int_0^{\infty} \frac{\sin\theta(u)}{u\,\rho(u)}\,du,$$

they get divided by $\pi$, which leads to the result 0.5 in both cases. Having the data for the first n = 1798 draws of Canada's Lotto 6/49 available, we were able to carry out the same tests as in [12]. Tables 6.1b and 6.2b show the results. Tables 6.1a and 6.2a show the original values given by the authors of the article [12].

Table 6.1: Test of equidistribution for subsets of c = 1, ..., 6 balls for Canada's Lotto 6/49 using the first 1798 draws, spanning June 12th, 1982, to April 14th, 2001. (a) Original table (b) Corrected table

(a)
c    $e_c$       $\chi^2$     p-value
1    220.1633    54.34        0.104
2    22.9337     1190.95      0.300
3    1.9518      18416.4      0.476
4    0.1273      211899.2     0.479
5    0.0056      1906878      0.534
6    0.0001      13982018     0.633

(b)
c    $e_c$       $\chi^2$     p-value
1    220.1633    54.34        0.104
2    22.9337     1190.95      0.299
3    1.9518      18416.4      0.476
4    0.1273      211899.2     0.479
5    0.0057      1906702      0.534
6    0.0001      13982018     0.633

Table 6.2: Test of equidistribution for subsets of c = 1, ..., 7 balls for Canada's Lotto 6/49 using the same data of 1798 draws, spanning June 12th, 1982, to April 14th, 2001. (a) Original table (b) Corrected table

(a)
c    $e_c$        $\chi^2$     p-value
1    256.85714    57.64        0.044
2    32.10714     1218.06      0.164
3    3.41565      18487.51     0.357
4    0.29701      212471.8     0.238
5    0.01980      1906599      0.544
6    0.00090      13983809     0.853
7    0.00002      85898786     0.555

(b)
c    $e_c$        $\chi^2$     p-value
1    256.85714    57.64        0.044
2    32.10714     1218.06      0.164
3    3.41565      18487.51     0.357
4    0.29701      212471.8     0.238
5    0.01980      1906599      0.544
6    0.00090      13977896     0.853
7    0.00002      85898786     0.555

Surprisingly, the p-values were actually correct, but the test statistics given in the article were wrong. As shown previously, the p-values computed from the test statistics given in the article were not the same as those in the tables. On the other hand, for the actual test statistics obtained from the data for the first n = 1798 draws, the p-values matched those in the article. Figures 6.3 and 6.4 show the integrand for the correct test statistics.

Figure 6.3: A graph of $f(u) = \frac{\sin\theta(u)}{u\,\rho(u)}$ for c = 5, k = 6, x = 1906702

Figure 6.4: A graph of $f(u) = \frac{\sin\theta(u)}{u\,\rho(u)}$ for c = 6, k = 7, x = 13977896

We can also observe two rounding errors in Table 6.1a. The first concerns the expected frequency $e_5$. According to

$$e_c = n\,\frac{\binom{N-c}{k-c}}{\binom{N}{k}},$$

where c is the number of balls in the subsets, n is the number of draws, N is the number of balls in the lottery and k is the number of balls drawn out of N, for c = 5, k = 6, N = 49, n = 1798 we get

$$e_5 = 1798\,\frac{\binom{49-5}{6-5}}{\binom{49}{6}} = 79112/13983816 = 0.005657397093898,$$

thus $e_5 = 0.0057$. The other differing value is the p-value for subsets of two balls in Table 6.1a. The result using the method of Imhof was p-value = 0.299390046140098, which after correct rounding is 0.299.
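The expected frequency formula is easy to evaluate directly. The short Python sketch below reproduces the value of $e_5$ derived above; the function name `expected_frequency` is introduced here for illustration only.

```python
from math import comb

def expected_frequency(n, N, k, c):
    """Expected count e_c of a fixed c-subset over n draws of k balls out of N."""
    return n * comb(N - c, k - c) / comb(N, k)

# Canada's Lotto 6/49, first 1798 draws, subsets of c = 5 balls
e5 = expected_frequency(n=1798, N=49, k=6, c=5)
print(e5)            # 79112 / 13983816
print(round(e5, 4))  # 0.0057
```

With k = 7 (bonus number included) and c = 1 the same function gives 1798 · 7/49, which agrees with the first row of Table 6.2.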


Chapter 7

Probabilities

Tables in this section show the approximate odds as well as the probabilities of winning for the prize categories of particular lottery games within the European Union. The analysis is absent only for the Netherlands, where the oldest running lottery, the Staatsloterij (www.staatsloterij.nl), exists, but unfortunately no data were available in electronic form. Another part of the report shows the results of testing for equidistribution on the available data. One way to interpret a small p-value is that such an outcome would rarely occur under the null hypothesis. There are indeed some p-values below the 5% significance level, see for example the Belgian Lotto for c = 1 in Table 7.4a on page 29, or the Greek Lotto for c = 3 over its whole history in Table 7.18 on page 38, which is significant at the 10% level. These would lead to rejection of the null hypothesis of equidistribution. But at the 5% significance level we can reject very few of the tested hypotheses of equidistribution. We can observe very low p-values for the Italian Gioco del Lotto in Table 7.28 on page 44, which can possibly be explained by its long history, during which balls and machines were changed several times. But for modern lotteries, for example the Czech Sportka, see Table 7.10 on page 33, or the German Lotto 6aus45 in Table 7.15 on page 36, the p-values do not give us any serious ground for suspecting a lack of uniformity. Worth noting is the comparison of the p-values for draws including the bonus number with those without it. There is a tendency for lower p-values to occur when the bonus number is taken as part of the draw. This test, however, does not take into account the order of the numbers, which is important when determining the prizes.
Another remark, also made in the article [12], is that for large subsets, for example for the classic matrix 6/49 and c = 6, the p-value may be, for example, 0.821 as in Table 7.12a on page 34, although such a statistic is in fact the lowest possible for n = 4858 draws: 4858 cells have a count of one and the rest of the $\binom{49}{6}$ cells have a count of zero. The real p-values in such cases are 1. This is not the case for all lotteries, though. During the history of the German Lotto there were two draws with the same outcome, which means one cell with a count of two, n − 2 cells with a count of one and zeros in the rest. This can be observed in Tables 7.15 and 7.14a on pages 36 and 35, where the p-values for c = 6 are actually lower. A similar thing can be observed in Latvia's Lotto, where three different outcomes reappeared during the 1320 draws. It may be seen then, as shown in the article [12], that the asymptotic distribution of $\chi^2$ slowly deteriorates as c increases, but there is no reason to doubt its reliability for c = 1, 2, 3, 4 regarding the classic matrices.
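The lower bound discussed above can be checked with a few lines of arithmetic. The sketch below assumes that for subsets of the full draw size the statistic is the usual Pearson sum $\sum_i (O_i - e)^2 / e$ over all $\binom{N}{k}$ cells with $e = n / \binom{N}{k}$, which simplifies to $\binom{N}{k} \sum_i O_i^2 / n - n$. The function name and the `repeats` argument are hypothetical and serve the illustration only.

```python
from math import comb

def chi2_full_subsets(n, N, k, repeats=()):
    """Pearson statistic over all C(N, k) cells for n draws of k balls out of N.

    repeats lists the counts (>= 2) of outcomes drawn more than once;
    every other observed outcome is assumed to have occurred exactly once.
    """
    cells = comb(N, k)
    singles = n - sum(repeats)                     # outcomes seen exactly once
    sum_sq = singles + sum(r * r for r in repeats)  # sum of squared cell counts
    return cells * sum_sq / n - n

# All 1798 Canadian draws distinct: the smallest attainable statistic.
# Table 6.1 lists 13982018 for c = 6.
print(chi2_full_subsets(1798, 49, 6))

# Latvian Latloto 5/35, 1320 draws, three outcomes drawn twice.
# Table 7.30 lists 324787.60 for c = 5.
print(chi2_full_subsets(1320, 35, 5, repeats=(2, 2, 2)))
```

When all outcomes are distinct the statistic equals $\binom{N}{k} - n$, so it cannot carry any evidence against equidistribution even though the asymptotic p-value is below 1.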

Approximate odds 1:19068840 1:2118760 1:8668 1:202 1:14 1:11

Probability of not winning anything is 83.29%. Probability of not matching any ball is 51.26% (56.95% not including the bonus number).

Table 7.12: Test of equidistribution for subsets of c balls for French Loto 6/49 using n = 4858 draws, spanning May 19th, 1976, to October 4th, 2008. (a) c = 1, ..., 6 (b) c = 1, ..., 7

Probability of not winning anything is 98.14%. Probability of not matching any ball is 37.51% (43.60% not including the bonus number).

Table 7.14: Test of equidistribution for subsets of c balls for German Lotto 6/49 using n = 4885 draws, spanning June 17th, 1956, to December 29th, 2010. (a) c = 1, ..., 6 (b) c = 1, ..., 7

Approximate odds 1:43949268 1:511038 1:11748 1:401 1:18

Gioco del Lotto is different from the classic lotto games and offers different games according to how many numbers are bet.

Table 7.28: Test of equidistribution for subsets of c = 1, ..., 5 balls for Italian Lotto 5/90 using n = 48422 draws, spanning January 7th, 1939, to December 30th, 2010

c    $\chi^2$
1    122.55
2    4225.90
3    117983.03
4    2556683.72
5    43975271.64

Numbers                           Approximate odds    Probability
Match 5 of 5 + Papildskaitlis     1:3246320           0.0000003080
Match 5 of 5                      1:360702            0.0000027724

Probability of not winning anything is 99.9997%. Probability of not matching any ball is 36.58% (43.90% not including the bonus number).

Table 7.30: Test of equidistribution for subsets of c = 1, ..., 5 balls for Latvian Latloto 5/35 using n = 1320 draws, spanning January 4th, 1997, to December 29th, 2010

c    $e_c$         $\chi^2$      p-value
1    188.571429    29.16         0.514
2    22.184874     555.41        0.762
3    2.016807      6383.43       0.872
4    0.126050      52043.20      0.818
5    0.004066      324787.60     0.423

During the history of Latvian Latloto these 3 draws appeared twice:
(1, 2, 3, 21, 26) on August 17th, 2005, and May 20th, 2009;
(1, 21, 23, 30, 34) on August 31st, 2005, and October 27th, 2010;
(10, 22, 23, 31, 35) on October 18th, 2000, and January 1st, 2002.
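The "probability of not matching any ball" figures quoted in this chapter follow from a simple count of combinations. The Python sketch below assumes that the bonus number, where there is one, is drawn from the same drum right after the main numbers, so that a draw with bonus is treated as k + 1 balls out of N. The function name `prob_no_match` is introduced for illustration, and games with a separate bonus drum are not covered.

```python
from math import comb

def prob_no_match(N, k, drawn=None):
    """Probability that none of the player's k numbers is among `drawn` balls
    taken without replacement from N (drawn defaults to k, i.e. no bonus)."""
    drawn = k if drawn is None else drawn
    return comb(N - k, drawn) / comb(N, drawn)

# Latvian Latloto 5/35: without and with the bonus number
print(f"{100 * prob_no_match(35, 5):.2f}%")
print(f"{100 * prob_no_match(35, 5, drawn=6):.2f}%")

# German Lotto 6/49: without and with the bonus number
print(f"{100 * prob_no_match(49, 6):.2f}%")
print(f"{100 * prob_no_match(49, 6, drawn=7):.2f}%")
```

Under this assumption the results agree with the percentages stated for the Latvian and German games.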

Approximate odds 1:511038 1:2937 1:67 1:11748 1:401 1:18

Malta's lottery offers a slightly different scheme, where players determine how many numbers they would like to bet: one number in the Prima game, two in the Ambo game, three in the Terno game or four in the Quaterno game. In each game, however, the numbers bet must match those among the five drawn.

$e_c$        p-value
32.600000    0.570
2.661224     1.000
0.166327     0.876
0.007078     0.864
0.000154     0.563


Appendix A

Content of the CD

The CD contains the file with the thesis in the folder THESIS and the data obtained from the lottery companies in the folder DATA. The folder TEST contains a test of randomness configured for the Austrian lottery, using the method of Imhof for computing p-values. Directory structure of the CD: