Catching Lottery Cheats with Data Science

Recently I came across the paper “Statistics and the Ontario Lottery Retailer Scandal” by Dr Jeffrey S. Rosenthal, describing a statistical study he carried out that uncovered a lottery fraud perpetrated by clerks in shops selling lottery tickets, at the expense of their customers.

A number of sellers claimed as their own winning tickets for large cash prizes that had actually been bought by their customers. Rosenthal calculated the probability that big prizes would be won by sellers and compared his results with the actual number of winning tickets they claimed; he found that the number was too high to be the result of chance alone. The aftermath was a big scandal reported on the front pages of the main Canadian newspapers: “the lottery company was on the defensive, politicians debated, CEOs were fired, criminal charges were laid, people were sent to jail, and over twenty million dollars were paid.”

Rosenthal’s report made me think of a conversation I had one day in the office canteen with a colleague who claimed he didn’t know anybody who had won any prize with the Irish Prize Bonds… he was immediately contradicted by another colleague, who said that she had won a (small) prize once. Still, I thought the number of non-winners was quite impressive, and this piqued my curiosity; after all, I am one of them. I hold 16 units (= €100) of the Prize Bonds fund, which I bought toward the end of 2012 with high expectations of winning the 1 million prize pretty soon; of course, I have never won anything so far. My dreams of glory were dented further when the Prize Bonds management changed the frequency of the 1 million prize from monthly to every two months in 2013, drastically reducing my already tiny chances of winning it; it has since been reduced even further, to quarterly. I secretly think that this was already a fraud perpetrated directly against me, but I have no evidence to prove it, so I have to accept the idea that I was a victim of bad luck or bad timing.

Never mind the 1 million prize: more than 1.3 million prizes were assigned in the period 2013 – 2015, plus probably another ~400K prizes in 2016 (this is an estimate; the official report for 2016 had not been published at the time of writing). Why were none of these prizes assigned to me? Is it possible that this is the result of chance alone? According to the Prize Bond regulations, the draws are “conducted using a computer based, software-driven, random number generation system (the “system”)”. How come the “system” has never selected my units?

So I decided to carry out some basic statistical calculations to check it out. First of all, it is important to know how the prize draws work. The Prize Bond fund is made up of units of €6.25 each; a unit is like a lottery ticket, with the only difference that it enters you in every weekly draw until you withdraw the money and close your account. Each unit can win only one prize in each draw. To calculate the probability that my units are selected, I needed to know how many units there are in the Prize Bond fund and how many prizes are assigned each week. According to the recent Prize Bonds Annual Reports, the fund values at the end of the years from 2013 to 2015 (the last report available while I’m writing this blog) and the number of prizes assigned each year are:

It is possible to estimate the number of units in the fund by dividing the fund’s value by the value of each unit:
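As a minimal sketch of this division in Python: the function name `estimate_units` is my own, and the fund value used in the second call is a round hypothetical figure chosen only for illustration, not a number taken from the Annual Reports.

```python
UNIT_VALUE_EUR = 6.25  # value of one Prize Bond unit, as stated in the post


def estimate_units(fund_value_eur: float) -> int:
    """Estimate how many units a holding or fund of the given value contains."""
    return round(fund_value_eur / UNIT_VALUE_EUR)


# Sanity check with a figure from the post: a 100 euro holding is 16 units.
my_units = estimate_units(100.0)
print(my_units)  # 16

# HYPOTHETICAL fund value (illustration only, not from the Annual Reports).
example_fund_units = estimate_units(2_000_000_000)
print(example_fund_units)
```

The same function works for a single holding or for the whole fund, which makes it easy to check the arithmetic against the 16 units I own.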

Well, the above numbers are not promising; the fund contains hundreds of millions of units and I own just 16 of them. However, the number of prizes assigned each year is very high and this might significantly improve my chances of winning something.

To estimate the probability of not having won any prize so far in a simple manner, I based my analysis on two assumptions: that the number of units in the fund and the number of prizes assigned each week both remain constant throughout the year. Given the huge number of units in the fund, the impact of these two assumptions on the results is negligible.

The weekly number of prizes can be estimated by dividing the total number of prizes assigned each year by the number of weeks; results are reported below.
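A small sketch of this step, assuming 52 weekly draws per year as in the rest of the post. The helper name `weekly_prizes` and the annual total passed to it are hypothetical placeholders, not figures from the reports.

```python
WEEKS_PER_YEAR = 52  # one draw per week, as assumed in the post


def weekly_prizes(annual_prizes: int) -> float:
    """Average number of prizes assigned in each weekly draw."""
    return annual_prizes / WEEKS_PER_YEAR


# HYPOTHETICAL annual total (illustration only, not from the Annual Reports).
example_weekly = weekly_prizes(520_000)
print(example_weekly)  # 10000.0
```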

The “non-winning” probability is the probability that, for every prize assigned in a weekly draw, a unit other than one of my 16 is selected. For a single prize, this is the number of units in the fund minus 16, divided by the number of units in the fund. The total number of units is constant from week to week, but within a draw the pool shrinks by one unit for each prize assigned, because each unit can win only one prize in a draw; the non-winning probability of a draw is therefore the product of these ratios over all the prizes in that draw. So the non-winning probabilities of each draw are equal to:
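The per-draw calculation described above can be sketched as follows. This is only my reading of the procedure: the function name and the input values are assumptions made for illustration, and the fund size and prize count are hypothetical rather than taken from the Annual Reports. The product is accumulated in log space to avoid losing precision when multiplying thousands of factors very close to 1.

```python
import math


def non_winning_probability_per_draw(total_units: int, my_units: int, prizes: int) -> float:
    """Probability that none of `my_units` is selected in one draw.

    Prizes are drawn without replacement within a draw, so the pool of
    eligible units shrinks by one for every prize already assigned.
    """
    log_p = 0.0
    for k in range(prizes):
        remaining = total_units - k  # units still eligible for prize k
        # (remaining - my_units) / remaining, written as 1 - my_units/remaining
        log_p += math.log1p(-my_units / remaining)
    return math.exp(log_p)


# HYPOTHETICAL inputs (illustration only): 320 million units, 10,000 prizes per draw.
p_draw = non_winning_probability_per_draw(320_000_000, 16, 10_000)
print(p_draw)
```

With numbers of this size, the result is very close to what a simpler with-replacement model would give, which is why the shrinking pool barely matters in practice.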

Each draw can be modelled as an independent event because each unit can be selected again at every draw. The joint probability of independent events can be calculated as the product of their probabilities; there are 52 draws every year so the annual probability is equal to the 52nd power of the non-winning probability of each draw; here are the results.
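The annual figure can be sketched as below, assuming independent draws and the same per-draw probability all year, as stated above. The function name and the per-draw value are hypothetical and only show the shape of the calculation.

```python
DRAWS_PER_YEAR = 52  # weekly draws, as in the post


def annual_non_winning_probability(per_draw: float, draws: int = DRAWS_PER_YEAR) -> float:
    """Joint probability of not winning in any of `draws` independent draws."""
    return per_draw ** draws


# HYPOTHETICAL per-draw non-winning probability (illustration only).
p_year = annual_non_winning_probability(0.9995)
p_win_year = 1.0 - p_year  # chance of winning at least one prize in the year
print(p_year, p_win_year)
```

Because the draws are treated as independent, the same product rule also applies across years: multiplying the annual probabilities gives the non-winning probability for a multi-year period.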

The total non-winning probability for the period 2013 – 2015 can be calculated by multiplying the probabilities of the three years, and it is equal to around 94%; this means that I had a 6% chance of winning something during those three years. I do not have any official numbers for 2016, but if I use the numbers of units and prizes of 2015 as predictors for 2016, the total non-winning probability for the period 2013 – 2016 can be estimated as about 92.5%.

Although the probability of winning something is not negligible, these results give strong support to the assumption that I am just a victim of bad luck; the fact that I haven’t won anything so far can easily be the result of chance alone. I guess I have just two options now: 1) wait and hope for better luck or 2) buy a few more units to increase my chances. Fingers crossed!