Handicapping in croquet has a simple aim: to give each player
the same chance of winning a handicap game. Whether this is achieved in
practice is difficult to know. However, when groups of games share some
features, one can apply statistics to test whether the wins are shared as
expected. The results of matches between clubs allow
one to test how well the handicap system is working, and to quantify the
size of 'home advantage'. If handicapping were perfect, one would expect the
match scores, i.e. the number of wins and losses in each match, to follow a
symmetric binomial distribution, analogous to repeated tossing of a fair
coin. This is not the case: a better analogy is with tossing a weighted
coin, modelled by a skewed binomial distribution with unequal probabilities
of heads and tails.

The binomial expansion in croquet

                 binomial probabilities (percent)
match scores     a:b = 1:1    a:b = 4:3    a:b = 3:2    a:b = 2:1

7-0                 0.781        1.989        2.799        5.853
6-1                 5.469       10.445       13.064       20.485
5-2                16.406       23.500       26.127       30.727
4-3                27.344       29.376       29.030       25.606
3-4                27.344       22.032       19.354       12.803
2-5                16.406        9.914        7.741        3.841
1-6                 5.469        2.479        1.720        0.640
0-7                 0.781        0.266        0.164        0.046

5-0                 3.125        6.093        7.776       13.169
4-1                15.625       22.848       25.920       32.922
3-2                31.250       34.271       34.560       32.922
2-3                31.250       25.704       23.040       16.461
1-4                15.625        9.639        7.680        4.115
0-5                 3.125        1.446        1.024        0.412

Table 1. Percentage probabilities for different match results
for binomial distributions with different probability ratios a:b.

An ideal handicap system would make the winning probability for
individual games equal to 50%, and the distribution of match results
should then follow the values of the successive terms in a binomial expansion,
(a + b)ⁿ, where a and b are the probabilities of two mutually exclusive
events such that (a + b) = 1 (i.e. the win or the loss of a game, with a =
b = 50% for correctly bisqued players), and where n is the number of independent
trials (i.e. the number of games in the match). Croquet matches often comprise
seven games, in which case the eight possible match results 7-0, 6-1, etc.,
should occur in the ratios 1:7:21:35:35:21:7:1, i.e. as the percentage
probabilities shown in the upper part of Table 1, column 2. The probabilities
for 5-game matches are given in the lower part of the Table. The figures given
in columns 3-5 are for various skewed binomial distributions met later, where
the probabilities a ≠ b. This predicted distribution of match scores is
independent of the scores in individual games. Even very close games, though
important to the players, are immaterial to the match
statistics because over a large enough number of games a team will on average
have as many narrow wins as narrow losses.
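
The binomial terms described above are easy to check numerically. The
following is a minimal sketch in Python, not part of the original analysis;
the function name match_score_probs is an illustrative choice. It evaluates
each term of the expansion for a given ratio a:b and should reproduce the
columns of Table 1 to within rounding.

```python
from math import comb

def match_score_probs(n_games, a, b):
    """Probability of each match score for a ratio a:b of winning a game.

    Returns a dict keyed by (wins, losses), highest number of wins first.
    """
    p = a / (a + b)   # probability of winning a single game
    q = 1.0 - p       # probability of losing it
    return {
        (wins, n_games - wins): comb(n_games, wins) * p**wins * q**(n_games - wins)
        for wins in range(n_games, -1, -1)
    }

if __name__ == "__main__":
    for n_games in (7, 5):
        for a, b in ((1, 1), (4, 3), (3, 2), (2, 1)):
            probs = match_score_probs(n_games, a, b)
            row = "  ".join(f"{w}-{l}: {100 * pr:6.3f}" for (w, l), pr in probs.items())
            print(f"a:b = {a}:{b}  {row}")
```

Only the ratio a:b matters, so match_score_probs(7, 4, 3) and
match_score_probs(7, 8, 6) describe the same skewed distribution.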

There are several reasons why the statistics of matches may
differ from this 1:1 binomial distribution of coin tossing, e.g.:

teams are of three or four players each playing in two
games, all of them chosen from the same group of players and often
likely to be playing better than their assigned handicaps;

form is not an absolute, and individual players often
play better or worse than expected;

the conditions of the courts can vary greatly between
clubs, and 'home advantage' may be considerable, as we shall indeed
find;

matches always include doubles games, when the winning
probability is a composite dependent on four rather than just two
players;

and probably other reasons too.

If one tries to devise models beyond the simple 1:1 binomial
distribution, there are many possibilities. The assumption of independence
does not apply strictly in general, and there is no reason to expect that the
departures from the condition a = b = 50% should be the same for different
team members, or for different teams at different times. The game
probabilities still sum to unity, of course, and if the individual games are
nevertheless regarded as statistically independent, the winning probabilities
in them should be written as a₁ and b₁, etc. (which can include 'home
advantage'), so that the distribution of 7-game match results depends on the
expansion of (a₁ + b₁) … (a₇ + b₇). This is not calculable without knowing
the individual values of aᵢ and bᵢ, but a practicable
way to make some estimate of the size of systematic departures from the 1:1
binomial is to assume that all the aᵢ and bᵢ are equal to some
average values, α and β say, and to calculate the coefficients of
a skewed binomial expansion with α ≠ β. Some of these issues
are explored in what follows.
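
To illustrate the difference between the product of unequal factors and the
averaged skewed binomial, here is a small Python sketch. It is not from the
original analysis: the per-game probabilities in the list are invented purely
for illustration (chosen so that their mean is 4/7), and the function name
wins_distribution is hypothetical. It multiplies out the factors
(aᵢ + bᵢ) one game at a time and prints the result beside the distribution
obtained when every game is given the average value α.

```python
def wins_distribution(game_probs):
    """Distribution of the number of games won in a match.

    game_probs[i] is the (assumed known) probability a_i of winning game i;
    the games are treated as independent. Entry k of the returned list is
    the probability of winning exactly k games.
    """
    dist = [1.0]  # before any game is played: zero wins with certainty
    for a_i in game_probs:
        b_i = 1.0 - a_i
        nxt = [0.0] * (len(dist) + 1)
        for wins, prob in enumerate(dist):
            nxt[wins] += prob * b_i       # this game lost
            nxt[wins + 1] += prob * a_i   # this game won
        dist = nxt
    return dist

# Hypothetical per-game probabilities (illustrative only); their mean is 4/7.
unequal = [0.50, 0.55, 0.60, 0.57, 0.62, 0.52, 0.64]
alpha = sum(unequal) / len(unequal)

exact = wins_distribution(unequal)
averaged = wins_distribution([alpha] * len(unequal))
for wins in range(7, -1, -1):
    print(f"{wins}-{7 - wins}: {100 * exact[wins]:6.3f}  {100 * averaged[wins]:6.3f}")
```

Both columns have the same mean number of wins; any difference between them
shows how much is lost by replacing the individual values with an average.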

Statistical testing of models

Match scores,       SW Fedn    SW Inter.   SW 'B'     Longman
home team first     League     League      League     Cup*

7-0                    1          2           2          5
6-1                    9          4           9         22
5-2                   12         14          15         55
4-3                   19         14           8         70
3-4                    8         14           8          -
2-5                   11          4          10          -
1-6                    2          0           4          -
0-7                    2          0           7          -
total no. matches     64         52          63        152

5-0                    7          9          13         58
4-1                   11         13          33        133
3-2                   20         16          23        214
2-3                   12         12          15          -
1-4                    6          8          12          -
0-5                    4          1           7          -
total no. matches     60         59         103        405

Table 2. Results of croquet matches, home team scores first.
* Home team not known for Longman results, scores aggregated,
i.e. 7-0 & 0-7, etc., see text.

A large amount of data is needed to apply meaningful statistical
tests, but fortunately the results of matches since 2000 are given on the
SW Federation web site for three leagues: the Federation league (handicaps
<16), the Intermediate league (9-18), and the 'B' (originally for 'Beginners')
league (>16). The results for 401 completed 5-game and 7-game matches (all
containing one doubles game) are listed in Table 2. In addition, the results
of 557 Longman Cup matches from 1989 onwards are listed on the CA web site,
in 5-game format (with three doubles) initially, and then in 7-game format
(one doubles) since 2000. The Longman matches are knockout matches (handicaps
3½ - 20, team total greater than or equal to 24) but as the home team is not
recorded, the distribution of scores is only known in aggregate, e.g. the
entry in Table 2 for the score
7-0 includes both results 7-0 and 0-7, etc.; these are predicted to be equal
for the simple 1:1 binomial distribution, but can be seen from Table 1 to
differ greatly for skewed binomial distributions.

In making statistical tests one compares the observed distribution
of data with some likely theoretical model. The procedure is quantified by
a well established statistical test, the X² (chi-square)
test, which relates the magnitude of a calculated quantity, X²,
to the probability, F, that it could be expected to occur by random statistical
fluctuations of the data. A very high value of the parameter X², and
a correspondingly very low probability that the observed distribution could
have differed by chance from the assumed model, is a powerful argument for
rejecting the model. It is important to remember, however, that the
opposite is not true; a value of X² corresponding
to a high probability does not validate the initial model. It provides reassurance
that there is nothing inconsistent with it, but there may be other models
which agree with the observations as well, or better. This is the limitation
of all statistical tests, but nevertheless, they are very powerful for
revealing inappropriate models.

As an example, Table 3 shows the results for the 7-game Longman matches from
Table 2 analysed using the X² test. The
total number of matches having each score (row 2) is compared with the model
of the ideal 1:1 binomial distribution (row 3) calculated from the probabilities
given in Table 1, and normalised to the total number of matches, 152. The
numbers are clearly generally similar, decreasing for the extreme match scores,
but the X² test allows one to quantify how
significant the similarity is. The values of a deviation parameter, x² say,
in row 4 are formed by taking the difference between the observed and the
theoretical numbers in each column, squaring it, and then dividing by the
theoretical number;
one expects this value of x² to be about unity as a result of random
statistical fluctuations for each group we are comparing, and its actual
value quantifies the deviations for that group. On adding together all the
values of x², we obtain the sum X² =
7.2, the bottom right hand entry in the Table, a value which characterises
the overall agreement between the observed and the model distributions. The
significance of this final number is that, dependent on the number of groups
which we are comparing, one can find the probability that a particular value
of X² could
be exceeded by random statistical fluctuations. In our case there
are four groups, the four different match scores, but as the total number
of matches is fixed (152), the numbers could have varied independently in
only three of them, i.e. there are three degrees of freedom. Statistical
tables show that a value of X² = 7 for
three degrees of freedom has a probability, F, of only about 7%; the odds
are about
14 to 1 against the match data being consistent with the 1:1 binomial
distribution of the model, i.e. that they conform to the assumption of equal
probabilities of winning a game. This result hardly endorses the model, but
neither is it so unlikely as to raise serious doubts about it.

Match score                   4-3      5-2      6-1      7-0     Number of
(irrespective of winner)                                         matches

observed total number         70       55       22       5       152
model, 1:1 binomial           83.1     49.9     16.6     2.4     (152)
deviation parameter, x²       2.1      0.5      1.8      2.8     X² = 7.2

Table 3. X² analysis
of 7-game Longman Cup matches, 2000 - 2003.
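
The arithmetic behind Table 3 can be retraced with a short Python sketch.
This is an illustration rather than the author's own procedure: the helper
names are invented here, and the tail probability F is computed from the
standard closed-form expression for a chi-square distribution with a whole
number of degrees of freedom, in place of looking it up in statistical
tables. With the Longman 7-game counts it should give values close to the
X² = 7.2 and F of roughly 7% quoted in the text.

```python
from math import comb, erfc, exp, gamma, sqrt

def folded_probs(n_games, p):
    """Score probabilities irrespective of winner (n_games must be odd).

    Adds the binomial terms for w-(n-w) and (n-w)-w, most one-sided first.
    """
    def pmf(k):
        return comb(n_games, k) * p**k * (1.0 - p) ** (n_games - k)
    return [pmf(w) + pmf(n_games - w) for w in range(n_games, n_games // 2, -1)]

def chi_square(observed, expected):
    """Sum over groups of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_sf(x, dof):
    """Probability that chi-square with integer dof exceeds x."""
    h = x / 2.0
    if dof % 2 == 0:
        return exp(-h) * sum(h**j / gamma(j + 1) for j in range(dof // 2))
    return erfc(sqrt(h)) + exp(-h) * sum(
        h ** (j - 0.5) / gamma(j + 0.5) for j in range(1, (dof - 1) // 2 + 1)
    )

observed = [5, 22, 55, 70]   # scores 7-0, 6-1, 5-2, 4-3 (Table 3)
total = sum(observed)
expected = [total * pr for pr in folded_probs(7, 0.5)]
x2 = chi_square(observed, expected)
f = chi2_sf(x2, dof=len(observed) - 1)   # total fixed, so one group is not free
print(f"X2 = {x2:.1f}, F = {100 * f:.1f}%")
```

Passing a different probability to folded_probs, e.g. 4/7, gives the
corresponding comparison with a skewed binomial model.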

Similar analyses have been made of the other sets of match
data given in Table 2, and the results are shown in Table 4. None of the
sets of data conforms well with the 1:1 binomial distribution; all are odds
against, varying from 14:1 to millions to one against. In statistical
tests it is always desirable to have large numbers in each group in order
to reduce
the effect of fluctuations in the data, and for the South West Federation
data this analysis combines the groups predicted to contain few members (e.g.
the scores 6-1 and 7-0 for the 7-game matches), with a resultant
reduction in the number of degrees of freedom as shown in the table. One
can go further and combine the whole of the SWF data, but while this leads
to different values of X², it does not alter
the conclusion that most of the data are strongly inconsistent with a 1:1
binomial distribution, and that the overall match statistics require us to
reject it
as an appropriate model. The inconsistency appears to arise from an excess
of extreme match scores, but examining these in detail would mean reviewing
the results of individual clubs in separate competitions in different years,
and while the data exist, statistics cannot help, even if the exercise
had any point. We are forced to conclude that the chances of winning individual
games are for some reason not equal, that the results do not follow a 1:1
binomial distribution, and that we should consider skewed binomials with the
probabilities α ≠ β, i.e. neither equal to 50%.

The SW Federation data allow one to find whether there is
any home advantage in handicap matches. The data can be arranged in four
groups: (5-game/7-game) matches and home (wins/losses), as in Table 5.
If there
were no systematic departures from equal chances of winning individual games,
the wins and losses for the middle two rows would be the same, and just half
the totals in the right hand column. This is evidently not the case, and
the departures from equality give X² =
29 for two degrees of freedom, with a likelihood of occurring by chance, F,
of less than one
in a million! The raw data from Table 5 give an observed home match winning
ratio of 254/401 = 63%, and if one assumes that there is a home advantage
shared equally by all the members of a team, then it is possible to recalculate
the value of X² by comparison with a skewed
binomial. It can be seen from Table 1 that the winning advantage depends
on the amount of skewness, and also that it differs slightly for 7-game
and 5-game matches, but a skewness ratio of 4:3 gives winning probabilities
for 7-game and 5-game matches of 65.31% and 63.21% respectively, close to
the average value observed, and scaling these to the totals for each type
of match in column four gives the numbers shown in italics in Table 5, which
lead to values of X² = 1.95, and of F
= 38%. This huge reduction in the discrepancy of fit gives strong support
to the idea that match statistics follow a skewed distribution, and that
a home match winning probability of about 64%, i.e. a ratio of wins to losses
of about 1.8, must be allowed for in the statistical analysis (cf. below).
The size of the home advantage is perhaps unexpectedly large, but is not
inherently unlikely. It conforms almost optimally with an individual game
winning advantage of 4:3, or about 57.1%.
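
The home advantage calculation can be retraced from the SW Federation counts
in Table 2. The Python sketch below is an illustration, not the author's own
working: the helper names are invented, the home wins and losses are obtained
here by summing the Table 2 columns, and F is taken as exp(-X²/2), which is
the exact tail probability for two degrees of freedom. It should give values
close to the X² = 29 and X² = 1.95 quoted above for the 1:1 and 4:3 models.

```python
from math import comb, exp

# SW Federation counts from Table 2, home score first, biggest home win first.
SEVEN_GAME = {"Fedn": [1, 9, 12, 19, 8, 11, 2, 2],
              "Inter": [2, 4, 14, 14, 14, 4, 0, 0],
              "B": [2, 9, 15, 8, 8, 10, 4, 7]}
FIVE_GAME = {"Fedn": [7, 11, 20, 12, 6, 4],
             "Inter": [9, 13, 16, 12, 8, 1],
             "B": [13, 33, 23, 15, 12, 7]}

def home_wins_losses(leagues):
    """Total home match wins and losses over all leagues of one format."""
    wins = losses = 0
    for counts in leagues.values():
        half = len(counts) // 2          # first half of each column: home wins
        wins += sum(counts[:half])
        losses += sum(counts[half:])
    return wins, losses

def match_win_prob(n_games, p_game):
    """Probability of winning a majority of n_games independent games."""
    return sum(comb(n_games, k) * p_game**k * (1 - p_game) ** (n_games - k)
               for k in range(n_games // 2 + 1, n_games + 1))

def home_chi_square(p_game):
    """X2 of home wins/losses for both formats against a game probability."""
    x2 = 0.0
    for n_games, leagues in ((7, SEVEN_GAME), (5, FIVE_GAME)):
        wins, losses = home_wins_losses(leagues)
        n = wins + losses
        pw = match_win_prob(n_games, p_game)
        x2 += (wins - n * pw) ** 2 / (n * pw)
        x2 += (losses - n * (1 - pw)) ** 2 / (n * (1 - pw))
    return x2

for label, p in (("1:1", 0.5), ("4:3", 4 / 7)):
    x2 = home_chi_square(p)
    print(f"{label}: X2 = {x2:.2f}, F = {exp(-x2 / 2):.2g}")
```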

Winning probability and bisques

To translate a difference of winning probability into bisques
is not straightforward, but Louis Nel has discussed such matters recently
on the Oxford web site, and it appears that a winning probability of
2:1, i.e. 66.7%, corresponds to a difference of 150 grade points, irrespective
of handicap level, while, for instance, the difference between handicaps
5 and 6 is equivalent to only 27 grade points (a winning probability
of 53%),
with steps between higher handicaps having successively smaller differences
of grade points and winning probability. Using these figures, the 'home
advantage' suggested by the match results analysed above is equivalent to about
two
bisques for each player.
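
The conversion can be sketched in Python under one loudly stated assumption:
that the odds of winning multiply by a fixed factor for each grade point, so
that 150 points correspond to odds of 2:1. This scaling is not stated in the
text; it is assumed here because it agrees with both figures quoted above
(150 points for 66.7%, and 27 points for about 53%). The figure of 27 grade
points per bisque is the one quoted for handicaps 5 and 6 only, so the bisque
equivalents are rough, and the function names are illustrative.

```python
from math import log2

POINTS_FOR_2_TO_1 = 150.0   # grade-point gap quoted for odds of 2:1
POINTS_PER_BISQUE = 27.0    # quoted for the step between handicaps 5 and 6 only

def grade_gap(p_win):
    """Grade-point gap implied by a winning probability (assumed log-odds scale)."""
    odds = p_win / (1.0 - p_win)
    return POINTS_FOR_2_TO_1 * log2(odds)

def win_probability(gap):
    """Winning probability implied by a grade-point gap (same assumption)."""
    odds = 2.0 ** (gap / POINTS_FOR_2_TO_1)
    return odds / (1.0 + odds)

for label, p in (("4:3", 4 / 7), ("2:1", 2 / 3)):
    gap = grade_gap(p)
    print(f"{label}: {gap:.0f} grade points, "
          f"about {gap / POINTS_PER_BISQUE:.1f} bisques")
```

Because the steps between higher handicaps are worth fewer grade points, the
same probability would translate into more bisques for weaker players.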

Whether this value of 1.8 for 'home advantage' is typical
of other times and places is not known. It could in principle be extracted
from the full records of the Longman Cup matches, though not without a great
deal of effort to find the home teams, details not given on the CA web site,
but since the SW Federation contains at least as wide a range of sizes and
types of club as do other parts of the country, there seems no reason to doubt
that it is a fair average value over a typical spectrum of players and clubs.
If one accepts that it is a realistic value for the match winning advantage,
and makes the assumption that each player in a team is affected by the same
amount, the winning chance in individual games must be very close to 4:3, or
57.1%, and rather than comparing the match results with a 1:1 binomial model
as in Table 4, one should instead use a 4:3 binomial model, which then gives
the results in Table 6 below.

On comparing Tables 4 and 6, it is clear that the 4:3 skewed
binomial distribution is a much better model for the match data. Most of
the sets of data conform well to it, which suggests that the asymmetric distribution
of match scores shown in Table 4 can reasonably be explained entirely on
the basis of a 'home advantage'. There are still large discrepancies, however,
for the 'B' league data and also, surprisingly, for the Longman 5-game data.
The 'B' league data fit considerably better
to a model where, after taking account of home advantage in the ratio of
wins to losses, the actual match scores are then assumed to be random,
i.e. to fall equally amongst the different scores: the values of X² and
F for this model are 13 and 7%, and 11 and 6% for the 7-game and 5-game matches
respectively, values which are in themselves unexceptionable, but only on the
basis of an ad hoc model which takes no account of the statistical
realities for independent matches. Probably all one can conclude is that the
'B' league data cannot be regarded as a reliable test of the handicap system,
which will probably not surprise those who have witnessed what can happen in
games between inexperienced and often erratic high-bisquers.

The very high value of X² =
28 for the Longman 5-game data is a more difficult problem since the data
set is large and, unlike the 'B' league data, believed to be reliable, and
also
since the Longman 7-game data conform well with a credible model. The handicap
range involved and the nature of the competition guarantee that, unlike the
'B' league, these are nearly always experienced competitors, but nevertheless,
there are twice as many extreme match scores (5-0 and 0-5) as one should
expect. One should, however, still expect the data to conform to a binomial
distribution,
and a 2:1 binomial proves to be very close to an optimal fit, with X² =
3.1 (two degrees of freedom) and F = 22%. The Longman Cup matches changed
from 5 games to 7 games in 2000 with the result that the part played by doubles
was reduced from 60% of the match to only 14%, and since doubles games
depend
much more than singles on tactics and psychology, one may wonder whether the
greater doubles component in the 5-game matches had something to do with
the striking difference between the values of X² for
the two formats, but whether it could credibly increase the average winning
probability for a game from 57%, as deduced for 7-game matches, to the 67%
implied by the 2:1 binomial model is problematic. Unless it proves possible
to identify the home teams for the 5-game Longman matches, it is pointless
to speculate further about the origin of their anomalous value of X²,
but whatever its origin, its size is unexpectedly large, and if translated
into bisques using Nel's figures as before, amounts to about six bisques
per player. One must conclude that there was something seriously wrong with
the handicap system for the Longman Cup matches prior to 2000, and also be
relieved that the present 7-game format appears to be working fairly.
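
The contrast between the 4:3 and 2:1 models for the Longman 5-game data can be
checked with the following Python sketch. It is an illustration with invented
helper names, not the author's own working. Because the home team is unknown,
the model probabilities for each score and its mirror image are added before
comparison with the aggregated counts from Table 2, and F is taken as
exp(-X²/2), the exact tail probability for two degrees of freedom. The
results should be close to the X² values of 28 and 3.1 quoted above.

```python
from math import comb, exp

OBSERVED = [58, 133, 214]   # Longman 5-game scores 5-0, 4-1, 3-2 (Table 2)

def folded_probs(n_games, p):
    """Score probabilities irrespective of winner (n_games must be odd)."""
    def pmf(k):
        return comb(n_games, k) * p**k * (1.0 - p) ** (n_games - k)
    return [pmf(w) + pmf(n_games - w) for w in range(n_games, n_games // 2, -1)]

def fit(a, b):
    """Return (X2, F) for the aggregated 5-game data against a ratio a:b."""
    p = a / (a + b)
    total = sum(OBSERVED)
    expected = [total * pr for pr in folded_probs(5, p)]
    x2 = sum((o - e) ** 2 / e for o, e in zip(OBSERVED, expected))
    return x2, exp(-x2 / 2.0)   # three groups, fixed total: 2 degrees of freedom

for a, b in ((1, 1), (4, 3), (2, 1)):
    x2, f = fit(a, b)
    print(f"a:b = {a}:{b}  X2 = {x2:5.1f}  F = {100 * f:.2g}%")
```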

Summary of conclusions

Bearing in mind the proviso made above that statistical tests
cannot provide certain confirmation of an assumed model, but only show that
it may disagree with observation by amounts so large that it is very unlikely
(or even inconceivable) that the disagreement could arise from random statistical
fluctuations, the analysis of all the handicap match data conveniently available
leads to the following conclusions:

(i) Match data do not conform to winning chances of 50%, as expected for perfect
handicapping.

(ii) The SW Federation data suggest that 'home advantage' amounts to a match
winning ratio of 1.8 or 64%, i.e. a game winning ratio of 57%, equivalent
to about two bisques per player.

(iii) The 'B' league data for handicaps greater than 16 appear
to be statistically meaningless.

(iv) The reliable match data nearly all fit to a skewed 4:3 binomial model
with a game winning ratio of 57%, which suggests that the handicap system
is working properly after allowance is made for home advantage, as in (ii)
above.

(v) The Longman 5-game data have too many extreme match scores to be in credible
conformity with fair handicapping as in (iv) above, though they can be fitted
by a model with a game winning ratio of 67%. Whether this can be related
to the high doubles content of the matches at that time cannot be proved,
but there can be no doubt that the results show a large, systematic breakdown
of the handicap system for those games prior to 2000.

Document received 29th September 2004

Postscript

The results of 35 Longman Cup matches for 2004 are now available on the CA
website, though still without identifying the home teams, unfortunately.
The value of F for the 1:1 or 'even chances' distribution of match scores
is 1.3%, which increases to 8.7% if one compares the data with the 4:3 'home
advantage' skewed distribution; not a very convincing agreement with statistical
expectations. However, one team had several extreme match scores, wins of
7-0, and 6-1 (twice), and if one excludes all their matches from the analysis,
the remaining 30 matches yield a value of F = 43%.