Football Statistics Project

I have chosen to base my project on football statistics because they
are both readily available and interesting enough for deep analysis.
As a starting point I decided to look at the generally accepted theory
of 'Home Advantage'.

Home advantage, or the tendency for the home team to do better than
they would away, could have several causes. It could be partly
psychological - the home team would almost always have the majority of
the crowd behind them, cheering them on. It could also be to do with
the condition of the pitch - Premiership teams sometimes find it hard
to play on muddy, waterlogged pitches of some lower-division teams.

Another factor could be the attitudes of referees and officials. If
they are intimidated by the home crowd they may give more decisions in
favour of the home team, meaning teams may also have a worse
disciplinary record when playing away.

Hypotheses:

1. Teams have a worse disciplinary record away than at home

2. Better attended teams have a greater home advantage

3. More successful teams have a better disciplinary record

Collecting Data

I found that football statistics were easy to find on the internet. I
obtained mine from two main sites:

http://soccer-stats.football365.com

http://www.bettingzone.co.uk

There is a very small risk that some of the data I collected could be
incorrect. However, I have found alternative sites for the Premiership
statistics (such as www.4thegame.com) which gave the same results. I
also think that a betting site must give accurate statistics because
they are such an important part of gambling.

Using Software

I chose to input my data into Microsoft Excel because it makes it much
quicker and easier to manipulate the data.

Hypothesis 1 - Teams have a worse disciplinary record away than at
home
------------------------------------------------------------------


cards for each team at home and away. However, in order to give an
overall impression of how good or bad the team's discipline was I
needed to turn these two pieces of data into one measurement. I
decided to use the points system (as on www.4thegame.com). Under this
system a yellow card counts for one point whereas a red card is more
severe and counts for three.

To make this easier to calculate I used formulae in Excel:

[IMAGE]

Because some divisions have different numbers of teams than others,
some teams played more games than others. This means their players had
slightly more opportunities to get booked or sent off, so their points
totals might be higher. To correct for this I divided the points
scores by the number of games each team had to play to give a
'Disciplinary Points Per Game' score. This can then be compared to any
other team in any division.

To give a measure of how much better or worse the team's disciplinary
record is away and at home I decided to divide the away points per
game score by the home. I subtracted one from this and expressed it as
a percentage. This gives a positive percentage if the team has a worse
disciplinary record away and a negative one if it is worse at home.
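
As a rough sketch, the same calculation could be written in Python instead of Excel. This is only an illustration of the points system described above, not the spreadsheet formulae I actually used: the function names and the card counts in the example are made up.

```python
YELLOW_POINTS = 1  # a yellow card counts for one point
RED_POINTS = 3     # a red card counts for three points


def points_per_game(yellows, reds, games):
    """Disciplinary points per game for one team at one venue (home or away)."""
    return (yellows * YELLOW_POINTS + reds * RED_POINTS) / games


def away_vs_home_percent(home_ppg, away_ppg):
    """Positive if the record is worse away, negative if it is worse at home."""
    return (away_ppg / home_ppg - 1) * 100


# Hypothetical team (not real data): 19 home games and 19 away games.
home = points_per_game(yellows=30, reds=1, games=19)
away = points_per_game(yellows=40, reds=2, games=19)
print(round(home, 2), round(away, 2), round(away_vs_home_percent(home, away), 1))
```

Dividing by the number of games is what allows a team from a 20-team division to be compared with a team from a 24-team division.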

Pilot Study

In order to find out how well my data would support my hypothesis
about teams having a worse disciplinary record away than at home I
made a bar chart using Excel to show the difference between
disciplinary points per game away and at home.

[IMAGE]

As you can see most teams have a considerably worse disciplinary
record away than at home, as shown by the taller red bars. For this
bar chart I simply ranked the teams in the Premiership and the First
Division from the top of the Premiership (1) to the bottom of Division
1 (44). The names of these teams can be found in the appendix at the
back.

Stratified random sampling

In order to better represent football at other levels of the game I
also collected data for lower divisions (Division 2 and Division 3).
However this gave me far too much data - a total of 92 teams - to
perform statistical tests such as the Wilcoxon Signed Rank Test. In
order to cut down on this I decided to use random sampling to lower
the number of teams involved.

However, if I just randomly selected teams from all of the divisions
put together I might over-represent some divisions over others,
affecting the results. To make this fairer I decided to use stratified
random sampling, with the different divisions as the strata. This way
I was sure to get proportionate numbers of teams from each division.

I chose to take 25% of the teams in each division, to give me 23 sets
of data - a much more manageable figure! I chose the teams by writing
the numbers of the teams in each division e.g. 1-24 on small pieces of
paper. I folded these up, shuffled them and picked them at random
until I had the right number.
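
The paper-slip method above could also be carried out by a computer. The sketch below is one possible way of doing it in Python, assuming 20 teams in the Premiership and 24 in each of the other three divisions (92 in total); the function name and the seed value are my own choices for illustration.

```python
import random


def stratified_sample(strata, fraction, seed=None):
    """Pick the same fraction of team numbers from every division."""
    rng = random.Random(seed)
    sample = {}
    for division, team_count in strata.items():
        k = round(team_count * fraction)  # number of teams to draw from this division
        # Draw k different team numbers from 1..team_count, like picking slips of paper.
        sample[division] = sorted(rng.sample(range(1, team_count + 1), k))
    return sample


# Number of teams in each division (assumed: 20 + 24 + 24 + 24 = 92).
divisions = {"Premiership": 20, "Division 1": 24, "Division 2": 24, "Division 3": 24}
chosen = stratified_sample(divisions, fraction=0.25, seed=1)
for division, teams in chosen.items():
    print(division, teams)
print(sum(len(teams) for teams in chosen.values()), "teams in total")
```

Taking 25% of each division gives 5 teams from the Premiership and 6 from each of the other divisions, which is where the total of 23 comes from.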

Once I had chosen the teams I put them in a new spreadsheet. I
produced another bar chart similar to the one I had produced for the
preliminary test. This illustrates how well my randomly sampled data
supports my hypothesis.

[IMAGE]

As you can see the pattern I noticed in the pilot study is continued
with the data from the other divisions. The teams' away disciplinary
record is in almost all cases worse than at home.

As further evidence of this I found the mean disciplinary points per
game at home and away. At home this was about 1.71 compared to about
2.28 away (to 3 significant figures). This shows a 33% difference
between the two. I will now test whether or not this difference is
statistically significant. I chose to compare the means of the two
sets because this gives more weight to big differences between two
scores than small differences.

Wilcoxon signed-rank test

Although graphs and charts can illustrate trends in data they cannot
show whether a difference is statistically significant. In order to
test my hypothesis I will have to use a statistical test. Because I
have no reason to believe my data will follow a normal distribution I
need a nonparametric test, and because I am comparing pairs of data
from two categories I will use the Wilcoxon signed-rank test.

Method:

1. First I found the difference between the home and away
disciplinary points per game for each team by subtracting the away
score from the home score using Excel.

2. Because some of the differences were negative I used the abs()
function in Excel to find the absolute values of the differences.

3. I sorted the data by the absolute differences between the home
and away disciplinary points per game. Ignoring the teams where
the difference was zero, I ranked them in order from the lowest to
the highest. Where several were the same I found the mean between
them.

4. I then looked to see where the differences had originally been
negative and I added the negative sign in front of the rank for
those differences. This gave me the signed rank.

5. Finally I added up the positive and the negative signed ranks
separately and took the sum with the greater absolute value (in
this case the negative ranks), which is the 'W' value. The number
of teams where the difference is not equal to zero gives the 'N'
value.

                    A            B            Original  Absolute  Rank of absolute  Signed Rank
Team                Home PPG     Away PPG     (XA-XB)   (XA-XB)   (XA-XB)

Manchester United   1.842105263  2.421052632  -0.57895  0.578947  7                 -7
Tottenham Hotspur   1.947368421  1.947368421  0         0
Birmingham City     1.947368421  2.842105263  -0.89474  0.894737  13                -13
Aston Villa         2.263157895  2.105263158  0.157895  0.157895  2                 2
Bolton Wanderers    2.105263158  2.368421053  -0.26316  0.263158  4                 -4
Portsmouth          1.434782609  1.913043478  -0.47826  0.478261  6                 -6
Wolverhampton       1.52173913   2.173913043  -0.65217  0.652174  8.5               -8.5
Norwich             1.47826087   1.47826087   0         0
Wimbledon           1.130434783  1.913043478  -0.78261  0.782609  11                -11
Rotherham United    1.869565217  2.869565217  -1        1         14                -14
Grimsby             2.304347826  1.608695652  0.695652  0.695652  10                10
Crewe Alexandra     0.913043478  1            -0.08696  0.086957  1                 -1
Cheltenham Town     1.608695652  1.434782609  0.173913  0.173913  3                 3
Huddersfield Town   1.130434783  2.52173913   -1.3913   1.391304  19                -19
Northampton Town    1.826086957  1.826086957  0         0
Bristol City        1.434782609  2.782608696  -1.34783  1.347826  18                -18
QPR                 1.695652174  2.782608696  -1.08696  1.086957  16.5              -16.5
Rushden & Diamonds  1.608695652  2.652173913  -1.04348  1.043478  15                -15
Lincoln City        2            2.652173913  -0.65217  0.652174  8.5               -8.5
Bury                1.043478261  2.608695652  -1.56522  1.565217  20                -20
Darlington          2.217391304  2.565217391  -0.34783  0.347826  5                 -5
Leyton Orient       1.826086957  2.695652174  -0.86957  0.869565  12                -12
Shrewsbury Town     2.173913043  3.260869565  -1.08696  1.086957  16.5              -16.5

W    -195
|W|  195
N    20
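
The ranking steps in the method can also be checked with a short program. The sketch below is my own illustration in Python (the function name is made up), using the 'Original (XA-XB)' differences from the table. It ignores zero differences, gives tied values the mean of their ranks, and adds up the positive and negative ranks separately.

```python
def signed_rank_sums(differences):
    """Return (sum of positive ranks, sum of negative ranks, N) for paired differences."""
    nonzero = [d for d in differences if d != 0]   # zero differences are ignored
    ordered = sorted(nonzero, key=abs)             # order by absolute size
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        rank_of[abs(ordered[i])] = (i + 1 + j) / 2  # mean rank for tied values
        i = j
    positive = sum(rank_of[abs(d)] for d in nonzero if d > 0)
    negative = sum(rank_of[abs(d)] for d in nonzero if d < 0)
    return positive, negative, len(nonzero)


# Home minus away disciplinary points per game, copied from the table.
differences = [
    -0.57895, 0, -0.89474, 0.157895, -0.26316, -0.47826, -0.65217, 0,
    -0.78261, -1, 0.695652, -0.08696, 0.173913, -1.3913, 0, -1.34783,
    -1.08696, -1.04348, -0.65217, -1.56522, -0.34783, -0.86957, -1.08696,
]
positive, negative, n = signed_rank_sums(differences)
print(positive, negative, n)
```

For the sampled teams this gives 15 for the positive ranks and 195 for the negative ranks, with N = 20. Many published tables of critical values are written for the smaller of the two sums (15 here), so it is worth checking which convention a particular table uses before looking the value up.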

I found that the value of W was 195, and that N, the number of teams
where the difference was not equal to zero, was 20. Looking these up
in a table of critical values (OCR AS/A Level MEI Structured
Mathematics Examination Formulae and Tables, October 2000) I found
that the result is significant at the 5% level. This means that, if
there were really no difference between disciplinary records at home
and away, a difference this large would be expected less than 5% of
the time by chance alone. Therefore the data supports my hypothesis.

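Hypothesis 2 - Better attended teams have a greater home advantage
------------------------------------------------------------------

I proposed this hypothesis because a better attended team would have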
more of the crowd behind them when playing at home, giving them a
psychological advantage over their opponents.

As with the disciplinary points system, I used Excel to find the
points per game score for each team both at home and away. This time I
divided the home points per game score by the away and subtracted one
from this, expressing it as a percentage.

A problem arises because some teams have much bigger stadiums than
others. For example, 20,000 might be considered good attendance for a
First Division club, but very poor for a Premiership team. Because of
this I divided the average number of home supporters at each football
ground by its total capacity to give the average attendance
percentage. I plotted this against the home advantage percentage in a
scatter graph.
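
The two percentages plotted on the scatter graph can be sketched in Python as below. This is only an illustration: the function names and the figures in the example are invented, and attendance is expressed as a percentage of capacity so that a full ground scores 100%.

```python
def attendance_percent(average_home_attendance, capacity):
    """Average home attendance as a percentage of the ground's capacity."""
    return average_home_attendance / capacity * 100


def home_advantage_percent(home_ppg, away_ppg):
    """How much better (positive) or worse (negative) the points per game are at home."""
    return (home_ppg / away_ppg - 1) * 100


# Hypothetical club (not real data).
print(round(attendance_percent(18000, 24000), 2))
print(round(home_advantage_percent(home_ppg=1.8, away_ppg=1.2), 1))
```

Using a percentage of capacity rather than the raw attendance means a small club with a full ground is not treated as badly supported just because its stadium is small.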

Pilot Study

The scatter graph is a useful way of looking for correlation between
two variables. As with the first hypothesis I used the data for the
Premiership and the First Division as a pilot test.

[IMAGE]

As you can see there is no strong correlation between these two
variables. There may be a slight trend for the higher home advantage
percentages to be towards the higher percentages of stadium capacity.
I decided to continue investigating this hypothesis because there
might be clearer correlation in the data from the other divisions.

Spearman's Rank

In order to test whether or not there is correlation between home
advantage and attendance I will need to use a statistical test.
Because this data is also nonparametric I will use Spearman's Rank
Correlation Coefficient.

Method:

1. The first step was to rank the teams by both % Home Advantage and
Average % Capacity. As with the Wilcoxon test I found the mean of
tied ranks.

2. I found the difference between these two ranks by subtracting one
from the other using Excel.

3. I then squared the differences between the two ranks.

4. I used the formula below to find rs, the Spearman's Rank
Correlation Coefficient. My workings are illustrated in the table
overleaf.

\[ r_s = 1 - \frac{6 \sum d^2}{n^3 - n} \]

d = the difference in the rank of the values of each matched pair

n = the number of pairs

                      % PPG Home  Average %  % PPG Home      Average %
Team                  Advantage   capacity   Advantage Rank  Capacity Rank  d      d^2

Manchester United     52%         99.16%     24              1              23     529
Tottenham Hotspur     63%         99.06%     19              2              17     289
Portsmouth            23%         98.72%     35              3              32     1024
West Ham United       10%         96.59%     41              4              37     1369
Birmingham City       53%         96.07%     22              5              17     289
Everton               81%         95.82%     10              6              4      16
Brighton              50%         95.56%     26              7              19     361
West Bromwich Albion  17%         95.46%     37.5            8              29.5   870.25
Liverpool             21%         95.33%     36              9              27     729
Norwich               100%        94.81%     4.5             10             -5.5   30.25
Wolverhampton         -5%         90.25%     44              11             33     1089
Bolton Wanderers      93%         89.73%     8               12             -4     16
Hull City             68%         84.72%     16              13             3      9
Blackburn Rovers      31%         83.61%     29              14             15     225
Aston Villa           250%        81.87%     1               15             -14    196
Nottingham Forest     96%         79.85%     6.5             16             -9.5   90.25
Derby                 60%         75.81%     20              17             3      9
QPR                   24%         68.97%     34              18             16     256
Hartlepool United     66%         68.38%     17              19             -2     4
Northampton Town      79%         68.09%     11              20             -9     81
Crewe Alexandra       -21%        67.04%     46              21             25     625
Rotherham United      27%         65.33%     31              22             9      81
Rushden & Diamonds    56%         65.26%     21              23             -2     4
Preston North End     90%         64.70%     9               24             -15    225
Watford               73%         64.45%     13              25             -12    144
Cheltenham Town       29%         62.85%     30              26             4      16
Grimsby               17%         58.65%     37.5            27             10.5   110.25
AFC Bournemouth       96%         58.37%     6.5             28             -21.5  462.25
Bristol City          52%         55.36%     24              29             -5     25
York City             75%         46.48%     12              30             -18    324
Boston United         105%        46.20%     3               31             -28    784
Chesterfield          185%        45.85%     2               32             -30    900
Shrewsbury Town       5%          45.70%     42              33             9      81
Colchester United     15%         44.83%     39.5            34             5.5    30.25
Millwall              44%         42.24%     27              35             -8     64
Barnsley              26%         42.09%     32.5            36             -3.5   12.25
Scunthorpe United     32%         40.20%     28              37             -9     81
Darlington            70%         39.17%     15              38             -23    529
Huddersfield Town     100%        38.80%     4.5             39             -34.5  1190.25
Lincoln City          26%         35.94%     32.5            40             -7.5   56.25
Leyton Orient         65%         33.84%     18              41             -23    529
Peterborough United   15%         32.33%     39.5            42             -2.5   6.25
Wigan Athletic        -4%         29.15%     43              43             0      0
Bury                  -16%        27.65%     45              44             1      1
Port Vale             52%         19.84%     24              45             -21    441
Wimbledon             71%         10.55%     14              46             -32    1024

Sum of d^2                       15227.5
n                                46
n^3                              97336
1 - ((6 x sum of d^2) / (n^3 - n))  0.0609
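
The final figure can be checked from the totals in the table, and the whole method can be sketched in Python. The functions below are my own illustration (the names are made up): they rank each variable from highest to lowest, give tied values the mean of their ranks, and then apply the same formula. Like the working above, this uses the simple formula without any extra correction for ties.

```python
def average_ranks(values):
    """Rank values from highest (rank 1) to lowest, averaging the ranks of ties."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2  # mean of positions i+1 .. j
        i = j
    return [rank_of[v] for v in values]


def spearman_rs(xs, ys):
    """Spearman's rank correlation coefficient using 1 - 6*sum(d^2) / (n^3 - n)."""
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


# Check against the totals in the table: sum of d^2 = 15227.5 and n = 46.
rs_from_table = 1 - 6 * 15227.5 / (46 ** 3 - 46)
print(round(rs_from_table, 4))
```

With the totals from the table this prints 0.0609, which agrees with the value found in Excel.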

I found that rs = 0.0609, and that the critical value for rs at 10%
was 0.2456 (OCR AS/A Level MEI Structured Mathematics Examination
Formulae and Tables, October 2000). Because rs is smaller than the
critical value, the data fails the test for correlation at the 10%
level: a correlation this weak could easily have occurred by chance.

This is no great surprise to me, as the pilot test showed little or no
correlation. Unfortunately my hypothesis does not seem to be correct.
Perhaps the fact that away supporters are not included might have made
a difference - if a team is well-supported away from home it might
reverse the disadvantage I predicted. I could not find any data on
away supporters so I am unable to investigate this possibility.

Hypothesis 3 - More successful teams have a better disciplinary record
----------------------------------------------------------------------

Pilot Study

My idea for a third hypothesis was that a team struggling at the
bottom of the table facing relegation would lose confidence and become
desperate, causing the players to commit more fouls. On the other
hand, a team near the top of the table would be confident and more
relaxed, and would not feel the need for desperate challenges etc.

As a pilot test I decided to plot a scatter graph to look for a
relationship between the position of a team within its division and
its disciplinary points per game. As with the other tests I used only
the data for the Premiership and the First Division.

[IMAGE]

This graph doesn't show an obvious trend, but there is a slight
tendency for the disciplinary points to rise further down the table,
especially in the First Division. The second team in Division 1
(Leicester, shown circled) is clearly an outlier, and perhaps if I
continued the study on the other divisions a clearer pattern would
emerge.

In order to test this hypothesis further I decided to take all of the
data from the Football League and randomly select 3 teams from the top
25% and 3 teams from the bottom 25% of each division. This means the
data is collected using stratified random sampling. However, as the
Premiership has only 20 teams instead of 24 it is slightly
over-represented compared to divisions 1-3.

Most importantly I am not using the data from the middle 50% of the
divisions, so any possible patterns there will be lost. However, there
are two good reasons to sacrifice this data. Firstly, any differences
between successful and unsuccessful teams would be most apparent at
the top and bottom of each division. Secondly I need a more manageable
sample size which I can perform statistical tests on.

I produced two histograms to show any difference between top and
bottom teams.

[IMAGE][IMAGE]

As you can see, slightly more teams in the lower quarters of the
divisions have higher disciplinary points per game, while slightly
more teams in the upper quarters of the divisions have lower
disciplinary points per game. The easiest way to tell this is that the
histogram for the bottom 25% is shifted slightly to the right compared
to the one for the top 25%.

I calculated the median for each set of data to give an idea of the
central tendency for each distribution. I used the median because I am
comparing the 'average team' in the top 25% with the 'average team' in
the bottom 25%. The median for the upper quarters is 2.12 and for the
lower quarters, 2.41 (answers to 2 decimal places), meaning there is a
14% difference between the two. This suggests that the disciplinary
points per game for the lower teams are generally higher than those of
the upper teams.

In order to tell for certain whether or not there is a significant
difference between the lower and upper quarters of the divisions I
would have to perform a statistical test. In this case I will use the
Mann-Whitney U-Test.

Mann-Whitney U-Test

This is a non-parametric statistical test to show whether or not two
groups of samples are from different populations. In this case it will
show whether or not there is a statistically significant difference
between teams in the top and bottom 25% of each division, comparing
their average disciplinary points per game.

Method:

1. First I ranked the data from both groups in increasing order of
size (see column B in the table overleaf).

2. Next, for each team in group b, I counted how many teams in group
a had a smaller disciplinary points per game total. Teams with
equal disciplinary points per game scored ½. I did the same for
group a. See column C in the table.

3. I found the total of the column C values for both group a and
group b. I called these two totals Ua and Ub.

4. I chose the smaller value of U and I looked up the critical
values of U at the 5% significance level.

A                     B                     C                          D
Team                  Average disciplinary  Number of teams in other   Top (a, blue) or
                      points per game       group with a lower points  bottom (b, red)
                                            per game score             group

Hartlepool United     1.369565217           0                          a
Bristol Rovers        1.652173913           1                          b
Portsmouth            1.673913043           1                          a
Cardiff City          1.739130435           1                          a
Sheffield United      1.826086957           1                          a
Rochdale              1.826086957           4                          b
Huddersfield Town     1.826086957           4                          b
AFC Bournemouth       1.847826087           3                          a
Wolverhampton         1.847826087           3                          a
Brighton              1.913043478           6                          b
Sheffield Wednesday   1.956521739           6                          b
Chelsea               1.973684211           5                          a
Stoke                 1.97826087            7                          b
Liverpool             2.131578947           6                          a
Aston Villa           2.184210526           8                          b
Scunthorpe United     2.195652174           7                          a
Bolton Wanderers      2.236842105           9                          b
QPR                   2.239130435           8.5                        a
Barnsley              2.239130435           9.5                        b
Swansea City          2.304347826           10                         b
West Ham United       2.315789474           10                         b
Arsenal               2.342105263           11                         a
Oldham Athletic       2.391304348           11                         a
Mansfield Town        2.456521739           12                         b

Sum of column C values for group a (Ua)  57.5
Sum of column C values for group b (Ub)  86.5
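
The counting in column C can be sketched in a few lines of Python. This is an illustration of the method only: the function name is my own and the two small groups of scores are invented, not taken from the table.

```python
def mann_whitney_u(group_a, group_b):
    """Return (Ua, Ub) by direct counting, scoring each tie as one half."""
    def count_lower(own, other):
        total = 0.0
        for x in own:
            for y in other:
                if y < x:
                    total += 1      # a team in the other group has a lower score
                elif y == x:
                    total += 0.5    # equal scores count as a half
        return total

    return count_lower(group_a, group_b), count_lower(group_b, group_a)


# Hypothetical disciplinary points per game for two small groups (not real data).
top = [1.4, 1.7, 1.8, 2.1]
bottom = [1.8, 2.0, 2.3, 2.4]
u_a, u_b = mann_whitney_u(top, bottom)
print(u_a, u_b, min(u_a, u_b))
```

A useful check is that Ua and Ub always add up to the number of teams in group a multiplied by the number in group b, because every pair of teams is counted exactly once. The smaller of the two values is the one compared with the critical value.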

Results

I found that the lower value of U was Ua (57.5). The critical value
for U at the 5% significance level was 37 (Advanced Biology Study
Guide by C J Clegg & D G MacKean, 1996). This meant that Ua was larger
than the critical value of U at the 5% significance level. Therefore
the difference between teams in the top and bottom 25% of each
division, comparing the average disciplinary points per game, is not
significant at the 5% level. A difference of this size could plausibly
have been caused by chance alone.

Again this result is hardly surprising considering the lack of strong
correlation in the pilot test. There could be several reasons why this
hypothesis failed. Perhaps certain teams do well whilst still playing
dirty - maybe this is even a valid tactic for success!

It might also be the case that the disciplinary points scores for some
teams are disproportionately increased by certain players who are
frequently booked or sent off - Patrick Vieira of Arsenal for example.
I am unable to find data on individual players so I cannot investigate
this further.

Evaluation
----------

I am quite pleased with the way my investigation went. Although
hypotheses 2 and 3 were not statistically supported by my data, these
raised other interesting questions, which could be investigated. Of
course there are certain limitations to my study. The data I used came
from complete, published tables, and its authenticity is not in doubt.
However, there is nothing to say that the 2002-2003 season was a
typical one, and my results might have been different for a
different year.

Another important point to consider is that the data for different
teams is not independent. For example, because Manchester United was
top of the Premiership, no other team could possibly be top as well.
In fact, even the points totals of the teams are interdependent - a
team can only be judged in comparison to the other teams it plays. It
is possible that every team played worse in the 2002-2003 season than
in previous or subsequent seasons - it is impossible to tell if this
is true as the points totals for each team are relative to those of
the other teams. Therefore there can be no stand-alone measure of how
good a team is.

It is also important to remember that football is a sport played at
many levels, in hundreds of countries and by many age and social
groups. The English Football League is only a tiny part of this, and
if I conducted my study on different aspects of the game I might obtain
very different results.
Appendix - Team numbers