Statistics coursework

GCSE maths coursework – Statistics

Introduction and planning

The data I have been given comes from a driving school's records of how 240 of its drivers performed in their driving tests. I am going to analyse this data statistically to test whether it supports or contradicts my hypotheses.

Hypothesis: a statement that can be tested against relevant data, which will either support it or show it to be false. I will use graphs and statistical measures to make the data easy to compare and to test my hypotheses.

My hypotheses are:

Male drivers perform better than female drivers in their driving tests, i.e. they make fewer mistakes.

"Better" means superior in some personal quality or attainment, so for my main hypothesis I want to test whether men are better drivers than women. I chose this as my core hypothesis because it is an interesting and controversial claim, and research has been carried out around the world to prove or disprove it.
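Since "better" is measured here as making fewer mistakes, one way to compare the two groups is to put their averages and spread side by side. The sketch below does this with Python's standard `statistics` module; the helper name `summarise` and all the mistake counts are hypothetical, invented for illustration, so the output says nothing about the real driving school data.

```python
from statistics import mean, median

def summarise(mistakes):
    """Return (mean, median, range) for a list of mistake counts."""
    return mean(mistakes), median(mistakes), max(mistakes) - min(mistakes)

# Hypothetical mistake counts, invented for illustration; NOT the driving school's data.
male_mistakes = [4, 7, 2, 9, 5, 6, 3, 8]
female_mistakes = [5, 3, 6, 4, 2, 7, 5, 4]

for label, data in (("Male", male_mistakes), ("Female", female_mistakes)):
    avg, mid, spread = summarise(data)
    # A lower mean/median would indicate fewer mistakes, i.e. "better" drivers.
    print(f"{label}: mean = {avg:.2f}, median = {mid}, range = {spread}")
```

Comparing both the mean and the median is useful because a few drivers with very many mistakes can pull the mean up without changing the median much.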

The more driving lessons a pupil has, the fewer minor mistakes they will make in their driving test.

I think this is true because drivers gain more experience with each extra lesson, so I expect a negative correlation between the number of lessons and the number of minor mistakes. Correlation is the relationship between two variables; a negative correlation means that as one variable increases, the other tends to decrease.
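The strength of a correlation can be measured with Pearson's product-moment correlation coefficient r, which runs from -1 (perfect negative) to +1 (perfect positive). The sketch below computes r by hand from its definition; the function name `pearson_r` and the paired values are hypothetical examples, not figures from the driving school's records.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical values, NOT taken from the driving school's records.
lessons = [10, 15, 20, 25, 30, 35]
minor_mistakes = [12, 10, 9, 6, 5, 3]
r = pearson_r(lessons, minor_mistakes)
print(f"r = {r:.3f}")  # close to -1 for this invented data: strong negative correlation
```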

Drivers are more likely to make minor mistakes in driving tests taken in the morning than in those taken in the afternoon.

I think this is true because traffic is very busy in the morning, when many people are driving to work or taking their children to school, so drivers whose tests are in the morning are likely to make more minor mistakes.
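To test this hypothesis, the tests can be split into morning and afternoon groups and the mean number of minor mistakes compared. The sketch below assumes each test is stored as an (hour, minor mistakes) pair on a 24-hour clock and treats any test starting before 12:00 as "morning"; that cut-off, the helper name `morning_afternoon_means` and the records are all hypothetical.

```python
from statistics import mean

def morning_afternoon_means(tests):
    """Split (hour, minor_mistakes) pairs at 12:00 and return the mean for each half."""
    morning = [m for hour, m in tests if hour < 12]
    afternoon = [m for hour, m in tests if hour >= 12]
    return mean(morning), mean(afternoon)

# Hypothetical records: (hour the test started, minor mistakes made).
tests = [(9, 6), (9, 4), (10, 5), (11, 7), (13, 3), (14, 5), (15, 4), (15, 2)]
am, pm = morning_afternoon_means(tests)
print(f"Morning mean: {am:.2f}, afternoon mean: {pm:.2f}")
```

Using the mean per driver rather than the total number of mistakes keeps the comparison fair when more tests are taken in one half of the day.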

Another factor that could affect my results is that if more drivers take their driving test at a certain hour, you would expect more minor mistakes in total to be made in that hour, so the total number of mistakes alone is not a fair comparison between hours.

I think that the line graph I used was not the best way to represent this data, so I also drew a pie chart to show the number of drivers taking their driving test at each hour and a table to show the percentage of minor mistakes made at each hour.
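The pie chart and table both come from simple proportions: each hour's sector angle is 360 degrees times its share of drivers, and each table entry is 100 times its share of all minor mistakes. The sketch below computes both, plus the mean mistakes per driver, from a list of (hour, minor mistakes) pairs; the helper name `hourly_summary` and the records are hypothetical.

```python
from collections import Counter, defaultdict

def hourly_summary(tests):
    """Per hour: drivers, pie-chart angle, % of all minor mistakes, mean per driver."""
    drivers = Counter(hour for hour, _ in tests)
    mistakes = defaultdict(int)
    for hour, m in tests:
        mistakes[hour] += m
    total_drivers = len(tests)
    total_mistakes = sum(mistakes.values())
    rows = []
    for hour in sorted(drivers):
        rows.append((
            hour,
            drivers[hour],
            360 * drivers[hour] / total_drivers,    # pie-chart angle in degrees
            100 * mistakes[hour] / total_mistakes,  # % of all minor mistakes
            mistakes[hour] / drivers[hour],         # mean minor mistakes per driver
        ))
    return rows

# Hypothetical records: (hour the test started, minor mistakes made).
tests = [(9, 6), (9, 4), (10, 5), (11, 7), (13, 3), (14, 5), (15, 4), (15, 2)]
for hour, n, angle, pct, per_driver in hourly_summary(tests):
    print(f"{hour:02d}:00  drivers={n}  angle={angle:.0f} deg  "
          f"share of mistakes={pct:.1f}%  mean per driver={per_driver:.2f}")
```

The angles always add up to 360 degrees and the percentages to 100%, which is a quick check that the table has been drawn correctly.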

To extend this hypothesis and get better results, I could resample so that equal numbers of drivers took their driving test at each hour. However, this would be very hard to do randomly, because I would need the same number of drivers for every hour while still sampling in proportion to gender and instructor, to keep my sampling method the same. If I did not, my investigation would not be fair.
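One way to draw equal numbers of drivers per hour is to group the population by hour and take a simple random sample of the same size from each group. The sketch below assumes each record is a dictionary with an "hour" key; the function name `equal_sample_per_hour`, the field names and the population are hypothetical. It only handles the hour condition, not the gender and instructor proportions, and it shows the main difficulty: the hour with the fewest drivers limits how many can be taken from every hour.

```python
import random
from collections import defaultdict

def equal_sample_per_hour(records, per_hour, seed=None):
    """Randomly pick the same number of drivers from every hour.

    Raises ValueError if any hour has fewer than per_hour drivers.
    """
    rng = random.Random(seed)  # seeded generator so the sample can be repeated
    by_hour = defaultdict(list)
    for rec in records:
        by_hour[rec["hour"]].append(rec)
    sample = []
    for hour in sorted(by_hour):
        group = by_hour[hour]
        if len(group) < per_hour:
            raise ValueError(f"only {len(group)} drivers at {hour}:00, need {per_hour}")
        sample.extend(rng.sample(group, per_hour))  # random sample without replacement
    return sample

# Hypothetical population: 240 drivers with tests starting between 09:00 and 15:00.
records = [{"id": i, "hour": 9 + i % 7} for i in range(240)]
sample = equal_sample_per_hour(records, per_hour=5, seed=1)
print(len(sample))
```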

To get more accurate results I could extend my investigation by taking a larger sample of about 80 drivers. I could also include the age of the drivers, as older drivers are more likely to have slower reaction times and so make more minor mistakes, and I could record the exact time at which each driver took the test to make my analysis of test times more precise. I could also use the data handling tool to draw graphs for the whole population instead of using a sample.
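For a stratified sample of about 80 drivers in proportion to gender and instructor, each group (stratum) gets 80 × (group size ÷ 240) places. Because these numbers are rarely whole, the sketch below rounds them with the largest-remainder method so the parts still add up to exactly 80; that rounding rule, the function name `proportional_allocation` and the group sizes are assumptions for illustration, not the real counts from the records.

```python
from math import floor

def proportional_allocation(stratum_sizes, sample_size):
    """Split sample_size across strata in proportion to their sizes.

    Uses the largest-remainder method so the parts add up exactly to sample_size.
    """
    population = sum(stratum_sizes.values())
    exact = {k: sample_size * n / population for k, n in stratum_sizes.items()}
    alloc = {k: floor(v) for k, v in exact.items()}
    shortfall = sample_size - sum(alloc.values())
    # Give the leftover places to the strata with the largest fractional parts.
    for k in sorted(exact, key=lambda key: exact[key] - alloc[key], reverse=True)[:shortfall]:
        alloc[k] += 1
    return alloc

# Hypothetical (gender, instructor) group sizes adding up to 240 drivers.
strata = {("M", "A"): 70, ("M", "B"): 55, ("F", "A"): 60, ("F", "B"): 55}
alloc = proportional_allocation(strata, 80)
print(alloc)
```

Within each group, the allocated number of drivers would then be chosen at random, for example with a random number generator.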