Introduction to Statistics for Psychology
and
Quantitative Methods for Human
Sciences
Jonathan Marchini

Course Information
There is a website devoted to the course at
http://www.stats.ox.ac.uk/marchini/phs.html
This contains
Course timetable/information
Lecture slides (on the website before each lecture)
Lecture notes (a bit more detailed than the slides)
Exercise sheets (tutors may or may not use these)
Formulae booklet and definitions booklet
Links to past exam papers on the web
PLEASE ENTER THE LECTURE THEATRE QUICKLY

Lecture 1 : Outline
Why do we need Statistics?
- The scientific process
Different types of data
- Discrete/Continuous
- Quantitative/Qualitative
Methods of looking at data
- Bar charts, Histograms, Dot plots, Scatter plots, Box
plots
Calculating summary measures of data
- Location - Mean, Median, Mode
- Dispersion - SIQR, MAD, sample variance, sample
standard deviation

The role of Statistics in the Scientific Process
[Flowchart of the scientific process]
1 We start with a question/hypothesis about a given population of objects/events
2 Propose a study/experiment that aims to provide data to help test our hypothesis
3 Collect Data, i.e. take a sample from the population
4 Use statistics to test our hypothesis based on a model of the data
5 Examine the results of the statistical test
Labelled STATISTICS in the diagram: Study Design (How can we design our study to get the most information about our hypothesis?)

An Example
Psychologists have long been interested in the relationship between stress
and health.
A focused question might involve the study of a specific psychological
symptom and its impact on the health of the population.
To assess whether the symptom is a good indicator of stress we need
to measure the symptom and stress levels in a sample of individuals from
the population.
It is not immediately clear how we should go about collecting this
sample, i.e. how we should design the study.
We haven’t got very far before we need Statistics

The general focus of this course
[Flowchart of the scientific process]
1 We start with a question/hypothesis about a given population of objects/events
2 Propose a study/experiment that aims to provide data to help test our hypothesis
3 Collect Data, i.e. take a sample from the population
4 Use statistics to test our hypothesis based on a model of the data
5 Examine the results of the statistical test
Labelled STATISTICS in the diagram: Study Design (How can we design our study to get the most information about our hypothesis?)

Datasets consist of measured variables
The datasets that Psychologists and Human Scientists collect will usually
consist of one or more observations on one or more “variables”.
A variable is a property of an object or event that can take on different
values.
Example Suppose we collect a dataset by measuring the hair colour,
resting heart rate and score on an IQ test of every student in a class. The
variables in this dataset would then simply be hair colour, resting heart
rate and score on an IQ test, i.e. the variables are the properties that we
measured/observed.

2 main types of variable
1 Measurement (Quantitative) Data occur when we ‘measure’ things e.g.
height or weight.
2 Categorical (Qualitative) Data occur when we assign objects into
labelled groups or categories e.g. when we group people according to hair
colour or race.
(i) Ordinal variables have a natural ordering, e.g. gold/silver/bronze medal
(ii) Nominal variables do not have a natural ordering, e.g. gender

Discrete and Continuous Variables
Discrete Data
Example: No. of students late for a lecture
[Number line showing the possible values 0, 1, 2, ..., 8]
There is only a limited set of distinct values/categories,
i.e. we can’t have exactly 2.23 students late; only integer values
are allowed.
Continuous Data
Example: Time spent studying statistics (hrs)
[Number line starting at 0, with the example values 3.76 and 5.67 marked]
In theory there is an unlimited set of possible values.
There are no discrete jumps between possible values.

Summary of Data Types
Types of data
- Quantitative (Measurement)
  - Discrete, e.g. No. of students in a class
  - Continuous, e.g. Height, Weight
- Qualitative (Categorical): Discrete
  - Nominal, e.g. Hair colour, Race, Smoking status
  - Ordinal, e.g. League position, Medal awarded in an Olympic event

Plotting Data
One of the most important stages in a statistical analysis can be simply to
look at your data right at the start.
By doing so you will be able to spot characteristic features, trends
and outlying observations that enable you to carry out an appropriate
statistical analysis.
Also, it is a good idea to look at the results of your analysis using a
plot. This can help identify if you did something that wasn’t a good idea.
REMEMBER: Data is messy. No two datasets are the same.
ALWAYS LOOK AT YOUR DATA

The Baby-Boom dataset
Forty-four babies (a new record) were born in one 24-hour period at the
Mater Mothers’ Hospital in Brisbane, Queensland, Australia, on December
18, 1997. For each of the 44 babies, The Sunday Mail recorded the time of
birth, the sex of the child, and the birth weight in grams.
Whilst we did not collect this dataset based on a specific hypothesis,
if we wished we could use it to answer several questions of interest.
Do girls weigh more than boys at birth?
What is the distribution of the number of births per hour?
Is birth weight related to the time of birth?
Is gender related to the time of birth?
Is there an equal chance of being born a girl or boy?

The Baby-Boom data (Time in minutes, Weight in grams)
Time  Gender  Weight     Time  Gender  Weight     Time  Gender  Weight
   5    1      3837       649    1      3746      1105    1      2383
  64    1      3334       653    1      3523      1134    2      3428
  78    2      3554       693    2      2902      1149    2      4162
 115    2      3838       729    2      2635      1187    2      3630
 177    2      3625       776    2      3920      1189    2      3406
 245    1      2208       785    2      3690      1191    2      3402
 247    1      1745       846    1      3430      1210    1      3500
 262    2      2846       847    1      3480      1237    2      3736
 271    2      3166       873    1      3116      1251    2      3370
 428    2      3520       886    1      3428      1264    2      2121
 455    2      3380       914    2      3783      1283    2      3150
 492    2      3294       991    2      3345      1337    1      3866
 494    1      2576      1017    2      3034      1407    1      3542
 549    1      3208      1062    1      2184      1435    1      3278
 635    2      3521      1087    2      3300

Bar Charts
A Bar Chart is a useful method of summarising Categorical Data. We
represent the counts/frequencies/percentages in each category by a bar.
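As a minimal sketch of where the bar heights come from, the Python snippet below tallies the Gender column of the Baby-Boom table and prints one text bar per category. The coding 1 = girl, 2 = boy is an assumption (the table does not state it), and the names `genders`, `LABELS` and `text_bar_chart` are illustrative only.

```python
from collections import Counter

# Gender codes for the 44 babies, in order of time of birth.
# ASSUMPTION: 1 = girl, 2 = boy (the coding is not stated in the table).
genders = [
    1, 1, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2,
    1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 1, 2,
    1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1,
]
LABELS = {1: "Girl", 2: "Boy"}


def text_bar_chart(values, labels):
    """Return one line per category: label, a bar of '#', and the count."""
    counts = Counter(values)
    lines = []
    for code in sorted(counts):
        n = counts[code]
        lines.append(f"{labels[code]:<5}{'#' * n} {n}")
    return lines


counts = Counter(genders)
for line in text_bar_chart(genders, LABELS):
    print(line)
```

The same counts could be turned into percentages by dividing each by the total number of babies.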
[Figure: bar chart of Frequency (axis from 0 to 24) for each Gender category: Girl, Boy]

Histograms
‘A Bar Chart is to Categorical Data as a Histogram is to Measurement
Data’
[Figure: histogram of Birth Weight (g), from 1500 to 4500, against Frequency]

Constructing Histograms (an example)
For the baby-boom dataset we can draw a histogram of the birth weights.
To draw the histogram I found the smallest and largest values
smallest = 1745 largest = 4162
There are only 44 weights so I decided on 6 equal sized categories
Interval (g)  1500-2000  2000-2500  2500-3000  3000-3500  3500-4000  4000-4500
Frequency         1          4          4         19         15          1
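The counting step can be sketched in a few lines of Python. This is only an illustration: the function name `bin_counts` is hypothetical, and it assumes that a weight falling exactly on a boundary (here 3500 g) is counted in the lower interval, which is the convention that reproduces the frequencies in the table.

```python
from bisect import bisect_left

# Birth weights (g) of the 44 babies, in order of time of birth.
weights = [
    3837, 3334, 3554, 3838, 3625, 2208, 1745, 2846, 3166, 3520, 3380,
    3294, 2576, 3208, 3521, 3746, 3523, 2902, 2635, 3920, 3690, 3430,
    3480, 3116, 3428, 3783, 3345, 3034, 2184, 3300, 2383, 3428, 4162,
    3630, 3406, 3402, 3500, 3736, 3370, 2121, 3150, 3866, 3542, 3278,
]


def bin_counts(values, edges):
    """Count values in the intervals (edges[i], edges[i+1]].

    ASSUMPTION: a value equal to a boundary goes in the lower interval,
    e.g. 3500 is counted in 3000-3500.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if not edges[0] < v <= edges[-1]:
            raise ValueError(f"{v} lies outside the chosen intervals")
        counts[bisect_left(edges, v) - 1] += 1
    return counts


edges = list(range(1500, 4501, 500))  # 1500, 2000, ..., 4500
frequencies = bin_counts(weights, edges)
for lo, hi, n in zip(edges, edges[1:], frequencies):
    print(f"{lo}-{hi}: {'#' * n} {n}")
```

Changing the step in `edges` gives fewer or more categories, which is a quick way to see how the choice of intervals alters the picture.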
Using these categories works well: the histogram shows us the shape of the
distribution, and we notice that the distribution has an extended left ‘tail’.

[Figure: two histograms of Birth Weight (g) against Frequency. Left panel: Too few categories. Right panel: Too many categories]
Too few categories and the details are lost. Too many categories and the
overall shape is obscured by too many details.

Cumulative Frequency Plots and Curves
Interval (g)          1500-2000  2000-2500  2500-3000  3000-3500  3500-4000  4000-4500
Frequency                 1          4          4         19         15          1
Cumulative Frequency      1          5          9         28         43         44
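Each cumulative frequency is the running total of the frequencies up to and including that interval. A minimal Python sketch of that arithmetic, using the frequencies from the table (the variable names are illustrative):

```python
from itertools import accumulate

intervals = ["1500-2000", "2000-2500", "2500-3000",
             "3000-3500", "3500-4000", "4000-4500"]
frequencies = [1, 4, 4, 19, 15, 1]

# Running total: each entry adds the current frequency to the sum so far.
cumulative = list(accumulate(frequencies))

for interval, f, c in zip(intervals, frequencies, cumulative):
    print(f"{interval}  frequency={f:2d}  cumulative={c:2d}")
```

The last cumulative frequency always equals the total number of observations, here 44.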
[Figure: two panels of Cumulative Frequency (axis from 0 to 50) against Birth Weight (g). Left panel: Cumulative Frequency Plot. Right panel: Cumulative Frequency Curve]

Dot plots
A Dot Plot is a simple and quick way of visualising a dataset. This type
of plot is especially useful if data occur in groups and you wish to quickly
visualise the differences between the groups.
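As a rough illustration, the Python sketch below draws a text version of a dot plot of birth weight for each gender group. It is not a substitute for a proper plotting tool: weights that round to the same position share one mark, the coding 1 = girl, 2 = boy is an assumption, and the helper `dot_row` is hypothetical.

```python
# Gender codes and birth weights (g) for the 44 babies, in time order.
# ASSUMPTION: 1 = girl, 2 = boy (the coding is not stated in the table).
genders = [
    1, 1, 2, 2, 2, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2,
    1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 1, 2,
    1, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 1, 1,
]
weights = [
    3837, 3334, 3554, 3838, 3625, 2208, 1745, 2846, 3166, 3520, 3380,
    3294, 2576, 3208, 3521, 3746, 3523, 2902, 2635, 3920, 3690, 3430,
    3480, 3116, 3428, 3783, 3345, 3034, 2184, 3300, 2383, 3428, 4162,
    3630, 3406, 3402, 3500, 3736, 3370, 2121, 3150, 3866, 3542, 3278,
]
LABELS = {1: "Girl", 2: "Boy"}


def dot_row(values, lo, hi, width=60):
    """Mark each value with 'o' on a text axis running from lo to hi."""
    row = ["."] * (width + 1)
    for v in values:
        pos = round((v - lo) / (hi - lo) * width)
        row[pos] = "o"  # values rounding to the same position overlap
    return "".join(row)


# Split the weights into one list per gender group.
groups = {}
for g, w in zip(genders, weights):
    groups.setdefault(LABELS[g], []).append(w)

for name in ("Girl", "Boy"):
    print(f"{name:<5}{dot_row(groups[name], 1500, 4500)}")
```

Putting both groups on the same axis is what makes the comparison between them quick to read.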
[Figure: dot plot of Birth Weight (g), from 1500 to 4500, grouped by Gender (Girl, Boy)]

Scatter Plots
Scatter plots are useful when we wish to visualise the relationship between
two measurement variables.
[Figure: scatter plot of Birth Weight (g), from 2000 to 4000, and Time of birth (mins since midnight), from 0 to 1400]