Lecture 7 (Jan 28th): Reliability and Validity

Quiz 5 due NOW
Quiz 6 will be available after class; it is due next THURSDAY, before class
Exam scores will be available by the end of the week
We’ll discuss results next Tuesday

The problem of artifacts
Reliability
Validity

The problem of artifacts
Uncontrolled human aspects of the research situation that CONFOUND the researcher’s
conclusions

Participant-related artifacts (aka demand characteristics)
Cooperative
▪ Tries to give the ‘best performance’ that matches the presumed hypothesis
Non-cooperative
▪ Doesn’t care about the study or tries to sabotage results
Defensive
▪ Wants to be portrayed in a good light

Researcher-related artifacts
Observer bias
▪ Over‐ or under‐estimation of what was observed
Expectancy bias
▪ ‘Self-fulfilling prophecy’

Blind experiments
▪ Deception
▪ ‘Double blind’
Automation (standardization)
▪ Computers
▪ Recording instructions
Question participants

The problem of artifacts
Reliability
Are our measurements precise?
Validity
Are we really measuring what we think we are measuring?

Extent to which measurements are free of random errors

Random error: nonsystematic mistakes in measurement
▪ misreading a questionnaire item
▪ observer looks away when coding behavior
▪ nonsystematic misinterpretations of a behavior

What are the implications of random measurement errors for the quality of our
measurements?

O = T + E + S
O = a measured score (e.g., performance on an exam)
T = true score (e.g., the value we want)
E = random error
S = systematic error
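For example (hypothetical numbers): if a student’s true score is T = 82, misreading one
item costs a point (E = -1), and a miskeyed answer adds credit (S = +2), the observed
score is O = 82 - 1 + 2 = 83.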
O = T + E
(we’ll ignore S for now, but we’ll return to it later)

The error becomes a part of what we’re measuring!

Do random errors accumulate?
Answer: No. If E is truly random, we are just as likely to overestimate T as we are to
underestimate T.
Note: The average of the seven O’s is equal to T.

An important way to reduce the influence of random errors of measurement is to use
multiple measurements (see the sketch after this list):
Operationally define latent variables via multiple indicators
Use more than one observer when quantifying behaviors
Multiple observations
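A minimal Python sketch of this idea (hypothetical true score and error spread; any
zero-mean error distribution would do):

    import random

    random.seed(1)
    T = 10  # hypothetical true score

    # Each observed score is O = T + E, where E is random error with mean 0
    def observe(n):
        return [T + random.gauss(0, 2) for _ in range(n)]

    for n in (7, 70, 7000):
        sample = observe(n)
        # The average of the O's gets closer to T as n grows
        print(n, round(sum(sample) / len(sample), 2))

With only 7 measurements the average already sits near T; with thousands it is nearly
exact, which is why multiple indicators reduce random error.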
How do we assess reliability?
(a) test-retest reliability
(b) alternate‐forms reliability
(c) internal consistency reliability
All rely on correlating two variables.
Reliability is measured by a correlation coefficient
▪ Always positive. Thus the reliability index varies from 0 to 1

Test-retest reliability:
Measure something at least twice, at different time points.
If errors of measurement are truly random, then
the same errors are unlikely to be made more
than once.
If two measurements of the same thing agree, it is
unlikely that they contain random error.

Test-retest reliability:
IMPORTANT: we are assuming that what we are measuring does NOT vary over time!
Sometimes people remember previous answers
▪ INFLATED reliability coefficient
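A minimal sketch of computing the test-retest coefficient itself (hypothetical scores
for six participants; statistics.correlation requires Python 3.10+):

    from statistics import correlation

    # Hypothetical scores from the same test given at two time points
    time1 = [12, 15, 9, 20, 14, 17]
    time2 = [13, 14, 10, 19, 15, 16]

    # Test-retest reliability is the Pearson r between the two administrations
    r = correlation(time1, time2)
    print(round(r, 2))  # close to 1 = high test-retest reliability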
Other methods?

Alternate-forms reliability
Use two equivalent tests
Correlation should be high

Internal consistency
Extent to which items in a questionnaire correlate with each other
If they measure the same thing, the correlation should be high
Split-half: based on an arbitrary split (e.g., comparing odd and even items, or the
first and second halves)
Cronbach’s alpha (α): based on the average of all possible split-halves
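A small sketch of Cronbach’s alpha computed straight from its definitional formula,
α = (k / (k - 1)) × (1 - Σ item variances / total-score variance), using hypothetical
questionnaire data:

    from statistics import variance

    # Hypothetical responses: 5 participants x 4 items on one questionnaire
    items = [
        [3, 4, 3, 4],
        [2, 2, 3, 2],
        [4, 5, 4, 5],
        [1, 2, 1, 2],
        [3, 3, 4, 3],
    ]

    k = len(items[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*items)]  # per-item variance
    total_var = variance([sum(row) for row in items])   # variance of total scores

    alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
    print(round(alpha, 2))  # near 1 = items hang together (internally consistent)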
Inter-rater reliability
Percentage of time agreed
Correlation of ratings
Kappa coefficient
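Percent agreement ignores agreement two observers would reach by chance; kappa corrects
for it. A minimal sketch with hypothetical codings:

    from collections import Counter

    # Hypothetical behavior codes from two observers over 10 episodes
    rater1 = ["A", "A", "B", "B", "A", "B", "A", "A", "B", "A"]
    rater2 = ["A", "B", "B", "B", "A", "B", "A", "A", "A", "A"]

    n = len(rater1)
    observed = sum(a == b for a, b in zip(rater1, rater2)) / n  # raw agreement

    # Chance agreement: probability both raters independently pick the same code
    c1, c2 = Counter(rater1), Counter(rater2)
    expected = sum(c1[code] * c2[code] for code in c1) / n ** 2

    kappa = (observed - expected) / (1 - expected)
    print(round(observed, 2), round(kappa, 2))  # here 0.8 raw vs. ~0.58 kappa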
ASSUMPTION
The entity being measured is not changing.

IMPLICATIONS
As you increase the number of indicators, the amount of random error in the averaged
measurement decreases.

NOTE
Common indices of reliability range from 0 to 1; higher numbers indicate better reliability (i.e., less
random error).

The problem of artifacts
Reliability
Are our measurements precise?
Validity
Are we really measuring what we think we are
measuring?

O = T + E + S
O = a measured score (e.g., performance on an exam)
T = true score (e.g., the value we want)
E = random error
S = systematic error

Validity
Degree to which measurements are free of both random error, E, and systematic error, S.

Systematic errors reflect the influence of any non-random factor beyond what we’re
attempting to measure.

Do systematic errors accumulate?
YES! Systematic errors exert a constant source of influence on measurements.
We will always overestimate (or
underestimate) T if systematic error is
present!

Note: Each measurement is 2 points higher than the true value of 10.
The errors do not average out.

Note: Even when random error is present, E averages to 0 but S does not. Thus, we can
have reliable measures that still have validity problems.
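Extending the earlier random-error sketch (same hypothetical numbers) with a constant S
shows why systematic error survives averaging:

    import random

    random.seed(1)
    T, S = 10, 2  # hypothetical true score and a constant +2 systematic error

    # O = T + E + S: E is random (mean 0), S pushes every score the same way
    observations = [T + random.gauss(0, 2) + S for _ in range(10_000)]

    mean_O = sum(observations) / len(observations)
    print(round(mean_O, 2))  # ~12.0: averaging removed E, but the bias S remains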
How do we ensure validity?
3 questions:
Are we measuring what we think we’re measuring? Construct Validity
Is the cause-effect relationship really there? Internal Validity
Are our results generalizable? External Validity

How well did we measure what we intended to?
Especially important when we are interested in the theoretical construct per se

Nomological network
represents interrelations among variables involving the construct of interest
[Diagram: a nomological network linking self-esteem positively (+) to ‘achieve in
school’ and ‘ability to cope’, and negatively (-) to ‘distrust friends’]

Nomological validity
Degree to which our measure behaves in the way assumed by the theoretical network
▪ Self-esteem should predict grades in school (‘achieve in school’, +)
▪ It should fail to be related to variables unrelated to self-esteem (e.g., ‘like coffee’)

How well did we measure what we intended to?
Nomological validity
▪ Degree to which the measure behaves in the way assumed by the theoretical network
Face validity
▪ Does it look like it’s measuring the construct?
Content validity
▪ Does it include all relevant components of the construct and exclude irrelevant ones?
Convergent validity
▪ Does it correlate with measures that assess the same construct?
Discriminant validity
▪ Whether it FAILS to correlate with measures that assess a different construct

Internal validity
Concerns the cause-and-effect relationship
Low internal validity for predictive designs
▪ Correlation does not imply causation
High internal validity for explanatory designs
▪ Manipulation changes outcome
Need to rule out the effect of extraneous variables
▪ Extraneous vs. confounding variables

External validity
Do findings generalize?
Representative Sample
▪ Report sample characteristics
Representative Setting
▪ Difficult in experimental designs
▪ Report setting characteristics

[Diagram: explanatory designs offer high internal validity (causality) but low external
validity; predictive designs offer high external validity (generalizability) but low
internal validity]

Cannot have validity if there is no reliability
Reliability does NOT guarantee validity