Types of Reliability, Validity, and Consistency

1. List the different types of reliability and validity. What sort of reliability and validity would you seek in assessing employee performance?

2. What are some examples of nominal, ordinal, interval, and ratio data? What is internal consistency? How can you determine whether your survey has internal consistency?

-----------------------------------------------------------------------------------------------------------------------------------------
Types of reliability
http://changingminds.org/explanations/research/design/types_reliability.htm

'Reliability' of any research is the degree to which it gives consistent scores across repeated measurements. It can thus be viewed as 'repeatability' or 'consistency'. In summary:

Inter-rater: Different people, same test.
Test-retest: Same people, different times.
Parallel-forms: Different people, same time, different test.
Internal consistency: Different questions, same construct.
Inter-Rater Reliability
When multiple people give assessments of some kind, or are the subjects of some test, similar people should produce the same resulting scores. Inter-rater reliability can be used to calibrate people, for example those being used as observers in an experiment.

Two major ways in which inter-rater reliability is used are (a) testing how similarly people categorize items and (b) testing how similarly people score items.

This is the best way of assessing reliability when you are using observation, as observer bias very easily creeps in. It does, however, assume you have multiple observers, which is not always the case.

Inter-rater reliability is also known as inter-observer reliability or inter-coder reliability.

Examples
Two people may be asked to categorize pictures of animals as being dogs or cats. A perfectly reliable result would be that they both classify the same pictures in the same way.
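The dog-or-cat example can be scored with simple percent agreement. The sketch below also computes Cohen's kappa, a widely used chance-corrected agreement statistic that the source does not mention; the function names and the labels are hypothetical, made up only for illustration.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items that both raters placed in the same category."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that two independent raters with these
    # category frequencies pick the same category for an item.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical category for every item
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two raters for the same eight pictures.
rater_a = ["dog", "dog", "cat", "cat", "dog", "cat", "dog", "cat"]
rater_b = ["dog", "dog", "cat", "dog", "dog", "cat", "dog", "cat"]
print(percent_agreement(rater_a, rater_b))  # 0.875 (7 of 8 pictures match)
print(cohens_kappa(rater_a, rater_b))       # 0.75
```

Kappa is lower than raw agreement here because, with two roughly balanced categories, about half of the matches would be expected by chance alone.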

Observers being used in assessing prisoner stress are asked to assess several 'dummy' people who are briefed to respond in a programmed and consistent way. The variation in results from a standard gives a measure of their reliability.

In a test scenario, an IQ test applied to several people with a true score of 120 should result in a score of 120 for everyone. In practice, there will usually be some variation between people.
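The 'dummy' and true-score examples both compare each rater's score with a known standard. A minimal sketch, assuming hypothetical observer names and scores around a standard of 120, reports each observer's signed deviation and the mean absolute deviation (a summary chosen here for illustration, not one the source specifies):

```python
from statistics import mean


def deviations_from_standard(scores, standard):
    """Signed difference between each observer's score and the known standard."""
    return {observer: score - standard for observer, score in scores.items()}


def mean_absolute_deviation(scores, standard):
    """Average size of the observers' errors; 0 means every observer hit the standard."""
    return mean(abs(score - standard) for score in scores.values())


# Hypothetical scores given to a 'dummy' briefed to produce a standard score of 120.
scores = {"observer_1": 120, "observer_2": 123, "observer_3": 117, "observer_4": 122}
print(deviations_from_standard(scores, 120))
print(mean_absolute_deviation(scores, 120))
```

The signed deviations show whether an individual observer tends to score high or low, which is the kind of information needed to calibrate observers.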

Test-Retest Reliability
An assessment or test of a person should give the same results whenever you apply the test.

Test-retest reliability evaluates reliability across time.

Reliability can vary with the many factors that affect how a person responds to the test, including their mood, interruptions, time of day, etc. A good test will largely cope with such factors and give relatively little variation. An unreliable test is highly sensitive to such factors and will give widely varying results, even if the person re-takes the same test half an hour later.

Generally speaking, the longer the delay between tests, the greater the likely variation. Better tests will give less retest variation with longer delays.

Of course, the problem with test-retest is that people may have learned from the first test, so the second test is likely to give different results.

This method is particularly used in experiments that use a no-treatment control group that is measured pre-test and post-test.
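Test-retest reliability is commonly summarized as the Pearson correlation between the first and second administrations. The sketch below uses hypothetical scores for five people and a hand-written `pearson` helper (an illustrative name, not a library function); it also reports the mean shift, because a correlation alone does not reveal a uniform practice (learning) gain of the kind mentioned above.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length score lists (neither constant)."""
    mx, my = mean(x), mean(y)
    covariance = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return covariance / (pstdev(x) * pstdev(y))


# Hypothetical scores for the same five people, tested twice.
time_1 = [12, 15, 11, 18, 14]
time_2 = [14, 16, 12, 19, 16]

r = pearson(time_1, time_2)          # how consistently scores line up across sessions
shift = mean(time_2) - mean(time_1)  # average change, e.g. from learning the test
print(round(r, 3), round(shift, 2))
```

Here everyone scores a little higher the second time, yet the correlation stays high because people keep roughly the same relative positions.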

Examples
Various questions for a personality test are tried out with a class of students over several years. This helps the researcher determine those questions and combinations that have better reliability.

In the development of national school tests, a class of children is given several tests that are intended to assess the same abilities. A week and a month later, they are given the same tests. With allowances for learning, the variation between the test and retest results is used to assess which tests have better test-retest reliability.

Parallel-Forms Reliability
One problem with questions or assessments is knowing which questions are the best ones to ask. One way of discovering this is to run two tests in parallel, using different questions.

Parallel-forms reliability evaluates different questions and question sets that seek to assess the same construct.

Parallel-Forms evaluation may be done in combination with other methods, such as Split-half, which divides items that measure the same construct into two tests and applies them to the same group of people.

Examples
An experimenter develops a large set of questions. They split these into two sets and administer each set to a randomly selected half of a target sample.

In the development of national tests, two different tests are used simultaneously in trials. The test that gives the more consistent results is used, whilst the other (provided it is sufficiently consistent) is kept as a backup.

Internal Consistency Reliability
When asking questions in research, the purpose is to assess the response against a given construct or idea. Different questions that test the same construct should give consistent results.

Internal consistency reliability evaluates individual questions in comparison with one another for their ability to give consistently appropriate results.

Average inter-item correlation compares correlations between all pairs of questions that test the same construct by calculating the mean of all paired correlations.
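The average inter-item correlation can be sketched as follows, assuming a hypothetical table in which each row is one respondent and each column is one item measuring the same construct. The helper names are illustrative only.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length score sequences (neither constant)."""
    mx, my = mean(x), mean(y)
    covariance = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return covariance / (pstdev(x) * pstdev(y))


def average_inter_item_correlation(responses):
    """Mean of the Pearson correlations over every pair of items."""
    items = list(zip(*responses))  # transpose: one tuple of scores per item (column)
    return mean(pearson(a, b) for a, b in combinations(items, 2))


# Hypothetical 1-5 ratings: rows are respondents, columns are three items
# intended to measure the same construct.
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 1],
]
print(round(average_inter_item_correlation(responses), 3))
```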

Average item-total correlation first computes a total score for each respondent across all the items, then correlates each item with that total and averages these item-total correlations.
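A sketch of the average item-total correlation on a hypothetical respondents-by-items table: each respondent's total is computed, each item is correlated with those totals, and the correlations are averaged. In this uncorrected version each item is part of the total it is correlated with, which tends to inflate the result; some analysts instead correlate each item with the total of the remaining items.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length score sequences (neither constant)."""
    mx, my = mean(x), mean(y)
    covariance = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return covariance / (pstdev(x) * pstdev(y))


def average_item_total_correlation(responses):
    """Correlate each item with respondents' total scores, then average those correlations."""
    totals = [sum(row) for row in responses]  # one total score per respondent
    items = list(zip(*responses))             # one tuple of scores per item
    return mean(pearson(item, totals) for item in items)


# Hypothetical 1-5 ratings: rows are respondents, columns are three items.
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 1],
]
print(round(average_item_total_correlation(responses), 3))
```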

Split-half correlation divides items that measure the same construct into two tests, which are applied to the same group of people, then calculates the correlation between the two total scores.
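A split-half sketch, assuming the items are split into odd- and even-numbered halves (one common choice; any split into two comparable halves would do) and using hypothetical data. The Spearman-Brown adjustment at the end is not in the source; it is a standard correction that estimates full-length reliability from the half-length correlation.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length score lists (neither constant)."""
    mx, my = mean(x), mean(y)
    covariance = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return covariance / (pstdev(x) * pstdev(y))


def split_half_correlation(responses):
    """Correlate totals on the odd-numbered items with totals on the even-numbered items."""
    first_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    second_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return pearson(first_half, second_half)


def spearman_brown(r_half):
    """Estimate full-length reliability from a half-length correlation."""
    return 2 * r_half / (1 + r_half)


# Hypothetical 1-5 ratings: five respondents, four items on the same construct.
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 4],
    [3, 3, 3, 4],
    [1, 2, 1, 1],
]
r_half = split_half_correlation(responses)
print(round(r_half, 3), round(spearman_brown(r_half), 3))
```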

Cronbach's alpha is equivalent to the average of all possible split-half correlations and can be calculated as:

alpha = (N × r-bar) / (1 + (N - 1) × r-bar)

where N is the number of components (items),
and r-bar is the average of the Pearson correlation coefficients between all pairs of items.
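The alpha formula can be computed directly from the average inter-item correlation. This version is usually called standardized alpha; the more widely reported raw alpha is computed from item and total-score variances instead, so the two can differ slightly. The function names and data are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length score sequences (neither constant)."""
    mx, my = mean(x), mean(y)
    covariance = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return covariance / (pstdev(x) * pstdev(y))


def standardized_alpha(n_items, r_bar):
    """alpha = (N * r_bar) / (1 + (N - 1) * r_bar)."""
    return (n_items * r_bar) / (1 + (n_items - 1) * r_bar)


def alpha_from_responses(responses):
    """Compute r_bar from a respondents-by-items table, then apply the formula."""
    items = list(zip(*responses))  # one tuple of scores per item
    r_bar = mean(pearson(a, b) for a, b in combinations(items, 2))
    return standardized_alpha(len(items), r_bar)


# Hypothetical 1-5 ratings: rows are respondents, columns are three items.
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [3, 3, 3],
    [1, 2, 1],
]
print(round(alpha_from_responses(responses), 3))
```

The formula also shows why adding items raises alpha: with r-bar fixed, a larger N pushes the ratio toward 1.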
----------------------------------------------------------------------------------------------------------------------------------------------
TYPES OF VALIDITY
Here is an overview of the main types of validity used in the scientific method.
by Experiment-Resources.com (2009)
http://www.experiment-resources.com/types-of-validity.html

"Any research can be affected by different kinds of factors which, while extraneous to the concerns of the research, can invalidate the findings" (Seliger & Shohamy 1989, 95).

Let's take a look at the most frequent uses of validity in the scientific method:

EXTERNAL VALIDITY
External validity is about generalization: to what extent can an effect found in research be generalized to other populations, settings, treatment variables, and measurement variables?
External validity is usually split into two distinct types, population validity and ecological validity; both are essential elements in judging the strength of an experimental design.

INTERNAL VALIDITY
Internal validity is a measure of how closely a researcher's experimental design follows the principle of cause and effect.

"Could there be an alternative cause, or causes, that explain my observations and results?"

TEST VALIDITY
Test validity is an indicator of how much meaning can be placed upon a set of test results.

◦ Concurrent validity measures the test against a benchmark test, and a high correlation indicates that the test has strong criterion validity.

◦ Predictive validity is a measure of how well a test predicts abilities. It involves testing a group of subjects for a certain construct and then comparing them with results obtained at some point in the future.
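Concurrent and predictive validity are both usually expressed as a correlation between the test and a criterion. A minimal sketch with hypothetical scores for five employees: the benchmark test serves as the concurrent criterion, and a later performance rating serves as the predictive one.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length score lists (neither constant)."""
    mx, my = mean(x), mean(y)
    covariance = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return covariance / (pstdev(x) * pstdev(y))


# Hypothetical data for five employees.
new_test = [55, 62, 70, 48, 80]           # scores on the test being validated
benchmark = [58, 60, 72, 50, 78]          # established test taken at the same time
later_rating = [3.1, 3.4, 3.9, 2.8, 4.2]  # performance rating obtained later

concurrent_r = pearson(new_test, benchmark)     # criterion measured now
predictive_r = pearson(new_test, later_rating)  # criterion measured in the future
print(round(concurrent_r, 3), round(predictive_r, 3))
```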

Content Validity
Content validity is the estimate of how much a measure represents every single element of a construct.

Construct Validity
Construct validity defines how well a test or experiment measures up to its claims. A test designed to measure depression must measure only that particular construct, not closely related ideas such as anxiety or stress.

◦ Convergent validity tests that constructs that are expected to be related are, in fact, related.

◦ Discriminant validity (also referred to as divergent validity) tests that constructs that should have no relationship do not, in fact, have any relationship.
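A sketch of the convergent/discriminant logic, using hypothetical scores: a new depression scale should correlate strongly with an established depression measure and only weakly with a measure of an unrelated construct. The 'unrelated' variable is an arbitrary placeholder, and no particular cut-off for "strong" or "weak" is implied.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length score lists (neither constant)."""
    mx, my = mean(x), mean(y)
    covariance = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return covariance / (pstdev(x) * pstdev(y))


# Hypothetical scores for five respondents.
new_depression_scale = [10, 14, 8, 20, 16]
established_depression_scale = [11, 15, 9, 19, 17]  # same construct: expect a strong correlation
unrelated_measure = [5, 3, 4, 4, 5]                 # placeholder for an unrelated construct

convergent_r = pearson(new_depression_scale, established_depression_scale)
discriminant_r = pearson(new_depression_scale, unrelated_measure)
print(round(convergent_r, 3), round(discriminant_r, 3))
```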

FACE VALIDITY
Face validity is a measure of how representative a research project is 'at face value,' and whether it appears to be a good project.

--------------------------------------------------------------------------------------------------------------------------------------------
What sort of reliability would you seek in assessing employee performance?

Test-retest reliability (same people, different times), internal consistency (different questions, same construct), and inter-rater reliability, which evaluates reliability across different raters and so ensures that evaluations are consistent.

At the nominal scale, one uses labels for categories; for example, rocks can generally be categorized as igneous, sedimentary, or metamorphic. Valid operations for this scale include equivalence and set membership. Nominal measures offer names or labels for certain characteristics.

Variables assessed on a nominal scale are called categorical variables; see also categorical data.
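For the rock example, the valid nominal operations (equivalence and set membership), together with frequency counts and the mode, can be sketched as below. The function names and sample labels are hypothetical.

```python
from collections import Counter
from statistics import mode

# The nominal categories from the rock example; the labels carry no order or magnitude.
ROCK_TYPES = {"igneous", "sedimentary", "metamorphic"}


def is_rock_type(label):
    """Set membership: is the label one of the defined categories?"""
    return label in ROCK_TYPES


def same_category(a, b):
    """Equivalence: two observations either share a category or they do not."""
    return a == b


# Hypothetical classifications of five rock samples.
samples = ["igneous", "sedimentary", "sedimentary", "metamorphic", "sedimentary"]

print(all(is_rock_type(s) for s in samples))  # True
print(same_category(samples[0], samples[1]))  # False
# Frequencies and the mode are meaningful summaries for nominal data; a mean is not.
print(Counter(samples).most_common(1))        # [('sedimentary', 3)]
print(mode(samples))                          # sedimentary
```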
