Note: Not all of the different types of reliability apply to the way that questionnaires are typically used. Internal consistency (whether all of the items measure the same construct) is not usually reported in studies of questionnaires. Neither is inter-rater reliability, which would measure how similar people's responses were if the interviews were repeated, or if different raters listened to the same interview. Adjust the rubric below as needed.

Reliability refers to whether the scores are reproducible. Unless otherwise specified, the reliability estimates come from studies of United States samples. Here is the rubric for evaluating the reliability of scores on a measure for the purpose of evidence-based assessment.

Evaluation of norms and reliability for the XXX (table from Youngstrom et al., extending Hunsley & Mash, 2008; *indicates new construct or category)

Criterion

Rating (adequate, good, excellent, too good*)

Explanation with references

Norms

Adequate

Multiple convenience samples and research studies, including both clinical and nonclinical samples[citation needed]
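Although internal consistency is often not reported for questionnaires used this way, it is easy to estimate when item-level data are available. The Python sketch below computes Cronbach's alpha, alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score), and maps the result onto the rating labels above. The numeric cutoffs are an assumption, not values stated on this page: they follow the alpha ranges commonly attributed to Hunsley & Mash (2008), with "too good" read as alpha at or above .95. Check them against the full rubric before relying on them. The function names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance (n - 1 denominator) of each item across respondents
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of the summed total score across respondents
    total_var = variance([sum(row) for row in responses])
    if total_var == 0:
        raise ValueError("total scores have zero variance; alpha is undefined")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# ASSUMED cutoffs, checked from highest to lowest; not taken from this page.
ASSUMED_ALPHA_CUTOFFS = [
    (0.95, "too good"),   # assumed: very high alpha can signal redundant items
    (0.90, "excellent"),
    (0.80, "good"),
    (0.70, "adequate"),
]


def rate_alpha(alpha, cutoffs=ASSUMED_ALPHA_CUTOFFS):
    """Return the label of the first threshold that alpha meets or exceeds."""
    for threshold, label in cutoffs:
        if alpha >= threshold:
            return label
    return "below adequate"


if __name__ == "__main__":
    toy_items = [[1, 1], [2, 2], [3, 3]]  # made-up data: two identical items
    alpha = cronbach_alpha(toy_items)
    print(alpha, rate_alpha(alpha))
```

Alpha can be negative when items are inversely related, so a negative value usually points to reverse-keyed items that have not been recoded.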

Validity describes the evidence that an assessment tool measures what it is supposed to measure. There are many ways of checking validity. For screening measures, diagnostic accuracy and discriminative validity are probably the most useful. Unless otherwise specified, the validity estimates come from studies of United States samples. Here is a rubric for describing the validity of test scores in the context of evidence-based assessment.

Evaluation of validity and utility for the XXX (table from Youngstrom et al., unpublished, extended from Hunsley & Mash, 2008; *indicates new construct or category)

Shows convergent validity with other symptom scales, longitudinal prediction of the development of mood disorders,[3][4][5] criterion validity via metabolic markers,[2][6] and associations with family history of mood disorder.[7] The factor structure is complicated;[2][8] the inclusion of “biphasic” or “mixed” mood items creates substantial cross-loading.

Used both as a self-report and as a caregiver-report measure; used in college student samples[8][12] as well as in outpatient[9][13][14] and inpatient clinical samples; translated into multiple languages with good reliability.

Treatment sensitivity

Good

Multiple studies show sensitivity to treatment effects comparable to that of interviews by trained raters, including placebo-controlled, masked-assignment trials.[15][16] Short forms appear to retain sensitivity to treatment effects while substantially reducing burden.[16][17]

Clinical utility

Good

Free (public domain), strong psychometrics, extensive research base. The biggest concerns are length and reading level. Short forms have less research behind them, but they are appealing because of reduced burden and promising data.
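For the diagnostic accuracy and discriminative validity mentioned before the validity table, a common summary is the area under the ROC curve (AUC): the probability that a randomly chosen person with the diagnosis scores higher than a randomly chosen person without it. The Python sketch below computes AUC by direct pairwise comparison (ties count half) and reports sensitivity and specificity at a single cutoff. The scores and cutoff are made-up illustration values, and treating scores at or above the cutoff as "positive" is an assumption about the scale's direction.

```python
def auc(case_scores, noncase_scores):
    """AUC as the probability that a case outscores a non-case (ties count half)."""
    wins = 0.0
    for c in case_scores:
        for n in noncase_scores:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


def sens_spec(case_scores, noncase_scores, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    sensitivity = sum(s >= cutoff for s in case_scores) / len(case_scores)
    specificity = sum(s < cutoff for s in noncase_scores) / len(noncase_scores)
    return sensitivity, specificity


if __name__ == "__main__":
    cases = [30, 25, 22, 18]     # hypothetical scores, diagnosis present
    noncases = [20, 12, 10, 8]   # hypothetical scores, diagnosis absent
    print(auc(cases, noncases))
    print(sens_spec(cases, noncases, cutoff=20))
```

The pairwise loop is quadratic in sample size; it is written this way because it maps directly onto the probability definition, not for speed.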

Here is a shell data file that you could use in your own research. The variable names in the shell correspond to those in the scoring code for all three statistical programs.

Note that our CSV includes several demographic variables that follow current conventions in most developmental and clinical psychology journals. You may want to modify them, depending on where you are working. Also watch for the possibility of "deductive identification": if we ask for personal information in enough detail, it may be possible to work out a participant's identity from a combination of variables.
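One way to screen a data file for deductive identification risk is a k-anonymity-style check: count how many participants share each combination of demographic variables and flag combinations held by fewer than k people. This is a swapped-in illustration, not a procedure described on this page. The column names and the default k = 5 are hypothetical assumptions; choose a threshold that fits your IRB or data-sharing agreement.

```python
from collections import Counter


def risky_combinations(rows, quasi_identifiers, k=5):
    """Map each combination of the given columns to its count, if shared by fewer than k rows.

    rows: list of dicts, one per participant (e.g. from csv.DictReader).
    quasi_identifiers: column names that could jointly identify someone.
    k: ASSUMED minimum group size; 5 is an illustrative default only.
    """
    counts = Counter(tuple(row[col] for col in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


if __name__ == "__main__":
    # Hypothetical rows and column names, for illustration only
    rows = [{"age": "14", "gender": "F"}] * 3 + [{"age": "15", "gender": "M"}]
    print(risky_combinations(rows, ["age", "gender"], k=2))
```

Flagged combinations can then be coarsened (for example, age bands instead of exact ages) before the file is shared.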

When different research projects and groups use the same variable names and syntax, it is easier to share data and work together on integrative data analyses, or "mega-analyses." These differ from, and are better than, meta-analyses because they combine the raw data rather than summary descriptive statistics.
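Shared variable names are what make a mega-analysis mechanically simple: raw files from different sites can be stacked row by row. The Python sketch below assumes every site exports a CSV with the same column names. It checks the headers and adds a `site` column (a hypothetical name; rename it if your shell already has such a variable) so that site differences can be modeled later. It uses only the standard `csv` module.

```python
import csv


def pool_sites(sources):
    """Stack raw rows from several sites that share one set of variable names.

    sources: dict mapping a site label to an open CSV file or any iterable of lines.
    Returns a list of row dicts, each tagged with a 'site' key (hypothetical name).
    """
    pooled, expected = [], None
    for site, handle in sources.items():
        reader = csv.DictReader(handle)
        columns = list(reader.fieldnames or [])
        if expected is None:
            expected = columns
        elif set(columns) != set(expected):
            # Column order may differ; mismatched names may not
            diff = sorted(set(columns) ^ set(expected))
            raise ValueError(f"site {site!r}: columns {diff} do not match")
        for row in reader:
            row["site"] = site
            pooled.append(row)
    return pooled


if __name__ == "__main__":
    site_a = ["id,total", "1,10", "2,12"]  # made-up raw data
    site_b = ["id,total", "3,9"]
    for row in pool_sites({"A": site_a, "B": site_b}):
        print(row)
```

Values stay as strings, which leaves type conversion and recoding to the analysis step, where the shared scoring code can handle it consistently.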