Research and Statistics Dictionary

Definitions for a wide spectrum of empirical and statistical constructs

A

Absolute risk increase (ARI) - ARI is the absolute difference between the control event rate (CER) and the experimental event rate (EER), |CER-EER|. ARI is necessary for calculating number needed to harm (NNH). The formula for NNH is (1/ARI).

Absolute risk reduction (ARR) - ARR is the absolute difference between the experimental event rate (EER) and the control event rate (CER), |EER-CER|. ARR is necessary for calculating number needed to treat (NNT). The formula for NNT is (1/ARR).
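
As a worked illustration of the ARI and ARR entries above, the following Python sketch computes the absolute difference in event rates and its reciprocal. The function names and the example event rates are hypothetical, chosen only for illustration.

```python
def absolute_risk_difference(cer: float, eer: float) -> float:
    """Return |CER - EER|, the quantity behind both ARR and ARI."""
    return abs(cer - eer)


def number_needed(cer: float, eer: float) -> float:
    """Return 1 / |CER - EER| (NNT when treatment helps, NNH when it harms)."""
    diff = absolute_risk_difference(cer, eer)
    if diff == 0:
        raise ValueError("Event rates are equal; NNT/NNH is undefined.")
    return 1.0 / diff


# Hypothetical rates: 20% of controls and 15% of treated participants had the event.
arr = absolute_risk_difference(cer=0.20, eer=0.15)
nnt = number_needed(cer=0.20, eer=0.15)
```

With these illustrative rates the difference is 0.05, so roughly 20 people would need to be treated to prevent one event.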

Accuracy in measurement - Accuracy in measurement relates to the validity, interpretability, and utility of a variable or outcome.

Alpha Level - The alpha value is the criterion researchers set for declaring statistical significance. It is also the chance they are willing to take of committing a Type I error.

Alternate/Parallel forms reliability - A type of reliability evidence where two forms of a survey instrument are written to cover the exact same content areas. The instruments should be highly correlated if alternate/parallel forms reliability is assumed.

Analysis of covariance (ANCOVA) - ANCOVA is a between-subjects statistical test that adjusts the outcome variable for one or more covariates when comparing independent groups on a continuous outcome.

Analysis of variance (ANOVA) - ANOVA is a between-subjects statistical test used when comparing three or more independent groups on a continuous outcome. The statistical assumptions of independence of observations, normality, and homogeneity of variance must be met in order to conduct ANOVA. If a significant main effect is found, p < .05, then post hoc tests must be used to explain the main effect.

B

Bayes' Theorem - A statistical method used to predict the probability of an outcome based on 1) the pretest probability of the outcome given clinical factors and 2) the ability of diagnostic tests.
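
The two inputs named in the Bayes' Theorem entry can be combined as in the sketch below, which estimates the probability of disease after a positive test. It assumes the "ability" of the test is summarized by its sensitivity and specificity; the function name and the numbers are hypothetical.

```python
def post_test_probability(pretest: float, sensitivity: float, specificity: float) -> float:
    """Probability of disease after a positive test, via Bayes' theorem."""
    true_pos = sensitivity * pretest                    # diseased and test positive
    false_pos = (1.0 - specificity) * (1.0 - pretest)   # healthy but test positive
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs: 10% pretest probability, 90% sensitivity, 80% specificity.
posterior = post_test_probability(pretest=0.10, sensitivity=0.90, specificity=0.80)
```

With these illustrative inputs a positive result raises the probability from 10% to about one in three.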

Beta Level - The beta value is used to denote how much of a chance a researcher is willing to take for committing a Type II error. The beta value is also used to calculate statistical power (1 - beta value = statistical power).

Between-subjects - A statistical method for comparing independent groups on an outcome.

Biserial - A statistical test of magnitude and direction of association between an ordinal variable and a continuous variable.

Blinding - A process used in experimental designs, such as randomized controlled trials (RCT), whereby study participants, clinicians, and/or researchers are not aware if a treatment or a placebo is being administered. Blinding is used to deter observation biases in experimental research. A single-blinded study is when study participants are not aware if they are being given the treatment or the placebo. A double-blinded study is when study participants and clinicians do not know who is receiving the treatment or placebo. Triple-blinded studies are the most rigorous in that the study participants, clinicians, and the researchers are unaware of the treatment or placebo and outcomes are assessed in an independent fashion.

Blocked randomization - A method of randomization where study participants are randomly allocated to groups in small blocks of four to six participants at a time. This method ensures equally sized groups but is only feasible in smaller studies.
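
One way blocked randomization might be sketched in Python is shown below. The function name, the two arm labels, and the default block size of four are assumptions made for illustration; this is not a validated allocation procedure.

```python
import random


def blocked_randomization(n_participants, block_size=4,
                          arms=("treatment", "control"), seed=None):
    """Allocate participants in shuffled blocks so the arms stay balanced."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_participants:
        # Each block holds every arm the same number of times, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_participants]


# Hypothetical study of 12 participants in blocks of four.
schedule = blocked_randomization(n_participants=12, block_size=4, seed=1)
```

Because every complete block contains each arm equally often, group sizes can differ only within a final, partially filled block.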

Bloom's Taxonomy - A framework for understanding different levels of "knowing" related to a content area or performance.

Bonferroni - A post hoc test used when testing multiple hypotheses at the same time. The test is often used to correct for familywise error rates (increased Type I error rates). Take the alpha value being used, often .05, and divide it by the number of statistical tests being conducted. That value is the new alpha value used to assume statistical significance. For example, you are testing eight hypotheses with an alpha of .05, .05/8 = .006. The p-values for each test will have to now be below .006 to achieve statistical significance.
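
The arithmetic in the Bonferroni entry can be sketched in Python as follows. The function names and the list of p-values are hypothetical; the sketch keeps the unrounded threshold (.00625) rather than the rounded .006 quoted above.

```python
def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Divide the alpha value by the number of tests being conducted."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag each p-value that falls below the adjusted alpha value."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return [p < threshold for p in p_values]


# Eight hypothetical p-values tested at an overall alpha of .05.
adjusted = bonferroni_alpha(0.05, 8)
flags = significant_after_bonferroni(
    [0.001, 0.004, 0.02, 0.03, 0.2, 0.5, 0.008, 0.01]
)
```

In this illustration only the first two p-values stay significant after the correction.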

Bootstrap - A method for validating statistical findings. Thousands of random samples are drawn from the data with replacement (in a non-exclusive fashion) and statistics are rerun on each sample. The method yields 95% confidence intervals for statistical findings.
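
A minimal sketch of a bootstrap confidence interval for a mean follows, using only the Python standard library. It assumes the simple percentile method, which is one of several ways to form bootstrap intervals; the function name and sample values are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(data, n_resamples=2000, confidence=0.95, seed=None):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and recompute the mean each time.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    tail = (1.0 - confidence) / 2.0
    lower = means[int(tail * n_resamples)]
    upper = means[min(n_resamples - 1, int((1.0 - tail) * n_resamples))]
    return lower, upper


# Hypothetical sample of ten continuous observations.
sample = [4.1, 5.3, 6.0, 5.8, 4.9, 7.2, 5.5, 6.4, 5.1, 6.8]
low, high = bootstrap_mean_ci(sample, seed=42)
```

The interval bounds depend on the random seed and the number of resamples, so no fixed output is shown here.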

C

Case-control - A type of observational research design that establishes associations between outcome variables and potential predictor variables. It is considered a weaker type of observational design due to the amount of bias caused by retrospectively selecting cases and controls and analyzing data. Case-control designs are well-suited for studying rare outcomes and generating hypotheses. Odds ratios are used to establish associations in case-control designs.

Case series - A type of observational research design that yields the lowest level of empirical evidence. A series of observations are taken from a given population in a retrospective fashion and analyzed. Case series designs are useful for studying rare outcomes and hypothesis generation.

Categorical variable - A type of variable that denotes group membership, possession of a characteristic or trait, or categorization of phenomena using numerical values.

Chi-square - A between-subjects statistical test where two independent groups are compared on a dichotomous categorical outcome.

Chi-square Goodness-of-fit - A statistical test where the expected dispersal of proportions in levels of a categorical variable is compared to the observed dispersal of proportions in levels of a categorical variable.

Clustered random sampling - A probability sampling method where naturally existing clusters of a population are targeted for random selection.

Cochran-Mantel-Haenszel - A statistical test where conditional independence of the association between two categorical variables can be assessed.

Cochran's Q - A within-subjects statistical test where three or more observations of a dichotomous categorical outcome are compared across time or within-subjects.

Codebook - A document that contains information regarding the codification scheme used for variables in a database.

Comparator - The group against which a treatment is compared in order to establish a treatment effect. Also known as a control group, the comparator may be inactive, active, or another intervention.

Concurrent validity - A type of validity evidence where a survey instrument correlates with a current measure or outcome, simultaneously.

Construct specification - The first step of creating a survey instrument. The researcher creates a construct specification with an explicit operational definition for the construct of interest, subsequent content areas with valid citations, and a table of allocated percentages to reflect the content areas within the test.

Construct validity - A type of validity evidence that all forms of validity evidence fall under. As more types of validity evidence are generated and a measure becomes more prevalent in the literature, more construct validity evidence is assumed to exist.

Content validity - A type of validity evidence where a survey instrument represents the knowledge or content base associated with a construct of interest.

Continuous variable - A type of variable where a "true zero" exists and measures of magnitude and distance can be assessed.

Control Event Rate (CER) - The proportion of control participants that had the outcome of interest. CER is used to calculate other epidemiological measures.

Control variable - A type of variable that is entered into a multivariate model because it can affect the association between predictor and outcome variables. The way to control for a variable is to enter it into a multivariate model along with pertinent predictor variables.

Convenience sampling - A non-probability sampling method where observations are chosen by researchers that have access to data.

Convergent validity - A type of validity evidence where a survey instrument positively correlates with a measure that is theoretically or conceptually similar.

Correlation - A bivariate measure of relationship or association between two variables of any scale of measurement.

Count variable - A type of variable that is the frequency of times that an outcome occurs. These types of continuous outcomes are naturally skewed.

Counterbalanced design - A type of quasi-experimental design where multiple treatments can be tested at the same time.

Cox regression - A statistical test of survival analysis where independent groups are compared on the "time-to-event" or temporal aspects of developing a dichotomous categorical outcome, when controlling for demographic, predictor, control, and confounding variables. Cox regression is the multivariate extension of the Kaplan-Meier curve.

Cross-Over Randomized Design - A type of randomized design where participants are randomly allocated to either a treatment group or a control group and after receiving the intervention, are given a "washout" period. Then, the treatment group participants receive the control intervention and the control group receives the treatment intervention. This is a powerful design because participants serve as their own controls.

D

Diagnostic accuracy - A diagnostic testing measure that provides the overall proportion of correct diagnoses yielded from a diagnostic test.

Diagnostic testing - A method for establishing the ability of a diagnostic test to detect disease states or healthy people in a precise and accurate fashion.

E

Effect size - Commonly known as the hypothesized difference between independent groups (between-subjects) or observations of an outcome across time (within-subjects). It is also the hypothesized magnitude of an association (correlation) or expected beta weight (multivariate).

Exclusion criteria - When choosing exclusion criteria in a research study, you are focused on types of individuals who are at risk of being lost to follow-up, who do not possess necessary demographic or clinical characteristics related to a study, or who may experience adverse effects of treatment.

Experimental Event Rate (EER) - The proportion of treatment participants that had the outcome of interest. EER is used to calculate other epidemiological measures.

Expert Review - Step two of the eight-step validated methodology of creating survey instruments. The construct specification is given to a panel of experts and suggestions, revisions, and additions are integrated into the construct specification. The panel should be made up of experts in the area of interest.

F

Face validity - A type of validity evidence where a survey instrument appears, at face value, to measure what it is supposed to measure.

FINER - A popular mnemonic for formulating research questions. The mnemonic stands for feasible, interesting, novel, ethical, and relevant.

Fisher's Exact test - A between-subjects statistical test where two independent groups are compared on a dichotomous categorical outcome. This statistic is used when there are fewer than five observations in any of the cells in the 2x2 table.

Fixed-effects ANOVA - A type of multivariate test where continuous outcomes are assessed concurrently across multiple levels of several categorical predictor variables.

Frequency - A descriptive statistic used to describe categorical data. A frequency is the number of times that something occurs or does not occur.

Friedman's ANOVA - A non-parametric statistical test used for three or more observations of an ordinal outcome across time or within-subjects. It is also used when the statistical assumptions of repeated-measures ANOVA cannot be met.

G

Goal - A goal is something that a person or group of people want to achieve or obtain. There are always objectives associated with completing goals.

Greenhouse-Geisser - A statistical correction made with repeated-measures ANOVA due to violation of the assumption of sphericity.

H

Hierarchical regression - A type of regression where predictor, demographic, control, and confounding variables are entered into a model sequentially to understand their contribution of unique variance to the outcome.

Homogeneity of variance - A statistical assumption that is assessed when comparing independent groups on a continuous outcome. It is tested using Levene's test.

Hypothesis testing - An empirical methodology that exists coincident with inferential statistics and serves as the basis for either rejecting a null hypothesis (which states that no association exists) or failing to reject it because the evidence of an association is insufficient.

I

Incidence - The number of new cases that occur moving forward in time.

Inclusion criteria - Inclusion criteria specifically identify the characteristics of participants that are going to be objects of analysis. Demographic, clinical, geographical, and temporal characteristics are most often used to define inclusion criteria. With criteria, a deductive approach is used to define the specific population that is being studied.

Incremental validity - A type of validity evidence where a survey instrument adds unique variance to the current level of measurement and understanding for a construct. Stepwise regression is used to yield evidence of incremental validity.

Internal Consistency Reliability - A type of reliability evidence where an assumption is made that all items should be intercorrelated because they are written to cover one construct that has several component areas. If all the items correlate above a certain level (Cronbach's alpha > .75), the evidence of internal consistency reliability can be assumed. Other forms of internal consistency reliability include response sets with dichotomous responses, Likert responses, and continuous responses.
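
Cronbach's alpha, mentioned in the entry above, can be computed from item scores as in the sketch below. It uses the standard formula based on the item variances and the variance of the total scores; the function name and the response data are hypothetical.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of respondent scores.

    Every inner list must hold the same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("At least two items are required")
    item_variances = sum(statistics.variance(scores) for scores in items)
    # Total score per respondent, summed across items.
    totals = [sum(respondent) for respondent in zip(*items)]
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1.0 - item_variances / total_variance)


# Hypothetical responses from five participants to three Likert-type items.
responses = [
    [4, 5, 3, 4, 2],  # item 1
    [4, 4, 3, 5, 2],  # item 2
    [5, 5, 2, 4, 1],  # item 3
]
alpha = cronbach_alpha(responses)
```

For these illustrative responses the value comes out above the .75 level named in the entry.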

Inter-Rater Reliability - A type of reliability evidence where independent raters of the same phenomenon, characteristic, or behavior are assessed on their agreement on a certain performance, event, or occurrence.

Interval variable - A type of variable that does not possess a "true zero" but can be used with parametric statistics, given the meeting of statistical assumptions. Interval level measurement can provide measures of distance, but not magnitude.

Item stem - The part of a survey item that acts as a stimulus to elicit a reaction from participants.

J

Jack-knife - A method for validating statistical findings. Each participant is removed from the sample sequentially and statistical analyses are repeated until every derivation of the sample is assessed.
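
The leave-one-out procedure described in the Jack-knife entry might look like the following sketch, which recomputes a mean with each observation removed in turn. The function name and the data are hypothetical, and the mean stands in for whatever statistic is being validated.

```python
import statistics


def jackknife_means(data):
    """Return the mean of the sample with each observation left out once."""
    return [
        statistics.fmean(data[:i] + data[i + 1:])  # drop observation i
        for i in range(len(data))
    ]


# Hypothetical sample of four observations.
values = [2.0, 4.0, 6.0, 8.0]
leave_one_out = jackknife_means(values)
```

The spread of the leave-one-out estimates gives a sense of how much any single participant influences the result.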

K

Kaplan-Meier - A statistical test of survival analysis where independent groups are compared on the "time-to-event" or temporal aspects of a dichotomous categorical outcome.

Kappa - A measure of inter-rater reliability where ratings are given at a dichotomous categorical level of measurement.

Known-groups validity - A type of validity evidence where a survey instrument differentiates between independent groups. Between-subjects statistics are used to establish this type of validity evidence.

L

Logarithmic transformation - A statistical method used when continuous distributions are skewed due to outliers. The natural log (ln) of each observation is taken and inferential statistics are conducted. The mean values are no longer interpretable, but the p-value associated with the inferential statistic can be interpreted.

Logistic regression - A type of regression where the outcome is a dichotomous categorical outcome variable. Adjusted odds ratios with 95% confidence intervals are the primary inferences yielded from the analysis.

M

Mann-Whitney U - A between-subjects statistical test where two independent groups are compared on an ordinal outcome. The Mann-Whitney U test is also used when the assumption of homogeneity of variance is violated for an independent-samples t-test. Also, the Mann-Whitney U test is used as a post hoc test for significant main effects for Kruskal-Wallis tests.

McNemar's - A within-subjects statistical test where two observations of a categorical outcome are compared across time or within-subjects.

Mean - A descriptive statistic that is the average of a continuous distribution. All values in the distribution are added together and that value is divided by the number of observations. It is used to give context to the findings of inferential statistics and p-values.

Median - A descriptive statistic that shows what observation occurs in the middle of a continuous distribution. It is used to give context to the findings of ordinal level variables and some non-parametric statistics.

Mixed-effects ANOVA - A type of multivariate statistical test where continuous outcomes are compared between independent groups (between-subjects) across time or within-subjects. Essentially, researchers are testing to see if independent groups change at a different pace across time in regards to the outcome.

Mode - A descriptive statistic that shows what observation in a continuous distribution occurs the most often.
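
The Mean, Median, and Mode entries can be illustrated with Python's standard `statistics` module. The scores below are hypothetical.

```python
import statistics

# Hypothetical continuous observations.
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)      # sum of values divided by the number of observations
median = statistics.median(scores)  # middle of the ordered distribution
mode = statistics.mode(scores)      # most frequently occurring observation
```

Here the mean (5) sits above the median (4) because the largest observation pulls the average upward, while the mode (3) reflects the only repeated value.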

Multicollinearity - A phenomenon in regression modeling where predictor variables are highly correlated with each other and artificially inflate multivariate associations. Variance Inflation Factor (VIF) and Tolerance are used to assess multicollinearity in regression models.

Multinomial logistic regression - A type of regression where the outcome is a polychotomous categorical outcome variable. Adjusted odds ratios with 95% confidence intervals are the primary inferences yielded from the analysis.

Multiple regression - A type of regression where the outcome is a continuous variable. Statistical assumptions of normality, homoscedasticity, and linearity must be met before interpreting a multiple regression model.

N

Negative binomial regression - A method of regression used when predicting for count outcomes. The variance of the count outcome is higher than the mean of the outcome.

Negative Predictive Value (NPV) - The proportion of people that test negative with a diagnostic test and do not have the disease state as rated by the "gold standard."

Nested case-control - An observational design embedded into a prospective cohort study where baseline measures and specimens are available for analysis.

Nominal variable - A type of variable where things are "named" or "categorized" using numerical values. Computers and software programs do not understand what variables mean at a qualitative level. Therefore, nominal variables use logical numerical headings to denote group participation or possession of a characteristic or trait.

Nomograms - A statistical technique where the predicted probabilities of a regression model are mapped across a scoring system to give patients and clinicians a relative understanding of the risk of developing an outcome given certain risk factors.

Nonequivalent control group design - A type of quasi-experimental design where randomization occurs at the point of intervention, meaning that groups are assigned to treatments.

Non-inferiority trial - A type of experimental design where a treatment is tested to assess if it is "just as good" as the "gold standard."

Non-Parametric Statistics - A "family" of inferential statistical tests that are used with categorical and ordinal outcomes and when the statistical assumptions of parametric statistics (normality and homogeneity of variance) are violated.

Normal probability plot - A regression diagnostic used to assess the assumptions of normality and homoscedasticity in multiple regression. It is also known as a P-P plot. The plot is the cumulative frequency of the distribution of standardized residuals yielded from the model against the residuals associated with a normal probability graph scale.

Normality - A statistical assumption for continuous level measurements where the distribution resembles the "bell-curve," normal distribution, or Gaussian distribution. Skewness and kurtosis statistics are used to assess the assumption of normality. Skewness and kurtosis statistics below an absolute value of 2.0 are considered normal. Values above 2.0 assume a non-normal distribution.

Number Needed to Harm (NNH) - The number of people that have to be treated to cause a bad outcome in the future. Higher NNH values are preferable.

Number Needed to Treat (NNT) - The number of people that need to be treated to prevent a bad outcome in the future. Lower NNT values are preferable.

O

Objective - An action or behavior that upon completion, leads towards the completion of a goal. Bloom's Taxonomy is an excellent framework for writing goals and objectives.

One-sample median test - A statistical test where an expected median value is compared to an observed median value in a population.

One-sample t-test - A statistical test where an expected mean value is compared to an observed mean value in a population.

One-sided hypothesis - A type of hypothesis that postulates a "directional" effect in either an increasing or decreasing fashion. These types of hypotheses should be used rarely. Most journals require the more rigorous two-sided hypothesis.

Ordinal outcome - A type of outcome measured at the ordinal level. Proportional Odds Regression is the multivariate method used to model this type of outcome.

Ordinal variable - A type of variable where a sense of order but not distance or magnitude is assessed. Likert-type scales are considered ordinal.

Outcome variable - A type of variable that can be observed and measured in a valid fashion.

P

Parallel Randomized Design - A type of experimental design where study participants are randomly allocated to either the treatment or control group and stay in that group throughout the entirety of the study.

Parametric Statistics - A "family" of inferential statistical tests that are used with continuous outcomes and when statistical assumptions like normality and homogeneity of variance are met.

Pearson's r - A statistical test of magnitude and direction of association between two continuous variables.

Phi-coefficient - A statistical test of magnitude and direction of association between two categorical variables.

PICO - A popular mnemonic for writing research questions. The mnemonic stands for population, intervention, comparator, and outcome.

Point Biserial - A statistical test of magnitude and direction of association between a categorical variable and a continuous variable.

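Poisson regression - A method of regression used when predicting for count outcomes. Poisson regression assumes that the mean and the variance of the count outcome are approximately equal. When the variance is higher than the mean, negative binomial regression is used instead.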

Positive Predictive Value (PPV) - The proportion of people that test positive for a diagnostic test and have the disease state as rated by the "gold standard." As prevalence for a disease state increases, the PPV will increase.

Precision in measurement - Precision in measurement relates to the reliability, consistency, and confidence associated with a variable or outcome.

Predictive validity - A type of validity evidence where a survey instrument can predict for future outcomes.

Purposive sampling - A non-probability sampling method where researchers target specific groups in a population for sampling.

R

Random assignment - A method used in experimental designs where study participants that have been randomly selected from the population are randomly allocated to either the treatment group or the control group.

Random-effects ANOVA - A type of multivariate test where multiple within-subjects effects are tested across time.

Random selection - A method used in experimental designs where all members of a given population have an equal chance of being selected for participation in the study.

Randomized Controlled Trial (RCT) - An experimental research design where study participants are randomly selected from the population and randomly assigned to treatment groups. Analyses must be conducted in an "intention-to-treat" fashion.

Randomized research designs - Experimental designs where study participants are randomly assigned to treatment groups so that they are assumed to be comparable or possess equipoise at baseline before an intervention.

Rank Biserial - A statistical test of magnitude and direction of association between an ordinal variable and a categorical variable.

Ratio variable - A type of variable where a "true zero" exists and measures of magnitude and distance can be assessed.

Reliability - The consistency, stability, and precision of variables and measurement.

Repeated-measures t-test - A within-subjects statistical test where two observations of a continuous outcome are compared across time or within-subjects.

Repeated-measures ANOVA - A within-subjects statistical test where three or more observations of a continuous outcome are compared across time or within-subjects.

Residual analysis - A type of regression diagnostic that provides a measure of model fit. Residuals are the differences between the observed values and the predicted values yielded by a multivariate model.

Response set - A part of a survey item where participants give ratings based on their reactions or perceptions to the item.

Retrospective - A type of research design where the outcome of interest occurred in the past.

S

Sample size - The absolute number, or n, of individuals selected for participation in the study. Samples are derived from populations and inferential statistics allow us to make generalizations back to the population.

Scheffe's test - A post hoc test that is considered one of the most conservative in guarding against increased experimentwise error rates when adjusting for multiple comparisons.

Self-report - A survey methodology where respondents give their perceptions or beliefs related to a construct.

Sensitivity - The ability of a diagnostic test to detect disease states. When a diagnostic test has a sensitivity greater than 95%, the test can "rule out" disease states.

Simple random sampling - A probability sampling technique where every member of the population has an equal chance of being chosen for participation in a study.

Simultaneous regression - A multivariate model where all relevant predictor, demographic, control, and confounding variables are entered concurrently into the model to assess multivariate associations.

Skewness - A descriptive statistic that tests for normality of a continuous distribution.

Spearman's rho - A statistical test of magnitude and direction of association between two ordinal variables.

Specificity - The ability of a diagnostic test to identify healthy people. When a diagnostic test has a specificity greater than 95%, the test can "rule in" disease states.
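
The diagnostic testing measures defined in this dictionary (sensitivity, specificity, PPV, NPV, and diagnostic accuracy) all derive from the same 2x2 table of test results against the "gold standard." The sketch below shows one way to compute them; the function name and the cell counts are hypothetical.

```python
def diagnostic_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute diagnostic measures from the four cells of a 2x2 table.

    tp/fn: diseased people who test positive/negative.
    fp/tn: healthy people who test positive/negative.
    """
    return {
        "sensitivity": tp / (tp + fn),   # diseased people correctly detected
        "specificity": tn / (tn + fp),   # healthy people correctly identified
        "ppv": tp / (tp + fp),           # positive tests that are truly diseased
        "npv": tn / (tn + fn),           # negative tests that are truly healthy
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts: 100 diseased and 200 healthy people.
measures = diagnostic_measures(tp=90, fp=20, fn=10, tn=180)
```

In this illustration sensitivity and specificity are both 90%, yet PPV is lower (about 82%) because false positives from the larger healthy group dilute the positive results.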

Split-group - A method for validating statistical findings. The sample is randomly divided into two parts and psychometrics are run on the respective parts (the derivation sample and the confirmatory sample). The groups can be randomly split into percentages of 50/50, 60/40, or 70/30. If the findings on each sample are similar, then evidence of split-group validity is assumed.

Split-half reliability - An internal consistency measure of reliability where two independent halves of a survey instrument are significantly correlated.

Standard deviation - A descriptive statistic that gives a relative context for understanding how far individual observations are from the mean in a continuous distribution.

Stepwise regression - A multivariate model where an algorithm chooses the best combination of predictor, demographic, control, and confounding variables to predict for an outcome.

Stratified random sampling - A probability sampling technique where defined subgroups or strata of a population can be targeted for representation in a random sample.

Survey - A cross-sectional methodology where participants are asked to respond to questions regarding a construct of interest.

Survey modes of administration - The methods by which participants are given a survey instrument. There are five primary modes: One-on-one, group, telephone, postal mail, and electronic.

Survey parts - The structure of a survey as it is presented to participants. The six parts of a survey are the title, introduction, instructions, survey items, demographics, and closing statement.

Survey types - The type of survey used is dependent upon the research question being asked. There are six types of surveys: Test, rating scale, performance, checklist, psychological instrument, and inventory.

Systematic review - The highest level of applied clinical evidence. The results of multiple randomized controlled trials are aggregated according to rigorous inclusion and exclusion criteria and interpreted as a pooled effect. Meta-analysis is the statistical technique used with systematic reviews. The results of a meta-analysis are reported in a forest plot. Heterogeneity of the effect is reported with the I² statistic. A funnel plot is used to assess publication bias in the respective randomized controlled trials.

T

Test-retest reliability - A type of reliability evidence that assesses the consistency and stability of a survey instrument outcome across time.

Tukey's HSD test - A post hoc test that is popular in the social sciences when adjusting for multiple comparisons and increased experimentwise error rates. HSD stands for Honestly Significant Difference.

Two-sided hypothesis - A type of hypothesis that does not stipulate whether an effect will be positive (increasing) or negative (decreasing). These types of hypotheses are required in most scientific journals because they are more rigorous.

Type I error - An error that occurs in hypothesis testing when the null hypothesis is rejected when it should not be rejected (false positive). The alpha value or significance value chosen in a study is the chance that a researcher is willing to take for committing a Type I error and is most often set at .05.

Type II error - An error that occurs in hypothesis testing when the null hypothesis is not rejected when it should be rejected (false negative). The beta value represents the chance that a researcher is willing to take for committing a Type II error. Type II errors typically occur when there are not enough observations of an outcome.

Type III error - An error that occurs in hypothesis testing when the wrong question is answered by a statistical test. Type III errors often occur due to miscommunication between statisticians and researchers and also when mining data and not adjusting for multiple comparisons.

U

Unequal allocation randomization - A method of randomization where participants are randomly allocated to treatment groups in an uneven fashion. This method of randomization is used when the intervention itself is of particular interest. Ratios of allocation of 2:1, 3:1, and 4:1 to treatment groups are acceptable. Unequal allocation randomization is used to study more serious disease states that require stronger treatments. The side effects of promising treatments can also be studied using unequal allocation.

V

Validity - The utility, interpretability, and accuracy of outcomes and measurement.
