Statistics

This website is designed to provide a guide to the fundamentals of using statistics to describe and analyze your data. The Foundational Material below gives an overview of the essentials: topics that everyone doing any kind of statistical analysis should be familiar with. Click on the titles to expand the boxes and view the material. The topics are arranged in a logical order for learning, so the material in each section assumes that you are already familiar with the material in earlier sections. More specifically:

The Descriptive Statistics and Hypothesis Testing topics form a pair.

The Normal Distribution, Non-normal Distributions and Nonparametric Statistics topics form a trio, and assume that you have already read about Hypothesis Testing.

The Which Statistical Test topic assumes that you have read the previous three topics.

To learn about specific statistical tests, expand the Statistics link to the right and then expand the Types of Analysis link. There you will find a list of the most common types of statistical analysis for hypothesis testing, linked to individual pages for each type of analysis.

Throughout the website, Microsoft Excel files are used to provide sample data files for you to analyze. All analyses are performed in JMP.

INTRODUCTION

Descriptive statistics are used to summarize the features of a sample of data in a concise way. The best type of descriptive statistic to use depends on three things:

1. Whether the variable being described is categorical or numeric. Categorical variables are those in which the data are descriptions and can be assigned to a small number of discrete categories. Sometimes there are just two categories (male vs. female; dead vs. alive), sometimes several (British, French, or German; red, yellow, blue, or green). Numeric variables are those in which the data are numbers and are of two main types. Continuous numeric variables are those that (if measured precisely enough) can take an infinite number of values. In contrast, discrete numeric variables are those that can take a limited number of values (usually integers). Blood cholesterol concentration is an example of a continuous variable; red blood cell count is an example of a discrete variable. (A third type of numeric variable, not dealt with here, is an ordinal variable, where the numbers are ranks rather than measurements.)

2. If the data are numeric, and so can be plotted as a histogram (a frequency distribution), whether the distribution is symmetrical or asymmetrical about the mean.

3. Whether you want to describe a data sample or describe the degree of uncertainty in your estimates. What does “degree of uncertainty in your estimates” mean? A crucial thing to recognize about data samples is that they are exactly what they say – samples (i.e., a subset of all the data that you might have gathered if you had sufficient time, money, and other resources). In statistical terminology, the entire possible set of data that you might have gathered is called the population, and under most circumstances you are trying to obtain data samples that are unbiased representations of that population. In other words, you want to be able to make reasonable claims about a population from the sample you have collected. The sample itself is of no special interest because samples will differ from one another (often quite substantially if your sample sizes are small) by chance alone. Thus, when you calculate, say, the mean value of a sample, you are hoping that this will be an accurate estimate of the mean value of the entire population. Uncertainty refers to how accurate your estimate of the population mean is, given the sample you have collected.

DESCRIBING A DATA SAMPLE FOR CATEGORICAL DATA

Categorical data are best described using proportions or percentages. For example, if you had a sample of Drosophila fruit flies and were able to categorize individuals into three genotypes at a particular genetic locus, you would then be able to calculate the proportion of individuals of each genotype in the sample. Proportions allow you to compare samples of different sizes in a way that raw numbers would not, as the following example shows. The total number of flies in sample B is smaller, so it is not surprising that the number of individuals of each genotype is also smaller. The proportions, on the other hand, are almost identical. Note that if you include all data categories, the proportions sum to 1, a useful check that you have done your calculations correctly. Note also that, for precision, proportions are normally reported to at least two, and usually three, decimal places.
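If you want to check this kind of calculation outside JMP, the short Python sketch below shows one way to do it. The genotype labels and counts are hypothetical values chosen only for illustration (they are not the website's data), and `genotype_proportions` is a made-up helper name.

```python
from collections import Counter


def genotype_proportions(genotypes):
    """Return {genotype: proportion}, rounded to three decimal places."""
    counts = Counter(genotypes)
    total = sum(counts.values())
    return {g: round(c / total, 3) for g, c in counts.items()}


# Hypothetical samples: sample B is much smaller than sample A.
sample_a = ["AA"] * 50 + ["Aa"] * 100 + ["aa"] * 50   # 200 flies
sample_b = ["AA"] * 12 + ["Aa"] * 25 + ["aa"] * 13    # 50 flies

props_a = genotype_proportions(sample_a)
props_b = genotype_proportions(sample_b)

print("Sample A:", props_a)
print("Sample B:", props_b)
# Useful check: the proportions in each sample should sum to 1.
print("Sums:", sum(props_a.values()), sum(props_b.values()))
```

The raw counts in the two samples are very different, but the proportions are directly comparable.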

The most complete way to describe a sample of numeric data is to simply provide a list of all the individual data points – the raw data. However, raw data are usually of little interest to other researchers and would rarely, if ever, be published in a paper. A better approach is to put the data into categories (sometimes called “bins”) to create a frequency distribution, and then display that frequency distribution in the form of a histogram. Examples are shown below, with a symmetrical, bell shaped, normal distribution on the left and an asymmetrical, or “skewed” distribution on the right. When skewed distributions have most of the data on the left and a long tail to the right they are called “right skewed”; when they are the other way around they are called “left skewed”.
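The binning step described above can be sketched in a few lines of Python. This is only an illustration: the data values are hypothetical, `frequency_distribution` is a made-up helper, and a bin width of 1 is assumed so that the bin edges are whole numbers.

```python
import math
from collections import Counter


def frequency_distribution(values, bin_width):
    """Count the values in each bin [left_edge, left_edge + bin_width)."""
    counts = Counter(math.floor(v / bin_width) for v in values)
    return {k * bin_width: counts[k] for k in sorted(counts)}


# Hypothetical measurements, for illustration only.
data = [1.2, 1.9, 2.1, 2.4, 2.5, 2.8, 3.0, 3.3, 3.7, 4.6, 5.9, 8.2]
bin_width = 1
freq = frequency_distribution(data, bin_width)

# Crude text histogram: one '#' per data point (empty bins are skipped).
for left_edge, n in freq.items():
    print(f"{left_edge} to {left_edge + bin_width}: {'#' * n}")
```

Most of these values fall in the lower bins, with a few isolated values further to the right, so this small sample would be described as right skewed.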

Sometimes it is valuable to see the entire frequency distribution in a published paper, but usually the data are summarized in an even more concise way by providing information on the location of the distribution and spread of the data. The location is an indicator of where most of the data can be found; i.e., what a typical, individual data point is like. The spread is an indicator of how much variation there is among individual data points. The formal statistical term for spread is dispersion: you will see this term in the output of some statistical analyses in JMP.

The two most commonly used indicators of location are:

the arithmetic mean (the sum of all the values in your data sample divided by the sample size)

the median (the middle value when the data are ranked from smallest to largest).

The two most commonly used indicators of dispersion are:

The standard deviation. If you describe the location of a distribution using the mean, you would normally describe the dispersion using the standard deviation. The formula for calculating the standard deviation need not concern us here, but it is worth remembering that if a frequency distribution is normal (bell-shaped and symmetrical) about two-thirds of the data (68%) fall within one standard deviation either side of the mean and about 95% of the data fall within two standard deviations. A related measure of dispersion is the variance, which is simply the square of the standard deviation. Another related measure is the coefficient of variation (CV), which is the standard deviation divided by the mean, expressed as a percentage. The coefficient of variation is a useful measure of comparison when means differ greatly (because the standard deviation typically increases as the mean increases) or when the units of measurement of two variables are different.

The interquartile range. If you describe the location of a distribution using the median, you would normally describe the dispersion using the interquartile range. Quartiles are values that divide a data set into quarters. The first quartile (also called the 25th percentile) is the value below which the bottom 25% of the data fall, the second quartile (the 50th percentile) is the median, and the third quartile (the 75th percentile) is the value below which the bottom 75% of the data fall (or, alternatively, above which the top 25% fall). The interquartile range lies between the first quartile and the third quartile; i.e., it encompasses the middle half of the data. Related to quartiles (and with confusingly similar names) are percentiles and quantiles, which are a finer way of dividing up a frequency distribution. They are normally reported in 10% intervals (10%, 20%, 30%, etc.). The 10th percentile (the 0.1 quantile) indicates that 10% of the data are equal to or below that value, and so on.

If you can summarize a frequency distribution using the mean and standard deviation or using the median and interquartile range, which should you use? Typically, if the data have an approximately normal distribution (symmetrical and bell shaped, with little or no skew) the mean and standard deviation are reported, whereas if the data distribution is skewed, the median and interquartile range are reported. Note that if the distribution is symmetrical (whether bell shaped or not) the mean and median will have exactly the same (or almost exactly the same) value.
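The measures of location and dispersion described above can all be calculated with Python's standard `statistics` module, as in the sketch below. The data are hypothetical, and the quartiles use that module's default method; other programs (including JMP) may use a slightly different quantile convention and so give slightly different quartile values.

```python
import statistics

# Hypothetical right-skewed sample, for illustration only.
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 9, 20]

# Location
mean = statistics.mean(data)
median = statistics.median(data)

# Dispersion that goes with the mean
sd = statistics.stdev(data)            # sample standard deviation
variance = statistics.variance(data)   # the square of the standard deviation
cv = 100 * sd / mean                   # coefficient of variation, in percent

# Dispersion that goes with the median
q1, q2, q3 = statistics.quantiles(data, n=4)   # the three quartiles
iqr = q3 - q1

print(f"mean = {mean}, median = {median}")
print(f"SD = {sd:.2f}, variance = {variance:.2f}, CV = {cv:.1f}%")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

In this sample the single large value pulls the mean above the median, which is the pattern you would expect for right-skewed data, so the median and interquartile range would be the more appropriate summary.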

DESCRIBING THE UNCERTAINTY IN YOUR ESTIMATES

The most useful and most commonly reported measure of uncertainty is the uncertainty in your estimate of a population mean. If you have collected just one sample, your estimate of the population mean is simply your sample mean, so whenever you report a sample mean you should always report the degree of uncertainty. The two most commonly reported measures of uncertainty are:

The standard error (SE). More fully, this is called the standard error of the mean (SEM). It can be calculated as the sample standard deviation divided by the square root of the sample size.

The confidence interval (CI). The confidence interval is a range of values above and below the sample mean that is likely to contain the true population mean. Typically, the 95% confidence interval is reported, although you may also see the 90% or 99% interval. Confidence intervals are harder to calculate by hand than the standard error but are usually reported in the output of computer programs like JMP. An approximate rule of thumb is that the 95% confidence interval spans a range that is two standard errors above and below the sample mean. For example, if the sample mean is 1.50 and the standard error is 0.30, the 95% confidence interval is 0.90 to 2.10. We can then say that “we are 95% confident that the true population mean lies between 0.90 and 2.10”. Note that, although it sounds like a very similar statement, we should not say that “there is a 95% probability that the true population mean lies between 0.90 and 2.10”. In statistical terminology, “confidence” and “probability” are not the same thing.
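Both measures can be sketched in a few lines of Python. Here the standard error is taken as the sample standard deviation divided by the square root of the sample size, and the confidence interval uses the approximate "two standard errors" rule of thumb rather than the exact calculation a program like JMP would perform. The function names and the five-value sample are hypothetical.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the mean: sample SD / sqrt(sample size)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


def approx_ci95(mean, se):
    """Rule-of-thumb 95% confidence interval: mean +/- 2 standard errors."""
    return mean - 2 * se, mean + 2 * se


# The worked example from the text: mean 1.50, standard error 0.30.
low, high = approx_ci95(1.50, 0.30)
print(f"Approximate 95% CI: {low:.2f} to {high:.2f}")

# Hypothetical raw data, to show the standard error calculation itself.
sample = [1.1, 1.4, 1.5, 1.6, 1.9]
se = standard_error(sample)
print(f"mean = {statistics.mean(sample):.2f}, SE = {se:.3f}")
```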

First notice that in the Columns list on the left Photosynthetic Rate and Water Use Efficiency have blue triangles next to them, indicating that they are continuous numeric variables.

From the Analyze menu select Distribution and click or drag both variables into the Y, Columns box, then hit OK.

The new window that appears shows two histograms and a set of summary tables. Click on the red arrow next to Distributions and select Stack to turn the window horizontally.

To create the histograms, JMP groups the data into a relatively small number of categories or “bins” (normally about 10). To get a more detailed breakdown of the data you can manually change the number of bins by clicking on the red arrow next to each variable name and selecting Histogram Options – Set Bin Width. Change the bin width for Photosynthetic Rate to 1 and for Water Use Efficiency to 0.025. The window will now look like this:

First notice that Water Use Efficiency has a more-or-less symmetrical, bell shaped distribution that is approximately normal, whereas Photosynthetic Rate is right skewed.

Above each histogram is an Outlier Box Plot, which indicates the following:

The left and right edges of the horizontal rectangle indicate the interquartile range.

The vertical bar inside the rectangle indicates the median.

The horizontal lines to the left and right of the interquartile range are called whiskers. The ends of these lines, marked by vertical bars, indicate the range of non-outlier data, defined as 1.5x the interquartile range beyond each of the two quartile values (i.e., beyond the edges of the horizontal rectangle). For Photosynthetic Rate the left whisker is actually shorter than this because there are no data that far out.

The diamond inside the rectangle indicates the sample mean (at the horizontal center of the diamond, joining the vertical points) and the 95% confidence range (the horizontal points of the diamond). Notice that the mean and median coincide for the symmetrical distribution (Water Use Efficiency) but not for the skewed distribution (Photosynthetic Rate).

The horizontal red bracket indicates the shortest half of the data; i.e. the narrowest range of values that encompasses 50% of the individual data points. Again, notice that this is centered almost exactly on the mean for the symmetrical distribution but is to the left of the mean (and median) for the skewed distribution.

Any data points beyond the whiskers are considered to be outliers and are shown individually as black dots. Outliers are sometimes removed from the data set before further analysis, but only if there is a good reason to do so (for example, if you strongly suspect that a data value is an error, or if it is well beyond the edge of the rest of the distribution and so is extremely atypical). You should not remove outliers simply to make the data look neater or more symmetrical. You should also never remove outliers because you get a statistically significant result when they are removed but not when they are left in, unless you show the outliers in your data and report both statistical analyses to the reader.
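The whisker rule described above (1.5 times the interquartile range beyond each quartile) can be sketched in Python as follows. This illustrates the rule itself and is not JMP's own code: the data are hypothetical, `outlier_fences` is a made-up helper, and the quartile convention of Python's `statistics` module may differ slightly from JMP's.

```python
import statistics


def outlier_fences(data, k=1.5):
    """Return the limits beyond which points are flagged as outliers."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical right-skewed sample, for illustration only.
data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 7, 9, 20]
lower, upper = outlier_fences(data)

outliers = [x for x in data if x < lower or x > upper]
non_outliers = [x for x in data if lower <= x <= upper]

# Whiskers are drawn only as far as the most extreme non-outlier values,
# which is why a whisker can be shorter than 1.5 x IQR.
print(f"limits: {lower} to {upper}")
print(f"whiskers end at {min(non_outliers)} and {max(non_outliers)}")
print(f"outliers: {outliers}")
```

Flagging a point in this way only identifies it as unusual; whether it should be removed is a separate decision that needs the kind of justification described above.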

To the right of each histogram is a table of quantile values, including the median and the 25% and 75% quartiles. The latter three values are typically reported if the data distribution is strongly skewed.

To the right of each quantile table is a set of summary statistics appropriate for describing a data distribution that is approximately normal, along with the overall sample size (N). The most common statistics to report are either the sample mean and the standard error or the sample mean and 95% confidence interval.

INTRODUCTION

Statistical tests such as chi-squared tests, t-tests, regression, and analysis of variance were developed to enable scientists to test hypotheses about their data. They are called inferential statistics because they allow us to make inferences about entire populations from the data samples we have collected. This page describes the logic of statistical hypothesis testing, but also provides a brief guide to other, equally important, aspects of a statistical analysis. Intelligent interpretation of your data goes further than simply accepting or rejecting a hypothesis.

We will illustrate the principles of hypothesis testing using a fictitious clinical study investigating whether a new pharmaceutical has the ability to lower blood cholesterol concentration (BCC). In this study of 40 patients, 20 were randomly assigned to be given the pharmaceutical and the other 20 were given a placebo. (In a real clinical study sample sizes would hopefully be larger than this.) At the end of the study, BCC in mg/dL was measured in all 40 patients. The raw data and the mean value for each group are shown below.

As you can see, the BCC of the patients receiving the pharmaceutical was, on average, 5 mg/dL lower than that of patients receiving the placebo. Does this mean that the pharmaceutical worked?

The first, and crucial thing to recognize about these data is that they are samples from a much larger set of patients who might have been in this clinical trial. The important question is not whether these two samples differ from one another, but whether the pharmaceutical would lower BCC in the entire population of possible patients. In other words, we would like to be able to make accurate inferences about the overall effect of the pharmaceutical on the entire population from our knowledge of the samples we have studied. This is difficult, because samples will vary from one another by chance alone, and it’s possible that the small difference observed between the two samples above is due to chance, rather than to a real effect of the pharmaceutical. Statistical hypothesis testing involves trying to distinguish between these two possibilities.

In the example above we are trying to distinguish between two hypotheses:

The null hypothesis is that the pharmaceutical has no effect on BCC. More precisely, the mean BCC of the entire population of possible patients receiving the pharmaceutical is the same as the mean BCC of the entire population of possible patients receiving the placebo, and the observed difference between our two samples is simply due to chance.

The alternative hypothesis is that the pharmaceutical does have an effect on BCC, and that the difference in mean BCC in our two samples reflects a real difference in mean BCC of the two populations.

Statistical hypothesis testing involves testing the null hypothesis. In the example above, if our analysis indicates that the difference between the pharmaceutical and placebo is probably due to chance, we have failed to reject the null hypothesis. Can we accept the null hypothesis? Statisticians don’t like using the word “accept” because it sounds too certain. Nonetheless, “accept” is frequently used instead of “failed to reject” by many people, and it’s not too unreasonable to do so as long as you recognize that it is a casual, shorthand term.

In contrast, if the analysis indicates that the difference between the pharmaceutical and placebo is probably not due to chance, we can provisionally reject the null hypothesis and provisionally accept the alternative hypothesis that the pharmaceutical does affect BCC. “Provisionally” is an important qualifier here because future research (a different study with a larger sample of patients for example, or one that better controlled for confounding variables) might force us to reconsider our conclusions.

TYPE I AND TYPE II ERRORS, STATISTICAL POWER, AND SAMPLE SIZES

As the above explanation of null vs. alternative hypotheses indicates, we can never be 100% certain that the null hypothesis is correct and the alternative hypothesis is false (or vice versa): we can only assign probabilities to our conclusions, and there will always be some uncertainty about whether these conclusions are correct.

Our conclusions could be incorrect in one of two ways. In statistical terminology, these are called Type I and Type II errors:

A Type I error is the rejection of the null hypothesis when in fact it is true. If you reject the null hypothesis and accept the alternative hypothesis under these circumstances, your result is a false positive. In the example above, your data would suggest that the pharmaceutical actually did reduce BCC when in fact it did not.

A Type II error is the failure to reject the null hypothesis when in fact it really is false. Under these circumstances your result is a false negative. In the example above, your data would suggest that the pharmaceutical did not reduce BCC when in fact it did.

How do we minimize the chances of making one of these errors? Type I error rates are determined by the significance level, α, that we choose for our statistical test. Normally α is set at 0.05. With this α, if the null hypothesis were true, we would incorrectly reject it (commit a Type I error) 5% of the time (five times in 100 or 1 time in 20). Setting α at 0.05 is the reason why statistical tests are said to be significant if the p-value is less than 0.05. If p < 0.05 there is a relatively small probability (less than 1 in 20) of getting results at least as extreme as ours by chance alone, so we can be reasonably confident that the observed effect is real, rather than a false positive.

To lower the Type I error rate, we could simply reduce α to 0.01 (an error rate of 1 in 100) or 0.001 (an error rate of 1 in 1000). In some ways this is a good thing to do: in the above example it might avoid the possibility of treating patients with a pharmaceutical that didn’t work but that might have harmful side effects. Unfortunately, Type I and Type II error rates trade off against one another: reducing the Type I error rate automatically increases the Type II error rate. This might be a bad thing to do. In the above example it might result in the failure to authorize a pharmaceutical that had beneficial effects on blood cholesterol concentration. Thus, statistical hypothesis testing always involves trying to strike the right balance between the consequences of committing a Type I vs. a Type II error. By convention, statisticians have decided that the right balance is to set the Type I error rate to 0.05.

Statistical power refers to the ability of a test to avoid committing a Type II error. Tests that have a low probability of committing a Type II error (i.e., a high probability of correctly rejecting a false null hypothesis) are said to have high power. All other things being equal, power can be increased by increasing sample size. However, the increase in power with sample size is not linear: doubling your sample size from 20 to 40 is likely to substantially increase your power, but doubling it again from 40 to 80 is likely to have a much smaller benefit. It might also be possible to increase power by controlling for confounding variables or by reducing the amount of experimental error in your measurements. However, if the data you are collecting have lots of natural variability (i.e., variability unrelated to measurement error) your power may still be low.
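The relationship between α, power, and sample size can be illustrated with a simplified calculation. The Python sketch below is not a t-test power analysis: it assumes a two-sided test comparing two groups of equal size, normally distributed data, and a known standard deviation (a normal approximation). The effect size of 0.8 standard deviations and the function name are hypothetical choices made only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample_z(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect: true difference between the two population means
    sd: standard deviation within each population (assumed known)
    """
    z = NormalDist()                      # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)     # critical value for this alpha
    shift = effect / (sd * sqrt(2 / n_per_group))
    return z.cdf(-z_crit + shift) + z.cdf(-z_crit - shift)


# If the null hypothesis is true (effect = 0), the chance of rejecting it
# is simply alpha, the Type I error rate.
print(f"rejection rate under the null: {power_two_sample_z(0, 1, 20):.3f}")

# Power for a hypothetical effect of 0.8 SD at three sample sizes.
p20, p40, p80 = (power_two_sample_z(0.8, 1, n) for n in (20, 40, 80))
print(f"power: n=20 {p20:.2f}, n=40 {p40:.2f}, n=80 {p80:.2f}")

# Lowering alpha reduces power, i.e. raises the Type II error rate.
p20_strict = power_two_sample_z(0.8, 1, 20, alpha=0.01)
print(f"power at n=20 with alpha=0.01: {p20_strict:.2f}")
```

Under these assumptions the gain in power from 20 to 40 per group is much larger than the gain from 40 to 80, and reducing α from 0.05 to 0.01 lowers the power at a given sample size.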

ONE-SIDED VS. TWO-SIDED STATISTICAL TESTS

Computer programs normally provide three p-values for many statistical tests. For example, for a t-test, which compares sample means like the clinical study described above, JMP indicates these as Prob > |t|, Prob > t, and Prob < t.

Prob > |t| is for a two-tailed test, which assesses whether the second mean is either significantly greater than or significantly less than the first mean. Under most circumstances, you should report this p-value because you normally have no a priori reason to exclude one of these possibilities, so both must be considered. In the example above, the pharmaceutical might actually increase BCC rather than reducing it, and your statistical test needs to consider this possibility as well as the (obviously desirable) alternative.

Prob > t is for one of the two possible one-tailed tests, in this case assessing only whether the second mean is significantly greater than the first mean. You would rarely be justified in reporting this type of test, because you would rarely be able to argue that there was absolutely no possibility of getting the other outcome (the second mean being significantly smaller than the first).

Prob < t is for the other possible one-tailed test, in this case assessing only whether the second mean is significantly less than the first mean. Again, you would rarely be justified in reporting this type of test.

To summarize, you will almost never be in the position of knowing with certainty that only one of two possible outcomes of a statistical comparison between two values is possible. Therefore, you should almost always report the p-value associated with a two-tailed statistical test.
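The relationship between the three p-values can be sketched in Python. Because the standard library has no t-distribution, this sketch uses the standard normal distribution as a stand-in, which is a reasonable approximation only for large samples; the p-values from a real t-test would be somewhat larger. The test statistic of -1.8 is a hypothetical value, and `three_p_values` is a made-up helper.

```python
from statistics import NormalDist


def three_p_values(test_stat):
    """Return (two-tailed, upper-tail, lower-tail) p-values.

    Uses the standard normal distribution as a large-sample
    approximation to the t-distribution.
    """
    z = NormalDist()
    p_less = z.cdf(test_stat)        # analogous to Prob < t
    p_greater = 1 - p_less           # analogous to Prob > t
    p_two_tailed = 2 * min(p_less, p_greater)   # analogous to Prob > |t|
    return p_two_tailed, p_greater, p_less


p_two, p_greater, p_less = three_p_values(-1.8)
print(f"two-tailed: {p_two:.3f}")
print(f"one-tailed (greater): {p_greater:.3f}")
print(f"one-tailed (less): {p_less:.3f}")
```

For this hypothetical value the lower-tail p-value falls below 0.05 but the two-tailed p-value does not, which shows why choosing a one-tailed test after seeing the data would overstate the evidence.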

BEYOND P-VALUES: OTHER IMPORTANT INFORMATION IN A STATISTICAL TEST

The p-value of a statistical test is an indicator of how confident you can be that the patterns in your data samples are real (i.e., that they accurately reflect patterns in the populations they came from) and not just due to chance. However, other information about your data is just as important and should be studied along with the p-value:

The sample size. Sample size is important for four related reasons. First, the larger your sample, the more accurately it will reflect the population being sampled. Second, the uncertainty in your estimate of a population mean (or the slope of a regression line) decreases as sample size increases. (See Foundational Material: Descriptive Statistics for details.) Third, when sample sizes are small, statistical tests have low power, so you are more likely to commit a Type II error (i.e., fail to identify a real effect). Fourth, when sample sizes are large, you are better able to identify real, but small effects. In general, this is a good thing, but very small effects, even if statistically significant, might not be biologically meaningful. Thus, a second important piece of information is the effect size.

Effect size can refer to one of three things, depending on the type of analysis you have performed.

If you are studying categorical variables, it refers to a difference in proportions. For example, if one sample consisted of 10% red flowers and 90% white flowers and a second consisted of 50% red flowers and 50% white flowers, the effect size would be the difference in the proportion of red (or white) flowers between the two samples. In this instance, the effect size is large: the second sample contains a much larger proportion of red flowers than the first.

If you are studying numeric variables and comparing sample means, the effect size is the difference between two means. In clinical studies it is often expressed as a percentage difference between the control mean and the treatment mean. For example, in the fictitious study above, patients receiving the pharmaceutical had, on average, a BCC that was 5 mg/dL lower than patients receiving the placebo. Thus, in percentage terms the effect size was 100% x 5/204 = 2.5%. If the reduction had been 30 mg/dL the effect size would have been 14.7%; if the reduction had been 0.2 mg/dL the effect size would have been 0.1%. Studies with very large sample sizes might be able to detect an effect size of 0.1% and show it to be statistically significant. Whether it would be biologically or clinically meaningful is another matter: such a small effect might not be worth the financial cost of a prescription or the risk of harmful side effects.

If you are fitting a linear regression to your data, the effect size is the steepness of the slope (i.e., how much the Y variable changes per unit change in the X variable). All other things being equal, steeper slopes indicate larger effect sizes, but since the steepness of a slope depends on the units of measurement of the X and Y variables, care must be taken when comparing regression lines fitted to different types of data.
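The percentage effect sizes quoted above for the fictitious clinical study can be reproduced with a one-line calculation, sketched here in Python. The control (placebo) mean of 204 mg/dL is taken from the worked calculation in the text; the function name is hypothetical.

```python
def percent_effect(difference, control_mean):
    """Effect size as a percentage of the control mean."""
    return 100 * difference / control_mean


control_mean = 204  # mg/dL, placebo group mean from the example

for reduction in (5, 30, 0.2):
    effect = percent_effect(reduction, control_mean)
    print(f"reduction of {reduction} mg/dL -> effect size {effect:.1f}%")
```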

The degree of confidence in your estimates. As described above, sample means are estimates of population means, and the degree of confidence in these estimates is usually indicated by their standard errors or their 95% confidence intervals (see Foundational Material: Descriptive Statistics for details). Standard errors and confidence intervals are also usually provided with estimates of the slopes and intercepts of regression lines. Even if effect sizes are large, they may not be statistically significant if the degree of confidence in these estimates is low (i.e., if standard errors or confidence intervals are large). As a rule of thumb, the difference between two sample means will usually be statistically significant if their standard error bars do not overlap and will not be statistically significant if they do. The three figures below illustrate this in graphical terms. The left and center figures have the same effect size, but in the center figure the effect is not statistically significant, because the large (and overlapping) standard errors indicate a high degree of uncertainty in our estimates of the population means. The right figure has a much smaller effect size than the center figure, but the effect is statistically significant because the error bars are much smaller (indicating a low degree of uncertainty in our estimates of the population means) and don’t overlap.

The amount of explained variation. Some types of statistical analysis (including linear regression and analysis of variance) attempt to describe the data by fitting a mathematical equation to it. (In statistical terms this is called model fitting, which is why one of the main options in the Analyze menu of JMP is called Fit Model.) For example, in linear regression, the model fitted to the data is simply the equation for a straight line (Y = a + bX), where a is the intercept and b is the slope (the parameters of the equation). The best model (best straight-line equation) is the one that explains the largest amount of variation in the Y variable, as measured by how close the individual data points are to the fitted line. Among other things, the statistical output of a linear regression analysis provides an R² value, which is the proportion of the total variation in the Y variable explained by the model. For example, an R² value of 0.62 means that 62% of the variation in the Y variable has been explained by variation in the X variable. The R² value is useful because it tells you (a) how well you understand the causes of variation in your Y variable and (b) how accurately you can predict the value of a Y variable from knowing the value of the X variable: the larger the R² value, the greater the accuracy of this prediction. As with effect sizes, there is no necessary relationship between the R² value and the p-value: studies with very large sample sizes may have very small p-values, but if the data are highly variable, the R² value may still be very low. Similar considerations apply to the various forms of analysis of variance, although many ANOVA models have two or more X variables rather than just one.
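To make the idea of explained variation concrete, the Python sketch below fits a straight line Y = a + bX by ordinary least squares and then calculates the proportion of the variation in Y that the line explains. The data are hypothetical, and `fit_line` and `r_squared` are made-up helpers written out by hand to show the arithmetic rather than the output of any particular program.

```python
def fit_line(xs, ys):
    """Least-squares intercept (a) and slope (b) for Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def r_squared(xs, ys, a, b):
    """Proportion of the total variation in Y explained by the line."""
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_residual / ss_total


# Hypothetical data lying close to a straight line.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_line(xs, ys)
r2 = r_squared(xs, ys, a, b)
print(f"intercept a = {a:.2f}, slope b = {b:.2f}, R-squared = {r2:.3f}")
```

Because these points lie very close to the fitted line, almost all of the variation in Y is explained; more scattered data with the same slope would give a lower value.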

ADVANCED TOPIC: TEST STATISTICS, NULL DISTRIBUTIONS, AND P-VALUES

What exactly is it about a statistical analysis that allows us to conclude (a) that the difference between sample means is probably due to chance and that the populations as a whole do not differ in their mean values vs. (b) that the difference is probably not due to chance and that the populations do differ?

The answer is that a statistical analysis involves a three-step process:

Calculation of a test statistic from the data.

Comparing the size of that test statistic with how large you would expect it to be if the null hypothesis were true (i.e., comparing your calculated test statistic to its null distribution).

Assigning a probability (a p-value) to that test statistic. A p-value is the probability of obtaining a test statistic at least as large as the one you got if the null hypothesis is true.

A detailed explanation of this process is beyond the scope of this website: you will need to take a formal statistics class for that. However, a brief, intuitive explanation of each step may be helpful.

Calculation of the test statistic.

In the example above we can see that the difference between two sample means is 5 mg/dL. The most common statistical test that compares two sample means is the two-sample t-test, which is described in detail elsewhere on this website. A t-test calculates a test statistic called (perhaps not surprisingly) a t-value, which is a measure of the difference between the two samples that takes into account not just the mean values, but also the amount of variation within each of the samples and the sample size. Other statistical tests use different test statistics. For example, a contingency table analysis uses the chi-squared value as its test statistic, and analysis of variance uses something called the F-ratio.
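As a rough illustration of how a t-value combines the difference in means, the variation within each sample, and the sample size, the Python sketch below calculates the pooled-variance form of the two-sample t-value. This is one common variant and is an assumption here, not necessarily the exact calculation used elsewhere on this website or in JMP. The two five-patient samples are hypothetical and are not the study data.

```python
import math
import statistics


def two_sample_t(sample_1, sample_2):
    """Pooled-variance two-sample t-value for mean(sample_1) - mean(sample_2)."""
    n1, n2 = len(sample_1), len(sample_2)
    diff = statistics.mean(sample_1) - statistics.mean(sample_2)
    # Pool the within-sample variances, weighting by degrees of freedom.
    pooled_var = (
        (n1 - 1) * statistics.variance(sample_1)
        + (n2 - 1) * statistics.variance(sample_2)
    ) / (n1 + n2 - 2)
    # Standard error of the difference between the two means.
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return diff / se_diff


# Hypothetical BCC values (mg/dL), for illustration only.
placebo = [210, 198, 205, 215, 192]
drug = [203, 195, 194, 211, 192]

t_value = two_sample_t(placebo, drug)
print(f"difference in means = "
      f"{statistics.mean(placebo) - statistics.mean(drug)} mg/dL")
print(f"t-value = {t_value:.2f}")
```

The same 5 mg/dL difference would give a larger t-value if the variation within each group were smaller or the samples were larger.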

Comparing the test statistic to the null distribution

Suppose the null hypothesis is true; i.e., the mean BCC of the population taking the pharmaceutical is the same as the mean BCC of the population taking the placebo. As with all populations, there will be variation in individual BCC values from patient to patient in each of the two groups. Suppose that the distribution of BCC values in each population is approximately symmetrical and bell shaped (i.e., has a normal distribution) as shown below.

Now imagine, as a thought experiment, that you take a sample from the pharmaceutical population and a sample from the placebo population, calculate the t-value, and then repeat this sampling procedure many times, calculating the t-value each time. If there is no difference between sample means the t-value will be zero, and as the difference between sample means gets larger, so does the t-value. At the end of this experiment you will have a large number of t-values, whose distribution you can plot as a histogram representing a probability distribution. You will find that this t-distribution has a symmetrical bell shape centered on a mean of zero, approximately as shown below. It looks very similar to a normal distribution but in fact is subtly different for reasons that need not concern us here. (Note that this figure is just a cartoon: it should not be used to obtain p-values for an actual statistical test.)

What this histogram shows is that, if the null hypothesis is true, there is a high probability that a t-value calculated from two samples will have a value close to zero (because there is a high probability that the two sample means will be very similar). Conversely, there is a low probability that the t-value will be far from zero in either the positive or the negative direction (because there is a low probability that the two sample means will be very different).

Suppose that the t-value for your actual data was +0.73. A value of +0.73 lies close to the center of the distribution above and has a relatively high probability of occurring if the null hypothesis is true. Contrast that with a scenario where the t-value for your data was +4.95. A value of +4.95 lies close to the end of the right-hand tail of the distribution and has a low probability of occurring if the null hypothesis is true.
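The thought experiment described above can be run as a small simulation. The sketch below is not part of the JMP workflow; the population mean, standard deviation, sample size, and number of repeats are all assumed values chosen for illustration. It draws both samples from the same normal population (so the null hypothesis is true by construction) and asks how often the simulated t-values are at least as far from zero as 0.73 and 4.95.

```python
import random
from math import sqrt
from statistics import mean, variance

def t_value(a, b):
    """Pooled-variance t-value for two independent samples."""
    n_a, n_b = len(a), len(b)
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / n_a + 1 / n_b))

random.seed(1)                  # fixed seed so the run is repeatable
N_PER_GROUP = 20                # assumed sample size per group
N_REPEATS = 2000                # number of repeated "experiments"
POP_MEAN, POP_SD = 200.0, 20.0  # assumed population values (mg/dL)

# Null hypothesis true: both samples come from the same normal population.
null_t = []
for _ in range(N_REPEATS):
    drug = [random.gauss(POP_MEAN, POP_SD) for _ in range(N_PER_GROUP)]
    placebo = [random.gauss(POP_MEAN, POP_SD) for _ in range(N_PER_GROUP)]
    null_t.append(t_value(drug, placebo))

def fraction_at_least(threshold):
    """Share of simulated t-values at least this far from zero."""
    return sum(abs(t) >= threshold for t in null_t) / len(null_t)

print(f"mean of simulated t-values: {mean(null_t):.3f}")
print(f"fraction with |t| >= 0.73: {fraction_at_least(0.73):.3f}")
print(f"fraction with |t| >= 4.95: {fraction_at_least(4.95):.4f}")
```

Under these assumptions a large share of the simulated t-values lie at least 0.73 from zero, while values as extreme as 4.95 are rare, which is the pattern the histogram is meant to convey.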

Assigning a p-value to the test statistic

As you can see, we are now in a position to assign a probability (a p-value) to the t-value calculated from our two samples – a probability that indicates the chance of getting a t-value as large as or larger than the one we obtained if the null hypothesis is true. If this probability is large (i.e., if the t-value is towards the center of the t-distribution), our data are consistent with the null hypothesis. If this probability is small (i.e., if the t-value is in the tail of the t-distribution), our data are inconsistent with the null hypothesis. By convention (and purely by convention, which is one reason why we should not be obsessed with p-values to the exclusion of other information in our data), statisticians have decided that if p ≥ 0.05 we have failed to reject the null hypothesis whereas if p < 0.05 we can provisionally reject the null hypothesis and accept the alternative hypothesis.

INTRODUCTION

A detailed explanation of the properties of the normal distribution is beyond the scope of this web page. Rather, the goal here is to highlight several key features of the distribution that are important to understand when performing and interpreting a statistical analysis of your data.

Four basic properties of the normal distribution are:

It is a continuous probability distribution. In other words, it describes the probability of obtaining a particular value of a continuous numeric variable. Continuous numeric variables are those that (if measured precisely enough) can take an infinite number of values. In contrast, discrete numeric variables are those that can take a limited number of values (usually integers). Blood cholesterol concentration is an example of a continuous variable; red blood cell count is an example of a discrete variable.

It has a symmetrical bell shape. The fact that it is symmetrical indicates that the mean, the median, and the mode of the distribution all have the same value. This is not the case for non-symmetrical distributions. The bell shape indicates:

that there is a single peak (mode) rather than two or more. Distributions with a single peak are called unimodal, those with two peaks are called bimodal.

that the probability of obtaining a particular value from the distribution is highest in the center (i.e., at the mean) and declines towards the tails.

It can be precisely defined by two parameters: its mean and its standard deviation. Since the mean can take any real value and the standard deviation can take any positive value, the normal distribution is actually an infinite set of distributions, each one defined by its particular mean and standard deviation.

Approximately 68% (i.e., about two-thirds) of the area under a normal distribution lies within one standard deviation of the mean, and approximately 95% of the area lies within two standard deviations of the mean.
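The last two properties can be checked numerically. The sketch below uses Python's standard-library NormalDist class; the helper name and the two parameter sets are arbitrary choices made for illustration. It shows that the 68% and 95% figures hold whatever the mean and standard deviation happen to be.

```python
from statistics import NormalDist

def area_within(dist, k):
    """Area under `dist` within k standard deviations of its mean."""
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    return dist.cdf(upper) - dist.cdf(lower)

# Two members of the normal family; the parameter values are arbitrary.
standard = NormalDist(mu=0, sigma=1)
cholesterol = NormalDist(mu=200, sigma=25)

for dist in (standard, cholesterol):
    print(f"mean={dist.mean}, sd={dist.stdev}: "
          f"within 1 SD = {area_within(dist, 1):.3f}, "
          f"within 2 SD = {area_within(dist, 2):.3f}")
```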

RELEVANCE OF THE NORMAL DISTRIBUTION TO BIOLOGICAL DATA

For biologists, the normal distribution is important to know about for two main reasons:

Many numeric biological variables (both continuous and discrete) have a frequency distribution that is approximately normal. This means that they can be described and summarized using the same parameters (the mean and standard deviation) used to define a normal distribution.

Many of the most commonly used methods of statistical hypothesis testing assume that your dependent variables have a normal distribution: t-tests, linear regression, and analysis of variance all make this assumption. Correlation analysis makes the assumption that both variables have a normal distribution. Thus, it is important to be able to determine whether this assumption is true, and what to do if it is not.

DO YOUR DATA HAVE A NORMAL DISTRIBUTION?

There are three ways to determine whether your data have a normal distribution:

Create a histogram and inspect it visually to assess whether it is symmetrical and bell-shaped.

Create a normal quantile plot and inspect it visually to assess whether the data points fall on a straight line.

Perform a goodness-of-fit statistical test on the data.

All three methods can be performed easily in JMP. We will illustrate them using data on the photosynthetic physiology of oak seedlings planted as part of a habitat restoration project in the Twin Cities.

Before starting the analysis, change Plant from a continuous to a categorical variable. The numbers in the column identify individual oak seedlings and JMP interpreted these numbers to mean that Plant is a continuous numeric variable. However, in reality it is categorical: the numbers are simply an identification code and could just as easily have been letters. Double click on the variable name box at the top of the column and in the pop-up window change Data Type to Character (the Modeling Type will automatically change to Nominal), then hit OK. The symbol next to Plant in the Columns list on the left will change from a blue triangle to a red histogram. Although we will not be using Plant as a variable in our analysis it is always a good idea to properly code all variables in a JMP file.

From the Analyze menu select Distribution and click or drag Water Use Efficiency into the Y, Columns box.

Click or drag Ecotype into the By box. Ecotypes are genetically different populations of a species that are adapted to their local environment. In this study there were two samples of oak seedlings: one grown from acorns collected in Iowa (the Des Moines ecotype), the other from acorns collected in Minnesota (the Twin Cities ecotype). When assessing whether your data have a normal distribution it is always a good idea to test each data sample separately rather than combining them, because if each sample has a normal distribution but the sample means differ substantially, the combined distribution will be bimodal, not bell-shaped. The By option in JMP performs a separate analysis for each category of data in the chosen variable (in this case the Des Moines and Twin Cities ecotypes of the Ecotype variable).

Hit OK

A new window will appear, containing two histograms and two tables of summary statistics. Click on the red arrow next to Distributions Ecotype=Des Moines and select Stack to turn both of them horizontal.

Click on the two red arrows next to Water Use Efficiency and select Normal Quantile Plot

Click again on the same two red arrows and select Continuous Fit – Normal, then click on the red arrows next to Fitted Normal and select Goodness of Fit. The window will now look like this (I have slightly compressed the Normal Quantile plots vertically to make the window fit onto one computer screen):

The figures on the left show histograms for each data sample on the bottom, box plots in the middle (we will ignore these for now), and normal quantile plots on the top.

Visually, the two histograms appear bell shaped, with the Des Moines data looking slightly asymmetrical and the Twin Cities data looking very close to perfectly symmetrical.

Normal quantile plots show the actual data on the X-axis and the quantile the data are expected to be in (if they have a normal distribution) on the Y-axis. (See Describing and Summarizing Data under Foundational Material for more information about quantiles.) If the data have a normal distribution, the data points will fall on a straight line. Visually, the quantile plot for the Des Moines data is more or less straight at the bottom but curves away slightly at the top (reflecting the slightly skewed shape of the histogram). The plot for the Twin Cities data is very close to a straight line.
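For readers who want to see what lies behind a normal quantile plot, here is a rough Python sketch. It is not JMP's algorithm: the plotting-position formula is one common convention among several, the two samples are simulated stand-ins rather than the oak seedling data, and the correlation is used only as a crude summary of how straight the plotted points are.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def normal_quantile_points(data):
    """Pair each sorted data value with its expected standard-normal quantile."""
    n = len(data)
    ordered = sorted(data)
    # Plotting position (i - 0.5) / n is one common convention; others exist.
    expected = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(ordered, expected))

def straightness(points):
    """Pearson correlation of the plotted points; 1.0 is a perfect line."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

random.seed(2)
# Simulated stand-ins for real measurements (not the oak seedling data).
roughly_normal = [random.gauss(50, 10) for _ in range(200)]
right_skewed = [random.lognormvariate(0, 1) for _ in range(200)]

r_normal = straightness(normal_quantile_points(roughly_normal))
r_skewed = straightness(normal_quantile_points(right_skewed))
print(f"normal sample: r = {r_normal:.3f}; skewed sample: r = {r_skewed:.3f}")
```

Points from the normal sample fall close to a straight line, whereas the skewed sample curves away from it. A visual check of the plot is still the main tool; the correlation is only a convenience.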

The Quantiles and Summary Statistics tables in the middle of the window provide basic descriptive statistics for each data set, including the 95% confidence range for the mean in the second of these tables.

Under the Fitted Normal header the Parameter Estimates table provides estimates and 95% confidence ranges for the mean (μ) and standard deviation (σ) of the data. The Goodness-of-fit Test table provides a statistical test (the Shapiro-Wilk W test) of the goodness-of-fit of the data to a normal distribution. The p-values (Prob<W) for both data samples are > 0.05, indicating that we cannot reject the null hypothesis that the data have a normal distribution; i.e., the data do not differ significantly from a normal distribution.

Together the three approaches indicate that the variable Water Use Efficiency does have a normal distribution. However, if we repeat the analysis using the variable Photosynthetic Rate we will obtain the following results.

In contrast to Water Use Efficiency, for Photosynthetic Rate we see the following:

The two histograms are approximately bell-shaped (unimodal) but are not symmetrical: most of the data are to the left and there is a long tail to the right. The data are said to be skewed, specifically skewed right.

In the normal quantile plots the data do not fall on a straight line; they are strongly curved.

The two Shapiro-Wilk W tests are both significant (p < 0.05), which means we should reject the null hypothesis that the data have a normal distribution and accept the alternative hypothesis that the distributions are not normal.

What should we do under these circumstances? Three alternative options are described under Foundational Material: Non-normal Distributions and Data Transformations.

This page follows on from the Foundational Material: The Normal Distribution page. It is recommended that you read that page before proceeding further.

If your dependent variables do not have a normal distribution, this violates one of the main assumptions of many of the most common methods of statistical hypothesis testing, including t-tests, correlation, regression, and analysis of variance. What should you do under such circumstances? There are three options, which are listed below and then discussed in more detail one at a time.

Proceed with the test.

Transform the data.

Perform a nonparametric test.

1. Proceed with the test

This may seem like a strange option if the data do not meet one of the test’s basic assumptions. There are two (related) justifications for doing so. First, the Shapiro-Wilk test is quite sensitive to deviations from normality, especially when sample sizes are large. In other words, data with a distribution that is approximately normal often show a significant difference from normality when this test is used. Second, t-tests, correlation, regression, and analysis of variance are all relatively robust to moderate deviations from normality, especially when sample sizes are large. In other words, the output from these tests is generally trustworthy even if the data do not have a normal distribution, as long as:

the deviations from normality are not too extreme

for tests comparing means (such as t-tests and ANOVA) the deviations from normality are similar in the groups being compared.

This does not mean that it is always safe to ignore deviations from normality. If deviations are large (e.g., extreme left or right skews or strongly bimodal distributions) or if one group of data is left skewed and the other is right skewed, statistical tests that assume normality are likely to give unreliable results.

2. Transform the data

Transforming the data involves creating a new variable from the original variable using a mathematical equation. If the new variable has a normal distribution, or a better approximation to a normal distribution than the original variable, you can perform the desired statistical test on the new (transformed) variable. The type of transformation used depends on the way in which the original variable deviated from a normal distribution. The main types of deviation are illustrated graphically below.

Bimodal distributions

Bimodal distributions have two peaks. It is not possible to transform a bimodal distribution into a normal distribution with a simple mathematical equation, so if your data have this shape the best option is to perform a nonparametric test. However, if the bimodality is weak (i.e., if the valley between the two peaks is shallow) and your sample size is large, it may be reasonable to proceed with the test that does assume normality.

Kurtosis

Kurtosis refers to the degree of curvature of a distribution. Leptokurtic distributions are more sharply curved than normal, with longer, thinner tails and a steeper peak. Platykurtic distributions are less sharply curved than normal with thicker tails and shallower peaks. As with bimodal distributions, it is not possible to transform a leptokurtic or platykurtic distribution into a normal distribution with a simple mathematical equation. However, statistical tests that assume normality are relatively robust to moderate levels of kurtosis, especially when sample sizes are large, so it is usually reasonable to proceed with such tests. On the other hand, if the degree of kurtosis is strong, it may be better to perform a nonparametric test.

Skew

Skewed distributions are asymmetrical, so that the mean, median, and mode are typically all different from one another. Left skewed distributions have more of the data on the right-hand side and a long tail to the left. Right skewed distributions have the opposite shape.

It is always a good idea to try to transform skewed distributions so that the transformed variables are normal, or approximately normal. Fortunately, there are several easy options for doing so.

For right skewed data, the goal of the transformation is to bring in the values on the right-hand tail to make the distribution more symmetrical. This can often be done using a log transform

Y’ = log(Y)

or a square root transform

Y’ = square root (Y)

where Y is the original variable and Y’ is the transformed variable. All data points in the sample are transformed, not just the ones on the tail. For the log transform either the natural logarithm (ln) or the base 10 logarithm can be used. The transformation can either be done in Excel prior to importing the data into JMP or in JMP itself. In Excel, simply create a new column of data using the formula =log(A1) (which gives the base 10 logarithm; use =ln(A1) for the natural logarithm) or =sqrt(A1), where A1 refers to the cell containing the original data point. In JMP a transformed variable can be created as follows:

In the Cols menu select New Columns…

In the pop-up window give the new column a name in the Column Name box, select Numeric from the Data Type menu, Continuous from the Modeling Type menu, choose where you want the new column to appear in the menu below the Number of columns to add box, then select Formula from the Column Properties menu.

In the second pop-up window, click on the variable you want to transform from the Columns box, then create the mathematical transformation you want using the symbols at the top and the list of functions on the left. Log functions are listed under Transcendental.

Hit OK in the second window and when it disappears hit OK in the first window. The transformed variable should now be visible in your JMP data file.

Which is better, a log transform or a square-root transform? It depends entirely on your data set. It is perfectly legitimate to try both and then examine the two transformed variables to see which one best approximates a normal distribution. However, it is not legitimate to perform your statistical test on both transformed variables and then choose the transformation that gives you a significant result. Your choice of transformation should be based solely on which gives the better fit to a normal distribution prior to carrying out any statistical tests.
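As an illustration of comparing the two transforms before running any test, here is a short Python sketch. It is an assumption-laden stand-in for inspecting histograms and quantile plots in JMP: the data are simulated, and moment-based skewness is used as one simple proxy for how closely a variable approximates a symmetrical distribution.

```python
import random
from math import log10, sqrt
from statistics import mean, pstdev

def skewness(values):
    """Moment-based sample skewness: 0 for symmetry, > 0 for a right skew."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

random.seed(3)
# Simulated right-skewed variable (all values positive), for illustration only.
y = [random.lognormvariate(1.0, 0.8) for _ in range(500)]

candidates = {
    "untransformed": y,
    "log10(Y)": [log10(v) for v in y],
    "sqrt(Y)": [sqrt(v) for v in y],
}
skews = {name: skewness(vals) for name, vals in candidates.items()}
for name, value in skews.items():
    print(f"{name:>14}: skewness = {value:+.2f}")

# Choose by closeness to symmetry, before running any hypothesis test.
best = min(("log10(Y)", "sqrt(Y)"), key=lambda name: abs(skews[name]))
print("closest to symmetric:", best)
```

Note that both transforms require suitable data: the log transform needs strictly positive values and the square root transform needs non-negative values.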

For left skewed data, the goal of the transformation is to bring in the values on the left-hand tail to make the distribution more symmetrical. Three options can be tried:

the square transform: Y’ = Y^2

the antilog transform: Y’ = e^Y or Y’ = 10^Y

flipping the left skew to a right skew and then trying a log or square root transform

For the last of these options, flipping the skew can be accomplished by choosing a constant slightly larger than the largest value in your sample and subtracting all sample values from this constant. For example, if your data ranged from 0.5 to 14.5 you might subtract every value from a constant of 15.
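The flipping step can be sketched in a few lines of Python. The sample values below are hypothetical, chosen to span the 0.5 to 14.5 range used in the example, and the function name is invented for illustration.

```python
from math import log10

def reflect(values, constant):
    """Flip a left skew into a right skew by subtracting each value from a constant."""
    if constant <= max(values):
        raise ValueError("constant must exceed the largest value so results stay positive")
    return [constant - v for v in values]

# Hypothetical left-skewed sample spanning 0.5 to 14.5.
y = [14.5, 14.0, 13.0, 12.5, 11.0, 9.0, 5.0, 0.5]

flipped = reflect(y, constant=15)            # now right skewed, all values > 0
transformed = [log10(v) for v in flipped]    # log transform of the flipped data

print("flipped:", flipped)
print("log10 of flipped:", [round(v, 3) for v in transformed])
```

Bear in mind that flipping reverses the order of the data: the largest original values become the smallest transformed values, so the direction of any difference or relationship is reversed when you interpret the results.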

3. Perform a nonparametric test

Nonparametric tests are so called because they can be performed on data whose distributions are not defined by a known set of parameters. (In contrast, the normal distribution is precisely defined by an equation that incorporates two parameters – the mean and standard deviation.) However, this does not mean that they are reliable whatever your data look like: nonparametric tests do make some assumptions about data distributions, and if your data violate these assumptions nonparametric tests can be just as unreliable as parametric tests. The most commonly used nonparametric tests are described on the page titled Foundational Material: Nonparametric Statistics.

INTRODUCTION

This page follows on from the Topic: Dealing With Non-Normal Distributions page. It is recommended that you read that page before proceeding further.

If your data deviate substantially from a normal distribution, and if they cannot be transformed to approximate normality, you may want to perform a nonparametric test. However, it is important to recognize that such tests also make assumptions about your data and, just as with parametric tests, violation of these assumptions may make the results of your analyses unreliable.

Nonparametric tests are usually based on ranks, rather than your actual data values. The data are ordered from the smallest value to the largest and then given rank values of 1, 2, 3, etc. This approach is particularly useful when you have skewed data or outliers that need to be retained in the analysis, because extreme values on a continuous scale simply become the next ranked value up on a ranked scale. The logic behind nonparametric tests is that if two samples of data overlap substantially, the rank values of the two samples will be about the same. In contrast, if the two samples overlap very little the ranks of one sample will be much lower than the ranks of the other. An example of the former is shown below in the table on the left; an example of the latter is shown on the right.
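The ranking logic can be sketched in Python as follows. The samples are hypothetical and the helper names are invented for illustration; the value 95 is included as an outlier to show that it simply takes the top rank.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    ordered = sorted(values)
    # First 0-based position of v, plus (number of copies + 1) / 2,
    # gives the average 1-based rank of all copies of v.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def rank_sums(sample_a, sample_b):
    """Rank the pooled data, then total the ranks belonging to each sample."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    return sum(ranks[:len(sample_a)]), sum(ranks[len(sample_a):])

# Hypothetical samples; 95 is an outlier that simply becomes rank 10.
overlapping = rank_sums([1, 4, 5, 8, 9], [2, 3, 6, 7, 95])
separated = rank_sums([1, 2, 3, 4, 5], [6, 7, 8, 9, 95])

print("overlapping samples:", overlapping)
print("separated samples:  ", separated)
```

When the samples overlap, the two rank sums are nearly equal; when they barely overlap, one rank sum is much smaller than the other.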

A NONPARAMETRIC ALTERNATIVE TO THE ONE-SAMPLE T-TEST

The Wilcoxon signed-rank test is an alternative to the one-sample t-test: it assesses whether the median value of a data sample is significantly different from a specified value. It assumes that the data have a symmetrical distribution, so just like a t-test it may be unreliable if your data are skewed. An alternative test, the Sign test, that does not assume a symmetrical distribution is available in some statistical packages, but not JMP. Unfortunately, because of its calculation method, the Sign test has very little statistical power compared to a one-sample t-test; i.e., it frequently fails to reject the null hypothesis when in fact that hypothesis is false.

The following example assumes you are already familiar with the one-sample t-test. If you have not already read the page on this topic it is recommended that you do so before proceeding further. We will download and use the same data file as was used to perform a one-sample t-test: Walleye energy.xlsx

The analysis can be performed in JMP as follows:

From the Analyze menu, select Distribution, and in the window that appears drag or click the Proportion open water energy variable into the Y, Columns box, then hit OK.

A new window showing the data in histogram form will appear. If you want, you can turn this histogram sideways by clicking on the red arrow beside Distributions and selecting Stack.

As well as the histogram, the window also provides summary statistics on the distribution of the data in the Quantiles and Summary Statistics tables.

Click on the red arrow next to Proportion open-water energy and select Test Mean. A new window will appear where you can enter your expected mean value in the Specify Hypothesized Mean box. In this case the expectation is 0.5 (half the diet is expected to come from open-water feeding). This expectation constitutes the null hypothesis for your statistical test. Also click on the Wilcoxon Signed Rank box.

Below the Test Mean header and to the right of the t-test information are the test statistic and three p-values for the Signed-Rank test. As with the one-sample t-test, we would normally focus on the first of these p-values (Prob > |t|), which is a two-tailed test and assesses whether the observed median is either significantly greater than or significantly less than the expected value of 0.5. In this case, just as with the one-sample t-test, the p-value for the nonparametric test is < 0.05, allowing us to reject the null hypothesis of no difference and accept the alternative hypothesis that the true median proportion for the population is different from 0.5. The test statistic for a Wilcoxon signed-rank test has the symbol S (equivalent to the t-value for a t-test). In a paper, you would report the results of this test as follows: “The proportion of the diet obtained from open water feeding was significantly different from 0.5 (Wilcoxon signed-rank test: S = -52.50, p = 0.0001).”
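To show the arithmetic behind a signed-rank statistic, here is a minimal Python sketch using hypothetical proportions (not the walleye data). It computes the classic sums of positive and negative ranks. The centred form (W+ − W−)/2 is shown as one way a statistic can come out negative, as S does above; treating it as equivalent to the S that JMP reports is an assumption, not something taken from JMP's documentation.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def signed_rank_sums(data, hypothesized):
    """Return (sum of ranks of positive differences, sum for negative ones)."""
    # Differences of exactly zero carry no sign information and are dropped.
    diffs = [x - hypothesized for x in data if x != hypothesized]
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus

# Hypothetical proportions, invented for illustration only.
proportions = [0.62, 0.71, 0.45, 0.80, 0.66, 0.58, 0.75, 0.40]

w_plus, w_minus = signed_rank_sums(proportions, hypothesized=0.5)
centred = (w_plus - w_minus) / 2  # assumed centring; zero when the ranks balance
print(f"W+ = {w_plus}, W- = {w_minus}, centred statistic = {centred}")
```

If the hypothesized value really were the median of a symmetrical distribution, the positive and negative rank sums would be about equal and the centred statistic would be close to zero.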

A NONPARAMETRIC ALTERNATIVE TO THE TWO-SAMPLE T-TEST

The Wilcoxon rank-sum test is an alternative to the two-sample t-test; it gives equivalent results to another alternative - the Mann-Whitney U-test - which is not available in JMP. It assumes that the distributions of the two samples have the same shape, so results may be unreliable if, for example, one sample has a right skew and the other has a left skew.

The following example assumes you are already familiar with the two-sample t-test. If you have not already read the page on this topic it is recommended that you do so before proceeding further. We will use the same data file as was used to perform a two-sample t-test: Fish Creek Seedlings 2015.xlsx

The analysis can be performed in JMP as follows:

Import the data file into JMP and check that the Ecotype variable is categorical (indicated by a red histogram) and the Max Height variable is continuous (indicated by a blue triangle) in the column list on the left.

From the Analyze menu select Fit Y by X and click or drag Ecotype into the X, Factor box and Max Height into the Y, Response box.

Hit OK. A new window will appear with all the data points in two vertical columns, one for each ecotype.

Click on the red arrow next to Oneway Analysis… and select Display Options – Histograms to show the distributions of the two samples.

Click on the same red arrow and select Nonparametric – Wilcoxon Test. The window will now look like this:

The important pieces of information are S and its associated p-value (Prob>|Z| < 0.0001) in the 2-Sample Test table. S is the test statistic for the Wilcoxon test (equivalent to the t-value in a t-test); Z is an alternative test-statistic that can be reported instead of S if sample sizes (n) are moderate or large. It is a transformation of S that has an approximately normal distribution if n > 10 for each of the two samples.

In a paper you would report the results of this test as follows: “Local and southern seedlings differed significantly in maximum height (Wilcoxon rank-sum test: S = 19349, p < 0.0001).”
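The relationship between the rank sum, the Mann-Whitney U statistic, and Z can be sketched in Python as follows. The heights are hypothetical (not the Fish Creek data), and the Z calculation here omits tie and continuity corrections, so it should not be expected to match JMP's output exactly.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def rank_sum_test(sample_a, sample_b):
    """Rank sum W for sample_a, its Mann-Whitney U, and a normal-approximation Z."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    w_a = sum(ranks[:n_a])
    u_a = w_a - n_a * (n_a + 1) / 2          # U is W shifted by a constant
    expected_w = n_a * (n_a + n_b + 1) / 2   # mean of W under the null hypothesis
    sd_w = sqrt(n_a * n_b * (n_a + n_b + 1) / 12)  # no tie or continuity correction
    return w_a, u_a, (w_a - expected_w) / sd_w

# Hypothetical maximum heights (cm); samples this small are too small for Z
# to be trusted and are used only to keep the arithmetic easy to follow.
local = [12.1, 15.3, 14.8, 16.0, 13.5]
southern = [18.2, 17.5, 19.9, 15.9, 20.3]

w_local, u_local, z_local = rank_sum_test(local, southern)
u_southern = rank_sum_test(southern, local)[1]
print(f"W = {w_local}, U = {u_local}, Z = {z_local:.3f}")
print(f"U (local) + U (southern) = {u_local + u_southern}")
```

Because U differs from the rank sum only by a constant that depends on sample size, the two tests always lead to the same conclusion.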

A NONPARAMETRIC ALTERNATIVE TO THE PAIRED T-TEST

The Wilcoxon Signed-Rank test described above is an alternative to the paired t-test. It assumes that the data have a symmetrical distribution, so just like a t-test it may be unreliable if your data are skewed.

The following example assumes you are already familiar with the paired t-test. If you have not already read the page on this topic it is recommended that you do so before proceeding further. We will use the same data file as was used to perform a paired t-test: Fish Creek roots 2014.xlsx

The analysis can be performed in JMP as follows:

Import the data into JMP and check that both columns of root biomass data have been coded as continuous variables, as indicated by the blue triangles in the column list to the left.

From the Analyze menu, select Specialized Modeling – Matched Pairs. In the window that appears, click or drag both Root Biomass variables into the Y, Paired Response box, then hit OK.

In the new window that appears, click on the red arrow beside Matched Pairs and select Wilcoxon Signed Rank. The window will now look like this:

The important pieces of information are the test statistic S and its associated p-value for a two-tailed test (Prob>|S| < 0.0001) in the Wilcoxon Signed Rank table. S is equivalent to the t-value in a t-test. In a paper you would report the results of this test as follows: “Root biomass in the soil samples was significantly greater at 0-7 cm depth than at 7-14 cm depth (Wilcoxon signed-rank test: S = -370.5, p < 0.0001).”

A NONPARAMETRIC ALTERNATIVE TO ONE-FACTOR ANOVA

The Kruskal-Wallis test is a nonparametric alternative to one-factor ANOVA. It assumes that the distributions of all groups have the same shape, so results may be unreliable if, for example, one group has a right skew and another has a left skew. It also has little statistical power when sample sizes are small; i.e., it frequently fails to reject the null hypothesis when in fact that hypothesis is false.

The following example assumes you are already familiar with one-factor ANOVA. If you have not already read the page on this topic it is recommended that you do so before proceeding further. We will use the same data file as was used to perform a one-factor ANOVA: SW Oaks Photosynthesis 2017.xlsx

The analysis can be performed in JMP as follows:

Import the data file into JMP and before starting the analysis recode both the Transect and Plant # variables. On the Excel spreadsheet both were entered as numbers, so JMP interpreted them as continuous variables. However, in reality both are categorical: the numbers are just an identification code and could just as easily have been letters. For each variable in turn, double click on the variable name box at the top of the column and in the pop-up window change Data Type to Character (the Modeling Type will automatically change to Nominal), then hit OK. The symbol next to the variable in the column list should then change to a red histogram.

From the Analyze menu select Fit Y by X and click or drag Transect into the X, Factor box and Photosynthetic Rate into the Y, Response box.

Hit OK. A new window will appear with all the data points in four vertical columns, one for each transect.

Click on the red arrow next to Oneway Analysis… and select Nonparametric – Wilcoxon Test and also Nonparametric – Nonparametric Multiple Comparisons – Steel-Dwass All Pairs.

The Kruskal-Wallis test calculates a test statistic H (roughly equivalent to F in a one-factor ANOVA), which has a distribution very similar to the chi-squared distribution with k - 1 degrees of freedom, where k is the number of groups (in this case k = 4, so df = 3). Thus, you should make note of the ChiSquare value (18.79) and the associated p-value (Prob>ChiSq = 0.0003) in the 1-Way Test, ChiSquare Approximation table.

In this case the p-value is less than the conventional threshold of 0.05, so we can reject the null hypothesis and accept the alternative hypothesis that, overall, there is significant variation among the four populations. However, the Kruskal-Wallis test alone does not identify which individual populations are responsible for this overall effect.

The Nonparametric Comparisons… table shows the results of all pairwise comparisons between groups using the Steel-Dwass method. This is equivalent to the Tukey-Kramer HSD test used in one-factor ANOVA; i.e., it compares all possible pairs of groups while controlling for the increased probability of obtaining false positive results when multiple comparisons are performed. The paired comparisons are listed in order of increasingly small p-values, with non-significant differences at the top and significant differences at the bottom. Only the differences between groups 4 and 1 (p = 0.0042) and groups 4 and 2 (p = 0.0006) are statistically significant.
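For readers curious about how the H statistic described above is calculated, here is a minimal Python sketch. The three groups of values are hypothetical (not the SW Oaks data), no tie correction is applied, and the p-value uses a closed form that is valid only when there are exactly three groups (df = 2). With groups this small the chi-squared approximation is rough, so treat the p-value as illustrative.

```python
from math import exp

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def kruskal_wallis_h(groups):
    """H statistic (no tie correction) and its degrees of freedom, k - 1."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])  # ranks belonging to this group
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)
    return h, len(groups) - 1

# Hypothetical photosynthetic rates for three groups, invented for illustration.
group_a = [6.1, 7.4, 5.8, 6.9]
group_b = [8.2, 7.9, 9.1, 7.0]
group_c = [10.3, 9.8, 11.2, 8.8]

h, df = kruskal_wallis_h([group_a, group_b, group_c])
# With df = 2 the chi-squared upper-tail probability has the closed form exp(-H / 2).
p_approx = exp(-h / 2)
print(f"H = {h:.3f}, df = {df}, approximate p = {p_approx:.4f}")
```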

A NONPARAMETRIC ALTERNATIVE TO CORRELATION

Pearson’s product-moment correlation coefficient (Pearson’s r) assumes that both variables have a normal distribution. A nonparametric alternative is Spearman’s rank correlation (Spearman’s ρ). Use of this test is described on the Correlation and Linear Regression page.
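As a brief illustration of the difference between the two coefficients, the Python sketch below computes Spearman’s ρ as Pearson’s r applied to the ranks of the data. The data are hypothetical: Y increases with X at every step, but along a strongly curved path.

```python
from math import exp, sqrt
from statistics import mean

def average_ranks(values):
    """Rank values from 1 (smallest); tied values share their average rank."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r calculated on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical data: perfectly monotonic but strongly curved.
x = [1, 2, 3, 4, 5, 6]
y = [exp(v) for v in x]

r = pearson_r(x, y)
rho = spearman_rho(x, y)
print(f"Pearson's r = {r:.3f}, Spearman's rho = {rho:.3f}")
```

Because ranks ignore the size of the gaps between values, ρ reaches 1 for any relationship that always increases, whereas r is reduced by the curvature.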

INTRODUCTION

“To call in the statistician after the experiment is done may be no more than asking him to perform a postmortem examination: he may be able to say what the experiment died of.”

Sir Ronald Fisher

Several things determine which statistical test is best for a particular data set. A guide to identifying the appropriate test for your data is provided below. However, a crucial thing to recognize is that the appropriate test depends on your experimental design (or sampling design if you are doing an observational study). Thus, it is strongly recommended that you think about how to analyze your data, and which statistical tests you will use, before the data are collected. There are two reasons for this. First, if you don’t, you risk collecting data that are very difficult to analyze properly (putting the results of your whole study at risk). Second, a good study design will often give you more statistical power (the ability to reject a false null hypothesis), with less time, effort, money - and experimental organisms if you are using them - than a poor one.

KEY QUESTIONS ABOUT YOUR DATA

Four key issues determine which statistical test is appropriate for your data:

1. How many variables do you have?

Are you studying a single variable and comparing the data you have collected with the data you expect to observe based on some prior knowledge (either empirical or theoretical)? Alternatively, are you studying the relationship between two or more variables?

2. Are your variables categorical or numeric?

Categorical variables are ones that group data into a small number of discrete categories. Sometimes there are just two categories (male vs. female; dead vs. alive), sometimes several (British, French, or German; red, yellow, blue, or green). Numeric variables are ones in which the data points are numbers. They can be of two kinds: discrete or continuous. Discrete numeric variables are ones that can take only a certain set of values - for example, the number of leaves on a tree or the number of bacterial colonies on a petri dish. Both these variables can take only integer values, although the number of possible values is very large. Continuous numeric variables are ones that, in principle, can take an infinite number of values if measured precisely enough - for example, body mass, height, nitrogen concentration in a water sample, or cholesterol level in the bloodstream. Despite this distinction, discrete and continuous numeric variables are usually analyzed statistically in the same way.

3. Do your data meet the assumptions of the test?

The most important assumption of many of the most commonly used statistical tests (t-tests, correlation, regression, analysis of variance) is that your data have a normal distribution. A second, often overlooked, assumption is that the amount of variation in your data samples (usually measured by calculating the variance) is approximately equal among samples. Some tests, while not assuming a normal distribution, do assume that the distribution is symmetrical about the mean.

4. Are your data paired?

Paired data are data in which there is a logical one-to-one correspondence between two sets of measurements. If you measured the body length and the body mass of a sample of 40 individuals, the data would be paired: each body length measurement would logically correspond to the body mass measurement of the same individual. Other examples of paired data include: before and after measurements on the same individual (such as when studying the effects of a pharmaceutical or a lifestyle change on some outcome such as blood cholesterol concentration); control vs. treatment effects when tissue samples from each study organism are divided into two subsamples, one receiving the treatment and one being the control; and comparisons between sibling pairs in a multi-family study.
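To see why pairing matters, it can help to look at how a paired analysis uses the data. The sketch below, written in Python purely for illustration (the website itself uses JMP), computes the t statistic for paired data from the within-individual differences. The function name and the before and after values are hypothetical.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return the t statistic for paired data, based on per-individual differences."""
    if len(before) != len(after):
        raise ValueError("paired data need one 'after' value for every 'before' value")
    # Each difference uses two measurements from the same individual.
    diffs = [a - b for b, a in zip(before, after)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.fmean(diffs) / standard_error


# Hypothetical before and after measurements on the same four individuals.
before = [10, 12, 14, 16]
after = [11, 14, 15, 18]
print("paired t statistic:", paired_t_statistic(before, after))
```

Because the calculation works on differences within individuals, shuffling the order of one list would change the result, which is exactly what the one-to-one correspondence means in practice.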

A GUIDE TO THE APPROPRIATE TEST

If you can answer the four questions outlined above, you are in a good position to identify the best statistical test for most of the basic types of study design. It often also helps to think about how your data would look when plotted on an X-Y graph. The following terms are used in the guide below to indicate the best way of plotting your data.

A histogram is a bar graph where the heights of the bars represent counts (frequencies).

A means plot is a bar graph where the heights of the bars represent mean values, often with error bars attached. Means plots can also be visualized as dot plots, where a point, rather than a bar, is used to indicate the mean.

A scatterplot is a graph in which each individual data point has an X value and a Y value.

After you have identified the appropriate test using the guide below, you can read its dedicated web page to learn more about the logic underlying that test, how to carry out the test using JMP, and how to interpret the results of the analysis (the output provided by JMP).

1. Studying a single variable and comparing the data you have collected with the data you expect to observe based on some prior knowledge.

If your data are categorical (i.e., if you are comparing observed frequencies to expected frequencies) use a chi-squared goodness of fit test (histogram).
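As a minimal sketch of what a goodness of fit comparison involves, the Python code below computes the chi-squared statistic as the sum of (observed - expected)² / expected across categories. It is for illustration only and is not the JMP procedure described on this website. The function name and the counts are hypothetical.

```python
def chi_squared_gof(observed, expected):
    """Return the chi-squared goodness of fit statistic and its degrees of freedom."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one count per category")
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Degrees of freedom when the expected counts are fixed in advance
    # (not estimated from the data): number of categories minus one.
    dof = len(observed) - 1
    return statistic, dof


# Hypothetical counts in three categories, with equal counts expected.
observed = [18, 25, 17]
expected = [20, 20, 20]
print(chi_squared_gof(observed, expected))
```

The larger the statistic, the further the observed frequencies are from the expected ones.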

If your data are numeric and are normally distributed (i.e., if you are comparing an observed mean to an expected mean) use a one-sample t-test. If your data are not normally distributed use a Wilcoxon signed-rank test or a sign test (means plot).
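The comparison of an observed mean with an expected mean can be sketched as follows. This Python example is illustrative only (the website's analyses use JMP). The function name, the data values, and the expected mean are hypothetical.

```python
import math
import statistics


def one_sample_t_statistic(sample, expected_mean):
    """Return the t statistic comparing the sample mean with an expected mean."""
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.fmean(sample) - expected_mean) / standard_error


# Hypothetical measurements compared with a hypothetical expected mean of 2.
sample = [1, 2, 3, 4, 5]
print("t statistic:", one_sample_t_statistic(sample, expected_mean=2))
print("degrees of freedom:", len(sample) - 1)
```

The statistic expresses the difference between the observed and expected means in units of the standard error of the mean.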

2. Investigating the relationship between two or more variables

If both variables are categorical (i.e., if you are comparing one set of observed frequencies with another) perform a chi-squared contingency table analysis (histogram).
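In a contingency table analysis, the expected count for each cell is calculated from the row and column totals. The Python sketch below shows that calculation for illustration only. It applies no continuity correction, which some software may apply for small tables, so its result may differ from the output of a statistics package. The function name and the counts are hypothetical.

```python
def contingency_chi_squared(table):
    """Return the chi-squared statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical 2 x 2 table: rows are two groups, columns are two outcomes.
table = [[10, 20],
         [20, 10]]
print(contingency_chi_squared(table))
```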

If both variables are numeric, have a normal distribution, and the data are paired, use correlation or linear regression (scatterplot). See the web page dedicated to these techniques for the differences between them. If your data are not normally distributed use Spearman’s rank correlation.
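The difference between ordinary (Pearson) correlation and Spearman’s rank correlation can be sketched in a few lines: the rank correlation is the ordinary correlation coefficient calculated on the ranks of the data rather than on the raw values. The Python code below is for illustration only. The function names and the paired measurements are hypothetical.

```python
import math
import statistics


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of paired values x and y."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Return Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical paired measurements with a curved but always increasing relationship.
length = [1, 2, 3, 4, 5]
mass = [1, 4, 9, 16, 25]
print("Pearson r:", pearson_r(length, mass))
print("Spearman rho:", spearman_rho(length, mass))
```

Because it depends only on the ordering of the values, the rank correlation is less affected by the shape of the distribution than the ordinary coefficient.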

If you have one or more categorical X variables, one or more numeric Y variables, and if your data are normally distributed use a t-test or analysis of variance (means plot). See 3. below for more details about which of the various types of analysis is most appropriate, and what to do if your data do not have a normal distribution.

If your X variable is numeric and your Y variable is categorical use logistic regression (scatterplot, although the scatterplot would have two horizontal bands of data rather than a typical cloud of points).

3. Types of t-test and ANOVA

When you have one or more categorical X variables and a numeric Y variable (or variables), the categories of your X variable are normally called groups, levels, or treatments. These three terms can be used interchangeably, but treatment is usually restricted to the groups in an experimental (as opposed to observational) study. In an experiment comparing a pharmaceutical with a placebo, your X variable (which would also be called the independent variable in this context) would have two levels; if the experiment had two pharmaceuticals and a placebo, your X variable would have three levels. The number of X variables, the number of levels in each X variable, and the number of Y variables determine which kind of test is most appropriate.

In most types of ANOVA all your X variables are categorical. If you also have a numeric X variable, see option d) below.

Options a) through d) below all assume you are including just one Y variable in your analysis. If you have collected data on several different Y variables, you would normally perform a separate analysis for each variable. If you have measured the same Y variable at several time points on the same set of individuals, see option e). There are ways to incorporate multiple different Y variables into a single analysis (MANOVA, or “multivariate analysis of variance”) but these are beyond the scope of this website.

All types of t-test and ANOVA assume that your data have a normal distribution. Simple alternative tests that do not make this assumption are suggested if they are available.

a) If you have one X variable with two levels use a t-test, of which there are two types:

If your data are paired, use a paired t-test. Use a Wilcoxon signed-rank test or sign test if your data do not have a normal distribution.

If your data are unpaired, use a two-sample t-test. Use a Wilcoxon rank-sum test if your data do not have a normal distribution.

b) If you have one X variable with three or more levels, use one-factor ANOVA. Use a Kruskal-Wallis test if your data do not have a normal distribution.

c) If you have two X variables, use either two-factor ANOVA or nested ANOVA. See the Introduction to ANOVA page for an explanation of the differences between these two types of analysis.

d) If you have one or more categorical X variables and also a numeric X variable, use analysis of covariance (ANCOVA). An example of this type of experimental design would be a study comparing the effects of two pharmaceuticals on blood cholesterol concentration that also measured the body mass index (BMI) of all the patients in the study. Here, pharmaceutical type would be the categorical X variable and BMI would be the continuous X variable.

e) If you have one or more X variables, two or more Y variables of the same type, and if your data are paired, use repeated measures ANOVA. An example of this type of experimental design would be a study comparing the effects of two pharmaceuticals on blood cholesterol concentration (BCC) where the BCC of each patient was measured at three time points: at the start of the study, after six months, and after twelve months. Each set of measurements at a particular time point would constitute one Y variable (so three Y variables in total), and the data would be paired because there is a logical correspondence between the first, second, and third measurement for each individual patient.
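The guide in this section can be condensed into a small lookup function. The Python sketch below is one possible way to encode it, for illustration only. The function name, the design labels, and the argument names are all hypothetical, and the function simply returns the test names given in the guide above.

```python
def choose_test(design, normal=True, paired=False, levels=2):
    """Return the test suggested by the guide for a (hypothetical) design label."""
    # 1. A single variable compared with an expectation.
    if design == "one_categorical":
        return "chi-squared goodness of fit test"
    if design == "one_numeric":
        return "one-sample t-test" if normal else "Wilcoxon signed-rank test or sign test"

    # 2. Relationships between two or more variables.
    if design == "two_categorical":
        return "chi-squared contingency table analysis"
    if design == "two_numeric_paired":
        return "correlation or linear regression" if normal else "Spearman's rank correlation"
    if design == "numeric_x_categorical_y":
        return "logistic regression"

    # 3. One categorical X variable and one numeric Y variable.
    if design == "one_categorical_x":
        if levels < 2:
            raise ValueError("an X variable needs at least two levels")
        if levels == 2:
            if paired:
                return "paired t-test" if normal else "Wilcoxon signed-rank test or sign test"
            return "two-sample t-test" if normal else "Wilcoxon rank-sum test"
        return "one-factor ANOVA" if normal else "Kruskal-Wallis test"

    # Designs for which the guide lists no simple alternative test.
    anova_only = {
        "two_categorical_x": "two-factor ANOVA or nested ANOVA",
        "categorical_and_numeric_x": "analysis of covariance (ANCOVA)",
        "repeated_measures": "repeated measures ANOVA",
    }
    if design in anova_only:
        if not normal:
            return "no simple alternative listed in this guide"
        return anova_only[design]

    raise ValueError(f"unknown design: {design!r}")


print(choose_test("one_categorical_x", levels=2, paired=True))
print(choose_test("one_categorical_x", levels=3, normal=False))
```

A function like this only records the decisions. Answering the four key questions about your own data is still the step that requires judgment.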