When we look at a distribution of sample means and are deciding whether our treatment has been effective, there are always two possible explanations. The first is that there is no effect at all, and any change is due to simple sampling error. The second is that there is a real effect caused by some type of treatment.

When we look at a distribution of sample means, some means will certainly be larger and some smaller. To compare an unusually large or small mean to all of the others in the distribution, we use a Z-score. A Z-score expresses a value as the number of standard deviations it sits above or below the mean, which lets us compare things that we otherwise wouldn't have been able to.
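To make the idea concrete, here is a minimal simulation sketch. It assumes a made-up population (mean near 100, standard deviation near 15) and a sample size of 25; none of these numbers come from the course material. It draws many samples, collects their means, and then computes the Z-score of the largest mean relative to the other sample means.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population (an assumption for illustration only).
population = [random.gauss(100, 15) for _ in range(10_000)]

def sample_means(pop, n, draws):
    """Draw `draws` random samples of size n and return each sample's mean."""
    return [mean(random.sample(pop, n)) for _ in range(draws)]

means = sample_means(population, n=25, draws=2_000)

center = mean(means)    # center of the distribution of sample means
spread = pstdev(means)  # how much the sample means vary from sample to sample

# Z-score of the most extreme high mean, relative to all the other means.
largest = max(means)
z_largest = (largest - center) / spread

print(f"center={center:.2f}, spread={spread:.2f}, z of largest mean={z_largest:.2f}")
```

Even though every sample comes from the same population, the means differ from one another. That variation is sampling error, and the Z-score tells us how unusual any one mean is.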

A Z-distribution is a very special type of distribution. It always has a mean of 0 and a standard deviation of 1, no matter what. This is very useful because it puts values on a common scale, so we can compare means and other values that could not be compared in their raw form. A distribution of sample means approaches a normal distribution, the kind with the bell curve, as the sample size gets large; this property is attributable to the central limit theorem. A z-distribution with a normal curve has, again, a special name: the Unit Normal Distribution.
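As a quick check of the "mean of 0 and standard deviation of 1" property, the sketch below standardizes two made-up sets of scores measured on very different scales. The data and the helper name `z_scores` are assumptions for illustration.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Convert raw values to z-scores: (value - mean) / standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the values
    return [(v - mu) / sigma for v in values]

# Hypothetical raw scores on two unrelated scales.
exam_points = [52, 61, 70, 79, 88]
reaction_ms = [310, 295, 340, 325, 280]

for raw in (exam_points, reaction_ms):
    z = z_scores(raw)
    # After standardizing, each set has mean 0 and standard deviation 1
    # (up to floating-point rounding).
    print(round(mean(z), 10), round(pstdev(z), 10))
```

Once both sets are on the z scale, a score from one set can be compared directly with a score from the other.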

A "proportion more extreme" value, more commonly called a p-value, is a way of describing how extreme a value in a distribution really is. The actual value we use is the proportion of the distribution that is more extreme than our given score. This is helpful when we are analyzing our distribution in any number of ways.

A one-tailed p-value is the proportion of the distribution that is more extreme in one tail. In other words, it is the amount of the distribution that is more extreme in a given direction, positive or negative.
When using JMP to create these distributions, there is an option to check "greater than value" or "less than value". This lets us see the proportion of the graph that we want, whether that is the proportion greater than or less than our value.
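Outside of JMP, the same one-tailed proportion can be sketched with Python's standard-library `statistics.NormalDist`. The function name `one_tailed_p` and its `direction` argument are hypothetical, chosen to mirror the "greater than" and "less than" options.

```python
from statistics import NormalDist

unit_normal = NormalDist(mu=0.0, sigma=1.0)  # mean 0, standard deviation 1

def one_tailed_p(z, direction="greater"):
    """Proportion of the unit normal distribution beyond z in one direction."""
    if direction == "greater":
        return 1.0 - unit_normal.cdf(z)  # area to the right of z
    if direction == "less":
        return unit_normal.cdf(z)        # area to the left of z
    raise ValueError("direction must be 'greater' or 'less'")

print(one_tailed_p(1.5, "greater"))
print(one_tailed_p(-1.5, "less"))
```

Because the unit normal distribution is symmetric, the two calls above give the same proportion.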

A two-tailed p-value is the proportion of the distribution that is more extreme in both tails. In other words, it is the amount of the distribution that is more extreme in both directions, positive and negative.
When using JMP to create these distributions, there is the option to choose "Between Value 1 and Value 2" or "Outside Value 1 and Value 2". "Between" shows the proportion of the distribution that lies between the two values, and "Outside" shows the proportion that lies in the two tails beyond them.
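The "between" and "outside" proportions can be sketched the same way with Python's `statistics.NormalDist`. The function names here are hypothetical and are not JMP's. The two-tailed p-value for a score z is the "outside" proportion for the pair of values -|z| and +|z|.

```python
from statistics import NormalDist

unit_normal = NormalDist()  # defaults to mean 0, standard deviation 1

def proportion_between(low, high):
    """Proportion of the unit normal distribution between low and high."""
    return unit_normal.cdf(high) - unit_normal.cdf(low)

def proportion_outside(low, high):
    """Proportion in the two tails: below low plus above high."""
    return 1.0 - proportion_between(low, high)

def two_tailed_p(z):
    """Proportion at least as extreme as z in either direction."""
    return proportion_outside(-abs(z), abs(z))

print(proportion_between(-1.0, 1.0))
print(two_tailed_p(2.0))
```

The "between" and "outside" proportions for the same pair of values always add up to 1.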

A hypothesis test, as defined in Julian's slides, is a statistical method that uses sample data to evaluate a hypothesis about a population. This is the main point of experimental statistics, to explore relationships between variables. It is the cornerstone of the scientific method, to form a hypothesis and test it.

Statistical hypotheses are the claims we are exploring and, ultimately, trying to find evidence for or against. There are two special hypotheses that we will be working with: the null and the alternative. We will learn more about them in the next section, but do know that these two have a relationship that is described as "mutually exclusive and exhaustive": the null and alternative cover every possible explanation and don't overlap, meaning exactly one of them is true.

The first step in hypothesis testing is to form a Statistical Hypothesis. In other words, what is our best guess about the experiment? There are two possibilities: the Null Hypothesis and the Alternative Hypothesis. These two are both mutually exclusive and exhaustive. In other words, one has to be true while the other is false.

The Null Hypothesis is the hypothesis that we fall back on by default. It is designated by the notation H0 (read "H-nought" or "H-zero"). This hypothesis states that, in the population, there is no change, difference, or relationship. In other words, what we did to our sample has no effect on the population.

This image (uploaded from Dr. Parris's Slides) explains the relationship between the data. Since they are equal, there is no difference between the two groups.

On the other hand, the Alternative Hypothesis states that, in the population, there is a change, difference, or relationship. In simpler terms, what we experimented on does have an effect on the population. However, it is worth noting that under this hypothesis, the difference we observe is the sum of sampling error AND a real effect. This is denoted as H1.

Corresponding to the alpha level is the Critical Region. While the alpha level is only a number, the critical region is the area in the tails of the distribution, beyond where the majority of our normal distribution lies. The size of this area is set by our alpha level. If our outcome lands anywhere in the critical region, we are able to say that our sample is different from the population due to both sampling error and a real effect. Under a two-tailed test with an alpha level of 0.05, the critical region lies beyond z = +/-1.96.
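The +/-1.96 boundary can be recovered from the alpha level itself. The sketch below assumes a two-tailed test on the unit normal distribution and uses Python's `statistics.NormalDist`; the function names are hypothetical.

```python
from statistics import NormalDist

def two_tailed_critical_value(alpha=0.05):
    """Positive z boundary that leaves alpha/2 of the distribution in each tail."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def in_critical_region(z, alpha=0.05):
    """True if z falls beyond the boundary in either tail."""
    return abs(z) > two_tailed_critical_value(alpha)

print(round(two_tailed_critical_value(0.05), 2))  # about 1.96
print(in_critical_region(2.3))
print(in_critical_region(-1.2))
```

A smaller alpha level pushes the boundary further out, so the critical region shrinks and a sample has to be more extreme to land in it.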

Two parts of hypothesis testing are collecting the data and computing statistics from it. A test statistic is a numerical summary of the degree to which a sample is unlike the samples predicted by the null hypothesis; in other words, it summarizes how extreme a sample result is compared to others. To put our test statistic to use, we can use a Z-test. The formula used is:

Z-test Statistic formula

The test is used to find the deviation between the observed sample mean and the mean of the sampling distribution, assuming that the null hypothesis is true. Basically, after standardizing the numbers, we see how far our sample mean deviates from the other sample means we would expect when the null is true.
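The formula image is not reproduced in these notes, so the sketch below assumes the standard one-sample form, z = (M - mu) / (sigma / sqrt(n)). Here M is the sample mean, mu is the population mean under the null hypothesis, sigma is the population standard deviation, and n is the sample size. The symbols and the example numbers are assumptions for illustration.

```python
from math import sqrt

def z_test_statistic(sample_mean, pop_mean, pop_sd, n):
    """Standardized distance between the sample mean and the null-hypothesis mean."""
    standard_error = pop_sd / sqrt(n)  # standard deviation of the sample means
    return (sample_mean - pop_mean) / standard_error

# Hypothetical example: 36 people, sample mean 105, population mean 100, SD 15.
z = z_test_statistic(sample_mean=105.0, pop_mean=100.0, pop_sd=15.0, n=36)
print(z)  # standard error is 15 / 6 = 2.5, so z = 5 / 2.5 = 2.0
```

The denominator is the standard deviation of the sampling distribution, not of the raw scores. This is why the same 5-point difference becomes more extreme as the sample size grows.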

If the Z-test statistic shows that our mean does not land in the critical region, it's reasonable to conclude that sampling error alone could have given us this sample mean. This means that we do not have enough evidence to reject the null hypothesis (note that this does not mean we automatically accept the null hypothesis or rule out the alternative hypothesis).
Recall that we get the p-value to see the proportion of a distribution that's more extreme than a given score. Two different types of p-value assessment are one-tailed and two-tailed. If it's a one-tailed p-value, we look at one extremity. On the other hand, if it's a two-tailed p-value, we look at both extremities.
How do we decide whether to reject or fail to reject the null hypothesis? We look at whether the p-value is greater or less than the alpha level. A p-value that's less than the alpha level (for example, p<0.05 when alpha is 0.05) means our sample is in the critical region; in other words, the result is statistically significant. (Fair warning: never drop "statistically". "Statistically significant" is a very specific claim!)
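Putting the pieces together, the decision rule can be sketched as a single function. This is a minimal illustration that assumes a two-tailed Z-test with a known population standard deviation and an alpha level of 0.05. The function name, the example numbers, and the wording of the decision strings are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-tailed Z-test: return the z statistic, the p-value, and the decision."""
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    # Two-tailed p-value: proportion at least as extreme as z in either tail.
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    decision = "reject H0" if p < alpha else "fail to reject H0"
    return z, p, decision

# Hypothetical samples of 36 people from a population with mean 100 and SD 15.
print(z_test(105.0, 100.0, 15.0, 36))  # z = 2.0, beyond 1.96, so p < 0.05
print(z_test(102.0, 100.0, 15.0, 36))  # z = 0.8, well inside the boundary
```

Note that the function never returns "accept H0". When the p-value is not below alpha, the only conclusion is that we lack the evidence to reject the null.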
The flow chart gives a final touch on how we use a p-value to assess a hypothesis test: