1 Hypothesis Testing

1.1 Introduction

Hypotheses

Suppose we are going to obtain a random i.i.d. sample X = (X1, ..., Xn) of a random variable X with an unknown distribution PX. To proceed with modelling the underlying population, we might hypothesise probability models for PX and then test whether such hypotheses seem plausible in light of the realised data x = (x1, ..., xn). Or, more specifically, we might fix upon a parametric family PX|θ with unknown parameter θ and then hypothesise values for θ.

Generically, let θ0 denote a hypothesised value for θ. Then after observing the data, we wish to test whether we can indeed reasonably assume θ = θ0. For example, if X ∼ N(µ, σ²) we may wish to test whether µ = 0 is plausible in light of the data x.

Formally, we define a null hypothesis H0 as our hypothesised model of interest, and also specify an alternative hypothesis H1 of rival models against which we wish to test H0. Most often we simply test H0: θ = θ0 against H1: θ ≠ θ0. This is known as a two-sided test. In some situations it may be more appropriate to consider alternatives of the form H1: θ > θ0 or H1: θ < θ0, known as one-sided tests.

Rejection Region for a Test Statistic

To test the validity of H0, we first choose a test statistic T(X) of the data for which we can find the distribution, PT, under H0. Then, we identify a rejection region R ⊂ ℝ of low-probability values of T under the assumption that H0 is true, so P(T ∈ R | H0) = α for some small probability α (typically 5%). A well chosen rejection region will have relatively high probability under H1, whilst retaining low probability under H0. Finally, we calculate the observed test statistic t(x) for our observed data x. If t ∈ R, we reject the null hypothesis at the 100α% level.

p-values

For each possible significance level α ∈ (0, 1), a hypothesis test at the 100α% level will result in either rejecting or not rejecting H0.
As α → 0 it becomes less and less likely that the null hypothesis will be rejected, as the rejection region becomes smaller and smaller. Similarly, as α → 1 it becomes more and more likely that the null hypothesis will be rejected. For a given data set and resulting test statistic, we might therefore be interested in identifying the critical significance level which marks the threshold between rejecting and not rejecting the null hypothesis. This is known as the p-value of the data. Smaller p-values suggest stronger evidence against H0.

1.2 Error Rates and Power of a Test

Test Errors

There are two types of error in the outcome of a hypothesis test:

Type I: Rejecting H0 when in fact H0 is true. By construction, this happens with probability α. For this reason, the significance level of a hypothesis test is also referred to as the Type I error rate.

Type II: Not rejecting H0 when in fact H1 is true. The probability with which this type of error will occur depends on the unknown true value of θ, so to calculate values we plug in a plausible alternative value θ1 ≠ θ0 for θ, and let β = P(T ∉ R | θ = θ1) be the probability of a Type II error.

Power

We define the power of a hypothesis test by 1 − β = P(T ∈ R | θ = θ1). For a fixed significance level α, a well chosen test statistic T and rejection region R will have high power; that is, they will maximise the probability of rejecting the null hypothesis when the alternative is true.

2 Testing for a population mean

2.1 Normal Distribution with Known Variance

N(µ, σ²), σ² known. Suppose X1, ..., Xn are i.i.d. N(µ, σ²) with σ² known and µ unknown. We may wish to test if µ = µ0 for some specific value µ0 (e.g. µ0 = 0, µ0 = 9.8). Then we can state our null and alternative hypotheses as

H0: µ = µ0;  H1: µ ≠ µ0.

Under H0: µ = µ0, we then know both µ and σ². So for the sample mean X̄ we have a known distribution for the test statistic

Z = (X̄ − µ0) / (σ/√n) ∼ N(0, 1).

So if we define our rejection region R to be the 100α% tails of the standard normal distribution,

R = (−∞, −z_{1−α/2}) ∪ (z_{1−α/2}, ∞) = {z : |z| > z_{1−α/2}},

we have P(Z ∈ R) = α under H0. We thus reject H0 at the 100α% significance level if our observed test statistic z = (x̄ − µ0)/(σ/√n) ∈ R. The p-value is given by 2{1 − Φ(|z|)}.
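The one-sample z-test and its power can be sketched in a few lines of Python. This is a minimal illustration rather than part of the notes: the function names (`z_test`, `power_two_sided`) are our own, and the standard normal cdf Φ is obtained from the error function in the standard library.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal cdf Phi(z), computed via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu0 when sigma is known.

    Returns the observed statistic z, the two-sided p-value
    2 * (1 - Phi(|z|)), and whether H0 is rejected at level alpha.
    Rejecting when the p-value is below alpha is equivalent to
    |z| > z_{1 - alpha/2}.
    """
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 2.0 * (1.0 - phi(abs(z)))
    return z, p_value, p_value < alpha

def power_two_sided(mu0, mu1, sigma, n, z_crit=1.96):
    """Power 1 - beta = P(|Z| > z_crit) when in truth mu = mu1,
    for the two-sided 5% test (z_crit = z_{0.975} = 1.96)."""
    shift = (mu1 - mu0) / (sigma / sqrt(n))
    return (1.0 - phi(z_crit - shift)) + phi(-z_crit - shift)
```

Note that for µ1 = µ0 the "power" reduces to the Type I error rate α, and it increases towards 1 as µ1 moves away from µ0 or as the sample size n grows.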

[Figure: 5% rejection region R for a N(0, 1) statistic, showing the standard normal density φ(z) with both tails shaded.]

2.1.1 Duality with Confidence Intervals

There is a strong connection in this context between hypothesis testing and confidence intervals. Suppose we have constructed a 100(1 − α)% confidence interval for µ. Then this is precisely the set of values {µ0} for which there would be insufficient evidence to reject a null hypothesis H0: µ = µ0 at the 100α% level.

Example

A company makes packets of snack foods. The bags are labelled as weighing 454g; of course they won't all be exactly 454g, and let's suppose the variance of bag weights is known to be 70g². The following data show the mass in grams of 50 randomly sampled packets.

464, 450, 450, 456, 452, 433, 446, 446, 450, 447, 442, 438, 452, 447, 460, 450, 453, 456, 446, 433, 448, 450, 439, 452, 459, 454, 456, 454, 452, 449, 463, 449, 447, 466, 446, 447, 450, 449, 457, 464, 468, 447, 433, 464, 469, 457, 454, 451, 453, 443

Are these data consistent with the claim that the mean weight of packets is 454g?

1. We wish to test H0: µ = 454 vs. H1: µ ≠ 454. So set µ0 = 454.

2. Although we have not been told that the packet weights are individually normally distributed, by the CLT we still have that the mean weight of the sample of packets is approximately normally distributed, and hence we still approximately have Z = (X̄ − µ0)/(σ/√n) ∼ N(0, 1).

3. x̄ = 451.22 and n = 50, so z = (x̄ − µ0)/(σ/√n) = −2.350.

4. For a 5%-level significance test, we compare the statistic z = −2.350 with the rejection region R = (−∞, −z_{0.975}) ∪ (z_{0.975}, ∞) = (−∞, −1.96) ∪ (1.96, ∞). Clearly we have z ∈ R, and so at the 5% level we reject the null hypothesis that the mean packet weight is 454g.
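The duality can be checked numerically on this example. The following Python sketch (our own, using the 50 packet weights from the example) computes both the test statistic and the 95% confidence interval, and confirms that rejecting H0: µ = 454 at the 5% level coincides with 454 falling outside the interval.

```python
from math import sqrt

# Packet weights (g) from the example.
weights = [464, 450, 450, 456, 452, 433, 446, 446, 450, 447,
           442, 438, 452, 447, 460, 450, 453, 456, 446, 433,
           448, 450, 439, 452, 459, 454, 456, 454, 452, 449,
           463, 449, 447, 466, 446, 447, 450, 449, 457, 464,
           468, 447, 433, 464, 469, 457, 454, 451, 453, 443]

mu0, sigma2 = 454, 70          # hypothesised mean, known variance
n = len(weights)
xbar = sum(weights) / n        # sample mean, 451.22
se = sqrt(sigma2 / n)          # standard error sigma / sqrt(n)

z = (xbar - mu0) / se          # observed test statistic, about -2.35

# 95% confidence interval for mu: xbar -/+ z_{0.975} * se.
ci = (xbar - 1.96 * se, xbar + 1.96 * se)

# Duality: H0 is rejected at the 5% level exactly when
# mu0 = 454 lies outside the 95% confidence interval.
reject = abs(z) > 1.96
outside_ci = not (ci[0] <= mu0 <= ci[1])
assert reject == outside_ci
```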

Example 1

Consider again the snack food weights example. There, we assumed the variance of bag weights was known to be 70. Without this, we could have estimated the variance by

s²_{n−1} = (1/(n−1)) Σ_{i=1}^n (xi − x̄)².

Then the corresponding t-statistic, t = (x̄ − µ0)/(s_{n−1}/√n) = −2.341, becomes very similar to the z-statistic from before. And since n = 50, we compare with the t_49 distribution, which is approximately N(0, 1). So the hypothesis test results and p-value would be practically identical.

Example 2

A particular piece of code takes a random time to run on a computer, but the average time is known to be 6 seconds. The programmer tries an alternative optimisation in compilation and wishes to know whether the mean run time has changed. To explore this, he runs the reoptimised code 16 times, obtaining a sample mean run time of 5.8 seconds and bias-corrected sample standard deviation of 1.2 seconds. Is the code any faster?

1. We wish to test H0: µ = 6 vs. H1: µ ≠ 6. So set µ0 = 6.

2. Assuming the run times are approximately normal, T = (X̄ − µ0)/(s_{n−1}/√n) ∼ t_{n−1}. That is, (X̄ − 6)/(s_{n−1}/√16) ∼ t_15. So we reject H0 at the 100α% level if |T| > t_{15,1−α/2}.

3. x̄ = 5.8, s_{n−1} = 1.2 and n = 16, so t = (x̄ − µ0)/(s_{n−1}/√n) = −2/3.

4. We have |t| = 2/3 << 2.131 = t_{15,0.975}, so we have insufficient evidence to reject H0 at the 5% level.

5. In fact, the p-value for these data is 51.51%, so there is very little evidence to suggest the code is now any faster.
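A one-sample t-test on summary statistics like these is a one-line computation. The sketch below (our own naming) reproduces the run-time example, taking the critical value t_{15,0.975} = 2.131 from tables rather than computing it.

```python
from math import sqrt

def t_statistic(xbar, mu0, s, n):
    """One-sample t statistic T = (xbar - mu0) / (s / sqrt(n))."""
    return (xbar - mu0) / (s / sqrt(n))

# Run-time example: n = 16 runs, xbar = 5.8s, s_{n-1} = 1.2s, mu0 = 6s.
t = t_statistic(5.8, 6.0, 1.2, 16)   # -0.2 / 0.3 = -2/3

# Two-sided 5% test: compare |t| with the tabulated t_{15, 0.975}.
T_CRIT = 2.131                       # t_{15, 0.975} from tables
reject = abs(t) > T_CRIT             # False: insufficient evidence
```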

3 Testing for differences in population means

3.1 Two Sample Problems

Samples from Two Populations

Suppose, as before, we have a random sample X = (X1, ..., X_{n1}) from an unknown population distribution PX. But now, suppose we have a further random sample Y = (Y1, ..., Y_{n2}) from a second, different population PY. Then we may wish to test hypotheses concerning the similarity of the two distributions PX and PY. In particular, we are often interested in testing whether PX and PY have equal means. That is, to test H0: µX = µY vs. H1: µX ≠ µY.

Paired Data

A special case is when the two samples X and Y are paired. That is, if n1 = n2 = n and the data are collected as pairs (X1, Y1), ..., (Xn, Yn) so that, for each i, Xi and Yi are possibly dependent. For example, we might have a random sample of n individuals, where Xi represents the heart rate of the i-th person before light exercise and Yi the heart rate of the same person afterwards. In this special case, for a test of equal means we can consider the sample of differences Z1 = X1 − Y1, ..., Zn = Xn − Yn and test H0: µZ = 0 using the single-sample methods we have seen. In the above example, this would test whether light exercise causes a change in heart rate.

3.2 Normal Distributions with Known Variances

N(µX, σX²), N(µY, σY²). Suppose X = (X1, ..., X_{n1}) are i.i.d. N(µX, σX²) with µX unknown; Y = (Y1, ..., Y_{n2}) are i.i.d. N(µY, σY²) with µY unknown; and the two samples X and Y are independent. Then we still have that, independently,

X̄ ∼ N(µX, σX²/n1),  Ȳ ∼ N(µY, σY²/n2).

From this it follows that the difference in sample means satisfies

X̄ − Ȳ ∼ N(µX − µY, σX²/n1 + σY²/n2),

which is an unbiased estimator for σ². We can immediately see that s²_{n1+n2−2} is indeed an unbiased estimate of σ² by noting

S²_{n1+n2−2} = ((n1 − 1)/(n1 + n2 − 2)) S²_{n1−1} + ((n2 − 1)/(n1 + n2 − 2)) S²_{n2−1};

that is, s²_{n1+n2−2} is a weighted average of the bias-corrected sample variances for the individual samples x and y, which are both unbiased estimates for σ². Then substituting S_{n1+n2−2} in for σ we get

((X̄ − Ȳ) − (µX − µY)) / (S_{n1+n2−2} √(1/n1 + 1/n2)) ∼ t_{n1+n2−2},

and so, under H0: µX = µY,

T = (X̄ − Ȳ) / (S_{n1+n2−2} √(1/n1 + 1/n2)) ∼ t_{n1+n2−2}.

So we have a rejection region for a hypothesis test of H0: µX = µY vs. H1: µX ≠ µY at the 100α% level given by

R = {t : |t| > t_{n1+n2−2,1−α/2}}

for the statistic

t = (x̄ − ȳ) / (s_{n1+n2−2} √(1/n1 + 1/n2)).

Example

The same piece of C code was repeatedly run after compilation under two different C compilers, and the run times under each compiler were recorded. The sample mean and bias-corrected sample variance for Compiler 1 were 114s and 310s² respectively, and the corresponding figures for Compiler 2 were 94s and 290s². Both sets of data were based on 15 runs. Suppose that Compiler 2 is a refined version of Compiler 1, and so if µ1, µ2 are the expected run times of the code under the two compilations, we might fairly assume µ2 ≤ µ1. Conduct a hypothesis test of H0: µ1 = µ2 vs. H1: µ1 > µ2 at the 5% level.

Until now we have mostly considered two-sided tests, that is, tests of the form H0: θ = θ0 vs. H1: θ ≠ θ0. Here we need to consider a one-sided test, which differs by the alternative hypothesis being of the form H1: θ < θ0 or H1: θ > θ0. This presents no extra methodological challenge and requires only a slight adjustment in the construction of the rejection region. We still use the t-statistic

t = (x̄ − ȳ) / (s_{n1+n2−2} √(1/n1 + 1/n2)),

where x̄, ȳ are the sample mean run times under Compilers 1 and 2 respectively. But now the one-sided rejection region becomes

R = {t : t > t_{n1+n2−2,1−α}}.

First calculating the bias-corrected pooled sample variance, we get s²_{n1+n2−2} = (310 + 290)/2 = 300. (Note that since the sample sizes n1 and n2 are equal, the pooled estimate of the variance is simply the average of the two individual estimates.) So

t = (x̄ − ȳ) / (s_{n1+n2−2} √(1/n1 + 1/n2)) = (114 − 94) / (√300 √(1/15 + 1/15)) = 20/√40 = 3.16.

For a one-sided test we compare t = 3.16 with t_{28,0.95} = 1.70 and conclude that we reject the null hypothesis at the 5% level; the second compilation is significantly faster.
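The pooled two-sample t computation can be sketched as follows; the function is our own illustration, with the compiler example's summary figures plugged in.

```python
from math import sqrt

def pooled_t(xbar, ybar, s2x, s2y, n1, n2):
    """Two-sample t statistic with pooled variance.

    s2x and s2y are the bias-corrected sample variances of the
    two samples.  Returns (pooled variance, t statistic, degrees
    of freedom n1 + n2 - 2).
    """
    s2 = ((n1 - 1) * s2x + (n2 - 1) * s2y) / (n1 + n2 - 2)
    t = (xbar - ybar) / (sqrt(s2) * sqrt(1 / n1 + 1 / n2))
    return s2, t, n1 + n2 - 2

# Compiler example: with equal sample sizes the pooled variance
# is just the average of the two sample variances.
s2, t, df = pooled_t(114, 94, 310, 290, 15, 15)
# s2 = 300, t = 20 / sqrt(40) = 3.16, df = 28
```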

4 Goodness of Fit

4.1 Count Data and Chi-Square Tests

Count Data

The results in the previous sections relied upon the data being either normally distributed, or at least, through the CLT, having a sample mean which is approximately normally distributed. Tests were then developed for making inference on population means under those assumptions. These tests were very much model-based. Another important but very different problem concerns model checking, which can be addressed through a more general consideration of count data for simple (discrete and finite) distributions. The following ideas can then be trivially extended to infinite-range discrete and continuous r.v.s by binning observed samples into a finite collection of predefined intervals.

Samples from a Simple Random Variable

Let X be a simple random variable taking values in the range {x1, ..., xk}, with probability mass function pj = P(X = xj), j = 1, ..., k. A random sample of size n from the distribution of X can be summarised by the observed frequency counts O = (O1, ..., Ok) at the points x1, ..., xk (so Σ_{j=1}^k Oj = n).

Suppose it is hypothesised that the true pmf {pj} is from a particular parametric model pj = P(X = xj | θ), j = 1, ..., k, for some unknown parameter p-vector θ. To test this hypothesis about the model, we first need to estimate the unknown parameters θ so that we are able to calculate the distribution of any statistic under the null hypothesis H0: pj = P(X = xj | θ), j = 1, ..., k. Let θ̂ be such an estimator, obtained using the sample O. Then under H0 we have estimated probabilities for the pmf, p̂j = P(X = xj | θ̂), j = 1, ..., k, and so we are able to calculate estimated expected frequency counts E = (E1, ..., Ek) by Ej = n p̂j. (Note again we have Σ_{j=1}^k Ej = n.) We then seek to compare the observed frequencies with the expected frequencies to test for goodness of fit.

Chi-Square Test

To test H0: pj = P(X = xj | θ) vs.
H1: the pj are unconstrained (0 ≤ pj ≤ 1, Σj pj = 1), we use the chi-square statistic

X² = Σ_{i=1}^k (Oi − Ei)² / Ei.

If H0 were true, then the statistic X² would approximately follow a chi-square distribution with ν = k − p − 1 degrees of freedom. Here k is the number of values (categories) the simple r.v. X can take, and p is the number of parameters being estimated (dim(θ)). For the approximation to be valid, we should have Ej ≥ 5 for all j. This may require some merging of categories.

[Figure: pdf of the χ²ν distribution for ν = 1, 2, 3, 5, 10.]

Rejection Region

Clearly larger values of X² correspond to larger deviations from the null hypothesis model; conversely, if X² = 0 the observed counts exactly match those expected under H0. For this reason, we always perform a one-sided goodness of fit test using the χ² statistic, looking only at the upper tail of the distribution. Hence the rejection region for a goodness of fit hypothesis test at the 100α% level is given by

R = {x : x > χ²_{k−p−1,1−α}}.

4.2 Proportions

Example

Each year, around 1.3 million people in the USA suffer adverse drug effects (ADEs). A study in the Journal of the American Medical Association (July 5, 1995) gave the causes of 95 ADEs below.

Cause                      Number of ADEs
Lack of knowledge of drug  29
Rule violation             17
Faulty dose checking       13
Slips                      9
Other                      27

Test whether the true percentages of ADEs differ across the 5 causes. Under the null hypothesis that the 5 causes are equally likely, we would have expected counts of 95/5 = 19 for each cause.

So our χ² statistic becomes

x² = (29 − 19)²/19 + (17 − 19)²/19 + (13 − 19)²/19 + (9 − 19)²/19 + (27 − 19)²/19
   = (100 + 4 + 36 + 100 + 64)/19
   = 304/19 = 16.

We have not estimated any parameters from the data, so we compare x² with the quantiles of the χ²_{5−1} = χ²_4 distribution. Well, 16 > 9.49 = χ²_{4,0.95}, so we reject the null hypothesis at the 5% level; we have reason to suppose that there is a difference in the true percentages across the different causes.

4.3 Model Checking

Example - Fitting a Poisson Distribution to Data

Recall the example from the Discrete Random Variables chapter, where the number of particles emitted by a radioactive substance which reached a Geiger counter was measured for 608 time intervals, each of length 7.5 seconds. We fitted a Poisson(λ) distribution to the data by plugging in the sample mean number of counts (3.870) for the rate parameter λ. (Which we now know to be the MLE!)

[Table: for each count x, the observed frequency O(n_x) and the fitted Poisson(3.87) expected frequency E(n_x). O = Observed, E = Expected.]

Whilst the fitted Poisson(3.87) expected frequencies looked quite convincing to the eye, at that time we had no formal method of quantitatively assessing the fit. However, we now know how to proceed.

[Table: for each count x, the values O, E, O − E and (O − E)²/E.]

The statistic x² = Σ (O − E)²/E should be compared with a χ²_{k−p−1} = χ²_9 distribution, since the single parameter λ was estimated from the data. Well, χ²_{9,0.95} = 16.91, so at the 5% level we do not reject the null hypothesis of a Poisson(3.87) model for the data.

4.4 Independence

Contingency Tables

Suppose we have two simple random variables X and Y which are jointly distributed with unknown probability mass function pXY. We are often interested in trying to ascertain whether X and Y are independent; that is, to determine whether pXY(x, y) = pX(x) pY(y).

Let the ranges of the r.v.s X and Y be {x1, ..., xk} and {y1, ..., yl} respectively. Then an i.i.d. sample of size n from the joint distribution of (X, Y) can be represented by a list of counts nij (1 ≤ i ≤ k; 1 ≤ j ≤ l) of the number of times we observe the pair (xi, yj). Tabulating these data in the following way gives what is known as a k × l contingency table.

         y1    y2   ...  yl   |
x1       n11   n12  ...  n1l  | n1·
x2       n21   n22  ...  n2l  | n2·
...                           |
xk       nk1   nk2  ...  nkl  | nk·
------------------------------+----
         n·1   n·2  ...  n·l  | n

Note the row sums (n1·, n2·, ..., nk·) represent the frequencies of x1, x2, ..., xk in the sample (that is, ignoring the value of Y). Similarly for the column sums (n·1, n·2, ..., n·l) and y1, ..., yl.

Under the null hypothesis H0: X and Y are independent, the expected values of the entries of the contingency table, conditional on the row and column sums, can be estimated by

n̂ij = ni· n·j / n,  1 ≤ i ≤ k, 1 ≤ j ≤ l.

To see this, consider the marginal distribution of X; we could estimate pX(xi) by p̂i· = ni·/n. Similarly for pY(yj) we get p̂·j = n·j/n. Then under the null hypothesis of independence pXY(xi, yj) = pX(xi) pY(yj), and so we can estimate pXY(xi, yj) by p̂ij = p̂i· p̂·j = ni· n·j / n².

Now that we have a set of expected frequencies to compare against our k × l observed frequencies, a χ² test can be performed. We are using both the row and column sums to estimate our probabilities; since each set of estimated marginal probabilities must sum to 1, these contribute (k − 1) and (l − 1) estimated parameters respectively. So we compare our calculated x² statistic against a χ² distribution with

kl − {(k − 1) + (l − 1)} − 1 = (k − 1)(l − 1)

degrees of freedom. Hence the rejection region for a hypothesis test of independence in a k × l contingency table at the 100α% level is given by

R = {x : x > χ²_{(k−1)(l−1),1−α}}.

Example

An article in the International Journal of Sports Psychology (July-Sept 1990) evaluated the relationship between physical fitness and stress. 549 people were classified as being of good, average, or poor fitness, and were also tested for signs of stress (yes or no). The data are shown in the table below.

[Table: observed counts of people with and without signs of stress (rows) for each of the three fitness levels, Poor, Average and Good (columns).]

Is there any relationship between stress and fitness? Under independence we would estimate the expected cell counts to be n̂ij = ni· n·j / 549:

[Table: estimated expected counts n̂ij for the same 2 × 3 layout.]

Hence the χ² statistic is calculated as

X² = Σi (Oi − Ei)²/Ei,

summing over all six cells. This should be compared with a χ² distribution with (2 − 1)(3 − 1) = 2 degrees of freedom. The calculated statistic does not exceed χ²_{2,0.95} = 5.99, so we have no significant evidence to suggest there is any relationship between fitness and stress.
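The whole contingency-table procedure can be sketched in Python. The function is our own illustration, and the 2 × 3 table fed to it is entirely invented (the study's actual counts are not reproduced here); only the mechanics mirror the example.

```python
def independence_test(table):
    """Chi-square test of independence for a k x l contingency table.

    `table` is a list of k rows, each a list of l observed counts.
    Returns the X^2 statistic and degrees of freedom (k-1)(l-1).
    """
    k, l = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(table[i][j] for i in range(k)) for j in range(l)]

    x2 = 0.0
    for i in range(k):
        for j in range(l):
            e_ij = row_sums[i] * col_sums[j] / n  # expected under H0
            x2 += (table[i][j] - e_ij) ** 2 / e_ij
    return x2, (k - 1) * (l - 1)

# Hypothetical 2 x 3 table of stress (rows) by fitness (columns);
# these counts are made up for illustration only.
x2, df = independence_test([[30, 40, 30],
                            [20, 60, 50]])
reject = x2 > 5.99   # compare with chi^2_{2, 0.95} = 5.99
```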
