Point and Interval Estimates

Suppose we want to estimate a parameter, such as p or µ, based on a finite sample of data. There are two main methods:

1. Point estimate: Summarize the sample by a single number that is an estimate of the population parameter;
2. Interval estimate: A range of values within which, we believe, the true parameter lies with high probability.

Example. Suppose I wanted to estimate the mean height of all female students at UNC. I took a sample in this class and the sample mean was $\bar{x} = 65.5$ (inches). So the obvious thing to do is to take that as an estimate for the population mean. But I didn't have to use the sample mean. I could have taken the sample median (65) or the sample mode (63). It makes sense to ask which is better.

What properties make a good point estimator?

1. It's desirable that the sampling distribution be centered around the true population parameter. An estimator with this property is called unbiased.
2. It's desirable that our chosen estimator have a small standard error in comparison with other estimators we might have chosen.

The sample mean is exactly unbiased (whereas the sample median may not be), and also, if the true population is normal, the sample mean has a smaller standard error than the sample median. Both of these would indicate that the sample mean is preferable to the sample median as an estimator of the population mean. However, there are other properties that could nevertheless make the median preferable (e.g. it's more resistant to outliers).
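The claim about standard errors can be checked with a small simulation. The sketch below is only an illustration: the sample size (25), the number of repetitions, and the population standard deviation (3 inches) are assumptions chosen for the demonstration, not values from the lecture. It draws repeated samples from a normal population and compares how much the sample mean and the sample median vary from sample to sample.

```python
import random
import statistics


def simulate_standard_errors(n=25, reps=2000, mu=65.5, sigma=3.0, seed=0):
    """Estimate the standard error of the sample mean and sample median.

    mu and sigma describe a hypothetical normal population of heights;
    sigma = 3.0 is an assumed value used only for illustration.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    # The standard deviation of an estimator across repeated samples
    # is (an estimate of) its standard error.
    return statistics.stdev(means), statistics.stdev(medians)


se_mean, se_median = simulate_standard_errors()
print(f"SE of sample mean:   {se_mean:.3f}")
print(f"SE of sample median: {se_median:.3f}")
```

For a normal population the median's standard error should come out roughly 20 to 25 percent larger than the mean's, which is the sense in which the mean is the more efficient estimator here.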

In the case of a binomial proportion, the obvious point estimator is the sample proportion. For example, consider our example about President Obama's popularity rating (class posted 03/05/09, Chapter 6 material). In this example, 68% of respondents gave Obama a positive rating after he had been in office for one month (the answer could be different if we repeated the poll now). The most natural interpretation of this is that 68%, or 0.68, is a statistic which serves as an estimator of the true but unknown proportion of people who would have approved of Obama if the whole population had been surveyed. It seems obvious that we would use the sample proportion as an estimator of the population proportion, but we don't have to.

Now let's turn to interval estimates. The simplest way to introduce this is through an example.

Example. In a college of 25,000 students, the administration would like to know for what proportion of students both parents had completed college. A sample of 350 students was drawn at random and in that sample, 276 of the students said that both their parents had completed college. The sample proportion is 276/350 = 0.789 (or 78.9%), so by the same logic as in the last example, it makes sense to use that number also as an estimator of the population proportion (in this case the population is all 25,000 students at this college). But how accurate is that?

Sample Proportion ± 1.96 Standard Error (*)

First let's calculate Pr{−1.96 < X < 1.96} when X is a standard normal random variable (mean 0, standard deviation 1). From the normal table, for z = −1.96 we have left-tail probability 0.0250. For z = 1.96 we have left-tail probability 0.9750. The difference is 0.9750 − 0.0250 = 0.95.

But given that the sample proportion has an approximately normal distribution, this means that the probability that the sample proportion lies within 1.96 standard errors of the true mean is also 0.95. Or in other words, the probability that the interval (*) includes the true mean is 0.95. This is what we mean by saying that the interval we calculated is a 95% confidence interval.

A side comment. Earlier in the course, we said that there is a 95% chance that a normal random variable lies within two standard deviations of the mean (this is part of the empirical rule first discussed in Chapter 2; although at that time we didn't use the words "normal distribution", that's actually what the empirical rule refers to). So why have we now replaced the number 2 with 1.96? Actually, 1.96 is more accurate. If we repeat the above normal probability calculations with z = ±2 instead of z = ±1.96, the probability becomes 0.9772 − 0.0228 = 0.9544. That's still quite close to 95%, and 1.96 is quite close to 2, so in practice it doesn't make much difference whether we use 2 or 1.96 standard deviations. But at this stage of the course, we're trying to be more precise about things than we were earlier on, hence the change.
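The normal-table lookups for z = ±1.96 and z = ±2 can be reproduced in software. The following is a minimal sketch using `NormalDist` from Python's standard `statistics` module; the helper name `central_probability` is made up for this illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def central_probability(z):
    """Return Pr{-z < X < z} for a standard normal random variable X."""
    # Difference of the two left-tail probabilities, as read from a normal table.
    return STANDARD_NORMAL.cdf(z) - STANDARD_NORMAL.cdf(-z)


for z in (1.96, 2.0):
    print(f"z = {z}: left tails {STANDARD_NORMAL.cdf(-z):.4f} and "
          f"{STANDARD_NORMAL.cdf(z):.4f}, central probability {central_probability(z):.4f}")
```

The z = 1.96 case gives a central probability of essentially 0.95, while z = 2 gives a slightly larger value, which is why 1.96 is the more precise multiplier for a 95% interval.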

Confidence interval for a population proportion

To construct a confidence interval for the proportion p of a population that has a particular characteristic (e.g. supporters of President Obama):

Step 1: Take a sample of size n and calculate $\hat{p}$ (pronounced "p-hat"), the sample proportion of people who have that characteristic (e.g. saying they support Obama in a survey).
Step 2: Calculate the standard error $SE = \sqrt{\hat{p}(1-\hat{p})/n}$. Note: The formula should really be $\sqrt{p(1-p)/n}$, but we don't know p, so we use $\hat{p}$ instead.
Step 3: The 95% confidence interval is $(\hat{p} - 1.96\,SE,\ \hat{p} + 1.96\,SE)$.

Example from text. In one question of the GSS in 2000, 1154 people were asked whether they would be willing to pay higher prices to support the environment. 518 said yes. Find a 95% confidence interval for p, the true proportion in the whole population who would be willing to pay higher prices to support the environment.

Step 1: $\hat{p} = 518/1154 = 0.449$ to 3 decimal places.
Step 2: The standard error is $\sqrt{0.449 \times 0.551/1154} = 0.0146$.
Step 3: $1.96 \times 0.0146 = 0.029$ to 3 decimal places. The 95% confidence interval is (0.420, 0.478).

In practice, we wouldn't usually express this to three decimal places; we would simply say that we believe the true proportion of people who support the proposition (i.e. who would be willing to pay higher prices to protect the environment) is between 42% and 48%.
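The three steps can be sketched as a short function. This is an illustrative sketch, not code from the course; the function name `proportion_confidence_interval` and its parameters are assumptions made for this example. It is applied to the GSS counts (518 yes out of 1154).

```python
import math


def proportion_confidence_interval(successes, n, z=1.96):
    """Large-sample confidence interval for a population proportion.

    z = 1.96 corresponds to 95% confidence.
    Returns (p_hat, standard_error, lower, upper).
    """
    p_hat = successes / n                               # Step 1: sample proportion
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)  # Step 2: SE with p_hat in place of p
    margin = z * standard_error                          # Step 3: margin of error
    return p_hat, standard_error, p_hat - margin, p_hat + margin


p_hat, se, lower, upper = proportion_confidence_interval(518, 1154)
print(f"p_hat = {p_hat:.3f}, SE = {se:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the function carries full precision through each step rather than rounding intermediate values, the last digit can occasionally differ from a hand calculation; for the GSS data both approaches agree to three decimal places.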

A side comment. In another GSS survey people were asked whether they would support legislation to force industry to adopt more environment-friendly policies. This time close to 80% answered yes. Yet it seems likely that tighter regulation on industry will result in higher prices for consumers (this will certainly be true if you ask the industry representatives). This is another case where the wording of a question arguably influences the answer to a much greater extent than standard error calculations indicate.

Sample size condition

For these calculations to be valid (standard error formula using $\hat{p}$ in place of p, normal approximation for the distribution of $\hat{p}$) we require the sample size to be reasonably large. In practice, it is sufficient that $n\hat{p} \geq 15$ and $n(1 - \hat{p}) \geq 15$.

Confidence intervals with other confidence coefficients

So far we have worked with 95% confidence intervals, signifying that there is supposed to be a 95% probability that the interval includes the true population parameter. However, there's nothing special here about the probability 95%: we could equally well work with 90%, or 99%, or any other probability we care to specify.

If we want a 99% confidence interval, we make the same calculation but replace 1.96 by 2.58.
If we want a 90% confidence interval, we make the same calculation but replace 1.96 by 1.645.

See Table 7.2.
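The multipliers for other confidence coefficients can be obtained from the inverse of the standard normal cumulative distribution function instead of a table. A minimal sketch follows, using `NormalDist.inv_cdf` from Python's standard library; the helper name `critical_value` is hypothetical.

```python
from statistics import NormalDist


def critical_value(confidence):
    """Return z such that Pr{-z < X < z} = confidence for standard normal X."""
    # The two tails share the leftover probability equally, so the
    # left-tail probability at +z is 1 - (1 - confidence) / 2.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_value(level):.3f}")
```

Tables often round the 99% multiplier to 2.58; the unrounded value is about 2.576.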

Example. In an earlier example, we considered a sample of 350 students from a college (with a total student population of 25,000) and asked for what fraction of students it was true that both their parents had been to college. In that case, the sample proportion was .789. Suppose we wanted a 90% confidence interval.

The standard error is $\sqrt{.789 \times (1 - .789)/350} = .0218$. Multiplying by 1.645, the margin of error is .036 to three decimal places. Thus the 90% confidence interval is (.753, .825).

Interpretation of a confidence interval

Continue the previous example: Does the answer mean there is a 90% chance that the true proportion is between .753 and .825?

Strictly speaking, such a statement doesn't make sense: we're talking about a finite population of parents; either they went to college or they didn't; what does it mean to talk of a 90% chance?

What the 90% confidence statement really means is that in many repetitions of the procedure, the interval will cover the true value 90% of the time.

Example: Let's suppose the true proportion is 80%. I ran 10 simulation experiments in which I generated a binomial random variable with n = 350, p = 0.8. I then constructed a 90% confidence interval for each of the ten hypothetical samples.

[Table: X, Lower Bound, Upper Bound for each of the ten samples]

The interval is slightly different each time, and in fact, in 9 out of the 10 cases the interval covered the true value.

Conclusion: The interval is random. The confidence coefficient (in this case, 90%) represents the long-run probability that the interval would cover the true value in many repetitions of the sampling procedure.
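The long-run interpretation can be illustrated by repeating the simulation many more than ten times. The sketch below is an assumption-laden illustration rather than the lecture's own code: it uses n = 350 and p = 0.8 as in the example, but the number of repetitions (2000), the random seed, and the function name `simulate_coverage` are choices made here.

```python
import math
import random


def simulate_coverage(n=350, p=0.8, z=1.645, reps=2000, seed=1):
    """Fraction of simulated 90% confidence intervals that cover the true p."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(reps):
        # One binomial draw: count successes in n independent trials.
        x = sum(rng.random() < p for _ in range(n))
        p_hat = x / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            covered += 1
    return covered / reps


coverage = simulate_coverage()
print(f"Proportion of intervals covering p = 0.8: {coverage:.3f}")
```

The reported fraction should land close to 0.90, though not exactly on it, both because of simulation noise and because the normal approximation is only approximate.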
