Chapter 7: Data for Decisions
Sampling
Statistics: the science of collecting, organizing, and interpreting data.
How is the data produced? By sampling and by experiments.
Sampling:
- The goal is to gather information about a large group of individuals.
- Time, cost, and inconvenience forbid contacting every individual.
- Instead, gather information about only part of the group in order to draw conclusions about the whole.
Population: the entire group of individuals about whom we want information.
Sample: the part of the population from which we actually collect information, used to draw conclusions about the whole.

Bad Sampling Methods
If personal choice is involved in selecting the sample, the following could happen:
- Results could become biased.
- The sample may not be a true representation of the population.
Convenience sample:
- The interviewer chooses the sample from individuals close at hand (those easiest to reach).
- Example: mall surveys.
Voluntary response sample:
- People choose themselves by responding to a general appeal.
- People with strong opinions are most likely to respond, which can cause bias.
- Examples: online polls, call-ins, write-ins.
Bias: the design of a statistical study is biased if it systematically favors a certain outcome.

Simple Random Samples
Simple random sample (SRS):
- An SRS of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.
- Choosing a sample by chance avoids bias by giving all individuals an equal chance to be chosen (a good sampling method).
Examples of SRS:
- Draw names from a hat: place the names of all the people in the population into a hat and draw out a handful (the sample). This is slow and inconvenient.
- Use a table of random digits: a more efficient way of randomly selecting the sample without bias.
  - For smaller samples, tables of random digits are used.
  - For larger samples, computers do the random digit sampling.
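For larger samples, the random selection is usually left to a computer. The sketch below is one possible way to draw an SRS in Python with the standard library's `random.sample`, which selects without replacement so that every set of n individuals is equally likely. The population names, the sample size, and the seed are hypothetical values chosen only for illustration.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw an SRS of size n from a list of individuals.

    random.sample selects without replacement, so every set of n
    individuals has the same chance of being the sample.
    """
    rng = random.Random(seed)  # seed is optional; it only makes the draw repeatable
    return rng.sample(population, n)


# Hypothetical population of 70 people (illustrative names only).
population = [f"person_{i:02d}" for i in range(1, 71)]
sample = simple_random_sample(population, 3, seed=7)
print(sample)
```

Fixing the seed is a convenience for reproducing the same draw; in a real survey the draw would normally not be tied to a published seed.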

Simple Random Sample
Two steps in choosing a simple random sample:
1. Give each member of the population a numerical label of the same length.
   - Example: 100 items can be labeled with two digits: 01, 02, …, 99, 00.
2. To choose the random sample, select a line in the digit table.
   - For a sample of size n, read off groups of digits of the same length as the labels until n individuals have been selected from the population.
When selecting the n individuals for the sample from the random digits table:
1. Skip any group of digits that is not in use as a label.
2. Skip any label that has already been selected (no repeats).
Table of random digits: a list of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these two properties:
- Each entry in the table is equally likely to be any of the 10 digits from 0 through 9.
- The entries are independent of one another. That is, knowledge of one part of the table gives no information about any other part.

Simple Random Sample
Using the Random Digit Table
Example:
- A group of 70 people were labeled 01, 02, 03, …, 69, 70.
- Line 104 of the random digits table was selected, and three lucky winners were chosen from it.
- Reading off two-digit labels from line 104: 52 was selected first; 71 was skipped (it is not in the range of labels); 13 was chosen; then 88, 89, and 93 were skipped (out of range); and 07 was chosen.
- Winners: 52, 13, and 07.
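The reading procedure in this example can be sketched in a few lines of Python. The function name `read_labels` is hypothetical, and the digit string holds only the seven two-digit groups that the example reports reading from line 104 (52, 71, 13, 88, 89, 93, 07); the rest of the table line is not reproduced here.

```python
def read_labels(digits, label_len, valid_labels, n):
    """Read fixed-length labels from a run of random digits.

    Groups that are not valid labels are skipped, and so are repeats,
    until n individuals have been selected.
    """
    digits = "".join(ch for ch in digits if ch.isdigit())  # drop spacing
    chosen = []
    for i in range(0, len(digits) - label_len + 1, label_len):
        label = digits[i:i + label_len]
        if label not in valid_labels:
            continue  # not in use as a label
        if label in chosen:
            continue  # repeat
        chosen.append(label)
        if len(chosen) == n:
            break
    return chosen


# Only the groups quoted in the example, not the full table line.
line_104_start = "52 71 13 88 89 93 07"
valid = {f"{k:02d}" for k in range(1, 71)}  # labels 01 through 70

winners = read_labels(line_104_start, 2, valid, 3)
print(winners)  # ['52', '13', '07']
```

Labels are kept as strings so that the leading zero in 07 is preserved.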

Cautions About Sample Surveys
Sample surveys of large populations require the following:
- A good sampling design (an SRS can provide this)
- An accurate and complete list of the population
- Participation of all individuals selected for the sample
- A question that is neutral and clear
Bias can occur due to the following:
- Problems with obtaining an accurate and complete population list
  - Undercoverage: occurs when some groups in the population are left out of the process of choosing the sample.
  - Example: homeless people, prison inmates, students in dormitories, etc.
- Problems with getting 100% participation of sampled people
  - Nonresponse: occurs when an individual chosen for the sample cannot be contacted or refuses to participate.
- Problems with posing a misleading or confusing question

Experiments
Observation versus Experiments
Observational study (example: a sample survey):
- Observes individuals and measures variables of interest but does not attempt to influence the response.
- The purpose is to describe some group or situation.
Experiment:
- Deliberately imposes some treatment on individuals in order to observe their responses.
- The purpose is to study whether the treatment causes a change in the response.

Experiments
Examining Cause and Effect Between Variables
- Experiments are the preferred method for examining the effect of one variable on another.
- By imposing the specific treatment of interest and controlling other influences, we can pin down cause and effect.
Uncontrolled experiment: an experiment in which it is not possible to control outside factors that can influence the outcome.
Confounding: variables, whether part of a study or not, are said to be confounded when their effects on the outcome cannot be distinguished from each other.

Experiments
Example: GMAT Prep Class
A college now offers its GMAT exam preparation course only online, whereas in the past the course was offered only live. The students who take the online course score an average of 10% higher on the GMAT exam than those who took the live course in the past. Can we conclude that the online course is more effective? No, there may be confounding variables. For example, the students who take the course now may differ from past students in ways that also affect their scores.

Experiments
Randomized Comparative Experiment (helps control confounding)
- An experiment to compare two or more treatments in which people, animals, or things are assigned to treatments by chance.
- Outside effects and confounding variables then act on all groups alike.
Randomized: the subjects are assigned to treatments by chance.
Comparative: the experiment compares two or more treatments.
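Assigning subjects to treatments by chance can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function name `randomize`, the 20 student labels, and the seed are hypothetical, and the two treatment names simply echo the online and live formats of the GMAT example.

```python
import random


def randomize(subjects, treatments, seed=None):
    """Assign subjects to treatments by chance, in groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)      # chance alone decides the order
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        # Deal the shuffled subjects out to the treatments in turn.
        groups[treatments[i % len(treatments)]].append(subject)
    return groups


# Hypothetical subjects and treatments, for illustration only.
subjects = [f"student_{i:02d}" for i in range(1, 21)]
groups = randomize(subjects, ["online", "live"], seed=1)
for treatment, members in groups.items():
    print(treatment, len(members), members[:3])
```

Because the shuffle ignores everything about the subjects, characteristics such as prior preparation are expected to balance out across the groups on average.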

Experiments versus Observational Studies
Placebo effect:
- The effect of a dummy treatment (such as an inert pill in a medical experiment) on the response of the subjects.
- The tendency to respond favorably to any treatment.
Double-blind experiment:
- An experiment in which neither the experimental subjects nor the persons who interact with them know which treatment each subject received.
- This helps to eliminate possible influences or biases between the subjects and workers; everyone is kept “blind.”
Observational study:
- Does not try to manipulate the environment (such as by assigning treatments to people); it simply observes the measurements of variables of interest that result from people’s free choices.

Inference: From Sample to Population
Statistical inference:
- When the sample was chosen at random from a population, we can infer conclusions about the wider population from these data.
- Statistical inference works only if the data come from random samples or a randomized comparative experiment.
A parameter is a number that describes the population.
- A parameter is a fixed number (in practice we do not know its value).
A statistic is a number that describes a sample.
- The value of a statistic is known once we have taken a sample, but it can change from sample to sample.

Inference: From Sample to Population
Example:
- A random sample of 2500 people was chosen from the population and asked: “Do you like getting new clothes but find shopping for clothes frustrating and time consuming?”
- 66% of the people in the sample agreed.
- Infer that about 66% of the population agrees.

Inference: From Sample to Population
Sampling distribution:
- The distribution of values taken by the statistic in all possible samples of the same size from the same population.
- For a fixed number of samples, the values of a statistic computed from larger samples show less variation and lie closer to the mean.
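A small simulation can illustrate how the spread of a statistic shrinks as the sample size grows. The sketch below makes several assumptions that are not in the slides: the population proportion 0.6, the two sample sizes, the 500 repetitions, and the seed are illustrative choices, and each individual is modeled as an independent draw, which approximates sampling from a very large population.

```python
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Simulate the sample proportion for num_samples samples of size n.

    Each individual is an independent success with probability p, which
    approximates an SRS from a population much larger than the sample.
    """
    rng = random.Random(seed)
    proportions = []
    for _ in range(num_samples):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


# Same number of samples each time; only the sample size changes.
small = simulate_sample_proportions(p=0.6, n=100, num_samples=500)
large = simulate_sample_proportions(p=0.6, n=2500, num_samples=500)

spread_small = statistics.pstdev(small)
spread_large = statistics.pstdev(large)
print(f"n=100:  mean={statistics.mean(small):.3f}, spread={spread_small:.4f}")
print(f"n=2500: mean={statistics.mean(large):.3f}, spread={spread_large:.4f}")
```

Both sets of proportions center near 0.6, but the values from the larger samples cluster much more tightly around it.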

Inference: From Sample to Population
Sample Proportion
Choose an SRS of size n from a large population that contains population proportion p of successes. Then the sample proportion of successes is:
$\hat{p} = \dfrac{\text{number of successes in the sample}}{n}$
Then:
- Shape: For large sample sizes n, the sampling distribution of $\hat{p}$ is approximately normal.
- Center: The mean of the sampling distribution of $\hat{p}$ is p.
- Spread: The standard deviation of the sampling distribution of $\hat{p}$ is $\sqrt{\dfrac{p(1-p)}{n}}$.
For the shopping example: with mean p = 0.6 and n = 2500, the standard deviation is $\sqrt{\dfrac{(0.6)(0.4)}{2500}} \approx 0.0098$.
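The standard deviation of the sampling distribution of a sample proportion is given by the standard formula sqrt(p(1 − p)/n). The short check below evaluates it for the values stated in the shopping example, p = 0.6 and n = 2500; the function name is a hypothetical helper.

```python
import math


def sd_sample_proportion(p, n):
    """Standard deviation of the sample proportion for an SRS of size n."""
    return math.sqrt(p * (1 - p) / n)


sd = sd_sample_proportion(p=0.6, n=2500)
print(round(sd, 4))  # 0.0098
```

Because n sits under the square root, quadrupling the sample size only halves the spread.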

Confidence Intervals
The 68-95-99.7 Rule
For a normal distribution:
- 68% of the observations fall within ± 1 standard deviation of the mean.
- 95% of the observations fall within ± 2 standard deviations of the mean.
- 99.7% of the observations fall within ± 3 standard deviations of the mean.
95% confidence interval:
- An interval obtained from the sample data by a method that in 95% of all samples will produce an interval containing the true population parameter.
- A 95% confidence interval for p is approximately: $\hat{p} \pm 2\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$
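As a worked illustration, the approximate 95% interval can be computed for the shopping example, where 66% of a sample of 2500 agreed. The sketch assumes the usual approximation of the sample proportion plus or minus 2 standard deviations, with the sample proportion standing in for the unknown p inside the square root; the function name is hypothetical.

```python
import math


def approx_95_ci(p_hat, n):
    """Approximate 95% confidence interval: p_hat +/- 2 standard deviations.

    The unknown p is replaced by the sample proportion p_hat when
    estimating the standard deviation.
    """
    margin = 2 * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = approx_95_ci(p_hat=0.66, n=2500)
print(round(low, 3), round(high, 3))  # 0.641 0.679
```

Under these assumptions, the interval runs from about 64.1% to about 67.9%, a margin of error of roughly 1.9 percentage points.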