Lab 4: Sampling & Decisions

YOUR NAME

YOUR PARTNER'S NAME

2019-11-13 17:02:17

A key focus of this week is how to make inferences about populations based on samples. The essential logic lies in comparing a single instance of a statistic such as a sample mean to a distribution of such values. The comparison can lead to one of two conclusions – the sample statistic is either extreme or not extreme. But what are the thresholds for making this kind of judgment call (i.e., whether a value is extreme or not)? This activity explores that question.

The problem is this: You receive a sample containing the ages of 30 students. You are wondering whether this sample is a group of undergraduates (mean age = 20 years) or graduates (mean age = 25 years). To answer this question, you must compare the mean of the sample you receive to a distribution of means from the population. The following fragment of R code begins the solution:
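The fragment is not reproduced in this copy of the lab. As a stand-in, here is a minimal sketch that matches the description in the text, not necessarily the original code. Only the sample size of 30, the means of 20 and 25, and the names studentPop and testSample come from the lab; the seed value, the standard deviation of 3, the population size of 20,000, and the names undergrads and grads are assumptions made for illustration. It is arranged as six statements so that the coin flip is the line before last.

```r
# Assumed seed; any fixed integer makes the run repeatable
set.seed(2)
# Assumed population: 20,000 undergraduate ages, mean 20, sd 3 (sd and size are guesses)
studentPop <- rnorm(20000, mean = 20, sd = 3)
# One sample of 30 undergraduates drawn from that population
undergrads <- sample(studentPop, size = 30, replace = TRUE)
# One sample of 30 graduate ages, mean 25 (same assumed sd)
grads <- rnorm(30, mean = 25, sd = 3)
# Coin flip: one uniform draw on [0, 1] compared to 0.5 picks which sample you receive
if (runif(1) > 0.5) { testSample <- grads } else { testSample <- undergrads }
# The statistic you will compare against the distribution of sample means
mean(testSample)
```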

After you run this code, the variable “testSample” will contain either a sample of undergrads or a sample of grads. The line before last “flips a coin” by generating one value from a uniform distribution (by default the distribution covers 0 to 1) and comparing it to 0.5. The question you must answer with additional code is: Which is it, grad or undergrad? Here are the steps that will help you finish the job:
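If the coin flip idea feels abstract, the short sketch below repeats it many times to check that it behaves roughly like a fair coin. The seed, the number of flips, and the variable name flips are illustrative choices, not part of the lab code.

```r
set.seed(1)                   # assumed seed, used only to make this demo repeatable
flips <- runif(10000) > 0.5   # 10,000 uniform draws on [0, 1], each compared to 0.5
mean(flips)                   # proportion of TRUE values; should land close to 0.5
```

A single call such as runif(1) > 0.5 returns just one TRUE or FALSE, which is all the lab code needs to pick between the two samples.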

Annotate the code above with line-by-line commentary. To get full credit on this assignment, you must demonstrate a clear understanding of what the six lines of code above actually do! You will have to look up the meaning of some commands. For the set.seed() function, take a look at this StackOverflow thread: reasons for using the set seed function
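As a quick illustration of what set.seed() does, the sketch below resets the random number generator to the same state twice and shows that the same "random" values come out both times. The seed value 42 and the variable names first and second are arbitrary choices for this demo.

```r
set.seed(42)               # fix the generator's starting state
first <- runif(3)          # three uniform draws
set.seed(42)               # reset to the same starting state
second <- runif(3)         # the same three draws again
identical(first, second)   # TRUE: same seed, same sequence
```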

The next line of code should generate a list of sample means from the population called “studentPop.” Very similar code to accomplish this appears right in the middle of Chapter 10. How many sample means should you generate? Really you can create any number that you want – hundreds, thousands, whatever – but I suggest for ease of inspection that you generate just 100 means. That is a pretty small number, but it makes it easy to think about percentiles and ranks.
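One possible way to build the list of sample means is sketched below, using replicate() to repeat the "draw a sample, take its mean" step 100 times. The studentPop definition here is an assumed stand-in (sd of 3 and size of 20,000 are guesses) so that the block runs on its own, and the name sampleMeans is hypothetical; in your own work, use the studentPop created by the lab code.

```r
set.seed(2)                                     # assumed seed for repeatability
studentPop <- rnorm(20000, mean = 20, sd = 3)   # assumed stand-in for the lab's population

# Draw 100 samples of 30 ages each and keep only the mean of each sample
sampleMeans <- replicate(100, mean(sample(studentPop, size = 30, replace = TRUE)))

length(sampleMeans)    # 100 means, one per sample
summary(sampleMeans)   # the means should cluster near the population mean of 20
```

With exactly 100 means, each one corresponds to roughly one percentile, which is what makes ranks easy to read off.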

Once you have your list of sample means generated from studentPop, the trick is to compare mean(testSample) to that list of sample means and see where it falls. Is it in the middle of the pack? Far out toward one end? Here is one hint that will help you: In Chapter 10 (p. 90), the quantile() command is used to generate percentiles based on thresholds of 2.5% and 97.5%. Those are the thresholds we want, because together they bound the central 95% of the sample means, and the quantile() command will help you create them.
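The comparison could be sketched as follows. So that the block runs on its own, it rebuilds an assumed studentPop and sampleMeans and uses a hypothetical testSample drawn with mean 25; in your own work, use the objects created by the lab code instead. The names thresholds, testMean, and isExtreme are illustrative.

```r
set.seed(2)                                     # assumed seed for repeatability
studentPop <- rnorm(20000, mean = 20, sd = 3)   # assumed stand-in population
sampleMeans <- replicate(100, mean(sample(studentPop, size = 30, replace = TRUE)))

# Cut points that bound the central 95% of the sample means
thresholds <- quantile(sampleMeans, probs = c(0.025, 0.975))

# Hypothetical test sample (here drawn as grads) standing in for the lab's testSample
testSample <- rnorm(30, mean = 25, sd = 3)
testMean <- mean(testSample)

# Extreme means the test mean falls outside the central 95% of undergrad sample means
isExtreme <- testMean < thresholds[[1]] || testMean > thresholds[[2]]
isExtreme

# Share of sample means below the test mean, a rough percentile rank
mean(sampleMeans < testMean)
```

If the test mean falls outside the two thresholds, it is unlikely to have come from the undergraduate population; if it falls between them, there is no reason to doubt that it did.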