Unit 16 Normal Distributions

Objectives: To obtain relative frequencies (probabilities) and percentiles with a population having a normal distribution.

While there are many different types of distributions that a population may have (see Figures 15-1a to 15-1f), normal distributions are of particular importance in statistics. One of the primary reasons normal distributions are so important is that, as we have seen previously, the sampling distribution of x̄ with simple random sampling takes on the properties of a normal distribution when the sample size n is sufficiently large. Figure 15-2 gave us a somewhat detailed description of a normal distribution in terms of one, two, and three standard deviations away from the mean. We have seen that a normal distribution is symmetric and bell-shaped, and that practically all of the items in a normally distributed population are within three standard deviations of the mean.

We now want to describe normal distributions in much more detail than merely in terms of one, two, and three standard deviations away from the mean. Table A2 in the appendix provides areas under a normal density curve in terms of z-scores. A normal curve described entirely in terms of z-scores is called a standard normal curve. By converting raw scores to z-scores, we can use Table A2 to find many different areas under any normal density curve. These areas can be interpreted as relative frequencies or as probabilities. In addition to Table A2, statistical software packages, spreadsheets, programmable calculators, etc. can also be used to find areas under a normal density curve.

The figure at the top of Table A2 indicates that you can use this table directly to find the area under a normal curve above any value greater than the mean µ. Such an area is found by obtaining the z-score of a value greater than the mean and reading the corresponding area in the body of the table. The z-scores in the table are all given to two decimal places, with the z-scores to the first decimal place displayed as the row labels and the column labels providing the second decimal place. The areas in the body of the table are all given to four decimal places.

To illustrate the use of Table A2, we shall consider several examples involving the population of weights of oranges from a particular grove. Let us suppose that the weights of oranges from the grove have a normal distribution with mean µ = 7.81 oz and standard deviation σ = 1.32 oz. We shall consider finding the percentage of orange weights that lie in a given range, or, in other words, the probability that one randomly selected orange has a weight in the given range. These relative frequencies (or probabilities) will of course correspond to an appropriate area under a normal density curve.

To begin, we shall find the percentage, or relative frequency, of oranges that weigh more than 9.91 oz, which can also be interpreted as the probability that one randomly selected orange has a weight more than 9.91 oz. To find this probability, we first use µ = 7.81 and σ = 1.32 to find the z-score of 9.91 oz as follows:

z = (9.91 - 7.81) / 1.32 = +1.59

The proportion of shaded area in Figure 16-1 is the desired probability. This shaded area corresponds exactly to the shaded area in the figure at the top of Table A2. The desired area is found directly from Table A2 in the row labeled 1.5 and the column labeled 0.09. The probability that one randomly selected orange weighs more than 9.91 oz is 0.0559 (or 5.59%).

Next, we shall find the percentage, or relative frequency, of oranges that weigh less than 6.98 oz, which can also be interpreted as the probability that one randomly selected orange has a weight less than 6.98 oz. To find this probability, we first use µ = 7.81 and σ = 1.32 to find the z-score of 6.98 oz as follows:

z = (6.98 - 7.81) / 1.32 = -0.63

The proportion of shaded area in Figure 16-2 is the desired probability. This shaded area is a mirror image of the shaded area in the figure at the top of Table A2. Since a normal curve is symmetric, the desired area is found directly from Table A2 in the row labeled 0.6 and the column labeled 0.03. The probability that one randomly selected orange weighs less than 6.98 oz is 0.2643 (or 26.43%).

We cannot always obtain the desired area under the standard normal curve directly from Table A2. It is sometimes necessary to use the fact that the total area under the standard normal curve is one (or 100%). To illustrate, we shall find the percentage, or relative frequency, of oranges that weigh less than 10.66 oz, which can also be interpreted as the probability that one randomly selected orange has a weight less than 10.66 oz. To find this probability, we first use µ = 7.81 and σ = 1.32 to find the z-score of 10.66 oz as follows:

z = (10.66 - 7.81) / 1.32 = +2.16

The proportion of shaded area in Figure 16-3 is the desired probability, but this shaded area does not correspond to the shaded area in the figure at the top of Table A2; however, the unshaded area in Figure 16-3 does correspond exactly to the shaded area in the figure at the top of Table A2. This unshaded area is found directly from Table A2 in the row labeled 2.1 and the column labeled 0.06; since the total area under a normal curve is equal to 1, the desired shaded area in Figure 16-3 is found by subtracting this entry of Table A2 from 1.
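The table lookups for these one-sided areas can be cross-checked in software. The following is a minimal sketch, not part of the original unit, assuming Python 3.8 or later (for the standard-library statistics.NormalDist class) and taking the orange weights as normal with µ = 7.81 oz and σ = 1.32 oz. The helper names z_score, area_above, and area_below are illustrative. Each z-score is rounded to two decimal places to mimic reading Table A2.

```python
from statistics import NormalDist

MU, SIGMA = 7.81, 1.32              # orange weights in oz (assumed population values)
STANDARD_NORMAL = NormalDist(0, 1)  # the standard normal curve

def z_score(x, mu=MU, sigma=SIGMA):
    """Convert a raw score to a z-score, rounded to two decimals like the table."""
    return round((x - mu) / sigma, 2)

def area_above(x):
    """Relative frequency of weights greater than x (area to the right of z)."""
    return 1 - STANDARD_NORMAL.cdf(z_score(x))

def area_below(x):
    """Relative frequency of weights less than x (area to the left of z)."""
    return STANDARD_NORMAL.cdf(z_score(x))

for label, value in [
    ("more than 9.91 oz", area_above(9.91)),
    ("less than 6.98 oz", area_below(6.98)),
    ("less than 10.66 oz", area_below(10.66)),
]:
    print(f"P(weight {label}) = {value:.4f}")
```

Because the software works with the cumulative area below a value, the "subtract from 1" step appears in area_above rather than in area_below, which is the reverse of how Table A2 is laid out.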

The probability that one randomly selected orange weighs less than 10.66 oz is 1 - 0.0154 = 0.9846 (or 98.46%).

As another illustration, we shall have you find the percentage, or relative frequency, of oranges that weigh more than 6.53 oz, which can also be interpreted as the probability that one randomly selected orange has a weight more than 6.53 oz. Find the z-score for 6.53 oz, and label the value 6.53 oz on the horizontal axis in Figure 16-4. Then, shade the desired area under the normal curve, and use Table A2 to obtain the desired probability. (You should find that the z-score is -0.97, and that the probability that one randomly selected orange weighs more than 6.53 oz is 0.8340 (or 83.40%).)

If the desired area under the standard normal curve is in between two given values, then we need to read Table A2 twice. Let us obtain the percentage, or relative frequency, of oranges that weigh between 6 and 8 oz, which can also be interpreted as the probability that one randomly selected orange has a weight between 6 and 8 oz. To find this probability, we first use µ = 7.81 and σ = 1.32 to find the z-score of 6 oz to be

z = (6 - 7.81) / 1.32 = -1.37,

and to find the z-score of 8 oz to be

z = (8 - 7.81) / 1.32 = +0.14.

The proportion of shaded area in Figure 16-5 is the desired probability. The unshaded area below 6 oz in Figure 16-5 is a mirror image of the shaded area in the figure at the top of Table A2; also, the unshaded area above 8 oz in Figure 16-5 corresponds exactly to the shaded area in the figure at the top of Table A2. We find the total unshaded area by adding the entry of Table A2 in the row labeled 1.3 and the column labeled 0.07 to the entry of Table A2 in the row labeled 0.1 and the column labeled 0.04; we then obtain the desired area by subtracting this unshaded area from 1. The probability that one randomly selected orange weighs between 6 and 8 oz is 1 - (0.0853 + 0.4443) = 0.4704 (or 47.04%).

In the illustration just completed, the desired area under the standard normal curve was in between two values, where one was below µ and the other was above µ. We shall now consider instances where the desired area under the standard normal curve is in between two values that are either both greater than µ or both less than µ. First, we obtain the percentage, or relative frequency, of oranges that weigh between 5.5 and 6.5 oz, which can also be interpreted as the probability that one randomly selected orange has a weight between 5.5 and 6.5 oz. To find this probability, we first use µ = 7.81 and σ = 1.32 to find the z-score of 5.5 oz to be

z = (5.5 - 7.81) / 1.32 = -1.75,

and to find the z-score of 6.5 oz to be

z = (6.5 - 7.81) / 1.32 = -0.99.

The proportion of shaded area in Figure 16-6 is the desired probability. The unshaded area below 5.5 oz in Figure 16-6 is a mirror image of the shaded area in the figure at the top of Table A2; also, if the shaded and unshaded areas below 6.5 oz are combined in Figure 16-6, this combined area is a mirror image of the shaded area in the figure at the top of Table A2. We find the desired area by subtracting the entry of Table A2 in the row labeled 1.7 and the column labeled 0.05 from the entry of Table A2 in the row labeled 0.9 and the column labeled 0.09. The probability that one randomly selected orange weighs between 5.5 and 6.5 oz is 0.1611 - 0.0401 = 0.1210 (or 12.10%).

Now, we shall let you find the percentage, or relative frequency, of oranges that weigh between 8.54 and 9.32 oz, which can also be interpreted as the probability that one randomly selected orange has a weight between 8.54 and 9.32 oz. Find the z-score for each of 8.54 and 9.32 oz, and label the values 8.54 and 9.32 oz on the horizontal axis in Figure 16-7. Then, shade the desired area under the normal curve, and use Table A2 to obtain the desired probability. (You should find that the z-scores are +0.55 and +1.14, and that the probability that one randomly selected orange weighs between 8.54 and 9.32 oz is 0.1641 (or 16.41%).)

Since practically all of the area under a normal curve is within three standard deviations of the mean, there are no z-scores in Table A2 below -3.09 or above +3.09. When one orange weight is randomly selected from the population with mean µ = 7.81 oz and standard deviation σ = 1.32 oz, we would consider the probability of observing a weight greater than 20 oz to be practically zero (0), because the z-score of 20 oz is +9.23. Of course, common sense might suggest to us that the likelihood of finding an orange weighing more than 20 oz is extremely small! In a similar fashion, we would conclude that practically all oranges weigh more than 2 oz (for which the z-score is -4.40), or in other words, the probability of selecting an orange weighing more than 2 oz is practically one (1).

It is sometimes desirable to obtain percentiles and percentile ranks from a density curve. We define the pth percentile to be the value with p% of the total area below it and (100 - p)% of the total area above it; also, the percentage of area below a given value x is defined to be the percentile rank of x. To illustrate the use of Table A2 in obtaining percentiles and percentile ranks with a normally distributed population, we shall return to the population of orange weights having a normal distribution with mean µ = 7.81 oz and standard deviation σ = 1.32 oz.

Finding the percentile rank for a value x from a normally distributed population simply amounts to calculating the area under the normal density curve which lies below x. For instance, earlier we found the percentage of oranges weighing less than 6.98 oz to be 26.43%; we can then say that the percentile rank of an orange weighing 6.98 oz is 26.43 or 26. Often, the percent sign (%) is omitted when stating a percentile rank, since a percentile rank is understood to be a percentage. As another illustration, recall that we have previously found that the percentage of oranges weighing less than 10.66 oz is 98.46%; we can then say that the percentile rank of an orange weighing 10.66 oz must be 98.46 or 98.

Finding the pth percentile of a normally distributed population amounts to finding the value below which lies p percent of the area under the normal density curve. Since a normal curve is symmetric, the 50th percentile (i.e., the median) is equal to the mean µ. We determine other percentiles by using Table A2 to find the z-score of the desired percentile and converting this z-score to a raw score. To illustrate, we shall obtain the 90th percentile of the orange weights and the 30th percentile of the orange weights.

To find the 90th percentile, we first realize that the 90th percentile must be greater than the median (which is the mean µ), implying that the z-score of the 90th percentile must be positive. We then use Table A2 to determine the desired z-score, that is, the z-score of the orange weight below which lie 90% of all the orange weights and above which lie 10% of all the orange weights. Since the areas listed in Table A2 are the areas under the normal curve above a positive z-score, by searching for the area closest to 0.10 (which is 0.1003), we find that the desired z-score is +1.28. In Figure 16-8, label the z-score +1.28 on the horizontal axis; if you draw a vertical line through the graph at the location of the z-score +1.28, you should be able to see that roughly 90% of the area under the normal curve lies below this z-score, and 10% of the area under the normal curve lies above this z-score. Recall that we use x = µ + zσ to convert a z-score to a raw score. Convert the z-score +1.28 to a raw score to obtain the 90th percentile of the orange weights. (You should find that the 90th percentile is approximately 9.50 oz.)

To find the 30th percentile of the orange weights, we must first realize that the 30th percentile must be smaller than the median (which is the mean µ), implying that the z-score of the 30th percentile must be negative. We then use Table A2 to determine the desired z-score, that is, the z-score of the orange weight below which lie 30% of all the orange weights and above which lie 70% of all the orange weights. The areas listed in Table A2 are the areas under the normal curve above a positive z-score, but since normal distributions are symmetric, the areas listed can also be treated as the areas under the normal curve below a negative z-score. By searching for the area closest to 0.30 (which is 0.3015), we find that the desired z-score is -0.52. In Figure 16-9, label the z-score -0.52 on the horizontal axis; if you draw a vertical line through the graph at the location of the z-score -0.52, you should be able to see that roughly 30% of the area under the normal curve lies below this z-score, and 70% of the area under the normal curve lies above this z-score. Use x = µ + zσ to convert the z-score -0.52 to a raw score to obtain the 30th percentile of the orange weights. (You should find that the 30th percentile is approximately 7.12 oz.)

We have now illustrated the use of Table A2 in obtaining desired areas under the standard normal curve and in obtaining percentiles. Now that we have thoroughly discussed normal distributions, we once again recall our earlier observation that the sampling distribution of x̄ with simple random samples of sufficiently large size n takes on the properties of a normal distribution; consequently, the sampling distribution of x̄ can be approximated using a normal distribution, and we shall very shortly begin making heavy use of this fact.
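The two-sided areas and the percentiles discussed above can also be obtained without a table. The sketch below is an illustration rather than part of the original unit; it assumes Python 3.8 or later and models the orange weights as NormalDist(7.81, 1.32), with the hypothetical helper names area_between and percentile. Because it does not round z-scores to two decimal places, its results may differ from the Table A2 answers in the third decimal place.

```python
from statistics import NormalDist

# Orange weights in oz, modelled as normal with the mean and
# standard deviation assumed throughout the examples.
weights = NormalDist(mu=7.81, sigma=1.32)

def area_between(low, high):
    """Relative frequency of weights between low and high."""
    return weights.cdf(high) - weights.cdf(low)

def percentile(p):
    """Raw score with p percent of the area below it (x = mu + z * sigma)."""
    return weights.inv_cdf(p / 100)

def percentile_rank(x):
    """Percentage of the area under the curve that lies below x."""
    return 100 * weights.cdf(x)

print(f"P(6 < weight < 8)     = {area_between(6, 8):.4f}")
print(f"P(5.5 < weight < 6.5) = {area_between(5.5, 6.5):.4f}")
print(f"90th percentile       = {percentile(90):.2f} oz")
print(f"30th percentile       = {percentile(30):.2f} oz")
print(f"Percentile rank of 6.98 oz = {percentile_rank(6.98):.1f}")
```

Subtracting two cumulative areas handles all three cases in the text (values on opposite sides of µ, both above µ, both below µ) with one formula, so no mirror-image reasoning is needed.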

Self-Test Problem 16-1

Suppose the right-hand grip strength for men between the ages of 20 and 40 is normally distributed with mean 86.3 lbs and standard deviation 7.8 lbs. Draw a sketch illustrating the probability that one randomly selected male between the ages of 20 and 40 will have a right-hand grip strength
(a) over 90 lbs, and find this probability;
(b) under 96 lbs, and find this probability;
(c) over 70 lbs, and find this probability;
(d) under 75 lbs, and find this probability;
(e) between 85 and 100 lbs, and find this probability;
(f) between 88 and 95 lbs, and find this probability;
(g) between 75 and 82 lbs, and find this probability.
(h) Draw a sketch illustrating the probability that the right-hand grip strength for one randomly selected male is within 4 lbs of the population mean, and find this probability.
(i) Find the percentile rank for a male whose right-hand grip strength is 90 lbs.
(j) Find the quartiles for the distribution of right-hand grip strengths.

Answers to Self-Test Problems

16-1
(a) 0.3192 or 31.92%
(b) 0.8925 or 89.25%
(c) 0.9817 or 98.17%
(d) 0.0735 or 7.35%
(e) 0.5283 or 52.83%
(f) 0.2815 or 28.15%
(g) 0.2177 or 21.77%
(h) 0.3900 or 39.00%
(i) 68.08 or 68
(j) The quartiles are approximately 81.1, 86.3, and 91.5 lbs.

Summary

A normal curve described entirely in terms of z-scores is called a standard normal curve. Tables of standard normal probabilities (e.g., Table A2) provide a detailed description of a normal density curve in terms of z-scores. Using such a table together with the fact that a normal distribution is symmetric, we can obtain areas under a normal density curve, which can be interpreted as relative frequencies or as probabilities, and we can obtain percentile ranks and percentiles. Statistical software packages, spreadsheets, programmable calculators, etc. are often able to supply the same information as a table of standard normal probabilities. Since the sampling distribution of x̄ with simple random samples of sufficiently large size n takes on the properties of a normal distribution, the sampling distribution of x̄ can be approximated using a normal distribution.
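As one illustration of the software route mentioned in the summary, the sketch below applies it to parts of Self-Test Problem 16-1. It is an assumption-laden example, not part of the original unit: it uses Python 3.8 or later, takes the grip-strength population as normal with mean 86.3 lbs and standard deviation 7.8 lbs, and the variable names are illustrative. Since no z-scores are rounded, the results should be close to, but not always identical with, the table-based answers.

```python
from statistics import NormalDist

MEAN, SD = 86.3, 7.8                # grip strength in lbs (assumed population values)
grip = NormalDist(mu=MEAN, sigma=SD)

# (a) probability of a grip strength over 90 lbs
p_over_90 = 1 - grip.cdf(90)

# (h) probability of being within 4 lbs of the population mean
p_within_4 = grip.cdf(MEAN + 4) - grip.cdf(MEAN - 4)

# (i) percentile rank of 90 lbs: percentage of area below 90
rank_of_90 = 100 * grip.cdf(90)

# (j) quartiles: the 25th, 50th, and 75th percentiles
quartiles = [grip.inv_cdf(q) for q in (0.25, 0.50, 0.75)]

print(f"(a) {p_over_90:.4f}")
print(f"(h) {p_within_4:.4f}")
print(f"(i) {rank_of_90:.2f}")
print("(j) " + ", ".join(f"{q:.1f}" for q in quartiles))
```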
