2 Objectives
Learning Objective
- To understand the topic of the normal distribution and its importance in different disciplines.
Performance Objectives
At the end of this lecture the student will be able to:
- Draw normal distribution curves and calculate the standard score (z score).
- Apply the basic knowledge of normal distribution to solve problems.
- Interpret the results of the problems.
Tripthi M. Mathew, MD, MPH

4 What is Normal (Gaussian) Distribution?
- The normal distribution is a descriptive model that describes real-world situations.
- It is defined as a continuous frequency distribution of infinite range (it can take any value, not just integers as in the case of the binomial and Poisson distributions).
- It is the most important probability distribution in statistics and an important tool in the analysis of epidemiological data and in management science.
The normal distribution is also known as the Gaussian distribution, and its curve as the Gaussian curve, named after the German mathematician-astronomer Carl Friedrich Gauss.

5 Characteristics of Normal Distribution
- It links frequency distribution to probability distribution.
- It has a bell-shaped curve and is symmetric.
- It is symmetric around the mean: the two halves of the curve are the same (mirror images).

6 Characteristics of Normal Distribution Cont’d
- Hence Mean = Median.
- The total area under the curve is 1 (or 100%).
- Normal Distribution has the same shape as Standard Normal Distribution.

8 Z Score (Standard Score) [3]
Z = (X - μ) / σ
- Z indicates how many standard deviations away from the mean the point X lies.
- The Z score is calculated to 2 decimal places.
The relationship between the normal variable X and Z is given by the Z score, or standard score, formula. Mu (μ) is the mean and sigma (σ) is the standard deviation of the population.
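The formula on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `z_score` and the example values (mean 70, standard deviation 10) are assumptions chosen for the demonstration, not part of the slide.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Return how many standard deviations x lies from the mean mu.

    The result is rounded to 2 decimal places, as the slide recommends.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return round((x - mu) / sigma, 2)


# Illustrative values (assumed): mean 70, standard deviation 10.
print(z_score(80, 70, 10))  # one standard deviation above the mean
print(z_score(55, 70, 10))  # one and a half standard deviations below the mean
```

A positive result means the point lies above the mean, a negative result means it lies below.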

9 Tables
Areas under the standard normal curve (See Normal Table)
The value of z is calculated using the Z score formula. The area under the curve that corresponds to that z value can then be looked up in tables of the standard normal curve, which are found in the appendices of most statistics or modelling textbooks.
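Where a printed table is not at hand, the same areas can be computed directly. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper names `area_below` and `area_above` are hypothetical and introduced only for this illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def area_below(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return STANDARD_NORMAL.cdf(z)


def area_above(z: float) -> float:
    """Area under the standard normal curve to the right of z."""
    return 1.0 - STANDARD_NORMAL.cdf(z)


# Print a small excerpt of a z table.
print("   z   below   above")
for z in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    print(f"{z:4.2f}  {area_below(z):.4f}  {area_above(z):.4f}")
```

Printed tables differ in which area they report (below z, above z, or between the mean and z), so it is worth checking the table heading before reading off a value.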

10 Diagram of Normal Distribution Curve (z distribution)
[Figure: normal distribution curve centered on μ; areas of the successive standard-deviation bands on each side of the mean: 34.1%, 13.6%, 2.2%, 0.15%]
This is the diagram of a normal distribution curve, or z distribution. Note the bell shape of the curve and that its tails do not touch the horizontal axis. As I mentioned earlier, the area under the curve equals 1, or 100%. Therefore, each half of the distribution on either side of the mean equals 50%. The area from the mean up to +1 standard deviation is about 34.1%, the area between +1 and +2 standard deviations is 13.6%, the area between +2 and +3 standard deviations is 2.2%, and the area above +3 standard deviations is about 0.15%. Since the other half is a mirror image, the area between the mean and -1 standard deviation is the same as the area between the mean and +1 standard deviation, i.e. about 34.1%. The same holds for -2 and +2 standard deviations, and so forth.
Modified from Dawson-Saunders, B & Trapp, RG. Basic and Clinical Biostatistics, 2nd edition, 1994.

11 Distinguishing Features
- The mean ± 1 standard deviation covers about 68.3% of the area under the curve.
- The mean ± 2 standard deviations covers about 95% (more precisely 95.4%) of the area under the curve.
- The mean ± 3 standard deviations covers 99.7% of the area under the curve.
Therefore, we can see that the mean ± 1 SD contains about 68.3% of the area under the curve, i.e. the total area between -1 SD and +1 SD (about 34.1% on each side of the mean). Similarly, the mean ± 2 SD contains about 95% of the area under the normal curve, and the mean ± 3 SD contains 99.7% of the area under the normal curve.
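These coverage figures can be checked numerically. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library; the function name `coverage_within` is an assumption made for this example.

```python
from statistics import NormalDist


def coverage_within(k: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(f"mean ± {k} SD covers {coverage_within(k):.1%} of the area")
```

The result does not depend on the particular mean and standard deviation passed in, which is why the same three percentages apply to every normal distribution.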

12 Skewness
- Positive Skewness: Mean > Median
- Negative Skewness: Median > Mean
- Pearson’s Coefficient of Skewness [3] = 3 (Mean - Median) / Standard deviation
As I mentioned earlier, Mean = Median in the normal distribution curve. However, when the mean is greater than the median, the distribution has positive skewness (that is, the tail is to the right). When the median is greater than the mean, the tail is to the left, that is, there is negative skewness. The skewness can be calculated using Pearson’s Coefficient of Skewness.
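Pearson’s coefficient can be computed from raw data as in the sketch below. Two assumptions are made here: the sample standard deviation (`statistics.stdev`) is used, since the slide does not say whether the sample or population form is meant, and the data sets are made up purely for illustration.

```python
from statistics import mean, median, stdev


def pearson_skewness(values: list[float]) -> float:
    """Pearson's coefficient of skewness: 3 * (mean - median) / SD."""
    sd = stdev(values)  # sample standard deviation (an assumption, see lead-in)
    if sd == 0:
        raise ValueError("standard deviation is zero; skewness is undefined")
    return 3 * (mean(values) - median(values)) / sd


# Made-up data: one long tail to the right, and its mirror image.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]
left_tailed = [-v for v in right_tailed]
symmetric = [1, 2, 3, 4, 5]

print(pearson_skewness(right_tailed))  # positive: mean > median
print(pearson_skewness(left_tailed))   # negative: median > mean
print(pearson_skewness(symmetric))     # zero: mean = median
```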

16 Exercise # 1
Then: 1) What area under the curve is above 80 beats/min?
We know that Z = (X - M)/SD, with X = 80, M = 70 and SD = 10, so we have to find the value of Z. For this we need to draw the figure and find the area that corresponds to Z.
Modified from Dawson-Saunders, B & Trapp, RG. Basic and Clinical Biostatistics, 2nd edition, 1994.

17 Diagram of Exercise # 1
[Figure: normal distribution curve with the axis marked from -3 to +3 standard deviations around μ; the area above +1 standard deviation (0.159) is shaded]
Since M = 70 and SD = 10, the area under the curve above 80 beats per minute corresponds to the area above +1 standard deviation. The total shaded area above +1 standard deviation is 15.9%, or 15.9/100 = 0.159 as a proportion. We can also find the value of Z by substituting the values in the formula Z = (X - M)/SD. Therefore, Z = (80 - 70)/10 = 1.00, and the area from the table for a z value of 1.00 is 0.159. How do we interpret this? It means that 15.9% of normal healthy individuals have a heart rate more than one standard deviation above the mean (greater than 80 beats per minute).
The exercises are modified from examples in Dawson-Saunders, B & Trapp, RG. Basic and Clinical Biostatistics, 2nd edition, 1994.

25 Diagram of Exercise # 5
[Figure: normal distribution curve with the axis marked from -3 to +3 standard deviations around μ; the two tails beyond ±3 standard deviations (0.0015 each) are shaded]
In this question, we need to calculate Z1 and Z2. Z1 = (100 - 70)/10, which is equal to 3; the table area for a z value of 3 is 0.0015. Similarly, Z2 = (40 - 70)/10, which is equal to -3, and the corresponding area is also 0.0015. And so the sum of the two tail areas is 0.003, or 0.3%. But how do we interpret this value? Please see the solution/answer #5 slide for its interpretation.

26 Solution/Answers
1) 15.9% or 0.159
2) 2.3% or 0.023
3) 95.4% or 0.954
Calculation of the problems and interpretation of results:
1) For the calculation of exercise # 1 see the earlier slide. The result is 15.9%. This means that 15.9% of normal healthy individuals have a heart rate more than one standard deviation above the mean (greater than 80 beats per minute).
2) Calculation for exercise # 2: Z = (X - μ)/σ = (90 - 70)/10 = 20/10 = 2.00. In the normal distribution tables, a z value of 2.00 corresponds to 0.023, or 2.3%. This means that 2.3% of normal healthy individuals have a heart rate more than two standard deviations above the mean (greater than 90 beats per minute).
3) Calculation for exercise # 3: Z = (X - μ)/σ, so Z1 = (50 - 70)/10 = -20/10 = -2.00 and Z2 = (90 - 70)/10 = 2.00. The area between -2 and +2 standard deviations from the z tables is 0.954, or 95.4%. This means that 95.4% have a heart rate between -2 and +2 standard deviations (between 50 and 90 beats per minute).
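The first three answers can be reproduced without a table. This sketch assumes the heart-rate distribution used in the exercises (mean 70, standard deviation 10 beats per minute) and uses `statistics.NormalDist` from the Python standard library; the variable names are chosen for this illustration only.

```python
from statistics import NormalDist

# Heart rate of healthy individuals, as in the exercises: mean 70, SD 10.
heart_rate = NormalDist(mu=70, sigma=10)

# Exercise 1: area above 80 beats/min (z = +1).
above_80 = 1 - heart_rate.cdf(80)

# Exercise 2: area above 90 beats/min (z = +2).
above_90 = 1 - heart_rate.cdf(90)

# Exercise 3: area between 50 and 90 beats/min (z = -2 to z = +2).
between_50_and_90 = heart_rate.cdf(90) - heart_rate.cdf(50)

print(f"Exercise 1: {above_80:.3f}")
print(f"Exercise 2: {above_90:.3f}")
print(f"Exercise 3: {between_50_and_90:.3f}")
```

Rounded to three decimal places, the computed areas agree with the table values 0.159, 0.023 and 0.954.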

27 Solution/Answers Cont’d
4) 0.15% or 0.0015
5) 0.3% or 0.003 (0.15% or 0.0015 for each tail)
4) Calculation for exercise # 4: again, Z = (X - μ)/σ = (100 - 70)/10 = 30/10 = 3.00. From the z tables, a z value of 3.00 corresponds to 0.0015, or 0.15%. This means that only 0.15% have a heart rate more than 3 standard deviations above the mean (greater than 100 beats per minute).
5) For the calculations for this question, please see the earlier slide on this problem and its diagram. The answer is 0.3%. This means that only 0.3% have a heart rate more than 3 standard deviations below or above the mean (less than 40 or greater than 100 beats per minute).
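Exercises 4 and 5 can be checked the same way. The sketch again assumes mean 70 and standard deviation 10 beats per minute and uses `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

# Heart rate of healthy individuals, as in the exercises: mean 70, SD 10.
heart_rate = NormalDist(mu=70, sigma=10)

# Exercise 4: area above 100 beats/min (z = +3), a single upper tail.
above_100 = 1 - heart_rate.cdf(100)

# Exercise 5: area below 40 or above 100 beats/min (both tails beyond ±3 SD).
below_40 = heart_rate.cdf(40)
outside_40_to_100 = below_40 + above_100

print(f"Exercise 4: {above_100:.4%}")
print(f"Exercise 5: {outside_40_to_100:.4%}")
```

The computed areas come out slightly smaller than the slide values (roughly 0.13% per tail and 0.27% for both tails). The slide figures of 0.15% and 0.3% are rounded values that follow from the 99.7% rule.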

28 Application/Uses of Normal Distribution
- Its application goes beyond describing distributions.
- It is used by researchers and modelers.
- The major use of the normal distribution is the role it plays in statistical inference.
- The z score, along with the t score, chi-square and F statistics, is important in hypothesis testing.
- It helps managers/management make decisions.