Statistics for Quantitative Analysis

1 Statistics for Quantitative Analysis
CHM 235 – Dr. Skrabal
Statistics: a set of mathematical tools used to describe and make judgments about data.
The type of statistics discussed in this class carries an important assumption: experimental variation in the population from which samples are drawn has a normal (Gaussian, bell-shaped) distribution.
- Parametric vs. non-parametric statistics

2 Normal distribution
Infinite members of group: population.
Characterize the population by taking samples.
The larger the number of samples, the closer the distribution becomes to normal.
Equation of normal distribution:
$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$$
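
The equation of the normal distribution can be evaluated directly. The following is a minimal sketch, assuming the usual Gaussian form with population mean mu and population standard deviation sigma; the function name `normal_pdf` is introduced here for illustration only.

```python
import math

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Height of the normal (Gaussian) curve at x for mean mu and std. dev. sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # distance from the mean in units of sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# The curve is highest at the mean and symmetric about it.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x = {x:+.1f}: y = {normal_pdf(x, mu=0.0, sigma=1.0):.4f}")
```

A larger sigma gives a lower, broader curve; the total area under the curve stays equal to 1.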

4 Normal distribution
Degree of scatter (a measure of dispersion) of the population is quantified by calculating the standard deviation.
Std. dev. of population = $\sigma$
Std. dev. of sample = $s$
Characterize the sample by calculating the mean ($\bar{x}$) and standard deviation ($s$).

6 Standard deviation and the normal distribution
There is a well-defined relationship between the std. dev. of a population and the normal distribution of the population:
- $\mu \pm 1\sigma$ encompasses 68.3% of measurements
- $\mu \pm 2\sigma$ encompasses 95.5% of measurements
- $\mu \pm 3\sigma$ encompasses 99.7% of measurements
(May also consider these as percentages of the area under the curve.)
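
These percentages can be checked numerically. The sketch below uses the error function from the Python standard library; the helper name `fraction_within` is hypothetical. The two-sigma figure evaluates to about 95.45%, which is usually quoted as 95.4% or 95.5%.

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of a normal population within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))

for k in (1, 2, 3):
    print(f"within {k} std. dev.: {100 * fraction_within(k):.2f}% of measurements")
```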

7 Example of mean and standard deviation calculation
Consider Cu data: 5.23, 5.79, 6.21, 5.88, 6.02 nM
$\bar{x} \approx 5.82$ nM
$s \approx 0.36$ nM
Answer: 5.82 ± 0.36 nM or 5.8 ± 0.4 nM
Learn how to use the statistical functions on your calculator. Do this example by longhand calculation once, and also by calculator to verify that you get exactly the same answer. Then use your calculator for all future calculations.
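
As a cross-check on the longhand and calculator results, the same calculation can be sketched with the Python standard library. Run on the five values as transcribed here, it gives a mean of about 5.826 nM and a sample standard deviation of about 0.369 nM, in line with the rounded answer of 5.8 ± 0.4 nM. The variable names are illustrative.

```python
import statistics

cu_nM = [5.23, 5.79, 6.21, 5.88, 6.02]  # Cu data from the example, in nM

mean = statistics.mean(cu_nM)
s = statistics.stdev(cu_nM)  # sample std. dev. (n - 1 in the denominator)

print(f"mean = {mean:.3f} nM, s = {s:.3f} nM")
print(f"reported: {mean:.1f} ± {s:.1f} nM")
```

Note that `statistics.stdev` is the sample standard deviation; `statistics.pstdev` divides by n instead and is not the quantity used in this example.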

9 Standard error
Tells us that the uncertainty in the mean of a set of samples should decrease if we take more measurements.
Standard error = $s / \sqrt{n}$
Take twice as many measurements, and the standard error decreases by a factor of $\sqrt{2} \approx 1.4$.
Take 4x as many measurements, and the standard error decreases by a factor of $\sqrt{4} = 2$.
There are several quantitative ways to determine the sample size required to achieve a desired precision for various statistical applications. Can consult statistics textbooks for further information; e.g. J.H. Zar, Biostatistical Analysis
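
The effect of sample size can be illustrated with a short sketch, assuming the usual definition standard error = s / sqrt(n) and holding s fixed at the 0.36 nM quoted in the Cu example. The function name `standard_error` is introduced here for illustration.

```python
import math

def standard_error(s: float, n: int) -> float:
    """Standard error of the mean for sample std. dev. s and n measurements."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return s / math.sqrt(n)

s = 0.36  # nM, sample std. dev. quoted in the Cu example
for n in (5, 10, 20):
    print(f"n = {n:2d}: standard error = {standard_error(s, n):.3f} nM")
```

Going from 5 to 20 measurements (4x as many) halves the standard error, provided s itself does not change.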

10 Variance
Used in many other statistical calculations and tests.
Variance = $s^2$
From the previous example, $s = 0.36$, so $s^2 = (0.36)^2 = 0.1296$ (not rounded because it is usually used in further calculations).

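11 Average deviation
Another way to express the degree of scatter or uncertainty in data. Not as statistically meaningful as standard deviation, but useful for small samples.
Average deviation: $\bar{d} = \frac{\sum_i |x_i - \bar{x}|}{n}$
Using the previous Cu data, $\bar{d} \approx 0.25$ nM.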
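
A minimal sketch of the calculation, assuming the average deviation is defined as the mean of the absolute deviations from the sample mean (dividing by n). The function name is hypothetical.

```python
def average_deviation(values):
    """Mean of the absolute deviations from the sample mean."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

cu_nM = [5.23, 5.79, 6.21, 5.88, 6.02]  # Cu data from the earlier example, in nM
print(f"average deviation = {average_deviation(cu_nM):.2f} nM")
```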

13 Some useful statistical tests
To characterize or make judgments about data.
Tests that use the Student’s t distribution:
- Confidence intervals
- Comparing a measured result with a “known” value
- Comparing replicate measurements (comparison of means of two sets of data)

15 Confidence intervals
Quantifies how far the true mean ($\mu$) lies from the measured mean, $\bar{x}$. Uses the mean and standard deviation of the sample.
$$\mu = \bar{x} \pm \frac{t s}{\sqrt{n}}$$
where t is from the t-table and n = number of measurements.
Degrees of freedom (df) = n - 1 for the CI.
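
The confidence interval calculation can be sketched as follows, assuming the usual form mean ± t·s/sqrt(n). The small lookup table holds standard two-tailed t values at the 95% confidence level for df = 1 to 6 only; the table name `T_95` and the function name are introduced here for illustration, and a full t-table would be needed for larger samples.

```python
import math
import statistics

# Two-tailed Student's t at the 95% confidence level, keyed by degrees of freedom.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447}

def confidence_interval_95(values):
    """Return (mean, half-width) of the 95% CI using df = n - 1."""
    n = len(values)
    df = n - 1
    if df not in T_95:
        raise ValueError(f"no tabulated t value for df = {df}")
    half_width = T_95[df] * statistics.stdev(values) / math.sqrt(n)
    return statistics.mean(values), half_width

cu_nM = [5.23, 5.79, 6.21, 5.88, 6.02]  # Cu data from the earlier example, in nM
mean, half_width = confidence_interval_95(cu_nM)
print(f"95% CI: {mean:.2f} ± {half_width:.2f} nM")
```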

17 Interpreting the confidence interval
For a 95% CI, there is a 95% probability that the true mean ($\mu$) lies within the range 1.3 ± 0.2 nM, or between 1.1 and 1.5 nM.
For a 50% CI, there is a 50% probability that the true mean lies within the range 1.3 ± 0.05 nM, or between 1.25 and 1.35 nM.
Note that the width of the CI will decrease as n is increased.
Useful for characterizing data that are regularly obtained; e.g., quality assurance, quality control.

18 Comparing a measured result with a “known” value
“Known” value would typically be a certified value from a standard reference material (SRM).
Another application of the t statistic.
Will compare tcalc to tabulated value of t at appropriate df and CL.
df = n - 1 for this test.
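
The formula for tcalc is not reproduced in this transcript. The sketch below assumes the standard form tcalc = |mean - known value| · sqrt(n) / s, and uses a hypothetical certified value of 5.50 nM purely for illustration; the function and variable names are likewise assumptions.

```python
import math
import statistics

def t_calc_vs_known(values, known):
    """Return (t_calc, df) with t_calc = |mean - known| * sqrt(n) / s and df = n - 1."""
    n = len(values)
    s = statistics.stdev(values)
    t_calc = abs(statistics.mean(values) - known) * math.sqrt(n) / s
    return t_calc, n - 1

cu_nM = [5.23, 5.79, 6.21, 5.88, 6.02]  # Cu data from the earlier example, in nM
certified_nM = 5.50  # hypothetical certified value, for illustration only
t_table = 2.776      # tabulated t for df = 4 at the 95% CL

t_calc, df = t_calc_vs_known(cu_nM, certified_nM)
verdict = "significantly different" if t_calc > t_table else "not significantly different"
print(f"t_calc = {t_calc:.3f}, df = {df}: {verdict} at the 95% CL")
```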

20 Comparing replicate measurements or comparing means of two sets of data
Yet another application of the t statistic.
Example: Given the same sample analyzed by two different methods, do the two methods give the “same” result?
Will compare tcalc to tabulated value of t at appropriate df and CL.
df = n1 + n2 – 2 for this test.
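
The worked formula for this test is not preserved in the transcript. A minimal sketch follows, assuming the standard pooled-standard-deviation form of the t-test, which is consistent with df = n1 + n2 - 2. The two data sets are hypothetical and are not the data from the slides.

```python
import math
import statistics

def pooled_t(a, b):
    """Return (t_calc, df) for comparing two means with a pooled std. dev."""
    n1, n2 = len(a), len(b)
    s1, s2 = statistics.stdev(a), statistics.stdev(b)
    df = n1 + n2 - 2
    # Pooled std. dev.: each variance weighted by its own degrees of freedom.
    s_pooled = math.sqrt((s1**2 * (n1 - 1) + s2**2 * (n2 - 1)) / df)
    diff = abs(statistics.mean(a) - statistics.mean(b))
    t_calc = diff / s_pooled * math.sqrt(n1 * n2 / (n1 + n2))
    return t_calc, df

method_1 = [20.1, 20.3, 20.2, 20.4]  # hypothetical results, method 1
method_2 = [20.6, 20.8, 20.7, 20.9]  # hypothetical results, method 2

t_calc, df = pooled_t(method_1, method_2)
print(f"t_calc = {t_calc:.3f}, df = {df}")
```

The resulting tcalc is then compared with the tabulated t at that df and the chosen confidence level.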

22 Comparing replicate measurements or comparing means of two sets of data: example
Note: Keep 3 decimal places to compare to ttable.
Compare to ttable at df = n1 + n2 – 2 = 6 and 95% CL.
ttable(df=6, 95% CL) = 2.447
If |tcalc| < ttable, results are not significantly different at the 95% CL.
If |tcalc| > ttable, results are significantly different at the 95% CL.
Since |tcalc| (5.056) > ttable (2.447), results from the two methods are significantly different at the 95% CL.

23 Comparing replicate measurements or comparing means of two sets of data
Wait a minute! There is an important assumption associated with this t-test:
It is assumed that the standard deviations (i.e., the precision) of the two sets of data being compared are not significantly different.
How do you test to see if the two std. devs. are different?
How do you compare two sets of data whose std. devs. are significantly different?

24 F-test to compare standard deviations
Used to determine if std. devs. are significantly different before application of the t-test to compare replicate measurements or compare means of two sets of data.
Also used as a simple general test to compare the precision (as measured by the std. devs.) of two sets of data.
Uses the F distribution.
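
The F-test formula itself is not reproduced in this transcript. The sketch below assumes the common convention Fcalc = (larger variance) / (smaller variance), so that Fcalc is at least 1, with df = n - 1 for each data set. The data are hypothetical, and Ftable must still be looked up for the two df values.

```python
import statistics

def f_calc(a, b):
    """Return (F_calc, df_numerator, df_denominator), larger variance on top."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    if var_a >= var_b:
        return var_a / var_b, len(a) - 1, len(b) - 1
    return var_b / var_a, len(b) - 1, len(a) - 1

set_1 = [10.0, 10.4, 9.6, 10.2, 9.8]     # hypothetical, less precise
set_2 = [10.0, 10.1, 9.9, 10.05, 9.95]   # hypothetical, more precise

F, df_num, df_den = f_calc(set_1, set_2)
print(f"F_calc = {F:.2f} with df = ({df_num}, {df_den})")
```

If Fcalc exceeds Ftable, the standard deviations are significantly different at the chosen confidence level.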

28 Comparing replicate measurements or comparing means of two sets of data: revisited
The use of the t-test for comparing means was justified for the previous example because we showed that the standard deviations of the two sets of data were not significantly different.
If the F-test shows that the std. devs. of two sets of data are significantly different and you need to compare the means, use a different version of the t-test.

29 Comparing replicate measurements or comparing means from two sets of data when std. devs. are significantly different
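
The equations for this slide did not survive in the transcript. As a hedged sketch, the unequal-variance comparison is commonly done with Welch's t-test, with degrees of freedom from the Welch–Satterthwaite approximation; whether the slide used exactly this df expression is an assumption. The function name and data are hypothetical.

```python
import math
import statistics

def welch_t(a, b):
    """Return (t_calc, df) for comparing two means without pooling the variances."""
    n1, n2 = len(a), len(b)
    u1 = statistics.variance(a) / n1  # squared standard error of mean 1
    u2 = statistics.variance(b) / n2  # squared standard error of mean 2
    t_calc = abs(statistics.mean(a) - statistics.mean(b)) / math.sqrt(u1 + u2)
    # Welch-Satterthwaite approximation; generally not an integer.
    df = (u1 + u2) ** 2 / (u1**2 / (n1 - 1) + u2**2 / (n2 - 1))
    return t_calc, df

precise = [20.1, 20.3, 20.2, 20.4]  # hypothetical, small scatter
noisy = [20.2, 21.4, 20.6, 21.0]    # hypothetical, larger scatter

t_calc, df = welch_t(precise, noisy)
print(f"t_calc = {t_calc:.3f}, df = {df:.1f}")
```

The computed df is usually rounded to an integer before looking up ttable.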

30 Flowchart for comparing means of two sets of data or replicate measurements
Use the F-test to see if the std. devs. of the 2 sets of data are significantly different or not.
- Std. devs. are significantly different: use the 2nd version of the t-test (the beastly version).
- Std. devs. are not significantly different: use the 1st version of the t-test (see previous, fully worked-out example).

31 One last comment on the F-test
Note that the F-test can be used simply to test whether or not two sets of data have statistically similar precisions.
Can use to answer a question such as: Do method one and method two provide similar precisions for the analysis of the same analyte?

32 Evaluating questionable data points using the Q-test
Need a way to test questionable data points (outliers) in an unbiased way.
The Q-test is a common method to do this.
Requires 4 or more data points to apply.
Calculate Qcalc and compare to Qtable:
Qcalc = gap/range
Gap = (difference between questionable data pt. and its nearest neighbor)
Range = (largest data point – smallest data point)
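
A minimal sketch of the Qcalc calculation, applied to the Cu data from the earlier example with the lowest value treated as the questionable point. The function name `q_calc` is hypothetical, and the Qtable value of 0.64 (n = 5, 90% confidence) is taken from commonly tabulated Q values rather than from these slides.

```python
def q_calc(values, suspect_is_lowest=True):
    """Q_calc = gap / range for the lowest or highest value in the data set."""
    data = sorted(values)
    if len(data) < 4:
        raise ValueError("Q-test requires 4 or more data points")
    data_range = data[-1] - data[0]
    if data_range == 0:
        raise ValueError("all values are identical; Q is undefined")
    # Gap: distance from the questionable point to its nearest neighbor.
    gap = data[1] - data[0] if suspect_is_lowest else data[-1] - data[-2]
    return gap / data_range

cu_nM = [5.23, 5.79, 6.21, 5.88, 6.02]  # Cu data, in nM; 5.23 is the suspect point
q_table = 0.64  # commonly tabulated Q for n = 5 at 90% confidence (assumption)

q = q_calc(cu_nM, suspect_is_lowest=True)
decision = "reject" if q > q_table else "retain"
print(f"Q_calc = {q:.3f}, Q_table = {q_table}: {decision} the point")
```

If Qcalc exceeds Qtable, the questionable point may be discarded; otherwise it is kept.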