Topic: Goodness of Fit (GoF) tests

Goals: Learn how to use and interpret the following tests:
- Chi-square
- Kolmogorov-Smirnov, Anderson-Darling, Shapiro-Wilk

Assignments: Use the data file Lab3_data_sets.MTW from the lab webpage.

1. Chi-square test for a uniform [0,1] sample in column C1. Perform the chi-square test with 3, 5, and 10 categories. Find the P-values, compare, discuss.

2. Anderson-Darling test for two exponential samples in columns C2 and C3. (Use the probability plot option, which shows the AD test results.) The two samples are from the exponential distribution with mean 5; C2 has length 100, while C3 has length [...]. For each sample, perform the AD test several times for different means of the theoretical distribution and find the limits of the mean for which the P-value stays above 5%. Compare results, discuss.

3. Chi-square test for a Normal(10,2) sample in column C4. Perform the chi-square test for the normal sample, using the cdf transformation that produces a uniform sample. Compare the chi-square P-value with that of the KS, AD, and SW tests. Discuss.

4. Chi-square test for three samples in columns C7-C9. One of these samples is from the U[0,1] distribution, the second significantly deviates from U[0,1], and the third is overfit to U[0,1]. Use the chi-square test to decide which sample is which.

Report: A printed report for this Lab is due on Thursday, March 3 in class. BW printouts are OK. Reports will not be accepted by mail.

1. Introduction

The methods considered in this Lab focus on the following problem: given a sample $X_i$, $i = 1, \dots, n$, and a distribution (cdf) $F(x)$, decide whether the sample comes from this distribution. In other words, we test the hypothesis

$H_0$: $X_i$, $i = 1, \dots, n$, are from the distribution $F(x)$

versus the alternative hypothesis

$H_a$: $X_i$, $i = 1, \dots, n$, are not from the distribution $F(x)$.

These methods complement visual analysis that uses histograms, dot plots, ecdfs, and probability plots. We focus here on continuous distributions $F(x)$.

2. Chi-square test

Recall that the essence of the chi-square test is to count the observed numbers $O_i$ of sample values that fall within the predefined $k$ bins $[x_{i-1}, x_i]$, $i = 1, \dots, k$, and compare them with the expected numbers $E_i = n[F(x_i) - F(x_{i-1})]$. The test statistic used in the chi-square test is defined as

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}.$$

It can be shown that if all $E_i > 5$ and the sample is sufficiently large, then under the null hypothesis the statistic $\chi^2$ is distributed approximately as a chi-square random variable with $(k-1)$ degrees of freedom, $\chi^2_{k-1}$.

Preliminary data transformations

The chi-square test in Minitab works with multinomial distributions, so some preliminary work is needed before it can be applied to continuous data. Specifically, we need to transform our continuous sample into a multinomial sample. To do that, we code the data by replacing each sample value with the index of the category (bin) that the value belongs to.

Case 1: F(x) is the uniform distribution on [0,1]

For the uniform distribution it is convenient to choose an equidistant binning: $[0, 1/k), [1/k, 2/k), \dots, [1-1/k, 1)$.
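A short worked step may help here. Assuming the uniform cdf $F(x) = x$ on [0,1] and the equidistant bin edges $x_i = i/k$, the general definitions of $E_i$ and $\chi^2$ reduce to:

```latex
E_i = n\left[F\!\left(\tfrac{i}{k}\right) - F\!\left(\tfrac{i-1}{k}\right)\right]
    = n\left[\tfrac{i}{k} - \tfrac{i-1}{k}\right]
    = \frac{n}{k}, \qquad i = 1, \dots, k,
\qquad\text{so}\qquad
\chi^2 = \frac{k}{n}\sum_{i=1}^{k}\left(O_i - \frac{n}{k}\right)^2 .
```

Every bin therefore has the same expected count, and the rule $E_i > 5$ becomes $n > 5k$. For instance, a sample of 100 values (an assumed size, used only for illustration) with $k = 10$ bins gives $E_i = 10$.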

The coding can be done using Calc/Calculator: the function floor(x*k) maps the sample values to their class indices 0, 1, ..., k-1 (using k equidistant bins).

Case 2: Arbitrary continuous distribution F(x)

If F(x) is not the uniform [0,1] distribution, we define $U_i = F(X_i)$. If the sample is actually from the distribution F(x) (that is, if the null hypothesis is true), then the sample $U_i$ has the uniform distribution on [0,1] and we can apply the technique of Case 1.

The chi-square test is implemented in the menu Stat/Tables/Chi-Square Goodness-of-Fit Test (One Variable).
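For readers who want to check the coding step outside Minitab, it can be sketched in Python. This is a minimal illustration, assuming NumPy and SciPy are installed; the simulated sample, the seed, the reading of Normal(10,2) as mean 10 and standard deviation 2, and the helper name `code_to_bins` are all assumptions made for the example and are not part of the lab data.

```python
import numpy as np
from scipy import stats


def code_to_bins(u, k):
    """Map values in [0, 1] to bin indices 0..k-1 via floor(u * k)."""
    idx = np.floor(np.asarray(u) * k).astype(int)
    # A value exactly equal to 1 would get index k; put it in the last bin.
    return np.minimum(idx, k - 1)


rng = np.random.default_rng(0)  # illustrative seed
# Stand-in sample, NOT the lab data; scale=2 assumes "2" is the std. deviation.
x = rng.normal(loc=10, scale=2, size=200)

# Case 2: transform with the hypothesized cdf, U_i = F(X_i).
u = stats.norm.cdf(x, loc=10, scale=2)

# Case 1: code the transformed values into k equidistant bins.
k = 5
index = code_to_bins(u, k)
observed = np.bincount(index, minlength=k)

# With no expected counts given, chisquare assumes equal counts in every bin
# and uses k - 1 degrees of freedom.
result = stats.chisquare(observed)
print(observed, result.statistic, result.pvalue)
```

The `index` array plays the role of the coded column that is passed to the Minitab test; the P-value printed here need not match Minitab's output digit for digit.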

We work with the index variable (not the original one!) and choose the Categorical data radio button; the test is Equal proportions.

The test produces some graphs (to be discussed in class) and session output with the details of the analysis and the resulting P-value.

Advantage of the chi-square test:
- Can work with any distribution (discrete or continuous).

Disadvantages of the chi-square test:
- Requires a large number of observations (to ensure convergence).
- Results depend on the chosen bins.
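The dependence on the chosen bins can be explored with a small Python sketch that evaluates the chi-square statistic directly from its definition for several values of k. It assumes NumPy and SciPy; the simulated uniform sample and the helper name `chi_square_uniform` are hypothetical stand-ins, not the lab data or a Minitab routine.

```python
import numpy as np
from scipy import stats


def chi_square_uniform(u, k):
    """Chi-square GoF test of U[0,1] with k equidistant bins.

    Returns (statistic, p_value), using k - 1 degrees of freedom.
    """
    u = np.asarray(u)
    n = u.size
    index = np.minimum(np.floor(u * k).astype(int), k - 1)
    observed = np.bincount(index, minlength=k)
    expected = np.full(k, n / k)  # E_i = n / k for equidistant bins
    statistic = np.sum((observed - expected) ** 2 / expected)
    p_value = stats.chi2.sf(statistic, df=k - 1)
    return statistic, p_value


rng = np.random.default_rng(1)  # illustrative seed
u = rng.uniform(0.0, 1.0, size=100)  # stand-in for a uniform sample

for k in (3, 5, 10):
    statistic, p_value = chi_square_uniform(u, k)
    print(f"k={k:2d}  chi2={statistic:.3f}  P={p_value:.3f}")
```

Running the loop on the same sample typically gives a different P-value for each k, which is the effect the second disadvantage refers to.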

3. Kolmogorov-Smirnov, Anderson-Darling, and Shapiro-Wilk tests

These tests are implemented in Minitab only for testing the Normal distribution (although the KS and AD tests can be applied to other distributions as well). The tests can be accessed via the menu Stat/Basic Statistics/Normality Test.

In the next menu, you choose the variable to analyze, the test to perform, and some other options that will be discussed in class. The test results are summarized in the output figure.
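As a rough cross-check outside Minitab, the three tests are also available in SciPy. The sketch below assumes NumPy and SciPy; the simulated sample, the seed, and the reading of Normal(10,2) as mean 10 and standard deviation 2 are assumptions for illustration. The SciPy routines may estimate parameters and compute P-values differently from Minitab, so the numbers are not expected to agree exactly.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # illustrative seed
# Stand-in sample, NOT the lab data; scale=2 assumes "2" is the std. deviation.
x = rng.normal(loc=10, scale=2, size=100)

# Kolmogorov-Smirnov against the fully specified Normal(10, 2) cdf.
ks = stats.kstest(x, "norm", args=(10, 2))
print(f"KS: D={ks.statistic:.4f}  P={ks.pvalue:.3f}")

# Anderson-Darling for normality; here SciPy estimates the mean and standard
# deviation from the data and reports critical values rather than a P-value.
ad = stats.anderson(x, dist="norm")
print(f"AD: A2={ad.statistic:.4f}")
for level, critical in zip(ad.significance_level, ad.critical_values):
    print(f"    reject at the {level}% level if A2 > {critical}")

# Shapiro-Wilk for normality (mean and standard deviation are not specified).
sw = stats.shapiro(x)
print(f"SW: W={sw.statistic:.4f}  P={sw.pvalue:.3f}")
```

Note the difference in what is tested: the KS call checks the fully specified Normal(10, 2) distribution, while the AD and SW calls check normality with unspecified parameters.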
