Multiple Regression Analysis (ANCOVA)


Chapter 16 Multiple Regression Analysis (ANCOVA)

In many cases biologists are interested in comparing the regression equations of two or more sets of regression data. The question is whether the sample slopes (b) are estimates of the same or of different population slopes (β). If it is concluded that the regression lines are parallel, that is, that the slopes are equal, there is often further interest in whether the regressions have the same elevation (i.e. the same y-intercept, α) and therefore coincide. The test for coincidence (same elevation, or same y-intercept) can only be carried out for lines that have been shown to have no statistically significant difference in slope.

When there are only two lines, a t-test can be used to test the difference between slopes and, if the slopes are equal, another t-test can be used to determine whether the difference in elevation is significant. However, t-tests cannot be used when there are more than two lines; they are therefore limited in their application and their use is not covered in this course.

The procedure that must be used to compare more than two lines is known as analysis of covariance (ANCOVA). It allows for testing Ho: β1 = β2 = β3 = ... = βk against the alternative hypothesis that the regression lines were not all derived from samples estimating populations with equal slopes. ANCOVA can also be used to compare only two lines, where Ho: β1 = β2 and Ha: β1 ≠ β2. Furthermore, ANCOVA allows for testing whether the elevations are equal for those regression lines that have been shown to come from populations with equal slopes. The null hypothesis is simply that the elevations are equal (the lines coincide), while the alternative hypothesis is that the elevations are not equal (the lines do not coincide).
To illustrate how ANCOVA works, let's consider the relationship of oxygen consumption to air temperature in two bird species that co-occur in semi-desert areas of the southwest (Table 16.1, Figure 16.1). Inspection of the figure indicates that a) there appears to be a negative relationship between oxygen consumption and ambient air temperature in both species, and b) the relationships appear very similar for the two species.

The first step in the analysis of these data is to test whether there is a significant relationship between air temperature and oxygen consumption for each species; in other words, to test Ho: β = 0. If Ho is accepted for both bird species, then one concludes that there is no relationship between oxygen consumption and air temperature (the Y and X axes) in either species, and regression statistics are inappropriate. The appropriate analysis would then be an ANOVA to determine whether the mean Y values (e.g. oxygen consumption) came from the same population. If Ho: β = 0 is accepted for one species but not the other, then the analysis is complete and it is concluded that there is a relationship between oxygen consumption and air temperature for only one species. Finally, if Ho: β = 0 is rejected for both species, then ANCOVA can be applied to determine whether the relationships are the same for the two species.

Table 16.1. Oxygen consumption (ml O2 g-1 h-1) of the cardinal and pyrrhuloxia at air temperatures of 12.9, 16.0, 20.1, 22.5, 25.0 and 27.5 °C.
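The first step, testing Ho: β = 0 for a single line, can be sketched in a few lines of Python. This is only an illustration of the usual least-squares arithmetic, not the Regression.XLT template itself; the function name and the data are hypothetical (the numbers below are made up and are not the bird measurements).

```python
def regression_f_test(xs, ys):
    """Fit a least-squares line and compute the F statistic for Ho: beta = 0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Corrected sums of squares and cross products
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    ss_regression = sxy ** 2 / sxx        # explained variation, df = 1
    ss_residual = syy - ss_regression     # unexplained variation, df = n - 2
    df_residual = n - 2

    return {
        "slope": slope,
        "intercept": intercept,
        "r2": ss_regression / syy,        # SSregression / SStotal
        "F": ss_regression / (ss_residual / df_residual),
        "df": (1, df_residual),
    }


# Hypothetical data: temperature (X) and oxygen consumption (Y)
temps = [10.0, 15.0, 20.0, 25.0, 30.0]
oxygen = [5.1, 4.4, 3.4, 2.7, 1.9]

fit = regression_f_test(temps, oxygen)
print(fit)
```

The calculated F is then compared with the critical F for 1 and n - 2 degrees of freedom; the sketch deliberately stops short of looking up the P value.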

Figure 16.1. Relationship of oxygen consumption (ml O2/g/h) and air temperature (°C) in two species of bird.

The regression analysis done on the two sets of data using the Excel template Regression.XLT is shown in Table 16.2. In both bird species r2 (the ratio of SSregression to SStotal) is quite high, which indicates that the relationship between Y and X explains most of the variation in the Y values. Furthermore, the F values for Ho: β = 0 are quite high, with P values well below the 0.05 level. Biologically it can be concluded that there is a relationship between oxygen consumption and ambient air temperature in both bird species, and regression statistics are appropriate for these data.

Table 16.2. The relationship between oxygen consumption (ml O2 g-1 h-1) and air temperature (°C) for two Arizona bird species (n = 6 for both the Cardinal and the Pyrrhuloxia). Columns: n, equation, r2, s a, s b, s yx.

The next step is to compare the relationships in the two species of birds to see if the slopes, or regression coefficients, are statistically equal (Ho: β1 = β2). This is done using the Excel template ANCOVA&SNK.XLT (Table 16.3), which is based on calculations done in the regression analysis. An F statistic is computed as

F = MS regression coefficient / MS pooled [16.1]

This is a one-sided test of Ho: MS regression coefficient ≤ MS pooled against Ha: MS regression coefficient > MS pooled. If this Ho is accepted, then the slopes are statistically equal; however, if Ho: MS regression coefficient ≤ MS pooled is rejected, then the slopes are considered to be statistically ("significantly") not equal.

The pooled MS can be thought of as the average MS residual of all the lines involved. The regression coefficient MS is derived from the difference between the unexplained variation about an average slope (the common SS) and the unexplained variation summed over the individual lines (the pooled SS). It basically expresses the variation resulting from differences between slopes. The more closely the slopes parallel one another, the smaller the MS regression coefficient becomes, and it increases in value as the slopes diverge from one another. It would generally be expected to be smaller than the average unexplained variation of the individual lines; this leads to the Ho that the regression coefficient MS will be less than or equal to the pooled MS.

Calculating ANCOVA Using EXCEL

The first step using the Excel template ANCOVA&SNK.XLT is to copy all the uncorrected sums of squares that were calculated during each of the regression analyses (Figure 16.2).

Figure 16.2. Uncorrected sums of squares calculated during each regression analysis and transferred to the Excel ANCOVA template.

From the uncorrected sums of squares, the template computes the MS pooled by summing the residual SS and df of each of the regressions, as shown in Figure 16.3. As always, MS is computed by dividing SS by its df. The calculation of the MS regression coefficient requires that the common SS first be computed. Each of the corrected sums of squares (SdX2, SdXd, SdY2) of all the lines is summed and placed in the appropriate column in the row entitled "common". These sums are then used to compute the common SS in the usual manner. The common df is equal to the pooled df plus k-1, where k is the number of lines involved in the analysis. In this example k = 2 and the common df = 9 (= 8 + 1).
The regression coefficient SS is simply the difference between the common SS and the pooled SS, and its df = k-1, where again k is the number of lines involved in the analysis. The F statistic calculated to test Ho: β1 = β2 using equation 16.1 is displayed in the ANCOVA output adjacent to the heading "Ho: slopes equal". The P(F [α(1)=0.05,1,8]) is also displayed; for this example it is very high, so Ho: MS regression coefficient ≤ MS pooled is accepted, which means that Ho: β1 = β2 must also be accepted.
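The pooled, common and regression coefficient rows described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ANCOVA&SNK.XLT template: the function names and dictionary keys are hypothetical, and the corrected sums in the example are made-up values, not the bird data.

```python
def residual_ss(sxx, sxy, syy):
    """Unexplained SS about a line fitted from corrected sums (slope = sxy / sxx)."""
    return syy - sxy ** 2 / sxx


def ancova_slopes_f(lines):
    """F statistic of equation 16.1 for Ho: all slopes equal.

    `lines` is a list of dicts, one per regression, holding the sample size
    `n` and the corrected sums `sxx`, `sxy` and `syy`.
    """
    k = len(lines)

    # Pooled row: sum of the residual SS and residual df of every line
    pooled_ss = sum(residual_ss(l["sxx"], l["sxy"], l["syy"]) for l in lines)
    pooled_df = sum(l["n"] - 2 for l in lines)

    # Common row: sum the corrected sums first, then compute one residual SS
    common_ss = residual_ss(
        sum(l["sxx"] for l in lines),
        sum(l["sxy"] for l in lines),
        sum(l["syy"] for l in lines),
    )
    common_df = pooled_df + k - 1

    # Regression coefficient row: difference between common and pooled
    coef_ss = common_ss - pooled_ss
    coef_df = k - 1

    f = (coef_ss / coef_df) / (pooled_ss / pooled_df)
    return {"F": f, "df": (coef_df, pooled_df), "common_df": common_df}


# Hypothetical corrected sums for two lines with clearly different slopes
line_a = {"n": 5, "sxx": 10.0, "sxy": 19.9, "syy": 39.708}
line_b = {"n": 5, "sxx": 10.0, "sxy": 5.2, "syy": 2.732}

different = ancova_slopes_f([line_a, line_b])
parallel = ancova_slopes_f([line_a, line_a])
print(different, parallel)
```

Two identical lines give a regression coefficient SS of zero, so F is zero and Ho: slopes equal is accepted; the further the slopes diverge, the larger F becomes.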

Figure 16.3. Corrected sums of squares and ANCOVA tests for equal slopes and elevations from the ANCOVA&SNK.XLT template.

Because the slopes are equal, the elevations of the two lines may be validly compared. If they were not equal, the ANCOVA template would still generate the test statistic but the test would not be valid. The F statistic for testing Ho: elevations equal is computed as

F = MS adjusted means / MS common [16.2]

which again is a one-sided test of Ho: MS adjusted means ≤ MS common against Ha: MS adjusted means > MS common. If this Ho is accepted, then the elevations are statistically equal; however, if Ho: MS adjusted means ≤ MS common is rejected, then the elevations are considered to be statistically ("significantly") not equal; in other words, they are significantly different.

The MS adjusted means corresponds to the average difference in elevations between each of the lines involved in the ANCOVA. The adjusted means SS is computed as the difference between the total SS and the common SS. The df for the adjusted means is k-1, or the number of lines involved less one; this is the same df as for the regression coefficient. The total SS is computed by generating uncorrected sums for all the points of all the lines involved in the analysis; thus, all the data are treated as if there were one line and the regression statistics are generated on this one line. This can be done rather quickly by simply totaling the sums of X, Y and the uncorrected sums of squares for each of the regression lines involved and then obtaining the corrected sums of squares.

In the example (Table 16.1) the F statistic for Ho: elevations equal is displayed adjacent to the associated probability statement. The P(F [α(1)=0.05,1,9] = 2.25) is clearly above the critical P = 0.05 level. Therefore, Ho: MS adjusted means ≤ MS common must be accepted, which also means accepting Ho: elevations equal.

The biological conclusions made about the two bird species can be summarized as follows:
a) There was a linear relationship between oxygen consumption and ambient air temperature for each of the two bird species.
b) The two bird species exhibit the same decrease in oxygen consumption per unit change in ambient air temperature.
c) The elevations of the two lines were statistically similar; thus, at the same ambient air temperature the two bird species exhibit no significant difference in oxygen consumption.

Given that there are only two lines, ANCOVA statistics can be used to a) test another hypothesis, and b) quantify the difference in elevation if the difference is significant. These options are covered below. If there are more than two lines, then multiple comparison tests have to be applied to determine where the differences in slope and elevation lie. One of these tests, the Student-Newman-Keuls or SNK test, is presented in the next chapter.

When only two lines are being compared, ANCOVA calculations can easily be used to test whether the variability about the two lines is similar. An F test statistic is computed by dividing the MS residuals of the two lines; the larger MS residual is placed in the numerator and the smaller MS residual in the denominator. The dfs are the df of the numerator MS and the df of the denominator MS. This is a two-sided F test with Ho: σyx1 = σyx2. In the example problem,

F = MS residual 2 / MS residual 1 = (0.174/4) / (0.082/4) = 2.122 [16.3]

with 4 df in the numerator and 4 in the denominator. The P(F [α(2)=0.05,4,4] = 2.122) > 0.05, and therefore Ho: σyx1 = σyx2 is accepted. Thus, there is no difference in variability about the lines relating oxygen consumption to ambient air temperature in the Cardinal and Pyrrhuloxia.
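The elevation test (equation 16.2) and the variance-ratio test (equation 16.3) can be sketched in the same way. The function names are hypothetical and the raw data are made up for illustration; only the residual SS values 0.174 and 0.082, each with 4 df, come from the example in the text.

```python
def uncorrected_sums(xs, ys):
    """n, sums of X and Y, and uncorrected sums of squares and cross products."""
    return {
        "n": len(xs),
        "sx": sum(xs),
        "sy": sum(ys),
        "sxx": sum(x * x for x in xs),
        "sxy": sum(x * y for x, y in zip(xs, ys)),
        "syy": sum(y * y for y in ys),
    }


def corrected(u):
    """Corrected sums (sxx, sxy, syy) from a dict of uncorrected sums."""
    n = u["n"]
    return (
        u["sxx"] - u["sx"] ** 2 / n,
        u["sxy"] - u["sx"] * u["sy"] / n,
        u["syy"] - u["sy"] ** 2 / n,
    )


def residual_ss(sxx, sxy, syy):
    return syy - sxy ** 2 / sxx


def ancova_elevations_f(groups):
    """F of equation 16.2; `groups` is a list of (xs, ys) pairs, one per line."""
    sums = [uncorrected_sums(xs, ys) for xs, ys in groups]
    k = len(sums)

    # Common row: add up the corrected sums of the individual lines
    per_line = [corrected(u) for u in sums]
    common_ss = residual_ss(*(sum(c[i] for c in per_line) for i in range(3)))
    common_df = sum(u["n"] - 2 for u in sums) + k - 1

    # Total row: total the uncorrected sums, then correct them as one line
    total = {key: sum(u[key] for u in sums) for key in sums[0]}
    total_ss = residual_ss(*corrected(total))

    adj_ss = total_ss - common_ss          # adjusted means SS
    adj_df = k - 1
    return (adj_ss / adj_df) / (common_ss / common_df), (adj_df, common_df)


def variance_ratio_f(ss_a, df_a, ss_b, df_b):
    """Two-sided F of equation 16.3: larger residual MS over the smaller."""
    ms_a, ms_b = ss_a / df_a, ss_b / df_b
    if ms_a >= ms_b:
        return ms_a / ms_b, (df_a, df_b)
    return ms_b / ms_a, (df_b, df_a)


# Hypothetical data: two parallel lines that differ in elevation by 5 units
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_low = [2.1, 3.9, 6.2, 7.8, 10.1]
y_high = [7.1, 8.9, 11.2, 12.8, 15.1]

f_shifted, df_shifted = ancova_elevations_f([(x, y_low), (x, y_high)])
f_same, _ = ancova_elevations_f([(x, y_low), (x, y_low)])

# Residual SS and df of the two bird regressions as quoted in the text
f_var, df_var = variance_ratio_f(0.082, 4, 0.174, 4)
print(f_shifted, df_shifted, f_same, round(f_var, 3), df_var)
```

As with the slope test, the calculated F values are compared with critical F values for the stated degrees of freedom; the sketch does not compute P.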
If the two lines do not have significantly different slopes but do have significantly different elevations, then the magnitude of the difference in elevations can be determined as the difference between the adjusted means of the two lines. The adjusted mean for a line is computed as

Y adj = Y L - b c (X L - X T) [16.4]

where Y adj = adjusted mean Y for the line, Y L = mean of the Y values for the line, b c = common slope from the ANCOVA table, X L = mean of the X values for the line, and X T = mean of all X values for both lines. The magnitude of the difference in elevations is then computed by subtracting Y adj of one line from the other.

This calculation is only meaningful if the elevations are significantly different. This was not the case in the two species example (Table 16.1). However, the adjusted means of each line can be computed to illustrate the calculation. In this example the mean of all the X values for both lines (X T, n=10) and the mean of X values for each line (X L) are the same value. Thus, the term (X L - X T) is zero and the adjusted mean for each line is simply its mean Y, as shown here:

Cardinal Y adj = (-0.151) ( ) =
Pyrrhuloxia Y adj = (-0.148) ( ) =

The difference between the two lines, 0.153, is not significant, as shown by acceptance of Ho: elevations are equal.

Both the test of variability about the lines and the quantification of the difference between lines can be conducted in cases where there are more than two lines. However, this is beyond the scope of this course (and also very tedious!).
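Equation 16.4 is simple enough to check by hand, but a short sketch makes the role of each term explicit. The function name and all numbers below are hypothetical and are not taken from the bird example.

```python
def adjusted_mean(mean_y_line, common_slope, mean_x_line, mean_x_all):
    """Adjusted mean of equation 16.4: Y_L - b_c * (X_L - X_T)."""
    return mean_y_line - common_slope * (mean_x_line - mean_x_all)


# Hypothetical values: a common slope of -0.15 and an overall mean X of 20.0
b_c = -0.15
x_total = 20.0

# Line sampled at a higher mean X than the overall mean: its mean Y is
# moved to what the line predicts at X_T
adj_a = adjusted_mean(5.0, b_c, 22.0, x_total)

# Line whose mean X equals the overall mean: no adjustment is made
adj_b = adjusted_mean(5.0, b_c, 20.0, x_total)

difference_in_elevation = adj_a - adj_b
print(adj_a, adj_b, difference_in_elevation)
```

The adjustment slides each line's mean Y along the common slope to the overall mean X, so that the lines are compared at the same X value.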

ANCOVA + SNK

Analysis of covariance, like ANOVA, only indicates whether there are differences between slopes or elevations; it does not indicate where the difference lies. For example, the Ha's for three lines may be one of the following: β1 ≠ β2 ≠ β3, or β1 ≠ β2 = β3, or β1 = β2 ≠ β3. ANCOVA, however, provides information that allows only for deciding whether the slopes and elevations are equal or not. A second procedure must be used to determine where the difference lies in either slopes or, in cases where slopes are the same, elevations. The procedure used is a multiple comparison test called the Student-Newman-Keuls multiple range test (SNK for short). It is very similar to the SNK procedure used to determine where differences lie between multiple means (Chapter 9).

The general form of the SNK test is

q = difference between two statistics / average SE of the two statistics

where the generalized Ho is that the two statistics come from the same population. The degrees of freedom are p, the number of statistics across which the comparison is being made, and the df of the average standard error of the two statistics. For example, if two slopes are being compared and they are separated in rank by one other slope, then p = 3. In SNK regression the average SE of the two statistics always includes the MS pooled from ANCOVA, so the pooled df is the df used in the SNK tests. As always, if the calculated q value is greater than the critical value, the Ho is rejected.

The first step is to rank the statistics (e.g. slopes or elevations) from highest to lowest. Then the q test statistics are calculated by comparing the highest with first the lowest, then the next lowest, etc., until all possible comparisons have been made for the highest statistic. The next highest statistic is then compared to the lowest, the next lowest, etc.
For example, if five slopes are ranked such that 5 is highest and 1 lowest, then the order of comparison would be:

5 vs 1, 5 vs 2, 5 vs 3, 5 vs 4
then 4 vs 1, 4 vs 2, 4 vs 3
then 3 vs 1, 3 vs 2
then 2 vs 1

Two additional important procedural rules are as follows. First, as in all SNK tests, if no difference is found between two statistics, it is concluded that no difference exists between any statistics enclosed by these two. For example, if five slopes are compared and the calculated q of the comparison 5 vs 1 demands acceptance of Ho: β5 = β1, then it must also be concluded that β5 = β4 = β3 = β2 = β1. Note that this can happen even though ANCOVA demonstrates that there is a significant difference between slopes (one wouldn't initiate SNK procedures unless this was the case). This simply reflects the fact that ANCOVA is a more powerful test than is the SNK test.

A second important procedural rule deals with regression statistics only. Elevations can only be compared for lines with similar slopes. If the slopes are significantly different, elevations cannot be legitimately compared.
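The order of comparison and the first procedural rule can be sketched as follows. The function names are hypothetical, and `differs` stands in for whatever decision the q test returns for a given pair; it is an assumption of this sketch, not part of the Excel template.

```python
def snk_comparison_order(k):
    """List (high_rank, low_rank, p) for k ranked statistics.

    Rank k is the highest statistic and rank 1 the lowest; p is the number
    of ranked statistics spanned by the comparison.
    """
    order = []
    for high in range(k, 1, -1):
        for low in range(1, high):
            order.append((high, low, high - low + 1))
    return order


def snk_run(k, differs):
    """Apply the SNK order, skipping comparisons enclosed by an accepted Ho.

    `differs(high, low)` returns True when the calculated q exceeds the
    critical q. The result maps each pair to True (significantly different)
    or False (no difference found, or enclosed by a non-significant pair).
    """
    results = {}
    equal_ranges = []
    for high, low, _p in snk_comparison_order(k):
        enclosed = any(lo <= low and high <= hi for hi, lo in equal_ranges)
        if enclosed:
            results[(high, low)] = False   # not tested: declared equal
            continue
        significant = differs(high, low)
        results[(high, low)] = significant
        if not significant:
            equal_ranges.append((high, low))
    return results


for high, low, p in snk_comparison_order(5):
    print(f"{high} vs {low} (p = {p})")
```

If the very first comparison (5 vs 1) finds no difference, `snk_run` performs no further tests, which mirrors the rule that every statistic enclosed by those two is also declared equal.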

In SNK tests for slopes, q is computed as

$$q = \frac{b_A - b_B}{\sqrt{\dfrac{MS_{pooled}}{2}\left(\dfrac{1}{\sum x_A^2} + \dfrac{1}{\sum x_B^2}\right)}} \tag{17.1}$$

to test Ho: βA = βB, where A and B refer to two different ranked slopes. The SE, the denominator, is based on the corrected sums of squares of X (Σx²) obtained in regression and the MS pooled obtained in ANCOVA. If Σx² is the same for both lines (Σx²A = Σx²B), then SE reduces to

$$SE = \sqrt{\frac{MS_{pooled}}{\sum x^2}} \tag{17.2}$$

The degrees of freedom for the calculated q are p, the number of slopes across which the comparison is made, and the pooled df from the ANCOVA analysis.

For elevations, as cautioned earlier, the test can be applied only to those lines that have the same slopes (i.e., no significant difference in slopes). In cases where the slopes are equal, Ho: elevations are equal can be tested. The test statistic q is computed as

$$q = \frac{(\bar{Y}_A - \bar{Y}_B) - b_c(\bar{X}_A - \bar{X}_B)}{\sqrt{\dfrac{MS_{pooled}}{2}\left(\dfrac{1}{n_A} + \dfrac{1}{n_B} + \dfrac{(\bar{X}_A - \bar{X}_B)^2}{\sum x_A^2 + \sum x_B^2}\right)}} \tag{17.3}$$

where the numerator is the difference in adjusted mean Y's and the denominator is the SE. The Ȳ and X̄ are the mean values for each of the lines, Σx² are the corrected sums of squares of X, and b c is the common slope for the two lines being compared, which is computed as

$$b_c = \frac{\sum xy_A + \sum xy_B}{\sum x_A^2 + \sum x_B^2} \tag{17.4}$$

with Σxy being the corrected sum of the cross products. The degrees of freedom for the calculated q are p, the number of elevations across which the comparison is made, and the pooled df from the ANCOVA analysis.

The calculations involved in the SNK procedure for testing differences between slopes and elevations are a little daunting, particularly equation 17.3, and tedious. Fortunately, there is an Excel template available to perform these calculations in conjunction with ANCOVA.
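The q statistics for slopes and elevations can be sketched as below. This is an illustration only, with hypothetical function names and made-up inputs. One assumption should be flagged: the standard errors use MS pooled divided by 2, the usual SNK form and the one under which the slope SE reduces to the square root of MS pooled / Σx² when both lines have the same Σx².

```python
import math


def snk_q_slopes(b_a, b_b, sxx_a, sxx_b, ms_pooled):
    """q for Ho: beta_A = beta_B (assumed form: MS pooled / 2 inside the SE)."""
    se = math.sqrt(ms_pooled / 2.0 * (1.0 / sxx_a + 1.0 / sxx_b))
    return (b_a - b_b) / se


def common_slope(sxy_a, sxy_b, sxx_a, sxx_b):
    """Common slope of two lines from their corrected sums (equation 17.4)."""
    return (sxy_a + sxy_b) / (sxx_a + sxx_b)


def snk_q_elevations(mean_y_a, mean_y_b, mean_x_a, mean_x_b,
                     n_a, n_b, sxx_a, sxx_b, sxy_a, sxy_b, ms_pooled):
    """q for Ho: elevations of lines A and B are equal."""
    b_c = common_slope(sxy_a, sxy_b, sxx_a, sxx_b)
    dx = mean_x_a - mean_x_b
    # Numerator: difference between the adjusted mean Y values
    numerator = (mean_y_a - mean_y_b) - b_c * dx
    # Denominator: SE, again assuming the MS pooled / 2 form
    se = math.sqrt(
        ms_pooled / 2.0 * (1.0 / n_a + 1.0 / n_b + dx ** 2 / (sxx_a + sxx_b))
    )
    return numerator / se


# Hypothetical inputs chosen so the arithmetic is easy to follow by hand
q_slope = snk_q_slopes(b_a=2.0, b_b=1.0, sxx_a=16.0, sxx_b=16.0, ms_pooled=4.0)
b_c = common_slope(sxy_a=10.0, sxy_b=6.0, sxx_a=4.0, sxx_b=4.0)
q_elev = snk_q_elevations(6.0, 5.0, 3.0, 3.0, 8, 8, 4.0, 4.0, 10.0, 6.0, 4.0)
print(q_slope, b_c, q_elev)
```

Each calculated q is compared with the critical q for p and the pooled df; the critical values themselves come from a table and are not computed here.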

Comparison of Regression Equations Using EXCEL

The SNK template is the second worksheet of the EXCEL ANCOVA&SNK.XLT template. There are two portions to the SNK analysis: 1) comparison of slopes, and 2) comparison of elevations for those lines with equal slopes. Values from the first worksheet, titled ANCOVA, are automatically put into the SNK template. The values are the sample size, slope, Σx², Σxy, mean Y, and mean X for each line, for up to ten regression lines from the ANCOVA worksheet. The data are transferred in the order they were entered into the ANCOVA worksheet (Figure 17.1) and, as indicated by the instructions, the user has to do a descending sort by slope. It is important that only data be selected before doing the sort. For example, if only four regression lines have been analyzed, then it is important to select only the appropriate four rows and not include any additional rows from the data box. Sort can be found under Tools > Sort.

Figure 17.1. Hypothetical data after being sorted by decreasing slope in the SNK Slopes worksheet of the ANCOVA & SNK template. The worksheet instruction is to do a descending sort on slopes (b) in column D from A4 to H13, using only rows containing data (no zeros). The columns are ID, n, b, Σx², Σxy, Mean X and Mean Y, and the four example lines are Sandy,NoTrees; Clay,NoTrees; Clay,Trees; and Sandy,Trees. A note beside the data says that, for the elevation SNK, the data for lines with the same slopes are to be copied into the blue box in the next worksheet.

After sorting, the q values are calculated appropriately and the analysis is complete. The critical q value is the P = 0.05 value and is imported from the 4th worksheet in the ANCOVA&SNK.XLT template. The statistical decision of acceptance or rejection of the null hypothesis is obtained by simply comparing the calculated q to the critical value. If the calculated value is greater than the critical value, then the probability that samples A and B come from the same population is less than 0.05. Thus, Ho: slopes are equal is rejected.
Conversely, a calculated q lower than the critical q corresponds to a probability greater than 0.05 that the two samples come from the same population, and Ho: slopes are equal is accepted.

If two or more slopes are found to be equal, then the SNK for comparison of elevations can be completed. The sorted regression data are copied from the SNK Slopes worksheet to the SNK Elevations worksheet (Figure 17.2), where they are then sorted by descending mean Y. The q values for the elevations are then calculated according to equation 17.3 and compared against the critical q values (from the 4th worksheet). The same rules apply for interpreting SNK elevations as for SNK slopes; if the calculated q is greater than the critical q, then P will be less than 0.05 and the elevations are considered significantly different.

Figure 17.2. Hypothetical data after being copied from the SNK Slopes worksheet into the SNK Elevations worksheet and then sorted by descending mean Y. The worksheet instruction is to do a descending sort on mean Y in column I from A4 to H13, using only rows containing data. The columns are ID, n, b, Σx², Σxy, Mean X and Mean Y, and the four example lines are Sandy,NoTrees; Clay,NoTrees; Clay,Trees; and Sandy,Trees.

Example Problem (Problem 17.1)

I. STATISTICAL STEPS

A. Statement of Ho and Ha
1. Ho: β = 0 for the relationship of cholesterol and age in both sexes with and without drug (4 regressions)
2. Assuming that β ≠ 0 for all lines, test Ho: β women, no drug = β women, drug = β men, no drug = β men, drug
3. Ho: elevations equal for those lines with equal slopes

B. Statistical test
Generate regression statistics on all 4 lines; use ANCOVA to compare the 4 lines, and SNK to find out where the differences lie.

C. Computation of test statistics
1. All four regressions of cholesterol on age were found to be significant (P < 0.05), and the uncorrected sums of squares are then copied from each regression worksheet into the ANCOVA&SNK template.
2. From the ANCOVA&SNK template, you find that the null hypothesis of slopes being equal is accepted (F with 3 and 42 df, P > 0.05; see D.2).

3. Because slopes are equal, you check to see if elevations are equal. You reject the null hypothesis of elevations being equal (F with 3 and 45 df, P < 0.05; see D.3). This means that elevations are different and you must turn to the SNK Elevations worksheet to determine where the differences occur.
4. The q values in the SNK Elevations worksheet are all larger than the critical q values; therefore all of the elevations are significantly different (P < 0.05), with Men, no drug > Women, no drug > Men, drug > Women, drug.

D. Determination of the P of the test statistic
1. Ho: βs = 0: P < 0.05 for all 4 groups
2. Ho: βs equal: F [α(1)=0.05, 3, 42] = 0.358, P > 0.05
3. Ho: elevations equal: F [α(1)=0.05, 3, 45] = 56.36, P < 0.05

E. Statistical Inference
1. There is a relationship between cholesterol and age in both sexes, with and without drugs.
2. Slopes, the change in cholesterol per unit change in age, are the same in both sexes, with and without drugs.
3. Elevations, the amount of cholesterol at any fixed age, are higher in men than in women, and the drug significantly reduces the elevation in either sex.
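The sequence of decisions in the statistical steps can be summarized as a small decision function. This is only a sketch of the logic described in these two chapters: the function name, the wording of the returned messages and the P values in the usage example are hypothetical (the exact P values of the example problem are not reproduced here), and the handling of a mixture of significant and non-significant regressions with more than two lines is an assumption.

```python
def ancova_next_step(p_regressions, p_slopes=None, p_elevations=None, alpha=0.05):
    """Return the next step of the regression / ANCOVA / SNK workflow.

    p_regressions: P values of Ho: beta = 0, one per line
    p_slopes:      P value of Ho: slopes equal (None if not yet tested)
    p_elevations:  P value of Ho: elevations equal (None if not yet tested)
    """
    k = len(p_regressions)
    significant = [p < alpha for p in p_regressions]

    if not any(significant):
        return "no significant regressions: compare mean Y values with ANOVA"
    if not all(significant):
        # Assumption: stop here, as the text does for the two-line case
        return "only some regressions significant: analysis is complete"

    if p_slopes is None:
        return "all regressions significant: test Ho: slopes equal with ANCOVA"
    if p_slopes < alpha:
        if k > 2:
            return "slopes differ: use SNK on slopes to locate the differences"
        return "slopes differ: elevations cannot be legitimately compared"

    if p_elevations is None:
        return "slopes equal: test Ho: elevations equal with ANCOVA"
    if p_elevations < alpha:
        if k > 2:
            return "elevations differ: use SNK on elevations to locate the differences"
        return "elevations differ: quantify with the difference in adjusted means"
    return "slopes and elevations equal: the lines coincide"


# Hypothetical P values shaped like the four-line cholesterol example
step = ancova_next_step([0.01, 0.01, 0.01, 0.01], p_slopes=0.5, p_elevations=0.001)
print(step)
```

With four significant regressions, equal slopes and unequal elevations, the function points to the SNK Elevations step, which is the path followed in the example problem.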

II. BIOLOGICAL INTERPRETATION

Methods

Regression equations were calculated by the method of least squares for the relationships of blood cholesterol to age in men and women with and without drug therapy. If there were significant regressions between age and blood cholesterol levels for these groups (α = 0.05), differences between slopes and elevations of the regression equations were tested using analysis of covariance and Student-Newman-Keuls post tests (α = 0.05). Elevations of the regression lines were compared only if the slopes were not significantly different; differences in elevation are presented at the mean age for each sample.

Results

For both men and women, with and without drug therapy, there was a significant relationship between age and blood cholesterol levels (P < 0.05 for all regressions), and all regressions had the same slope (Table 1). Therapy with drug ZZZ significantly reduced the elevation of the relationship of blood cholesterol to age in both men and women (Table 1). Drug therapy reduced cholesterol levels slightly more in women (55%) than in men (44%). With or without drug therapy, men had significantly higher cholesterol levels at any age than did women (Fig. 1). Neither gender nor drug therapy affected the increase in blood cholesterol with increasing age (i.e., slopes).

Table 1. Relationship of blood cholesterol (mg/dl) to age (Yr, years) in men and women with and without therapy using drug ZZZ.

Group        n     Equation     s yx     s b     s a     R2
Drug
  Men        10
  Women      10
Control
  Men        19
  Women      11

Slopes, F = 0.363, P > 0.05
Elevations, P < 0.05

Figure 1. Relationship of blood cholesterol (mg/100 ml) to age (years) in men (squares) and women (circles) with (filled symbols) and without (open symbols) therapy using drug ZZZ.

Problem Set ANCOVA

Determine if the relationship between oxygen consumption (ml O2 g-1 h-1) and ambient air temperature (°C) is different for the two bird species in questions 16.1 to 16.3.

16.1) Following are the same data given in the sample problem.

               12.9 °C   16.0 °C   20.1 °C   22.5 °C   25.0 °C   27.5 °C
Cardinal
Pyrrhuloxia

16.2) The following data are derived from those given in 16.1, except that the oxygen consumption values for the cardinal have been decreased by 10%. What do you expect to happen to the analysis?

               12.9 °C   16.0 °C   20.1 °C   22.5 °C   25.0 °C   27.5 °C
Cardinal
Pyrrhuloxia

16.3) The following data are derived from those given in 16.1, except that the oxygen consumption values for the cardinal have been decreased by a percentage that increases with increasing temperature (i.e. 5% at 16, 10% at 20.1, etc.). What do you expect to happen to the analysis?

               12.9 °C   16.0 °C   20.1 °C   22.5 °C   25.0 °C   27.5 °C
Cardinal
Pyrrhuloxia

16.4) Determine if the relationship between oxygen consumption (liters min-1) and power output (Watts) on a bicycle ergometer varies between an athlete and a nonathlete.

Columns: Athlete, Nonathlete

16.5) Physical fitness test scores of three groups of individuals whose daily exercise regimes varied.

Columns: Age, Inactive, Moderate, Active

16.6) Investigators wished to know if beer affected the respiratory exchange ratio (RQ) of top athletes at different time intervals during sub-maximal exercise.

Columns: Time, Beer, Water, No liquid


Chapter 15 (Regression Inference) AP Statistics Practice Test (TPS- 4 p796) Section I: Multiple Choice Select the best answer for each question. 1. Which of the following is not one of the conditions that

Statistics to English Translation, Part 2b: Calculating Significance Nina Zumel December, 2009 In the previous installment of the Statistics to English Translation, we discussed the technical meaning of

Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

Econ 371 Problem Set #3 Answer Sheet 4.1 In this question, you are told that a OLS regression analysis of third grade test scores as a function of class size yields the following estimated model. T estscore

Allelopathic Effects on Root and Shoot Growth: One-Way Analysis of Variance (ANOVA) in SPSS Dan Flynn Just as t-tests are useful for asking whether the means of two groups are different, analysis of variance

2011 AP Exam Solutions 1. A professional sports team evaluates potential players for a certain position based on two main characteristics, speed and strength. (a) Speed is measured by the time required

AP Statistics Solutions to Packet 4 Inference for Regression Inference about the Model Predictions and Conditions HW #,, 6, 7 4. AN ETINCT BEAST, I Archaeopteryx is an extinct beast having feathers like

Stat 371, Cecile Ane Practice problems Midterm #2, Spring 2012 The first 3 problems are taken from previous semesters exams, with solutions at the end of this document. The other problems are suggested

Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are highly extraverted people less afraid of rejection