SAS Software to Fit the Generalized Linear Model


Gordon Johnston, SAS Institute Inc., Cary, NC

Abstract

In recent years, the class of generalized linear models has gained popularity as a statistical modeling tool. This popularity is due in part to the flexibility of generalized linear models in addressing a variety of statistical problems and to the availability of software to fit the models. The SAS System provides two new tools that fit generalized linear models. The GENMOD procedure in SAS/STAT software is available in Release 6.09 of the SAS System and in experimental form in Release 6.08. SAS/INSIGHT software provides a generalized linear modeling capability in Release 6.08. This paper introduces generalized linear models and reviews the SAS software that fits the models.

Introduction

Generalized linear models were defined by Nelder and Wedderburn (1972). The class of generalized linear models is an extension of traditional linear models that allows the mean of a population to depend on a linear predictor through a nonlinear link function and allows the response probability distribution to be any member of an exponential family of distributions. Many widely used statistical models are generalized linear models. These include classical linear models with normal errors, logistic and probit models for binary data, and log-linear models for multinomial data. Many other useful statistical models can be formulated as generalized linear models by selecting an appropriate link function and response probability distribution.

Refer to McCullagh and Nelder (1989) for a thorough account of statistical modeling using generalized linear models. The books by Aitkin, Anderson, Francis, and Hinde (1989) and Dobson (1990) are also excellent references with many examples of applications of generalized linear models. Firth (1991) provides an overview of generalized linear models.

What Is a Generalized Linear Model?
A traditional linear model is of the form

$$y_i = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i$$

where $y_i$ is the response variable for the ith observation. The quantity $\mathbf{x}_i$ is a column vector of covariates, or explanatory variables, for observation i that is known from the experimental setting and is considered to be fixed, or nonrandom. The vector $\boldsymbol{\beta}$ of unknown coefficients is estimated by a least squares fit to the data $\mathbf{y}$. The $\varepsilon_i$ are assumed to be independent, normal random variables with zero mean and constant variance. The expected value of $y_i$, denoted by $\mu_i$, is

$$\mu_i = \mathbf{x}_i'\boldsymbol{\beta}$$

While traditional linear models are used extensively in statistical data analysis, there are types of problems for which they are not appropriate.

- It may not be reasonable to assume that data are normally distributed. For example, the normal distribution (which is continuous) may not be adequate for modeling counts or measured proportions that are considered to be discrete.
- If the mean of the data is naturally restricted to a range of values, the traditional linear model may not be appropriate, since the linear predictor $\mathbf{x}_i'\boldsymbol{\beta}$ can take on any value. For example, the mean of a measured proportion is between 0 and 1, but the linear predictor of the mean in a traditional linear model is not restricted to this range.
- It may not be realistic to assume that the variance of the data is constant for all observations. For example, it is not unusual to observe data where the variance increases with the mean of the data.

A generalized linear model extends the traditional linear model and is therefore applicable to a wider range of data analysis problems. A generalized linear model consists of the following components.

The linear component is defined just as it is for traditional linear models:

$$\eta_i = \mathbf{x}_i'\boldsymbol{\beta}$$

A monotonic differentiable link function g describes how the expected value of $y_i$ is related to the linear predictor $\eta_i$:

$$g(\mu_i) = \mathbf{x}_i'\boldsymbol{\beta}$$

The response variables $y_i$ are independent for $i = 1, 2, \ldots$ and have a probability distribution from an exponential family. This implies that the variance of the response depends on the mean through a variance function V:

$$\operatorname{var}(y_i) = \frac{\phi\, V(\mu_i)}{w_i}$$

where $\phi$ is a constant and $w_i$ is a known weight for each observation. The dispersion parameter $\phi$ is either known (for example, $\phi = 1$ for the binomial and Poisson distributions) or it must be estimated.

As in the case of traditional linear models, fitted generalized linear models can be summarized through statistics such as parameter estimates, their standard errors, and goodness-of-fit statistics. You can also make statistical inference about the parameters using confidence intervals and hypothesis tests. However, specific inference procedures are usually based on asymptotic considerations, since exact distribution theory is not available or is not practical for all generalized linear models.

Examples of Generalized Linear Models

You construct a generalized linear model by deciding on response and explanatory variables for your data and choosing an appropriate link function and response probability distribution. Some examples of generalized linear models follow. Explanatory variables can be any combination of continuous variables, classification variables, and interactions.
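The variance relation $\operatorname{var}(y_i) = \phi V(\mu_i)/w_i$ can be made concrete with a few lines of Python. This is an illustrative sketch, not SAS code: it encodes the variance functions the GENMOD procedure supports (normal, binomial, Poisson, gamma, inverse Gaussian) and applies the formula for one observation. The names `VARIANCE_FUNCTIONS` and `response_variance` are hypothetical, and treating a proportion observed over n trials as having weight $w = n$ is the usual convention, assumed here.

```python
# Variance functions V(mu) for the distributions available in PROC GENMOD.
# The dictionary keys are hypothetical names chosen for this sketch.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,
    "binomial": lambda mu: mu * (1.0 - mu),   # mu is a proportion in (0, 1)
    "poisson": lambda mu: mu,
    "gamma": lambda mu: mu ** 2,
    "inverse_gaussian": lambda mu: mu ** 3,
}


def response_variance(dist, mu, phi=1.0, weight=1.0):
    """Return var(y) = phi * V(mu) / weight for one observation."""
    if weight <= 0:
        raise ValueError("weight must be positive")
    return phi * VARIANCE_FUNCTIONS[dist](mu) / weight


# A proportion observed over 10 trials: binomial with phi = 1 and weight 10
print(response_variance("binomial", 0.3, weight=10))
# A Poisson count: phi = 1 and weight 1, so the variance equals the mean
print(response_variance("poisson", 4.0))
# A gamma response with a hypothetical dispersion phi = 0.5
print(response_variance("gamma", 2.0, phi=0.5))
```

The Poisson line shows the variance growing with the mean, which is one of the situations the traditional linear model handles poorly.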
Traditional Linear Model
   response variable: continuous variable
   distribution: normal
   link function: identity, $\eta = \mu$

Logistic Regression
   response variable: a proportion
   distribution: binomial
   link function: logit, $\eta = \log\left(\frac{\mu}{1-\mu}\right)$

Poisson Regression in Log Linear Model
   response variable: a count
   distribution: Poisson
   link function: log, $\eta = \log(\mu)$

Gamma Model with Log Link
   response variable: positive, continuous variable
   distribution: gamma
   link function: log, $\eta = \log(\mu)$

The GENMOD procedure fits a generalized linear model to the data by maximum likelihood estimation of the parameter vector $\boldsymbol{\beta}$. There is, in general, no closed-form solution for the maximum likelihood estimates of the parameters. The GENMOD procedure estimates the parameters of the model numerically through an iterative fitting process. The dispersion parameter is also estimated by maximum likelihood, or optionally by the residual deviance or by Pearson's chi-square divided by the degrees of freedom. Covariances, standard errors, and associated p-values are computed for the estimated parameters based on the asymptotic normality of maximum likelihood estimators.

A number of popular link functions and probability distributions are available in the GENMOD procedure. The built-in link functions are:

   identity: $\eta = \mu$
   logit: $\eta = \log(\mu/(1-\mu))$
   probit: $\eta = \Phi^{-1}(\mu)$, where $\Phi$ is the standard normal cumulative distribution function
   power: $\eta = \mu^{\lambda}$ if $\lambda \neq 0$; $\eta = \log(\mu)$ if $\lambda = 0$
   log: $\eta = \log(\mu)$
   complementary log-log: $\eta = \log(-\log(1-\mu))$

The available distributions and associated variance functions are:

   normal: $V(\mu) = 1$
   binomial (proportion): $V(\mu) = \mu(1-\mu)$
   Poisson: $V(\mu) = \mu$
   gamma: $V(\mu) = \mu^2$
   inverse Gaussian: $V(\mu) = \mu^3$

In addition, you can easily define your own link functions or distributions through DATA step programming statements used within the procedure.
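As an illustrative sketch (in Python, not SAS), each built-in link function can be paired with its inverse, which maps a linear predictor back to the mean scale. The names `LINKS` and `make_power_link`, and the exponent 0.5 used for the power link entry, are hypothetical choices for this example; the probit pair relies on Python's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution, used by the probit link


def make_power_link(lam):
    """Power link: eta = mu**lam if lam != 0, and eta = log(mu) if lam == 0."""
    if lam == 0:
        return math.log, math.exp
    return (lambda mu: mu ** lam), (lambda eta: eta ** (1.0 / lam))


# Each entry maps a link name to (g, g_inverse): eta = g(mu) and mu = g_inverse(eta)
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
    "log": (math.log, math.exp),
    "cloglog": (lambda mu: math.log(-math.log(1.0 - mu)),
                lambda eta: 1.0 - math.exp(-math.exp(eta))),
    "power(0.5)": make_power_link(0.5),
}

# Round trip mu -> eta -> mu for a mean that lies in every link's domain
for name, (g, g_inv) in LINKS.items():
    eta = g(0.3)
    print(f"{name:10s} eta = {eta: .4f}  back to mu = {g_inv(eta):.4f}")

# The inverse logit keeps fitted means inside (0, 1) for any linear predictor
print([round(LINKS["logit"][1](e), 4) for e in (-5.0, 0.0, 5.0)])
```

The last line illustrates why a logit link suits proportions: the linear predictor is unrestricted, but the mean it implies always stays between 0 and 1.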

An important aspect of generalized linear modeling is the selection of explanatory variables in the model. Changes in goodness-of-fit statistics are often used to evaluate the contribution of subsets of explanatory variables to a particular model. The deviance, defined to be twice the difference between the maximum attainable log likelihood and the log likelihood of the model under consideration, is often used as a measure of goodness of fit. The maximum attainable log likelihood is achieved with a model that has a parameter for every observation.

One strategy for variable selection is to fit a sequence of models, beginning with a simple model with only an intercept term, and then include one additional explanatory variable in each successive model. You can measure the importance of the additional explanatory variable by the difference in deviances or fitted log likelihoods between successive models. Asymptotic tests computed by the GENMOD procedure allow you to assess the statistical significance of the additional term.

The GENMOD procedure allows you to fit a sequence of models, up through a maximum number of terms specified in a MODEL statement. A table summarizes likelihood ratio statistics for each successive pair of models. The likelihood ratio statistic for testing the significance of a subset of parameters in a model is defined as twice the difference in log likelihoods between the model and the submodel with the parameters set to zero. The asymptotic distribution of the likelihood ratio statistic is chi-square with degrees of freedom equal to the difference in the number of parameters between the model and submodel. PROC GENMOD computes p-values based on the asymptotic distributions of likelihood ratio statistics. This is called a Type 1 analysis in the GENMOD procedure, because it is analogous to Type I (sequential) sums of squares in the GLM procedure.
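The deviance and the likelihood ratio test described above can be sketched in Python for the Poisson case. This is an illustration under stated assumptions, not GENMOD's code: the names `poisson_loglik`, `poisson_deviance`, `chi2_sf`, and `lr_test` are hypothetical, the counts are made-up toy data, and `chi2_sf` uses a closed-form recurrence valid only for integer degrees of freedom (in practice SciPy's `scipy.stats.chi2.sf` gives the same tail probability).

```python
import math


def poisson_loglik(y, mu):
    """Poisson log likelihood: sum of y*log(mu) - mu - log(y!)."""
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi > 0:
            total += yi * math.log(mi)   # the y*log(mu) term is 0 when y = 0
        total += -mi - math.lgamma(yi + 1.0)
    return total


def poisson_deviance(y, mu):
    """Deviance: 2 * (log likelihood of the saturated fit mu = y minus that of the model)."""
    return 2.0 * (poisson_loglik(y, y) - poisson_loglik(y, mu))


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0.0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        q, a = math.exp(-h), 1.0               # df = 2: P = exp(-x/2)
    else:
        q, a = math.erfc(math.sqrt(h)), 0.5    # df = 1: P = erfc(sqrt(x/2))
    # Raise df by 2 at a time: Q(a + 1) = Q(a) + h**a * exp(-h) / Gamma(a + 1)
    while 2.0 * a < df:
        q += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return q


def lr_test(loglik_model, loglik_submodel, df):
    """Likelihood ratio statistic 2*(l_model - l_submodel) and its asymptotic p-value."""
    stat = 2.0 * (loglik_model - loglik_submodel)
    return stat, chi2_sf(stat, df)


# Hypothetical counts: two groups of three observations
y = [2, 3, 6, 7, 8, 9]
mu_null = [sum(y) / len(y)] * len(y)                    # intercept-only fit: overall mean
mu_full = [sum(y[:3]) / 3] * 3 + [sum(y[3:]) / 3] * 3   # intercept + group: group means
stat, p = lr_test(poisson_loglik(y, mu_full), poisson_loglik(y, mu_null), df=1)
print(f"LR chi-square = {stat:.4f}, p = {p:.4f}")
# With the scale fixed at 1, the LR statistic equals the drop in deviance
print(poisson_deviance(y, mu_null) - poisson_deviance(y, mu_full))
```

The two printed statistics agree, which is the Poisson (scale 1) case of the relation between likelihood ratio statistics and differences in deviances.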
As with GLM Type I sums of squares, the results from this process depend on the order in which the model terms are fit. The GENMOD procedure also generates a Type 3 analysis analogous to Type III sums of squares in the GLM procedure. A Type 3 analysis does not depend on the order in which the terms for the model are specified. A GENMOD Type 3 analysis consists of specifying a model and computing likelihood ratio statistics for Type III contrasts for each term in the model. The contrasts are defined in the same way as they are in the GLM procedure. The GENMOD procedure optionally computes Wald statistics for Type III contrasts. Wald statistics are computationally less expensive than likelihood ratio statistics, because they require only the fit of the full model, but they are thought to be less accurate: the actual significance level of a test based on the Wald statistic may not be as close to the specified significance level as it is for a likelihood ratio test.

A Type 3 analysis generalizes the use of Type III estimable functions in linear models. Briefly, a Type III estimable function (contrast) for an effect is a linear function of the model parameters that involves the parameters of the effect and any interactions with that effect. A test of the hypothesis that the Type III contrast for a main effect is equal to 0 is intended to test the significance of the main effect in the presence of interactions. Refer to the documentation for the GLM procedure and Chapter 9, "The Four Types of Estimable Functions," in SAS/STAT User's Guide, Version 6, Fourth Edition, for more information about Type III estimable functions. Also refer to SAS System for Linear Models, Third Edition.
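For a contrast matrix L, the general Wald statistic is $W = (L\hat{\boldsymbol\beta})'(L\hat{\Sigma}L')^{-1}(L\hat{\boldsymbol\beta})$, compared with a chi-square distribution whose degrees of freedom equal the rank of L. For a single parameter it reduces to $(\text{estimate}/\text{standard error})^2$ with 1 degree of freedom, which the short Python sketch below computes. The function name `wald_chisq` and the input values are hypothetical, not taken from the paper.

```python
import math


def wald_chisq(estimate, std_err):
    """One-degree-of-freedom Wald chi-square (estimate / std_err)**2 and its p-value."""
    w = (estimate / std_err) ** 2
    # For 1 df, P(chi-square > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2))
    return w, math.erfc(math.sqrt(w / 2.0))


# Hypothetical estimate and standard error
w, p = wald_chisq(0.5, 0.25)
print(w, round(p, 4))   # 4.0 0.0455
```

Only the estimate and its standard error from the full-model fit are needed, which is why the Wald version is cheaper than refitting a constrained model for a likelihood ratio test.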
Additional features of the GENMOD procedure are:

- likelihood ratio statistics for user-defined contrasts, that is, linear functions of the parameters, and p-values based on their asymptotic chi-square distributions
- the ability to create a SAS data set corresponding to most tables printed by the procedure
- confidence intervals for model parameters based on either the profile likelihood function or asymptotic normality
- PROC GLM-like syntax for the specification of the response and model effects, including interaction terms and automatic coding of classification variables

Poisson Regression

You can use the GENMOD procedure to fit a variety of statistical models. A typical use of the GENMOD procedure is to perform Poisson regression. The Poisson distribution can be used to model the distribution of cell counts in a multiway contingency table. Aitkin, Anderson, Francis, and Hinde (1989) have used this method to model insurance claims data. Suppose the following hypothetical insurance claims data are classified by two factors: age group, with two levels, and car type, with three levels.

   data insure;
      input n c car $ age;
      ln = log(n);
      cards;
   small medium large small medium large 2
   ;

In the preceding data set, N is the number of insurance policyholders, and C is the number of insurance claims. CAR is the type of car involved, classified into three groups, and AGE is the age group of a policyholder, classified into two groups.

You can use the GENMOD procedure to perform a Poisson regression analysis of these data with a log link function. Assume the number of claims C has a Poisson probability distribution, and its mean, $\mu_i$, is related to the factors CAR and AGE for observation i by

$$\log(\mu_i) = \log(n_i) + \beta_0 + \mathrm{CAR}_i(1)\beta_1 + \mathrm{CAR}_i(2)\beta_2 + \mathrm{CAR}_i(3)\beta_3 + \mathrm{AGE}_i(1)\beta_4 + \mathrm{AGE}_i(2)\beta_5$$

$\mathrm{CAR}_i(j)$ and $\mathrm{AGE}_i(j)$ are indicator variables associated with the jth level of CAR and AGE; for example, for observation i,

$$\mathrm{CAR}_i(j) = \begin{cases} 1 & \text{if CAR} = j \\ 0 & \text{if CAR} \neq j \end{cases}$$

The $\beta$s are unknown parameters to be estimated by the procedure. The logarithm of N is used as an offset, that is, a regression variable with a constant coefficient of 1 for each observation. A log-linear relationship between the mean and the factors CAR and AGE is specified by the log link function. The log link function ensures that the mean number of insurance claims for each car and age group predicted from the fitted model will be positive.

The following statements invoke the GENMOD procedure to perform this analysis.

   proc genmod data=insure;
      class car age;
      model c = car age / dist   = poisson
                          link   = log
                          offset = ln;
   run;

CAR and AGE are specified as CLASS variables so that PROC GENMOD automatically generates the indicator variables associated with CAR and AGE. The MODEL statement specifies C as the response variable and CAR and AGE as explanatory variables. An intercept term is included by default. Thus, the model matrix X (the matrix that has as its ith row the transpose of the covariate vector for the ith observation) consists of a column of 1s representing the intercept term and columns of 0s and 1s derived from indicator variables representing the levels of the CAR and AGE variables. That is, the model matrix X has six columns: the first column corresponds to the intercept, the next 3 columns correspond to CAR, and the last 2 columns correspond to AGE.

The response distribution is specified as Poisson, and the link function is chosen to be log. That is, the Poisson mean parameter $\mu_i$ is related to the linear predictor by

$$\log(\mu_i) = \log(n_i) + \mathbf{x}_i'\boldsymbol{\beta}$$

The logarithm of N is specified as an offset variable, as is common in this type of analysis. In this case the offset variable serves to normalize the fitted cell means to a per-policyholder basis, since the total number of claims, not individual policyholder claims, were observed.

PROC GENMOD produces the following default output from the preceding statements.

   Model Information

   Description          Value
   Data Set             WORK.INSURE
   Distribution         POISSON
   Link Function        LOG
   Dependent Variable   C
   Offset Variable      LN
   Observations Used    6

Figure 1. Model Information

The Model Information table in Figure 1 provides information about the specified model and the input data set.
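The model matrix described above, with an intercept column, one indicator column per CAR level, and one per AGE level, contains linearly dependent columns. The Python sketch below builds such a matrix for the six CAR-by-AGE cells and flags every column that is a linear combination of the columns before it, which is the aliasing rule PROC GENMOD applies when it prints parameter estimates. It is an illustration, not GENMOD's internal singularity check. The row order and the AGE labels "1" and "2" are hypothetical placeholders, and the names `design_row` and `aliased_columns` are invented for this sketch.

```python
import math

CAR_LEVELS = ["large", "medium", "small"]   # sorted order, as in the Class Level Information table
AGE_LEVELS = ["1", "2"]                     # hypothetical labels for the two age groups


def design_row(car, age):
    """Intercept, then one 0/1 indicator per CAR level and per AGE level."""
    return ([1.0]
            + [1.0 if car == lev else 0.0 for lev in CAR_LEVELS]
            + [1.0 if age == lev else 0.0 for lev in AGE_LEVELS])


def aliased_columns(X, tol=1e-9):
    """Indices of columns that are linear combinations of the columns before them.

    Modified Gram-Schmidt: each column is projected off the span of the earlier
    non-aliased columns; a (near) zero residual marks the column as aliased.
    """
    basis, aliased = [], []
    for j in range(len(X[0])):
        v = [row[j] for row in X]
        scale = math.sqrt(sum(a * a for a in v)) or 1.0
        for q in basis:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm <= tol * scale:
            aliased.append(j)
        else:
            basis.append([a / norm for a in v])
    return aliased


names = (["INTERCEPT"] + [f"CAR {lev}" for lev in CAR_LEVELS]
         + [f"AGE {lev}" for lev in AGE_LEVELS])
X = [design_row(car, age) for age in AGE_LEVELS for car in ["small", "medium", "large"]]
print([names[j] for j in aliased_columns(X)])   # ['CAR small', 'AGE 2']
```

The last indicator of each CLASS variable equals the intercept column minus the other indicators, so exactly two of the six columns are aliased, leaving four free parameters.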

   Class Level Information

   Class   Levels   Values
   CAR     3        large medium small
   AGE     2

Figure 2. Class Level Information

The Class Level Information table in Figure 2 identifies the levels of the classification variables that are used in the model. Note that CAR is a character variable, and the values are sorted in alphabetical order. This is the default sort order, but you can select different sort orders with the ORDER= option in the PROC GENMOD statement.

   Criteria For Assessing Goodness Of Fit
   Columns: Criterion, DF, Value, Value/DF
   Rows: Deviance, Scaled Deviance, Pearson Chi-Square, Scaled Pearson X2, Log Likelihood

Figure 3. Goodness of Fit Criteria

The Criteria For Assessing Goodness Of Fit table in Figure 3 contains statistics that summarize the fit of the specified model. These statistics are helpful in judging the adequacy of a model and in comparing it with other models under consideration. Here the deviance has 2 degrees of freedom: 6 observations minus 4 nonaliased parameters (the intercept, two CAR parameters, and one AGE parameter). If you compare the deviance with its asymptotic chi-square distribution with 2 degrees of freedom, you find that the p-value is 0.24. This indicates that the specified model fits the data reasonably well.

   Analysis Of Parameter Estimates
   Columns: Parameter, DF, Estimate, Std Err, ChiSquare, Pr>Chi
   Rows: INTERCEPT, CAR large, CAR medium, CAR small, AGE (one row for each of its two levels), SCALE
   NOTE: The scale parameter was held fixed.

Figure 4. Parameter Estimates

The Analysis Of Parameter Estimates table in Figure 4 summarizes the results of the iterative parameter estimation process. For each parameter in the model, PROC GENMOD prints columns with the parameter name, the degrees of freedom associated with the parameter, the estimated parameter value, the standard error of the parameter estimate, and a Wald chi-square statistic and associated p-value for testing the significance of the parameter to the model.
If a column of the model matrix corresponding to a parameter is found to be linearly dependent, or aliased, with columns corresponding to parameters preceding it in the model, PROC GENMOD assigns it zero degrees of freedom and prints a value of zero for both the parameter estimate and its standard error.

This table includes a row for a scale parameter, even though there is no free scale parameter in the Poisson distribution. PROC GENMOD allows the specification of a scale parameter to fit overdispersed Poisson and binomial distributions. In such cases, the SCALE row indicates the value of the overdispersion scale parameter used in adjusting output statistics. PROC GENMOD prints a note indicating that the scale parameter was fixed, that is, not estimated by the iterative fitting process.

It is usually of interest to assess the importance of the main effects in the model. Type 1 and Type 3 analyses generate statistical tests for the significance of these effects. You can request these analyses with the TYPE1 and TYPE3 options in the MODEL statement.

   proc genmod data=insure;
      class car age;
      model c = car age / dist   = poisson
                          link   = log
                          offset = ln
                          type1
                          type3;
   run;
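The iterative fitting behind these statements can be sketched for a Poisson model with a log link and an offset. The Python below uses textbook iteratively reweighted least squares (Fisher scoring), which for the canonical log link coincides with Newton-Raphson; it is offered as an illustration, not as GENMOD's internal algorithm. It assumes X has full column rank (aliased indicator columns dropped first), and the names `solve` and `fit_poisson_log` and the toy data are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_poisson_log(X, y, offset, tol=1e-10, max_iter=50):
    """Fit log(mu_i) = offset_i + x_i' beta by IRLS; X must have full column rank."""
    n, p = len(y), len(X[0])
    eta = [math.log(yi + 0.5) for yi in y]   # start near the data; 0.5 avoids log(0)
    beta = [0.0] * p
    for _ in range(max_iter):
        mu = [math.exp(e) for e in eta]
        # Log link: d(eta)/d(mu) = 1/mu and V(mu) = mu, so the IRLS weight is mu
        z = [e - o + (yi - m) / m for e, o, yi, m in zip(eta, offset, y, mu)]
        xtwx = [[sum(mu[i] * X[i][j] * X[i][k] for i in range(n)) for k in range(p)]
                for j in range(p)]
        xtwz = [sum(mu[i] * X[i][j] * z[i] for i in range(n)) for j in range(p)]
        new_beta = solve(xtwx, xtwz)            # weighted least squares step
        eta = [o + sum(xj * bj for xj, bj in zip(row, new_beta))
               for row, o in zip(X, offset)]
        converged = max(abs(a - b) for a, b in zip(new_beta, beta)) < tol
        beta = new_beta
        if converged:
            break
    return beta


# Hypothetical toy data: claim counts, exposures, and a two-level factor
n_exp = [100.0, 200.0, 300.0, 400.0]
claims = [10.0, 30.0, 20.0, 60.0]
X = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]   # intercept + indicator for level 2
beta = fit_poisson_log(X, claims, [math.log(v) for v in n_exp])
print([round(b, 4) for b in beta])
```

Because the offset is the log exposure, the fitted intercept is the log of the pooled claim rate in the first level, log(40/300), and the second coefficient is the log rate ratio, log((80/700)/(40/300)), which gives a closed-form check on the iterations.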

The results of these analyses are summarized in the tables that follow.

   LR Statistics For Type 1 Analysis
   Columns: Source, Deviance, DF, ChiSquare, Pr>Chi
   Rows: INTERCEPT, CAR, AGE

Figure 5. Type 1 Analysis

In the table for Type 1 analysis in Figure 5, each entry in the deviance column represents the deviance for the model containing the effect for that row and all effects preceding it in the table. For example, the deviance corresponding to CAR in the table is the deviance of the model containing an intercept and CAR. As more terms are included in the model, the deviance decreases.

Entries in the chi-square column are likelihood ratio statistics for testing the significance of the effect added to the model containing all the preceding effects. The chi-square value for CAR represents twice the difference in log likelihoods between fitting a model with only an intercept term and a model with an intercept and CAR. Since the scale parameter is set to 1 in this analysis, this is equal to the difference in deviances. Since two additional parameters are involved, this statistic can be compared with a chi-square distribution with two degrees of freedom. The resulting p-value (labeled Pr>Chi) is close to 0, which indicates that this variable is highly significant. Similarly, the chi-square value for AGE represents twice the difference in log likelihoods between the model with the intercept and CAR and the model with the intercept, CAR, and AGE. This effect is also highly significant, as indicated by its p-value.

   LR Statistics For Type 3 Analysis
   Columns: Source, DF, ChiSquare, Pr>Chi
   Rows: CAR, AGE

Figure 6. Type 3 Analysis

The Type 3 analysis shown in Figure 6 results in the same conclusions as the Type 1 analysis. The Type 3 chi-square value for CAR, for example, is twice the difference between the log likelihood for the model with INTERCEPT, CAR, and AGE included and the log likelihood for the model with CAR excluded. The hypothesis tested in this case is the significance of CAR in the model with AGE already included. The values of the Type 3 likelihood ratio statistics for CAR and AGE indicate that both of these factors are highly significant in determining the claims performance of the insurance policyholders.

SAS/INSIGHT Software

You can fit generalized linear models within an interactive graphical environment using SAS/INSIGHT software. The same set of response distributions and link functions is available in SAS/INSIGHT software as in the GENMOD procedure, with the exception of user-defined link functions and distributions. Most of the output statistics in PROC GENMOD are also available in SAS/INSIGHT software, and some additional regression diagnostics and automatic plotting of residuals are available.

The SAS/INSIGHT data window containing the insurance claims data is shown in Figure 7. CAR and AGE have been selected as nominal, or CLASS, variables.

Figure 7. Data Window

You select Analyze:Fit (Y X) to invoke the window shown in Figure 8. There you can select the response variable and covariate variables by selecting the variable names and then clicking the Y button for the response variable and the X button for the covariates. C has been selected as the response, and CAR and AGE have been selected as covariates.

Figure 8. Specifying the Model

You can then click the Method button to specify the generalized linear model in the window shown in Figure 9. The Poisson response distribution and log link function have been selected. You specify LN as an offset variable by selecting the variable name and then clicking the Offset button.

Figure 9. Selecting the Response Distribution and Link Function

You can select the output you desire from the analysis by clicking the Output button in Figure 8. This invokes the output window shown in Figure 10. As shown in Figure 10, Type I tests, Type III tests, and Parameter Estimates have been selected.

Figure 10. Selecting the Output

The results are shown in the analysis output window in Figure 11. These are identical to the results produced by PROC GENMOD.

Figure 11. Analysis Results

Conclusions

The generalized linear model extends the traditional linear model to be applicable to a wider range of statistical modeling problems. The GENMOD procedure in SAS/STAT software fits generalized linear models in a traditional SAS environment, retaining much of the syntax and functionality of linear modeling procedures such as PROC GLM. You can also fit generalized linear models in an interactive graphical interface environment using SAS/INSIGHT software. Both methods produce statistics that allow you to make statistical inference about the model parameters.
