3
Logistic Regression  In linear regression, the dependent variable is continuous  What if the dependent variable is dichotomous (binary)?  Will a person vote for Reagan (1) or Carter (0)?  Will a woman give birth to a low-weight baby (1) or not (0)?  Does the person have a disease? Yes (1) or no (0)  Outcome of a baseball game? Win (1) or loss (0)  A linear regression model cannot fit such an outcome with acceptable error  Moreover, predicted values above 1 or below 0 do not make any sense  Also, we are more interested in probabilities (or odds ratios) than in a bare 0 or 1 output  What can be done?

6
The Logistic Function  σ(a) = 1 / (1 + exp(−a))  Symmetry property: σ(−a) = 1 − σ(a)  The inverse of the logistic sigmoid function is given by a = ln(σ / (1 − σ)) and is called the logit or log odds function, because it represents the log of the ratio of the probabilities  The sigmoid function is S-shaped and is also called the squashing function because it maps the whole real line into a finite interval (it maps real a ∈ (−∞, +∞) to the finite interval (0, 1))  Plays an important role in many classification algorithms
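
The properties listed on this slide can be checked numerically. The following is a minimal sketch in plain Python; the function names `sigmoid` and `logit` are chosen here for illustration and are not taken from the slides.

```python
import math

def sigmoid(a: float) -> float:
    """Logistic sigmoid: maps any real a into the open interval (0, 1)."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    # For negative a, use the algebraically equivalent form to avoid overflow.
    e = math.exp(a)
    return e / (1.0 + e)

def logit(p: float) -> float:
    """Inverse of the sigmoid: the log odds ln(p / (1 - p)) for 0 < p < 1."""
    return math.log(p / (1.0 - p))

for a in (-4.0, -1.0, 0.0, 1.0, 4.0):
    s = sigmoid(a)
    # Symmetry: sigmoid(-a) equals 1 - sigmoid(a).
    # Inverse: logit(sigmoid(a)) recovers a.
    print(f"a={a:+.1f}  sigmoid={s:.4f}  "
          f"1-sigmoid(-a)={1 - sigmoid(-a):.4f}  logit(sigmoid)={logit(s):+.4f}")
```

The second and third printed columns should agree (symmetry), and the last column should reproduce `a` (the logit undoes the sigmoid).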

8
Probabilistic Discriminative Models  First 10 Observations of the Data Set  Columns: ADMIT, GRE, TOPNOTCH, GPA

9
Dot-plot: Data from Table 2

10
Logistic regression (2) Table 3 Prevalence (%) of signs of CD according to age group

11
Dot-plot: Data from Table 3 (y-axis: Diseased %; x-axis: Age (years))

12
Logistic Regression  Consider the linear probability model π(X_i) = β_0 + β_1 X_i  Issue: π(X_i) can take on values less than 0 or greater than 1, so the predicted probability for some subjects falls outside the [0,1] range.

14
Logistic Regression  Consider the logistic regression model and the linear probability model.  The graph then shows the predicted probabilities under each model for different grade point averages.
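
To make the contrast between the two models concrete, here is a small sketch that evaluates both over a range of grade point averages. The coefficients are hypothetical values picked only for illustration; they are not fitted to the admissions data on the slides.

```python
import math

# Hypothetical coefficients, NOT estimated from the slides' data set.
B0_LIN, B1_LIN = -1.2, 0.6   # linear probability model: pi(x) = b0 + b1 * x
B0_LOG, B1_LOG = -8.0, 2.5   # logistic model: log odds = b0 + b1 * x

def linear_prob(gpa: float) -> float:
    """Linear probability model: nothing keeps the result inside [0, 1]."""
    return B0_LIN + B1_LIN * gpa

def logistic_prob(gpa: float) -> float:
    """Logistic model: the sigmoid keeps every prediction strictly in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(B0_LOG + B1_LOG * gpa)))

for gpa in (1.0, 2.0, 2.5, 3.0, 3.5, 4.0):
    lin, log_p = linear_prob(gpa), logistic_prob(gpa)
    flag = "" if 0.0 <= lin <= 1.0 else "  <- outside [0, 1]"
    print(f"GPA {gpa:.1f}: linear={lin:+.2f}  logistic={log_p:.3f}{flag}")
```

With these made-up coefficients the linear model returns a negative "probability" at low GPA and a value above 1 at high GPA, while the logistic predictions stay between 0 and 1.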

15
What is Logistic Regression?  In a nutshell: a statistical method that uses predictor variables to model dichotomous (binary) outcomes, although it is not limited to them.  Used when the research focuses on whether or not an event occurred, rather than when it occurred (time course information is not used).

16
What is Logistic Regression?  What is the “Logistic” component? Instead of modeling the outcome, Y, directly, the method models the log odds(Y) using the logistic function.

22
Maximum Likelihood  T, H, H, T, T, H, H, T, H, H  What is Pr(Heads) given the data?  The most reasonable data-based estimate would be 6/10.  In fact, 6/10 is the ML estimate of p.
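
As a quick check of the 6/10 claim, the likelihood of the observed sequence can be evaluated on a grid of candidate values of p. This is only a sketch: the grid search and the function name `likelihood` are illustrative choices, not something shown on the slide.

```python
tosses = "T H H T T H H T H H".split()
heads = tosses.count("H")
n = len(tosses)

def likelihood(p: float) -> float:
    """Probability of this exact toss sequence if Pr(Heads) = p."""
    return p ** heads * (1.0 - p) ** (n - heads)

# Evaluate the likelihood at p = 0.00, 0.01, ..., 1.00 and keep the best value.
grid = [i / 100 for i in range(101)]
p_best = max(grid, key=likelihood)

print(f"heads = {heads} of {n}")
print(f"grid value of p with the largest likelihood: {p_best}")
```

The grid maximum coincides with the sample proportion of heads, which is the closed-form ML estimate for a Bernoulli parameter.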

23
Maximum Likelihood: Example  Discrete distribution, finite parameter space  How biased is an unfair coin? Call the probability of tossing a HEAD p; the goal is to determine p.  Toss the coin 80 times; the outcome is 49 HEADS and 31 TAILS.  Suppose the coin was taken from a box containing three unlabeled coins: one that gives HEADS with probability p = 1/3, one with p = 1/2, and one with p = 2/3.  Using maximum likelihood estimation, the coin with the largest likelihood can be found, given the data that were observed.  Using the probability mass function of the binomial distribution with sample size 80 and 49 successes, but different values of p (the "probability of success"), the likelihood function L(p) = C(80, 49) p^49 (1 − p)^31 takes one of three values: L(1/3) = C(80, 49) (1/3)^49 (2/3)^31, L(1/2) = C(80, 49) (1/2)^49 (1/2)^31, or L(2/3) = C(80, 49) (2/3)^49 (1/3)^31.

24
Maximum Likelihood: Example  Discrete distribution, finite parameter space  The likelihood is maximized when p = 2/3, so this is the maximum likelihood estimate for p.
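
The three-coin comparison can be reproduced with the binomial probability mass function. The sketch below uses Python's `math.comb` and `fractions.Fraction`; the helper name `binom_pmf` is an illustrative choice.

```python
from fractions import Fraction
from math import comb

def binom_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """Binomial probability of k successes in n trials with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

HEADS, TOSSES = 49, 80
candidates = [Fraction(1, 3), Fraction(1, 2), Fraction(2, 3)]

# Likelihood of the observed data under each of the three candidate coins.
likelihoods = {p: float(binom_pmf(HEADS, TOSSES, p)) for p in candidates}
best = max(likelihoods, key=likelihoods.get)

for p, value in likelihoods.items():
    print(f"p = {p}: likelihood = {value:.6f}")
print(f"maximum likelihood estimate: p = {best}")
```

Because the parameter space has only three points, "maximizing" here is just picking the largest of three numbers.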

25
Maximum Likelihood: Example  Discrete distribution, continuous parameter space  Now suppose that there was only one coin but its p could have been any value 0 ≤ p ≤ 1.  The likelihood function to be maximized is L(p) = C(80, 49) p^49 (1 − p)^31, and the maximization is over all possible values 0 ≤ p ≤ 1.  Differentiating with respect to p and setting the derivative to zero gives C(80, 49) p^48 (1 − p)^30 (49 − 80p) = 0, with solutions p = 0, p = 1, and p = 49/80.  The solution that maximizes the likelihood is clearly p = 49/80, since L(0) = L(1) = 0.  Thus the maximum likelihood estimate for p is 49/80.
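
The same answer can be found numerically by locating the zero of the derivative of the log-likelihood (the score). This sketch uses bisection, which is an illustrative choice here; the slide itself solves the equation analytically. The binomial coefficient is dropped because it does not depend on p.

```python
import math

HEADS, TAILS = 49, 31

def log_likelihood(p: float) -> float:
    """Log-likelihood up to an additive constant (the binomial coefficient)."""
    return HEADS * math.log(p) + TAILS * math.log(1.0 - p)

def score(p: float) -> float:
    """Derivative of the log-likelihood with respect to p."""
    return HEADS / p - TAILS / (1.0 - p)

def solve_score(lo: float = 1e-6, hi: float = 1.0 - 1e-6, iters: int = 60) -> float:
    """Bisection: the score is positive below the maximizer and negative above it."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if score(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

p_numeric = solve_score()
p_closed_form = HEADS / (HEADS + TAILS)
print(f"numeric: {p_numeric:.6f}   closed form 49/80: {p_closed_form:.6f}")
```

Working with the log-likelihood is a common convenience: it has the same maximizer as the likelihood but avoids multiplying many small numbers.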

26
Maximum Likelihood: Example  Continuous distribution, continuous parameter space  Do it for the Gaussian distribution yourself!  Two parameters, μ and σ  The expectation of the estimator μ̂ equals the parameter μ of the given distribution, which means that the maximum-likelihood estimator μ̂ is unbiased.  The expectation of σ̂² is ((n − 1)/n) σ², which means that the estimator σ̂² is biased. However, σ̂² is consistent.  In this case the likelihood equations could be solved individually for each parameter. In general, this may not be the case.
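
The bias of the ML variance estimate can be seen in a small simulation. This is a sketch under assumed settings (standard normal data, samples of size 5, a fixed random seed); none of these numbers come from the slides.

```python
import random

def gaussian_mle(xs):
    """Closed-form ML estimates (mu_hat, var_hat) for a Gaussian sample."""
    n = len(xs)
    mu_hat = sum(xs) / n
    var_hat = sum((x - mu_hat) ** 2 for x in xs) / n  # divides by n, not n - 1
    return mu_hat, var_hat

def average_estimates(n, trials, mu=0.0, sigma=1.0, seed=0):
    """Average the ML estimates over many simulated samples of size n."""
    rng = random.Random(seed)
    mu_sum = var_sum = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m, v = gaussian_mle(sample)
        mu_sum += m
        var_sum += v
    return mu_sum / trials, var_sum / trials

avg_mu, avg_var = average_estimates(n=5, trials=20000)
print(f"average mu_hat  = {avg_mu:+.3f}  (true mu = 0)")
print(f"average var_hat = {avg_var:.3f}  (true variance = 1, (n-1)/n = 0.8)")
```

The average of the variance estimates should land near (n − 1)/n times the true variance rather than the true variance itself, while the average of the mean estimates should sit close to the true mean. Increasing `n` shrinks the gap, which is the consistency property.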

27
Maximum Likelihood  The method of maximum likelihood estimation chooses values for the parameter estimates (regression coefficients) that make the observed data “maximally likely.”  Standard errors are obtained as a by-product of the maximization process

28
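The Logistic Regression Model  ln(p / (1 − p)) = β_0 + β_1 x_1 + … + β_k x_k  The left-hand side is the log(odds) of the outcome; β_0 is the intercept and β_1, …, β_k are the model coefficients.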
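
To illustrate how an intercept and a coefficient act on the log-odds scale, the sketch below uses a single predictor with hypothetical values `B0` and `B1` (assumptions made for this example, not estimates from the slides).

```python
import math

# Hypothetical intercept and slope, chosen only for illustration.
B0, B1 = -3.0, 0.8

def log_odds(x: float) -> float:
    """Linear predictor: the log odds of the outcome at predictor value x."""
    return B0 + B1 * x

def probability(x: float) -> float:
    """Convert log odds back to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-log_odds(x)))

def odds(x: float) -> float:
    """Odds p / (1 - p) of the outcome at predictor value x."""
    p = probability(x)
    return p / (1.0 - p)

for x in (0.0, 1.0, 2.0, 3.0):
    print(f"x={x:.0f}  log odds={log_odds(x):+.2f}  odds={odds(x):.4f}  p={probability(x):.4f}")

# A one-unit increase in x multiplies the odds by exp(B1), whatever the starting x.
print(f"odds(3) / odds(2) = {odds(3.0) / odds(2.0):.4f},  exp(B1) = {math.exp(B1):.4f}")
```

At x = 0 the log odds equal the intercept, and each unit step in x adds `B1` to the log odds, which is why exp(B1) is read as an odds ratio.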

29
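Maximum Likelihood  We want to choose the β’s that maximize the probability of observing the data we have: L(β) = ∏_i p_i^(y_i) (1 − p_i)^(1 − y_i), where p_i is the model's probability that y_i = 1  Assumption: the y’s are independent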
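
A minimal sketch of this maximization for one predictor follows. It uses plain gradient ascent on the log-likelihood, which is a simple stand-in chosen for readability (statistical packages typically use Newton-type methods). The data `XS` and `YS` are a tiny synthetic set invented for this example.

```python
import math

# Tiny synthetic data set (made up for illustration): predictor and 0/1 outcome.
XS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
YS = [0, 0, 1, 0, 1, 0, 1, 1]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0: float, b1: float, xs, ys) -> float:
    """Sum of log Bernoulli probabilities, valid under independent y's."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit(xs, ys, lr: float = 0.05, steps: int = 5000):
    """Gradient ascent on the log-likelihood, starting from b0 = b1 = 0."""
    b0 = b1 = 0.0
    for _ in range(steps):
        residuals = [y - sigmoid(b0 + b1 * x) for x, y in zip(xs, ys)]
        g0 = sum(residuals)                               # d logL / d b0
        g1 = sum(r * x for r, x in zip(residuals, xs))    # d logL / d b1
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

b0_hat, b1_hat = fit(XS, YS)
print(f"b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}")
print(f"log-likelihood at start: {log_likelihood(0.0, 0.0, XS, YS):.3f}")
print(f"log-likelihood at fit:   {log_likelihood(b0_hat, b1_hat, XS, YS):.3f}")
```

The product form of the likelihood relies on the independence assumption; taking logs turns that product into the sum computed in `log_likelihood`.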

43
Probabilistic Discriminative Models  Classification methods work directly with the original input vector x  All such algorithms are still applicable if we first make a fixed nonlinear transformation of the inputs using a vector of basis functions ϕ(x)  Decision boundaries are linear in the feature space ϕ  In linear models of regression, one of the basis functions is typically set to a constant, say ϕ_0(x) = 1, so that the corresponding parameter plays the role of a bias.  A fixed basis function transformation ϕ(x) will be used in