Binary Logistic Regression performs logistic regression on a binary response (dependent) variable, that is, a variable that has only two possible values, such as the presence or absence of a particular disease. A variable of this kind is known as a dichotomous variable, i.e. binary in nature.

Binary Logistic Regression can classify observations into one of two categories. In some cases these classifications give fewer classification errors than discriminant analysis.

The default model contains the variables that you enter in Continuous predictors and Categorical predictors. You can also add interaction and/or polynomial terms by using the tools available in the Model sub-dialog box.

Minitab stores the last model that you fit for each response variable. These stored models can be used to quickly generate predictions, contour plots, surface plots, overlaid contour plots, factorial plots, and optimized responses.

Choose Response in binary response/frequency format from the combo box at the top of the dialog box.

In the Response text box, enter the column that contains the response variable.

In the Frequency text box, enter the optional column that contains the count or frequency variable.

If you have summarized data, follow these steps instead:

Response in Binary Logistic Regression (Trial Format)

Choose Response in event/trial format from the combo box at the top of the dialog box.

In Number of events, enter the column that contains the number of times the event occurred in your sample at each combination of the predictor values.

In Number of trials, enter the column that contains the corresponding number of trials.

Step 4: In Continuous predictors, enter the columns that contain the continuous predictors. In Categorical predictors, enter the columns that contain the categorical predictors. You can also add interactions and other higher-order terms to the model.

Step 5: If you like, use one or more of the dialog box options, then click OK.

The following are options available in the main dialog box of Minitab Binary Logistic Regression:

Response in binary response/frequency format: Choose this if the response data are entered as a column that contains two distinct values, i.e. a dichotomous variable.

Response: Enter the column that contains the response values.

Response event: Choose which event of interest the results of the analysis will describe.

Frequency (optional): If the data are in two columns, one that contains the response values and one that contains their frequencies, enter the column that contains the frequencies.

Response in event/trial format: Choose this if the response data are in two columns, one that contains the number of successes or events of interest and one that contains the number of trials.

Event name: Enter a name for the event in the data.

Number of events: Enter the column that contains the number of events.

Number of trials: Enter the column that contains the number of trials.

Continuous predictors: Select the continuous variables that explain changes in the response. A predictor is also called an X variable.

Categorical predictors: Select the categorical classifications or group assignments, such as type of raw material, that explain changes in the response. A predictor is also called an X variable.
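As a sketch of the two response layouts described above (outside Minitab), the following Python uses hypothetical data to expand summarized event/trial rows into the equivalent binary response rows; the column names and values are illustrative only:

```python
# Sketch (not Minitab): each event/trial row summarizes many binary rows.
# Event/trial format: one row per predictor setting (dose, events, trials).
event_trial = [
    (10, 2, 20),
    (20, 9, 20),
    (30, 15, 20),
]

# Expand to binary response format: one row per observation
# (1 = event occurred, 0 = nonevent).
binary_rows = []
for dose, events, trials in event_trial:
    binary_rows += [(dose, 1)] * events + [(dose, 0)] * (trials - events)

print(len(binary_rows))                 # 60 rows: one per trial
print(sum(r[1] for r in binary_rows))   # 26 events in total
```

Both layouts carry the same information; the event/trial format is simply more compact when many observations share the same predictor values.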

Multivariable / Multiple Regression

Multiple regression (multivariable regression) refers to a regression model with more than one predictor (independent or explanatory variable) used to explain a response (dependent) variable. A simple regression model has one predictor used to explain a single response, while a multiple (multivariable) regression model has more than one predictor. Both simple and multiple (multivariable) regression models can further be categorized as linear or non-linear regression models.

Note that linearity does not depend on the predictors or on adding more predictors to a simple regression model; it refers to the parameters (the coefficients attached to the predictors). If the model is linear in these parameters, it is referred to as a linear model, whether it is a simple regression model or a multiple (multivariable) regression model. The relationship between the variables is assumed to be linear, though this assumption can never be fully confirmed in the case of multiple linear regression. As a rule, however, it is good practice to look at bivariate scatter diagrams of the variables of interest and check that there is no curvature in the relationship.

Multiple regression also allows you to determine the overall fit of the model (known as the variance explained) and the relative contribution of each predictor to that total. For example, one may want to know how much of the variation in exam performance can be explained by predictors such as revision time, test anxiety, lecture attendance, and gender "as a whole", but also the "relative contribution" of each independent variable in explaining the variance.

A multiple regression model has the form

\[y=\alpha+\beta_1 x_1+\beta_2 x_2+\cdots+\beta_k x_k+\varepsilon\]

Here $y$ is a continuous variable and the $x$'s are predictors, which may be continuous, categorical, or discrete. The model above is referred to as a linear multiple (multivariable) regression model.

For example, one might predict college GPA using high school GPA, test scores, time given to study, and rating of high school as predictors.
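As a sketch of how such a model is fit, the Python below runs ordinary least squares on synthetic data; the coefficients and predictors are hypothetical (they do not come from the GPA example):

```python
import numpy as np

# Sketch: fit y = alpha + b1*x1 + b2*x2 + error by ordinary least squares
# on synthetic data with known true coefficients.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)            # e.g. a standardized continuous predictor
x2 = rng.normal(size=n)            # a second standardized predictor
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x1, x2])    # design matrix with intercept
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)   # close to the true values [1.0, 2.0, -0.5]
```

With low noise and 200 observations, the estimated coefficients recover the true intercept and slopes closely.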

Introduction to Logistic Regression

The ideas behind logistic regression were introduced in the 1930s by Ronald Fisher and Frank Yates, and logistic regression was proposed in the 1970s as an alternative technique to overcome the limitations of ordinary least squares regression in handling dichotomous outcomes. It is a probabilistic statistical classification model. Although it is a non-linear regression model, it can be converted into a linear model by a simple transformation. It is used to predict a binary categorical dependent variable from one or more predictor variables; that is, it is used to estimate empirical values of the parameters in the model. Here the response variable takes the value zero or one, i.e. it is a dichotomous variable. A logistic regression model is written as

\[\pi=\frac{1}{1+e^{-[\alpha +\sum_{i=1}^k \beta_i X_{ij}]}}\]

where $\alpha$ is the intercept and the $\beta_i$ are the slopes.
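A minimal sketch of evaluating this model, assuming hypothetical values for $\alpha$, the $\beta_i$, and the predictors:

```python
import math

# Sketch: evaluate pi = 1 / (1 + exp(-(alpha + sum_i beta_i * x_i)))
# for hypothetical coefficients and predictor values.
def logistic_prob(alpha, beta, x):
    eta = alpha + sum(b * xi for b, xi in zip(beta, x))  # linear predictor
    return 1.0 / (1.0 + math.exp(-eta))

p = logistic_prob(alpha=-1.0, beta=[0.8, 0.5], x=[2.0, 1.0])
print(round(p, 3))   # 0.75: the event probability for this observation
```

Whatever the value of the linear predictor, the result always lies strictly between 0 and 1, which is what makes the model suitable for probabilities.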

So in simple words, logistic regression is used to find the probability of occurrence of the outcome of interest. For example, if we want to assess the significance of different predictors (gender, sleeping hours, participation in extracurricular activities, etc.) for a binary response (pass or fail in exams, coded as 0 and 1), we use logistic regression for this kind of problem.

This nonlinear regression model can easily be converted into a linear model by a transformation. Since $\pi$ is the probability of the event in which we are interested, taking the natural log of the ratio of the probabilities of success and failure makes the model linear.

\[\ln\left(\frac{\pi}{1-\pi}\right)=\alpha+\sum_{i=1}^k \beta_i X_{ij}\]

The natural log of the odds converts the logistic regression model into linear form.
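A quick numerical check of this linearization, using hypothetical values for $\alpha$, $\beta$, and $x$:

```python
import math

# Sketch: verify that the logit (natural log of the odds) recovers the
# linear predictor, i.e. ln(pi / (1 - pi)) equals alpha + beta * x.
alpha, beta, x = -1.0, 0.8, 2.5        # hypothetical values
eta = alpha + beta * x                 # linear predictor
pi = 1.0 / (1.0 + math.exp(-eta))      # logistic model probability
logit = math.log(pi / (1.0 - pi))      # logit transformation
print(abs(logit - eta) < 1e-9)         # True: the transformed model is linear
```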

Introduction to the Odds Ratio

Medical students, students of the clinical and psychological sciences, professionals allied to medicine seeking to better understand the medical literature, and researchers from many fields encounter the Odds Ratio (OR) throughout their careers.

The odds ratio is a relative measure of effect, allowing the comparison of the intervention group of a study with the comparison or placebo group. It is computed as

\[OR=\frac{\text{odds in the intervention arm}}{\text{odds in the control or placebo arm}}\]

If the outcome is the same in both groups, the ratio will be 1, implying that there is no difference between the two arms of the study. For an adverse outcome, if OR > 1 the control group is doing better than the intervention group, while if OR < 1 the intervention group is doing better than the control group.

The ratio of the probabilities of success and failure is known as the odds. If the probability of an event is $p_1$, then the odds are:
\[\text{odds}=\frac{p_1}{1-p_1}\]

The Odds Ratio, the ratio of two odds, can be used to quantify how strongly a factor is associated with the response in a given model. If the probabilities of occurrence of an event are $p_1$ (for the first group) and $p_2$ (for the second group), then the OR is:
\[OR=\frac{\frac{p_1}{1-p_1}}{\frac{p_2}{1-p_2}}\]
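These two formulas can be computed directly; the group probabilities below are hypothetical:

```python
# Sketch: odds and odds ratio for two hypothetical groups.
def odds(p):
    return p / (1.0 - p)

p1, p2 = 0.40, 0.25          # event probability in each group (hypothetical)
OR = odds(p1) / odds(p2)
print(round(odds(p1), 4))    # 0.6667
print(round(OR, 2))          # 2.0: the event odds are twice as high in group 1
```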

If a predictor is binary, then the OR for the $i$th factor is defined as
\[OR_i=e^{\beta_i}\]

The regression coefficient $\beta_1$ from logistic regression is the estimated increase in the log odds of the dependent variable per unit increase in the value of the independent variable. In other words, the exponential of the regression coefficient, $e^{\beta_1}$, is the OR associated with a one-unit increase in the independent variable.
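A small numerical illustration of this interpretation, with hypothetical coefficients:

```python
import math

# Sketch: with a hypothetical logistic slope b1, increasing the predictor
# by one unit multiplies the odds by exp(b1), regardless of where we start.
alpha, b1 = -2.0, 0.7

def odds_at(x):
    return math.exp(alpha + b1 * x)    # odds = e^(linear predictor)

ratio = odds_at(3.0) / odds_at(2.0)    # effect of a one-unit increase
print(round(ratio, 4), round(math.exp(b1), 4))   # both equal exp(0.7)
```

The intercept cancels in the ratio, which is why the OR depends only on the coefficient of the predictor.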

The median is the middle value in a data set when all of the values (observations) are arranged in ascending or descending order of magnitude. The median is a measure of central tendency that divides the data set into two halves: 50% of the observations lie below the median value and 50% above it. If a data set has an odd number of observations (data points), the median is the single middle value after sorting the data set.

Example: Consider the following data set: 5, 9, 8, 4, 3, 1, 0, 8, 5, 3, 5, 6, 3.
To find the median, first sort the data (in ascending or descending order): 0, 1, 3, 3, 3, 4, 5, 5, 5, 6, 8, 8, 9. The middle value of the sorted data is 5, which is the median of the given data set.

When the number of observations in a data set is even, the median is the average of the two middle values in the sorted data.

Example: Consider the following data set: 5, 9, 8, 4, 3, 1, 0, 8, 5, 3, 5, 6, 3, 2.
To find the median, first sort the data and then locate the two middle values: 0, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 8, 8, 9. The two middle values are 4 and 5, so the median is their average, i.e. 4.5 in this case.

The median is less affected by extreme values in the data set, so it is the preferred measure of central tendency when the data set is skewed or not symmetrical.

For a large data set it is relatively difficult to locate the median by inspecting the sorted data, so it helps to locate it with a formula. For an odd number of observations, the median is the value at position $\frac{n+1}{2}$ in the sorted data. For the 13 observations in the first example above:
$\begin{aligned}
\text{Median position} &=\frac{n+1}{2}\\
&=\frac{13+1}{2}\\
&=\frac{14}{2}=7\text{th}
\end{aligned}$
so the median is the 7th value in the sorted data, which is 5.
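The rules above can be sketched as a small function (the name is illustrative) and checked against both examples:

```python
# Sketch of the median rule: sort, then take the middle value (odd n)
# or the average of the two middle values (even n).
def median(data):
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                       # odd n: position (n + 1)/2
        return s[mid]                    # zero-based index mid
    return (s[mid - 1] + s[mid]) / 2     # even n: average the two middle values

print(median([5, 9, 8, 4, 3, 1, 0, 8, 5, 3, 5, 6, 3]))      # 5
print(median([5, 9, 8, 4, 3, 1, 0, 8, 5, 3, 5, 6, 3, 2]))   # 4.5
```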