Advisor

Committee Member

Second Committee Member

Third Committee Member

Keywords

Abstract

Regression analysis fits predictive models to data on a response variable and corresponding values of a set of explanatory variables. Often, data on the explanatory variables must be purchased from commercial databases, so the available budget may limit which variables can be included in the final model.

In this dissertation, two budget-constrained regression models are proposed, for continuous and categorical response variables respectively, using Mixed Integer Nonlinear Programming (MINLP) to choose the explanatory variables included in solutions. First, we propose a budget-constrained linear regression model for continuous response variables. Properties such as solvability and global optimality of the proposed MINLP are established, and a data transformation is shown to significantly reduce the needed big-M constants. Illustrative computational results on realistic retail store data sets indicate that the proposed MINLP outperforms standard statistical software in optimizing the objective function under a limit on the number of explanatory variables selected. The proposed MINLP is also shown to be capable of selecting the optimal combination of explanatory variables under a budget limit covering the cost of acquiring data sets.
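A generic formulation of this kind of model can be sketched as follows; the notation here (costs c_j, budget B, big-M constant M, selection variables z_j) is illustrative and not necessarily the dissertation's exact model:

```latex
\begin{aligned}
\min_{\beta,\, z} \quad & \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^{2} \\
\text{s.t.} \quad & -M z_j \;\le\; \beta_j \;\le\; M z_j, \qquad j = 1, \dots, p, \\
& \sum_{j=1}^{p} c_j z_j \;\le\; B, \\
& z_j \in \{0, 1\}, \qquad j = 1, \dots, p.
\end{aligned}
```

Each binary z_j switches coefficient \beta_j on or off, and the big-M bounds couple the two; this is why transformations that shrink the needed M values matter for solver performance.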

A budget-constrained and/or count-constrained logistic regression MINLP model is also proposed for categorical response variables limited to two possible discrete values. Alternative transformations that reduce the needed big-M constants are included to speed up the solution process. Computational results on realistic data sets indicate that the proposed optimization model can select the best choice of an exact number of explanatory variables in a modest amount of time, and these results frequently outperform standard heuristic methods in minimizing the negative log-likelihood function. Results also show that the method can compute the best choice of explanatory variables affordable within a given budget. A further study, adjusting the objective function to minimize the Bayesian Information Criterion (BIC) value instead of the negative log-likelihood, shows that the optimization model can also reduce the risk of overfitting by introducing a penalty term that grows with the number of parameters.
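The relationship between the two objectives follows from the standard definition of the BIC. With k fitted parameters, n observations, and maximized likelihood \hat{L},

```latex
\mathrm{BIC} \;=\; k \ln n \;-\; 2 \ln \hat{L}
\;=\; 2\,\mathrm{NLL} \;+\; k \ln n,
```

so minimizing the BIC is equivalent to minimizing the negative log-likelihood (NLL) plus a penalty of \tfrac{1}{2} \ln n per parameter, which is what discourages the model from selecting variables whose contribution does not justify the added complexity.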

Finally, we present two refinements of our proposed MINLP models, with emphasis on multiple linear regression, to speed branch-and-bound (B&B) convergence and extend the size range of instances that can be solved exactly. The first adds cutting planes to the formulation, and the second develops warm-start methods for computing a good starting solution. Extensive computational results indicate that the two proposed refinements significantly reduce the time for solving the budget-constrained multiple linear regression model with a B&B algorithm, especially for larger data sets.
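One common way to obtain a good starting solution for B&B on a problem of this kind is a greedy forward-selection heuristic under the budget. The sketch below is illustrative only, under assumed names (`greedy_warm_start`, unit-style `costs`), and is not necessarily the warm-start method developed in the dissertation:

```python
import numpy as np

def greedy_warm_start(X, y, costs, budget):
    """Greedy forward selection under a budget constraint.

    Repeatedly adds the affordable explanatory variable that most
    reduces the sum of squared errors (SSE) of a least-squares fit,
    stopping when no affordable variable improves the fit.  The
    resulting (selected, sse) pair gives a feasible incumbent to
    seed branch and bound.
    """
    n, p = X.shape
    selected, spent = [], 0.0
    # Intercept-only SSE is the baseline objective value.
    best_sse = float(np.sum((y - y.mean()) ** 2))
    while True:
        best_j, best_j_sse = None, best_sse
        for j in range(p):
            if j in selected or spent + costs[j] > budget:
                continue  # already chosen, or not affordable
            cols = selected + [j]
            A = np.column_stack([np.ones(n), X[:, cols]])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            sse = float(np.sum((y - A @ beta) ** 2))
            if sse < best_j_sse - 1e-9:
                best_j, best_j_sse = j, sse
        if best_j is None:
            break  # no affordable variable improves the fit
        selected.append(best_j)
        spent += costs[best_j]
        best_sse = best_j_sse
    return selected, best_sse
```

The heuristic is not guaranteed to find the optimum, but any feasible incumbent it produces gives the B&B solver an initial upper bound, allowing it to prune nodes from the start.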

The dissertation concludes with a summary of main contributions and suggestions for extensions of all elements of the work in future research.