Document Type

Article

Abstract

This article attempts to improve the performance of classification algorithms used to predict customer responses to the marketing campaign of an unnamed Portuguese bank, using the Random Forest ensemble. A thorough exploratory data analysis (EDA) was conducted on the data to check for anomalies such as outliers and extreme values. The EDA revealed that the bank data had 45,211 instances and 17 features, with 11.7% positive responses, and confirmed the presence of outliers and extreme values. The classification algorithms used to model the bank dataset were Logistic Regression, Decision Tree, Naïve Bayes and the Random Forest ensemble. These algorithms were applied to both the balanced and the original bank data using the Orange 3.2 data mining application, following the Cross-Industry Standard Process for Data Mining (CRISP-DM) and the ten-fold cross-validation method. The experimental results revealed that the performance of the Random Forest ensemble improved when the data was balanced. The results also showed that duration, poutcome, contact, month and housing were the most important features contributing to the success of the bank's marketing campaign for deposit subscription. The study further revealed that the duration of the call to clients, a positive response to past promotions, and contact by cell phone contributed positively to the success of the campaign, while the months of September, November, March and April recorded higher subscription rates. Clients in the management cadre and technicians were found to have responded more positively to the campaign than those in other job categories.
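The two data-handling steps named in the abstract, class balancing and ten-fold cross-validation, can be illustrated with a minimal Python sketch. The abstract does not say which balancing technique was used, and the study itself was carried out in Orange rather than in code, so everything here is an assumption: random oversampling of the minority class stands in for the balancing step, the helper names `oversample_minority` and `stratified_folds` are hypothetical, and the toy labels only mimic the reported 11.7% positive rate.

```python
import random
from collections import Counter


def oversample_minority(labels, seed=0):
    """Return row indices in which every class appears as often as the largest class.

    Assumption: random oversampling with replacement; the source does not
    state which balancing method was actually applied.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    target = max(len(rows) for rows in by_class.values())
    balanced = []
    for rows in by_class.values():
        balanced.extend(rows)
        # Draw extra rows with replacement until the class reaches the target size.
        balanced.extend(rng.choices(rows, k=target - len(rows)))
    rng.shuffle(balanced)
    return balanced


def stratified_folds(labels, n_folds=10, seed=0):
    """Split row indices into n_folds lists with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(n_folds)]
    for rows in by_class.values():
        rng.shuffle(rows)
        # Deal each class out round-robin so every fold gets a similar share.
        for position, idx in enumerate(rows):
            folds[position % n_folds].append(idx)
    return folds


# Toy labels with roughly the class ratio reported in the abstract (11.7% positive).
labels = ["yes"] * 117 + ["no"] * 883

balanced_idx = oversample_minority(labels)
balanced_counts = Counter(labels[i] for i in balanced_idx)

folds = stratified_folds(labels, n_folds=10)
positives_per_fold = [sum(labels[i] == "yes" for i in fold) for fold in folds]
```

In this sketch each of the ten folds would serve once as the test set while the other nine are used for training. If balancing is combined with cross-validation, a common precaution is to oversample only the training folds, so that duplicated rows do not appear in both the training and the test data.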