8 Key Steps to Follow When Solving A Machine Learning Problem

This article presents some of the key steps one could take to create the most effective model for a given machine learning problem, using different machine learning algorithms. Please feel free to comment or suggest if I missed one or more important points. Also, sorry for the typos.

8 Key Steps for Solving A Machine Learning Problem

Gather the data set: This is one of the most important steps, where the objective is to gather as large a data set as possible. Provided the features have been selected appropriately, a large data set helps to minimize the cross-validation error and lets the cross-validation and training data set errors converge toward a common minimum value. If the features have not been selected appropriately, then beyond a certain data set size, adding more data may not have much impact on the error, because the model suffers from high bias (under-fitting).
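The effect of data set size on the two errors can be sketched with a small experiment. The following is only an illustration under stated assumptions: the data is synthetic (a noisy straight line), the model is a one-feature least-squares fit, and the names `fit_line`, `mse` and `learning_curve` are hypothetical helpers, not part of any library.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(model, xs, ys):
    """Mean squared error between predicted and observed values."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def learning_curve(train, cv, sizes):
    """Return (size, training error, cross-validation error) for each size."""
    cv_xs = [x for x, _ in cv]
    cv_ys = [y for _, y in cv]
    rows = []
    for m in sizes:
        xs = [x for x, _ in train[:m]]
        ys = [y for _, y in train[:m]]
        model = fit_line(xs, ys)
        rows.append((m, mse(model, xs, ys), mse(model, cv_xs, cv_ys)))
    return rows


def make_point(rng):
    """Assumed synthetic process: y = 3x + 2 plus Gaussian noise."""
    x = rng.uniform(0, 10)
    return x, 3.0 * x + 2.0 + rng.gauss(0, 1)


rng = random.Random(0)  # fixed seed so the run is repeatable
train = [make_point(rng) for _ in range(200)]
cv = [make_point(rng) for _ in range(100)]
sizes = [5, 10, 20, 50, 100, 200]
curve = learning_curve(train, cv, sizes)

for m, train_err, cv_err in curve:
    print(f"m={m:4d}  training error={train_err:.3f}  cross-validation error={cv_err:.3f}")
```

With a well-chosen feature, as here, the two error columns should approach each other as the number of training examples grows. If they level off at a high value instead, that points to the high bias case described above.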

Split the data set into the following three data sets:

Training data set

Cross-validation data set

Test data set

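The reason for doing this is to avoid the scenario where the model ends up fitting the test data set well only because new features were developed based on the evaluation of the test data set error. If features are tuned against the cross-validation data set instead, the test data set remains an unbiased check. One could adopt a 60-20-20% split for the training, cross-validation and test data sets.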
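A minimal sketch of the 60-20-20% split, using only the Python standard library. The function name `split_dataset`, its parameters, and the fixed random seed are assumptions made for illustration. The rows are shuffled first, which assumes that the examples are independent and not ordered in time.

```python
import random


def split_dataset(rows, train_frac=0.6, cv_frac=0.2, seed=0):
    """Shuffle rows and split them into training, cross-validation and test sets.

    Whatever is left after the training and cross-validation
    fractions are taken becomes the test set.
    """
    if train_frac <= 0 or cv_frac <= 0 or train_frac + cv_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_train = round(len(shuffled) * train_frac)
    n_cv = round(len(shuffled) * cv_frac)
    train = shuffled[:n_train]
    cv = shuffled[n_train:n_train + n_cv]
    test = shuffled[n_train + n_cv:]
    return train, cv, test


train_set, cv_set, test_set = split_dataset(range(100))
print(len(train_set), len(cv_set), len(test_set))
```

Features and model settings would then be tuned against the cross-validation set only, with the test set used once at the end.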

Choose the most appropriate algorithm. There are guidelines for selecting a particular machine learning algorithm based on the problem at hand. For example, if the goal is a predictive model that estimates a number such as a price, one can choose one of the regression algorithms. If the goal is to classify the input into one of a set of labels, it has to be a classification algorithm.

Start with a very simple model with a minimal set of the most prominent features. This helps one get started quickly without spending time exploring the correct and most appropriate feature set. Often, a lot of time is spent on identifying the most appropriate features.
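One way to get such a starting point is sketched below. It is an illustration only: picking the single feature with the strongest absolute Pearson correlation to the target is an assumed heuristic for "most prominent", and the tiny housing-style data set and the helper names `pearson`, `most_prominent_feature` and `fit_line` are made up for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def most_prominent_feature(features, target):
    """Return the name of the feature most strongly correlated with the target."""
    return max(features, key=lambda name: abs(pearson(features[name], target)))


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up example data: two candidate features and a price to estimate.
features = {
    "size": [1, 2, 3, 4, 5, 6],
    "rooms": [2, 1, 3, 2, 3, 1],
}
price = [10.5, 19.8, 30.2, 40.1, 49.7, 60.3]

best = most_prominent_feature(features, price)
slope, intercept = fit_line(features[best], price)
print(f"baseline: price = {slope:.2f} * {best} + {intercept:.2f}")
```

The resulting one-feature model is only a baseline. Its error gives a reference point against which any additional features can be judged.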

Plot learning curves to measure how error (prediction vs observed) varies with respect to some of the following:

Ajitesh has recently been working in the area of AI and machine learning. Currently, his research area includes Safe & Quality AI. He is also passionate about various technologies, including programming languages such as Java/JEE and JavaScript, and technologies such as blockchain, mobile computing, cloud-native technologies, application security, cloud computing platforms and big data.