The steps in the machine learning workflow

We outline the preprocessing steps for finding missing values and outliers and then cleaning or removing them to prepare data for machine learning, and we look at how tools like MATLAB can help with exploring data, identifying key traits, and communicating findings.

By Seth DeLand, Product Marketing Manager, Data Analytics, MathWorks

Machine learning is ubiquitous. From medical diagnosis and speech and handwriting recognition to automated trading and movie recommendations, machine learning techniques are being used to make critical business and life decisions every moment of the day. Each problem is unique, so it can be challenging to manage raw data, identify the right data to include in the model, train multiple types of models, and assess how well each model performs.

Choosing Your Algorithm and Technique

Machine learning uses algorithms that learn from data to help make better decisions; however, it is not always obvious which machine learning algorithm will work best for a particular problem.

Luckily, information such as variable importance, together with model assessment tools, can help us decide which machine learning techniques to apply. Examples of machine learning techniques include clustering, where objects are grouped into bins with similar traits; regression, where relationships among variables are estimated; and classification, where a trained model is used to predict a categorical response.
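To make two of these techniques concrete, here is a minimal sketch in plain Python (chosen only to keep the example self-contained; it is not MATLAB code and does not use any MathWorks API). It fits a least-squares line as an instance of regression and uses a nearest-centroid rule as one simple form of classification. The function names and the toy data are made up for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Regression: least-squares slope and intercept for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def fit_centroids(points, labels):
    """Classification, training step: average the points of each class."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    return {label: tuple(mean(axis) for axis in zip(*members))
            for label, members in groups.items()}

def predict(centroids, point):
    """Classification, prediction step: return the class with the nearest centroid."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, point))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Made-up toy data, for illustration only.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # points on y = 2x + 1
centroids = fit_centroids([(1, 1), (1, 2), (8, 9), (9, 8)],
                          ["small", "small", "large", "large"])
print(slope, intercept)
print(predict(centroids, (2, 1)))
```

The regression step returns numbers that describe a relationship between variables, while the classification step returns a category label. That difference in output is often the first thing that narrows the choice of technique.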

Getting to Know Your Data

Before you get started, you’ll need to think about what data you have available and where it’s stored. It could be stored in spreadsheets, databases, binary files, or big data systems. You’ll also need to think about what preprocessing the data will need to make it useful for machine learning. In large datasets, missing values and outliers are common. As a result, a series of preprocessing steps is required to find these problem values and then clean or remove them in order to prepare the data for machine learning.
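As a rough illustration of such preprocessing, the sketch below drops missing entries and then flags outliers with the interquartile-range rule. It is written in plain Python to stay self-contained and is not MATLAB code; the `clean` function, the 1.5 multiplier, and the sample readings are assumptions made for this example, and other outlier rules are equally reasonable.

```python
import math
import statistics

def clean(values, k=1.5):
    """Drop missing entries, then split the rest into kept values and outliers."""
    # Treat None and NaN as missing values.
    present = [v for v in values
               if v is not None and not (isinstance(v, float) and math.isnan(v))]
    # Quartiles of the remaining data (statistics.quantiles needs at least 2 points).
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in present if low <= v <= high]
    outliers = [v for v in present if not (low <= v <= high)]
    return kept, outliers

# Made-up sensor readings with two missing entries and one suspicious spike.
readings = [10.1, 9.8, None, 10.3, float("nan"), 55.0, 9.9, 10.0]
kept, outliers = clean(readings)
print(kept)
print(outliers)
```

Returning the outliers separately, rather than silently discarding them, keeps them available for inspection, since an extreme value is sometimes a genuine observation and not an error.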

Data analysis and visualization tools such as MATLAB can assist with exploring data, identifying key traits, and communicating findings. To learn more about the steps in the machine learning workflow, watch the 3-minute video “Machine Learning with MATLAB Overview.”

Fig. 1: Examples of machine learning include clustering, where objects are grouped into bins with similar traits, and regression, where relationships among variables are estimated.