Summary Statistics & Cleaning Missing Data – Let’s understand the aggregate behavior of our features further by looking at summary statistics. Azure Machine Learning gives us easy access to mean, median, mode, min, and max. Let’s look at each measure to see what it means to the interpretation of the data.

The summarize data module also gives us a count for each feature with missing values. We can then formulate a strategy for cleaning missing data. The cleaning functions used in this tutorial is not the optimal way to clean data, but we must learn to crawl before we walk. We’ll drop each row that has a missing value in our response class. Then use one of the measures of central tendency to fill in the other features; median for numeric features and mode for categorical features.