Use IBM Watson Studio Local to automate the building and training of a machine learning model to classify wines

Summary

This code pattern demonstrates how data scientists can leverage IBM Watson Studio Local to automate the building and training of a machine learning model to classify wines. It applies Principal Component Analysis (PCA) on a wine dataset to extract features. These components are then used to create a classification model that predicts wine categories.

Description

Using the IBM Watson Studio Local suite of tools, this code pattern provides an example data science workflow that attempts to classify wines into three categories based on their chemical properties.

Feature engineering is used to limit the number of properties needed to classify a wine. Using Principal Component Analysis (PCA), two principal components are extracted from the wine dataset to build our classification model.

Our classification model applies logistic regression to the extracted components to predict the wine categories.
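The PCA-then-logistic-regression idea described above can be sketched in a few lines. This is only a minimal illustration, not the pattern's own notebook code: it assumes scikit-learn instead of Spark MLlib so that it runs standalone, it uses scikit-learn's bundled wine dataset as a stand-in for the pattern's dataset, and it adds a standardization step that the text does not mention. The function name build_wine_classifier is hypothetical.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_wine_classifier(n_components=2, random_state=0):
    """Hypothetical helper: fit scaler -> PCA -> logistic regression."""
    # Assumption: scikit-learn's bundled wine data (chemical measurements,
    # three categories) stands in for the dataset used by the pattern.
    X, y = load_wine(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=random_state
    )

    model = make_pipeline(
        StandardScaler(),                  # assumption: scale before PCA
        PCA(n_components=n_components),    # keep two principal components
        LogisticRegression(max_iter=1000), # classify on the components
    )
    model.fit(X_train, y_train)

    # Accuracy on the held-out rows, as a fraction between 0 and 1.
    return model, model.score(X_test, y_test)


model, accuracy = build_wine_classifier()
print(f"Held-out accuracy with 2 principal components: {accuracy:.2f}")
```

Wrapping the steps in a single pipeline means the same scaling and projection learned from the training rows are applied automatically whenever the model scores new wines.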

After completing this code pattern, you’ll understand how to:

Use Watson Studio Local to extract features using PCA and other techniques.

Build, train, and save a model from the extracted features using Watson Studio Local.

Use the Watson Machine Learning feature to deploy and access your model in batch and API mode.

Automate the feature extraction and model scoring using the scripts that are deployed as a service in batch and API mode.
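As a rough illustration of the save-then-score steps in the list above, the sketch below serializes a fitted model, restores it, and scores a small batch of rows. It is generic Python, not the Watson Studio Local or Watson Machine Learning API: it assumes scikit-learn and the standard pickle module, uses scikit-learn's bundled wine dataset as a stand-in, and the helper score_batch is hypothetical. The product's own save and deployment mechanisms are covered in the README instructions.

```python
import io
import pickle

from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def score_batch(model, rows):
    """Hypothetical batch-scoring helper: one predicted category per row."""
    return [int(label) for label in model.predict(rows)]


# Assumption: the bundled wine data stands in for the pattern's dataset.
X, y = load_wine(return_X_y=True)
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(max_iter=1000),
).fit(X, y)

# Save the fitted pipeline to an in-memory buffer, then restore it,
# mimicking a "train once, score later" workflow.
buffer = io.BytesIO()
pickle.dump(model, buffer)
buffer.seek(0)
restored = pickle.load(buffer)

# Score a small batch of raw measurements with the restored model.
batch = X[:5]
predictions = score_batch(restored, batch)
print(predictions)
```

Because feature extraction and classification are stored together, a scoring script only needs the raw chemical measurements as input.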

Flow

Use Spark DataFrame operations to clean the dataset, then use Spark MLlib to extract principal components with PCA and train a classification model.

Save the resulting model into IBM Watson Studio Local.

Run the provided notebooks in Watson Studio Local.

Use the IBM Watson Machine Learning feature to deploy and access the model to generate wine classifications.

Instructions

Get the detailed instructions in the README file. These steps will show you how to: