Introduction to Machine Learning with IBM Watson Studio

Machine learning is a type of artificial intelligence (AI) that enables computers to learn without being explicitly programmed. Algorithms identify patterns in data to generate predictive models. Typically, machine learning tasks fall into three categories:

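Supervised Learning – Data fed into the computer is labeled with the known outcome. The goal is to learn the relationship between the inputs and that outcome in order to predict outcomes for new data. Popular supervised learning algorithms include decision trees, regression, and neural networks.

Unsupervised Learning – Data fed into the computer is not labeled. The goal is to explore and find structure. Popular unsupervised learning algorithms include Cluster Analysis and Market Basket Analysis.

Reinforcement Learning – The computer learns by trial and error, receiving rewards or penalties for the actions it takes. The goal is to find the behavior that maximizes reward over time.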

In this lesson, we will walk through creating a supervised learning model with IBM SPSS Modeler, a service within IBM Watson Studio. Let’s get started.

Getting Started with IBM SPSS Modeler

With IBM SPSS Modeler, you can build machine learning models with drag-and-drop ease. Using a visual canvas, you can load data, sample it, transform it, apply algorithms, and evaluate predictive model performance through a series of nodes to find hidden patterns or variables that influence outcomes.

For our first foray into machine learning, we will download and explore Titanic, a publicly available dataset from Kaggle about the infamous shipwreck. The Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. Unfortunately, the ship did not carry enough lifeboats for everyone. To predict which groups of people were more likely to survive than others, we will create a supervised learning model.
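Before building any model, it helps to know what a trivial guess would achieve. The short Python sketch below uses only the two figures quoted above (2224 on board, 1502 deaths) to compute the overall survival rate and the accuracy of always predicting the more common outcome. The function names are illustrative, and the Kaggle training file covers only a subset of the passengers, so its own rates will differ.

```python
# Figures quoted in the text above.
TOTAL_ON_BOARD = 2224
TOTAL_DEATHS = 1502


def survival_rate(total: int, deaths: int) -> float:
    """Fraction of people on board who survived."""
    return (total - deaths) / total


def majority_baseline_accuracy(total: int, deaths: int) -> float:
    """Accuracy of a 'model' that always predicts the more common outcome."""
    survivors = total - deaths
    return max(survivors, deaths) / total


print(f"Overall survival rate: {survival_rate(TOTAL_ON_BOARD, TOTAL_DEATHS):.1%}")
print(f"Always-predict-died accuracy: {majority_baseline_accuracy(TOTAL_ON_BOARD, TOTAL_DEATHS):.1%}")
```

A supervised model is only useful if it beats this kind of baseline, which is why the later steps compare survival across groups rather than looking at the overall rate alone.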

1. Creating a Project

After logging into Watson Studio, select New Modeler Flow. Enter a name, keep the default settings, and then click Create.

2. Loading Training Data

Next, expand the Import menu, drag the Data Asset node onto the stream canvas, and select the Titanic training data file (train.csv) in the node settings to load the data into the project. Right-click the node and select Preview to see the dataset in detail.
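For readers who want to see what loading and previewing amounts to outside the visual canvas, here is a rough Python equivalent using only the standard library. The sample rows are made up for illustration (they are not real passenger records) and simply follow the column layout of Kaggle's train.csv; the helper names are assumptions of this sketch, not part of SPSS Modeler.

```python
import csv
import io

# Made-up rows in the column layout of Kaggle's train.csv.
# These are NOT real passenger records; they only show the shape of the data.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Doe, Mr. John",male,30,0,0,T-0001,8.05,,S
2,1,1,"Roe, Mrs. Jane",female,41,1,0,T-0002,71.00,C10,C
3,1,2,"Poe, Miss. Ann",female,,0,0,T-0003,13.00,,S
"""


def load_rows(handle):
    """Read CSV text from an open file handle into a list of dicts, one per row."""
    return list(csv.DictReader(handle))


def preview(rows, n=5):
    """Return the first n rows, similar in spirit to previewing a node."""
    return rows[:n]


def missing_counts(rows):
    """Count empty cells per column, a common first check on new data."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            if value == "":
                counts[column] = counts.get(column, 0) + 1
    return counts


rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in preview(rows, 2):
    print(row["PassengerId"], row["Survived"], row["Pclass"], row["Sex"], row["Age"])
print("Empty cells per column:", missing_counts(rows))
```

To try this on the real file, replace the in-memory sample with `open("train.csv", newline="")` and pass that handle to `load_rows`.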

3. Designing a Stream

To build a modeler stream, look under Record Operations. Pick Sample and drag it onto the canvas. Then click the circle on the right side of the Data Asset node and drag the line to the left side of the Sample node to connect the two operations. Now right-click Sample to view its settings. For Titanic, we will keep the First n defaults.
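Conceptually, a First n sample just keeps the first n records that flow through the node. A minimal Python sketch of that idea follows, alongside a seeded random sample for comparison. Both function names are hypothetical, and nothing here reflects how SPSS Modeler implements its Sample node internally.

```python
import random
from itertools import islice


def first_n(records, n):
    """Keep only the first n records and discard the rest."""
    return list(islice(records, n))


def random_n(records, n, seed=0):
    """Draw n records at random; the seed makes the draw repeatable."""
    pool = list(records)
    return random.Random(seed).sample(pool, min(n, len(pool)))


passenger_ids = range(1, 11)
print(first_n(passenger_ids, 3))
print(random_n(passenger_ids, 3))
```

Taking the first n records is fast and deterministic, but it can be biased if the file is sorted in a meaningful order; a random sample avoids that at the cost of reading the full dataset.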

4. Choosing Model Algorithms

Now we will experiment with algorithms. Expand the Modeling menu and explore the vast library of available machine learning models. For classifying Titanic survivors, we will pick Decision List, Classification & Regression Tree (C&R Tree), and Neural Net. Drag those three nodes onto the canvas and connect them to the Data Types node. Now let’s run the stream.

To run the stream, click the small blue triangle in the top menu of the stream canvas. SPSS will process the data through the selected machine learning models. Notice that when the run completes, new orange nodes appear. These nodes contain the model performance results.
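To give a feel for what a classification tree does when the stream runs, the sketch below scores candidate splits with Gini impurity, a measure commonly used by classification and regression trees. It is a simplified illustration under stated assumptions: the rows are made up, each feature is split by category value rather than by the binary splits a full tree algorithm would search, and it is not a description of SPSS Modeler's internal implementation.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity: 0.0 for a pure group, 0.5 for a 50/50 two-class group."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def split_impurity(rows, feature, target):
    """Weighted Gini impurity after grouping rows by one categorical feature."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    total = len(rows)
    return sum(len(labels) / total * gini(labels) for labels in groups.values())


def best_feature(rows, features, target):
    """Pick the feature whose split leaves the least impurity behind."""
    return min(features, key=lambda f: split_impurity(rows, f, target))


# Made-up rows for illustration only, not real passenger records.
ROWS = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 1},
    {"Sex": "female", "Pclass": 2, "Survived": 1},
    {"Sex": "male", "Pclass": 1, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 2, "Survived": 1},
]

for feature in ("Sex", "Pclass"):
    print(feature, round(split_impurity(ROWS, feature, "Survived"), 3))
print("Best first split:", best_feature(ROWS, ["Sex", "Pclass"], "Survived"))
```

A tree repeats this choice inside each resulting group until the groups are pure enough or too small to split further, which is how it arrives at segments such as a particular sex and ticket class combination.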

5. Evaluating Model Performance

To review the findings, right-click each of the model results nodes and investigate the evaluation menus. Note that each algorithm offers different options. For the Titanic C&R Tree model, females with 1st class tickets had the highest probability of survival, at 97.33%. Other groups did not fare nearly as well.
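A group-level survival probability like the one above can be read, roughly, as the share of survivors among the training rows that fall into that group. The following Python sketch computes that share for each combination of sex and ticket class. The rows are made up, so the numbers it prints are illustrative only and do not reproduce the 97.33% figure; the function name is an assumption of this example.

```python
from collections import defaultdict


def survival_by_group(rows, keys, target="Survived"):
    """Return {group: (survival_rate, group_size)} for each combination of keys."""
    tallies = defaultdict(lambda: [0, 0])  # group -> [survivors, total]
    for row in rows:
        group = tuple(row[key] for key in keys)
        tallies[group][0] += int(row[target])
        tallies[group][1] += 1
    return {group: (s / n, n) for group, (s, n) in tallies.items()}


# Made-up rows for illustration only, not real passenger records.
ROWS = [
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 1, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 1, "Survived": 0},
    {"Sex": "male", "Pclass": 1, "Survived": 1},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Survived": 0},
]

rates = survival_by_group(ROWS, ["Sex", "Pclass"])
# List groups from highest to lowest survival rate.
for group, (rate, size) in sorted(rates.items(), key=lambda item: -item[1][0]):
    print(group, f"{rate:.0%}", f"(n={size})")
```

Reporting the group size next to each rate matters: a very high rate in a tiny group is much weaker evidence than the same rate in a large one.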

6. Deploying and Using Models

Now that you have created several simple supervised machine learning models with IBM SPSS Modeler, you can begin testing those models with the unlabeled Titanic test dataset (test.csv) to see whether they remain accurate at predicting survival outcomes on new data.
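Because the test file is unlabeled, accuracy cannot be computed on it directly without the true outcomes. One common workaround, sketched below in Python, is to hold back part of the labeled training data and score predictions against it. The function names, the 25% validation share, and the seed are all assumptions chosen for illustration.

```python
import random


def holdout_split(rows, validation_fraction=0.25, seed=42):
    """Shuffle rows reproducibly and split them into (train, validation) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


def accuracy(actual, predicted):
    """Share of predictions that match the known outcomes."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot score an empty set of predictions")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


train, validation = holdout_split(range(100))
print(len(train), "training rows,", len(validation), "validation rows")
print("Accuracy:", accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

A model that scores well on the training rows but noticeably worse on the held-out rows is likely overfitting, which is a useful signal before trusting its predictions on test.csv.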

Keep in mind that finding an optimal machine learning model on your first run is unusual. Typically, you will experiment iteratively, refining the model inputs and algorithm settings to improve predictive accuracy.

After a strong-performing model is built, it can be used to make predictions on new data. To deploy a machine learning model, right-click a final output node and then click Save branch as a model. Navigate to your model list on your Watson Studio project overview page. On the right side of that list, click Add Deployment and choose Web Deployment, Batch Prediction, or Real-time Streaming Prediction. That’s all there is to it.

For More Information

In this tutorial, we introduced how to get started building machine learning models using IBM SPSS Modeler. If you’d like to learn more, please review the following recommended resources.

Jen Underwood is a Senior Director at DataRobot and founder of Impact Analytix, LLC. She has a unique blend of product management and “hands-on” experience in data warehousing, reporting, visualization, and advanced analytics. In addition to keeping a constant pulse on industry trends, she enjoys digging into oceans of data to solve complex problems with machine learning.
Over the past 20 years, Jen has held worldwide product management roles at Microsoft and served as a technical lead for system implementation firms. She has experience launching new products and turning around failed projects. Most recently she provided advisory, strategy, educational content development, and marketing services to 100+ technology vendors through her own firm. She has been mentioned by KD Nuggets, Information Management and Forbes for her work. She also has written for InformationWeek, O’Reilly Media, and numerous other tech industry publications.
Jen has a Bachelor of Business Administration – Marketing, Cum Laude from the University of Wisconsin, Milwaukee and a post-graduate certificate in Computer Science – Data Mining from the University of California, San Diego. She was also honored to be a former IBM Analytics Insider, Tableau Zen Master, and Top 10 Women Influencer.