Novice-Centric Visualizations for Machine Learning

Abstract

This thesis focuses on visualizations for machine learning tasks. More specifically, we create a taxonomy for existing machine learning visualizations, and design a system to help machine learning novices perform labelling tasks.
There are many mature visualizations to help people understand the performance of current classifiers, including scatterplots, confusion matrices and ROC curves. However, most machine learning researchers are unaware of the visualization possibilities that exist, and many published visualizations are too task-oriented or dataset-oriented to be easily applied to other tasks. This thesis defines a taxonomy for machine learning visualizations in three dimensions: the data displayed, the advanced features to add for a specific task, and the goal of the visualizations. This taxonomy seeks to help machine learning researchers select a better visualization method to analyze their data.
Previous machine learning tools focus on presenting comprehensive information to experts, treating machine learning as a black box for end users, or explaining the reasons behind a prediction simply and clearly. However, building a machine learning system first requires labelled data, and many machine learning novices want to build a classifier themselves simply by labelling data. This motivated us to design and implement the Label-and-Learn system, which includes five visualizations that help users better understand their data and the likelihood of the classifier's success, and that improve their user experience.
To evaluate the utility of our Label-and-Learn system, we ran user studies comparing the visualization system with a traditional system on three measures: the quality of the labels, the user's mental model of the task, and the user experience. The results show that the visualizations have no negative effect on the quality of the labels, and that they improve both the user's mental model and the user experience. The success of the Label-and-Learn system should inspire further research in using visualizations to improve the user experience of data labelling in machine learning tasks.