Evaluating a Data Model

1 Hour 7 Credits

GSP204

Overview

In this lab you will learn how to partition a data set into two separate parts: a training set that is used to develop a model, and a test set that is used to evaluate the accuracy of that model. Holding the test set apart lets you evaluate predictive models independently and in a repeatable manner. You'll then re-create the model developed in a previous lab in this quest using the training data set and evaluate it against the test data set. The data is stored in Google BigQuery and the analysis will be performed using Google Cloud Datalab.
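The idea of a repeatable split can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the lab's own method: the field names (`date`, `dep_delay`, `arr_delay`), the 70/30 split, the 15-minute lateness cutoff, and the synthetic rows are all hypothetical, and the lab itself works against BigQuery rather than in-memory lists. Hashing a key such as the flight date, instead of drawing random numbers, means the same rows land in the same set on every run.

```python
import hashlib
from datetime import date, timedelta


def in_training_set(key, train_percent=70):
    """Deterministically map a key to the training set (hypothetical 70/30 split)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < train_percent


def split_by_key(rows, key_field, train_percent=70):
    """Partition rows so that every row sharing a key lands in the same set."""
    train, test = [], []
    for row in rows:
        target = train if in_training_set(row[key_field], train_percent) else test
        target.append(row)
    return train, test


def accuracy(rows, threshold):
    """Fraction of rows where 'dep_delay >= threshold' matches 'arrived 15+ min late'."""
    correct = sum((r["dep_delay"] >= threshold) == (r["arr_delay"] >= 15) for r in rows)
    return correct / len(rows)


def fit_threshold(train_rows, candidates=range(0, 41)):
    """Choose the departure-delay threshold that scores best on the training rows only."""
    return max(candidates, key=lambda t: accuracy(train_rows, t))


# Synthetic, made-up flight records: one per day for 120 days.
start = date(2015, 1, 1)
flights = []
for i in range(120):
    dep = (i * 7) % 40
    flights.append({
        "date": (start + timedelta(days=i)).isoformat(),
        "dep_delay": dep,
        "arr_delay": dep - 5 + (i % 3),
    })

train, test = split_by_key(flights, "date")
threshold = fit_threshold(train)
print("threshold chosen on training data:", threshold)
if test:
    print("accuracy on held-out test data:", accuracy(test, threshold))
```

Because the threshold is chosen using only the training rows, the accuracy reported on the test rows is an independent estimate, and rerunning the script reproduces the same partition.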

The data set used in this lab provides historic information about domestic flights in the United States, retrieved from the US Bureau of Transportation Statistics website. This data set can be used to demonstrate a wide range of data science concepts and techniques and will be used in all of the other labs in the Data Science on GCP quest.

Cloud Datalab is a powerful interactive tool created to explore, analyze, transform and visualize data and build machine learning models on Google Cloud Platform. It runs on Google Compute Engine and connects to multiple cloud services such as Google BigQuery, Cloud SQL or simple text data stored on Google Cloud Storage so you can focus on your data science tasks.

Google BigQuery is a RESTful web service that enables interactive analysis of massively large datasets, working in conjunction with Google Cloud Storage.

