Bracketology with Google Machine Learning

GSP461

Overview

In this lab, you will predict the winner of an NCAA Men's Basketball tournament game using BigQuery, Machine Learning (ML), and the NCAA Men's Basketball dataset.

This lab uses BigQuery Machine Learning (BQML), which allows you to use SQL to create ML models for classification and forecasting.
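To give a rough sense of what "using SQL to create ML models" looks like, here is a minimal sketch of a BQML training statement followed by a prediction query. The `CREATE MODEL` and `ML.PREDICT` syntax is standard BQML, but every dataset, table, model, and column name below (`my_dataset`, `sample_model`, `training_games`, `upcoming_games`, `team_seed`, `opponent_seed`, `outcome`) is a hypothetical placeholder, not something defined in this lab.

```sql
-- Hypothetical names throughout: replace the dataset, tables, and columns
-- with ones that exist in your own project.
CREATE OR REPLACE MODEL `my_dataset.sample_model`
OPTIONS (
  model_type = 'logistic_reg',       -- logistic regression, used for classification
  input_label_cols = ['outcome']     -- the column the model learns to predict
) AS
SELECT
  team_seed,                         -- feature
  opponent_seed,                     -- feature
  outcome                            -- label, for example 'win' or 'loss'
FROM `my_dataset.training_games`;

-- Score new rows with the trained model.
SELECT *
FROM ML.PREDICT(
  MODEL `my_dataset.sample_model`,
  (SELECT team_seed, opponent_seed FROM `my_dataset.upcoming_games`)
);
```

Both statements run in the BigQuery query editor like any other query. For a label column named `outcome`, `ML.PREDICT` returns the input columns plus a `predicted_outcome` column.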

What you'll do

In this lab, you will learn how to:

Use BigQuery to access the public NCAA dataset.

Explore the NCAA dataset to gain familiarity with the schema and scope of the data available.

Prepare and transform the existing data into features and labels.

Split the dataset into training and evaluation subsets.

Use BQML to build a model based on the NCAA tournament dataset.

Use your newly created model to predict NCAA tournament winners for your bracket.

Prerequisites

This is a fundamental-level lab. Before taking it, you should have some experience with SQL and its keywords. Familiarity with BigQuery is also recommended. If you need to get up to speed in these areas, you should at a minimum take one of the following labs before attempting this one:

Once you're ready, scroll down to learn about the services you will be using and how to properly set up your lab environment.

BigQuery

BigQuery is Google's fully managed, NoOps, low-cost analytics database. With BigQuery, you can query terabytes of data without managing infrastructure or needing a database administrator. BigQuery uses SQL and a pay-as-you-go pricing model, so you can focus on analyzing data to find meaningful insights.

There is a newly available dataset for NCAA basketball games, teams, and players. The game data covers play-by-play and box scores back to 2009, as well as final scores back to 1996. For some teams, additional win and loss data goes back to the 1894-95 season.