Data Science on the Google Cloud Platform

Advanced | 10 Steps | 10h 11m | 60 Credits

This is the first of two Quests of hands-on labs derived from the exercises in the book Data Science on the Google Cloud Platform by Valliappa Lakshmanan, published by O'Reilly Media, Inc. In this first Quest, which covers the material up through Chapter 8, you will practice all aspects of ingesting, preparing, processing, querying, exploring, and visualizing data sets using Google Cloud Platform tools and services.

Data | Machine Learning

Prerequisites

This Quest assumes you have access to the O’Reilly book Data Science on the Google Cloud Platform, because the labs include only the end-of-chapter exercises and do not teach the concepts covered in the text itself. The labs use GCP services and tools for data storage, transformation, and warehousing, so students are encouraged to earn the badges for the Baseline: Data, ML, and AI Quest and the GCP Essentials Quest before beginning.

In this lab, you will simulate a real-time, real-world data set from a historical data set. The simulated data is produced by processing a set of text files with Python and Google Cloud Dataflow, and the resulting simulated real-time data is stored in Google BigQuery.
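The lab description does not show the simulation script itself. As a minimal sketch, assuming the historical records have already been parsed from the text files into `(timestamp, payload)` tuples, replaying them "in real time" can mean releasing each event once the wall clock has advanced by the event's historical offset divided by a speed-up factor. The names `replay`, `speed_factor`, `clock`, and `sleep` are hypothetical, and sending each event downstream is left out.

```python
import datetime as dt
import time
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical record shape: (historical event time, payload).
Event = Tuple[dt.datetime, str]


def replay(events: Iterable[Event], speed_factor: float,
           clock: Callable[[], float] = time.monotonic,
           sleep: Callable[[float], None] = time.sleep) -> Iterator[Event]:
    """Yield historical events in time order, paced so that simulated time
    advances speed_factor times faster than wall-clock time."""
    ordered = sorted(events, key=lambda e: e[0])
    if not ordered:
        return
    sim_start = ordered[0][0]
    wall_start = clock()
    for event_time, payload in ordered:
        # Wall-clock offset (in seconds) at which this event should be released.
        target = (event_time - sim_start).total_seconds() / speed_factor
        # Sleep only for what remains, so time spent processing earlier events
        # does not accumulate as drift.
        remaining = target - (clock() - wall_start)
        if remaining > 0:
            sleep(remaining)
        yield event_time, payload
```

Comparing against the replay's start time, rather than sleeping for each inter-event gap, keeps the simulation on schedule even when handling an event takes noticeable time. With `speed_factor=60`, one hour of historical data replays in about one minute.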

Use Google Cloud Dataflow to process streaming data that is simulated in real time from a real-world historical data set, store the results in Google BigQuery, and then use Google Data Studio to visualize the real-time geospatial data.
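The description does not say which transformations the streaming pipeline applies. As an illustration only, the following computes a per-key mean over sliding windows, a common kind of streaming aggregation. It is written in plain Python rather than with the Apache Beam SDK that Dataflow executes, and the function name, the `(timestamp_seconds, key, value)` record shape, and the `size`/`period` parameters are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (event time in integer seconds, key, value).
Record = Tuple[int, str, float]


def sliding_window_means(records: Iterable[Record], size: int,
                         period: int) -> Dict[Tuple[int, str], float]:
    """Mean value per (window start, key) for sliding windows of `size`
    seconds that start every `period` seconds."""
    sums: Dict[Tuple[int, str], float] = defaultdict(float)
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for ts, key, value in records:
        # Latest window start (a multiple of period) that is <= ts.
        start = ts - ts % period
        # Walk back through every window [start, start + size) containing ts.
        while start > ts - size:
            sums[(start, key)] += value
            counts[(start, key)] += 1
            start -= period
    return {k: sums[k] / counts[k] for k in sums}
```

Each record contributes to `size // period` overlapping windows. A real streaming engine also has to decide when a window is complete and how to treat late data; this version simply aggregates a finite batch.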

Learn how to partition a data set into a training set, used to develop a model, and a test set, used to evaluate the model's accuracy, so that predictive models can be evaluated independently and in a repeatable manner.
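One common way to make a train/test split repeatable is to derive each record's assignment from a stable hash of a key, rather than from a random number generator. The sketch below assumes such a key exists (for example, a date string, so that correlated records from the same day stay on the same side of the split); it is not necessarily the scheme used in the book, and `is_training`, `split`, and `key_fn` are hypothetical names. It uses `hashlib` because Python's built-in `hash()` of strings is randomized per process.

```python
import hashlib
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def is_training(key: str, train_fraction: float = 0.7) -> bool:
    """Deterministically assign a key to the training set: the same key
    always lands in the same bucket, on every run and every machine."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # bucket in 0..99
    return bucket < round(train_fraction * 100)


def split(records: Sequence[T], key_fn: Callable[[T], str],
          train_fraction: float = 0.7) -> Tuple[List[T], List[T]]:
    """Partition records into (train, test) by hashing key_fn(record)."""
    train: List[T] = []
    test: List[T] = []
    for record in records:
        (train if is_training(key_fn(record), train_fraction) else test).append(record)
    return train, test
```

Because the assignment depends only on the key, rerunning the split (or recomputing it in another tool with the same hash) reproduces the same training and test sets, which is what makes model evaluations comparable across runs.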