Launching Dataproc Jobs with Cloud Composer

This lab costs 7 credits to run. You can purchase credits or a subscription under My Account.

GSP286

Overview

In this lab you'll use Google Cloud Composer to automate the transform and load steps of an ETL data pipeline. The pipeline will create a Dataproc cluster, perform transformations on extracted data (via a Dataproc PySpark job), and then upload the results to BigQuery. You'll then trigger this pipeline using either:

An HTTP POST request to a Cloud Composer endpoint

A recurring schedule (similar to a cron job)
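Of the two trigger mechanisms above, the HTTP POST option can be sketched in Python. The snippet below only builds the request rather than sending it, since a real call also needs an auth token; the webserver URL and DAG name are placeholder assumptions, not values from the lab. The path shown is Airflow's experimental REST API endpoint (`/api/experimental/dags/<DAG_NAME>/dag_runs`); newer Airflow versions expose a stable `/api/v1/dags/<dag_id>/dagRuns` endpoint instead.

```python
import json
from urllib import request

# Placeholder Composer (Airflow) webserver URL and DAG name -- replace
# with the values from your own environment.
WEBSERVER_URL = "https://example-composer-webserver.appspot.com"
DAG_NAME = "composer_dataproc_dag"


def build_trigger_request(conf: dict) -> request.Request:
    """Build (but do not send) an HTTP POST that triggers a DAG run.

    `conf` is an arbitrary JSON-serializable dict that the DAG can read
    at runtime (e.g. which bucket to process).
    """
    url = f"{WEBSERVER_URL}/api/experimental/dags/{DAG_NAME}/dag_runs"
    body = json.dumps({"conf": conf}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        # A real request also needs an Authorization header with an
        # identity token for the Composer environment's client ID.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_trigger_request({"input_bucket": "gs://my-data-bucket"})
print(req.full_url)
```

Sending the request would then be a `request.urlopen(req)` call once the auth header is attached.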

Cloud Composer workflows are composed of DAGs (Directed Acyclic Graphs). You will design and implement your own DAG, working through both design considerations and implementation details to ensure that your prototype meets the requirements.
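As a sketch of what such a pipeline-definition DAG can look like, the snippet below chains cluster creation, a PySpark transform, a BigQuery load, and cluster teardown using operators from Airflow's Google provider package (`apache-airflow-providers-google`). All resource names (project, bucket, cluster, dataset) are illustrative placeholders, not the lab's actual solution, and the exact operator import paths vary by Airflow and provider version.

```python
# Sketch of a Composer DAG: create a Dataproc cluster, run a PySpark
# transform, load the results into BigQuery, then tear the cluster down.
# All resource names below are placeholders.
import datetime

from airflow import models
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocCreateClusterOperator,
    DataprocDeleteClusterOperator,
    DataprocSubmitJobOperator,
)
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import (
    GCSToBigQueryOperator,
)

PROJECT_ID = "my-project"      # placeholder
REGION = "us-central1"         # placeholder
CLUSTER_NAME = "etl-cluster"   # placeholder

with models.DAG(
    "dataproc_etl",
    # Recurring (cron-like) trigger: run once per day.
    schedule_interval=datetime.timedelta(days=1),
    start_date=datetime.datetime(2024, 1, 1),
    catchup=False,
) as dag:
    create_cluster = DataprocCreateClusterOperator(
        task_id="create_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        cluster_config={"worker_config": {"num_instances": 2}},
    )

    run_transform = DataprocSubmitJobOperator(
        task_id="run_transform",
        project_id=PROJECT_ID,
        region=REGION,
        job={
            "placement": {"cluster_name": CLUSTER_NAME},
            "pyspark_job": {
                "main_python_file_uri": "gs://my-bucket/transform.py",
            },
        },
    )

    load_to_bigquery = GCSToBigQueryOperator(
        task_id="load_to_bigquery",
        bucket="my-bucket",
        source_objects=["output/part-*"],
        destination_project_dataset_table=f"{PROJECT_ID}.etl.results",
        write_disposition="WRITE_TRUNCATE",
    )

    delete_cluster = DataprocDeleteClusterOperator(
        task_id="delete_cluster",
        project_id=PROJECT_ID,
        region=REGION,
        cluster_name=CLUSTER_NAME,
        trigger_rule="all_done",  # clean up even if an upstream task fails
    )

    create_cluster >> run_transform >> load_to_bigquery >> delete_cluster
```

The `trigger_rule="all_done"` on the teardown task is a common design choice for ephemeral clusters: it guarantees the cluster is deleted (and stops billing) whether or not the transform succeeded.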