However, the cloud can also complicate data processing pipelines, as it requires analysts to build complex workflows that extract data from multiple sources, funnel it through various filters and then feed it to different data warehouses and analytics services.

To simplify data pipeline development, Google users can deploy Cloud Composer, a managed workflow orchestration service based on the open source Apache Airflow project.

Initially developed by Airbnb, Airflow automates data processing workflows that were previously written as long, intricate batch jobs. Users define each Airflow pipeline in Python as a directed acyclic graph (DAG), so pipelines can be instantiated dynamically through code.

Airflow's primary components include:

the Python job definitions; and

a command-line interface to run, pause, schedule and test jobs, along with various commands to manipulate the DAG, metadata and variables.

Google Cloud Composer features

Cloud Composer does for data workflows what infrastructure-as-code services, such as Google Cloud Deployment Manager or AWS CloudFormation, do for operations teams. It includes a library of connectors, an updated UI and a code editor.

Google Cloud Composer also includes:

full integration with various Google Cloud Platform (GCP) data and analytics services, such as BigQuery, Cloud Storage and Cloud Machine Learning Engine;

access controls to the Airflow web UI using the Google Identity-Aware Proxy;

support for connections to external environments, both on premises and on other clouds;

compatibility with open source Airflow; and

support for other community-developed integrations.

Composer basics and pricing

The Composer Airflow environment runs as a Kubernetes cluster on Google Compute Engine instances. During configuration, users specify the instance type, node count, VM disk size and various network parameters. They can also optionally set up email notifications through the SendGrid service on GCP.

As mentioned above, Google Cloud Composer workflows are described as DAGs. A DAG is a set of tasks to run, along with their order, relationships and dependencies. To cite an example from Google, if a three-node DAG has tasks A, B and C, the workflow might specify that task A needs to run before B, but C can run at any time. A DAG might also specify constraints, such as allowing task A to be restarted up to five times if it fails. Tasks A, B and C could be anything, such as running a Spark job on Google Cloud Dataproc.
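To make the three-task example concrete, here is a minimal sketch in plain Python. It does not use the Airflow API; it only illustrates the two ideas described above, dependency ordering and bounded restarts, with the standard library's graphlib module (Python 3.9 or later). The names DEPENDENCIES, run_with_restarts and run_workflow are hypothetical and chosen for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow from the example: A must finish before B,
# while C has no dependencies and can run at any point.
# Each key maps a task to the set of tasks it depends on.
DEPENDENCIES = {"A": set(), "B": {"A"}, "C": set()}

MAX_RESTARTS = 5  # a failed task may be restarted up to five times


def run_with_restarts(action, max_restarts=MAX_RESTARTS):
    """Call action(); if it raises, restart it up to max_restarts times.

    Returns the number of attempts used. If the initial run and every
    restart fail, the last exception is re-raised.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            action()
            return attempts
        except Exception:
            # attempts counts the initial run plus restarts so far
            if attempts > max_restarts:
                raise


def run_workflow(dependencies, actions):
    """Run every task in an order that respects the dependencies.

    Returns the list of task names in the order they were executed.
    """
    executed = []
    for name in TopologicalSorter(dependencies).static_order():
        run_with_restarts(actions[name])
        executed.append(name)
    return executed


# Usage: task A fails twice before succeeding; B and C always succeed.
calls = {"A": 0}


def flaky_a():
    calls["A"] += 1
    if calls["A"] < 3:
        raise RuntimeError("transient failure")


order = run_workflow(
    DEPENDENCIES, {"A": flaky_a, "B": lambda: None, "C": lambda: None}
)
print(order, "attempts for A:", calls["A"])
```

In this sketch, A always appears before B in the resulting order, while the position of C is left to the sorter. A real scheduler such as Airflow would also run independent tasks in parallel and persist task state, which this sketch leaves out.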

Like other GCP services, Google Cloud Composer pricing is based on resource consumption, which is measured by the size of its environment and the duration of workflow operations. Specifically, users are billed per minute based on the number and size of web server nodes, database storage and network traffic. Users pay for Composer, as well as for the underlying Kubernetes nodes, which run the Airflow worker and scheduler processes, and for Google Cloud Storage buckets, which store DAG workflows and task logs. To more accurately estimate your costs, refer to the GCP documentation's pricing example.
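The billing model above can be sketched as simple arithmetic. The following is a rough illustration only: the Rates fields, the per-GB units for storage and network traffic, and the sample numbers are all assumptions made for this sketch and are not actual GCP prices. It also treats the web server rate as already reflecting node size. Real rates should come from the GCP pricing documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices; supply real values from the GCP price list."""
    web_node_per_minute: float       # price of one web server node per minute
    db_storage_gb_per_minute: float  # assumed unit: per GB of database storage per minute
    network_per_gb: float            # assumed unit: per GB of network traffic


def composer_fee(minutes, web_nodes, db_storage_gb, network_gb, rates):
    """Estimate the Composer fee for one environment over a period of time."""
    per_minute = (
        web_nodes * rates.web_node_per_minute
        + db_storage_gb * rates.db_storage_gb_per_minute
    )
    return minutes * per_minute + network_gb * rates.network_per_gb


def total_cost(composer, kubernetes_nodes, storage_buckets):
    """Add the separately billed Kubernetes nodes and Cloud Storage buckets."""
    return composer + kubernetes_nodes + storage_buckets


# Placeholder numbers for illustration only, not real prices.
placeholder_rates = Rates(
    web_node_per_minute=0.01,
    db_storage_gb_per_minute=0.001,
    network_per_gb=0.1,
)
fee = composer_fee(
    minutes=60, web_nodes=2, db_storage_gb=10, network_gb=5,
    rates=placeholder_rates,
)
print(f"Estimated Composer fee: {fee:.2f}")
```

Keeping the rates as required inputs, rather than constants, avoids baking stale prices into the estimate.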
