Impartido por:

Google Cloud Training

Transcripción

Cloud Dataproc is great when you have a data set of known size or when you want to manage your cluster size yourself. But what if your data shows up in real time or it's of unpredictable size or rate? That's where Cloud Dataflow is particularly good choice. It's both a unified programming model and a managed service and it lets you develop and execute a big range of data processing patterns; extract, to transform, and load batch computation and continuous computation. You use Dataflow to build data pipelines. And the same pipelines work for both batch and streaming data. There's no need to spin up a cluster or to size instances. Cloud Dataflow fully automates the management of whatever processing resources are required. Cloud Dataflow frees you from operational tasks like resource management and performance optimization. In this example, Dataflow pipeline reads data from a big query table, the source. Processes it in a variety of ways, the transforms. And writes it's output to a cloud storage, the Sync. Some of those transforms you see here are map operations and some or reduce operations. You can build really expressive pipelines. Each step in the pipeline is elastically scaled. There is no need to launch and manage a cluster. Instead, the service provides all resources on demand. It has automated and optimized worked partitioning built-in which can dynamically re-balance lagging work that reduces the need to worry about hotkeys. That is situations where just proportionately large chunks of your input get mapped to the same cluster. People use Dataflow in a variety of use cases. As we've discussed, it's a general purpose ETL tool and its use case as a data analysis engine comes in handy in things like fraud detection and financial services, IoT analytics in manufacturing, healthcare and logistics and click stream, point of sale and segmentation analysis in retail. And because those pipelines, we saw can orchestrate multiple services even external services. It can be used in real time applications such as personalizing gaming user experiences.