The Evolution Path of Spark/Hadoop on the Cloud

Track: Trending Architectures / Data Platforms / Engineering Practices

Case Source:

Speaker

Yifei Yao (姚依非)

Google
Senior Software Engineer

A graduate of Carnegie Mellon University, Yifei has worked on multiple large-scale services and platforms at Amazon, Apple, and Google.
Yifei is currently a senior software engineer on the Google Cloud Dataproc team. Cloud Dataproc is a managed Spark and Hadoop service that lets you use open-source data tools for batch processing, querying, streaming, and machine learning on Google Cloud Platform. Before joining Google, he worked on a centralized machine learning platform at Apple and on the marketplace platform at Amazon.


Case Overview

In this talk, I will explain how Google integrates open-source data processing frameworks such as Hadoop and Spark into its broader cloud platform. We will go through the service components in detail and the best ways to integrate these frameworks into the cloud ecosystem. We will also discuss improvements and features that combine the strengths of Hadoop/Spark and the cloud to achieve optimal performance.

Case Objectives

The amount of data generated every day by users and services has grown exponentially over the years. As a result, huge amounts of data must be processed in the cloud, which puts great pressure on cloud analytics systems. The goal of this talk is to showcase how Hadoop/Spark is integrated into the cloud platform and to discuss ways to evolve the architecture to deliver better performance and user experience.

Key Takeaways (Successes or Lessons Learned)

Case ROI Analysis

By integrating Hadoop/Spark clusters into Google Cloud, we enabled many large customers to migrate their data processing pipelines and workloads to the cloud. The improved performance and tight integration with other cloud products gave us an edge in performance and user experience over other solutions.