Real-time Big Data processing is on the rise at Apache, with projects like Apache Spark, Apache Flink and Apache Apex. However, at the moment we do not have a unified framework to evaluate the correctness and performance of these systems. Apache Beam implements a unified model for writing both batch and streaming jobs with a single API and executing them on any of the supported platforms (runners). This makes Beam an ideal candidate to support an evaluation framework.

In this talk we will present Nexmark, a benchmark framework for evaluating queries over data streams. An implementation of Nexmark was donated by Google as part of the Apache Beam incubation process. Nexmark fills the gap in evaluating data processing frameworks, and it also serves as a rich integration test for verifying the correctness of both the Beam runners and new features of the Beam SDK.

Etienne has been working in software engineering and architecture for more than 13 years, for retail and financial groups among others. For the past few years he has focused on Big Data, with technologies such as Apache Cassandra, ElasticSearch and Apache Spark. He is an Open Source fan...

Software Engineer with more than ten years of experience designing and developing information systems for financial groups, telecom companies and startups. Focused on Big Data and Cloud architectures (aka Distributed Systems). He works at Talend France as an Open Source Software Engineer...