Hitting the rooftop of modern data architecture

Description

Please register for a ballot ticket here:
Due to the popularity of Data Science Festival events, we now allocate event tickets via a random ballot. Registering here enters you into the ticket ballot for the Data Science Festival event at Qubole on 22nd May 2018. The ballot will be drawn on 18th May 2018. Those randomly selected will then be emailed tickets for the event, along with the joining details.
If you are allocated a ticket, please bring a printed copy or have it on your phone so that you can check in with your QR code. Tickets are non-transferable.
Our next meetup is supported by Cervello and Qubole and will be hosted at The Ace Hotel, one of the most impressive rooftops in Shoreditch. Aimed at data scientists and data experts, the interactive tech talks & whiteboard sessions will cover what makes up a Modern Data Architecture and share first-hand experience of building and using one.
You’ll hear how some of the most data-driven companies in the UK have broken down data silos and overcome limited access to data. You’ll also see real-world retail analytics use cases in which real-time point-of-sale (POS) data feeds a pipeline that prepares ingested data for reporting and machine learning.
Agenda
5:45-6:15 Mingling – beers & refreshments
6:15-6:30 Introductions & meetup group announcements
6:30-7:00 Talk #1 - Enabling Data Science through a Modern Data Architecture
7:00-7:15 Ask an Architect – grab a beer and pizza and join one of the many whiteboard sessions (data science / modern data architecture)
7:15-7:45 Talk #2 - 2018 Big Data Cloud Trends
7:45-8:15 Real-time Retail Data Science Pipeline Deep-Dive
8:15-9:00 Ask an Architect – grab a beer and pizza and join one of the many whiteboard sessions (data science / modern data architecture)
Here’s what you’ll take away from this meetup:
Best practices for building an end-to-end data operation in AWS from ingestion to reporting and data science
The constructs of a Modern Data Architecture (MDA) in the cloud without breaking the bank
Benefits of decoupled storage and compute for big data workloads, and how Qubole fits in the MDA
Right tool, right use case: Hadoop, Spark and Presto to advance your business insights, from real-time analytics to machine learning and stream processing, while reducing costs on your data warehouse with proper ETL methodologies
Optimization of data pipelines using EC2 Spot Instances and other automation technologies for big data
Techniques that Cervello uses for building their AWS architecture for customer-facing products and internal analytics
Session #1 - Enabling Data Science through a Modern Data Architecture
Glyn Heatley from Cervello will talk through the principles of a Modern Data Architecture (MDA) and the different roles Qubole plays in an MDA, from data ingestion pipelines and ETL to ML. The principles of a Modern Data Architecture take into consideration the demands of a high-growth, innovative, acquisitive organisation that is flexible, extendable and scalable. Qubole enables flexibility in an MDA that extends beyond the rigid tenets of legacy data architecture solutions. Glyn will highlight how Qubole enables businesses to leverage both governed and semi-governed data to quickly derive insights and build complex models without building out huge on-premise enterprise data infrastructure.
Session #2 - 2018 Big Data Cloud Trends
Ajith Ramanath, Qubole’s Technical Director of Solutions Architecture EMEA, will dive into how different Data Engineering and Analytics teams use Hadoop, Presto and Spark, and why you should look at these engines for different use cases (ML, ETL, concurrent interactive analysis, etc.). He will also cover the industry trends emerging around these engines that improve reliability, performance and cost for different workloads in the cloud. Ajith will highlight some examples in which data teams are leveraging Modern Data Architectures to consume, read and productionise information, as well as how Qubole is enabling them to achieve 70-90% cost savings at petabyte scale.
Real-time Retail Data Science Pipeline Deep-Dive
Hira Virdee and Sudha Regmi, data scientists at Cervello, will give a demo of a data science pipeline in Qubole. They will use real-world retail company data to showcase the data lifecycle from ingestion to analysis. During the demo, they will use some of the most common data engineering and analytics languages (Scala, SQL, PySpark) and present insights and analysis drawn from the data.