Case Study: MediaMath

Building a world-class digital advertising analytics platform using Qubole Data Service

I am very happy with Qubole! Our goal at MediaMath was to take our existing industry leading infrastructure to the next level handling new complex analytics tasks. Qubole has helped us enable this goal with minimal risk.

Renee Englehardt

VP, Analytics

About MediaMath

MediaMath, founded in 2007 and based in New York City, is a 750-employee company and the leading global digital media-buying platform. MediaMath develops and sells tools for Digital Marketing Managers under the TerminalOne brand. TerminalOne allows Marketing Managers to plan, execute, optimize, and analyze marketing programs. This is a case study written by MediaMath for Qubole.

Background

The Analytics and Insights team at MediaMath is responsible for delivering decision-making infrastructure and advisory services to our clients. The team does this by helping clients answer complex business questions using analytics that produce actionable insights. Examples of the team’s work include, but are not limited to:

Segmenting audiences based on their behavior including such topics as user pathway and multi-dimensional recency analysis

Building customer profiles (both univariate and multivariate) across thousands of first-party (i.e., client CRM files) and third-party (i.e., demographic) segments

Objectives

Challenges

Complexity of transforming Semi-Structured data

Repeatable Data Pipelines

Low Risk Apache Hadoop

Hadoop on-premise vs cloud

The Challenge

Our flagship product captures all kinds of data generated when our customers run digital marketing campaigns on TerminalOne. This amounts to a few terabytes of structured and semi-structured data per day. It consists of information on marketing plans, ad campaigns, ad impressions served, clicks, conversions, revenue, audience behavior, audience profile data, etc. At MediaMath, we are always looking to enhance our cutting-edge infrastructure. We were looking to take our existing capabilities to the next level to manage new, innovative analytics tasks.

Processing this raw data to segment the audience, optimize campaign yield, compute revenue attribution, etc., is a non-trivial problem for some of the following reasons:

Complexity of transforming Semi-Structured data

Transforming session log data to construct user sessions and click paths for further analysis is a complex process. We knew that Apache Hadoop was an attractive option, but we wanted a solution that our analysts could get started with quickly and use easily, without having to worry about operational management. We wanted analysts to be able to focus on their data and transformations without having to think about cluster sizes, Apache Hadoop versions, machine types and other elements of cluster operations.
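To make the idea of constructing user sessions from log data concrete, here is a minimal Python sketch. It is an illustration only and not MediaMath’s or Qubole’s actual code: the event layout (user_id, timestamp, url), the 30-minute inactivity cutoff, the function name sessionize and the sample rows are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a new session starts after 30 minutes of inactivity.
SESSION_GAP = timedelta(minutes=30)


def sessionize(events, gap=SESSION_GAP):
    """Group (user_id, timestamp, url) events into per-user click paths.

    Returns {user_id: [[url, ...], ...]} with one inner list per session.
    """
    by_user = defaultdict(list)
    for user_id, ts, url in events:
        by_user[user_id].append((ts, url))

    sessions = {}
    for user_id, hits in by_user.items():
        hits.sort(key=lambda hit: hit[0])  # order each user's hits by time
        paths = []
        last_ts = None
        for ts, url in hits:
            # Start a new session on the first hit or after a long pause.
            if last_ts is None or ts - last_ts > gap:
                paths.append([])
            paths[-1].append(url)
            last_ts = ts
        sessions[user_id] = paths
    return sessions


# Hypothetical sample log rows, invented for illustration.
sample_events = [
    ("u1", datetime(2013, 1, 1, 9, 0), "/home"),
    ("u1", datetime(2013, 1, 1, 9, 5), "/product"),
    ("u1", datetime(2013, 1, 1, 11, 0), "/home"),
    ("u2", datetime(2013, 1, 1, 9, 1), "/landing"),
]
click_paths = sessionize(sample_events)
```

In this sketch, the two-hour pause between the second and third hits of user u1 splits that user’s activity into two sessions. At terabyte scale the same grouping would typically run as a distributed job rather than in a single process.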

Repeatable Data Pipelines

We needed a service for developing data pipelines that repeat the same transformations, day after day, week after week, without much intervention from our team once they are set up. Automating the execution of the data pipeline while honoring the interdependencies between the pipeline activities was a crucial requirement. Prior experiments with cron had taught us that it was not the best approach for this.
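The requirement to honor interdependencies between pipeline activities can be sketched as a dependency-ordered run. The following Python example is an illustration only, not how Qubole schedules jobs: the step names, the PIPELINE mapping and the run_pipeline helper are hypothetical, and the ordering relies on graphlib.TopologicalSorter from the Python standard library (3.9 or later).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical step names; each step maps to the set of steps it depends on.
PIPELINE = {
    "load_logs": set(),
    "load_crm": set(),
    "build_sessions": {"load_logs"},
    "attribute_revenue": {"build_sessions", "load_crm"},
    "export_summaries": {"attribute_revenue"},
}


def run_pipeline(steps, run_step):
    """Run every step once, only after all of its dependencies have run."""
    completed = []
    for name in TopologicalSorter(steps).static_order():
        run_step(name)  # a real pipeline would launch the job for this step
        completed.append(name)
    return completed


# The no-op callback stands in for real work in this sketch.
executed = run_pipeline(PIPELINE, run_step=lambda name: None)
```

A plain cron schedule has no notion of this ordering: each entry fires at its set time whether or not the upstream step has finished, which is the kind of limitation that makes a dependency-aware scheduler attractive.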

Low risk Apache Hadoop

We needed something that was reliable and easy to learn, set up, use and put into production, without the risk and high expectations that come with committing millions of dollars in upfront investment.

The Solution

Big Data Analytics Solution

During our trial, we quickly created an account on Qubole and the team helped us upload sample data. We started using the system and immediately started to see the value of it. Within hours, we were able to re-use a number of very useful, business-critical, custom Python libraries that we had developed, matured, and stabilized. These libraries computed revenue attribution by customer and by campaign by mashing together semi-structured and relational data, as well as other useful tricks.

Cloud

We also noticed that the cloud-based Qubole clusters automatically added compute nodes as we started to run more queries and scaled the cluster down as the number of queries went down. This operational efficiency was a plus, as we didn’t have to continually reach out to our partners in Engineering, who have the complex task of managing our mission-critical production systems.

Data Pipelines

Qubole’s engineering team worked with our team to build a custom data collector from our Oracle Database to our Amazon S3 account. Using their S3 Loader and Sqoop-as-a-Service offering, they set up a pipeline that loaded the S3 data into Qubole’s Big Data Analytics Solution, processed it, and pushed the resulting summaries into a MySQL instance that both we and our customers could query using our BI tools. We were set up and running in a few days.

Risk Free

Qubole’s interfaces, including its easy-to-use GUI that really simplifies big data and its support for SQL with easy ways of embedding custom libraries, made it easy to learn. Using their GUI, setting up and tearing down clusters was totally transparent; as an analyst I did not have to take on such an operations headache. We saved the company a few million dollars of upfront investment by going with Qubole. Also, the Qubole guys are a seasoned bunch who seem to know what they are doing, and have credible answers and solutions to the team’s questions. They are a Skype chat or a phone call away whenever my team needs help with issues or change requests. I don’t feel I am taking on a huge risk by going with Qubole. Over time, they have become a partner in my team’s success, one to whom I delegate my big data platform needs.

Qubole is the leader in Big Data SaaS. Qubole was founded by Ashish Thusoo and Joydeep Sen Sarma, former leaders of Facebook’s data infrastructure organization and long-time contributors to Apache Hadoop and creators of Apache Hive. Qubole is trusted by the largest brands in social media, online advertising, entertainment, gaming and other data-intensive ventures.

Qubole Data Service (QDS) provides a turnkey SaaS solution for Big Data teams. It runs on the top-performing elastic Hadoop engine in the cloud and includes a library of data connectors with a graphical user interface for Hive, Pig, Oozie and Sqoop. QDS makes it easy to inspect data, author and execute queries, and convert queries into scheduled jobs. With QDS, the power of Big Data meets the simplicity of the cloud.

Prakash Janakiraman, Co-Founder and VP Engineering

Qubole is a significantly more polished product than EMR. Data scientists can explore their data in S3, create tables and query those tables all via an easy-to-use web UI

Yali Sassoon, Co-founder

Snowplow Analytics

Qubole’s fantastic support has been key in our successful deployment. They continue to deliver new features and revisit the ones that we ask for

Joris Spermon, VP Tech & Development

YD World

Our goal at MediaMath was to take our existing industry leading infrastructure to the next level handling new complex analytics tasks. Qubole has helped us enable this goal with minimal risk.

Marc Rossen, Sr. Director Data and Analytics

MediaMath

Instead of worrying about provisioning clusters of machines or job flows or whatever, Qubole lets you focus on your data and your queries … The Qubole guys have been extremely helpful!

Nicholas Andonakis, Senior Product Analyst

BigCommerce

The service spins up users’ clusters only when a job is started, then automatically scales or contracts them based on the workload, and spins the servers down once the job is done.

Derrick Harris, Senior Writer

GigaOM

Qubole’s Hadoop and Hive interfaces are vastly superior to the default CLIs, which scare business analysts and hinder meaningful analyses of the gaming logs that we collect. With Qubole, business analysts are self-sufficient in using a Big Data platform to meet their advanced analytic needs.

Senior Director, Game Dev Ops and Analytics

Online Gaming Company

top-performing technologies in the data industry are definitely taking aim at democratizing data tools and bringing the power of data to smaller businesses. This is a major change in the data industry, and Qubole Data Service is a great example

Geoff Domoracki, Founder and CEO

DataWeek

I’m very happy to be using Qubole in production. Qubole has saved me a lot of time, effort, and trouble in getting my data processing pipelines up and running. My data pipelines process Appnexus data in Amazon S3 which is then stored in Vertica. The engineering team understands the complexities and provided awesome support!

Chief Engineer

Real-time Ads Retargeting Startup

There’s a whole world of web companies, SMBs and other non-Facebooks or Yahoos that will want to use Hadoop but not want to run it in-house…offering a cloud service makes it easier for these users to get started with the platform and for Qubole to keep improving.

Derrick Harris, Senior Writer

GigaOM

Qubole offers a big data ETL and exploration service through auto-scaling Hadoop clusters with a web user interface for data exploration and integration with various data sources. The service can do (nearly) everything EMR can do, and it goes further

Christian Prokopp, Contributor

George Chow, CTO

Simba Technologies

“The integration of Tableau and Qubole makes it faster and easier for our customers to operationalize Big Data…lowers the resource barriers to deriving the benefits of Big Data because customers can deploy our joint solution seamlessly and cost effectively.”