Challenge:

Data needed to be collected from a multitude of MVPDs and OTT partners, many of which lacked APIs

Data was inconsistent and often incorrect

Network lacked nomenclature to standardize against

Lack of internal skills to collect, correct, and standardize data for business analysis

Solution

Collect performance data via APIs or custom Python scripts

Import all data into Hadoop environment within the network’s private cloud

Create standard nomenclature, normalize all data against it and push to network enterprise database

Create interactive dashboards, with aggregation of show and episode data, for analysis by content-distribution and research groups

Client so impressed with initial dashboards that it requested additional shows

Background

This large and well known network has multiple brands, all of which creates original content that is consumed on its own domain, as well as distributed to myriad MVPD partners, such as Comcast and Verizon, as well as OTT providers, such as Roku.

The content distribution network is the new normal for content creators. Consumers are increasingly turning to video-on-demand to watch shows and channel surf. Although this network had a robust network of partners in place, its content distribution team needed to answer key business questions, such as:

How are individual shows and episodes perform on each partner’s domain?

How does that performance compare to the business terms agreed to by each partner?

How does that performance compare to the performance of its own domain?

Additionally, the team wanted to plot show performance over time, as well as identify the ebbs and flows of consumption within set periods of time, such as business quarters.

The network needed to pull viewership data from a multitude of MVPD and OTT providers, centralize it in a Big Data environment, and come up with a way to do comparisons that people with minimal data-science skills can understand.

Challenge

The network partnered with comScore Rentrak to obtained this data, but found that the data often contained variations and errors. That meant to get 100% of the date, the network needed to collect data from its MVPDs and OTT partners directly. Many offered an API for data collection, but many didn’t, which meant a web-scraping solution, customized to each partner, would be required.

And the data was far from standard or accurate. All sorts of data fields, such as show titles, varied from partner to partner, and often contained manual errors. Complicating matters further, the network lacked a nomenclature for standardizing data. For instance, there was no authoritative definition – much less an authoritative ID – for shows and episodes.

This clearly needed a big data solution and a data science exercise which exceeded the skills of the content-distribution team.

There were other inherent challenges the dashboard itself. To begin, the network wanted one tool for both frontline managers and senior management. The managers wanted a visualization tool that would allow them to answer a range of specific questions their shows and episodes across partners, while senior management wanted to take the data up a level, and drill down into the data based on a different sent a criteria. Creating a single dashboard that allowed both constituencies get the answers they need quickly and easily requires a high degree of art.

Solution

This network engaged Elasticiti to create a solution, as well as train and hand it off the internal staff for ongoing management. The solution consisted of:

Data Retrieval from Partners

Because the client was interested in show- and episode-level data, Elasticiti needed to get raw data from each partner. In some cases, the partner offered an API which Elasticiti could use to APIs to collect the data. But all too often, a partner lacked one. In these cases, Elasticiti data engineers wrote custom web-scraping scripts to collect the data. “These scripts are difficult and time-consuming to write, especially now that web pages are becoming more complex.” explained Ben Reid, CEO of Elasticiti. “But if it’s the only way to collecting a full set of data, then that’s what we’ll do.”

Since there were dozens of scripts to write, Elasticiti opted to use Python, which offers a rich library and is quick to write and is very scalable and readable. In fact, Elasticiti could write Python code in half the time it takes to write code in Java, which saved the client time and money.

Centralize Data in Big Data Environment

Once the data was retrieved from each partner, Elasticiti imported it into a Hadoop environment within the network’s private cloud. This environment was robust enough to handle the tasks of normalizing and analyzing the data, as well as offered the security the network wanted.

Creation of a Standard Nomenclature

Because content distribution via third parties is the new normal of networks, Elasticiti was keen to create a standard nomenclature for show and performance data, regardless of whether it stemmed from the network’s domain or from a partner’s.

Correct and Normalize All Data

Once the nomenclature was created, Elasticiti normalized all of the raw data from all of the partners against it. For instance, Elasticiti eliminated variations in show names (e.g. Show A vs. ShowA), as well as mis-categorizations, such as determining that the record is really attached to the correct show. This also applied to episode names. This was a massive data munging exercise, requiring Elasticiti to eliminate inconsistent formats between thousands of datasets. Upon completion, Elasticiti exported the cleaned up data to the network’s enterprise database so many departments could use it.

Create Interactive Dashboards for Analysis

Finally, Elasticiti created a persona-based interactive dashboards using Tableau Server, a solution that allows over 400 users to access and interact with the reports and download copies to their own laptops via a browser. Put another way, Tableau Server made these reports available at scale, even though none of the business users had Tableau software on their laptops.

The dashboard also included a deep palette of filtering so that each user can easily get the insights of most interest to them. The content-distribution and research teams could easily query the data for business answers, such as:

How is my show trending period over period?

Regardless of scale, are we up or down quarter over quarter?

What is the net impact on my viewership?

For senior managers, data was aggregated up to higher levels.

Tableau excels at data visualization, which makes it easy for non-data science people to glean the insights they need.

Results

Although retrieving and munging all of the data was a huge and time-consuming process, the network was able to see the first of its analysis within 6 months of engaging Elasticiti.

For the first time ever, this network now has a complete picture of its digital/VOD business. This has enabled the content distribution team to assess which partners are top performers by show, as well as track if contractual obligations are met.

And thank to the interactive dashboards, the research team is also able to understand the ebbs and flows of their viewership, and to get a complete picture to report to the sales and editorial team as needed.

“The content distribution is now a fact of life for content creators,” explained Mr. Reid. “Getting up to date and accurate data isn’t optional, it’s table stakes. Without it, networks have no insight into how huge portions of their viewership responds to their shows, and whether their partners are performing as expected.”

The client was so impressed with the initial dashboard and the insights it revealed that it has retained Elasticiti to continue the project.