Return To Sender: How Big Data Alone Can Be Biased and Unrepresentative

In a day and age of device and audience fragmentation, it’s clear that every viewer is a potential consumer of importance, even if the personalization of their content consumption, as well as the content itself, is much more granular.

Being able to measure in a way that fairly represents all races, ages, ethnicities and behaviors is crucial for the industry to transact with confidence. It’s also the only way to make sure that content choices reflect the diversity of a given station’s community.

Whether it’s programmers seeking to uncover the true diversity of their audience in order to make scheduling decisions, advertisers looking to reach specific segments with pinpointed messages, or media owners making more of an effort toward on-screen inclusion by casting with diversity in mind, all operators in the industry have a business imperative to know what the true audience makeup is. That’s why it’s essential that any measurement insights they rely on be fully representative of the rich pastiche of the U.S. population. No group or groups should be knowingly or unknowingly excluded or underrepresented.

In short, there is no longer such a thing as “niche” viewers or networks, and no audience should be left behind because of measurement processes that fail to account for them or, worse, even consider them. When it comes to measurement, inclusiveness is an imperative and not an option.

And while there is a lot of upside to big data, there is also a downside if companies don’t treat it responsibly. What is crucial is an approach that combines the strengths of this data, such as measurement stability in a highly fragmented viewing environment, with true persons-level measurement. Simply put, big data as a standalone resource is unfit for fully understanding audience dynamics.

A recent Nielsen analysis looked at how big data, built without representation in mind, could obscure who those true audiences are because of inherent bias in which homes the data includes. Such data can leave out people without set-top boxes, people who rely on over-the-air (OTA) signals and people who stream over-the-top (OTT) content to watch premium television programming.

Specifically, the analysis sought to understand the audience measurement differences between return path data (RPD) homes, meaning homes that have set-top boxes capable of returning data, and homes whose viewer data has been calibrated based on Nielsen’s panel of viewers. The analysis found that uncalibrated RPD relying on dubious weighting methodologies undercounts minority audiences and is inherently biased. Likening it to “census” data is a methodological leap of faith.

After all, Americans are no longer approaching their video programming needs the same way. Some don’t have the income to spend on premium entertainment content; others opt for OTA programming in light of improving digital technology. Widespread technological advancements have fueled a steady growth of broadband-only (BBO) homes as well. The combined number of OTA and BBO homes in the U.S. has swelled from 15 million in 2014 to nearly 28 million in 2018. When you take into account that 41% of the consumers in those 28 million homes are multicultural (either Hispanic, African American or Asian) and 10% are in a younger demographic (18-24), it’s clear that an RPD sample would significantly under-represent these audiences and skew the total audience measurement.

RPD-capable data alone consistently under-represents Hispanic and African American homes compared to other household types. Compared with official U.S. Census estimates and Nielsen’s representative national panel, RPD-capable homes under-represent Hispanics by 33%, Spanish-language dominant Hispanics by 49% and African Americans by 34%. When you compare RPD-capable with OTA/BBO homes, the representation disparity is even bigger. RPD-capable measurement under-represents Hispanics by 50%, Spanish-language dominant Hispanics by 68% and African Americans by 38%. Weighting alone does not cure this problem, and the fact that millions of RPD homes are counted doesn’t matter. A large biased sample is still biased.
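The point that a large biased sample is still biased can be illustrated with a small simulation. This is only a sketch under stated assumptions: the 18% population share, the 0.5 coverage ratio and the function name estimated_share are hypothetical and do not come from the Nielsen analysis. It compares a very large sample drawn from a frame that under-covers a group with a much smaller sample drawn from a frame that covers everyone equally.

```python
import random


def estimated_share(sample_size, true_share, coverage_ratio, rng):
    """Share of a group observed in a sample drawn from a frame that under-covers it.

    coverage_ratio is how likely a group member is to be reachable by the frame
    relative to everyone else (1.0 means the frame covers everyone equally).
    """
    members = 0
    drawn = 0
    while drawn < sample_size:
        is_member = rng.random() < true_share
        # A group member outside the frame is never sampled, however many homes are drawn.
        if is_member and rng.random() >= coverage_ratio:
            continue
        drawn += 1
        members += is_member
    return members / sample_size


rng = random.Random(2018)  # fixed seed so the sketch is repeatable
TRUE_SHARE = 0.18          # hypothetical population share of the group
COVERAGE = 0.5             # hypothetical: group members are half as likely to be in the frame

big_biased = estimated_share(200_000, TRUE_SHARE, COVERAGE, rng)
small_representative = estimated_share(5_000, TRUE_SHARE, 1.0, rng)

# The value the biased frame converges to as the sample keeps growing.
biased_limit = TRUE_SHARE * COVERAGE / (TRUE_SHARE * COVERAGE + (1 - TRUE_SHARE))

print(f"true share:                  {TRUE_SHARE:.3f}")
print(f"large biased sample:         {big_biased:.3f}")
print(f"small representative sample: {small_representative:.3f}")
print(f"limit of the biased frame:   {biased_limit:.3f}")
```

Under these assumptions, adding more homes to the biased sample only tightens the estimate around the wrong value (roughly 10% rather than 18%), while the far smaller representative sample lands close to the true share.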

And it’s not just multicultural audiences that these sources skew.

From an age perspective, RPD-capable data under-represents younger demos and over-represents older age groups. For instance, consumers 25-34 are undercounted by 26%, while persons 50+ are actually over-represented by 15%. What about the larger key demo of 18-34? Nielsen’s national panel and Census data also show there are 69.8 million adults 18-34 within TV households as of December 2018. This demo is leading the cord-cutting revolution and accounts for the largest share of cord-cutters by demo. But RPD-capable homes are 17% less likely to typify adults 18-34 accurately than a representative panel.

Because RPD data under-counts adults 18-34, marketers, media owners and everyone in between have fewer people to reach if they rely solely on it. Weighting for this issue may hide the inherent problem with RPD data, but it won’t fix the issue or uncover the unique viewing behaviors of these audiences. RPD homes do not represent the viewing of non-RPD homes. Occasional online surveys done every few years and applied to complex daily viewing records are a cheap and careless way to look like something has been corrected.
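A short worked example shows why weighting can correct who is counted without correcting what they watch. Every figure below is hypothetical and chosen only to keep the arithmetic simple; none comes from the Nielsen analysis. The sketch weights RPD-only ratings so each age group counts in proportion to the population, then compares the result with a total that also includes viewing in non-RPD homes.

```python
# All figures are hypothetical and chosen only to make the arithmetic easy to follow.
population_share = {"18-34": 0.30, "35+": 0.70}  # each group's share of the population
non_rpd_fraction = {"18-34": 0.50, "35+": 0.20}  # part of each group living in non-RPD homes
rating_rpd = {"18-34": 2.0, "35+": 4.0}          # % watching a show in RPD homes
rating_non_rpd = {"18-34": 6.0, "35+": 4.0}      # % watching the same show in non-RPD homes


def true_rating(group):
    """Rating for a group when both RPD and non-RPD homes are observed."""
    f = non_rpd_fraction[group]
    return (1 - f) * rating_rpd[group] + f * rating_non_rpd[group]


# Weighting RPD-only data: each group counts in proportion to the population,
# but its rating still comes only from RPD homes.
weighted_rpd_total = sum(population_share[g] * rating_rpd[g] for g in population_share)

# Total when the viewing of non-RPD homes is actually observed.
true_total = sum(population_share[g] * true_rating(g) for g in population_share)

print(f"weighted RPD-only rating: {weighted_rpd_total:.2f}")
print(f"rating with all homes:    {true_total:.2f}")
print(f"18-34 rating, RPD only:   {rating_rpd['18-34']:.2f}")
print(f"18-34 rating, all homes:  {true_rating('18-34'):.2f}")
```

With these assumed numbers, the weighted RPD-only rating comes out near 3.4 against 4.0 when all homes are observed, and the 18-34 rating stays at 2.0 instead of 4.0. The weights change how much each group counts, but they cannot supply viewing behavior that was never observed.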

A look at consumers who belong to the RPD group, a non-RPD-capable group (meaning these consumers might have a set-top box that doesn’t return data) and the growing OTA/BBO group reveals marked differences in their behaviors and lifestyles. These differences can only be gleaned through direct observation, no matter how much weighting is done and no matter the size of the big data inputs, be it a sample of 30 million, a billion or a trillion.

So, what does this mean for actual programming that is powered by multicultural audiences? It means that all sources need to be considered and all types of audiences need to be observed in order to be counted in and calibrated with any big data set.

Take a show like Fox’s Empire, where the audience composition is predominantly multicultural. The analysis found that these audiences were anything but “niche,” considering the show’s history as a program near the top of the ranks. In fact, diverse audiences made up 75% of Empire’s audience in December 2018, and these audiences certainly helped drive ratings success when using a representative panel.

But because of RPD’s inherent bias of under-representation, these multicultural audiences were not fairly reflected, resulting in significant undercounts of Empire’s audience when looking at this show through an RPD lens. The differences are quite large. Looking at rank among viewers 25-54, Empire ranked 16th using Nielsen’s representative panel, but dropped to 38th in RPD homes. Conversely, Empire ranked third among OTA homes, which, while not surprising because these homes are more diverse, demonstrates the critical nature of actually including these homes and accurately measuring their behavior in any sample.

In the end, if an approach relies on anything less than full, accurate and inclusive measurement, the foundational elements and core principle of inclusion could become compromised. Counting these “niche” viewers and their behaviors out by definition can have far-reaching implications that could destabilize the market, and marketers, with misinformation and perhaps even set back on-screen inclusion.