More On The Set-Top-Box Data Mystery

In an unauthorized and completely biased plug for this blog, I must admit that I have been amazed at the sheer number of emails and calls I have received over the last few weeks. While the subject matter has been as diverse as the location of the people sending the messages -- Dushanbe, Tajikistan; Christchurch, New Zealand; Victoria, Australia; Ahmadabad, India; Santiago, Chile -- the most common question from those outside the United States has mirrored that from those in this country, "Can you please explain set-top-box data to me?"

The short answer -- set-top-box data is the present and future for television audience research -- will undoubtedly spark cries of outrage by some in the industry. Perhaps the easiest way to address the mystery of set-top-box data is to answer questions and address issues. We can begin with the list assembled by Rainbow Media's Charlene Weisler that appeared on a recent TV Board. My comments follow:

Footprint issue: Some cable operators provide all set-top box data in a market and others provide a subset of a market. Until there are standards, the resulting analysis must be directional. We have diaries in some markets; tuner meters and diaries in others. Still a third group has local people meters tied to tuning. There are three standards locally. How many households in today's national television panel come from Omaha? How many come from St. Louis? How about San Diego? It is a secret. We know virtually nothing about the national panel except what the company that manages it tells us -- that it is representative and selected randomly, without bias of any sort. Any researcher providing analysis based on set-top-box data can be just as forthcoming as the current ratings provider.

Data collection issue: Some data is pulled and other data is pushed. I have collected data from telecommunication networks for nearly twenty years. Believe me when I say that the act of pulling, polling or pushing has no effect on data quality. With respect to set-top-box data, every return path system I have been involved with since 2001 has incorporated an error identification and correction system that renders the resulting viewing data very robust and extremely reliable.

Rating or delivery: Polling a set-top box is different than polling a tuner meter box. Polling is polling regardless of the device being polled. That having been said, I cannot recall a single, modern set-top-box data collection system that polled what channel was tuned at a particular time. Every system that I have studied captured device state changes, which is a fancy way of saying every time something happens in a set-top box related to television viewing, the box phones home. However, on the flip side, most panel-based tuning meters do poll. In fact there are a number of interesting historical notes related to how many times a household counts when multiple sets are tuned to different channels within a given 15-minute period. But that is a subject for another column.

Smallest viewing increment: polling does not provide insight into actual viewing. Again, I think the issue is twisted around. Typical set-top-box data provides the researcher with the ability to assemble perfectly homogeneous recreations of viewing history -- at a second-by-second level. Traditional tuner-meter data has been based on polling and has often been summarized at a 30-second, minute, average minute or quarter-hour interval.
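To make the state-change idea concrete, here is a minimal sketch of how a viewing history might be rebuilt from one box's events and then rolled up to minute level. The event layout (a timestamp in seconds paired with a channel, or `None` when the box stops tuning) and the function names are assumptions made for illustration; no particular operator's data format is implied.

```python
from collections import defaultdict

def tuning_intervals(events, session_end):
    """Turn time-sorted (timestamp_seconds, channel) state changes for one box
    into (channel, start, end) intervals. A channel of None means "not tuned".
    The event shape is a hypothetical one chosen for this sketch."""
    intervals = []
    # Pair each event with the next one; the last event runs to session_end.
    following = events[1:] + [(session_end, None)]
    for (start, channel), (end, _) in zip(events, following):
        if channel is not None and end > start:
            intervals.append((channel, start, end))
    return intervals

def seconds_per_minute(intervals):
    """Summarize second-level intervals as seconds viewed per (channel, minute)."""
    totals = defaultdict(int)
    for channel, start, end in intervals:
        t = start
        while t < end:
            minute = t // 60
            chunk_end = min(end, (minute + 1) * 60)  # stop at the minute boundary
            totals[(channel, minute)] += chunk_end - t
            t = chunk_end
    return dict(totals)

events = [(0, "A"), (45, "B"), (130, None)]
intervals = tuning_intervals(events, session_end=200)
per_minute = seconds_per_minute(intervals)
```

Because the intervals are kept at the second level, the same history can be summarized at any coarser interval (30 seconds, a minute, a quarter hour) without collecting anything new.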

Trick play: DVR metrics need to be decided. DVR data is not mysterious. DVR data is identical to linear data, except that there are two timeframes: real-time (e.g. time and date the program was recorded and/or played) and content-time. When a viewer is watching linear television, the two times are identical. When it is delayed, they are not. Some linear viewers channel surf during commercials; some DVR viewers fast-forward through commercials. To make sense of the data, all DVR viewing must be matched to content-time. Reporting will be based on when (or if) the program was played. Some advertisers will be interested in same-day viewing, while others will be interested in viewing up to one week later. There is no mystery here.
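The two-timeframe idea can be sketched in a few lines. The sketch assumes each playback record carries the real-time start of the recording and an offset into it; the field names and the reporting windows ("live", "same day", "within 7 days") are illustrative choices, not an industry definition.

```python
from datetime import datetime, timedelta

def content_time(recording_start, offset_seconds):
    """Content-time of a played moment: when that moment originally aired.
    Assumes the recording began at recording_start (hypothetical field)."""
    return recording_start + timedelta(seconds=offset_seconds)

def playback_window(played_at, content_at):
    """Classify a playback by how far real-time lags content-time."""
    delay = played_at - content_at
    if delay <= timedelta(0):
        return "live"            # linear viewing: the two times are identical
    if played_at.date() == content_at.date():
        return "same day"
    if delay <= timedelta(days=7):
        return "within 7 days"
    return "beyond 7 days"

recording_start = datetime(2009, 3, 2, 20, 0, 0)   # arbitrary example date
aired = content_time(recording_start, 600)          # ten minutes into the program
label = playback_window(datetime(2009, 3, 5, 21, 0, 0), aired)
```

Once every playback is matched to content-time this way, a report for same-day viewing or for viewing up to a week later is just a filter on the window label.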

Latency: Content-time and real-time are not always the same everywhere, even with respect to linear television. This is a great question and should be the subject of a full discussion, but suffice it to say the vast majority of timing error is stable by nature, which is engineer speak for "once we figure it out, the solution doesn't change much over time." Broadcast and distribution propagation error as well as DVR cache are the primary sources of latency.

Tuning Events and Dwell Times: How long does someone have to be watching to be counted? We could argue at length, but most people would agree that the answer is somewhere between three and 10 seconds. Most set-top boxes take less than one second to tune, and a viewer spends anywhere from two to eight seconds deciding whether to continue watching or surf away. Analysis based on set-top-box data tends to involve hundreds of thousands, if not millions, of data points and as such, small differences in dwell time for most programs tend not to be material.
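A dwell-time rule is easy to express and easy to test at different thresholds. The sketch below assumes viewing has already been reduced to (channel, start, end) intervals in seconds; the function name and the sample numbers are made up for illustration.

```python
def credited_seconds(intervals, min_dwell):
    """Total seconds credited per channel, counting only tuning events
    that lasted at least min_dwell seconds."""
    totals = {}
    for channel, start, end in intervals:
        duration = end - start
        if duration >= min_dwell:
            totals[channel] = totals.get(channel, 0) + duration
    return totals

# Hypothetical intervals: three brief visits to channel A, one long stay on B.
intervals = [("A", 0, 2), ("A", 2, 9), ("B", 9, 1809), ("A", 1809, 1814)]
at_three = credited_seconds(intervals, min_dwell=3)
at_ten = credited_seconds(intervals, min_dwell=10)
```

In this toy case, moving the threshold from three to 10 seconds removes the short surfing visits to channel A but leaves the 30 minutes on channel B untouched, which is the sense in which the choice tends not to be material for programs that people actually watch.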

Picture-in-Picture and Differences in Set-top boxes: STB data is inconsistent and DVR data reflects behavioral differences. The whole point of audience measurement is to understand what audiences are doing. I challenge anyone to estimate DVR behavior without DVR data. (And to anyone who might buy such analysis, I want to sell you a great condo down in Miami -- cheap!) Just because some set-top boxes do not provide access to the second tuner (used for picture-in-picture) does not mean the entire data set is suspect.

Set-top box on, TV off. There is an assumption that this poses a problem. If I only had a nickel for every time I heard this objection to using set-top box data... Set-top box on, TV off is a simple noise reduction problem and typically involves a small percentage of set-top boxes. The issue would be inherently more complicated if everyone tuned to a particular network or program before they went to sleep, but that simply does not happen. I think this particular objection can be traced to a single person who has an axe to grind over set-top-box data, not that I am pointing a finger at Paul Donato.

Lack of demographics. Nielsen will match to its sample. Others will do something different. Nielsen has spent a great deal of money extolling its approach as the "gold" standard, but compromising high quality, set-top-box data with demographics obtained from a biased, small sample to "fix" demos is bad science. Demographics are important, but more reliable solutions to the demographic problem will emerge -- as long as companies believe they are participating in a free market. If we embrace new approaches to the problem, we are much more likely to get a better solution. Behavior-based metrics are interesting and are arguably more reliable than demographic ascription will ever be.

Box availability within the home. Not all homes have boxes. The last time I checked, not all homes had Nielsen boxes either. Let's do a quick count. Millions and millions of advanced set-top boxes, thousands of Nielsen boxes. In every market where I have analyzed set-top-box data, a significant percentage of the set-top boxes were tuned almost exclusively to broadcast channels. A subset of set-top boxes that tuned to broadcast channels for more than 95% of their viewing minutes could be used as a surrogate for broadcast-only homes. In a small market like Hawaii, that subset was measured in the thousands. Cable and satellite companies spend millions of dollars every year to market to these broadcast-only homes. Reams of data are available on this market segment that would assist in modeling such viewing. Compare that to a recruited panel. What do we know about those households who say no to being recruited -- by some counts, three in four? Typically, precious little data is published about these households. We must assume that the reasons they do not participate do not affect their television viewing.
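The 95% rule for picking surrogate boxes could be sketched as follows. The per-box layout (viewing minutes keyed by channel), the channel names and the function names are all assumptions for the sake of the example.

```python
def broadcast_share(minutes_by_channel, broadcast_channels):
    """Fraction of a box's viewing minutes spent on broadcast channels,
    or None when the box logged no viewing at all."""
    total = sum(minutes_by_channel.values())
    if total == 0:
        return None
    broadcast = sum(minutes for channel, minutes in minutes_by_channel.items()
                    if channel in broadcast_channels)
    return broadcast / total

def broadcast_only_surrogates(boxes, broadcast_channels, threshold=0.95):
    """IDs of boxes whose broadcast share exceeds the threshold."""
    selected = []
    for box_id, minutes_by_channel in boxes.items():
        share = broadcast_share(minutes_by_channel, broadcast_channels)
        if share is not None and share > threshold:
            selected.append(box_id)
    return sorted(selected)

# Hypothetical boxes and channel names.
broadcast_channels = {"local_affiliate_1", "local_affiliate_2"}
boxes = {
    "box1": {"local_affiliate_1": 980, "cable_net_1": 20},
    "box2": {"local_affiliate_1": 500, "cable_net_1": 500},
    "box3": {},
}
surrogates = broadcast_only_surrogates(boxes, broadcast_channels)
```

The threshold is a parameter so that the size of the surrogate group can be checked at, say, 90% or 99% before anyone settles on a number.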

Back channel. A work in progress. Getting all the data back is the goal. However, the fact that all the data is not coming back should not deter the industry from learning and moving forward with the data we have at hand.

Advertising on television has become complicated; there is no getting around it. As the industry moves toward an on-demand, delayed content delivery model that embraces addressable advertising, panel data no longer provides an acceptable level of accuracy for ratings. There is no firm ground from which to argue that set-top-box data is NOT the panacea. Progressive television researchers have questions and are searching for answers. Hopefully, more of us will do as Charlene Weisler has done and explore this new environment.

Unfortunately, despite the looming quagmire, a large number of industry insiders have made known their desire to remain aboard the panel bandwagon until the bitter end. This is a mistake. I have seen the digital people popping up in areas where they have not been seen before. They understand that transactional data is very much like set-top-box data. No doubt some 20-something up-and-comers with graduate degrees will make these "complicated" problems go away; let us hope that they do not send some of us along with them.