Voices of CMB: The Chadwick Martin Bailey Research Blog

Sig Testing Social Media Data is a Slippery Slope

During a recent social media webinar, someone asked, “How do we convince clients that social media is statistically significant?” After an involuntary groan, two things came to mind:

There are a lot of people working in social media research who do not understand the fundamentals of market research; and

Why would anyone want to apply significance testing to social media data?

Apparently, there’s much debate in online research forums about whether significance testing should be applied to social media data. Proponents argue that online panels are convenience samples and significance testing is routinely applied to those research results, so why not social media? Admittedly that is true, but with an online panel the researcher can define the sample population and works with a structured data set, which should provide some test/retest reliability of the results. Social media data offers neither, so it’s not a fair comparison.

I’m all for creative analysis and see potential value in sig testing applied to any data set as a way to wade through a lot of numbers to find meaningful patterns. The analyst should understand, however, that with very large data sets even trivially small differences test as significant, so it might not be a useful exercise for social media. Even if it can be applied, I would use it as a behind-the-scenes tool and not something to report on.
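To make the big-data point concrete, here is a minimal sketch using a standard two-proportion z-test. The figures are hypothetical, not taken from any real listening tool: assume 50% of mentions are positive in one period and 51% in the next, and vary only the number of mentions per period.

```python
import math

def two_proportion_p_value(p1, p2, n):
    """Two-sided p-value for a two-proportion z-test with n observations per group."""
    # With equal group sizes the pooled proportion is the simple mean.
    pooled = (p1 + p2) / 2
    standard_error = math.sqrt(2 * pooled * (1 - pooled) / n)
    z = abs(p1 - p2) / standard_error
    # Two-sided tail area of the standard normal distribution.
    return math.erfc(z / math.sqrt(2))

# Hypothetical: the same one-point shift in positive sentiment at growing volumes.
for mentions in (500, 5_000, 50_000, 500_000):
    p_value = two_proportion_p_value(0.50, 0.51, mentions)
    print(f"{mentions:>7} mentions per period: p = {p_value:.3g}")
```

The one-point shift is identical in every row. At 500 mentions per period it is nowhere near significant, while at 50,000 it falls well below the usual 0.05 cutoff. The test result reflects the volume of data, not whether the shift matters.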

Anyone who has worked with social media data understands the challenging, ongoing process of disambiguation (removing irrelevant chatter). There are numerous uncontrollable external factors, including the ever-changing set of sites the chatter is being pulled from. Some are genuinely new sites where chatter is occurring, while others are simply sites newly added to the listening tool’s database. Given the nature of social media data, how can statistical comparisons over time be valid? Social media analysis is a messy business. Think of it as a valuable source of qualitative information.

There is value in tracking social media chatter over time to monitor for potential red flags. Keep in mind that there is a lot of noise in social media data and, more often than not, an increase in chatter does not require any action.

Applying sig testing to social media data is a slippery slope. It implies a precision that is not there and puts focus on “significant” changes instead of meaning. Social media analysis is already challenging – why needlessly complicate things?

Cathy is CMB’s social media research maven dedicated to an “eyes wide open” approach to social media research and its practical application and integration with other data sources. Follow her on Twitter at @VirtualMR

Comments

I believe the MIT PhDs are better suited to evaluate the general sentiment from social media than traditional market researchers.

Either way, why not view social media as an enormous pool of people who obviously care about certain brands and topics, and in turn, entice those people to share their opinions about the questions that you as a market researcher are looking to answer?

I have personally observed that many managers quickly forget all the caveats about tests of significance and the assumptions behind such tests that people might typically use.

Every good stat professor, even in intro courses, spends a lot of time explaining the many caveats, but as students enter the real world they often forget them.

That's when the 'slippery slope' begins.

On the other hand, statistical analysis and modeling can be used heuristically, to provoke new ways of looking at field data, etc., but don't cite 'significance' in reports to clients or management. Just share the provocative insights and see if you can get buy-in to do a 'real' experiment, or at least a better structured field observation.

But, if someone is really tempted to do statistics around behavioral/observational data sets, at least brush up on 'non-parametric' statistics (which may be more appropriate, though not frequently taught in intro stat courses for undergrads or MBA candidates).
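As one illustration of the non-parametric route, here is a minimal sketch of a permutation test on the difference in medians. This is only one of several non-parametric methods, and the function name and the daily mention counts are hypothetical. It makes no assumption about the shape of the distribution, but it does assume the observations are exchangeable between the two periods, which noisy social media data may well violate.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Approximate two-sided p-value for a difference in medians via random permutations."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(statistics.median(group_a) - statistics.median(group_b))
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Reassign the pooled observations to the two groups at random.
        rng.shuffle(pooled)
        difference = abs(
            statistics.median(pooled[:size_a]) - statistics.median(pooled[size_a:])
        )
        if difference >= observed:
            at_least_as_extreme += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical daily mention counts for two weeks (illustrative only).
week_one = [120, 135, 128, 410, 131, 126, 140]
week_two = [150, 162, 149, 158, 171, 155, 160]
print(permutation_p_value(week_one, week_two))
```

Using medians keeps a single spike, such as the 410 in the first week, from dominating the comparison. Even so, the result is best treated as a behind-the-scenes prompt for a closer look, not a number to put in a client report.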