Sliders: Good for White Castle, Bad for Research

Editor’s Note: Ron Sellers wades into the great “respondent user experience” debate in his usual down-to-earth and irascible way. Now this one should start a good debate!

By Ron Sellers

If you have participated in a survey online, you probably have used sliders at some point. Survey designers often include sliders in order to enhance respondent engagement, or to make a larger scale (e.g. 1 – 100) seem more approachable and natural than asking people to type in a number.

Even if you’ve used sliders as a participant, I’m hoping you haven’t done so as a researcher. A new report from Grey Matter Research shows that if you did, your sliders may well have biased your data.

In an online survey of 1,700 adults (with a demographically representative general population sample from an online panel, conducted in English and Spanish), Grey Matter included a couple of question sets with sliders. One used a seven-point scale, and one a five-point scale.

The problem with sliders is that, unlike radio buttons, they require a starting point on the screen: the button respondents move must start somewhere on the scale – at the low point, at the mid-point, at the high point, or somewhere in between.

This creates a couple of problems. First, let’s say you decide to start your slider at the mid-point of a seven-point scale (a 4). What do you do if the respondent wants to select a 4 as her answer? You can accept the lack of movement of the slider as a legitimate response, but then you can’t differentiate between people who purposely wanted to choose a 4 and those who just didn’t bother to move the slider. Or you can force movement of the slider in order for the response to be recorded, but then someone who wants to choose a 4 must move it off the 4 and back on again. (This is the approach we took in our study.)
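As a rough sketch of that forced-movement rule (the function and argument names here are mine, not from the Grey Matter study), a response equal to the starting value is accepted only if the slider was actually moved at some point:

```python
def record_response(start_value, events):
    """Return the final slider value, or None if the slider was never moved.

    `events` is the sequence of values the slider passed through.
    A respondent who wants the starting value must move the slider
    off that value and back for the response to count.
    """
    moved = any(v != start_value for v in events)
    if not moved:
        return None  # can't distinguish "chose the start value" from "skipped"
    return events[-1]
```

Under this rule, no movement at all is treated as a non-response rather than an endorsement of the starting value.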

But a far more significant problem is that our research found that people’s answers depended significantly on where the slider started. We randomized the starting point (one-third saw it at the bottom of the scale, one-third in the middle, and one-third at the top). After about 500 completes, the data was evaluated.
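A minimal sketch of that randomization, assuming a 1-to-N scale (the helper name is hypothetical, not from the study):

```python
import random

def assign_start_point(scale_max):
    """Randomly assign a slider starting position: bottom, middle, or top.

    Over many respondents, roughly one third see each condition,
    mirroring the three-way split described in the study.
    """
    return random.choice([1, (scale_max + 1) // 2, scale_max])
```

For a seven-point scale this yields a start of 1, 4, or 7; for a five-point scale, 1, 3, or 5.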

Through five questions using the five-point scale and nine questions using the seven-point scale, we found pervasive bias according to the starting points. People who started in the middle of the scale were more likely to choose a mid-point answer. Those who started at the top of the scale were more likely to choose a higher number.

But the effect was particularly strong at the bottom end of the scale. People who saw their slider start at the bottom were strongly biased to choose a low number on the scale. Up to three times more likely, in fact, than people who started elsewhere. It doesn’t take a research genius to see the problem here, nor to realize how much worse it would be if we hadn’t randomized the starting points on our sliders.

All of the nasty details are in the report How Sliders Bias Survey Data, which is available upon request from Grey Matter Research. You’ll read why Grey Matter no longer uses sliders in any survey.

Although our latest work focused specifically on sliders, there’s a bigger issue here – how much are attempts at respondent engagement corrupting the data we get? When we move away from tried-and-true questionnaire design in quantitative studies and start using things such as drag-and-drop, gamification, cartoon icons that “guide” respondents through the questions, thermometer-style graphic measures, and other approaches, are we sure that we’re getting the benefits of respondent engagement without the downside of simply getting wrong data?

And even bigger than that is the issue of why we need respondent engagement in the first place – is it because people are bombarded with too many extremely long questionnaires, surveys that don’t really apply to them, repetitive question sets, lengthy and boring grids, and other things that are making participation tedious and causing respondents to lose interest?

How much better would it be if we simply design a good, simple, relatively brief questionnaire that respects our respondents and doesn’t require us to resort to tricks and gimmicks in order to keep them engaged?

Sliders may bring intense brand loyalty to White Castle, but they’re probably best left to the fast food industry rather than the research world.

11 responses to “Sliders: Good for White Castle, Bad for Research”

Hi Ron, glad to see you keeping this issue alive. A few of us have contributed significant research-on-research on these topics for the past few years – and have shared our findings at conferences around the world … including ESOMAR, the MRA, CASRO, MRIA, Market Research Event and the ARF.

We are all in agreement that putting sliders or other engaging survey interfaces on a 25+ minute online survey is tantamount to lipstick on a pig. Our research has similarly concluded that almost any change to an interface will lead to data inconsistencies/differences.
– This is why our approach has always focused on “purpose-based” use of innovation … for example, are you trying to mitigate straightlining? Big ol’ ugly grids can be replaced by other layouts – so think about the reasons/purpose behind making changes to surveys, not “pretty for the sake of pretty”.

The recommendation should be to calibrate each of the different question formats you use, so you understand the balance of answers each delivers before using them. Because, as Bernie has said, you only have to blow on a Likert scale for the answers to change.

You can get equally large distortions in the answers with normal button questions if you simply change the wording of the range options; alter the number of anchor labels you show, or where you place them; show the choices monadically or in a comparative set; use images to visualize the anchor points; place the same question at the start or end of a survey; compare the first five answers to the last five; interview people on a Monday morning versus a Saturday night; interview people in Japan versus India; do a face-to-face interview compared to an online interview; or use standard panel versus river sample.

Sliders are very good at allowing respondents to make refined relative judgements, and they have an important role to play in many surveys, particularly when you are asking respondents to compare how much they like or rate similar things. But yes, using them naively as an everyday substitute for standard Likert scale questions, with no understanding of slider position effects, is not recommended.

A big problem with sliders is that they provide an anchor. The anchoring bias that Ron’s study is illustrating is a well-known psychological phenomenon… and it’s not a newly discovered bias (I was studying it in the ’70s).

There’s a lot to be said for the benefits of unmarked continuous scales with labels only at the two opposing poles; however, sticking an anchoring point on them defeats the purpose. Can’t the technology simply allow respondents to click anywhere on the scale to mark their answer and then slide the marker if needed?

All questions, all types of scales and data capture devices have their inherent biases, so I guess the real moral of the tale – with this article – is “at least use sliders knowing their effects and characteristics.” In any case, we can still see which statements score RELATIVELY strongly or less strongly.

I’m more troubled by the anchoring effects inherent in large batteries of questions. University of Adelaide research found something like a 30% skew in the answers, depending on whether the respondent had given a very positive or very negative answer on their first two 5-point or 7-point scale questions. Once a pattern is established (hmmmn, these seem to be basically positive…), the respondent tends to ‘satisfice’ and answer the rest of the battery up (or down) that particular end of the scale.

And we should not think that randomising the battery really solves the problem. In aggregate it smooths out and disguises biases of different respondents, but at a respondent level the biases are still there.

Getting back to the use of sliders, I’ve found them to be wonderful to use – especially in questions where we’re trying to get respondents to apportion market share, or answer “what percentage of your meals involve cooking…”, or similar questions where we’re trying to get respondents to add things up to 100. Respondents get it – it is an intuitive device, much better suited to this than radio buttons that offer only discrete choices.
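A sketch of the add-to-100 mechanic behind such constant-sum sliders (the helper is hypothetical; largest-remainder rounding is one common choice, not necessarily what the commenter’s software does):

```python
def allocate_to_100(raw):
    """Scale raw slider positions so the returned integers sum to exactly 100.

    Uses largest-remainder rounding: floor everything, then hand the
    leftover points to the values with the biggest fractional parts.
    """
    total = sum(raw)
    if total == 0:
        raise ValueError("at least one slider must be moved")
    scaled = [v * 100 / total for v in raw]
    floors = [int(v) for v in scaled]
    shortfall = 100 - sum(floors)
    order = sorted(range(len(raw)), key=lambda i: scaled[i] - floors[i], reverse=True)
    for i in order[:shortfall]:
        floors[i] += 1
    return floors
```

Three equal sliders, for instance, come back as 34/33/33 rather than 33.3 repeating, so the total always reads exactly 100.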

Do sliders deliver biased data? For sure. But we bias the data the moment we put pen to paper and start expressing questions in the subtly different ways that we do. This is just part of the fuzzy territory we work in. As the great John Tukey pointed out, stats falls into two categories: one involves certainty and hundreds of decimal places; the other involves rough guesstimates and estimations and the fact that people are involved. He asked rhetorically whether our job is to get things roughly right… or precisely wrong. Tukey suggested that being roughly right was a much better outcome.

I think that sliders are as good as the more boring single-select radio button answer scale if used appropriately. That is, if the start point is an issue with sliders, then it is important to randomise the start point or not have a start point at all.

We have measured radio button scales that return significantly different results depending on the order you create the scale in – that is, a scale (5 or 7 point) that starts high (e.g. Strongly Agree to Strongly Disagree) will render a different result than one that starts at the other end (regardless of the direction of the radio buttons – left to right or top to bottom).

The cure is to rotate the order – from high to low then low to high, randomly. Similarly with sliders the cure is to randomise the start point or not have a start point at all.
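One way to sketch that rotation (hypothetical helpers, not from the comment): randomly reverse the displayed order per respondent, then recode answers back to a canonical low-to-high coding before analysis.

```python
import random

LABELS = ["Strongly disagree", "Disagree", "Neutral", "Agree", "Strongly agree"]

def present_scale(labels):
    """Randomly show the scale low-to-high or high-to-low.

    Returns the labels in display order plus a flag recording
    whether this respondent saw the reversed version.
    """
    reversed_order = random.random() < 0.5
    shown = list(reversed(labels)) if reversed_order else list(labels)
    return shown, reversed_order

def recode(position, n_points, reversed_order):
    """Map a clicked position (1..n, in display order) back to canonical 1..n."""
    return (n_points + 1 - position) if reversed_order else position
```

The recode step matters: without it, half the sample’s answers would be stored backwards, and the rotation would corrupt rather than balance the data.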

However – using a slider that merely captures where a respondent sits on a 1-to-7 scale is silly anyway. You have lost all the distance information between the points, which in marketing can be critical. The more powerful way to measure scales is with a visual drag-and-drop scale, which renders highly accurate, detailed information on exact scale positioning.

Interesting comments and observations. A few thoughts on some of the things that have been said…

Randomizing the start point on sliders does not solve the problem; our research showed a much greater bias when the start point is at the low end of the scale than at the mid-point or the high end, so you still get bias even if the start points are randomized.

I fail to see the use of sliders without a start point on smaller scales – if someone is going to click on the “5” on a 10-point scale, are they really going to make any use of the ability to slide it to a 4 or a 6? Why not just have the radio button?

It may seem silly to employ sliders on 5-point or 7-point scales, but I see it being done quite a bit, which is one of the reasons Grey Matter measured this.

As for order bias on radio button scales, has anyone done research to show whether there’s a stronger bias low-to-high than high-to-low (or the other way around)? If there’s no difference, then while randomizing the order isn’t a perfect solution, the low-to-high bias would balance out the high-to-low bias… unlike what we found with sliders, where the bias from a low starting point was substantially stronger than from the other starting points.

We recently conducted some research on the issue of order bias on radio button scales, which demonstrated some bias depending on whether the scale was ordered low-to-high or high-to-low, using two matched samples and reversing the scale direction. The results may be of interest, as the effect also depended on the products being rated.