Whenever I recommend online surveys to my customers, the number one question I get is whether or not the results are trustworthy. My customers ask this question because they fear that website visitors typically won’t bother to fill out questionnaires online and that the collected data are therefore unlikely to be representative.

In this post I will present a case study which confirms that fear. My study shows that there are in fact significant differences between respondents and other visitors: respondents tend to be more engaged with the website, they see different content and they even come from different geographical areas.

As such, the results from online surveys should not be generalized to your entire website without first correcting for sampling error. The way you should do this, however, depends heavily on the purpose of your survey. In some cases, you can get by without correcting for sampling error at all – for example if your purpose is to analyze the (behavioral) causes of satisfaction / dissatisfaction.

Why use online surveys in web analytics?
According to my own definition, web analytics is primarily concerned with measuring online behavior. Web analytics can tell you exactly what people do on your website as well as how often they do it. However, it cannot tell you who the visitors are, why they act as they do and what they think or feel while browsing your site.

The classical example of this limitation in web analytics is the ambiguous meaning of popular metrics such as “Page Views per Visit” or “Average Visit Duration”. High values for these metrics are often seen as signs of satisfaction. In principle, however, they could just as well mean that it was hard for the visitors to find what they were looking for.

Imagine a stubborn visitor who repeatedly tries to complete a check-out procedure. Such a visitor may produce many page views and spend a lot of time on your site even though he or she never completes the task. When you think about it, any given web metric or KPI based purely on behavioral data can always be interpreted positively or negatively.

Given this weakness of traditional web analytics, more and more web analysts are turning to online surveys as a supplement to behavioral data (see here and here). Such surveys enable you to ask the visitors directly instead of guessing from their clickstreams. If they are conducted continuously, you can even use them to include opinion scores in your dashboard alongside your conversion rate and other traditional web metrics.

If done right, online surveys make up a powerful tool for gaining access to the minds of your visitors. They are a perfect companion to web analytics, because they add a subjective dimension to the otherwise purely objective observation of behavior.

The problem of representativeness

Although online surveys complement web analytics in a powerful way, they have one major drawback: not all visitors are willing to fill out questionnaires online, so the data are always a sample. I recently calculated the average response rate across all our survey customers and found that it was only around 8%.

The question is, therefore, whether or not survey data are representative. To answer this we first need to know what is required for a sample to be representative. Contrary to common belief, the small response rates of online surveys are not in themselves a problem. As long as the entire visitor population is relatively large, the sample can be proportionally very small and still be representative.

For example, if you have a total of 5,000 visitors on your website in a given period, you only need 357 respondents to be 95% confident that your results are accurate within a margin of +/- 5%. This corresponds to a response rate of only 7% (i.e. less than our average of 8%). If your visitor population increases, the ratio gets even better. Thus, if you have 30,000 visitors on your website, the required sample size is 380, corresponding to a response rate of roughly 1%.
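These figures follow from the standard formula for estimating a proportion, combined with a finite-population correction. A minimal sketch, assuming the usual worst-case proportion of 0.5, a 95% confidence level and a +/- 5% margin of error:

```python
import math

def required_sample_size(population, z=1.96, margin=0.05, p=0.5):
    """Sample size needed to estimate a proportion, with finite-population
    correction. Defaults: 95% confidence, +/-5% margin, worst-case p=0.5."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # infinite-population size
    return math.ceil(n0 / (1 + (n0 - 1) / population))

print(required_sample_size(5_000))   # 357
print(required_sample_size(30_000))  # 380
```

Note how the required sample barely grows as the population does: going from 5,000 to 30,000 visitors only raises it from 357 to 380, which is why the required response rate keeps falling.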

In most cases the response rate is not the main problem. What is much more problematic is whether or not there is a systematic difference between respondents and other visitors. For a sample to be representative, all members of the population must have an equal chance to be drawn.

This is not the case if certain groups of visitors are more inclined than others to participate in online surveys. If so, your sample will be biased even if you increase the response rate (e.g. by offering an incentive to participate such as a gift or a chance to win a prize). If there is a systematic difference between respondents and non-respondents, a higher response rate will do little more than underscore that difference.
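A small simulation illustrates the point. All numbers below are hypothetical (invented engagement groups, satisfaction scores and response rates), but they show why raising the response rate does not remove a systematic bias:

```python
import random

random.seed(0)

# Hypothetical population: 20% "engaged" visitors with satisfaction score
# 4.2, 80% casual visitors with satisfaction score 3.0. Engaged visitors
# are assumed to be five times as likely to respond.
population = [("engaged", 4.2)] * 2_000 + [("casual", 3.0)] * 8_000
true_mean = sum(score for _, score in population) / len(population)

def survey(base_rate):
    """Simulate one survey: engaged visitors respond at 5x the base rate."""
    scores = [score for kind, score in population
              if random.random() < base_rate * (5 if kind == "engaged" else 1)]
    return sum(scores) / len(scores)

# The respondent mean overshoots the true mean by roughly the same amount
# whether the base response rate is low or five times higher:
print(round(true_mean, 2), round(survey(0.02), 2), round(survey(0.10), 2))
```

Both simulated surveys land well above the true population mean, and the gap does not shrink as the response rate rises, because the extra respondents come from the same skewed pool.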

A case study

It is normally difficult to compare online survey respondents with other visitors on a website. In most cases, we have no information about visitors who do not respond. At Netminers, however, we have developed an integrated web analytics and online survey tool. This tool enables us not only to see what survey respondents do online, but also how their behavior differs from that of non-respondents. We are therefore in a unique position to study the sampling bias of online surveys.

The following study compares respondents with all visitors on 12 websites belonging to the same company (a customer of ours who has kindly allowed us to use their data in an anonymous form). The websites are similar in structure and content, but differ in terms of language. A total of 55 surveys were launched across all of these websites. This resulted in 59,957 respondents from a variety of countries, including Denmark, Sweden, Norway, Finland, Holland, Germany, Poland, France, Italy, Spain, United Kingdom and United States.

In order to make the data more comparable, all repeat visitors were filtered out. This brought the base of respondents down to 43,154 and the total population of visitors to 8.6 million. The reason why repeat visitors should be disregarded here is to avoid counting respondents multiple times. Returning respondents are not invited to participate in the survey again and would therefore wrongly be considered non-respondents. This would distort the comparison.

Let us now look at the results. The following charts show that respondents and other visitors do indeed differ in terms of their behavior. The first chart shows the difference in traffic sources. As we can see, respondents tend to enter the site directly, whereas the rest of the visitors more often come from search engines.

This means that respondents are more likely to know the website beforehand and to visit with a particular purpose. They are unlikely to enter “by chance” because a particular search word happened to bring them to the website.

If we look at the next chart, we see another interesting difference, namely that respondents are less likely to “bounce” when they land on the site. A “bounce” is here defined as a single-page visit, whereas “retained” means a visit which views at least two pages. The chart shows a huge difference: whereas the general bounce rate for the website is 52%, it is only 23% for respondents!
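With samples this large, even crude figures settle the question of statistical significance. The sketch below uses approximate counts reconstructed from the percentages above (and treats the two groups as independent samples, which is a simplification, since respondents are a subset of all visitors):

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """z-statistic for the difference between two sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Approximate counts from the study: 23% of 43,154 respondents bounced,
# vs. 52% of the 8.6 million visitors overall.
z = two_proportion_z(9_925, 43_154, 4_472_000, 8_600_000)
print(f"z = {z:.1f}")  # far beyond any conventional significance threshold
```

At sample sizes like these, virtually any observed difference is statistically significant; the interesting question is its practical size, not its p-value.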

Respondents seem to be much more engaged with the website: they both enter directly and delve deeper into the content after arrival. This is underscored by the fact that, looking only at "retained" visitors, respondents view on average two more pages per visit: retained respondents view 12 pages per visit, whereas all retained visitors view 10 pages per visit.

The level of engagement is certainly higher for respondents than for all visitors. However, respondents also tend to see different content. The next chart shows the difference in exit pages for respondents and all visitors. More specifically, it shows which content section the visitors exited from.

The website in this case study is in the travel business, and its two biggest content sections are called “Inspiration” and “Tourist Information”. Both of these sections have an over-representation of respondents. The reverse is true for the rest of the sections, where respondents are under-represented. It is especially noteworthy that the “Online Booking” section, which includes the company’s conversion pages, has proportionally fewer respondents.

The last chart compares the geography of respondents and all visitors. Again we see considerable differences. In general, the biggest target groups for the website (i.e. the Nordic countries and Germany) tend to have lower response rates than the smaller ones. What is especially interesting is that the UK stands out with an extremely high response rate. For some reason, Britons are much more likely to accept participation in online surveys than visitors from other countries.

Consequences for analyzing engagement

In this post I have shown that online surveys are indeed biased. The most striking difference is that survey respondents tend to be much more engaged than non-respondents: they know the site beforehand, they bounce less often and they see more pages during their visits.

This is perhaps not surprising: the more involved you are with a website, the more of an incentive you have to provide feedback that could lead to improvements. In contrast, if you find the website irrelevant from the beginning, and perhaps bounce as a consequence, you have less of an incentive to answer.

What this means is that online surveys are weak when it comes to measuring or analyzing the causes of engagement (or lack thereof). We cannot simply ask visitors why they do not engage, since these visitors have no intention of answering. We probably cannot even correct this sampling error by weighting the data, since the difference is too big. The bounce rate, for example, is so much lower for respondents that it is doubtful whether bounced respondents and bounced non-respondents are comparable at all.

Consequences for analyzing satisfaction

It could be argued that satisfaction scores are likely to be artificially high among online survey respondents. Given that respondents are more engaged than non-respondents, you might think that they are also more satisfied. This is certainly true if engagement is caused by satisfaction or vice versa.

However, in my view, the relationship between engagement and satisfaction is not that simple. Engagement can be defined as an intensive or sustained focus on something (which is often accompanied by intensive use). This focus is not the same as satisfaction; rather, it is the act of building an experience with the object, which eventually leads to an evaluation. If the evaluation turns out positive, the person is likely to continue being engaged, whereas if it turns out negative, he or she is likely to stop. This is why it is sometimes possible to observe a correlation between satisfaction and engagement (measured by use intensity) over longer periods of time.

However, in a short-term perspective, such as during a visit, engagement and satisfaction are not correlated. They are only related in the sense that satisfaction presupposes engagement. As such, it could be argued that respondents who do not engage at all (e.g. those who bounce) should be disregarded entirely when calculating satisfaction scores. Given that such respondents have almost no experience with the website, their “evaluation” of it must be considered unreliable. By the same token, it could be argued that highly engaged respondents who still express dissatisfaction should be given more weight, insofar as their evaluations are more reliable.

If the above argument is true, then online surveys are not weak when it comes to analyzing the causes of satisfaction with a website. By comparing the page views of satisfied and dissatisfied respondents it becomes possible to identify those areas of the website which tend to cause this satisfaction / dissatisfaction. It is less important to correct for sampling error here because those visitors who respond to online surveys are exactly the most reliable ones.

Still, it might be relevant to weight data under certain circumstances. If your aim is to measure the overall satisfaction as accurately as possible (rather than analyzing the causes of satisfaction), you need to make sure that respondents are exposed to more or less the same content as other visitors. As I have demonstrated in this post, this is far from always the case. If possible, you should therefore apply weights to those respondents who have visited areas where respondents are generally under-represented.
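The weighting described here amounts to post-stratification: each respondent is weighted by the ratio of a section's share of all visitors to its share of respondents. A sketch with hypothetical section shares and satisfaction scores (the post reports only chart-level figures, so all numbers below are invented):

```python
# Hypothetical share of visits ending in each content section, for all
# visitors vs. for respondents (respondents over-represented in the
# first two sections, as in the case study).
population_share = {"Inspiration": 0.30, "Tourist Information": 0.25,
                    "Online Booking": 0.25, "Other": 0.20}
respondent_share = {"Inspiration": 0.40, "Tourist Information": 0.30,
                    "Online Booking": 0.15, "Other": 0.15}

# Post-stratification weight per section: population share / respondent share.
weights = {s: population_share[s] / respondent_share[s]
           for s in population_share}

# Hypothetical mean satisfaction score per section among respondents.
satisfaction = {"Inspiration": 4.1, "Tourist Information": 3.8,
                "Online Booking": 3.2, "Other": 3.5}

unweighted = sum(respondent_share[s] * satisfaction[s] for s in satisfaction)
weighted = sum(weights[s] * respondent_share[s] * satisfaction[s]
               for s in satisfaction)
print(round(unweighted, 2), round(weighted, 2))
```

In this made-up example the weighted score comes out lower than the unweighted one, because the sections where respondents are under-represented happen to have lower satisfaction; the same mechanism would correct in the opposite direction if the pattern were reversed.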

Consequences for analyzing demography

Finally, an important reason to weight your survey data is that respondents tend to differ in terms of demography. In this post I have shown considerable geographical differences between respondents and non-respondents. These differences are likely to skew other, underlying demographic data. It is probably always a good idea to correct for this type of sampling error. However, if your aim is to analyze the demography of your visitors, it becomes imperative.

Did you find this post helpful? Do you have experience yourself with online surveys? Perhaps you have tried to integrate web analytics and online surveys? Please share your thoughts or experience by leaving a comment!

5 Responses to “Are Online Surveys Reliable?”

Christian. Very interesting post! We too have used our own web analytic product (WebAbacus) to integrate online survey information with clickstream data (for UK sites). We were looking to use the survey responses, along with key clickstream actions, to help build persona based visitor segments for use alongside usability studies.

What we found was that whereas traditional personas have very clearly defined characteristics, it was extremely difficult to replicate this based on the online behaviour. Although we could directly integrate the clickstream data with survey responses, we found little or no significant difference between the various visitor segments in terms of their actual online behaviour.

We have found that the design and form of the survey has a significant effect on the response rate, with exit (or simulated-exit) surveys typically delivering the best results. The key with online surveys is to ensure that you are targeting the people you wish to hear from. Additionally, surveys should not be used in isolation, for the exact reasons you mention above. The possibility of bias is high, and responses should therefore be weighed against other research methods.

Overall, survey information can be very helpful to create a richer view of your customer segments, by including responses as part of wider visitor profiling. For example, you can use RFML (Recency, Frequency, Monetary, and Latency) techniques to create an overall score for each customer and when combined with the online survey data you can start to see some interesting patterns in terms of engagement - essentially visitors that respond to surveys are likely to have higher scores, i.e.: have been to the site before, tended to visit regularly, and have likely completed more desirable actions on the site than those that did not respond. As you point out, a large proportion of non-respondents are likely to bounce on the site, and so will also skew the behavioural results.

Although surveys will likely have a skewed and biased response, they are likely to give you feedback from your higher-value visitors. The key is then to ensure that this feedback is actioned in some way, else you will be alienating your most profitable visitors!

I think one important conclusion is missing. Looking at the difference between respondents and non-respondents in comparison to sections on the website, it seems to me that one of the main reasons why people respond to questionnaires is that they are simply just browsing (non-respondents want to take action on the site, which does not include answering surveys) - not looking for anything in particular. Beat me if the respondents do not have a much higher visit duration.

How exciting to get a comment from somebody who also has experience with integrating online surveys and clickstream data!!

I’m not surprised, though, that this comment comes from a person working at Foviance. I have the greatest respect for Foviance and the company’s combined focus on usability and web analytics, which, I suppose, originates from the merger between The Usability Company and Web Abacus some years ago. I think usability is often ignored by the web analytics industry, so it’s really refreshing to hear that there are in fact some exceptions - apart from Netminers of course :-).

You say you had some difficulties finding differences between survey-based “personas” in terms of their actual behavior. I don’t know if this is because of the specific variables you used, but at Netminers we usually find striking, and very interesting, differences.

We typically pick one primary variable, e.g. the question “Did you find what you were looking for”, and analyze how this correlates with viewed pages. Then we add supplementary variables to “construct” personas around the answers to the primary question: i.e. Yes, I found everything I was looking for; I found some of it, but not all; No, but I found some other interesting information; No, I didn’t find anything of value, etc.

This approach typically reveals interesting differences in terms of where on the website the most positive/negative experiences occur. This is great as a primer for more in-depth, qualitative usability studies, which are guided by, and can elaborate on, the statistical findings. We often recruit respondents for these in-depth studies on behalf of our qualitative usability partners.

Also, thanks for the tips on RFML technique. I’m glad to hear that you have found the same types of bias for survey data. There must be some truth to it, then!

Finally, I have a question for you: In the case study presented in this post there was an extremely high response rate for website visitors based in the UK. Do you have any idea why this might be the case? Could it be that online surveys are rarely used in the UK, meaning that fewer internet users have developed an online questionnaire aversion?

That’s an interesting hypothesis: Some non-respondents might reject the survey because they are simply too busy converting! This might cover up the fact that a sub-group of non-respondents could be very engaged, namely some of those 45% who, in my case study, do not bounce. This would explain why the sections “Online Booking” and “Find accommodation” (which both are strong indicators of booking intention) have more non-respondents than respondents. In total, around 20% end their visit there. Notice, however, that 20% of 45% is only 9%, meaning that the overall conclusion – that non-respondents are less engaged than respondents – probably still holds (provided that the bounce rate is more or less the same across all content sections).