The impact of academic sponsorship on Web survey dropout and item non-response

This paper reports two experiments in which the prominence of university sponsorship on Web surveys was systematically manipulated, and its effects on dropout and item non-response were observed. In Study 1, 498 participants were randomised to online surveys with either high or low university sponsorship. Overall, 13.9 percent of participants commenced, but did not complete the surveys, and there was no difference between the proportions of participants dropping out of each condition. However, counter to our predictions, participants in the high sponsorship condition displayed significantly higher item non-response. In Study 2 (N = 159), which addressed a rival explanation for the findings in Study 1, the overall dropout rate was 23.9 percent and sponsorship prominence had no effect on either outcome variable. Overall, these findings suggest that hosting information pages on university Web sites, placing university logos on survey pages, and including the name of the university in survey URLs do not reliably impact on dropout or item non-response. Although it may seem disappointing that enhancing sponsor visibility is not sufficient to reduce dropout and item non-response, researchers without ready access to university Web servers or branding will appreciate these findings, as they indicate that minimally visible sponsorship does not necessarily compromise data quality.

Since the 1993 public release of the first major graphical interface for the World Wide Web, the Mosaic browser, global Internet penetration has increased rapidly (Zakon, 2011). By the end of 2011, around one-third of the world’s population was defined by the International Telecommunication Union (ITU, 2011a) as “Internet users”, although access remains heavily skewed in favour of developed nations (with penetration exceeding 90 percent in parts of Europe; ITU, 2011b), the wealthy, educated and young (Australian Bureau of Statistics [ABS], 2014). In Australia, where the current research was conducted, around 79 percent of the population have regular access to the Internet, mostly at broadband speeds (ABS, 2011). The situation is similar in the U.K. and the U.S. (ITU, 2011b).

As the role of the Internet in everyday life has increased, researchers have sought to exploit the opportunities it affords for data collection (Skitka and Sargis, 2006; Reips, 2007; Lee, et al., 2008). Although a wide variety of different types of research are now conducted either partially or completely online (including qualitative, non-reactive and experimental studies), Web surveying is currently dominant (Reips, 2008; Buchanan and Hvizdak, 2009; Krantz and Williams, 2010), and is continuing to grow in popularity (Lee, et al., 2008).

The popularity of Web surveying can be linked to the many advantages it provides over telephone or paper-based surveying. These include the ability to rapidly access large samples (Skitka and Sargis, 2006; Rentfrow, et al., 2008), which are often more diverse and ‘representative’ than traditional samples (Gosling, et al., 2004; Lewis, et al., 2009); the ability to connect with rare, geographically disparate or otherwise difficult-to-access participants (e.g., Hildebrandt and colleagues’ [2006] large sample of anabolic steroid users); reduced social desirability and experimenter expectancy effects (Hewson and Laurent, 2008); and the ability to easily randomize and impose conditional logic on the presentation of survey items and stimuli (Best and Krueger, 2004).

Despite these advantages, there are also a number of challenges associated with Web surveying. For example, researchers cannot easily exert control over the conditions under which participants complete Web surveys, and consequently it is difficult to know if and how divided their attention is during completion (Stieger and Reips, 2010). There are also unique ethical considerations (Allen and Roberts, 2010; Buchanan and Williams, 2010; Roberts and Allen, 2015); concerns about multiple submissions (Reips, 2002); relatively low response rates (e.g., 10–11 percent lower than other surveying methods in two recent meta-analyses; Lozar Manfreda, et al., 2008; Shih and Fan, 2008); higher levels of item non-response (i.e., missing data; Heerwegh and Loosveldt, 2008; Scott, et al., 2011, but see Denscombe, 2009; Dillman, et al., 2010) and relatively high dropout rates (Peytchev, 2009; Roßmann, et al., 2011). It is these latter two concerns, item non-response and dropout, that are the focus of the current research.

Dropout and item non-response

Dropout (also referred to as break-off or non-completion) rate can be defined as the proportion of participants who start, but do not finish a Web survey (Heerwegh and Loosveldt, 2006; Ekman, et al., 2007). It is the inverse of retention rate, which is the proportion of participants who reach and complete the final page of a survey (Göritz, 2006b). In a face-to-face or telephone setting, social pressures can inhibit a participant’s desire to say ‘I want to stop now’, regardless of a researcher’s assurances that they can withdraw at any time (Buchanan and Williams, 2010). No such pressures exist in an online context, and consequently, dropout rates are often quite high. For example, in the 20 Web experiments described by respondents in Musch and Reips’ (2000) survey of online researchers, the mean dropout rate was 34 percent, and ranged from one percent to 87 percent. In a methodologically similar study of Web surveys, dropout rates ranged from 0 percent to 73 percent, with a mean of 16 percent (N = 68; Lozar Manfreda and Vehovar, 2002).

Research has identified many factors that either cause, or can predict dropout from Web surveys. Causal factors, which have captured most research attention thus far, include the provision of incentives (Göritz, 2010, 2006a, 2006b; Sauermann and Roach, 2013); the stated and actual length of the survey (Galesic and Bosnjak, 2009; Hoerger, 2010; Yan, et al., 2011) and the burden it places on participants (Crawford, et al., 2001); the use of individual invitations versus general requests for participation during recruitment (Lozar Manfreda and Vehovar, 2002; Heerwegh and Loosveldt, 2006; Sánchez-Fernández, et al., 2012); if and how progress indicators are used (Matzat, et al., 2009; Conrad, et al., 2010; Yan, et al., 2011); the use of the forced-response feature available in most Web surveying applications (Fuchs, 2003; Heerwegh, 2005; Stieger, et al., 2007); how the survey is structured (e.g., one versus many items per page; Lusinchi, 2007); and how items are presented and ordered (Heerwegh and Loosveldt, 2002; O’Neil, et al., 2003; Ekman, et al., 2007). Individual difference factors correlated with dropout include the education level of participants (Ekman, et al., 2007; Peytchev, 2009), their student status (O’Neil, et al., 2003; O’Neil and Penrod, 2001) and level of interest in the survey topic (Roßmann, et al., 2011).

Item non-response occurs when participants do not answer survey questions they have been exposed to, and are eligible to complete (Bosnjak and Tuten, 2001). For practical purposes, response options like “don’t know” and “prefer not to say” are also treated as item non-response by most researchers, even though it is recognised that these are not perfectly equivalent types of missing data (Albaum, et al., 2011). Research comparing the extent of item non-response across different surveying modes has produced mixed findings. For example, Heerwegh and Loosveldt (2008) found that both item non-response and endorsement of the “don’t know” response options were significantly higher for Web survey respondents than face-to-face respondents. However, their effects were small and may be partially attributable to the absence of a “don’t know” option on their response cards (as is common practice in face-to-face surveying) coupled with the interviewers’ use of probing techniques to elicit responses from participants. When researchers have compared Web to mail surveys in both experimental (Kwak and Radler, 2002; Bech and Kristensen, 2009; Messer, et al., 2012) and quasi-experimental designs (Haraldsen, et al., 2002; Denscombe, 2006; Lorenc, 2010; Israel and Lamm, 2012; Lesser, et al., 2012), they have tended to find less item non-response in the Web mode. However, this finding is not unequivocal, with Millar and Dillman (2012) and Wolfe, et al. (2009) both reporting no differences between modes. When a modal difference is observed, it is typically small for fixed-choice items, but larger for open-ended items (Huang, 2006; Denscombe, 2009), although again this is not always the case (Millar and Dillman, 2012).

One intuitively appealing remedy for item non-response is the use of the forced-response feature that is available in most online surveying applications. When deployed, this feature prevents a respondent from continuing to the next item or page until the current one has been completed. However, using it often results in significantly higher dropout. For example, Stieger and colleagues (2007) displayed a ‘hard prompt’ error message each time a respondent attempted to skip a survey item, and found that those exposed to the message were three times more likely to drop out than those who were not. Fuchs (2003) and Heerwegh (2005) reported similar trends, although they did not reach statistical significance in the latter’s research, where the overall level of dropout was also much lower.

Because of the undesirable consequences associated with the forced-response feature, we need to consider other techniques and strategies that can be employed to reduce item non-response and dropout. One such potential strategy involves enhancing the prominence of survey sponsorship, which is the independent variable (IV) in the current research.

Survey sponsorship

The sponsor of a survey is the agency or organisation responsible for “funding part or all of the sampling and data collection activities and typically has first or exclusive rights to the data” [1]. Research has indicated that offline surveys with university or government sponsorship tend to yield higher response rates than those sponsored by commercial entities (Heberlein and Baumgartner, 1978; Fox, et al., 1988; Edwards, et al., 2002; Groves and Peytcheva, 2008; but see Yammarino, et al., 1991). It has been argued that this effect can be attributed to the higher prestige, moral authority or legitimacy that university or government sponsorship tends to convey (Groves, et al., 1992; Boulianne, 2008). Furthermore, compared to surveys by commercial organisations, those by university or government departments are less likely to be sales calls disguised as ‘research’, and more likely to contribute to the advancement of science or well-being of the community (Boulianne, 2008). Although no research has directly examined the effects of sponsor type on Web survey response rates, intent to respond to a Web survey has been predicted by trust in its sponsor (Fang, et al., 2009) as well as the survey sponsor’s reputation (Fang, et al., 2012). Finally, when Boulianne, et al. (2011) manipulated Web survey sponsor prominence, such that members of a university community were invited to complete a survey about transportation issues by either the university’s transportation department or its survey centre, it had no impact on response rate. The absence of any effect in Boulianne and colleagues’ study is perhaps unsurprising, considering the subtlety of their manipulation, in which both survey invitations came from different departments within the same university, and the invitation from the survey centre clearly indicated that the transportation department was actually conducting the research.

Research examining the effects of sponsorship on survey dropout and item non-response is more limited, and the findings are mixed. In an off-line context, Peterson (1975) reported higher item non-response for a business-sponsored survey compared to a university-sponsored survey, whereas Jones and Linda (1978) reported no differences in item non-response between business, university and government sponsored surveys. When Etter, et al. (1996) compared mail surveys sponsored by either a private medical practice or a university, they similarly found no item non-response differences. Online, it is possible to also study dropout behaviour in relation to survey sponsorship, although few researchers have done so, and again their findings have been inconsistent. For example, both Heerwegh and Loosveldt (2006) and Boulianne and colleagues (2011) manipulated the prominence of Web survey sponsorship, although only the latter observed a reliable effect. However, in Heerwegh and Loosveldt’s study, the experimental condition in which participants were exposed to the sponsoring university’s logo on every page of the survey did exhibit lower dropout, albeit not significantly lower.

Rationale and hypotheses

There is an absence of experimental research examining the effects of Web survey sponsorship on both dropout and item non-response. The online studies that have been conducted have only examined dropout, and have produced inconsistent findings. Furthermore, the available off-line studies have predominantly manipulated the nature of the survey sponsor, rather than its prominence or intensity. However, in authentic research situations it is far more likely that a researcher will be able to enhance or reduce sponsorship prominence (e.g., by adding or removing sponsor branding) than be able to change the nature of the sponsor (e.g., from a commercial to a non-commercial sponsor) in his or her efforts to reduce dropout and item non-response.

The current research addresses these deficits in the literature, and also tackles a practical question that we have asked over several years as supervisors of undergraduate psychology dissertation research, which has increasingly become survey-based and online at our university. Specifically, we have wanted to know if the extra time and effort involved in securing permission to use corporate university branding on Web surveys and hosting information pages on university Web sites actually have a reliable impact on the quality of Web survey data. To answer this question, we describe two experiments in which the prominence of university sponsorship on Web surveys was systematically manipulated, and its effects on dropout and item non-response were observed.

At a time when online methods are increasingly dominating survey research, it is important that we expend effort on studying easily manipulated variables that have the potential to affect data quality, which has long been a concern to survey researchers (e.g., Blasius and Thiessen, 2012). Dropout and item non-response are two key indicators of data quality. They reduce the overall volume of data available for analysis (with consequent implications for statistical power), impact on the representativeness and generalizability of findings, and can also raise difficult ethical issues (e.g., should a respondent’s data be included in analyses if they drop out on the final page of a Web survey? de Leeuw, et al., 2003; Denscombe, 2009; Roßmann, et al., 2011). One such easily manipulated variable is sponsorship prominence and, based on prior empirical and theoretical (e.g., the authority principle; social exchange theory; see Heerwegh and Loosveldt, 2006) work, we hypothesised that, compared to Web surveys displaying a low level of university sponsorship, those displaying a high level would result in reduced (H1) dropout and (H2) item non-response.

Study 1

Method

Participants

A convenience sample of 498 adults was recruited via face-to-face (e.g., flyers) and electronic (e.g., e-mail messages and links on Web sites) methods. Amongst those who provided demographic data (over 80 percent of the sample), the mean age was 24.49 years (SD = 7.89) and gender was evenly split. Sixty percent of the sample identified as students, while less than six percent identified as either unemployed or retired. The majority (95 percent) of the sample reported that they access the Internet over a broadband connection.

Prior to recruiting participants, this study was reviewed and approved by our local Human Research Ethics Committee (HREC). Participants were treated in accordance with the Australian National Health and Medical Research Council’s (2007) statement on ethical conduct in human research. No compensation was provided for participation; however, participants were offered the opportunity to enter a prize draw as a token of our appreciation for their time.

Measures

A 78–item, 10–page online ‘Internet Piracy Survey’ was used to collect the data reported in this study. The survey contained measures of privacy concern (16 items; Buchanan, et al., 2007), psychological reactance (11 items; Hong and Faedda, 1996), perceived behavioural control for pirating digital content (seven items; Cronan and Al-Rafee, 2008; Wang, et al., 2009), intent, attitudes and subjective norms regarding piracy (19 items; Cronan and Al-Rafee, 2008), the perceived legitimacy of digital publishing companies (12 items; Wolf, 2009), piracy behaviour (six items), and seven demographic questions. These measures primarily used Likert-type response formats, although several items used check-boxes and text-fields instead. In the current study, we are not concerned with participants’ substantive responses to these measures; only whether or not they completed the survey, and the total number of items responded to.

Four versions of the survey were developed that were identical in content, but differed in presentation format. The first pair of surveys were hosted on our faculty Web server using LimeSurvey (http://limesurvey.org), and were preceded by an information page on our school Web site. Our university logo featured prominently on every page of these surveys, representing “high” university sponsorship, which is the first level of the IV in this study. The second level of the IV, “low” university sponsorship, was characterised by a pair of information sheets and surveys that were hosted on SurveyMonkey.com. Our university logo did not appear anywhere on these surveys, although the university’s name was mentioned twice in the information sheets, in accordance with institutional ethical requirements. One version of each pair “forced” participants to answer every question on each page before continuing, whereas the other did not (i.e., all questions were “optional”). Both LimeSurvey and SurveyMonkey were selected due to their popularity (Allen and Roberts, 2010) and comparable feature sets.

The information page that preceded each version of the survey described the research as investigating factors influencing Internet piracy and survey completion behaviours, but did not explicitly mention the experimental manipulation. However, at the end of each survey, participants were automatically re-directed to a page on our school Web site that revealed the full nature of the study.

Dropout, the first dependent variable (DV) in this study, was operationalised as whether or not the participant clicked the “submit” button at the end of the survey. Item non-response, the second DV, was operationalised as the number of items (out of 78) that the participant provided a response to.

Procedure

Prospective participants were initially directed to a page on our school Web site, which simply thanked them for their interest in the study and requested that they click on a link to continue to the survey. Attached to this link was a Perl script (Wright, 1996), which automatically randomised each participant to one of the four versions of the survey. The only way to detect the presence of the Perl script on this page was to examine its source code. Participants then read through the information sheet, worked through the 78 items on the relevant version of the survey and, if applicable, were re-directed back to the school Web site on completion. The four survey groups were of a statistically equivalent size, and did not differ on any of the demographic characteristics measured, suggesting that the randomisation was successful.
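The randomisation step described above can be illustrated with a short sketch. This is not the authors' Perl script (Wright, 1996), whose code is not reproduced in the paper; it is a Python illustration that assumes simple equal-probability assignment, and the survey URLs and the function name are hypothetical placeholders.

```python
import random

# Hypothetical URLs: the real survey addresses are not given in the paper.
SURVEY_VERSIONS = {
    ("high", "forced"): "https://example.org/high-forced",
    ("high", "optional"): "https://example.org/high-optional",
    ("low", "forced"): "https://example.org/low-forced",
    ("low", "optional"): "https://example.org/low-optional",
}


def assign_version(rng=random):
    """Pick one of the four survey versions with equal probability.

    Returns the (sponsorship, response format) condition and the URL
    that the participant would be redirected to.
    """
    condition = rng.choice(sorted(SURVEY_VERSIONS))
    return condition, SURVEY_VERSIONS[condition]


if __name__ == "__main__":
    condition, url = assign_version()
    print(condition, url)
```

Passing a seeded `random.Random` instance as `rng` makes the assignment reproducible for testing, which would not be wanted in a live study.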

Results and discussion

Overall, 13.9 percent of participants commenced, but did not complete the surveys. The proportion of participants who completed the high sponsorship surveys (.856) did not differ from the proportion who completed the low sponsorship surveys (.867), 95 percent CI of the difference between proportions [-.050, .072], χ2 (1, N = 498) = 0.13, p = .718, two-tailed, ϕ = .016.
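As a quick check on the effect size reported above, the sketch below assumes the standard relationship for a 2 × 2 table, in which ϕ equals the square root of χ2 divided by N. The paper does not state how ϕ was computed, so the formula and the function name are assumptions made for illustration.

```python
import math


def phi_from_chi_square(chi_square, n):
    """Effect size phi for a 2 x 2 table: sqrt(chi-square / N)."""
    return math.sqrt(chi_square / n)


# Values reported for Study 1: chi-square = 0.13, N = 498.
print(round(phi_from_chi_square(0.13, 498), 3))  # 0.016
```

Under that assumption, the computed value agrees with the reported ϕ = .016.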

Of those who completed the optional format surveys (n = 216, representing 87.1 percent of participants exposed to this format), members of the high sponsorship condition (Mdn = 77.00) answered significantly fewer items than members of the low sponsorship condition (Mdn = 78.00), Hodges-Lehmann 95 percent CI of the median difference [-1.00, 0.00], U = 3734.50, z = -5.29 (corrected for ties), p < .001, two-tailed. This difference could be described as medium-sized, r = .36. There was no such difference between members of the high (Mdn = 21.50) and low (Mdn = 27.00) conditions who did not complete the optional response surveys (n = 32, representing 12.9 percent of participants exposed to this format), Hodges-Lehmann 95 percent CI of the median difference [-27.00, 1.00], U = 90.50, z = -1.17 (corrected for ties), p = .243, two-tailed. However, this difference was non-trivial (r = .21), and thus non-significance should be interpreted with caution.
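The r values reported for the two Mann-Whitney U tests are consistent with a commonly used conversion, in which r is the absolute z statistic divided by the square root of the number of cases. The paper does not say which formula was used, so the sketch below is an assumption offered for illustration, and the function name is hypothetical.

```python
import math


def r_from_z(z, n):
    """Approximate effect size r for a Mann-Whitney U test: |z| / sqrt(N)."""
    return abs(z) / math.sqrt(n)


# Completers of the optional format surveys: z = -5.29, n = 216.
print(round(r_from_z(-5.29, 216), 2))  # 0.36
# Non-completers of the optional format surveys: z = -1.17, n = 32.
print(round(r_from_z(-1.17, 32), 2))  # 0.21
```

Both computed values match those reported above, which suggests, but does not confirm, that this conversion was the one applied.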

Counter to our predictions, there is some evidence to suggest that reducing the prominence of university sponsorship may increase the number of items that participants respond to in online surveys utilising an optional response format. However, LimeSurvey and SurveyMonkey differ in terms of basic page formatting, load speeds and several other factors, which could be responsible for these findings. These confounds were addressed in Study 2, which delivered both high and low sponsorship surveys on the same survey platform (Qualtrics.com).

Study 2

Method

Participants

A convenience sample of 159 adults was recruited via face-to-face (e.g., flyers) and electronic (e.g., e-mail messages and links on Web sites) methods in mid-2011. Amongst those who provided demographic data (over 90 percent of the sample), the median age range was 21–30 years, and 70 percent were female. They were treated in accordance with local ethical guidelines, and were not offered any incentives or compensation for participation.

Measures

A 65-item, seven-page online ‘Internet Behaviour Survey’ contained five demographic items as well as measures of Internet use (six items), privacy concern (16 items; Buchanan, et al., 2007), perceived credibility of the survey sponsor (three items; Rifon, et al., 2004), attitudes toward the survey sponsor (three items; MacKenzie and Lutz, 1989), trust in the survey sponsor (three items; Fang, et al., 2009), and willingness to disclose personal information in online surveys (29 items; Joinson, et al., 2008). These measures used a variety of response formats, including Likert-type, semantic differential, check boxes and text fields. As in Study 1, we are not concerned with participants’ substantive responses to these measures; only whether or not they completed the survey, and the total number of items responded to.

To operationalise the IV for this study, two versions of the survey were developed that were identical in content, utilised an optional response format, and were hosted on Qualtrics.com. The first survey represented a high level of university sponsorship, was preceded by an information page on our school Web site, and had the university name and logo featured prominently on every page, and in the survey URL. The second survey represented a low level of university sponsorship, and its information sheet was hosted on Qualtrics.com, along with the survey itself. There were no university logos displayed on this version of the survey, and the survey URL did not contain the university name. It should be noted, however, that the university name was mentioned twice in the information sheet, in accordance with institutional ethical requirements.

The information page that preceded each version of the survey described the research as investigating Internet behaviour and factors influencing how people respond to online surveys, but did not explicitly mention the experimental manipulation. However, at the end of each survey, participants were automatically re-directed to a page on our school Web site that revealed the full nature of the study.

Dropout, the first DV in this study, was operationalised as whether or not the participant clicked the “submit” button at the end of the survey. Item non-response, the second DV, was operationalised as the number of items (out of 65) that the participant provided a response to. Note that 26 items offered participants a “prefer not to say” option, which was coded as the absence of a response for the purposes of data analysis.
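The coding rules above can be sketched as a small scoring function. The data layout (a dictionary of item identifiers mapped to response strings, plus a flag for whether “submit” was clicked) and all names are hypothetical, since the paper does not describe how the exported survey data were structured.

```python
PREFER_NOT_TO_SAY = "prefer not to say"


def score_respondent(answers, submitted, total_items=65):
    """Derive the two DVs for one respondent.

    answers: dict mapping item id -> response string, or None if skipped
             (hypothetical layout, not taken from the paper).
    submitted: True if the respondent clicked "submit" on the final page.
    """
    answered = 0
    for value in answers.values():
        if value is None:
            continue
        cleaned = value.strip().lower()
        # Blank answers and "prefer not to say" both count as no response.
        if cleaned == "" or cleaned == PREFER_NOT_TO_SAY:
            continue
        answered += 1
    return {
        "dropped_out": not submitted,
        "items_answered": answered,
        "items_missing": total_items - answered,
    }


example = {"q1": "Agree", "q2": "Prefer not to say", "q3": "", "q4": None}
print(score_respondent(example, submitted=True, total_items=4))
```

Treating “prefer not to say” as a valid response instead would only require removing that comparison, which makes it straightforward to run the analysis both ways.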

Procedure

Prospective participants were initially directed to http://internetbehavioursurvey.com (no longer live), which simply thanked them for their interest in the study and requested that they click on a link to continue to the survey. Attached to this link was the same Perl script (Wright, 1996) used in Study 1, which automatically randomised each participant to one of the two versions of the survey. Participants then read the information sheet, worked through the 65 items on the relevant version of the survey and, if applicable, were re-directed back to the school Web site on completion. The two survey groups were of a statistically equivalent size, and did not differ on age or gender, suggesting that the randomisation was successful.

Discussion

In this paper we have described two studies in which the prominence of university sponsorship on Web surveys was systematically manipulated, and its effects on dropout and item non-response were observed. In the first study, findings indicated that a high level of university sponsorship might actually increase item non-response. However, when alternative plausible explanations for this effect were ruled out in Study 2, it disappeared. In neither study did we observe any effect of sponsorship prominence on dropout. Overall, these findings lead us to conclude that hosting information pages on university Web sites, placing university logos on survey pages, and including the name of the university in survey URLs do not reliably impact on dropout or item non-response. However, these measures may provide other benefits, such as enhancing the honesty and candour of responding, or improving response rates. These issues will require investigation in future research.

On the surface, these findings may seem disappointing, as when viewed in conjunction with Heerwegh and Loosveldt (2006) they suggest that simply enhancing sponsor visibility is not sufficient to reliably reduce dropout and item non-response. However, researchers, and particularly student researchers, without ready access to university Web servers or branding will appreciate these findings, as they indicate that minimally visible sponsorship does not necessarily compromise data quality.

It should be noted that our low sponsorship condition did not reflect a complete absence of sponsorship, as pragmatic and institutional considerations meant that our affiliation with a university would still have been obvious to most participants. For example, invitations to participate were sent from university e-mail addresses, and the information sheet clearly indicated that the study was being conducted by university-based researchers, and had been approved by a university HREC. However, these are minimum standards that all legitimate researchers ought to be able to meet, and thus it could be argued that attempting to reduce the prominence of sponsorship below this level would only decrease the ecological validity of our experimental manipulation.

It should also be noted that this study only investigated manipulating the prominence of sponsorship by one Australian university, which typically ranks towards the bottom of the top third of Australian universities on commonly cited indices of performance (e.g., Times Higher Education Supplement, 2012). It says nothing about the prominence of other types of sponsorship, such as that of internationally renowned universities [3], or different types of government or commercial entities. Nor does it speak to possible interactions between sponsor prominence, affiliations with the sponsor, knowledge of it, or attitudes towards it. Again, these are possible topics for future research.

Conclusion

In summary, this paper presents two studies in which the prominence of a university sponsor on a Web survey was systematically varied, and its effects on dropout and item non-response were observed. Their findings lead to the conclusion that the prominence of university sponsorship affects neither, although it is not yet known whether it affects the initial decision about whether or not to participate in a piece of research, or decisions about how to respond to specific survey items. It is anticipated that future research will shed light on these issues.

About the authors

Peter J. Allen is Lecturer in the School of Psychology and Speech Pathology at Curtin University in Perth, Western Australia. Direct comments to: p [dot] allen [at] curtin [dot] edu [dot] au

Lynne D. Roberts is an Associate Professor in the School of Psychology and Speech Pathology at Curtin University, and an OLT National Teaching Fellow. E-mail: lynne [dot] roberts [at] curtin [dot] edu [dot] au

2. For these two Mann-Whitney U tests, “prefer not to say” (which was a response option for 26 of the items on the Joinson, et al., 2008, measure) was coded as the absence of a response. Treating “prefer not to say” as a response yields essentially equivalent results. Interestingly, the proportion of our participants making use of “prefer not to say” was far lower than that observed by Joinson, et al. (2008). On average, it was used by less than 2.5 percent of participants on each item, and was selected most frequently for the items about participants’ previous sexual partners (12.4 percent of responses), visits to their doctor (9.9 percent) and support for the death penalty (9.1 percent).

3. Although it should be noted that the sponsor in Heerwegh and Loosveldt’s (2006) research is the top ranking university in Belgium, and one of the top 20 in Europe, according to the Times Higher Education Supplement (2012).

Tom Buchanan, Carina Paine, Adam N. Joinson, and Ulf-Dietrich Reips, 2007. “Development of measures of online privacy concern and protection for use on the Internet,” Journal of the American Society for Information Science and Technology, volume 58, number 2, pp. 157–165. doi: http://dx.doi.org/10.1002/asi.20459, accessed 17 January 2016.

Brian A. Nosek, N. Sriram, and Emily Umansky, 2012. “Presenting survey items one at a time compared to all at once decreases missing data without sacrificing validity in research with Internet volunteers,” PLoS ONE, volume 7, number 5, e36771 (17 May). doi: http://dx.doi.org/10.1371/journal.pone.0036771, accessed 17 January 2016.