Survey data collection is the gathering of data by means of surveys. With the application of probability sampling in the 1930s, surveys became a standard tool for empirical research in social sciences, marketing and official statistics.[1] Survey data collection methods are the various ways of systematically collecting information from a sample of individuals for a statistical survey. These methods have changed over time. First, traditional paper-and-pencil interviewing (PAPI) gave way to computer-assisted interviewing (CAI). Now, face-to-face surveys (CAPI), telephone surveys (CATI), and mail surveys (CASI, CSAQ) are increasingly being replaced by web surveys.[2]

There are several ways of administering a survey. Within a survey, different methods can be used for different parts. For example, interviewer administration can be used for general topics but self-administration for sensitive topics. The choice between administration modes is influenced by several factors, including 1) costs, 2) coverage of the target population, 3) flexibility of asking questions, 4) respondents’ willingness to participate and 5) response accuracy. Different methods create mode effects that change how respondents answer. The most common modes of administration are listed under the following headings.[3]

Mobile data collection, or mobile surveying, is an increasingly popular method of data collection. The survey, form, app or collection tool runs on a mobile device such as a smartphone or a tablet. These devices make it possible to gather data regardless of the time and location of the respondent. Apart from high mobile phone penetration, further advantages are quicker response times and the possibility of reaching previously hard-to-reach target groups.[4]

Online (Internet) surveys are becoming an essential research tool in a variety of fields, including marketing, social and official statistics research. According to ESOMAR, online survey research accounted for 20% of global data-collection expenditure in 2006.[1] Online surveys offer capabilities beyond those available for any other type of self-administered questionnaire.[5] Online consumer panels are also used extensively for carrying out surveys, but their quality is considered inferior because the panelists are regular contributors and tend to be fatigued.

There are also concerns about what has been called "ballot stuffing" in which employees make repeated responses to the same survey.
Some employees are also concerned about privacy. Even if they do not provide their names when responding to a company survey, can they be certain that their anonymity is protected? Such fears prevent some employees from expressing an opinion.[6]

Web surveys are faster, simpler and cheaper than other survey modes.[2] However, the cost advantage is not so straightforward in practice, because costs are closely tied to survey errors. Response rates for online surveys usually compare unfavourably with those of other survey modes, so efforts to achieve a higher response rate (e.g. with traditional solicitation methods) may substantially increase costs.[1]

The entire data collection period is significantly shortened, as all data can be collected and processed in little more than a month.[2]

Interaction between the respondent and the questionnaire is more dynamic compared to e-mail or paper surveys.[5] Online surveys are also less intrusive, and they suffer less from social desirability effects.[2]

Complex skip patterns can be implemented in ways that are mostly invisible to the respondent.[5]
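A skip pattern of this kind can be sketched in a few lines. The following is a minimal illustration only, not how any particular survey package works: the question ids, the `show_if` conditions and the helper names `next_question` and `route` are all hypothetical. Each question carries an optional condition on earlier answers, and the respondent is only ever shown the questions whose condition holds.

```python
from typing import Callable, Dict, List, Optional

Answers = Dict[str, str]

# Hypothetical questionnaire: "show_if" is an optional condition on the
# answers given so far; None means the question is always shown.
QUESTIONS: List[dict] = [
    {"id": "employed", "text": "Are you currently employed? (yes/no)",
     "show_if": None},
    {"id": "hours", "text": "How many hours do you work per week?",
     "show_if": lambda a: a.get("employed") == "yes"},
    {"id": "seeking", "text": "Are you looking for work? (yes/no)",
     "show_if": lambda a: a.get("employed") == "no"},
    {"id": "age", "text": "What is your age?",
     "show_if": None},
]


def next_question(answers: Answers) -> Optional[str]:
    """Return the id of the first unanswered question whose condition holds."""
    for question in QUESTIONS:
        if question["id"] in answers:
            continue  # already answered
        condition: Optional[Callable[[Answers], bool]] = question["show_if"]
        if condition is None or condition(answers):
            return question["id"]
    return None  # questionnaire finished


def route(scripted: Answers) -> List[str]:
    """Replay scripted answers and return the question ids actually shown."""
    answers: Answers = {}
    shown: List[str] = []
    while (qid := next_question(answers)) is not None:
        shown.append(qid)
        answers[qid] = scripted[qid]
    return shown


if __name__ == "__main__":
    scripted = {"employed": "no", "hours": "40", "seeking": "yes", "age": "30"}
    print(route(scripted))
```

Because the routing is evaluated by the program, a respondent who answers "no" to the employment question never sees the question about working hours, which is what makes the skip invisible.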

Pop-up instructions can be provided for individual questions to provide help with questions exactly where assistance is required.[5]

Questions with long lists of answer choices can be used to provide immediate coding of answers to certain questions that are usually asked in an open-ended fashion in paper questionnaires.[5]

Online surveys can be tailored to the situation (e.g. respondents may be allowed to save a partially completed form, the questionnaire may be preloaded with already available information, etc.).[2]

Online questionnaires may be improved by applying usability testing, where usability is measured with reference to the speed with which a task can be performed, the frequency of errors and user satisfaction with the interface.[2]

Sampling. The difference between probability samples (where the inclusion probabilities of all units of the target population are known in advance) and non-probability samples (which often require less time and effort but generally do not support statistical inference) is crucial. Probability samples in online surveys are highly affected by problems of non-coverage (not all members of the general population have Internet access) and frame problems (online survey invitations are most conveniently distributed by e-mail, but there are no e-mail directories of the general population that might be used as a sampling frame). Because coverage and frame problems can significantly affect data quality, they should be adequately reported when disseminating the research results.[1]
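What it means for inclusion probabilities to be known in advance can be illustrated with simple random sampling without replacement, where every unit of a frame of size N has inclusion probability n/N. The sketch below is an illustration under that assumption: the frame, the function names and the use of the Horvitz-Thompson estimator (a standard technique not named in the sources cited here) are choices made for the example.

```python
import random
from typing import List, Sequence, Tuple


def srs_sample(frame: Sequence[str], n: int, seed: int = 0) -> Tuple[List[str], float]:
    """Draw a simple random sample without replacement.

    Returns the sampled units and the inclusion probability n / N,
    which is the same for every unit and known before sampling.
    """
    rng = random.Random(seed)
    sample = rng.sample(list(frame), n)
    inclusion_probability = n / len(frame)
    return sample, inclusion_probability


def horvitz_thompson_total(values: Sequence[float], probs: Sequence[float]) -> float:
    """Estimate a population total by weighting each sampled value
    with the inverse of its inclusion probability."""
    return sum(y / p for y, p in zip(values, probs))


if __name__ == "__main__":
    # Hypothetical frame of 100 units, e.g. a complete membership list.
    frame = [f"unit-{i}" for i in range(100)]
    sample, pi = srs_sample(frame, n=10)
    # If every sampled unit reports the value 1, the estimated total
    # equals the frame size.
    print(horvitz_thompson_total([1.0] * len(sample), [pi] * len(sample)))
```

For a non-probability sample, such as respondents who followed an open link, no such inclusion probability exists, so this kind of weighting has no basis.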

Invitations to online surveys. Because sampling frames are lacking, many online survey invitations are published as a URL link on web sites or in other media. This leads to sample selection bias that is outside the researcher's control, and to non-probability samples. Traditional solicitation modes, such as telephone or mail invitations to web surveys, can help overcome probability sampling issues in online surveys. However, such approaches face problems of dramatically higher costs and questionable effectiveness.[1]

Non-response. Online survey response rates are generally low and vary widely, from less than 1% in enterprise surveys with e-mail invitations to almost 100% in specific membership surveys. In addition to refusing to participate, breaking off partway through the survey or not answering certain questions, several other non-response patterns can be observed in online surveys, such as lurking respondents and a combination of partial and item non-response. Response rates can be increased by offering monetary or other incentives to respondents, by contacting respondents several times (follow-up), and by keeping the questionnaire difficulty as low as possible.[1]

Questionnaire design. While modern web questionnaires offer a range of design features (different question types, images, multimedia), the use of such elements should be limited to what is necessary for respondents to understand the questions or to stimulate a response. Design elements should not influence the answers themselves, because that would lower the validity and reliability of the data. Appropriate questionnaire design can also help lower the measurement error that arises from the respondents or from the survey mode itself (respondents' motivation, computer literacy, abilities, privacy concerns, etc.).[1]

Post-survey adjustments. Various robust procedures have been developed for situations where sampling deviates from probability selection, or where non-coverage and non-response problems arise. The standard statistical inference procedures (e.g. confidence interval calculations and hypothesis testing) still require a probability sample. Actual survey practice, particularly in marketing research and public opinion polling, largely neglects the principles of probability sampling, and the statistical profession is increasingly called upon to specify the conditions under which non-probability samples may work.[1]
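One widely used adjustment of this kind is post-stratification, in which respondents are weighted so that the sample's group shares match known population shares. The sources cited here do not single out this technique; it is used below purely as an illustration, and the groups, shares and function names are hypothetical. Such weighting can correct imbalance on the chosen groups, but it does not turn a non-probability sample into a probability sample.

```python
from collections import Counter
from typing import Dict, Sequence


def poststratification_weights(
    sample_groups: Sequence[str], population_shares: Dict[str, float]
) -> Dict[str, float]:
    """Weight for each group = population share / sample share."""
    n = len(sample_groups)
    counts = Counter(sample_groups)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}


def weighted_mean(
    values: Sequence[float], groups: Sequence[str], weights: Dict[str, float]
) -> float:
    """Mean of the values, with each respondent weighted by group weight."""
    numerator = sum(weights[g] * v for v, g in zip(values, groups))
    denominator = sum(weights[g] for g in groups)
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical web sample that over-represents younger respondents:
    # 8 of 10 respondents are "young", but the population is split 50/50.
    groups = ["young"] * 8 + ["old"] * 2
    values = [1.0] * 8 + [0.0] * 2  # e.g. 1.0 = uses a given online service
    weights = poststratification_weights(groups, {"young": 0.5, "old": 0.5})
    print(sum(values) / len(values))             # unweighted mean
    print(weighted_mean(values, groups, weights))  # post-stratified mean
```

In this constructed example the unweighted mean reflects the over-representation of the younger group, while the weighted mean gives each group its population share.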

Researchers can combine several of the above methods for data collection. For example, researchers can approach shoppers at malls and then send questionnaires by e-mail to those willing to participate.
With the introduction of computers to the survey process, survey mode now includes combinations of different approaches or mixed-mode designs. Some of the most common methods are:[11]

Computer-assisted personal interviewing (CAPI): The computer displays the questions on screen, the interviewer reads them to the respondent, and then enters the respondent's answers.

Audio computer-assisted self-interviewing (audio CASI): The respondent operates the computer, which displays each question on the screen and plays a recording of it; the respondent then enters his or her answers.

Computer-assisted telephone interviewing (CATI)

Interactive voice response (IVR): The computer plays recordings of the questions to respondents over the telephone, who then respond by using the keypad of the telephone or speaking their answers aloud.