Latest revision as of 11:14, 1 February 2012

In this chapter we discuss issues pertaining to the methodology of data collection [fn. 1]. The discussion aims at raising the kind of questions that are likely to arise within any dialect-syntactic investigation, and at providing some potential answers. For expository purposes, we will use the methodology developed for the SAND project as a point of reference throughout [fn. 2]. The options taken up during SAND were deemed optimal given the particular language situation in the Dutch-speaking area. In other words, the information provided here should not be read prescriptively. Rather, researchers should bear in mind that the methodological choices in each case are determined by the sociolinguistic situation in the relevant language area.

Let us first briefly outline the structure of SAND. Afterwards we will move on to particular methodological questions. The SAND project (Syntactic Atlas of the Dutch Dialects, 2000-2005) mainly focused on geographical (i.e. dialectal) variation in four empirical domains: the left periphery of the clause, pronominal reference, negation and quantification, and the right periphery of the clause. Variation along social dimensions was deliberately and explicitly outside the scope of the project. SAND was carried out in five stages. The first stage involved research on the existing literature on Dutch dialects. Stages two to four comprised the actual data collection. Stage two (conducted in 2000) involved a pilot study in the form of a written questionnaire consisting of 393 test sentences. This was sent to informants of the Meertens Institute at 321 locations in the Netherlands and Belgium, in most cases with one informant per location. At this stage social variables were not controlled for. The informants had to judge whether the test sentence was attested in their dialect, or were asked to translate or complete it. The aim of this pilot study was to get a first impression of the syntactic variation in the dialects of Dutch and of its distribution across the language area.

The second round of data collection (stage three, 2001-2002) consisted of oral interviews (involving elicited, not spontaneous speech) conducted at 267 locations spread across the Netherlands, Belgium and French Flanders. Each location was represented by at least two informants. These were mostly different informants from those used for the written questionnaire. The informants of the oral questionnaire had to meet the following criteria: both they and their parents had to have been born and raised in the place of the interview; they had to be between 55 and 70 years of age; they should not be highly educated; they should not have moved away from the place of interview for longer than 7 years; and they should be active users of the dialect in at least one social domain [fn. 3]. In the Netherlands, to reduce accommodation, the actual oral interview was conducted between two dialect speakers (one of whom had earlier been instructed), while the fieldworker stood aside as much as possible. Due to the different sociolinguistic situation in Belgium, this option was not used there. Instead the fieldworker conducted the interviews him/herself, and the questions were posed in a sub-standard variety of Belgian Dutch (usually the regiolect). In total, 456 test sentences were asked in this round. Each interview lasted approximately 90 minutes. On average, every questionnaire contained about 100 sentences. Informants had to judge whether a test sentence was attested in their dialect, or were asked to translate or complete it. On the basis of the knowledge obtained in the first stage, a core of sentences was tested at every location, while some test sentences were only tested at a restricted number of locations, resulting in a regionalized oral questionnaire.
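
The selection criteria for the oral interviews can be read as a simple screening rule. The following is a minimal sketch in Python, not a tool used in SAND: the class `Informant`, its field names and the function `meets_oral_interview_criteria` are hypothetical, and treating the age and absence limits as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Informant:
    # Hypothetical field names that mirror the criteria listed in the text.
    born_and_raised_locally: bool
    parents_born_and_raised_locally: bool
    age: int
    highly_educated: bool
    longest_absence_years: float   # longest period spent away from the location
    active_dialect_domains: int    # number of social domains with active dialect use


def meets_oral_interview_criteria(inf: Informant) -> bool:
    """Return True if the informant satisfies every criterion (bounds assumed inclusive)."""
    return (
        inf.born_and_raised_locally
        and inf.parents_born_and_raised_locally
        and 55 <= inf.age <= 70
        and not inf.highly_educated
        and inf.longest_absence_years <= 7
        and inf.active_dialect_domains >= 1
    )


candidate = Informant(True, True, 62, False, 2.0, 1)
print(meets_oral_interview_criteria(candidate))
```

Keeping the criteria in one function makes it easy to adapt them to a language area in which a different sociolinguistic situation calls for different thresholds.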

The third round of data collection (stage four, 2003) was an inquiry by telephone conducted with informants from 246 locations who had taken part in the oral interviews. In total, 331 sentences were tested in this round. These were either sentences that had been tested in the oral interview but had not received a clear answer, or new sentences that were required to get a more complete picture of some particular phenomenon.

During stage five (2004-2005) the data collected in the previous three stages was transcribed, digitized and stored in a database. In this last phase of SAND, the Dynamic Atlas (Dynasand) was constructed. Dynasand makes it possible to conduct different kinds of searches within the database (as well as to access the audio files). Finally, it is possible to construct online maps that depict the geographical distribution of the chosen linguistic variable(s).

2. Discussion points

The choice of which elicitation method to use depends on the phenomenon under investigation. For instance, to investigate word order variation in verbal clusters, SAND informants were given all logically possible word orders (in the written questionnaire). They were first asked whether these sentences occur in their dialect (see Question/Answer 7 below for discussion of the options in the formulation of this question). In addition, informants had to rate each sentence on a scale of 1 to 5 depending on how common the sentence was. Finally, informants were asked to translate the most frequently occurring word order into their dialect. Asking for a translation also functioned as a check on the judgments given by the informant. In case of a discrepancy between judgment and translation, the sentence had to be tested again. If none of the given options corresponded to what occurs in their dialect, informants had to indicate how they would express the given sentence. An example is given in (1).

(1)                                           Encounter?   Uncommon/Common
a. Ik weet dat Jan hard moet kunnen werken.   Y / N        1 - 2 - 3 - 4 - 5
   I know that J hard must can-inf work-inf
b. Ik weet dat Jan hard moet werken kunnen.   Y / N        1 - 2 - 3 - 4 - 5
c. Ik weet dat Jan hard kunnen moet werken.   Y / N        1 - 2 - 3 - 4 - 5
d. Ik weet dat Jan hard kunnen werken moet.   Y / N        1 - 2 - 3 - 4 - 5
e. Ik weet dat Jan hard werken kunnen moet.   Y / N        1 - 2 - 3 - 4 - 5
f. Ik weet dat Jan hard werken moet kunnen.   Y / N        1 - 2 - 3 - 4 - 5

When using this task, note that the chosen scale should be as ‘neutral’ as possible, and in any event should not recall scales used in e.g. grading systems, as that is likely to trigger normative linguistic behaviour on the part of the informants.
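
The verb-cluster task above lends itself to a small illustration. The sketch below is an assumption-laden illustration in Python, not the procedure used in SAND: it generates the six logically possible orders of the three verbs in (1) and flags a sentence for retesting when the order found in the informant's translation is not among the orders the informant accepted. The names `cluster_variants` and `needs_retest` are hypothetical, and the sketch presupposes that the researcher has already identified which verb order the dialect translation instantiates.

```python
from itertools import permutations

VERBS = ("moet", "kunnen", "werken")
FRAME = "Ik weet dat Jan hard {cluster}."


def cluster_variants(verbs=VERBS, frame=FRAME):
    """Build one test sentence for every logically possible order of the verbs."""
    return [frame.format(cluster=" ".join(order)) for order in permutations(verbs)]


def needs_retest(accepted_orders, translated_order):
    """Flag a discrepancy: the translated order is not among the accepted orders."""
    accepted = {tuple(order) for order in accepted_orders}
    return tuple(translated_order) not in accepted


for sentence in cluster_variants():
    print(sentence)
```

With three verbs there are 3! = 6 variants, which matches items (1a) to (1f); larger clusters grow factorially, which is one reason such exhaustive lists are better suited to a written questionnaire than to an oral interview.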

The completion task involves an incomplete sentence accompanied by a picture illustrating the situation expressed in the sentence. This technique was employed in SAND to investigate (among other things) the distribution of weak and strong reflexive anaphors. A picture showing a man washing himself was shown to informants, along with the sentence Jan wast _____ (‘John washes _____’). A note of caution is in order here. In oral interviews in particular, cloze tests often lead to incomplete or irrelevant answers, so if one wants to use this task, it is advisable to test its efficacy in advance. Meaning questions, in turn, prove extremely difficult to answer and at best give an impressionistic result. In investigating phenomena that involve semantic/pragmatic distinctions, the best method therefore seems to be to provide contexts that force one reading or the other.

2. Q: Written questionnaires: pros and cons.

A: Using written questionnaires:

Allows us to test a relatively large number of sentences.

Allows us to test sentences that are relatively complex.

Optimally allows comparison of the data, since questions are asked in exactly the same way to all informants.

Is very economical in terms of time, money and human resources.

As a stage prior to conducting oral interviews it can make the latter more efficient: on the basis of the knowledge gained from the written questionnaire, the oral questionnaire can be structured according to regional linguistic properties (regionalized and multi-stage questions).

Enables participants to respond in their own time and at their own pace.

At the same time the use of written questionnaires has a number of disadvantages, which in general may compromise the validity of the data:

A sentence may be rejected on irrelevant grounds, like the use of standard (instead of dialectal) lexical items.

Given that for most (though not all) dialects there are no orthographic conventions available, asking informants to translate into their dialect in a written questionnaire entails asking them to invent their own orthography. They may therefore be reluctant to perform the task simply due to the difficulty of having to invent an orthographic system.

The written mode may trigger more formal, hence less dialectal behavior, in particular in those situations, mentioned above, where no orthographic conventions for the dialect exist.

It is impossible for the researcher to observe and immediately respond to the answers and reactions of informants.

Informants filling out a written questionnaire obviously need to be literate. This may not be an option in certain language areas.

3. Q: Oral questionnaires: pros and cons.

A: Conducting oral interviews requires more time, money and people than using a written questionnaire, but it enables the researcher to observe and directly respond to the answers and reactions of the informants, which, quite generally, enhances the validity of the data. The risk of accommodation to, or interference from, the standard variety is still present in the context of oral interviews; we discuss one possible measure against this risk under Question/Answer 6. Moreover, and especially in the absence of a rigorous protocol for the execution of the interview, there is a risk concerning the comparability of the data obtained, given the variation that exists across fieldworkers’ personal styles of conducting an interview. Finally, it may be difficult within an oral interview to test sentences that require longer reflection.

4. Q: Web-based questionnaires: pros and cons.

A: An option that has become available in recent years is internet-based dissemination of questionnaires. This method combines properties of oral and written questionnaires (see above). For instance, the presentation of the data need not only involve the written mode, but can also be done through use of audio files (thus resembling an oral interview). Using audio files ensures that every sentence is presented to every informant in exactly the same way. Moreover, through web-based investigations it becomes possible to collect data from a considerably large(r) number of informants. This in turn makes potential noise in the data much less relevant. (However, social variables are extremely hard to control for with the use of web-based questionnaires.) Finally, the data can be stored in a database automatically, which makes the use of web-based questionnaires extremely economical, in that it eliminates the need for an otherwise necessary step, namely manually extracting the data from the written questionnaires and putting them in a database.
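
To illustrate how web-based responses can go straight into a database, here is a minimal sketch using Python's standard `sqlite3` module. It is not the storage layer of SAND or of any existing questionnaire platform: the table layout, the column names and the functions `open_store` and `record_response` are all assumptions made for the example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a response store; the schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS responses (
               informant_id TEXT NOT NULL,
               location     TEXT NOT NULL,
               sentence_id  TEXT NOT NULL,
               occurs       INTEGER NOT NULL,  -- 1 = occurs in the dialect, 0 = does not
               rating       INTEGER,           -- optional 1-5 commonness rating
               translation  TEXT,              -- optional dialect translation
               PRIMARY KEY (informant_id, sentence_id)
           )"""
    )
    return conn


def record_response(conn, informant_id, location, sentence_id,
                    occurs, rating=None, translation=None):
    """Store one answer; a resubmitted answer replaces the earlier one."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO responses VALUES (?, ?, ?, ?, ?, ?)",
            (informant_id, location, sentence_id, int(occurs), rating, translation),
        )


conn = open_store()
record_response(conn, "inf-001", "Location A", "1a", True, rating=4)
print(conn.execute("SELECT COUNT(*) FROM responses").fetchone()[0])
```

Keying each row on informant and sentence means a corrected answer overwrites the earlier one rather than creating a duplicate; whether that is desirable depends on whether the researcher wants to keep a record of changed judgments.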

5. Q: Which properties should we look for in selecting our informants?

A: Selecting informants is an important aspect of this kind of research. The better the informant, the higher the reliability of the data, but also the higher the rate at which our knowledge of the underlying linguistic system will advance. Informants need to be able to understand the task which is given and the context (e.g. hypothetical situations) in which a given sentence is to be judged. They should also be consistent in their judgments, and they should not seek to please the fieldworker (e.g. make up sentences when a translation is not available in their dialect). Particularly good informants are those who exhibit a higher level of metalinguistic awareness. They may, for example, independently provide contexts for test sentences, or for what seem to be optional phenomena. (Obviously, that is not to say that researchers should expect their informants to themselves provide analyses or explanations of their linguistic behaviour.) Those informants may also be in a position to offer insight as to what occurs in neighbouring areas; regardless of the accuracy of their judgment of those dialects, such behaviour is suggestive of enhanced metalinguistic awareness.

Though heightened metalinguistic awareness is a desirable property of informants, it should be pointed out that this property frequently combines with a normative attitude. Informants with a tendency to give normative judgments, either on the basis of norms for the standard language or norms for the dialect, should be dispreferred. (School teachers and dialect amateurs are sometimes likely to have a normative attitude towards their dialect.) A way of excluding such informants from the sample of speakers used is to inquire about their normative attitude in, for instance, a telephone conversation. A question like “Do you think people speak the dialect correctly?” may provide a clear indication of whether the speaker has a normative attitude towards his/her dialect.

Informants who exhibit inconsistency in their judgments, or who reject sentences on irrelevant grounds are also likely not to prove helpful. However, it should be noted that inconsistency may only be apparent: it could instead be the result of subtle differences in the contexts or of wrong choices of lexical items, overlooked by the researcher. As for the rejection of sentences on what seem to be irrelevant grounds, the researcher should take care not to judge such situations too hastily: it could for instance be that the informant is right in rejecting the sentence on, for instance, phonological grounds, and that the researcher had erroneously overlooked the role of intonation when administering the oral questionnaire.

6. Q: What are potential problems that I should be aware of before going on a fieldtrip? (Researcher equipment)

A: Fieldworker effects:

Noise, i.e. data that for one reason or other is not usable. This may be the result of questions in the questionnaire having been omitted, or of inconsistencies/errors in the administration of the questionnaire.

In order to minimize the risk of such fieldworker effects, the use of a pre-designed questionnaire is required. In addition, a clearly outlined fieldwork protocol is useful. Such a protocol will contain, alongside the numbered test sentences, instructions for the interviewer. Instructions relate to, for instance, how high the priority of asking a particular test sentence is, the potential dependency on other test sentences, and the locations where each test sentence has to be asked. Another strategy that may prove helpful is to keep fieldworker logs and check them regularly, especially in the earlier stages of data collection. These logs may contain information about the specifics of each interview (e.g. location, number of informants), as well as potential sources of uncertainty on the part of the fieldworker. Checking these notes (during early stages of data collection) can radically improve the quality of the interviews (in later stages) by preventing the recurrence of certain mistakes.
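
A fieldwork protocol of the kind just described can be represented as structured data, which also makes it possible to check a fieldworker log for omitted questions. The following Python sketch is illustrative only: the class `ProtocolItem`, its fields and the two helper functions are hypothetical and do not reproduce the actual SAND protocol.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProtocolItem:
    sentence_id: str
    text: str
    priority: int                          # 1 = highest priority (assumed convention)
    depends_on: tuple = ()                 # ids of test sentences this item depends on
    locations: Optional[frozenset] = None  # None = core sentence, asked at every location
    instruction: str = ""                  # note for the interviewer


def items_for_location(protocol, location):
    """Select the items to be asked at a location, highest priority first."""
    selected = [p for p in protocol if p.locations is None or location in p.locations]
    return sorted(selected, key=lambda p: (p.priority, p.sentence_id))


def omitted_items(protocol, location, asked_ids):
    """Compare a fieldworker log with the protocol and list the items that were skipped."""
    asked = set(asked_ids)
    return [p.sentence_id for p in items_for_location(protocol, location)
            if p.sentence_id not in asked]


protocol = [
    ProtocolItem("s2", "regional test sentence", 2, locations=frozenset({"Location A"})),
    ProtocolItem("s1", "core test sentence", 1),
]
print(omitted_items(protocol, "Location A", ["s1"]))
```

Running such a check on the logs after the first few interviews is one way of catching omissions early, in line with the advice to review fieldworker logs regularly in the earlier stages of data collection.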

Accommodation, i.e. the danger that the answers the informants give do not characterize the dialect system itself, but rather involve interference from the more prestigious (standard) variety. This is also known as the ‘observer’s paradox’ (Labov 1972).

This is an extremely prominent danger in collecting data from subordinate (less prestigious) language varieties. During the SAND project the strategy employed in the Netherlands was to let a dialect speaker conduct the interview with another informant, while the fieldworker interfered as little as possible (see Question/Answer 8 below). In Belgium the linguist/fieldworker conducted the interview in the local regiolect. Even though in this way dialect speakers were less inclined to let the standard variety interfere in their speech, this method may have facilitated accommodation towards the regiolect (due to the small distance between regiolects and regional dialects). However, the risk of such accommodation was judged by the Belgian researchers to be low, and any occurrences of accommodation towards the regiolect were thought to be detectable by the linguists undertaking the fieldwork.

Rutledge effect: individual fieldworkers may have an effect on the distribution of the linguistic properties under investigation (Tillery & Bailey 2003).

In the absence of a specific and explicit protocol, fieldworkers’ personal ways of eliciting data may vary considerably. For instance, one fieldworker may directly ask for an acceptability judgment, while a different fieldworker may only register spontaneous occurrences of the same construction. At the very least, this leads to non-uniform, non-comparable data. This effect can be avoided by formulating the questions in advance and by having a clearly outlined research protocol which ensures that all fieldworkers ask the same question, in a similar manner, so as to keep the responses of the dialect speakers comparable. Also, data processing should be undertaken according to protocol to control for undesired effects in the handling of the data (this presupposes that fieldworker/transcriber details are included in the transcriptions).

7. Q: Formulating the test sentence: do we use “does this occur in your dialect” or “is this possible in your dialect”?

A: As mentioned above, in the SAND project the formulation “does it occur” was chosen. However, one should be aware of the potential complication that informants may interpret this question as relating to frequency, which is obviously not what we are after. Since what we are after is knowledge of what the system allows, a different way of formulating the question is “is this possible”. However, this choice may be considered less than perfect as well, as it can be argued that it suggests reliance on grammatical consciousness on the part of the informants. Ultimately this question is about how much researchers can and should rely on the metalinguistic awareness of the speakers they interview. Possibly, reliance on metalinguistic awareness is not altogether undesirable, but can only prove effective with one’s ‘best’ informants. In that case, it may be preferable to avoid asking “is this possible” with speakers who have shown less metalinguistic awareness, and instead ask them whether they encounter the sentence in question in their area.

8. Q: How can we instruct the informant to conduct the interview him/herself, as in the Dutch part of SAND?

A: In the Dutch part of SAND, one of the informants was trained to conduct the actual interview. This training session, which was recorded, lasted for about two hours. The session had three purposes. First of all, the dialect speaker was trained to become an interviewer. It was explained to him/her that (s)he was supposed to interview the other informant during the real interview and that the researcher would not interrupt, in order to keep the interference of the standard language/non-dialect minimal. This means that it was the task of the dialect-speaking interviewer to make sure all questions were asked and answered. A second goal of the training session was to acquaint the interviewer with the questionnaire by discussing every question with him/her in detail. This had another benefit as well, namely that there is extra information (i.e. the discussion between interviewer and researcher) about all the questions. This information can be used to check the consistency of the information provided during the real interview. The third and final goal was to translate the interview into the dialect, taking into account dialectal idiosyncrasies like proper names and other lexical information. This means that all instructions, like ‘translate this question into your own dialect’ or ‘does the following sentence occur in your dialect’, were translated into the dialect. Furthermore, all judgment tasks were translated, word by word, into the dialect. The translation questions were, for obvious reasons, not translated into the dialect. (With thanks to Marjo van Koppen, p.c.)

Recommendation

Before asking informants to complete a questionnaire, researchers may find it instructive to complete it themselves first. (This can also be done in the case of an oral questionnaire.) This will bring out potential problems and unclear points that can be amended before the questionnaire reaches its target. It will also pinpoint questions which the informant may find trickier than others (because, for instance, they require a particular kind of context or intonation). Knowing this can give important insight into the quality of the answers, and lead to a better understanding of the informants’ reactions.

[1] The discussion that follows is heavily based on the presentations and discussions that took place during the workshop European Dialect Syntax III, which was held in Venice in September 2008. In addition to the participants of this workshop, we thank Leonie Cornips for her feedback on this document.

[2] For extensive discussion of the methodology used in SAND, see Cornips & Jongenburger (2001). See also Barbiers et al. (2007), Cornips (2006), and Cornips & Poletto (2005, to appear).

[3] In addition, fast-growing cities that had recently been expanding, like Almere, were not included in the list of locations.