This presentation will report on the ongoing development and evaluation of a sentence matching algorithm that generates corrective feedback in a tutorial CALL system on the basis of crowdsourced data.

First, we will present the case in which the algorithm will be evaluated. The case concerns an online application in which learners play the role of a detective and gather clues by formulating written responses in scripted dialogue tasks. These tasks target a number of specific grammatical problems in English, but since the unit of response is the utterance, many alternative responses are possible. Learning support is available as feedback through metalinguistic prompts and model responses. The learners’ responses are logged by the system and subsequently evaluated by peers, as a form of educational crowdsourcing.

Next, we will present a review of existing methods for analysing learner output and generating metalinguistic feedback in similar (half-)open tasks. Most state-of-the-art algorithms use some form of robust parsing and a wide variety of language-dependent linguistic resources, such as lexicons and grammars (e.g. Heift, 2003; Nagata, 2002; Schulze, 1999; Dodigovic, 2005). Such techniques make it possible to detect linguistic errors without having a correction at hand, but they are language-dependent, often hard to construct, and not foolproof (Dodigovic, 2005; Fowler, 2006).

Finally, we will describe an alternative approach that uses simpler techniques, is less language-dependent, and can address a wider range of target language errors. The proposed algorithm leverages the crowdsourced data and uses approximate string matching, POS tagging, and lemmatisation in order to

a) detect similarities and differences between the student’s response and (correct and incorrect) alternatives, and

b) provide metalinguistic feedback for a number of grammatical problems.
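To illustrate the general idea, the following is a minimal sketch of step (a) using only approximate string matching from Python's standard difflib; it is not the system's actual implementation, and the function names and example sentences are hypothetical. A learner response is matched against the crowdsourced alternatives, and a word-level diff against the closest alternative localises the difference that feedback could then address.

```python
import difflib

def closest_alternative(response, alternatives):
    """Return the (peer-evaluated) alternative most similar to the
    learner's response, with a similarity ratio in [0, 1]."""
    best, best_ratio = None, 0.0
    for alt in alternatives:
        ratio = difflib.SequenceMatcher(None, response.lower(), alt.lower()).ratio()
        if ratio > best_ratio:
            best, best_ratio = alt, ratio
    return best, best_ratio

def token_diff(response, target):
    """List word-level differences between the response and the target,
    as (operation, response_tokens, target_tokens) triples."""
    r_tokens, t_tokens = response.split(), target.split()
    sm = difflib.SequenceMatcher(None, r_tokens, t_tokens)
    return [(tag, r_tokens[i1:i2], t_tokens[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes()
            if tag != "equal"]

# Hypothetical example: the diff isolates "go" vs "goes", which a
# feedback component could map to a subject-verb agreement prompt.
best, ratio = closest_alternative(
    "He go to the station",
    ["He goes to the station", "She went home"])
diff = token_diff("He go to the station", best)
```

In the full approach, POS tagging and lemmatisation would be applied to the diffed tokens to decide which metalinguistic prompt is appropriate, rather than relying on surface forms alone.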