Gap-tagger corpus

The Gap-tagger corpus contains data for assessing the correctness of automatically generated alternatives for filling a gap (a missing word). To get clearly interpretable results, we conducted a modified version of A/B testing in which the user had to choose between the original word and an alternative. The user could either pick one of the two proposed words or report both words as appropriate. Since we know the right answer, we can objectively assess the suitability of alternative answers without formally specifying what qualifies as a correct answer. The experiments were run using the gap-tagger tool: https://github.com/estnltk/gap-tagger.

In the corpus file, each line corresponds to one question. The file is in CSV format with the following columns:

sentence: the sentence text
gap_start: start position of the gap word in the sentence
gap_end: end position of the gap word in the sentence
gap_word: the correct (original) gap word
variant: the alternative word proposed for the gap
correct_selected: indicates whether the correct word was selected
both_selected: indicates whether the user reported both words as appropriate
annotator: user id
time: time in milliseconds that the user took to answer the question
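
As a rough illustration of how these columns could be used, the Python sketch below counts, for each (gap_word, variant) pair, how often annotators chose the original word and how often they accepted both words. Several details are assumptions rather than facts about the corpus: the file is taken to be comma-separated with a header row, the two flag columns are taken to hold values such as 1/0 or true/false, and an answer with neither flag set is interpreted as the annotator choosing the variant. The sample rows are invented for the example and are not taken from the corpus.

```python
import csv
import io
from collections import defaultdict

# Assumed encodings of the flag columns; adjust to the actual file.
TRUTHY = {"1", "true", "t", "yes", "y"}


def as_bool(value):
    """Interpret a flag cell as a boolean (assumed encoding)."""
    return value.strip().lower() in TRUTHY


def summarize(rows):
    """Count answers per (gap_word, variant) pair.

    `rows` is any iterable of dicts keyed by the column names,
    for example a csv.DictReader over the corpus file.
    """
    stats = defaultdict(lambda: {"answers": 0, "original": 0, "both": 0})
    for row in rows:
        entry = stats[(row["gap_word"], row["variant"])]
        entry["answers"] += 1
        # "both" is checked first because the value of correct_selected
        # in that case is not specified in the description.
        if as_bool(row["both_selected"]):
            entry["both"] += 1
        elif as_bool(row["correct_selected"]):
            entry["original"] += 1
    return dict(stats)


def summarize_file(path):
    """Read a corpus file (the path is supplied by the caller) and summarize it."""
    with open(path, newline="", encoding="utf-8") as handle:
        return summarize(csv.DictReader(handle))


# Invented sample rows, used only to show the expected shape of the data.
SAMPLE = """sentence,gap_start,gap_end,gap_word,variant,correct_selected,both_selected,annotator,time
The cat sat on the mat,8,11,sat,lay,1,0,a1,2300
The cat sat on the mat,8,11,sat,lay,0,1,a2,4100
The cat sat on the mat,8,11,sat,flew,1,0,a1,1800
"""

result = summarize(csv.DictReader(io.StringIO(SAMPLE)))
for (gap_word, variant), entry in result.items():
    print(gap_word, variant, entry)
```

A variant that is often reported under "both" is a plausible filler for the gap, while a variant for which annotators almost always pick the original word is likely unsuitable.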