Dataset

The dataset consists of keystroke samples from 64 students answering questions on 3 online exams over a semester. For the first exam, students typed normally with both hands. For the second and third exams, students were required to type with only their left hand and only their right hand, respectively. This simulates a serious impairment in which a user can type with only one hand. Each student provided at least 500 keystrokes for each sample.

The dataset is split into labeled (training) and unlabeled (testing) samples. The goal is to identify the user who produced each unlabeled sample.

Important: not all of the users in the training dataset appear in the testing dataset.

The testing dataset contains 471 samples of 500 keystrokes each, drawn from the same population under three typing conditions: normal typing with both hands, typing with only the left hand, and typing with only the right hand. Samples from the same user are at least 50 keystrokes apart, to avoid classification based on the grammatical structure of the student's response. Timestamps are also normalized by subtracting the timestamp of the first key press, which removes any correlation between the time of the attempt in the training and testing datasets. The columns in the testing dataset are:
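The sampling and normalization steps described above can be sketched as follows. This is a minimal illustration, not the procedure used to build the dataset: it assumes each user's keystrokes are available as one list ordered by press time, the `Keystroke` type and `make_samples` function are hypothetical, and the gap is fixed at exactly 50 keystrokes, whereas the dataset only guarantees at least 50.

```python
from dataclasses import dataclass


@dataclass
class Keystroke:
    timepress: int    # press timestamp in milliseconds
    timerelease: int  # release timestamp in milliseconds
    keyname: str


def make_samples(stream, sample_len=500, gap=50):
    """Cut one user's keystroke stream into samples of `sample_len` keystrokes.

    Hypothetical sketch: consecutive samples are separated by `gap` skipped
    keystrokes, and each sample's timestamps are shifted so that its first
    key press is at 0 ms. Assumes `stream` is ordered by press time.
    """
    samples = []
    start = 0
    while start + sample_len <= len(stream):
        chunk = stream[start:start + sample_len]
        t0 = chunk[0].timepress  # first key press in this sample
        samples.append([
            Keystroke(k.timepress - t0, k.timerelease - t0, k.keyname)
            for k in chunk
        ])
        start += sample_len + gap  # skip `gap` keystrokes before the next sample
    return samples
```

With a stream of 1,100 keystrokes, this yields two samples (keystrokes 0 to 499 and 550 to 1049); the trailing keystrokes are too few to form a third.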

Testing dataset columns

Column        Description
sample        Globally unique label for each sample
condition     Typing condition (both hands, left hand, right hand)
timepress     Press timestamp in milliseconds
timerelease   Release timestamp in milliseconds
keyname       Name of the key
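As a sketch of how these columns might be used, the following groups rows by `sample` and computes two common keystroke-dynamics features: hold time (`timerelease` minus `timepress`) and press-to-press latency between consecutive keys. It assumes the testing data is a CSV file whose header uses the column names above; the function name and feature names are hypothetical.

```python
import csv
from collections import defaultdict
from statistics import mean


def sample_features(csv_lines):
    """Compute simple timing features for each sample.

    `csv_lines` is any iterable of CSV text lines (for example an open file)
    with the columns sample, condition, timepress, timerelease, keyname.
    """
    rows = defaultdict(list)
    for row in csv.DictReader(csv_lines):
        rows[row["sample"]].append(
            (float(row["timepress"]), float(row["timerelease"]), row["keyname"])
        )
    features = {}
    for sample, keys in rows.items():
        keys.sort()  # order keystrokes by press time
        holds = [release - press for press, release, _ in keys]
        # Time between consecutive key presses within the sample.
        flights = [b[0] - a[0] for a, b in zip(keys, keys[1:])]
        features[sample] = {
            "mean_hold_ms": mean(holds),
            "mean_press_to_press_ms": mean(flights) if flights else 0.0,
        }
    return features
```

For example, `sample_features(open("test.csv", newline=""))` would return a dictionary keyed by sample label, where `test.csv` is a placeholder file name.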

Submission format

The goal is to identify the user of each sample in the testing dataset, so a submission must contain one classification for every test sample. Submissions should be a CSV file with 2 columns and 472 lines (a header plus 471 sample classifications). The first column is the sample label and the second is the predicted user label.
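A submission in the required shape could be built as follows. This sketch assumes predictions are held in a dictionary mapping each sample label to a predicted user label; the header names `sample` and `user` are assumptions, since the format above only fixes the column order, and `submission_rows` and `write_submission` are hypothetical helpers.

```python
import csv

EXPECTED_SAMPLES = 471  # number of samples in the testing dataset


def submission_rows(predictions):
    """Return a header row plus one [sample, user] row per test sample.

    `predictions` maps each sample label to a predicted user label.
    The header names are assumptions; only the column order is specified.
    """
    if len(predictions) != EXPECTED_SAMPLES:
        raise ValueError(
            f"expected {EXPECTED_SAMPLES} predictions, got {len(predictions)}"
        )
    rows = [["sample", "user"]]  # hypothetical header names
    rows.extend([sample, user] for sample, user in predictions.items())
    return rows


def write_submission(predictions, path):
    """Write the rows to a CSV file: 472 lines for 471 samples."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(submission_rows(predictions))
```

Checking the prediction count before writing catches a missing or duplicated sample early, rather than producing a file with the wrong number of lines.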

Ground truth

The competition has ended and the correct labels for the test set are available in the dataset repository.