Evaluation

Performance is evaluated with a class-weighted Brier score, i.e.

$$BS = \frac{1}{N} \sum_{n=1}^{N} \sum_{c=1}^{C} w_c \, (p_{n,c} - y_{n,c})^2$$

where N is the number of test sequences, C is the number of classes, w_c is the weight for class c, p_{n,c} is the predicted probability of instance n being from class c, and y_{n,c} is the proportion of annotators that labelled instance n as arising from class c. Lower Brier scores indicate better performance; the optimal score is 0.
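The scoring rule can be sketched in a few lines of Python. This is a minimal illustration, not the official scoring code: it assumes the score is the mean over instances of the class-weighted squared difference between p_{n,c} and y_{n,c}. The function name `weighted_brier_score` and the example weights, predictions, and targets are hypothetical; the actual class weights are those given in the benchmark blog post.

```python
def weighted_brier_score(predicted, targets, class_weights):
    """Mean over instances of sum_c w_c * (p_{n,c} - y_{n,c})**2.

    predicted     -- N rows of C predicted class probabilities
    targets       -- N rows of C annotator proportions
    class_weights -- C per-class weights
    """
    if len(predicted) != len(targets):
        raise ValueError("predicted and targets must have the same number of rows")
    if not predicted:
        raise ValueError("at least one instance is required")

    total = 0.0
    for p_row, y_row in zip(predicted, targets):
        if not (len(p_row) == len(y_row) == len(class_weights)):
            raise ValueError("every row must have one entry per class")
        # Weighted squared error for this instance, summed over classes.
        total += sum(w * (p - y) ** 2
                     for w, p, y in zip(class_weights, p_row, y_row))
    return total / len(predicted)


# Hypothetical toy data: 2 instances, 3 classes (not the challenge's values).
example_weights = [1.0, 2.0, 0.5]
example_targets = [[1.0, 0.0, 0.0],
                   [0.0, 0.5, 0.5]]
example_predicted = [[0.8, 0.1, 0.1],
                     [0.2, 0.5, 0.3]]

score = weighted_brier_score(example_predicted, example_targets, example_weights)
print(round(score, 4))
```

Passing the targets themselves as the predictions yields a score of 0, which matches the optimum described above.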

The class weights have been selected to place more emphasis on the less frequent activities, so that the prediction of infrequent activities is rewarded more than the prediction of common activities. The values of the class weights are given in the section titled "Class Weights" of the benchmark blog post.

Challenge hosted by drivendata.org

We are delighted to announce that the SPHERE Challenge, which takes place in conjunction with ECML-PKDD 2016, is now hosted at DrivenData. All challenge features (including the leaderboard, forum, and user/team registration) are now managed at the DrivenData website: