UCF50 - Action Recognition Data Set

UCF50 is an action recognition data set with 50 action categories, consisting of realistic videos taken from YouTube. This data set is an extension of the YouTube Action
data set (UCF11), which has 11 action categories.

Most of the available action recognition data sets are not realistic and are staged by actors. The primary focus of our data set is to provide the computer vision
community with an action recognition data set consisting of realistic videos taken from YouTube. The data set is very challenging due to large variations in
camera motion, object appearance and pose, object scale, viewpoint, cluttered background, illumination conditions, etc. For each of the 50 categories, the videos are
grouped into 25 groups, where each group consists of more than 4 action clips. The video clips in the same group may share some common features, such as the same
person, similar background, similar viewpoint, and so on.
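
The category/group/clip structure described above can be sketched as a small index. This is only an illustration: the file naming pattern v_<Action>_g<group>_c<clip>.avi, the helper name index_clips, and the example file names are assumptions made for this sketch and are not stated on this page.

```python
import re
from collections import defaultdict

# Hypothetical naming scheme, assumed for illustration only:
#   v_<Action>_g<group>_c<clip>.avi
CLIP_NAME = re.compile(
    r"^v_(?P<action>[A-Za-z]+)_g(?P<group>\d+)_c(?P<clip>\d+)\.avi$"
)


def index_clips(filenames):
    """Map each action category to {group id: [clip file names]}."""
    index = defaultdict(lambda: defaultdict(list))
    for name in filenames:
        match = CLIP_NAME.match(name)
        if match is None:
            raise ValueError(f"unexpected clip name: {name}")
        index[match["action"]][int(match["group"])].append(name)
    return index


# Made-up file names that follow the assumed pattern.
example = [
    "v_Biking_g01_c01.avi",
    "v_Biking_g01_c02.avi",
    "v_Biking_g02_c01.avi",
    "v_Diving_g01_c01.avi",
]
index = index_clips(example)
for action in sorted(index):
    for group in sorted(index[action]):
        print(action, group, index[action][group])
```

Keeping the group id alongside every clip is what makes group-aware evaluation possible later, since clips in one group may share the same person, background, or viewpoint.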

Results on UCF50

If you happen to use UCF50, send us an email with the following details and we will update our webpage with your results.

Performance (%)

Experimental Setup (To keep the reported results consistent, please follow "Leave One Group Out Cross Validation", which leads to 25 cross-validations and eliminates randomness in the experimental setup. Please note that some action categories might have more than 25 groups; in this experimental setup, please consider only the first 25 groups in each action category.)

Paper details

Performance   Experimental Setup                                            Paper
76.90%        Leave One Group Out Cross-validation (25 cross-validations)   Reddy and Shah (MVAP), 2012
57.90%        5-fold group-wise cross-validation                            Sadanand and Corso (CVPR), 2012
76.40%*       Video Wise Cross-validation                                   Sadanand and Corso (CVPR), 2012
81.03%**      2/3 training and 1/3 testing for each class                   Todorovic (ECCV), 2012
73.70%        Leave One Group Out Cross-validation (25 cross-validations)   Solmaz, et al. (MVAP), 2012
72.60%        Leave One Group Out Cross-validation (25 cross-validations)   Kliper-Gross, et al. (ECCV), 2012

* Since videos belonging to a group are obtained from a single long video, similar videos can end up in both training and testing in "video-wise cross-validation", leading to high performance.

** From the details given in the paper, we are not sure whether videos belonging to the same group are kept separate in the training and testing sets, and the paper does not give details on the number of cross-validations.

Note: It is very important to keep the videos belonging to the same group separate in training and testing. Since the videos in a group are obtained from a single long video, sharing videos from the same group between the training and testing sets would give high performance.
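
The "Leave One Group Out Cross Validation" protocol requested above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name leave_one_group_out, the (clip_id, group_id) input format, and the reading of "the first 25 groups" as group ids 1 to 25 are all assumptions made for this sketch.

```python
from collections import defaultdict


def leave_one_group_out(clips, max_groups=25):
    """Yield (held_out_group, train, test), one fold per group.

    `clips` is an iterable of (clip_id, group_id) pairs covering all
    action categories. Groups with an id above `max_groups` are ignored,
    which is this sketch's assumed reading of "only the first 25 groups".
    """
    by_group = defaultdict(list)
    for clip_id, group_id in clips:
        if 1 <= group_id <= max_groups:
            by_group[group_id].append(clip_id)

    groups = sorted(by_group)
    for held_out in groups:
        # Every clip of the held-out group goes to testing, so no group
        # ever contributes clips to both sets of the same fold.
        test = list(by_group[held_out])
        train = [c for g in groups if g != held_out for c in by_group[g]]
        yield held_out, train, test


# Toy data: three groups in use, plus one clip in group 26 that is dropped.
toy = [("a1", 1), ("a2", 1), ("b1", 2), ("c1", 3), ("z1", 26)]
folds = list(leave_one_group_out(toy))
for held_out, train, test in folds:
    print(held_out, train, test)
```

With the full data set this yields 25 folds, and the reported performance would be the average over them. Because the split is determined entirely by the group ids, no random seed is involved.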