Towards primary object segmentation in aerial videos, we construct a large-scale dataset for model training and benchmarking, denoted as APD.
If you want to learn more about the APD dataset, please read the paper.
[arXiv][project]
We post here the results of models evaluated on their ability to predict the ground truth on the APD test set, which contains 125 aerial videos.

To assess performance, we rely on the standard Jaccard Index, commonly known as the PASCAL VOC intersection-over-union metric:
IoU = TP / (TP+FP+FN) [1], where TP, FP, and FN are the numbers of true positive, false positive, and false negative pixels.
To evaluate performance on video data, we report the mean IoU: mIoU = sum(IoU(frame(i))) / numFrames, where frame(i) is the i-th frame,
1 <= i <= numFrames, and numFrames is the total number of frames in the video.
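
As a rough illustration of the two formulas above, the following is a minimal Python sketch that computes the per-frame IoU from binary masks and averages it over a video. It is not the official APD evaluation code. The names `frame_iou` and `video_miou` are hypothetical, masks are assumed to be given as rows of 0/1 pixels, and the choice to score a frame as 1.0 when both the prediction and the ground truth are empty is an assumption that the text above does not specify.

```python
from typing import Sequence

# Assumed mask layout: a sequence of rows, each a sequence of 0/1 (or bool) pixels.
Mask = Sequence[Sequence[int]]


def frame_iou(pred: Mask, gt: Mask) -> float:
    """IoU = TP / (TP + FP + FN) for one frame (hypothetical helper)."""
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            if p and g:
                tp += 1  # predicted foreground, truly foreground
            elif p and not g:
                fp += 1  # predicted foreground, truly background
            elif g and not p:
                fn += 1  # predicted background, truly foreground
    denom = tp + fp + fn
    if denom == 0:
        # Assumption: both masks are empty, so the frame counts as a perfect match.
        return 1.0
    return tp / denom


def video_miou(preds: Sequence[Mask], gts: Sequence[Mask]) -> float:
    """mIoU = sum(IoU(frame(i))) / numFrames for one video (hypothetical helper)."""
    if len(preds) == 0 or len(preds) != len(gts):
        raise ValueError("preds and gts must be non-empty and of equal length")
    ious = [frame_iou(p, g) for p, g in zip(preds, gts)]
    return sum(ious) / len(ious)
```

For example, a two-frame video in which the first frame has TP = 1, FP = 1, FN = 1 (IoU = 1/3) and the second frame is predicted perfectly (IoU = 1) has mIoU = (1/3 + 1) / 2 = 2/3.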