Structure-measure: A New Way to Evaluate Foreground Maps

Abstract

Foreground map evaluation is crucial for gauging the progress of object segmentation algorithms, in particular in the field of salient object detection, where the goal is to accurately detect and segment the most salient object in a scene. Several widely used measures, such as Area Under the Curve (AUC), Average Precision (AP), and the recently proposed Fωβ (Fbw), have been utilized to evaluate the similarity between a non-binary saliency map (SM) and a ground-truth (GT) map. These measures are based on pixel-wise errors and often ignore structural similarities. Behavioral vision studies, however, have shown that the human visual system is highly sensitive to structures in scenes. Here, we propose a novel, efficient, and easy-to-compute structural similarity measure (Structure-measure) to evaluate non-binary foreground maps. Our new measure simultaneously evaluates region-aware and object-aware structural similarity between an SM and a GT map. We demonstrate the superiority of our measure over existing ones using 5 meta-measures on 5 benchmark datasets.

Motivation

Region perspectives: Although it is difficult to describe the object structure of a foreground map directly, we observe that the overall structure of an object can be well captured by combining the structures of its constituent object parts (regions).

Object perspectives: In high-quality SMs, the foreground regions contrast sharply with the background regions, and each usually has an approximately uniform distribution.
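The two perspectives above can be combined into a single score of the form S = α·S_object + (1−α)·S_region. The sketch below is an illustrative simplification under assumed details, not the published formula: the object-aware term here is a simple contrast/uniformity score, and the region-aware term averages a standard SSIM ratio over the four quadrants around the GT centroid (the paper uses more refined object-aware terms and part weights).

```python
import numpy as np

def ssim(x, y, eps=1e-8):
    """Standard structural-similarity ratio between two regions
    (luminance and contrast/structure terms collapsed into one expression)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + eps) * (2 * cov + eps)
            / ((mx * mx + my * my + eps) * (vx + vy + eps)))

def s_measure_sketch(sm, gt, alpha=0.5):
    """Simplified sketch of the S-measure idea (NOT the published measure):
    S = alpha * S_object + (1 - alpha) * S_region. Assumes gt has at least
    one foreground pixel."""
    gt = gt.astype(bool)

    # Object-aware term: foreground should be bright and uniform,
    # background dark and uniform (simplified contrast score).
    fg, bg = sm[gt], sm[~gt]
    o_fg = fg.mean() / (1.0 + fg.std()) if fg.size else 0.0
    o_bg = (1.0 - bg.mean()) / (1.0 + bg.std()) if bg.size else 0.0
    mu = gt.mean()  # foreground area ratio
    s_object = mu * o_fg + (1 - mu) * o_bg

    # Region-aware term: split both maps at the GT centroid and average
    # SSIM over the four parts, weighted by each part's area.
    ys, xs = np.nonzero(gt)
    cy, cx = int(ys.mean()), int(xs.mean())
    h, w = gt.shape
    parts = [(slice(0, cy), slice(0, cx)), (slice(0, cy), slice(cx, w)),
             (slice(cy, h), slice(0, cx)), (slice(cy, h), slice(cx, w))]
    s_region = sum((gt[p].size / gt.size) * ssim(sm[p], gt[p].astype(float))
                   for p in parts if gt[p].size)

    return alpha * s_object + (1 - alpha) * s_region
```

A perfect map (sm identical to gt) scores 1 under this sketch, and degraded maps score lower, which mirrors the intended behavior of the full measure.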

Current Evaluation

Current evaluation measures (AP, AUC, Fbw) are computed in a pixel-wise manner and treat each pixel independently. Hence, they all ignore the structure of foreground maps and can assign the same score to maps of very different structural quality.
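A small synthetic example makes this concrete. The demo below (my own illustration, not from the paper) compares two corrupted maps against a square ground truth: one shifts the object coherently, the other scatters the same number of errors as noise. A pixel-wise score such as mean absolute error cannot tell them apart.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth: a 20x20 foreground square inside a 64x64 map.
gt = np.zeros((64, 64))
gt[22:42, 22:42] = 1.0

# Map A: the square shifted down by 2 pixels -- object structure is
# preserved, and all errors lie at the boundary (40 FN + 40 FP).
sm_a = np.zeros_like(gt)
sm_a[24:44, 22:42] = 1.0

# Map B: the same number of errors (40 false negatives, 40 false
# positives) scattered at random -- salt-and-pepper noise with no
# coherent object left intact.
sm_b = gt.copy()
fg = np.flatnonzero(gt.ravel() == 1)
bg = np.flatnonzero(gt.ravel() == 0)
sm_b.flat[rng.choice(fg, 40, replace=False)] = 0.0
sm_b.flat[rng.choice(bg, 40, replace=False)] = 1.0

# A pixel-wise measure (mean absolute error) scores them identically:
# 80 wrong pixels out of 4096 in both cases.
mae_a = np.abs(sm_a - gt).mean()
mae_b = np.abs(sm_b - gt).mean()
print(mae_a, mae_b)  # both 80/4096 ~= 0.0195
```

Because both maps flip exactly 80 pixels, any measure built from independent per-pixel errors gives them the same score, even though a human observer would judge their structural quality very differently.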

S-measure Framework

Evaluation examples

(a) Image (b) GT (c) state-of-the-art map (d) generic map

Meta-measure 2: Generic vs. state-of-the-art. An evaluation measure should assign the foreground map generated by a state-of-the-art method (c) a higher score than a generic map (d) that does not consider the content of the image. Unfortunately, all of the current evaluation measures give the map in (d) a higher score than (c). Only our measure correctly ranks the state-of-the-art result higher.

Quantitative Evaluation

Table 1. Quantitative comparison with current measures on 3 meta-measures. The best result is highlighted in blue. MM: meta-measure.