Describing Visual Categories by Attributes

Abstract

This thesis addresses one of the most challenging problems in computer vision, namely general object recognition. In the introduction we first outline the problem and the solutions that have been proposed over the past thirty years. We then concentrate mainly on one of the more recent approaches, introduced by Ali Farhadi et al. They proposed a system that assigns semantic and discriminative attributes to each object. These attributes then serve as the basis for fairly accurate object recognition. We observe that this system has several additional advantages. It can learn a larger number of categories more quickly, report unusual object traits, and even learn and recognize objects from a textual description alone. We therefore believe that the approach is a step in the right direction, and this thesis presents the attribute learning concept in more detail. In addition, we repeated several of the experiments with our own implementation and obtained results similar to those reported by Ali Farhadi et al. In doing so, we identified improvements that seemed worth introducing, for example a modified implementation of discriminative attributes and the use of separate localization. Separate localization enables better learning of semantic attributes on the one hand and better category learning on the other. In practically every case our improvements proved beneficial, as supported by the corresponding experiments. The attribute system also turned out to share certain similarities with the LHOP method. We therefore combined the two, using LHOP parts up to the third level as the base features in place of the edge and HOG descriptors. This combined approach performs comparably to the original attribute learning method. Since there is still room for improvement, we suggest several possible directions for future research.