Pixel Club: Viewpoint Estimation - Insights & Model

Speaker:

Gilad Divon (EE, Technion)

Date:

Wednesday, 11.4.2018, 11:30

Place:

EE Meyer Building 1061

This thesis addresses the problem of viewpoint estimation of an object in a given image,
where the objects belong to several known categories. Convolutional Neural Networks
(CNNs) have recently been applied to this problem, leading to large improvements in
state-of-the-art results. Two major approaches have been pursued: a regression approach,
which handles the continuous values of viewpoints naturally, and a classification approach,
which discretizes the space of viewpoints. We follow the second approach and present
five key insights that should be taken into consideration when designing a CNN that
solves the problem. These insights regard all three components of any network: the
architecture, the training data, and the loss function. Based on these insights, the thesis
proposes a network in which (i) the architecture jointly solves detection, classification,
and viewpoint estimation, using the most advanced CNN for performing the first two
tasks; (ii) new types of data are added and trained on, in order to address the shortage
of labeled data (specifically, we propose to utilize both flipped images and video clips);
and (iii) a novel loss function is proposed, which takes into account both the geometry
of the problem and the new types of data. Our network improves the state-of-the-art
results for this problem on PASCAL3D+ by 9.8%. The influence of each component is
rigorously analyzed.
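
To make the classification approach concrete, the following is a minimal Python sketch of how viewpoints might be discretized, how a label could change under a horizontal flip, and what a geometry-aware loss could look like. It is an illustration only, not the network or loss from the thesis. The bin count, the flip convention (mirroring negates the azimuth), the exponential soft-target weighting, and all function names are assumptions made for this example.

```python
import math

NUM_BINS = 36  # assumption: 10-degree azimuth bins, chosen only for illustration


def azimuth_to_bin(azimuth_deg, num_bins=NUM_BINS):
    """Map a continuous azimuth (degrees) to a discrete class index."""
    width = 360.0 / num_bins
    return int((azimuth_deg % 360.0) // width)


def flip_azimuth(azimuth_deg):
    """Azimuth label after a horizontal image flip.

    Assumes a convention in which mirroring the image negates the azimuth.
    """
    return (-azimuth_deg) % 360.0


def circular_bin_distance(a, b, num_bins=NUM_BINS):
    """Distance between two bins on a circle (bin 0 is adjacent to the last bin)."""
    d = abs(a - b) % num_bins
    return min(d, num_bins - d)


def soft_targets(true_bin, num_bins=NUM_BINS, sigma=2.0):
    """Target distribution that decays with circular distance from the true bin.

    The exponential decay and sigma are hypothetical choices for this sketch.
    """
    weights = [
        math.exp(-circular_bin_distance(i, true_bin, num_bins) / sigma)
        for i in range(num_bins)
    ]
    total = sum(weights)
    return [w / total for w in weights]


def geometry_aware_loss(probs, true_bin, sigma=2.0):
    """Cross-entropy against the soft targets, so near misses cost less."""
    targets = soft_targets(true_bin, len(probs), sigma)
    return -sum(t * math.log(max(p, 1e-12)) for t, p in zip(targets, probs))
```

Under these assumptions, a prediction concentrated on a bin adjacent to the true one incurs a smaller loss than one concentrated on the opposite side of the circle, which a plain one-hot cross-entropy would not distinguish.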