Résumé: The abundance of image-level labels and the lack of large scale detailed annotations (e.g. bounding boxes, segmentation masks) promotes the development of weakly supervised learning (WSL) models. In this work, we propose a novel framework for WSL of deep convolutional neural networks dedicated to learn localized features from global image-level annotations. The core of the approach is a new latent structured output model equipped with a pooling function which explicitly models negative evidence, e.g. a cow detector should strongly penalize the prediction of the bedroom class. We show that our model can be trained end-to-end for different visual recognition tasks: multi-class and multi-label classification, and also structured average precision (AP) ranking. Extensive experiments highlight the relevance of the proposed method: our model outperforms state-of-the art results on six datasets. We also show that our framework can be used to improve the performance of state-of-the-art deep models for large scale image classification on ImageNet. Finally, we evaluate our model for weakly supervised tasks: in particular, a direct adaptation for weakly supervised segmentation provides a very competitive model.