Datasets built on top of VQA

Full-Sentence Visual Question Answering (FSVQA) is a dataset of nearly 1 million pairs of questions and full-sentence answers for images. It was built by applying rule-based natural language processing techniques to the original VQA dataset and to the captions of the MS COCO dataset.
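
To give a rough idea of what a rule-based conversion from a short VQA answer to a full-sentence answer might look like, the following is a minimal sketch. The function name `to_full_sentence` and the two patterns are hypothetical and chosen purely for illustration; they are not the rules actually used to build FSVQA, which are not described here.

```python
import re


def to_full_sentence(question: str, answer: str) -> str:
    """Toy converter (hypothetical rules, NOT the actual FSVQA rule set)."""
    q = question.strip().rstrip("?").strip()
    a = answer.strip()

    # Illustrative rule 1: "What color is/are X" -> "X is/are <answer>."
    m = re.match(r"what colou?r (is|are) (.+)$", q, flags=re.IGNORECASE)
    if m:
        verb, subject = m.group(1).lower(), m.group(2)
        return f"{subject[:1].upper()}{subject[1:]} {verb} {a}."

    # Illustrative rule 2: "How many X are Y" -> "There are <answer> X Y."
    m = re.match(r"how many (.+?) are (.+)$", q, flags=re.IGNORECASE)
    if m:
        noun, rest = m.group(1), m.group(2)
        return f"There are {a} {noun} {rest}."

    # Fallback: no rule matched, so return the short answer as a sentence.
    return f"{a[:1].upper()}{a[1:]}."


if __name__ == "__main__":
    print(to_full_sentence("What color is the bus?", "red"))
    print(to_full_sentence("How many dogs are in the picture?", "2"))
    print(to_full_sentence("Where is the cat?", "on the sofa"))
```

A real pipeline would need many more patterns, along with handling for tense, agreement, and yes/no questions; the sketch only shows the general shape of matching a question template and slotting the short answer into a declarative sentence.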