MMID packages

Below are links, for each language, to the full MMID image/word dataset (100 images), a smaller version of MMID with only one image per word (1 image), the metadata for all images and the webpages they appeared on, and a dictionary containing just the words we have images for in that language, along with each word's canonical MMID ID within the language. For more information, see our documentation page.

Code

To replicate the experiments in Learning Translations via Images, you’ll need the code in this GitHub repo.
It contains scripts for reading in CNN image feature files and predicting translations as described in the paper.
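
As a rough illustration of what predicting translations from image features can look like, the sketch below ranks candidate translations by comparing feature vectors with cosine similarity. It is not the code from the repo: the function names (cosine, avg_max_similarity, rank_translations) are hypothetical, the features are tiny toy vectors rather than real CNN feature files, and the average-of-maximum scoring rule is an assumption that may differ from the exact method in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def avg_max_similarity(source_images, target_images):
    """For each source image, take its best match among the target images,
    then average those best-match scores (an assumed scoring rule)."""
    if not source_images or not target_images:
        return 0.0
    best = [max(cosine(s, t) for t in target_images) for s in source_images]
    return sum(best) / len(best)


def rank_translations(source_images, candidates):
    """Return (word, score) pairs sorted from most to least similar.

    candidates maps each candidate translation to its list of image vectors.
    """
    scored = [
        (word, avg_max_similarity(source_images, images))
        for word, images in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: 3-dimensional stand-ins for real CNN features (hypothetical values).
source = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.0]]
candidates = {
    "dog": [[1.0, 0.0, 0.1], [0.8, 0.2, 0.0]],
    "car": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.3]],
}
ranking = rank_translations(source, candidates)
print(ranking[0][0])
```

With real data, the toy lists would be replaced by the feature vectors loaded for a foreign word and for each English candidate; the loading step is omitted here because the feature file layout is not described on this page.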

CNN package downloads

For each of these 30 languages, we extracted CNN features and plaintext for every word in the language. Using these, you can reproduce or improve on the translation results in our ACL paper. Warning: a download can be as large as 11 GB per language.
The metadata files relate images to their URLs.