Hypernymy, textual entailment, and image captioning can be seen as special
cases of a single visual-semantic hierarchy over words, sentences, and images.
In this paper we advocate for explicitly modeling the partial order structure
of this hierarchy. Towards this goal, we introduce a general method for
learning ordered representations, and show how it can be applied to a variety
of tasks involving images and language. We show that the resulting
representations improve performance over current approaches for hypernym
prediction and image-caption retrieval.

Captured tweets and retweets: 11

Made with a human heart + one part enriched uranium + four parts unicorn blood