Graduate Colloquium Talk: Steffen Thoma

Multi-modal Data Fusion based on Latent Representations

Event type: Graduate Colloquium

Many web pages embed structured data that can be processed and used directly. Because the web is decentralized, multiple structured data sources can provide similar information about the same entity. However, different sources may use different vocabularies, modeling choices, and even modalities, which makes integration difficult. Our approach identifies similar statements about entities across sources, independently of vocabulary, data modeling choices, and modality.
As a first step, we build on RDF label information to align claims, which already achieves better results than comparable systems that do not use label information. Next, we reconcile claims from different data sources using latent representations. Besides showing the benefits of textual latent representations alone, we investigate how the captured knowledge can be complemented by learning a shared latent representation that integrates information across three modalities: images, text, and knowledge graphs. In doing so, we leverage years of research in different domains: in Computer Vision, visual object features are learned from large image collections; in Computational Linguistics, word embeddings extracted from huge text corpora capture the distributional semantics of words; and in the Semantic Web, knowledge graph embeddings effectively capture explicit relational knowledge about individual entities.
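The label-based first step can be illustrated with a minimal Python sketch. It assumes that each source yields claims as (subject label, predicate id, object value) tuples and that every predicate carries an `rdfs:label`-style string. The stopword list, the Jaccard token overlap, the 0.5 threshold, and all ids (`a:bornOn`, `b:P1`, `b:P2`) are hypothetical choices for illustration, not the matching procedure used in this work.

```python
import re

# Hypothetical stopword list (assumption); a real system would tune this per language.
STOPWORDS = {"of", "the", "a", "an", "has", "is"}


def label_tokens(label):
    """Split a label (e.g. an rdfs:label value) into lowercase content tokens."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", label)  # "birthDate" -> "birth Date"
    return {t for t in re.findall(r"[a-z0-9]+", spaced.lower()) if t not in STOPWORDS}


def label_similarity(a, b):
    """Jaccard overlap of the content tokens of two labels (0.0 to 1.0)."""
    ta, tb = label_tokens(a), label_tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def align_claims(claims_a, claims_b, labels, threshold=0.5):
    """Pair claims from two sources that share a subject and have similar predicate labels.

    claims_a, claims_b: lists of (subject_label, predicate_id, object_value) tuples.
    labels: maps predicate ids to human-readable labels; unlabeled ids fall back to the id.
    Assumption: subjects are matched by their normalized label tokens only.
    """
    aligned = []
    for s1, p1, o1 in claims_a:
        for s2, p2, o2 in claims_b:
            if label_tokens(s1) != label_tokens(s2):
                continue  # different entities
            score = label_similarity(labels.get(p1, p1), labels.get(p2, p2))
            if score >= threshold:
                aligned.append((s1, p1, o1, p2, o2, score))
    return aligned


# Hypothetical predicate ids and labels from two sources with different vocabularies.
labels = {"a:bornOn": "birth date", "b:P1": "date of birth", "b:P2": "place of birth"}
source_a = [("Ada Lovelace", "a:bornOn", "1815-12-10")]
source_b = [("Ada Lovelace", "b:P1", "10 December 1815"), ("Ada Lovelace", "b:P2", "London")]

for match in align_claims(source_a, source_b, labels):
    print(match)
# -> ('Ada Lovelace', 'a:bornOn', '1815-12-10', 'b:P1', '10 December 1815', 1.0)
```

The two birth-date claims are paired even though their vocabularies differ, while "place of birth" falls below the threshold. Their object values still disagree in format ("1815-12-10" vs. "10 December 1815"), which is the kind of mismatch that the latent-representation step is meant to address.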

Our hypothesis is that fusing the representations yields a more holistic representation for identifying similarities, since the modalities cover different aspects of an entity; for example, visual attributes capture shape and color information that is not easily covered by the other modalities. However, this fusion is limited to concepts with cross-modal alignments in the training data, which exist for only a few concepts. Since alignments across modalities are rare and expensive to create, we finally investigate an extrapolation approach that translates representations of entities outside the training corpus into the shared representation space.
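The fusion and extrapolation ideas can be sketched in plain Python under strong simplifying assumptions: the shared space is taken to be the concatenation of L2-normalized text, image, and knowledge graph vectors, and extrapolation is a ridge-regression linear map from the text space into that shared space, fitted on the few fully aligned entities. Both choices, the helper names (`fuse`, `fit_linear_map`, `project`), and the toy vectors are hypothetical; the fusion and translation models used in this work may differ.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def fuse(text_vec, image_vec, kg_vec):
    """Assumed shared representation: concatenation of L2-normalized modality vectors."""
    return l2_normalize(text_vec) + l2_normalize(image_vec) + l2_normalize(kg_vec)


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def solve(a, b):
    """Solve A X = B for X with Gauss-Jordan elimination and partial pivoting."""
    n = len(a)
    aug = [ra[:] + rb[:] for ra, rb in zip(a, b)]
    for c in range(n):
        pivot_row = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot_row] = aug[pivot_row], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def fit_linear_map(x_rows, y_rows, lam=1e-3):
    """Ridge regression W = (X^T X + lam*I)^(-1) X^T Y, so that x @ W approximates y."""
    xt = transpose(x_rows)
    gram = matmul(xt, x_rows)
    for i in range(len(gram)):
        gram[i][i] += lam
    return solve(gram, matmul(xt, y_rows))


def project(x, w):
    """Translate a single-modality (here: text) vector into the shared space."""
    return matmul([x], w)[0]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Made-up (text, image, KG) vectors for the few entities aligned across all modalities.
aligned = {
    "cat": ([1.0, 0.2, 0.0], [1.0, 0.0], [1.0, 0.0]),
    "dog": ([0.2, 1.0, 0.0], [1.0, 0.0], [1.0, 0.0]),
    "car": ([0.0, 0.0, 1.0], [0.0, 1.0], [0.0, 1.0]),
}
shared = {name: fuse(*vecs) for name, vecs in aligned.items()}
w = fit_linear_map([vecs[0] for vecs in aligned.values()], list(shared.values()))

# A hypothetical entity outside the aligned training set: it has a text vector only.
wolf = project([0.6, 0.6, 0.0], w)
ranking = sorted(shared, key=lambda name: cosine(wolf, shared[name]), reverse=True)
print(ranking)
```

Because `project` is linear, the text-only entity lands between the shared vectors of the aligned entities whose text embeddings it resembles, so it can be compared with them in the fused space. In practice the map would be fitted on pretrained embeddings, and a nonlinear model could replace the linear one.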