In molecular and cell biology, most of the data presented in published papers are not available in accessible formats that would allow for analysis and systematic mining. The goal of the SourceData project (http://sourcedata.embo.org) is to make published datasets easier to find, to connect them across papers and to promote their reuse. The main concept underlying the project is that the structure of a dataset provides information about the design of the experiment and can be exploited in powerful data-oriented search strategies. SourceData has therefore developed tools to generate machine-readable descriptive metadata from figures in published manuscripts. Experimentally tested hypotheses are represented as directed relationships between standardized biological entities, which can be connected into a searchable data-oriented ‘knowledge graph’. The resulting SourceData search platform and SmartFigure viewer allow users to find papers based on their data content. By coupling data availability to improved discoverability of published papers, SourceData aims at creating incentives for scientists to share their data and opening new ways to search and browse the literature.
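To make the idea of a data-oriented knowledge graph more concrete, the following minimal Python sketch shows how tested hypotheses could be stored as directed relationships between entities and then searched. It is an illustration only, not SourceData's actual data model or API: the class names (`Entity`, `Hypothesis`, `KnowledgeGraph`), the field names, the identifiers and the panel labels are all hypothetical placeholders.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A standardized biological entity (hypothetical schema)."""
    identifier: str  # placeholder identifier, not a real database accession
    name: str
    kind: str        # e.g. "small molecule", "protein"


@dataclass(frozen=True)
class Hypothesis:
    """One tested hypothesis: a perturbed entity and the entity measured."""
    perturbed: Entity
    measured: Entity
    panel: str       # illustrative label of the figure panel holding the data


class KnowledgeGraph:
    """Directed graph whose edges point from perturbed to measured entity."""

    def __init__(self):
        self._edges = defaultdict(list)  # perturbed identifier -> hypotheses

    def add(self, hypothesis):
        self._edges[hypothesis.perturbed.identifier].append(hypothesis)

    def panels_testing(self, perturbed_id, measured_id):
        """Return the panels in which one entity's effect on another is tested."""
        return [
            h.panel
            for h in self._edges.get(perturbed_id, [])
            if h.measured.identifier == measured_id
        ]

    def find_path(self, start_id, goal_id):
        """Breadth-first search for a chain of hypotheses linking two entities.

        Returns the list of hypotheses along a shortest chain, or None if
        the two entities are not connected in this direction.
        """
        queue = deque([(start_id, [])])
        visited = {start_id}
        while queue:
            node, path = queue.popleft()
            if node == goal_id:
                return path
            for h in self._edges.get(node, []):
                nxt = h.measured.identifier
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append((nxt, path + [h]))
        return None


# Made-up example data: two panels from two different (fictional) papers.
compound_a = Entity("ID:A", "compound A", "small molecule")
protein_b = Entity("ID:B", "protein B", "protein")
protein_c = Entity("ID:C", "protein C", "protein")

graph = KnowledgeGraph()
graph.add(Hypothesis(compound_a, protein_b, panel="paper1-fig2A"))
graph.add(Hypothesis(protein_b, protein_c, panel="paper2-fig1C"))

# Direct search: which panels test the effect of compound A on protein B?
print(graph.panels_testing("ID:A", "ID:B"))

# Connecting data across papers: is there a chain from compound A to protein C?
chain = graph.find_path("ID:A", "ID:C")
print([h.panel for h in chain])
```

In this sketch, the direction of each edge encodes the experimental design (what was perturbed and what was measured), which is why a search can distinguish "A tested for its effect on B" from the reverse, and why panels from separate papers can be linked into a single chain.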