Explicit and Implicit Schema Information on the Linked Open Data Cloud: Joined Forces or Antagonists?

Schema information about resources in the Linked Open Data (LOD) cloud can be provided in a twofold way: it can be explicitly defined by attaching RDF types to the resources. Or it is provided implicitly via the definition of the resources´ properties.
In this paper, we analyze the correlation between the two sources of schema information. To this end, we have extracted schema information regarding the types and properties defined in two datasets of different size. One dataset is a LOD crawl from TimBL- FOAF profile (11 Mio. triple) and the second is an extract from the Billion Triples Challenge 2011 dataset (500 Mio. triple). We have conducted an in depth analysis and have computed various entropy measures as well as the mutual information encoded in this two manifestations of schema information.
Our analysis provides insights into the information encoded in the different schema characteristics. It shows that a schema based on either types or properties alone will capture only about 75% of the information contained in the data. From these observations, we derive conclusions about the design of future schemas for LOD.