For bachelor students, we offer lectures on database systems (taught in German) complemented by paper- or project-oriented seminars. In a one-year bachelor project, students complete their studies in cooperation with external partners. For master students, we offer courses on information integration, data profiling, search engines, and information retrieval, accompanied by specialized seminars, master projects, and supervised master's theses.

The Web Science group focuses on various topics related to the Web, such as Information Retrieval, Natural Language Processing, Data Mining, Knowledge Discovery, Social Network Analysis, Entity Linking, and Recommender Systems. The group is particularly interested in Text Mining as a means to cope with the vast amount of unstructured and semi-structured information available on the Web.

Most of our research is conducted within larger research projects, in collaboration across students, groups, and universities. We strive to make most of our datasets and source code publicly available.

Repeatability - CINDs on RDF

This is the repeatability page for CIND discovery algorithms on RDF data. The algorithms are provided in the state in which their results were published and may not reflect the most recent versions of their implementations.

Algorithms

Conditional inclusion dependencies (CINDs) within RDF datasets are a valuable input to core data management tasks, such as query optimization, ontology reverse engineering, and knowledge discovery. Most CIND discovery algorithms focus on relational databases, where they generate conditions only for the left-hand side of a partial IND. In RDF datasets, however, it is particularly important to consider conditions on both the left-hand and the right-hand side. Moreover, RDF datasets differ fundamentally from relational databases in their structure. Therefore, RDF CIND discovery algorithms have to be designed differently from their relational counterparts.
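To make the role of two-sided conditions concrete, here is a minimal Python sketch of what it means for a single CIND to hold on RDF triples. It assumes a simplified model: a triple has the attributes s, p, and o; a condition fixes one or more of these attributes to constants; and a CIND (α, φ) ⊆ (β, ψ) holds if every value of attribute α among the triples satisfying φ also appears as a value of β among the triples satisfying ψ. The names `Triple`, `values_of`, and `cind_holds` are hypothetical and are not taken from any published implementation.

```python
from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    s: str  # subject
    p: str  # predicate
    o: str  # object


def values_of(triples: Iterable[Triple], attr: str, cond: dict[str, str]) -> set[str]:
    """Collect the distinct values of `attr` over all triples matching `cond`.

    `cond` maps attribute names ("s", "p", "o") to required constants
    (a hypothetical encoding chosen for this sketch).
    """
    return {
        getattr(t, attr)
        for t in triples
        if all(getattr(t, a) == v for a, v in cond.items())
    }


def cind_holds(triples, dep_attr, dep_cond, ref_attr, ref_cond) -> bool:
    """Check the CIND (dep_attr, dep_cond) ⊆ (ref_attr, ref_cond).

    Unlike the relational case, both the dependent (left-hand) side and
    the referenced (right-hand) side carry a condition.
    """
    triples = list(triples)  # materialize so the input can be scanned twice
    return values_of(triples, dep_attr, dep_cond) <= values_of(triples, ref_attr, ref_cond)


if __name__ == "__main__":
    data = [
        Triple("ex:alice", "rdf:type", "ex:Person"),
        Triple("ex:bob", "rdf:type", "ex:Person"),
        Triple("ex:alice", "ex:name", '"Alice"'),
        Triple("ex:bob", "ex:name", '"Bob"'),
        Triple("ex:acme", "ex:name", '"ACME"'),
    ]
    # Every subject typed ex:Person also has an ex:name: prints True.
    print(cind_holds(data, "s", {"p": "rdf:type", "o": "ex:Person"}, "s", {"p": "ex:name"}))
    # Not every named subject is typed ex:Person (ex:acme is not): prints False.
    print(cind_holds(data, "s", {"p": "ex:name"}, "s", {"p": "rdf:type", "o": "ex:Person"}))
```

The sketch only pins down the semantics of one CIND; a discovery algorithm would not test candidates one by one like this.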

1 Please note that the datasets may be subject to change. The table above reflects the state of the datasets when we downloaded them in mid-2015.

2 Not a real-world dataset (generated).

Algorithmic Results

RDFind

In general, even small RDF datasets tend to contain an intractable number of CINDs, most of which provide no value for applications such as query optimization. For this reason, RDFind extracts only pertinent CINDs, i.e., CINDs that (i) comprise sufficiently many entities (= CIND support) and (ii) are not implied by any other pertinent CIND (= minimal cover). Furthermore, RDFind distinguishes a special class of CINDs, association rules (ARs), which state "if a triple has value v1 in attribute a1, then it has value v2 in attribute a2". The following numbers reflect these peculiarities of the RDFind algorithm. Details can be found in our SIGMOD 2016 paper.
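The association-rule case can be illustrated with a naive, single-machine Python sketch that scans all triples for rules of the form a1 = v1 → a2 = v2 and keeps those that hold for every triple with a1 = v1 and meet a support threshold. As a simplifying assumption, support is counted here as the number of triples matching a1 = v1, which need not coincide with RDFind's exact definition. This is not RDFind's scalable algorithm, and the names `association_rules` and `min_support` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import NamedTuple


class Triple(NamedTuple):
    s: str
    p: str
    o: str


def association_rules(triples, min_support):
    """Return exact rules (a1, v1, a2, v2, support) meaning "a1=v1 implies a2=v2".

    Support is the number of triples with a1=v1 (an assumption for this sketch).
    """
    triples = list(triples)  # scanned once per attribute pair
    rules = []
    for a1, a2 in permutations(("s", "p", "o"), 2):
        targets = defaultdict(set)  # v1 -> distinct a2 values seen with it
        counts = Counter()          # v1 -> number of triples with a1=v1
        for t in triples:
            v1 = getattr(t, a1)
            targets[v1].add(getattr(t, a2))
            counts[v1] += 1
        for v1, v2s in targets.items():
            # The rule holds only if all triples with a1=v1 share a single a2 value.
            if len(v2s) == 1 and counts[v1] >= min_support:
                rules.append((a1, v1, a2, next(iter(v2s)), counts[v1]))
    return rules


if __name__ == "__main__":
    data = [
        Triple("ex:alice", "rdf:type", "ex:Person"),
        Triple("ex:bob", "rdf:type", "ex:Person"),
        Triple("ex:alice", "ex:name", '"Alice"'),
        Triple("ex:bob", "ex:name", '"Bob"'),
    ]
    # Prints ('p', 'rdf:type', 'o', 'ex:Person', 2) and ('o', 'ex:Person', 'p', 'rdf:type', 2).
    for rule in association_rules(data, min_support=2):
        print(rule)
```

Lowering `min_support` to 1 in this example also admits rules backed by a single triple, such as "object "Alice" implies subject ex:alice", which illustrates why a support threshold is needed to keep the output manageable.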