For bachelor students, we offer German-language lectures on database systems, complemented by paper- or project-oriented seminars. In a one-year bachelor project, students complete their studies in cooperation with external partners. For master students, we offer courses on information integration, data profiling, search engines, and information retrieval, supplemented by specialized seminars, master projects, and supervised master theses.

The Web Science group focuses on various topics related to the Web, such as Information Retrieval, Natural Language Processing, Data Mining, Knowledge Discovery, Social Network Analysis, Entity Linking, and Recommender Systems. The group is particularly interested in Text Mining to deal with the vast amount of unstructured and semi-structured information available on the Web.

Most of our research is conducted in the context of larger research projects, in collaboration among students, groups, and universities. We strive to make most of our data sets and source code available.

Schedule

Antonio Sala, Information Engineering Department of the University of Modena and Reggio Emilia, Italy

15.06.2010

Extraction of Management Concepts from Web Sites for Sentiment Analysis

Arvid Heise, Master's Thesis Results

06.07.2010

Duplicate Detection Using Unstructured Portions (Duplikaterkennung unter Verwendung unstrukturierter Anteile)

David Sonnabend, Master's Thesis Results

13.07.2010

Optimizing Query Execution to Improve the Energy Efficiency of DBMS

Tobias Flach, Master's Thesis Results

13.07.2010

Wikipedia Cross-lingual Infobox Alignment and Conflict Detection

Daniel Rinser, Master's Thesis Results

04.08.2010

Towards Granular Data Placement Strategies for Cloud Platforms

Johannes Lorey, Practice Talk for Symposium on Cloud Computing and the Web at 2010 IEEE GrC

21.09.2010

Finding Unique Column Combinations within a Database

Ziawasch Abedjan, Master's Thesis Results

21.09.2010

Sensitivity of Spatiotemporal Patterns based on Integrated Data from Distributed Environmental Sensor Networks

Sören Nils Haubrock, Master's Thesis Results

Abstracts

Antonio Sala: Aggregated Search of Data and Services

From a user perspective, data and services provide complementary views of an information source: data provide detailed information about specific needs, while services execute processes that involve data and also return an informative result. For this reason, users need to perform aggregated searches that identify not only relevant data, but also services able to operate on them. At the current state of the art, such aggregated search can only be performed manually by expert users, who first identify relevant data and then identify existing relevant services.

We propose a semantic approach to perform aggregated search of data and services. In particular, we developed a technique that, on the basis of an ontological representation of the data and services related to a domain, supports the translation of a data query into a service discovery process. To evaluate our approach, we developed a prototype that extends the existing MOMIS data integration system (http://www.dbgroup.unimore.it/Momis) with a new information retrieval-based web service engine called XIRE.

Stephan Ewen: Nephele/PACTs: A Programming Model and Execution Framework for Web-Scale Analytical Processing

Data Intensive Scalable Computing is a much-investigated topic in current research. Next to parallel databases, new flavors of data processors have established themselves, most prominently the map/reduce programming and execution model. These new systems provide key features that current parallel databases lack, such as flexibility in the data models, the ability to parallelize custom functions, and fault tolerance that enables them to scale out to thousands of machines.

We present the Nephele/PACTs system, a parallel data processor centered around a programming model of so-called Parallelization Contracts (PACTs) and the scalable parallel execution engine Nephele. The PACT programming model is a generalization of the well-known map/reduce programming model, extending it with additional higher-order functions and output contracts that give guarantees about the behavior of a function. A PACT program is transformed into a data flow for Nephele, which executes its sequential building blocks in parallel and provides communication, synchronization, and fault tolerance. The PACTs are defined in such a way that this transformation can apply several types of optimizations to the data flow. The system as a whole is as generic as map/reduce systems, while overcoming several of their major weaknesses.
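
To make the idea of higher-order functions beyond map and reduce more concrete, the following is a minimal, sequential Python sketch. It is not the Nephele/PACTs API: the names `map_records`, `reduce_records`, and `match_records` are hypothetical, and the two-input `match_records` is only an assumed example of what an additional higher-order function could look like, since the abstract does not name the additional functions. Each function takes user code as an argument and decides which records that code sees together, which is the property a parallel engine could exploit.

```python
from collections import defaultdict

# Records are (key, value) tuples; user functions return lists of records.

def map_records(records, user_fn):
    """Call user_fn independently on every (key, value) record."""
    out = []
    for key, value in records:
        out.extend(user_fn(key, value))
    return out

def reduce_records(records, user_fn):
    """Group records by key and call user_fn once per group."""
    groups = defaultdict(list)
    for key, value in records:
        groups[key].append(value)
    out = []
    for key, values in groups.items():
        out.extend(user_fn(key, values))
    return out

def match_records(left, right, user_fn):
    """Hypothetical two-input function: call user_fn on every pair of
    left/right records that share the same key (an equi-join)."""
    index = defaultdict(list)
    for key, value in right:
        index[key].append(value)
    out = []
    for key, left_value in left:
        for right_value in index.get(key, []):
            out.extend(user_fn(key, left_value, right_value))
    return out

def word_count(lines):
    """Classic map/reduce word count expressed with the functions above."""
    tokens = map_records(lines, lambda _k, line: [(w, 1) for w in line.split()])
    return dict(reduce_records(tokens, lambda w, ones: [(w, sum(ones))]))
```

In this sketch, `word_count([(0, "a b a"), (1, "b c")])` counts words with map and reduce alone, while `match_records` combines two inputs by key without the user code having to implement the join itself.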

Arvid Heise: Extraction of Management Concepts from Web Sites for Sentiment Analysis

Company Web sites are vital information sources for organization theorists to summarize and assess the management concepts that are implemented by the respective companies. However, the enormous amount of data renders manual analyses virtually infeasible. We present the Management Concept Miner, an integrated tool that continuously extracts the texts from several hundred company Web sites into a relational database and that scores the management concepts of the companies. It comprises an incremental Web crawler, PDF and HTML text extractors, dictionary-based annotators for concept-related key phrases, and an automatic assessment of the annotated texts. We apply viewpoint detection and sentiment analysis techniques to the new domain of management concepts to assess the importance of the management concept to the company. The achieved results are comparable to the assessment performance of domain experts. Additionally, the evaluation shows that the Management Concept Miner crawls the huge amount of data resource-efficiently and extracts the texts reliably.

The Management Concept Miner demonstrates that the data of Web sites can be exploited to summarize the management concepts of a company. Moreover, the presented design can be applied to further domains beyond management concepts. Since the amount of publicly available opinionated data is steadily increasing, the combination of information extraction techniques with viewpoint detection has a huge potential.
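
As an illustration of the dictionary-based annotation step mentioned in the abstract, here is a minimal Python sketch. It is not the Management Concept Miner's implementation: the function name `annotate`, the sample dictionary, and the choice of case-insensitive whole-phrase matching are all assumptions made for this example.

```python
import re

def annotate(text, dictionary):
    """Return (start, end, matched_text, concept) for every occurrence
    of a dictionary key phrase in text, ordered by position.

    dictionary maps a key phrase to the concept it indicates
    (a hypothetical structure chosen for this sketch).
    """
    annotations = []
    for phrase, concept in dictionary.items():
        # Match the phrase as whole words, ignoring case.
        pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(text):
            annotations.append(
                (match.start(), match.end(), match.group(0), concept)
            )
    return sorted(annotations)

# Hypothetical key phrases and concepts, for illustration only.
SAMPLE_DICTIONARY = {
    "lean management": "Lean",
    "six sigma": "Six Sigma",
}
```

The character offsets returned for each annotation would allow a later step, such as sentiment analysis, to inspect the text surrounding each key phrase.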