Many of the networks that one can identify in connection with human societies show a hierarchical structure [1,2]. For example, scientific terms can naturally be organized into hierarchies according to disciplines and their sub-domains. Reconstructing these hierarchies from the underlying data and understanding their properties can lead to new insights, and the hierarchical structures themselves have a number of applications. For example, image classification can give more robust results when the classification tasks are related to large-scale semantic taxonomies [3]. Understanding the properties of these hierarchical structures is also of high interest [4].

Hierarchy reconstruction is a challenging problem. While tree structures are widely used, in several application domains hierarchies are better represented as directed acyclic graphs (which correspond to overlapping implicit structures) than as trees. In particular, if we wish to understand the organizational properties, restricting the structure to a tree yields only approximate observations.

The candidate should develop techniques that can reconstruct large-scale hierarchical structures. The thesis will focus on some of the following aspects.

Analyze the application-specific requirements for hierarchical structures (i.e., whether one should consider trees or some other hierarchical structure, potentially with additional constraints) and define specific quality metrics.

Design appropriate data structures and methods to store hierarchical structures. We need a data structure that supports querying specific parts of a large-scale hierarchy and also makes a zoomable visualization possible.
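
To make the two requirements above concrete, the following is a minimal sketch of an in-memory hierarchy in which a node may have several parents, together with a depth-limited query that a zoomable view could call. The class name `HierarchyDAG`, its methods, and the example terms are hypothetical and chosen only for illustration; a large-scale solution would likely need an indexed or disk-backed store rather than Python dictionaries.

```python
from collections import defaultdict, deque


class HierarchyDAG:
    """Toy hierarchy stored as a directed acyclic graph (illustrative only)."""

    def __init__(self):
        self.children = defaultdict(set)  # node -> direct children
        self.parents = defaultdict(set)   # node -> direct parents

    def add_edge(self, parent, child):
        # Reject an edge that would close a cycle: that is the case when
        # `parent` is already reachable from `child`.
        if parent == child or parent in self.descendants(child):
            raise ValueError(f"edge {parent!r} -> {child!r} would create a cycle")
        self.children[parent].add(child)
        self.parents[child].add(parent)

    def descendants(self, node, max_depth=None):
        """Return {descendant: depth below node}, found breadth-first.

        With max_depth set, only the first max_depth levels are returned,
        which is the kind of partial query a zoomable view might issue.
        """
        found = {}
        queue = deque([(node, 0)])
        while queue:
            current, depth = queue.popleft()
            if max_depth is not None and depth >= max_depth:
                continue
            for child in self.children.get(current, ()):
                if child not in found:
                    found[child] = depth + 1
                    queue.append((child, depth + 1))
        return found


# Hypothetical example: "bioinformatics" has two parents, so the
# hierarchy is a DAG and not a tree.
dag = HierarchyDAG()
dag.add_edge("science", "computer science")
dag.add_edge("science", "biology")
dag.add_edge("computer science", "bioinformatics")
dag.add_edge("biology", "bioinformatics")
dag.add_edge("bioinformatics", "sequence alignment")

zoomed_out = dag.descendants("science", max_depth=1)
full_view = dag.descendants("science")
```

Here `zoomed_out` contains only the two top-level disciplines, while `full_view` also reaches the deeper terms. Keeping both the `children` and `parents` maps costs extra memory but lets a query move in either direction from a node.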

Embeddings in hyperbolic space [5,6] have been used successfully for analyzing hierarchical structures. While hyperbolic embeddings perform well in general settings, they might need task-specific adaptations.
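
As an illustration of why hyperbolic space suits hierarchies, the sketch below computes the distance between two points in the Poincaré ball model. The choice of this model is an assumption made for the example; the approaches in [5,6] may rely on a different model of hyperbolic space. The function name `poincare_distance` is hypothetical.

```python
import math


def poincare_distance(u, v):
    """Distance between two points strictly inside the unit ball,
    under the Poincare ball model (an assumed model, for illustration).

    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    """
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


# Two pairs of points with the same Euclidean gap of 0.1:
near_origin = poincare_distance((0.0, 0.0), (0.1, 0.0))
near_boundary = poincare_distance((0.8, 0.0), (0.9, 0.0))
```

The pair close to the boundary is several times farther apart than the pair close to the origin, although both pairs have the same Euclidean gap. Distances grow rapidly toward the boundary, so there is room for the exponentially growing number of nodes at deeper levels of a hierarchy, with general concepts typically placed near the origin.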

Recent breakthroughs in NLP research on word embeddings could offer useful tools for our work. In particular, ELMo [7] and BERT [8] can offer substantially improved, context-dependent embeddings that also mitigate polysemy-related problems. Besides being a direct tool, they also offer a methodological approach for constructing hierarchies.

Our goal is to reconstruct the hierarchical structure of scientific domains by exploiting a collection of scientific articles. This work is complementary to the ANR EPIQUE project, in which our group collaborates with experts in social sciences (philosophy of science) and in complex systems.

We would also like to reconstruct hierarchical structures in other domains. In particular, our results show that hierarchical skill models can be exploited to improve the quality of task assignment in crowdsourcing [9], but constructing such skill hierarchies remains challenging.