Summary

The seminar broadly dealt with machine learning, the area of computer science concerned with developing computational methods that use data to make accurate predictions. Classical machine learning theory is built on the assumption of independent and identically distributed random variables.
In practical applications, however, this assumption is often violated, for instance, when training and test data come from different
distributions (dataset bias or domain shift) or when the data exhibits temporal or spatial correlations. In general, there are three major reasons why the assumption of independent and identically distributed data can be violated:

(a) The draw of a data point influences the outcome of a subsequent draw (inter-dependencies).

(b) The distribution changes at some point (non-stationarity).

(c) The data is not generated by a distribution at all (adversarial).

The seminar focused on scenarios (a) and (b). This general research direction comprises several subfields of machine learning: transfer and multi-task learning, learning with inter-dependent data, and two application fields, namely visual recognition and computational biology. Both are not only among the main application areas for machine learning algorithms in general; their recognition tasks are also often characterized by multiple related learning problems that require transfer and multi-task learning approaches. For example, in visual recognition, object categories are often visually related or hierarchically organized, and tasks in computational biology are often characterized by different but related organisms and phenotypes. The problems and techniques discussed during the seminar are also important for other, more general application areas, such as scientific data analysis or data-oriented decision making.

Results of the Seminar and Topics Discussed

In the following, we introduce the research fields related to the seminar topic and give a short list of the corresponding research questions discussed at the seminar. In contrast to workshops and seminars associated with larger conferences, the aim of the Dagstuhl seminar was to reflect on open issues in each of the individual research areas.

Foundations of Transfer Learning

Transfer learning (TL) [2, 18] refers to the problem of retaining and applying the knowledge available for one or more source tasks in order to efficiently develop a hypothesis for a new target task. The tasks may share a common label set (domain adaptation [25, 10]) or have different label sets (across-category transfer). Most of the effort has been devoted to binary classification [23], while interesting practical transfer problems are often intrinsically multi-class, and the number of classes can increase over time [17, 22]. Accordingly, the following research questions arise:

How can knowledge transfer across multi-class tasks be formalized, and which theoretical guarantees can be provided for this setting?

Can learning guarantees be provided when the adaptation relies only on pre-trained source hypotheses, without explicit access to the source samples, as is often the case in real-world scenarios?
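The second question can be made concrete with a small sketch. One simple way to adapt using only a pre-trained source hypothesis is ridge regression biased toward the source weights (a hypothesis-transfer-style formulation; the function name, data, and the choice of squared loss are illustrative assumptions, not a method prescribed by the seminar):

```python
import numpy as np

def biased_ridge(X, y, w_src, lam=1.0):
    """Ridge regression biased toward a pre-trained source hypothesis.

    Solves  min_w ||X w - y||^2 + lam * ||w - w_src||^2,
    which requires no source samples, only the source weights w_src.
    Closed form: w = (X^T X + lam I)^{-1} (X^T y + lam w_src).
    """
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ y + lam * w_src
    return np.linalg.solve(A, b)

# Toy target task with very few labeled examples.
rng = np.random.default_rng(0)
w_src = np.array([1.0, -1.0, 0.5])        # weights of a related source task
X = rng.normal(size=(5, 3))               # only 5 target samples
y = X @ np.array([1.2, -0.8, 0.4])        # true target weights, close to w_src
w = biased_ridge(X, y, w_src, lam=10.0)
```

As lam grows, the solution is pulled toward the source hypothesis; as lam shrinks, it reduces to ordinary least squares on the scarce target data. The open theoretical question above asks when and how strongly such a bias provably helps.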

Foundations of Multi-task Learning

Learning over multiple related tasks can outperform learning each task in isolation. This is the principal assertion of multi-task learning (MTL) [3, 7, 1] and implies that the learning process may benefit from common information shared across the tasks. In the simplest case, the transfer process is symmetric, and all tasks are considered equally related and appropriate for joint training. Open questions in this area are:

What happens when the condition of equally related tasks does not hold, e.g., how to avoid negative transfer?

Can non-parametric statistics [27] be adequately integrated into the learning process to estimate and compare the distributions underlying the multiple tasks, in order to learn a task similarity measure?

Can recent semi-automatic methods, like deep learning [9] or multiple kernel learning [13, 12, 11, 4], help to get a step closer to the complete automation of multi-task learning, e.g., by learning the task similarity measure?

How can insights and views of researchers be shared across domains (e.g., regarding the notion of source-task selection in reinforcement learning)?
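The "simplest case" of symmetric, equally related tasks can be sketched in a few lines: each task's weight vector is regularized toward the mean across tasks, so all tasks are pulled toward a shared average hypothesis (a minimal mean-regularized MTL sketch; function name, data, and hyperparameters are illustrative assumptions):

```python
import numpy as np

def mean_regularized_mtl(tasks, lam=1.0, mu=1.0, n_iter=20):
    """Mean-regularized multi-task least squares (minimal sketch).

    Each task t solves
        min_w ||X_t w - y_t||^2 + lam ||w||^2 + mu ||w - w_bar||^2,
    where w_bar is the mean weight vector over all tasks. Solved by
    alternating per-task closed-form updates with recomputing w_bar.
    """
    d = tasks[0][0].shape[1]
    W = np.zeros((len(tasks), d))
    for _ in range(n_iter):
        w_bar = W.mean(axis=0)
        for t, (X, y) in enumerate(tasks):
            A = X.T @ X + (lam + mu) * np.eye(d)
            b = X.T @ y + mu * w_bar
            W[t] = np.linalg.solve(A, b)
    return W

# Three related toy tasks: small perturbations of a common weight vector.
rng = np.random.default_rng(1)
w_true = np.array([1.0, -0.5])
tasks = []
for _ in range(3):
    X = rng.normal(size=(10, 2))
    tasks.append((X, X @ (w_true + 0.1 * rng.normal(size=2))))
W = mean_regularized_mtl(tasks, lam=0.1, mu=1.0)
```

The open questions above start exactly where this sketch ends: the coupling strength mu treats all task pairs identically, and choosing it (or replacing the uniform coupling with a learned task similarity measure) is what negative transfer avoidance is about.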

Foundations of Learning with Inter-dependent Data

Dependent data arises whenever there are inherent correlations between observations. This is to be expected, for example, for time series, where instances with similar time stamps intuitively have stronger dependencies than instances that are far apart in time. Another domain where dependent data occurs is spatially indexed sequences, such as windows taken from DNA sequences. Most of the body of work on machine learning theory addresses learning with i.i.d. data. Even the few analyses (e.g., [28]) that allow for "slight" violations of the assumption (mixing processes) analyze the same algorithms as in the i.i.d. case, while novel algorithms are clearly needed to most effectively adapt to rich dependency structures in the data.
The following aspects have been discussed during the seminar:

Can we develop algorithms that exploit rich dependency structures in the data?

Do such algorithms enjoy theoretical generalization guarantees?

Can such algorithms be phrased in a general framework in order to jointly analyze them?

How can we appropriately measure the degree of inter-dependencies (theoretically) such that it can be also empirically estimated from data (overcoming the so-called mixing assumption)?

Can theoretical bounds be obtained for more practical dependency measures than mixing?
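One place where the i.i.d. assumption bites in practice is evaluation itself: random cross-validation splits leak information when neighboring time steps are correlated. A minimal sketch of contiguous-block splits for temporally dependent data (block count and sizes are illustrative choices, not a method from the seminar):

```python
import numpy as np

def blocked_splits(n, n_blocks):
    """Contiguous-block cross-validation for temporally dependent data.

    Random splits place correlated neighbors in both train and test sets,
    inflating performance estimates; holding out whole contiguous blocks
    keeps most test points far (in time) from the training data.
    Yields one (train_indices, test_indices) pair per block.
    """
    idx = np.arange(n)
    for block in np.array_split(idx, n_blocks):
        yield np.setdiff1d(idx, block), block

splits = list(blocked_splits(n=12, n_blocks=3))
```

This only mitigates leakage at block boundaries; quantifying the residual dependency (the "degree of inter-dependencies" asked about above) is precisely what remains open.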

Visual Transfer and Adaptation

Visual recognition tasks are one of the main applications for knowledge transfer and adaptation techniques. For instance, transfer learning can be put to good use in the presence of visual categories with only a few labeled examples, while across-category transfer can exploit training data available for related categories to improve recognition performance [14, 21, 20, 22]. Multi-task learning can be applied to learn multiple object detectors [30] or binary image classifiers [19] jointly, which is beneficial because visual features can be shared among categories and tasks. Another important topic is domain adaptation, which is very effective in object recognition applications [24], where the image distribution used for training (source domain) differs from the image distribution encountered during testing (target domain). This distribution shift is typically caused by a data collection bias. Sophisticated methods are needed, as the visual domains can in general differ in a combination of (often unknown) factors, including scene, object location and pose, viewing angle, resolution, motion blur, scene illumination, background clutter, camera characteristics, etc. Recent studies have demonstrated a significant degradation in the performance of state-of-the-art image classifiers under domain shift caused by pose changes [8], by a shift from commercial to consumer video [5, 6, 10], and, more generally, by training datasets biased by the way in which they were collected [29].

The following open questions have been discussed during the seminar:

Which types of representations are suitable for transfer learning?

How can we extend and update representations to avoid negative transfer?

Are current adaptation and transfer learning methods efficient enough to allow for large-scale continuous visual learning and recognition?

How can we exploit huge amounts of unlabeled data with certain dependencies to minimize supervision during learning and adaptation?
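The last question touches unsupervised adaptation: when target labels are unavailable, one simple family of methods aligns feature statistics across domains. A minimal sketch in the spirit of correlation-alignment approaches (the function name, synthetic data, and the choice to match only second-order statistics are illustrative assumptions):

```python
import numpy as np

def align_second_order(Xs, Xt, eps=1e-5):
    """Unsupervised domain adaptation by aligning second-order statistics.

    Whitens the source features and re-colors them with the target
    covariance, so source and target feature distributions match in mean
    and covariance. No labels are used; eps regularizes the covariances.
    """
    def cov_power(X, power):
        C = np.cov(X, rowvar=False) + eps * np.eye(X.shape[1])
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** power) @ vecs.T

    Xs_centered = Xs - Xs.mean(axis=0)
    # whiten with source covariance^(-1/2), re-color with target covariance^(1/2)
    return Xs_centered @ cov_power(Xs, -0.5) @ cov_power(Xt, 0.5) + Xt.mean(axis=0)

# Synthetic source and (shifted) target feature matrices.
rng = np.random.default_rng(2)
Xs = rng.normal(size=(200, 3)) * np.array([1.0, 2.0, 0.5])
Xt = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))
Xs_aligned = align_second_order(Xs, Xt)
```

A classifier trained on the aligned source features then sees data whose first two moments match the target domain; whether such cheap alignment scales to continuous, large-scale visual learning is one of the questions raised above.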

Application Scenarios in Computational Biology

Non-i.i.d. data arises in biology, e.g., when transferring information from one organism to another or when learning from multiple organisms simultaneously [31]. Dependent data also occurs when extracting local features from genomic DNA by running a sliding window over a DNA sequence, which is a common approach to detect transcription start sites (TSS) [26]: windows close by on the DNA strand – or even overlapping – show stronger dependencies than those far away. Another application scenario comes from statistical genetics. Many efforts in recent years have focused on models that correct for population structure [16], which can arise from inter-dependencies in the population under investigation. Correcting for such rich dependency structures is also a challenge in prediction problems in machine learning [15]. The seminar brought together ideas from the fields of machine learning, statistical genetics, Bayesian probabilistic modeling, and frequentist statistics. In particular, we discussed the following open research questions:

How can we empirically measure the degree of inter-dependencies, e.g., from a kinship matrix of patients?

Do theoretical guarantees of algorithms (see above) break down for realistic values of “the degree of dependency”?

What are effective prediction and learning algorithms correcting for population structure and inter-dependencies in general and can they be phrased in a general framework?

What are adequate benchmarks to evaluate learning with non-i.i.d. data?

How can information be transferred between organisms, taking into account the varying noise level and experimental conditions from which data are derived?

How can non-stationarity be exploited in biological applications?

What are promising applications of non-i.i.d. learning in the domains of bioinformatics and personalized medicine?
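The sliding-window construction mentioned above makes the dependency structure tangible: overlapping windows share most of their nucleotides, so the extracted examples are anything but independent. A minimal sketch (sequence, window size, and step are illustrative values):

```python
def sliding_windows(seq, size, step):
    """Extract fixed-size windows from a DNA sequence.

    With step < size, consecutive windows overlap in (size - step)
    positions, so the resulting examples are strongly inter-dependent
    and violate the i.i.d. assumption underlying standard learning theory.
    """
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]

windows = sliding_windows("ACGTACGT", size=4, step=2)
# each window shares its last 2 nucleotides with the next window
```

In a TSS detection pipeline such windows become the training examples, which is why the questions above about measuring dependency degrees and about benchmarks for non-i.i.d. learning matter directly for this application.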

Conclusion

The idea of the seminar, bringing together people from theory, algorithms, computer vision, and computational biology, proved very successful: many discussions and joint research questions came up that had not been anticipated at the beginning. These were not limited to non-i.i.d. learning and also touched ubiquitous topics such as learning with deeper architectures. All participants agreed that the seminar should be the beginning of an ongoing series of longer Dagstuhl seminars focused on non-i.i.d. learning.
