Summary

Large-scale data-intensive computing, commonly referred to as "Big Data", has been influenced by and can further benefit from programming language ideas. The MapReduce programming model is an example of an idea from functional programming that has directly influenced the way distributed big data applications are written. As the volume of data has grown to require distributed processing, potentially on heterogeneous hardware, there is a need for effective programming models, compilation techniques or static analyses, and specialized language runtimes. The motivation for this seminar has been to bring together researchers working on foundational and applied research in programming languages as well as in data-intensive computing and databases, in order to identify research problems and opportunities for improving data-intensive computing.

To this end, on the database side, the seminar included participants who work on databases, query languages and relational calculi, query compilation, execution engines, distributed processing systems and networks, and foundations of databases. On the programming languages side, the seminar included participants who work on language design, integrated query languages and meta-programming, compilation, and semantics. There was a mix of applied and foundational talks, and the participants included people from universities as well as industrial labs and incubation projects.

The work that was presented can be grouped into the following broad categories:

Programming models and domain-specific programming abstractions (Cheney, Alexandrov, Vitek, Ulrich). How can data processing and query languages be integrated into general-purpose languages, in type-safe ways and in ways that enable traditional optimizations and compilation techniques from database research? How can functional programming ideas such as monads and comprehensions improve the programmability of big data systems? What are some language design issues for data-intensive computations for statistics?

Interactive and live programming (Green, Vaz Salles, Stevenson, Binnig, Suciu). What are some challenges and techniques for interactive applications? How can the live programming experience of data scientists be improved? How can data management and analytics be offered as cloud services?

Big data in/for science (Teubner, Stoyanovich, Ré). What challenges arise in particle physics due to the volume of generated data? How can we use data to speed up the discovery and engineering of new materials? How can big data systems be used for scientific extraction and integration from many different data sources?

The seminar schedule involved three days of scheduled talks, followed
by two days of free-form discussions, demos, and working groups. This
report collects the abstracts of talks and demos, summaries of the
group discussion sessions, and a list of outcomes resulting from the
seminar.