Big data processing has gained much attention in recent years, raising the need for stream processing solutions. At the same time, there has been substantial research and development of methods and technologies with high potential for further progress in stream processing. This includes (i) flexible stream processing infrastructures that go beyond MapReduce, such as Apache Storm and Apache Spark, (ii) generic and application-specific stream processing algorithms, (iii) reconfigurable hardware that is becoming more accessible to programmers, along with initial experience of using such components as coprocessors for stream processing, and (iv) scalable algorithms for social web analysis and new approaches to data visualization.
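At their core, such infrastructures generalize simple per-record computations over unbounded streams. As a rough, engine-agnostic illustration (plain Python with invented names, not the API of any particular framework), a sliding-window frequency count over a stream might look like:

```python
from collections import Counter, deque

class WindowedCounter:
    """Count item frequencies over the last `window` stream records."""
    def __init__(self, window):
        self.window = window
        self.buffer = deque()    # recent records, oldest first
        self.counts = Counter()  # current frequencies in the window

    def push(self, item):
        self.buffer.append(item)
        self.counts[item] += 1
        if len(self.buffer) > self.window:  # evict the oldest record
            old = self.buffer.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, k=3):
        return self.counts.most_common(k)

# Feed a small stream: the window "forgets" the earliest records.
wc = WindowedCounter(window=4)
for word in ["a", "b", "a", "c", "b", "b"]:
    wc.push(word)
print(wc.top(1))  # [('b', 2)]
```

Engines such as Storm and Spark provide distributed, fault-tolerant versions of exactly this kind of stateful windowed operator.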

In today’s urbanizing world, cities have not only physical infrastructures, such as road networks, utilities, or buildings, but also a knowledge infrastructure ranging from low-level sensor networks to public databases and social media streams. The data emerging from all those sources is a precious resource for making cities more intelligent, innovative and integrated beyond the boundaries of isolated applications. Although such “big data” has been popularised in the media as the “new currency”, fuelling a future vision of contextual systems that will transform our cities, the reality is that we have only just begun to recognise the significant research challenges that must be addressed to realise this vision, across a spectrum of topics: information retrieval, knowledge representation, semantic reasoning, data mining and many others. In fact, if we cannot find ways to harvest city data and transform it into tangible insights, our vision of offering innovative solutions to the real-world problems of cities will not go beyond an expressed wish.

Recently, a vast and rapidly increasing number of scientific, corporate, government and crowd-sourced knowledge bases have been published and curated independently on the Data Web in various data formats, such as Linked Open Data (LOD) and Data APIs. This great abundance of open Big Data sources and the need for Web-scale knowledge management support are essentially transforming the Web into a vibrant information ecosystem. Its main characteristics are its dynamicity and its decentralized evolution, which spans multiple interrelated sources and data hubs. Traditional closed-world settings assume that data management and evolution are performed within a well-defined, controlled environment where change operations and dependencies on data can be easily monitored and handled. Web and digital preservation techniques, on the other hand, assume that preservation subjects, such as web pages, are plain digital assets that are collected (usually via a crawling mechanism) and individually archived for future reference. In contrast with both approaches, the Data Web requires revisiting and adjusting traditional closed-world data management techniques (temporal management and change detection, data archiving and preservation, data ingestion, integration and enrichment, data provenance and quality, data visualization and exploratory analysis) to the characteristics of multi-curated knowledge bases.
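Change detection over evolving sources can be illustrated at its most basic level: given two snapshots of a knowledge base, each a set of triples, the change set is a pair of additions and deletions. The sketch below is a deliberately minimal illustration (the triples and prefixes are invented); real Data Web archiving must additionally handle provenance, blank nodes and schema-level changes:

```python
def diff_snapshots(old, new):
    """Change set between two triple snapshots: (added, removed)."""
    return sorted(new - old), sorted(old - new)

# Two invented snapshots of the same toy knowledge base.
v1 = {("city:Athens", "hasMayor", "person:A"),
      ("city:Athens", "population", "664046")}
v2 = {("city:Athens", "hasMayor", "person:B"),
      ("city:Athens", "population", "664046")}

added, removed = diff_snapshots(v1, v2)
print(added)    # [('city:Athens', 'hasMayor', 'person:B')]
print(removed)  # [('city:Athens', 'hasMayor', 'person:A')]
```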

The 5th workshop on Energy Data Management aims to spark discussion within the academic database community and to bridge the gap between practitioners from the energy domain and database experts. The workshop addresses PhD students looking for an interesting application domain, industry representatives from both the energy domain and database-related industry, and database experts, in order to gather their comments on the presented use cases and on the methods and techniques discussed for coping with the data management challenges of the energy domain.

The growing scale and importance of graph data in several database application areas has recently driven much research effort towards the development of data models and technologies for graph-data management. Life science databases, social networks, Semantic Web data, bibliographical networks, knowledge bases and ontologies are prominent examples of application domains whose data is naturally represented in graph form. Datasets in these domains are often characterized by heterogeneity, complexity and sheer size, which make querying a genuinely challenging task.
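To make the querying challenge concrete, even the simplest navigational query over a small heterogeneous graph already requires a traversal rather than a flat lookup. In the sketch below (plain Python dictionaries; the node types and identifiers are invented for illustration), we find all nodes of a given type reachable from a start node:

```python
from collections import deque

# Toy heterogeneous graph: nodes carry a type label, edges are directed.
nodes = {"p1": "Person", "p2": "Person", "a1": "Article", "a2": "Article"}
edges = {"p1": ["a1", "p2"], "p2": ["a2"], "a1": [], "a2": []}

def reachable_of_type(start, node_type):
    """Return all nodes of `node_type` reachable from `start` (BFS)."""
    seen, queue, hits = {start}, deque([start]), []
    while queue:
        n = queue.popleft()
        if nodes[n] == node_type and n != start:
            hits.append(n)
        for m in edges[n]:
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return sorted(hits)

print(reachable_of_type("p1", "Article"))  # ['a1', 'a2']
```

Graph query languages and engines exist precisely to express and optimize such traversals declaratively over data far too large for in-memory dictionaries.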

The sixth edition of the International Workshop on Linked Web Data Management (LWDM) maintains the goal of the previous editions: to stimulate participants to discuss data management issues related to Linked Data and its relationships with other Semantic Web technologies, while at the same time offering a glance at new issues.

The field of data analytics comprises techniques, algorithms and tools for inspecting data collections in order to extract patterns, generalizations and other useful information. Big data analytics has become a necessity in the majority of industries, enabling engineers, domain experts and scientists alike to tap the potential of vast amounts of data that are critical for business and science. The success and effectiveness of such analysis depend on numerous challenges related to the data itself, the nature of the analytics tasks, and the computing environment over which the analysis is performed. These issues have given rise to many diverse programming models, execution engines and data stores for large-scale data management. While all these systems have had great success, each demonstrates its advantages only on a limited subset of applications and data types: graph-processing engines, for instance, limit the freedom of the computation at each node (or part of a graph) and fail to fully exploit possible parallelism. In addition, modern analytics workflows are tremendously complex: data sources are heterogeneous and distributed; tasks may be long- or short-running and entail different execution details depending on the user's role and expertise; tasks may range from simple or complex data operations and queries to algorithmic processing, such as data mining, text retrieval or data annotation; and the analysis may require multiple query engines. To harvest the benefits of this plethora of data and compute engines, programming models, libraries and tools, we need coordinated, adaptive and integrative efforts to tap their collective potential. This central goal is the focus of this workshop.
These efforts include the definition of versatile programming models, engine performance modeling and monitoring, extended planning and optimization algorithms, deployment and execution on multiple engines, and workflow management and visualization techniques for complex analytics queries over large, heterogeneous, irregular or unstructured data across diverse compute environments.
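One small ingredient of such multi-engine planning can be sketched as a dispatcher that routes each workflow task to an engine supporting its operator. The engine names, operator sets and first-fit policy below are illustrative assumptions, not the design of any specific system; a real planner would also weigh cost models, data placement and inter-engine transfer:

```python
# Toy capability catalog: which operators each (invented) engine supports.
ENGINES = {
    "relational": {"ops": {"filter", "join", "aggregate"}},
    "graph":      {"ops": {"pagerank", "shortest_path"}},
    "ml":         {"ops": {"train", "predict"}},
}

def plan(workflow):
    """Assign every (task, op) pair to the first engine supporting its op."""
    assignment = {}
    for task, op in workflow:
        for engine, spec in ENGINES.items():
            if op in spec["ops"]:
                assignment[task] = engine
                break
        else:
            raise ValueError(f"no engine supports {op!r}")
    return assignment

wf = [("t1", "filter"), ("t2", "pagerank"), ("t3", "train")]
print(plan(wf))  # {'t1': 'relational', 't2': 'graph', 't3': 'ml'}
```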

Organizations collect vast amounts of information on individuals, and at the same time they have access to ever-increasing levels of computational power. Although this conjunction of information and power provides great benefits to society, it also threatens individual privacy. As a result, legislators in many countries try to regulate the use and disclosure of confidential information, and various privacy regulations (such as the USA Health Insurance Portability and Accountability Act, the Canadian Standards Association’s Model Code for the Protection of Personal Information, and the Australian Privacy Amendment Act) have been enacted all over the world. Data privacy and the protection of individuals’ anonymity have become a mainstream avenue for research. While privacy is a topic discussed everywhere, data anonymity has recently established itself as an emerging area of computer science. Its goal is to produce useful computational solutions for releasing data while providing scientific guarantees that the identities and other sensitive information of the individuals who are the subjects of the data are protected.
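A standard example of such a guarantee, offered here as an illustration rather than as the focus of this overview, is k-anonymity: every record in a released table must be indistinguishable from at least k-1 others on its quasi-identifying attributes. A minimal checker (the toy table and its generalized values are invented):

```python
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    """True iff every combination of quasi-identifier values
    appears in at least k records of the table."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in groups.values())

# Invented table with generalized zip codes and age ranges.
table = [
    {"zip": "130**", "age": "<30", "disease": "flu"},
    {"zip": "130**", "age": "<30", "disease": "cold"},
    {"zip": "148**", "age": "30+", "disease": "flu"},
    {"zip": "148**", "age": "30+", "disease": "asthma"},
]
print(is_k_anonymous(table, ["zip", "age"], k=2))  # True
print(is_k_anonymous(table, ["zip", "age"], k=3))  # False
```

Checking a guarantee is the easy part; the research challenge lies in generalizing or perturbing the data so that the guarantee holds while the release stays useful.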