Summary

Computing hardware is undergoing radical changes. Forced by physical
limitations (mainly heat dissipation problems), systems trend
toward massively parallel and heterogeneous designs. New
technologies, e.g., for high-speed networking or
persistent storage, emerge and open up new opportunities for the
design of database systems. This push by technology was the main
motivation to bring top researchers from different
communities - particularly hardware and software - together at a
Dagstuhl seminar and have them discuss "Databases on Future
Hardware." This report briefly summarizes the discussions that took
place during the seminar.

With regard to this technology push, the seminar attendees considered
bandwidth, memory and storage technologies, and accelerators (or other
forms of specialized computing functionality or instruction sets) to be
the most pressing topic areas in database design.

But it turned out that the field is also influenced by a strong push
from the economy and the market. New types of applications - in
particular Machine Learning - as well as the emergence of "compute" as
an independent type of resource - e.g., in the form of
cloud computing or appliances - can have a strong impact
on the viability of a given system design.

Bandwidth, Memory, and Storage Technologies

During the seminar, probably the most frequently stated issue in the field
was bandwidth - at various places in the overall system stack,
such as between CPU and memory, between machines (network), or in the
access to secondary storage (e.g., disk, SSD, NVM). Interestingly, the
issue was not only brought up as a key limitation to database
performance by the seminar attendees with a software background.
Rather, it also became clear that the hardware side, too, is very actively
looking at bandwidth. The networking community is working on ways to
provide more bandwidth, but also to provide hooks that allow the software
side to make better use of the available bandwidth. On the system
architecture side, new interface technologies (e.g., NVLink,
available in IBM's POWER8) aim to ease the bandwidth bottleneck.

Bandwidth is usually a problem only between system components, not within
them. To illustrate, HMC memories ("hybrid memory cube") provide only
320 GB/s of external bandwidth, but internally run at 512 GB/s per
storage partition ("vault"); in a 16-vault configuration, this corresponds
to 8 TB/s of aggregate internal bandwidth. This may open up opportunities to build
heterogeneous system designs with near-data processing
capabilities. HMC memory units could, for instance, contain (limited)
processing functionality associated with every storage vault. This way,
simple tasks, such as data movement, re-organization, or scanning could
be off-loaded and performed right where the data resides. Similar
concepts have been used, e.g., to filter data in the network,
pre-process data near secondary storage, etc.
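
To make the offload idea more concrete, here is a minimal Python sketch
(all class and function names are hypothetical; a real near-data design
would expose a hardware-specific interface). Each vault evaluates a simple
predicate locally, so that only qualifying tuples have to cross the
comparatively narrow external link:

    # Hypothetical near-data scan: each "vault" holds a partition of the
    # data and can evaluate a simple predicate in place.
    class Vault:
        def __init__(self, rows):
            self.rows = rows                      # data resident in this vault

        def scan(self, predicate):
            # runs "inside" the memory unit; only matches leave the vault
            return [r for r in self.rows if predicate(r)]

    def near_data_scan(vaults, predicate):
        # the host merely merges the (small) per-vault results
        result = []
        for v in vaults:
            result.extend(v.scan(predicate))
        return result

    # Example: filter a 16-way partitioned table without shipping every row
    vaults = [Vault(range(i * 1000, (i + 1) * 1000)) for i in range(16)]
    hits = near_data_scan(vaults, lambda x: x % 997 == 0)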

In breakout sessions during the seminar, attendees discussed the
implications that such system designs may have. Most importantly, the
designs will require re-thinking the existing (programming)
interfaces. How does the programmer express the off-loaded task?
Which types of tasks can be off-loaded? What are the limitations of the
near-data processing unit (e.g., which memory areas can it
access)? How do the host processor and the processing unit exchange tasks,
data, and results? Clearly, a much closer collaboration between the
hardware and software sides will be needed to make this route viable.

But new designs may also shake up the commercial market. The
traditional hardware market is strongly separated between the memory and
logic worlds, with different manufacturers and processes. Breaking up
the separation may be a challenge both from a technological and from a
business/market point of view.

The group found little time during the seminar to discuss another
potential game-changer in the memory/storage space. Companies are about
to bring their first non-volatile memory (NVM) components to the
market (and, in fact, Intel released its first round of "3D XPoint"
products shortly after the seminar). The availability of cheap,
high-capacity, byte-addressable, persistent storage technologies will
have a profound impact on database software. Discussions during the
seminar revolved around the question of whether classical persistent
(disk-based) mechanisms or in-memory mechanisms are more appropriate to
deal with the new technology.
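
As a rough software-level analogy (not an actual NVM programming model),
memory-mapping a file already gives byte-addressable access to persistent
data, with an explicit flush marking the point of durability - real NVM
programming additionally requires cache-line flushes and ordering fences:

    # Analogy only: byte-addressable, persistent data accessed like memory.
    import mmap, os

    fd = os.open("record.dat", os.O_CREAT | os.O_RDWR)
    os.ftruncate(fd, 4096)            # fixed-size persistent region
    buf = mmap.mmap(fd, 4096)

    buf[0:5] = b"hello"               # in-place update, like a CPU store
    buf.flush()                       # make the update durable
    buf.close(); os.close(fd)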

Accelerators

A way of dealing with the technology trend toward heterogeneity is to
enrich general-purpose systems with more specialized processing units,
accelerators. Popular incarnations of this idea are
graphics processing units (GPUs) and field-programmable gate
arrays (FPGAs); but there are also co-processing units for
floating-point arithmetic, multimedia processing, or network
acceleration.

Accelerators may fit well with what was said above; for example, they
could be used as near-data processing units. But the challenges
mentioned above also apply to many accelerator integration strategies.
Specifically, the proper programming interface, but also the role
of an accelerator in the software system stack - e.g., sharing it
between processes - seem to be yet-unsolved challenges.
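
One recurring integration question is whether off-loading pays off at all
once data has to cross a comparatively slow interconnect. The following
back-of-the-envelope sketch (all numbers are illustrative assumptions, not
measurements from the seminar) estimates the break-even point:

    # Illustrative cost model: off-loading only pays off if the computation
    # time saved exceeds the extra time spent moving data over the link.
    def offload_pays_off(data_bytes, host_secs, accel_secs,
                         link_gbytes_per_s=16.0):   # assumed PCIe-like link
        transfer_secs = data_bytes / (link_gbytes_per_s * 1e9)
        return accel_secs + transfer_secs < host_secs

    # Example: a 4 GB scan the host finishes in 0.3 s is not worth shipping
    # to an accelerator that needs 0.1 s, because the transfer alone costs
    # about 0.25 s.
    print(offload_pays_off(4e9, host_secs=0.3, accel_secs=0.1))   # False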

During the seminar, the role of accelerators specifically for
database systems was also discussed. It was mentioned, on the one hand, that
accelerators should be used to accelerate functionality outside the
database's core tasks, because existing hardware and software are actually
quite good at handling typical database tasks. On the other hand,
attendees reported that many of the non-core-database tasks, Machine
Learning in particular, demand a very high flexibility that is very hard
to provide with specialized hardware.

New Applications / Machine Learning

Databases are the classical tool to deal with high volumes of data.
With the success of Machine Learning in many fields of computing, the
question arises how databases and Machine Learning applications should
relate to one another, and to what extent the database community should
embrace ML functionality in their system designs.

Some of the seminar attendees have, in fact, given examples of very
impressive and successful systems that apply ideas from database
co-processing to Machine Learning scenarios. In a breakout session on
the topic, however, it was concluded that the two worlds should still be
treated separately in the future.

A key challenge around Machine Learning seems to be the very high
expectations with regard to the flexibility of the system. ML tasks are
often described in high-level languages (such as R or Python) and demand
expressiveness that goes far beyond the capabilities of efficient
database execution engines. Attempts to extend these engines with
tailor-made ML operators were not very well received, because even the
new operators were too restrictive for ML users.
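
To illustrate the gap with a toy example (hypothetical, not a system
discussed at the seminar): even a basic ML routine such as gradient
descent mixes data-parallel arithmetic with data-dependent, iterative
control flow, which maps poorly onto a fixed set of relational operators:

    # Toy gradient descent for linear regression: the loop, the convergence
    # test, and the free-form arithmetic are natural in Python/NumPy but
    # hard to express with a fixed set of relational (or canned ML) operators.
    import numpy as np

    def fit(X, y, lr=0.5, steps=2000, tol=1e-6):
        w = np.zeros(X.shape[1])
        for _ in range(steps):
            grad = X.T @ (X @ w - y) / len(y)   # gradient of squared error
            w -= lr * grad
            if np.linalg.norm(grad) < tol:      # data-dependent termination
                break
        return w

    X = np.random.rand(100, 3)
    w_true = np.array([1.0, -2.0, 0.5])
    w_est = fit(X, X @ w_true)                  # estimates w_true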

Economic/Market Considerations

Somewhat unexpectedly, it became clear during the seminar that the
interplay of databases and hardware is not only a question of
technology. Rather, examples from the past and present demonstrate that
even a technologically superior database solution cannot survive today
without a clear business case.

The concept of cloud computing plays a particularly important
role in these considerations. From a business perspective, compute
resources - including database functionality - have become a commodity.
Companies increasingly move their workloads toward cloud-based systems,
raising the question of whether the future of databases also lies in the
cloud.

A similar line of argument leads to the concept of database
appliances. Appliances package database functionality in a closed box,
making it possible (a) to treat the service as a commodity (business aspect)
and (b) to tailor the hardware and software of the appliance specifically
to the task at hand, with the promise of maximum performance (technology
aspect).

And, in fact, both concepts - cloud computing and appliances - may go well
together. Cloud setups make it possible to control the entire hardware and
software stack; large installations may provide the critical mass to include
tailor-made (database) functionality within the cloud as well.