publications

The intercontinental EarthServer initiative has established a European datacube platform with proven scalability: known databases exceed 100 TB, and single queries have been split across more than 1,000 cloud nodes. Because its service interface is rigorously based on the OGC "Big Geo Data" standards, Web Coverage Service (WCS) and Web Coverage Processing Service (WCPS), a range of clients can connect to the services, from the open-source OpenLayers, QGIS, and NASA WorldWind to the proprietary ESRI ArcGIS.
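To illustrate how a client might talk to such a standards-based interface, the following minimal sketch builds a WCS `ProcessCoverages` request that carries a WCPS query. The endpoint URL, the coverage name `AvgLandTemp`, and the time axis name `ansi` are hypothetical placeholders, not taken from any EarthServer service; the exact axis labels and supported output formats depend on the server.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not a real EarthServer URL).
ENDPOINT = "https://example.org/ows"


def build_wcps_request(endpoint: str, wcps_query: str) -> str:
    """Return a GET URL that submits a WCPS query via WCS ProcessCoverages."""
    params = {
        "service": "WCS",
        "version": "2.0.1",
        "request": "ProcessCoverages",
        "query": wcps_query,  # urlencode() percent-escapes the query text
    }
    return endpoint + "?" + urlencode(params)


# Assumed coverage name and axis label: slice one time step out of a
# 3-D x/y/t datacube and ask for the resulting 2-D image as PNG.
query = (
    'for $c in (AvgLandTemp) '
    'return encode($c[ansi("2014-07")], "image/png")'
)

url = build_wcps_request(ENDPOINT, query)
print(url)
```

The sketch only constructs the URL and performs no network access; any HTTP client, or a WCS-capable client such as those listed above, could then issue the request.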

Spatio-temporal data sets can often be represented conveniently as datacubes, a common unifying paradigm. Flexible, scalable services can be offered based on the concept of a datacube query language while hiding the technicalities from users, thereby allowing user-friendly visual data interaction.

One of today's most influential initiatives in Big Geo Data is EarthServer, which is paving the way for flexible, scalable datacube services based on innovative NewSQL technology. Researchers from Europe, the US, and recently Australia have teamed up to rigorously materialize the datacube paradigm for Earth Observation, ocean, meteorological, and planetary science.

We have learnt to live with the pain of separating data and metadata into non-interoperable silos. For metadata, we enjoy the flexibility of databases, be they relational, graph, or some other NoSQL system. In contrast, users still "drown in files", an unstructured, low-level archiving paradigm. It is time to bridge this chasm, which was once technologically induced but can be overcome today.

With the unprecedented increase of orbital sensor, in-situ measurement, and simulation data, as well as their derived products, there is immense potential for gaining new and timely insights; yet this value is not fully leveraged today. Incidentally, such spatio-temporal sensor, image, simulation, and statistics data typically constitute prime Big Data contributors in practice.

The data deluge is affecting the oil and gas industry just as much as many other industries. Aside from the sheer volume, however, there is the challenge of data variety: regular and irregular grids, multi-dimensional space/time grids, point clouds, and TINs and other meshes. A uniform conceptualization for modelling and serving them could save substantial effort, such as that spent in the proverbial "department of reformatting".

Gridded data, such as images, image timeseries, and climate datacubes, are today managed separately from their metadata, and with different, restricted retrieval capabilities. While databases are good at handling metadata modelled as tables, XML hierarchies, or RDF graphs, they traditionally do not support multidimensional arrays.

Big Data in the Earth sciences, the Tera- to Exabyte archives, consist mostly of coverage data, defined by ISO and OGC as the digital representation of some space-time varying phenomenon. Common examples include 1-D sensor timeseries, 2-D remote sensing imagery, 3-D x/y/t image timeseries and x/y/z geology data, and 4-D x/y/z/t atmosphere and ocean data. Analytics on such data requires on-demand processing of sometimes significant complexity, such as getting the Fourier

With the unprecedented increase of orbital sensor, in-situ measurement, and simulation data, there is a rich yet unleveraged potential for gaining insights by dissecting datasets and rejoining them with other datasets. This effectively establishes a "datacube" paradigm whose ultimate goal is to allow users to "ask any question, any time", thereby enabling them to "build their own product on the go". One of the most influential initiatives in Big Geo Data is EarthServer, which is demonstrating new directions for flexible, scalable datacube services based on innovative NewSQL technology.