Virtual Earth Observatory Concepts

Satellite missions continuously send huge amounts of EO data to Earth, providing snapshots of the surface of the Earth or its atmosphere. The management of this so-called payload data is an important activity of the ground segments of satellite missions. Figure 1(a) gives a high-level view of some of the basic data processing and user services available at EO data centers today, e.g., at the German Remote Sensing Data Center (DFD) of TELEIOS partner DLR through its Data Information and Management System (DIMS) [1].

Raw data, often from multiple satellite missions, is ingested, processed, catalogued and archived. Processing results in the creation of various standard products (Level 1, 2 etc. in EO jargon; raw data is Level 0) together with extensive metadata describing them. For example, in the NOA application, images from the SEVIRI sensor are processed (cropped, georeferenced and run through a pixel classification algorithm) to detect pixels that are hotspots. These pixels are then stored as standard products in the form of shapefiles. Raw data and derived products are complemented by auxiliary data, e.g., various kinds of geospatial data such as maps, land use/land cover data etc.
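The hotspot processing chain described above (crop, georeference, classify) can be illustrated with a minimal Python sketch. It is not the NOA implementation: the fixed temperature threshold, the regular latitude/longitude grid, and all function names and sample values are assumptions made only for illustration.

```python
# Hypothetical sketch of a crop / georeference / classify chain.
# The threshold rule and the regular grid are illustrative assumptions.

def crop(image, row0, row1, col0, col1):
    """Return the sub-image covering rows [row0, row1) and columns [col0, col1)."""
    return [row[col0:col1] for row in image[row0:row1]]

def georeference(row, col, origin_lat, origin_lon, cell_size):
    """Map a pixel index to the (lat, lon) of its centre on a regular grid."""
    lat = origin_lat - (row + 0.5) * cell_size
    lon = origin_lon + (col + 0.5) * cell_size
    return (lat, lon)

def detect_hotspots(image, threshold, origin_lat, origin_lon, cell_size):
    """Classify each pixel by a simple threshold and return hotspot locations."""
    hotspots = []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value > threshold:
                hotspots.append(georeference(r, c, origin_lat, origin_lon, cell_size))
    return hotspots

# Made-up brightness temperatures (Kelvin) on a 4 x 4 grid.
image = [
    [290.0, 291.0, 289.5, 290.2],
    [290.5, 325.0, 330.1, 290.0],
    [289.9, 290.3, 318.7, 290.1],
    [290.0, 290.0, 290.0, 290.0],
]

window = crop(image, 1, 3, 1, 3)
# origin_lat / origin_lon are assumed to be the upper-left corner of the cropped window.
hotspots = detect_hotspots(window, 310.0, origin_lat=38.0, origin_lon=23.0, cell_size=0.5)
print(hotspots)
```

In an operational chain the resulting locations would be written out as a standard product (shapefiles in the NOA case) rather than printed.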

Raw data, derived products, metadata and auxiliary data are stored in a variety of storage systems and are made available using a variety of policies depending on their volume and expected future use. For example, in the TerraSAR-X archive managed by DFD, long term archiving is done using a hierarchy of storage systems (including a robotic tape library) which offers batch to near-line access, while product metadata are available on-line by utilizing a relational DBMS and an object-based query language [1].

EO data centers such as DFD also offer a variety of user services. For example, for scientists who want to utilize EO data in their research, DFD offers the Web interface EOWEB-NG for searching, inspection and ordering of products. Space agencies such as DLR and NOA might also make various other services available aimed at specific classes of users. For example, the Center for Satellite Based Crisis Information (ZKI) of DLR provides a 24/7 service for the rapid provision, processing and analysis of satellite imagery during natural and environmental disasters, for humanitarian relief activities and civil security issues worldwide. Similar emergency support services for fire mapping and damage assessment are offered by NOA through its participation in the GMES SAFER programme.

The TELEIOS advancements to today's state of the art in EO data processing are highlighted in yellow in Figure 1(b) and can be summarized as follows:

Hierarchies of domain concepts are formalized using RDFS ontologies and are used to annotate standard products. Annotations are expressed in RDF and are made available as linked data [3] so that they can be easily combined with other publicly available linked data sources (e.g., GeoNames, LinkedGeoData, DBpedia) to allow for the expression of rich user queries.

Web interfaces to EO data centers and specialized applications (e.g., rapid mapping) can now be improved significantly by exploiting the semantically-enriched standard products and linked data sources made available by TELEIOS. For example, an advanced EOWEB-like interface to EO data archives can be developed on top of a system like Strabon to enable end-users to pose very expressive queries (an example is given below). Rapid mapping applications can also take advantage of rich semantic annotations and open linked data to produce useful maps even in cases where this is difficult with current technology. Open geospatial data are especially important here. There are cases of rapid mapping where emergency response can initially be based on possibly imperfect, open data (e.g., from OpenStreetMap) until more precise, detailed data become available.

In all of the above processing stages, from raw data to application development, TELEIOS utilizes the scientific database query language SciQL [4], and the semantic web data model stRDF and its query language stSPARQL [5]. These data models and query languages and their role in TELEIOS are summarized below.

Data Modeling and Querying in TELEIOS

SciQL [4] is an SQL-based query language for scientific applications with arrays as first-class citizens. It provides a seamless symbiosis of array, set, and sequence interpretations using a clear separation of the mathematical object from its underlying implementation. A key innovation is to extend value-based grouping in SQL:2003 with structural grouping, i.e., fixed-sized and unbounded groups based on explicit relationships between their dimension attributes. This leads to a generalization of window-based query processing with wide applicability in science domains.
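The difference between value-based and structural grouping can be sketched in plain Python. This is not SciQL syntax: the function names are hypothetical, and only fixed-sized groups are shown. Cells are grouped by their position along the two dimensions rather than by their values, and a step parameter switches between non-overlapping tiles and sliding windows.

```python
from collections import defaultdict

array = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

def value_groups(grid, key):
    """Value-based grouping: cells fall into groups according to key(value)."""
    groups = defaultdict(list)
    for row in grid:
        for value in row:
            groups[key(value)].append(value)
    return dict(groups)

def structural_groups(grid, height, width, step):
    """Structural grouping: each group is the height x width block anchored at (x, y)."""
    groups = {}
    rows, cols = len(grid), len(grid[0])
    for x in range(0, rows - height + 1, step):
        for y in range(0, cols - width + 1, step):
            groups[(x, y)] = [
                grid[i][j]
                for i in range(x, x + height)
                for j in range(y, y + width)
            ]
    return groups

def averages(groups):
    """Aggregate every group to its mean."""
    return {anchor: sum(cells) / len(cells) for anchor, cells in groups.items()}

by_parity = value_groups(array, key=lambda v: v % 2)           # grouped by value
tiles = averages(structural_groups(array, 2, 2, step=2))       # non-overlapping tiles
windows = averages(structural_groups(array, 2, 2, step=1))     # sliding windows

print(tiles)  # {(0, 0): 3.5, (0, 2): 5.5, (2, 0): 11.5, (2, 2): 13.5}
```

With step=1 every cell at which the block still fits becomes an anchor, which corresponds to the window-based processing mentioned above.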

Most previous database approaches to the management of EO data store satellite images either as a separate file repository outside the database, or as "black-box" BLOBs inside the database [4]. In either case, the DBMS's declarative query language (SQL) cannot transparently access the content of the EO data. Instead, external programs, external user-defined functions (usually implemented in C or other procedural programming languages) [4], or array extensions implemented as middleware on top of the DBMS are required to access the EO data content [4]. In particular, typical operations on (satellite) images such as cropping, re-scaling, and geo-referencing are currently implemented in various EO data centers using specialized software such as stand-alone external programs or external user-defined functions. Changing existing operations or adding new ones thus requires the programming skills of a database or application developer and compilation of the new code.

In [4], we have developed SciQL, a new SQL-based query language for scientific applications with arrays as first-class citizens. SciQL uses multi-dimensional arrays to represent EO data of various processing levels. This allows us to store EO data (e.g., satellite images) in the database, and to query and manipulate their content transparently within the high-level declarative database query language. This has three important advantages. First, it allows us to express low-level image processing (e.g., cropping, re-scaling, geo-referencing etc.) as well as image content analysis (e.g., feature extraction, pixel classification) in a user-friendly high-level declarative language that provides efficient array manipulation primitives. Second, it opens up these algorithms to optimization by the DBMS's (extended) query optimizer. Third, using the seamless integration and symbiosis of relational tables and arrays, query processing and knowledge discovery can exploit both image metadata and image data at the same time.
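The third advantage, evaluating metadata predicates and pixel content in the same request, can be pictured with the following Python sketch. It only mimics the idea outside a DBMS: the record layout, the sensor names, the threshold, and the select_images function are hypothetical and do not reflect the SciQL schema or its syntax.

```python
# Hypothetical image records: relational-style metadata plus the pixel array.
images = [
    {"id": "img-1", "sensor": "SEVIRI", "acquired": "2011-06-14",
     "pixels": [[300, 320], [330, 290]]},
    {"id": "img-2", "sensor": "SEVIRI", "acquired": "2011-07-02",
     "pixels": [[290, 291], [292, 293]]},
    {"id": "img-3", "sensor": "OTHER", "acquired": "2011-06-20",
     "pixels": [[340, 341], [342, 343]]},
]

def select_images(records, sensor, month_prefix, threshold, min_count):
    """Filter on metadata first, then on the number of pixels above a threshold."""
    result = []
    for record in records:
        # Metadata predicate (what a relational query would evaluate).
        if record["sensor"] != sensor:
            continue
        if not record["acquired"].startswith(month_prefix):
            continue
        # Content predicate (what an array query would evaluate).
        hot = sum(1 for row in record["pixels"] for value in row if value > threshold)
        if hot >= min_count:
            result.append((record["id"], hot))
    return result

matches = select_images(images, "SEVIRI", "2011-06", threshold=310, min_count=2)
print(matches)  # [('img-1', 2)]
```

In a system that stores arrays next to tables, both predicates would be part of one declarative query and thus visible to the query optimizer, rather than being hard-coded in an external loop as in this sketch.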

We are also utilizing the model stRDF, an extension of the W3C standard RDF that allows the representation of geospatial data that changes over time [5]. stRDF is accompanied by stSPARQL, an extension of the query language SPARQL 1.1 for querying stRDF data. stRDF and stSPARQL use OGC standards (Well-Known Text and Geography Markup Language) for the representation of temporal and geospatial data.
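As a small illustration of the Well-Known Text representation mentioned above, the following Python sketch encodes a bounding box as a WKT POLYGON and parses it back. The helper names and coordinate values are hypothetical, longitude-latitude axis order is assumed, and the way stRDF wraps such text in RDF literals is not shown.

```python
def bbox_to_wkt(min_lat, min_lon, max_lat, max_lon):
    """Encode a bounding box as a WKT POLYGON (x = longitude, y = latitude)."""
    corners = [
        (min_lon, min_lat),
        (max_lon, min_lat),
        (max_lon, max_lat),
        (min_lon, max_lat),
        (min_lon, min_lat),  # a WKT ring repeats its first point to close the ring
    ]
    ring = ", ".join(f"{x} {y}" for x, y in corners)
    return f"POLYGON(({ring}))"

def wkt_to_points(wkt):
    """Parse the outer ring of a simple, hole-free WKT POLYGON into (x, y) tuples."""
    inner = wkt[len("POLYGON(("):-len("))")]
    return [tuple(float(n) for n in pair.split()) for pair in inner.split(", ")]

footprint_wkt = bbox_to_wkt(35.0, 24.9, 35.3, 25.4)
print(footprint_wkt)
# POLYGON((24.9 35.0, 25.4 35.0, 25.4 35.3, 24.9 35.3, 24.9 35.0))
```

The parser is deliberately narrow: it accepts only the exact form that bbox_to_wkt produces and is not a general WKT reader.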

In TELEIOS, stRDF is used to represent satellite image metadata (e.g., time of acquisition, geographical coverage), knowledge extracted from satellite images (e.g., a certain image region is a water body) and auxiliary geospatial data sets encoded as linked data. One can then use stSPARQL to express in a single query an information request such as the following: "Find an optical image taken by a high resolution satellite during June 2011 which covers an area near one of the major cities of Crete and contains a region which is a lake". Encoding this information request today in a typical interface to an EO data archive such as EOWEB-NG is impossible, because domain-specific concepts such as "lake" are not included in the archive metadata and therefore cannot be used as search criteria. In [2,6] we have been developing image information mining techniques that allow us to characterize satellite image regions with concepts from a land cover ontology (e.g., water-body, lake, etc.). These concepts are encoded in RDFS ontologies and are used to annotate EO products. In this way, we attempt to close the semantic gap that exists between user requests and searchable information available explicitly in the archive.
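How a concept hierarchy makes such annotations searchable can be sketched in a few lines of Python. The class names, the region identifiers and the dictionary-based encoding are hypothetical; in TELEIOS the hierarchy is an RDFS ontology and the annotations are RDF triples. The sketch only shows the effect of subclass reasoning: a search for a general concept also returns regions annotated with a more specific one.

```python
# Hypothetical land cover hierarchy: child concept -> parent concept.
SUBCLASS_OF = {
    "Lake": "WaterBody",
    "River": "WaterBody",
    "WaterBody": "LandCover",
    "Forest": "LandCover",
}

# Hypothetical annotations produced by image information mining: (region, concept).
annotations = [
    ("product-1/region-3", "Lake"),
    ("product-1/region-7", "Forest"),
    ("product-2/region-1", "River"),
]

def superclasses(concept):
    """Return the concept followed by all of its ancestors in the hierarchy."""
    chain = [concept]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain

def regions_of_type(concept):
    """Regions whose annotation is the concept itself or one of its subclasses."""
    return [region for region, label in annotations if concept in superclasses(label)]

print(regions_of_type("WaterBody"))  # ['product-1/region-3', 'product-2/region-1']
print(regions_of_type("Lake"))       # ['product-1/region-3']
```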

But even if domain-specific concepts were included in the archive annotations, one would need to join them with information obtained from auxiliary data sources to answer the above query (e.g., Wikipedia to find the major cities of Crete, GeoNames to find their geographic location etc.). Although such open sources of data are available to EO data centers, they are not currently used to support sophisticated ways of end-user querying in Web interfaces such as EOWEB-NG. In TELEIOS, we assume that auxiliary data sources, especially geospatial ones, are encoded in RDF and are available as linked data, so stSPARQL can easily be used to express information requests such as the above. The linked data web is quickly being populated with geospatial data, and we therefore expect that languages such as stSPARQL (and the related forthcoming OGC standard GeoSPARQL) will soon be mainstream extensions of SPARQL that can be used to access such data effectively.
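The join that this information request requires can be made concrete with a small Python sketch. It is not stSPARQL and does not use an RDF store. The image records, region labels, the find_images function and the bounding-box notion of "near" are all assumptions, and the city coordinates are approximate values used only for illustration.

```python
from datetime import date

# Auxiliary data (hypothetical extract): major cities with approximate (lat, lon).
cities = {"Heraklion": (35.34, 25.14), "Chania": (35.51, 24.02)}

# Product metadata (hypothetical); footprint = (min_lat, min_lon, max_lat, max_lon).
images = [
    {"id": "img-A", "kind": "optical", "high_resolution": True,
     "acquired": date(2011, 6, 12), "footprint": (35.0, 24.9, 35.3, 25.4)},
    {"id": "img-B", "kind": "radar", "high_resolution": True,
     "acquired": date(2011, 6, 20), "footprint": (35.2, 23.8, 35.6, 24.3)},
    {"id": "img-C", "kind": "optical", "high_resolution": True,
     "acquired": date(2011, 8, 3), "footprint": (35.0, 24.9, 35.5, 25.4)},
]

# Semantic annotations (hypothetical): concepts found in regions of each image.
region_labels = {"img-A": {"Lake", "Forest"}, "img-B": {"Lake"}, "img-C": {"Lake"}}

def near(footprint, point, margin):
    """Crude 'near' test: the point lies in the footprint enlarged by margin degrees."""
    min_lat, min_lon, max_lat, max_lon = footprint
    lat, lon = point
    return (min_lat - margin <= lat <= max_lat + margin
            and min_lon - margin <= lon <= max_lon + margin)

def find_images(margin=0.1):
    """Optical, high-resolution, June 2011, contains a lake, near a major city."""
    hits = []
    for img in images:
        if img["kind"] != "optical" or not img["high_resolution"]:
            continue
        if not date(2011, 6, 1) <= img["acquired"] <= date(2011, 6, 30):
            continue
        if "Lake" not in region_labels.get(img["id"], set()):
            continue
        close = sorted(name for name, pt in cities.items()
                       if near(img["footprint"], pt, margin))
        if close:
            hits.append((img["id"], close))
    return hits

print(find_images())  # [('img-A', ['Heraklion'])]
```

The sketch touches three kinds of data in one request: archive metadata, semantic annotations, and an auxiliary geospatial source. These are the inputs that the text expects a single stSPARQL query to combine.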