Knowledge Discovery from EO Data

In TELEIOS we deal with knowledge discovery from Earth-Observation (EO) images, related geospatial data sources and their associated metadata, mapping the extracted low-level data descriptors into semantic classes and symbolic representations, and providing an interactive method for efficient image information mining. We address three important tasks:

Knowledge discovery from EO images and related GIS data.

This task focuses on the design and implementation of methods for extracting relevant descriptors (features) from EO images, specifically TerraSAR-X images, and for the physical integration (fusion) and combined use of raster images and vector data in synergy with existing metadata. The extracted content is represented using the data model stRDF and queried using the query language stSPARQL.

This task concentrates on investigating new semi-supervised learning methods that cope with heterogeneous spatio-temporal image data and take contextual information into account. The methods combine labelled and unlabelled data during training to improve classification performance and the generation of categories (number of classes). They are applied jointly to raster, vector and text data to define semantic categories.

This task focuses on designing and implementing HMI techniques that keep users in the loop, in order to optimize visual interaction with the huge volumes of heterogeneous data present in EO archives. Since human perception is limited in communication capacity, the HMI paradigm is supported by special methods that increase the amount of information transmitted.
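The idea behind the semi-supervised learning task, combining labelled and unlabelled data during training, can be illustrated with a minimal self-training sketch. This is not the method developed in TELEIOS; it is a generic nearest-centroid self-training loop, and the function names, the `margin` threshold and the toy two-dimensional feature vectors are all assumptions made for illustration.

```python
from math import dist


def centroids(points, labels):
    """Return the mean feature vector of each class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in acc) for y, acc in sums.items()}


def self_train(labelled, unlabelled, margin=0.5, max_rounds=10):
    """Illustrative self-training with a nearest-centroid classifier.

    labelled   -- list of (feature_vector, label) pairs
    unlabelled -- list of feature vectors
    margin     -- hypothetical confidence threshold: a sample is pseudo-labelled
                  only if its nearest centroid is at least this much closer
                  than the runner-up
    Returns (final centroids, {index in unlabelled: pseudo-label}).
    """
    points = [p for p, _ in labelled]
    labels = [y for _, y in labelled]
    remaining = dict(enumerate(unlabelled))
    pseudo = {}
    for _ in range(max_rounds):
        cents = centroids(points, labels)
        accepted = []
        for idx, p in remaining.items():
            ranked = sorted((dist(p, c), y) for y, c in cents.items())
            if len(ranked) == 1 or ranked[1][0] - ranked[0][0] >= margin:
                accepted.append((idx, ranked[0][1]))
        if not accepted:
            break  # no confident samples left, stop early
        # Move confident samples into the training set with their pseudo-labels.
        for idx, y in accepted:
            points.append(remaining.pop(idx))
            labels.append(y)
            pseudo[idx] = y
    return centroids(points, labels), pseudo
```

With two labelled samples, `self_train([((0.0, 0.0), "water"), ((10.0, 10.0), "forest")], [(1.0, 1.0), (9.0, 9.0), (5.0, 5.0)])` pseudo-labels the first two unlabelled vectors and leaves the ambiguous midpoint unlabelled, which is the behaviour one wants from a confidence threshold.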

These tasks may be better understood in the context of the following figure, which introduces the framework of Knowledge Discovery within TELEIOS.

TELEIOS intends to implement a communication channel between the EO Data Sources and the User (Operator) who receives the content of the Data Sources coded in an understandable format associated with semantics.

The Data Sources are TerraSAR-X images and their associated metadata (e.g., acquisition time and incidence angles), together with auxiliary data in vector format coming from GIS sources that complement the information about the images (e.g., park boundaries, city boundaries or land uses represented as polygons).

The Data Model Generation component focuses on content and context analysis of the different data sources. The image content analysis provides different feature extraction methods that deal with the specificities of TerraSAR-X images in order to represent the relevant information contained in the images; the resulting representations are known as descriptors. The image descriptors are complemented with image metadata (text information) and GIS data (vector data). It is important to note that the efficiency of the Query, Data Mining and Knowledge Discovery component depends on the robustness and accuracy of the image descriptors.

The Operator (User) requires visual information that is intuitive. Raw TerraSAR-X images are not: they convey land cover classes such as forests and water bodies only as different grey levels. However, when the image content is enriched with semantic annotations expressed using ontologies, the operator can understand the content of the image better and perform queries over collections of images easily. For instance, annotations regarding forest regions in Brazil or flooded areas in Nepal are more understandable than grey levels in a TerraSAR-X image.

The component Query, Data Mining and Knowledge Discovery requires the combined use of the following techniques: (1) image processing and pattern recognition for understanding and extracting useful patterns from a single image, (2) spatial data mining for identifying implicit spatial relationships between images, and (3) content-based retrieval of images from the archive based on their semantic and visual content. These types of techniques are used to discover knowledge from the EO data sources.
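Technique (3), content-based retrieval, can be sketched as ranking archived images by the similarity of their descriptors to a query descriptor. The sketch below assumes that each image patch has already been reduced to a fixed-length feature vector and uses cosine similarity as the ranking measure; the function names, the patch identifiers and the choice of cosine similarity are illustrative assumptions, not the measures used in TELEIOS.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, archive, k=2):
    """Return the ids of the k archive entries most similar to the query.

    archive -- dict mapping a (hypothetical) patch id to its feature vector
    """
    ranked = sorted(
        archive.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [patch_id for patch_id, _ in ranked[:k]]


# Toy descriptors; real descriptors would come from feature extraction.
archive = {
    "patch_a": (0.9, 0.1, 0.0),
    "patch_b": (0.1, 0.8, 0.1),
    "patch_c": (0.8, 0.2, 0.1),
}
top_matches = retrieve((1.0, 0.0, 0.0), archive, k=2)
```

A linear scan like this only suits small collections; over a large archive the ranking would normally be delegated to the database or to an index structure.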

The Interpretation and Understanding component, in interaction with the Operator, extracts information from the DBMS in order to carry out topological, spatial and temporal analysis by combining relevant information stored in the database.

The Visual Data Mining component allows interactive exploration and analysis of very large, highly complex, and non-visual data sets stored in the database. It provides the operator with an intuitive tool for Data Mining through a graphical interface, where the selection of different images in 2D or 3D space is achieved through visualization techniques, data reduction methods, and similarity metrics that group the images. Thus, Visual Data Mining links the DBMS with the real user applications.

Finally, the DBMS is the central component of our knowledge discovery framework. TELEIOS adopts a DB-centric view of Earth Observation: data coming from the Data Sources are stored in the MonetDB database and queried using the query language SciQL to carry out the sophisticated content and context analysis that is necessary for knowledge discovery. The discovered knowledge is expressed in terms of semantic classes associated with an image, and it is represented by instantiating an appropriate EO ontology that we have defined in TELEIOS. This ontology and its instances are encoded in stRDF and stored in MonetDB together with other relevant data (e.g., image metadata). These stRDF graphs are then queried using the query language stSPARQL to support user-related tasks.
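To give a feel for how semantic classes attached to images can be stored as a graph and then queried, the following sketch models annotations as plain subject-predicate-object triples and answers the question "which images contain a patch of a given class?". It deliberately uses neither stRDF syntax nor stSPARQL; the predicate names (`hasPatch`, `hasClass`), the identifiers and the helper functions are hypothetical and exist only for this illustration.

```python
def match(triples, pattern):
    """Yield the triples that match a pattern; None acts as a wildcard."""
    for triple in triples:
        if all(p is None or p == v for p, v in zip(pattern, triple)):
            yield triple


def images_with_class(triples, class_name):
    """Return the sorted ids of images that have a patch annotated with class_name."""
    patches = {s for s, _, _ in match(triples, (None, "hasClass", class_name))}
    return sorted(
        {s for s, _, o in match(triples, (None, "hasPatch", None)) if o in patches}
    )


# Toy annotation graph; all identifiers are made up for the example.
triples = [
    ("image_001", "hasPatch", "patch_17"),
    ("patch_17", "hasClass", "Forest"),
    ("image_001", "hasPatch", "patch_18"),
    ("patch_18", "hasClass", "WaterBody"),
    ("image_002", "hasPatch", "patch_03"),
    ("patch_03", "hasClass", "Forest"),
]
forest_images = images_with_class(triples, "Forest")
```

In the framework described above, the same kind of question would be posed declaratively in stSPARQL over the stRDF graphs, with spatial and temporal constraints added where needed.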