Language

CloudMdsQL (Cloud Multidatastore Query Language) was designed within the CoherentPaaS project. It is a functional, SQL-like language that can query multiple cloud data stores (SQL, NoSQL, HDFS, etc.) in a single query, and that query may embed invocations of each data store’s native query interface. CloudMdsQL thus unifies a diverse set of data management technologies while preserving the expressivity of their local query languages. The language is SQL-based. It is extended with the ability to embed native queries to data stores in the form of “named table expressions”. Like table-valued user-defined functions, each named table expression returns a table that conforms to a declared signature (the names and types of its columns). CloudMdsQL is being validated on relational, document and graph data stores.
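To make the idea of a signature-bound named table expression concrete, here is a minimal sketch of the concept, not of CloudMdsQL's implementation. All names (`NamedTableExpression`, `native_query`, `evaluate`) are hypothetical. Python types stand in for column types, and a plain callable stands in for the embedded native query. The point is that whatever the data store returns is accepted only if every row matches the declared column count and types, much like the result of a table-valued function.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model of a named table expression (illustrative names only,
# not a CloudMdsQL API).
Row = tuple
Signature = list[tuple[str, type]]  # (column name, Python type) pairs


@dataclass
class NamedTableExpression:
    name: str
    signature: Signature
    native_query: Callable[[], list[Row]]  # stands in for the embedded native query

    def evaluate(self) -> list[Row]:
        """Run the native query and check every row against the declared signature."""
        rows = self.native_query()
        for row in rows:
            if len(row) != len(self.signature):
                raise TypeError(
                    f"{self.name}: expected {len(self.signature)} columns, got {len(row)}"
                )
            for value, (col, col_type) in zip(row, self.signature):
                # None plays the role of SQL NULL and is allowed in any column.
                if value is not None and not isinstance(value, col_type):
                    raise TypeError(f"{self.name}.{col}: expected {col_type.__name__}")
        return rows


# A lambda plays the role of a query sent to, say, a document store.
t2 = NamedTableExpression("T2", [("x", int), ("z", str)], lambda: [(1, "a"), (2, "b")])
print(t2.evaluate())  # [(1, 'a'), (2, 'b')]
```

Because the signature is declared up front, the rest of the query can treat `T2` as an ordinary two-column table without knowing which data store produced it.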

Data Model

The common data model follows the relational approach. It was chosen for its intuitive data representation, its wide acceptance and its ability to integrate data through binary operations such as joins and unions. The common data model has no global schema. Instead, it is designed so that every dataset retrieved from a data store complies with it. NoSQL data stores whose data models can be considered subsets of the relational one (e.g. key-value and document stores) may be queried through the common query language with SQL-like sub-queries. Other NoSQL data stores may be queried through embedded statements in their native query interface that produce tabular datasets consistent with the relational nature of the common data model.
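One way to see why document data can be treated as a subset of the relational model is to project each schemaless document onto a fixed list of columns. The following is a minimal sketch under stated assumptions: nested fields are addressed with dotted paths, and a missing field becomes `None` (standing in for SQL NULL). The helper names `lookup` and `to_rows` are hypothetical and do not come from CloudMdsQL.

```python
from typing import Any


def lookup(doc: dict, path: str) -> Any:
    """Follow a dotted path such as 'address.city'; return None (NULL) if any step is missing."""
    value: Any = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def to_rows(documents: list[dict], columns: list[str]) -> list[tuple]:
    """Project schemaless documents onto a fixed list of columns, one tuple per document."""
    return [tuple(lookup(doc, col) for col in columns) for doc in documents]


docs = [
    {"id": 1, "name": "Ann", "address": {"city": "Paris"}},
    {"id": 2, "name": "Bob"},  # no address, so the city column becomes NULL
]
print(to_rows(docs, ["id", "name", "address.city"]))
# [(1, 'Ann', 'Paris'), (2, 'Bob', None)]
```

A key-value store fits the same picture even more directly: each entry is a two-column row `(key, value)`.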

Compiler

The query compiler parses a CloudMdsQL query and generates an optimized query execution plan for the query operator engine to process. The plan's intermediate format is a JSON document containing a query execution tree, with enough information to configure and run each query operation. The compiler/optimizer is implemented in C++ and uses the Boost.Spirit framework to parse the language's context-free grammar. During semantic analysis, the compiler takes the signatures of named table expressions into account.
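The two compiler tasks described above can be illustrated with a small sketch: checking column references against named table expression signatures, and serializing an execution tree to JSON. It is written in Python for brevity rather than the compiler's C++. The JSON layout (`"op"`, `"params"`, `"children"`), the operator names, and the helpers `PlanNode` and `check_column` are hypothetical; the source does not specify the real plan format.

```python
import json
from dataclasses import dataclass, field


class SemanticError(Exception):
    """Raised when a query references a table expression or column that was not declared."""


# Hypothetical plan node; field names are illustrative, not the actual plan format.
@dataclass
class PlanNode:
    op: str
    params: dict = field(default_factory=dict)
    children: list["PlanNode"] = field(default_factory=list)

    def to_dict(self) -> dict:
        """Convert the subtree rooted here into plain dicts/lists for JSON output."""
        return {
            "op": self.op,
            "params": self.params,
            "children": [child.to_dict() for child in self.children],
        }


# Declared signatures of the named table expressions used by the query.
signatures = {"T1": {"x": "int", "y": "int"}, "T2": {"x": "int", "z": "string"}}


def check_column(ref: str) -> str:
    """Resolve 'Table.column' against the declared signatures and return its type."""
    table, _, column = ref.partition(".")
    if table not in signatures:
        raise SemanticError(f"unknown table expression {table}")
    if column not in signatures[table]:
        raise SemanticError(f"{table} has no column {column}")
    return signatures[table][column]


# Semantic check: both join columns must exist and have the same declared type.
left, right = "T1.x", "T2.x"
if check_column(left) != check_column(right):
    raise SemanticError(f"type mismatch between {left} and {right}")

# Execution tree: a join over two sub-plans, one per named table expression.
plan = PlanNode(
    "join",
    {"on": [left, right]},
    [PlanNode("scan", {"table": "T1"}), PlanNode("scan", {"table": "T2"})],
)
print(json.dumps(plan.to_dict(), indent=2))
```

Keeping the plan as plain JSON decouples the compiler from the operator engine: any component that can read the tree can configure and run the operations it describes.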

This research has been partially funded by the European Commission under the FP7 programme, project #611068.