Distributed Query Processing

Distributed query processing has been introduced since the Jul2015 release of MonetDB through the support for remote tables. The remote table technique complements merge table by allowing the partitions of a merge table to reside on different databases. Queries involving remote tables are automatically split into subqueries by the master database and executed on remote databases. The combination of merge table and remote table enables fine control of how to distribute data and query workloads to maximally benefit from the available CPU and RAM resources.

Setting up REMOTE TABLE-s

Remote table adopts a straightforward master-worker architecture: one can place the partition tables in different databases, each served by a worker MonetDB server, and then glue everything together in a MERGE TABLE in the master database served by the master MonetDB server. Each MonetDB server can act as both a worker and a master, depending on the roles of the tables it serves.

The following shell commands and SQL queries show how to place the partition tables t1 and t2 on two worker databases, and glue them together on a master database. The format of the address of a REMOTE TABLE is: mapi:monetdb://<host>:<port>/<dbname>, where all three parameters are compulsory. A worker database can reside on both a local cluster node and a remote machine. N.B.:

The declaration of a REMOTE TABLE must match exactly the signature of its counterpart in the remote database, i.e., the same table name, the same columns names and the same column data types.

Currently, at the creation time of a remote table, the remote database server is not contacted to verify the existence of the table. When a CREATE REMOTE TABLE report “operation successful”, it merely means that the information about the new remote table has been added to the local SQL catalogue. The check at the remote database server is delayed until the first actual query on the remote table.