Active Work In Progress

Add pgsql_fdw as a contrib module

Smart planning

We might have statistics of external data. ANALYZE command would need to have hook to delegate row sampling to each FDW. For this purpose, "Collecting statistics on foreign tables" is under proposal at CF 2011-11. This proposal provides a handler function which allows FDWs to handle ANALYZE commands which are executed for foreign tables. In addition, contrib/file_fdw is enhanced to get sample rows from actual data file and calculate statistics by using existing routines in core.

set_foreign_size_estimates() have to be enhanced to reflect actual statistics.

JOIN push down

Doing a JOIN (or JOINs) on the remote side would reduce amount of data transferred from external server.

Connection caching

Currently, connection caching has not been implemented in order to focus on the FDW API. Ideas below once had been implemented but have since been removed.

Connections to foreign servers are cached and reused during the lifetime of the backend. When a scan of a foreign table is initialized at ExecInitForeignScan(), the backend searches the reusable connection from cache. If the reusable connection is not in cache, then call FdwRoutine.ConnectServer() to create a new connection and store it in the connection cache.

Connections are identified by name. A connection's name is the same as the name of the server which the connection uses.

The pg_foreign_connections view displays all the foreign connections that are available in the current session.

Name

Type

Reference

Description

connname

Text

name of the connection

srvname

Name

pg_foreign_server.srvname

name of the foreign server

usename

Name

pg_authid.rolname

name of the local role which was used to map the foreign user

fdwname

Name

pg_foreign_data_wrapper.fdwname

name of the foreign data wrapper which was used to connect to the foreign server

Finished works

Syntax

In SQL standard, 'CREATE FOREIGN DATA WRAPPER' has a 'LIBRARY' option and FDW routines are exported directly from the library, but another approach like 'CREATE LANGUAGE' would be better because we already have pg_proc, an existing function manager.

In the first version, column DEFAULT value and column level options are omitted to simplify the patch and make review easy.
hackers-ML archive

FDW routines

Version 1

In the SQL standard, FDW routines are designed to have a portable application binary interface. FDW libraries could be used by several DBMSes without recompiling there, but it doesn't seem realistic. Instead, a PostgreSQL-specific and C language-specific routine set would be feasible:

FDW routines are designed to be used in the executor module. The executor seems to be the best-balanced layer for query optimization and data abstraction. It would be harder with other approaches like AM (access methods) or storage manager (smgr) layers to optimize complex queries like JOINs on several foreign tables in the same foreign server.

Only interfaces of FdwRoutine, FSConnection are defined in PostgreSQL core, and the actual contents are implemented by each FDW library.

In future, more planner hooks may be added to allow FDWs to optimize the query.

Version 4

In 9.2, PlanForeignScan is changed so that FDW can return multiple scan paths per a foreign table, and this change get rid of FdwPlan. Planner chooses appropriate path from paths provided by FDW, and creates only one ForeignScan node which has copy of fdw_private of chosen path. Now PlanForeignScan is responsible to create ForeignScan path node and add it to RelOptInfo (baserel). You can use create_foreignscan_path, which is also changed in 9.2, to create a finished ForeignScan path node.

On-disk structure

pg_catalog.pg_foreign_data_wrapper

An FDW handler function returns an FDW routine set. A new pseudo type 'fdw_handler' is added to represent the routine set. FDW handlers take no arguments and return fdw_handler type.

A FDW handler is registered in fdwhandler column of pg_foreign_data_wrapper catalog. InvalidOid for fdwhandler means that the foreign-data wrapper has no FDW handler, so it can't be used to define any foreign table. This specification supports usage in which foreign-data wrapper is used as the container of connection information like the past.

pg_catalog.pg_foreign_table

A foreign table is registered in pg_class with relkind = 'f' (RELKIND_FOREIGN_TABLE). It also has a corresponding pg_foreign_table tuple, in which we store the foreign server ID and generic options for the foreign table.

Planner and Executor changes

The access layer of foreign tables will be implemented in the planner module and the executor module. We will have new ForeignPath and ForeignScan nodes for this purpose.

Planner

The Planner module is responsible to find the best access path, so FDW should provide the cost for a ForeignPath.

In the planning phase, create_foreignscan_path() calls PlanRelScan() of the related FDW's FdwRoutine for each ForeignScan node. PlanRelScan() should provide proper costs for the scan which have been estimated in the way each FDW would like to use.

In future, additional planner hooks might be added for:

Pass-through mode (one ForeignScan node executes whole query)

Query optimization such as merging multiple foreign tables into one remote query

To estimate costs as correctly as possible, FDWs might want to have their own statistics. In this step, we don't provide a common mechanism to store statistics. Once such mechanism has been implemented, FdwRoutine should have another function which is called from ANALYZE. With such a function, FDWs can update their statistics in their own respective ways.

In version 1, the planner generates a ForeignScan node for each foreign table in the query, and store FdwPlan in it which is returned by PlanRelScan().

Executor

Call FdwRoutine.BeginScan() with FdwPlan which was stored in ForeignScan to initiate foreign query if the execution was not for EXPLAIN, and receive FdwExecutionState.

ExecForeignScan()

Call FdwRoutine.Iterate() to retrieve a tuple from the foreign table via TupleTableSlot.

If the scan reaches the end, the slot will be empty after Iterate() call.

ExecForeignReScan()

Call FdwRoutine.ReScan() to re-initialize scanning.

ExecEndScan()

Call FdwRoutine.EndScan() to finalize the foreign scan.

ExecForeignMarkPos()/ExecForeignRestrPos()

Currently MarkPos() and RestrPos() for ForeignScan are not supported, so ExecSupportsMarkRestore() returns false for ForeignScan. The reason not to support is that they are used to perform merge join, and merge join needs sorted results. If a FDW could deparse Sort nodes into ORDER BY clause properly and supports MarkPos() and RestrPos(), then merge join of foreign tables are supported.

ExecInitForeignScan() generates ForeignScanState from ForeignScan and FDW routines use it to manage the status of scan.

FdwExecutionState has private area which can be used to pass foreign-data wrapper specific data between FDW routines. Each foreign-data wrapper can define private data structure and store it into ForeignScanState.fdw_state->private.

Per-column FDW option

Similar to other kind of FDW objects, column of a foreign table can have FDW options. This means that CREATE/ALTER FOREIGN TABLE syntax accept OPTIONS clause for a column, and key/value pairs are stored in attfdwoptions of pg_attribute.

Because of syntax vagueness between "DEFAULT b_expr" and "OPTIONS ( ... )", OPTIONS clause for a column must be specified before any constraints or default value.

Foreign data wrappers

file_fdw

The file_fdw is a foreign-data wrapper implementation, and included in the distribution of PostgreSQL 9.1 as a contrib module. This can be used to read data from files in the server's local file system like COPY FROM command.
Currently, stdin, although allowed in COPY FROM, is not supported.

Because the FDW read from files on server-side, some security issues should be considered. Maybe Non-superuser should not be allowed to create or alter foreign tables which uses the file_fdw. At least by default.

using COPY FROM routines

File_fdw can recognize the file formats which are recognized by COPY command, by using exported COPY FROM routines.

generic options

Information of the source file such as filename are passed via generic options. Options of COPY FROM statement are acceptable, but oids is not supported by file_fdw because it's a legacy feature.

Different from COPY, the force_not_null can be described in per-column generic option with boolean values, not a list of column names.

PostgreSQL

This can be used to connect external postgres servers.
It might be able to be integrated with contrib/dblink to share the code and connections.
dblink will be installed optionally like as standard contrib modules.

Connection options

The connection options are constructed from FDW options of foreign-data wrapper, foreign server and user mapping, with choosing only connection options because FDW option might include non-connection options such as relname and nspname.
Note that non-superuser MUST specify password in FDW options and require password authentication by the foreign server because of security issues.

In current implementation, FDW options of user mappings are visible to users who has SUPERUSER privilege or USAGE privilege on relevant SERVER, because of security issues.

No transaction management

FDW for PostgreSQL never emit transaction command such as BEGIN, ROLLBACK and COMMIT. Thus, all SQL statements are executed in each transaction when 'autocommit' was set to 'on'.

Cost estimation

ANALYZE for foreign tables is not supported in 9.0, so we can't store statistics in local PG. One work around is getting EXPLAIN result from remote server, and use its cost values for local planning.

SELECT-clause optimization

Currently SELECT clause is constructed as "SELECT col1, col2, col3, ...". If some of columns are not used at all in the original query, they will be replaced with NULL for optimization. For example, if col2 was unused, SELECT clause will be "SELECT col1, NULL, col3, ...". Main purpose of this optimization is to reduce amount of data transferred from remote server.

WHERE-clause push-down

WHERE clauses in the original query are pushed-down into the reconstructed query sent to the foreign server.

To push-down a condition, it must consist of only the following node types. For this purpose, we check each element in RelOptInfo.baserestrictinfo list. If there are conditions which can't be pushed down, the remote server will send rows without the conditions, and the local server will evaluate the rows and ignore rows which don't satisfy the conditions.

Element

Tag name

Note

Constant value

Const

Table column reference

Var

Array of some type

Array

expression like "'{1, 2, 3}'"

External parameter

Param

"External" means that "Param.paramkind == PARAM_EXTERNAL"

Bool expression

BoolExpr

expressions such as "A AND B", "A OR B", "NOT A"

NULL test

NullTest

expressions like "IS [NOT] NULL"

Operator

OpExpr

pg_operator.opcode MUST be a IMMUTABLE function

DISTINCT operator

DistinctExpr

expressions like "A IS DISTINCT FROM B"

pg_operator.opcode MUST be a IMMUTABLE function

Scalar array operator

ScalarArrayOpExpr

expressions such as "ANY (...)", "ALL (...)"

pg_operator.opcode MUST be a IMMUTABLE function

Function call

FuncExpr

MUST be a IMMUTABLE function

Neither ORDER BY, LIMIT, OFFSET, GROUP BY nor HAVING is used in a foreign query.

Retrieving result tuples

This FDW switches method for retrieving result tuples according to estimated # of result rows.

If the estimated rows is less than the threshold, simple SELECT is used to retrieve all result at once in first call of Iterate() after Begin() or ReScan(). Otherwise, SQL-level cursor is created in that place, and result rows are retrieved when they were necessary.

Two numbers, minimum # of rows to use cursor and # of rows fetched in one FETCH call, are configurable via FDW option of SERVER and/or FOREIGN TABLE. If a option was specified on both object, latter overrides former.

We must ensure that PGresult is released explicitly in any case because libpq uses malloc rather than palloc. Copying results into a Tuplestorestate is a solution, which is used in contrib/dblink, but it needs extra memory during the copy and some overhead. Another solution is registering cleanup function to resource owner, and release PGresult in that cleanup function. This method has already been used to close libpq connection.

Open questions

There are still several issues in the FDW design and implementation:

Which should we export foreign connection management functions from?

Currently DISCARD ALL disconnects all of connections, but we might provide SQL functions to manage each foreign connection. We could export those functions from the core like pg_connect()/pg_disconnect(), or continue to use contrib/dblink if they are optional.

Resolved questions

pg_foreign_table.ftoptions vs. pg_class.reloptions

We could store ftserver and ftoptions into some fields in pg_class, ex. relam and reloptions, because we probably won't use those fields for foreign tables.

FdwRoutine vs. SETOF record function

Some of fdw routines are similar to SETOF record function. We could merge them or share some of the internal routines. However, it seems to be hard to use SRF instead of FdwRoutine because FDW needs to support a couple of utility functions; connect, disconnect, handle WHERE conditions, etc.

fdw_handler vs. function table like pg_am

FDW routines requires a set of functions. The fdw_handler can pack those functions in a C++ like interface. However, we have pg_am for index access methods, that is a table-based approach. Note that we probably need to write fdw routines with C because it accesses executor objects to extract expressions.

Which user identifier is appropriate to determine USER MAPPING ?

Current implementation uses OuterUserId but not CurrentUserId to determine USER MAPPING. Because OuterUserId is the role that the user specified explicitly with SET ROLE or SET SESSION AUTHORIZATOIN, on the other hand, CurrentUserId is changed implicitly during execution of a function which have been created with SECURITY DEFINER option. It would not be what the user expect that a access to a foreign table via a SECURITY-DEFINER-function uses the USER MAPPING which related to the owner of the function. Is this an appropriate specification ?

Locking a foreign table

Currently a foreign table can be locked in only ACCESS SHARE mode because only SELECT privilege can be granted on a foreign table. In normal table case, at least one of INSERT/UPDATE/DELETE privilege is required to lock in other modes. Should we relax the restriction if the target is a foreign server ? We must consider about recursive locking via table inheritance.

Future improvements

General

Smart planning

ANALYZE command can update pg_statistic and part of pg_class (reltuples and relpages) of the foreign tables with adding FDW routine Analyze(tableoid or tablename) which returns pg_statistic records for the foreign table.

The costs to access foreign data will be different from the cost to access local data even if the data definition and contents are same. GENERIC OPTION like cost_factor allow to tell the overhead to planner.

for SQL-based FDWs

JOINs of two foreign tables in the same server

They could be merged into one ForeignScan so that the foreign server can return the result after local JOINs in it.

Optimize SELECT clause

Some foreign scan need only a part of columns. Unnecessary columns in such a scan are omissible from the SELECT clause.

Support internal parameter

A certain kind of a plan, i.e. nested loop, generates internal parameter to pass value(s) from parent node to child node. The number of records acquired from an foreign server can be decreased by applying an internal parameter to external query.

This seems difficult in some cases, because value of internal parameter is determined after fetching tuple from a relation.

Optimize parameter

Some foreign scan uses only a part of parameters of EXECUTE statement. Unused parameters are omissible from the parameter of PQexecParams(). And parameters can be passed in binary format to avoid conversion between text and binary.

Support cursor mode for huge result

Currently libpq does not support protocol level cursor, so the FDW for PostgreSQL executes SELECT statement directly via PQexecParams() and retrieves all tuples at once. If parameterized cursor is supported, the FDW for PostgreSQL will be able to retrieve a part of the result at a time to improve response.

Push-down WHERE clause including CURRENT_TIMESTAMP

Rewriting query like pgpool, or replacing the FuncExpr node with a Const node representing the result of CURRENT_TIMESTAMP.