Wednesday, March 18, 2015

Parallel Sequential Scan for PostgreSQL 9.5

Amit Kapila and I have been working very hard to make parallel sequential scan ready to commit to PostgreSQL 9.5. It is not all there yet, but we are making very good progress. I'm very grateful to everyone in the PostgreSQL community who has helped us with review and testing, and I hope that more people will join the effort. Getting a feature of this size and complexity completed is obviously a huge undertaking, and a significant amount of work remains to be done. Not a whole lot of brand-new code remains to be written, I hope, but there are known issues with the existing patches where we need to improve the code, and I'm sure there are also bugs we haven't found yet.

Currently, if you'd like to test out parallel sequential scan, you need to look at four different patches. Most of these patches are being updated at a rather brisk pace, so by the time you read this the versions mentioned here may no longer be the latest ones. But right now, I think the latest versions are:

parallel-mode-v8.1.patch. This patch introduces the basic infrastructure we need to do work in parallel in PostgreSQL. It allows a process to request that the postmaster launch a given number of background workers, and it automatically arranges to share important pieces of backend-private state such as GUC settings and snapshots with those workers. This infrastructure is intended to be used not only for parallel sequential scan, or even for parallel query more generally, but also for parallel utility commands (e.g. parallel CLUSTER or parallel VACUUM). It could even be used by user-defined functions that spin up parallel workers for particularly compute-intensive tasks. In general, I think this patch is in pretty good shape. Andres Freund has concerns that the approach I have taken to handling heavyweight locking may not be robust, and this was certainly a valid criticism of earlier versions, but I have tightened it up quite a bit since then. Hopefully it will pass muster.

assess-parallel-safety-v4.patch. This patch introduces a framework for deciding whether a particular query is safe to run in parallel mode. It works by classifying functions as parallel-safe (meaning that the use of that function imposes no restrictions on the use of parallelism), parallel-restricted (meaning that a parallel query can use that function, but the function itself must be executed in the parallel group leader, not one of the workers), or parallel-unsafe (meaning that a query that includes this function cannot use parallelism at all). An example of the last category is setval(). Right now, no process involved in a parallel operation can perform any data writes, so if we choose a parallel plan for a query containing setval(), it will fail at execution time.
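To make the three categories concrete, here is a minimal sketch in Python of how a query's overall classification could be derived from the functions it calls. This is an illustration only, not the code in the patch: the names ParallelHazard, FUNCTION_HAZARD, and assess_query are hypothetical, the markings for pure_fn and leader_only_fn are invented, and treating an unknown function as parallel-unsafe is my own assumption. Only the setval() classification comes from the description above.

```python
from enum import IntEnum


class ParallelHazard(IntEnum):
    """Ordered so that a larger value is more restrictive."""
    SAFE = 0        # no restrictions on the use of parallelism
    RESTRICTED = 1  # may appear in a parallel query, but must run in the leader
    UNSAFE = 2      # the query cannot use parallelism at all


# Hypothetical catalog of function markings. Only setval() is taken from
# the text; the other two names are made up for illustration.
FUNCTION_HAZARD = {
    "pure_fn": ParallelHazard.SAFE,
    "leader_only_fn": ParallelHazard.RESTRICTED,
    "setval": ParallelHazard.UNSAFE,
}


def assess_query(function_names):
    """Return the most restrictive hazard among the functions a query uses.

    Assumption: a function with no known marking is treated as UNSAFE.
    """
    worst = ParallelHazard.SAFE
    for name in function_names:
        worst = max(worst, FUNCTION_HAZARD.get(name, ParallelHazard.UNSAFE))
    return worst


if __name__ == "__main__":
    print(assess_query(["pure_fn"]).name)
    print(assess_query(["pure_fn", "leader_only_fn"]).name)
    print(assess_query(["pure_fn", "setval"]).name)
```

The point of ordering the categories is that a single pass over the query can keep a running maximum: one parallel-unsafe function is enough to rule out a parallel plan for the whole query.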

There are a couple of thorny problems around this patch. One is that the patch currently enables parallelism only when the simple query protocol is used. This is because parallelism can only be used when the query will be run to completion. If the client sends a Query (Q) message, that always runs the query all the way through. If the client sends Parse (P), Bind (B), and Execute (E) messages, the Execute message could specify a row count, meaning that we should not run the query to completion, but only until that many rows are generated. libpq doesn't actually support this, but the underlying wire protocol does, and some drivers may be relying on it. Another question is whether this patch will add too much overhead in cases where parallelism is not used.
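For readers who have not looked at the wire protocol, the following Python sketch builds an Execute message by hand to show where the row count lives. The field layout (type byte 'E', a length word that counts itself, a null-terminated portal name, and a 32-bit maximum row count where zero means no limit) follows the documented frontend/backend protocol. The helper names build_execute_message and runs_to_completion are hypothetical, and the second function is only a simplified restatement of the concern above, not logic from the patch.

```python
import struct


def build_execute_message(portal: str = "", max_rows: int = 0) -> bytes:
    """Encode an extended-protocol Execute message.

    max_rows == 0 asks the server to return all rows; a positive value
    asks it to stop after that many rows and suspend the portal.
    """
    body = portal.encode("utf-8") + b"\x00" + struct.pack("!i", max_rows)
    # The length word counts itself (4 bytes) but not the type byte.
    return b"E" + struct.pack("!i", len(body) + 4) + body


def runs_to_completion(max_rows: int) -> bool:
    """Simplified view: only an unlimited Execute runs the query all the way through."""
    return max_rows == 0


if __name__ == "__main__":
    print(build_execute_message("", 0).hex())
    print(build_execute_message("", 50).hex())
    print(runs_to_completion(0), runs_to_completion(50))
```

A driver that fetches in batches would send the second form repeatedly against the same portal, which is exactly the case where the server cannot assume the query will be run to completion.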

parallel-heap-scan.patch. This patch provides a way for several cooperating backends to perform a coordinated sequential scan of some relation. Your first reaction might be to think this is the payoff patch, but it's not, because it's exposing a C API for this functionality, not an SQL one. It turns out that, with the infrastructure provided by the parallel-mode patch, this is actually quite simple. The hard part is making the functionality visible at the SQL level.
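As a rough picture of what a coordinated scan means, here is a toy Python sketch in which several workers share one scan position and each repeatedly claims the next unscanned block. It is an assumption on my part that handing out blocks from a shared counter is a reasonable way to model the coordination; the sketch uses threads and a lock rather than backends and shared memory, and SharedScanState, scan_worker, and parallel_scan are hypothetical names, not the C API the patch exposes.

```python
import threading


class SharedScanState:
    """Scan position shared by all cooperating workers (hypothetical)."""

    def __init__(self, nblocks):
        self.nblocks = nblocks
        self._next_block = 0
        self._lock = threading.Lock()

    def claim_next_block(self):
        """Return the next unclaimed block number, or None when the scan is done."""
        with self._lock:
            if self._next_block >= self.nblocks:
                return None
            block = self._next_block
            self._next_block += 1
            return block


def scan_worker(state, relation, out):
    """Scan blocks until none remain, appending every tuple to this worker's list."""
    while True:
        block = state.claim_next_block()
        if block is None:
            break
        out.extend(relation[block])


def parallel_scan(relation, nworkers):
    """relation is a list of blocks; each block is a list of tuples."""
    state = SharedScanState(len(relation))
    results = [[] for _ in range(nworkers)]
    threads = [
        threading.Thread(target=scan_worker, args=(state, relation, results[i]))
        for i in range(nworkers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    blocks = [[1, 2], [3], [4, 5, 6], [7]]
    per_worker = parallel_scan(blocks, 3)
    print(sorted(t for part in per_worker for t in part))
```

Because each block is claimed exactly once, every tuple is returned by exactly one worker, and the union of the per-worker results is the full scan. Which worker gets which block is not deterministic.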

parallel_seqscan_v10.patch. This one is the payoff patch. It introduces two new executor nodes, one called Funnel and another called Partial Seq Scan. Funnel is actually a generic kind of node that will be useful for other forms of parallelism we may want to introduce in the future. A funnel node has a single child, which represents the operation to be run in parallel. It launches a designated number of background workers and consolidates the output of all of those background workers into a single stream of tuples.

The other new type of node is a Partial Seq Scan, which will appear under a Funnel. The idea is that each worker does a Partial Seq Scan, and together those partial scans add up to a full scan. The Funnel consolidates those partial results into a single stream of output tuples.
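The shape of the Funnel can be sketched in a few lines of Python: several producers each run their share of the work, and a single consumer merges whatever they emit into one stream. This is a toy model under stated assumptions, not the executor code. It uses threads and an in-process queue where the real thing uses background workers, and the funnel function and its partial_scans argument are hypothetical names. In the patch the Funnel has a single child that every worker runs; here each worker's partial scan is simply passed in as its own iterable.

```python
import queue
import threading

_DONE = object()  # sentinel a worker posts when its partial scan is exhausted


def funnel(partial_scans):
    """Yield the tuples of every partial scan as a single stream.

    Tuples from one worker keep their relative order; tuples from
    different workers interleave in whatever order they arrive.
    """
    tuples = queue.Queue()

    def run(scan):
        try:
            for tup in scan:
                tuples.put(tup)
        finally:
            tuples.put(_DONE)

    workers = [threading.Thread(target=run, args=(scan,)) for scan in partial_scans]
    for w in workers:
        w.start()

    remaining = len(workers)
    while remaining > 0:
        item = tuples.get()
        if item is _DONE:
            remaining -= 1
        else:
            yield item

    for w in workers:
        w.join()


if __name__ == "__main__":
    scans = [iter([1, 2]), iter([3]), iter([4, 5, 6])]
    print(sorted(funnel(scans)))
```

Nothing in the consumer depends on the producers being sequential scans, which is why a node of this kind can be reused for other forms of parallelism.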

This patch also introduces a bunch of infrastructure which will be needed for any kind of parallel query, though not necessarily for other kinds of parallel operations such as parallel utility statements. For example, it makes parameters passed to the query visible inside all of the child backends, and it adds infrastructure for transporting the portion of the query to be executed by the parallel worker from the parallel leader to each worker. Currently, this will always be a Partial Seq Scan node, but it is generic enough to transport any other plan node we might want to push down to a worker in the future.

The current version of this patch, v10, is not yet integrated with the assess-parallel-safety patch. Amit is working on correcting that, and is also working on fixing a number of bugs that have been reported. Expect a new version soon. This patch still needs some more refactoring, and there are more mundane decisions that need to be made as well, like how the costing should work. Still, Amit has made great progress in making this infrastructure much more general and improving the structure of it over the last few weeks, and I am excited about it.

It remains to be seen how much of this work we will be able to incorporate into PostgreSQL 9.5. Certainly, a good deal of work remains to be done. On the other hand, a large portion of this code is quite stable, and we seem to be making fairly rapid progress. Even if we are not able to hammer out the remaining issues in time for PostgreSQL 9.5, I am quite hopeful that it will not take us too much longer than that. I believe that parallel query is an important part of the future of PostgreSQL, and I believe that having a fully working - if basic - implementation of parallel query for PostgreSQL is not too far off. Once we have that, we can build on it: the sky is the limit.