My motivation is to collect extra run-time statistics specific to a particular ForeignScan/CustomScan, beyond the standard Instrumentation; in my case, the DMA transfer rate or the execution time of GPU kernels.

A per-node DSM toc is one of the best ways to return run-time statistics to the master backend, because the FDW/CSP can allocate a region of arbitrary length according to its needs. It is quite easy to request. One problem, however, is that the per-node DSM toc has already been released by the time ExecEndNode() is called on the child node of Gather.

This patch allows extensions to get control in the master backend's context after all the worker nodes have finished but before the DSM segment is released. If the FDW/CSP keeps its own statistics in the segment, it can move them to a private memory area for EXPLAIN output or some other purpose.
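
To illustrate the ordering, here is a minimal standalone sketch in plain C. It does not use any PostgreSQL API: GpuStats, ScanState, scan_parallel_finish() and parallel_finish() are hypothetical names, and the DSM segment is simulated with malloc()/free(). It only shows that the callback has to run after the workers are done and before the segment goes away.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical statistics an extension might keep in shared memory. */
typedef struct GpuStats
{
    double      dma_bytes;
    double      kernel_msec;
} GpuStats;

/* Stand-in for a scan node's state; NOT a PostgreSQL structure. */
typedef struct ScanState
{
    GpuStats   *shared;     /* points into the simulated DSM segment */
    GpuStats    saved;      /* private copy that outlives the segment */
    int         has_saved;
} ScanState;

/* Hypothetical callback: workers are done, segment is still mapped. */
static void
scan_parallel_finish(ScanState *node)
{
    assert(node->shared != NULL);
    memcpy(&node->saved, node->shared, sizeof(GpuStats));
    node->has_saved = 1;
}

/* Simulated leader-side shutdown: run the callback, then release. */
static void
parallel_finish(ScanState *node, void *segment)
{
    scan_parallel_finish(node);
    free(segment);
    node->shared = NULL;    /* the shared copy is gone from here on */
}

/* Walk through the lifecycle; returns kernel_msec read after release. */
static double
run_demo(void)
{
    GpuStats   *segment = malloc(sizeof(GpuStats));
    ScanState   node = {0};

    if (segment == NULL)
        return -1.0;
    segment->dma_bytes = 4096.0;    /* pretend a worker wrote these */
    segment->kernel_msec = 12.5;
    node.shared = segment;

    parallel_finish(&node, segment);
    return node.has_saved ? node.saved.kernel_msec : -1.0;
}
```

After parallel_finish() returns, only node.saved is safe to read, which is what the EXPLAIN output code would look at in this sketch.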

One design consideration is whether the hook should be called from ExecParallelRetrieveInstrumentation() or ExecParallelFinish(). The former is a function that retrieves the standard Instrumentation information, so it is valid only under EXPLAIN ANALYZE. On the other hand, if we put the entrypoint in ExecParallelFinish(), extensions can get control regardless of EXPLAIN ANALYZE; however, it also needs an extra planstate_tree_walker().
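
The trade-off between the two entrypoints can be sketched with a toy plan tree. Again this is plain C with hypothetical names (PlanNode, finish_walker(), retrieve_instrumentation()); it is not the real planstate_tree_walker(), only a model of "callback tied to EXPLAIN ANALYZE" versus "callback reached by a separate walk".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy plan-state node; NOT a PostgreSQL structure. */
typedef struct PlanNode
{
    struct PlanNode *left;
    struct PlanNode *right;
    void        (*finish_cb) (struct PlanNode *);   /* NULL if none */
    int         finished;
} PlanNode;

/* Hypothetical per-node callback: just records that it ran. */
static void
mark_finished(PlanNode *n)
{
    assert(n != NULL);
    n->finished++;
}

/* Dedicated walk at finish time; returns how many callbacks ran. */
static int
finish_walker(PlanNode *node)
{
    int         calls = 0;

    if (node == NULL)
        return 0;
    if (node->finish_cb != NULL)
    {
        node->finish_cb(node);
        calls++;
    }
    calls += finish_walker(node->left);
    calls += finish_walker(node->right);
    return calls;
}

/* Piggyback on instrumentation retrieval: only runs under ANALYZE. */
static int
retrieve_instrumentation(PlanNode *root, bool analyze)
{
    if (!analyze)
        return 0;           /* callbacks are never reached */
    return finish_walker(root);
}

/* Two scans under a join; returns the number of callbacks invoked. */
static int
demo_calls(bool analyze, bool use_finish_entrypoint)
{
    PlanNode    scan1 = {NULL, NULL, mark_finished, 0};
    PlanNode    scan2 = {NULL, NULL, mark_finished, 0};
    PlanNode    join = {&scan1, &scan2, NULL, 0};

    return use_finish_entrypoint
        ? finish_walker(&join)
        : retrieve_instrumentation(&join, analyze);
}
```

In this model, the first option costs nothing extra but is silent without ANALYZE, while the second always reaches both scan nodes at the price of one more tree walk.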