The set of functions that need parallel-safety adjustments isn't the
same in 9.6 as 10, so I shouldn't have blindly back-patched that list.
Adjust as needed. Also, provide examples of the commands to issue.

In 9.6, we moved a number of functions over to using the GRANT system to
control access instead of having hard-coded superuser checks.
As it turns out, adminpack was creating another function in the catalog
for one of those backend functions where the superuser check was
removed, specifically pg_rotate_logfile(), but it didn't get the memo
about having to REVOKE EXECUTE on the alternative-name function
(pg_logfile_rotate()), meaning that in any installations with adminpack
on 9.6 and higher, any user is able to run the pg_logfile_rotate()
function, which then calls pg_rotate_logfile() and rotates the logfile.
Fix by adding a new version of adminpack (1.1) which handles the REVOKE.
As this function should have only been available to the superuser, this
is a security issue, albeit a minor one.
Security: CVE-2018-1115

Python 3.7 removes the trailing comma in the repr() of
BaseException (see <https://bugs.python.org/issue30399>), leading to
test output differences. Work around that by composing the equivalent
test output in a more manual way.

If a continuation record is split so that its first half has already been
removed from the master, and is only present in pg_wal, and there is a
recycled WAL segment in the standby server that looks like it would
contain the second half, recovery would get stuck. The code in
XLogPageRead() incorrectly started streaming at the beginning of the
WAL record, even if we had already read the first page.
Backpatch to 9.4. In principle, older versions have the same problem, but
without replication slots, there was no straightforward mechanism to
prevent the master from recycling old WAL that was still needed by standby.
Without such a mechanism, I think it's reasonable to assume that there's
enough slack in how many old segments are kept around to not run into this,
or you have a WAL archive.
Reported by Jonathon Nelson. Analysis and patch by Kyotaro HORIGUCHI, with
some extra comments by me.
Discussion: https://www.postgresql.org/message-id/CACJqAM3xVz0JY1XFDKPP%2BJoJAjoGx%3DGNuOAshEDWCext7BFvCQ%40mail.gmail.com

Dan Wood diagnosed a long-standing problem that pages containing tuples
that are locked by multixacts containing live lockers may spuriously end
up as candidates for getting their all-visible flag set. This has the
long-term effect that multixacts remain unfrozen; this may previously
pass undetected, but since commit XYZ it would be reported as
"ERROR: found multixact 134100944 from before relminmxid 192042633"
because when a later vacuum tries to freeze the page it detects that a
multixact that should have gotten frozen, wasn't.
Dan proposed a (correct) patch that simply sets a variable to its
correct value, after a bogus initialization. But, per discussion, it
seems better coding to avoid the bogus initializations altogether, since
they could give rise to more bugs later. Therefore this fix rewrites
the logic a little bit to avoid depending on the bogus initializations.
This bug was part of a family introduced in 9.6 by commit a892234f830e;
later, commit 38e9f90a227d fixed most of them, but this one was
unnoticed.
Authors: Dan Wood, Pavan Deolasee, Álvaro Herrera
Reviewed-by: Masahiko Sawada, Pavan Deolasee, Álvaro Herrera
Discussion: https://postgr.es/m/84EBAC55-F06D-4FBE-A3F3-8BDA093CE3E3@amazon.com

This should have been done some years ago as promised in commit
c4dcdd0c2. However, better late than never.
Along the way do a little housekeeping, including using a simpler test
for the python version being tested, and removing a redundant subroutine
parameter. These changes only apply back to release 9.5.
Backpatch to all live releases.

The non-cosmetic changes involve teaching the "zic" tzdata compiler about
negative DST. While I'm not currently intending that we start using
negative-DST data right away, it seems possible that somebody would try
to use our copy of zic with bleeding-edge IANA data. So we'd better be
out in front of this change code-wise, even though it doesn't matter for
the data file we're shipping.
Discussion: https://postgr.es/m/30996.1525445902@sss.pgh.pa.us

If an interrupt arrives in the middle of FinishPreparedTransaction
and any callback decide to call CHECK_FOR_INTERRUPTS (e.g.
RemoveTwoPhaseFile can write a warning with ereport, which checks for
interrupts) then it's possible to leave current GXact undeleted.
Backpatch to all supported branches
Stas Kelvich
Discussion: ihttps://www.postgresql.org/message-id/3AD85097-A3F3-4EBA-99BD-C38EDF8D2949@postgrespro.ru

Per discussion, the value of fixing these bugs in the back branches
doesn't outweigh the downsides of changing corner-case behavior in
a minor release. Hence, revert commits 217d8f3a1 and 4d864de48 in
the v10 branch and the corresponding commits in 9.3-9.6.
Discussion: https://postgr.es/m/75DB81BEEA95B445AE6D576A0A5C9E936A73E741@BPXM05GP.gisp.nec.co.jp

statext_dependencies_load and statext_ndistinct_load were not up to snuff,
in addition to being randomly different from each other. In detail:
* Deserialize the fetched bytea value before releasing the syscache
entry, not after. This mistake causes visible regression test failures
when running with -DCATCACHE_FORCE_RELEASE. Since it's not exposed by
-DCLOBBER_CACHE_ALWAYS, I think there may be no production hazard here
at present, but it's at least a latent bug.
* Use DatumGetByteaPP not DatumGetByteaP to save a detoasting cycle
for short stats values; the deserialize function has to be, and is,
prepared for short-header values since its other caller uses PP.
* Use a test-and-elog for null stats values in both functions, rather
than a test-and-elog in one case and an Assert in the other. Perhaps
Asserts would be sufficient in both cases, but I don't see a good
argument for them being different.
* Minor cosmetic changes to make these functions more visibly alike.
Backpatch to v10 where this code came in.
Amit Langote, minor additional hacking by me
Discussion: https://postgr.es/m/1349aabb-3a1f-6675-9fc0-65e2ce7491dd@lab.ntt.co.jp

Support for version-0 calling convention was removed in PostgreSQL v10.
Change the SPI example to use version 1 convention, so that it actually
works.
Author: John Naylor
Discussion: https://www.postgresql.org/message-id/CAJVSVGVydmhLBdm80Rw3G8Oq5TnA7eCxUv065yoZfNfLbF1tzA@mail.gmail.com

While looking at a recent buildfarm failure in the ecpg tests, I wondered
why the pg_regress output claimed the stderr part of the test failed, when
the regression diffs were clearly for the stdout part. Looking into it,
the reason is that pg_regress.c's logic for iterating over three parallel
lists is wrong, and has been wrong since it was written: it advances the
"tag" pointer at a different place in the loop than the other two pointers.
Fix that.

Buildfarm results show that the modern POSIX rule that 1 ^ NaN = 1 is not
honored on *BSD until relatively recently, and really old platforms don't
believe that NaN ^ 0 = 1 either. (This is unsurprising, perhaps, since
SUSv2 doesn't require either behavior.) In hopes of getting to platform
independent behavior, let's deal with all the NaN-input cases explicitly
in dpow().
Note that numeric_power() doesn't know either of these special cases.
But since that behavior is platform-independent, I think it should be
addressed separately, and probably not back-patched.
Discussion: https://postgr.es/m/75DB81BEEA95B445AE6D576A0A5C9E936A73E741@BPXM05GP.gisp.nec.co.jp

Per spec, the result of power() should be NaN if either input is NaN.
It appears that on some versions of Windows, the libc function does
return NaN, but it also sets errno = EDOM, confusing our code that
attempts to work around shortcomings of other platforms. Hence, add
guard tests to avoid substituting a wrong result for the right one.
It's been like this for a long time (and the odd behavior only appears
in older MSVC releases, too) so back-patch to all supported branches.
Dang Minh Huong, reviewed by David Rowley
Discussion: https://postgr.es/m/75DB81BEEA95B445AE6D576A0A5C9E936A73E741@BPXM05GP.gisp.nec.co.jp

Commit b75f467b6e removed the LogicalTapeAssignReadBufferSize() function,
but forgot to update this comment. The read buffer size is an argument to
LogicalTapeRewindForRead() now. Doesn't seem worth going into the details
in the file header comment, so remove the outdated sentence altogether.

The predecessor test boiled down to "PQserverVersion(NULL) >= 100000",
which is always false. No release includes that, so it could not have
reintroduced CVE-2018-1058. Back-patch to 9.4, like the addition of the
predecessor in commit 8d2814f274def85f39fbe997d454b01628cb5667.
Discussion: https://postgr.es/m/20180422215551.GB2676194@rfd.leadboat.com

Previously, you could partition by a boolean column as long as you
spelled the bound values as string literals, for instance FOR VALUES
IN ('t'). The trouble with this is that ruleutils.c printed that as
FOR VALUES IN (TRUE), which is reasonable syntax but wasn't accepted by
the grammar. That results in dump-and-reload failures for such cases.
Apply a minimal fix that just causes TRUE and FALSE to be converted to
strings 'true' and 'false'. This is pretty grotty, but it's too late for
a more principled fix in v11 (to say nothing of v10). We should revisit
the whole issue of how partition bound values are parsed for v12.
Amit Langote
Discussion: https://postgr.es/m/e05c5162-1103-7e37-d1ab-6de3e0afaf70@lab.ntt.co.jp

EventTriggerTableRewrite crashed if there were table_rewrite triggers
present, but there had not been when the calling command started.
EventTriggerDDLCommandEnd called ddl_command_end triggers if present,
even if there had been no such triggers when the calling command started,
which would lead to a failure in pg_event_trigger_ddl_commands.
In both cases, fix by doing nothing; it's better to wait till the next
command when things will be properly initialized.
In passing, remove an elog(DEBUG1) call that might have seemed interesting
four years ago but surely isn't today.
We found this because of intermittent failures in the buildfarm. Thanks
to Alvaro Herrera and Andrew Gierth for analysis.
Back-patch to 9.5; some of this code exists before that, but the specific
hazards we need to guard against don't.
Discussion: https://postgr.es/m/5767.1523995174@sss.pgh.pa.us

M src/backend/commands/event_trigger.c

Change more places to be less trusting of RestrictInfo.is_pushed_down.

On further reflection, commit e5d83995e didn't go far enough: pretty much
everywhere in the planner that examines a clause's is_pushed_down flag
ought to be changed to use the more complicated behavior where we also
check the clause's required_relids. Otherwise we could make incorrect
decisions about whether, say, a clause is safe to use as a hash clause.
Some (many?) of these places are safe as-is, either because they are
never reached while considering a parameterized path, or because there
are additional checks that would reject a pushed-down clause anyway.
However, it seems smarter to just code them all the same way rather
than rely on easily-broken reasoning of that sort.
In support of that, invent a new macro RINFO_IS_PUSHED_DOWN that should
be used in place of direct tests on the is_pushed_down flag.
Like the previous patch, back-patch to all supported branches.
Discussion: https://postgr.es/m/f8128b11-c5bf-3539-48cd-234178b2314d@proxel.se

In some cases a clause attached to an outer join can be pushed down into
the outer join's RHS even though the clause is not degenerate --- this
can happen if we choose to make a parameterized path for the RHS. If
the clause ends up attached to a lower outer join, we'd misclassify it
as being a "join filter" not a plain "filter" condition at that node,
leading to wrong query results.
To fix, teach extract_actual_join_clauses to examine each join clause's
required_relids, not just its is_pushed_down flag. (The latter now
seems vestigial, or at least in need of rethinking, but we won't do
anything so invasive as redefining it in a bug-fix patch.)
This has been wrong since we introduced parameterized paths in 9.2,
though it's evidently hard to hit given the lack of previous reports.
The test case used here involves a lateral function call, and I think
that a lateral reference may be required to get the planner to select
a broken plan; though I wouldn't swear to that. In any case, even if
LATERAL is needed to trigger the bug, it still affects all supported
branches, so back-patch to all.
Per report from Andreas Karlsson. Thanks to Andrew Gierth for
preliminary investigation.
Discussion: https://postgr.es/m/f8128b11-c5bf-3539-48cd-234178b2314d@proxel.se

The buffer was 100 bytes long, which is barely sufficient when the
version string gets longer (such as by configure --with-extra-version).
Set it to MAXPGPATH.
Author: Nikhil Sontakke
Discussion: https://postgr.es/m/CAMGcDxfLfpYU_Jru++L6ARPCOyxr0W+2O3Q54TDi5XdYeU36ow@mail.gmail.com

Commit 54eff5311 did not account for the possibility that we'd have
a transaction snapshot due to default_transaction_isolation being
set high enough to require one. The transaction snapshot is enough
to hold back our advertised xmin and thus risk deadlock anyway.
The only way to get rid of that snap is to start a new transaction,
so let's do that instead. Also throw in an assert checking that we
really have gotten to a state where no xmin is being advertised.
Back-patch to 9.4, like the previous commit.
Discussion: https://postgr.es/m/CAMkU=1ztk3TpQdcUNbxq93pc80FrXUjpDWLGMeVBDx71GHNwZQ@mail.gmail.com

spg_text_leaf_consistent() supposed that it should compare only
Min(querylen, entrylen) bytes of the two strings, and then deal with
any excess bytes in one string or the other by assuming the longer
string is greater if the prefixes are equal. Quite aside from the
fact that that's just wrong in some locales (e.g., 'ch' is not less
than 'd' in cs_CZ), it also risked passing incomplete multibyte
characters to strcoll(), with ensuing bad results.
Instead, just pass the full strings to varstr_cmp, and let it decide
what to do about unequal-length strings.
Fortunately, this error doesn't imply any index corruption, it's just
that searches might return the wrong set of entries.
Per report from Emre Hasegeli, though this is not his patch.
Thanks to Peter Geoghegan for review and discussion.
This code was born broken, so back-patch to all supported branches.
In HEAD, I failed to resist the temptation to do a bit of cosmetic
cleanup/pgindent'ing on 710d90da1, too.
Discussion: https://postgr.es/m/CAE2gYzzb6K51VnTq5i5p52z+j9p2duEa-K1T3RrC_GQEynAKEg@mail.gmail.com

Spelling access(2)'s second argument as "2" is just horrid.
POSIX makes no promises as to the numeric values of W_OK and related
macros. Even if it accidentally works as intended on every supported
platform, it's still unreadable and inconsistent with adjacent code.
In passing, don't spell "NULL" as "0" either. Yes, that's legal C;
no, it's not project style.
Back-patch, just in case the unportability is real and not theoretical.
(Most likely, even if a platform had different bit assignments for
access()'s modes, there'd not be an observable behavior difference
here; but I'm being paranoid today.)

M contrib/adminpack/adminpack.c

In libpq, free any partial query result before collecting a server error.

We'd throw away the partial result anyway after parsing the error message.
Throwing it away beforehand costs nothing and reduces the risk of
out-of-memory failure. Also, at least in systems that behave like
glibc/Linux, if the partial result was very large then the error PGresult
would get allocated at high heap addresses, preventing the heap storage
used by the partial result from being released to the OS until the error
PGresult is freed.
In psql >= 9.6, we hold onto the error PGresult until another error is
received (for \errverbose), so that this behavior causes a seeming
memory leak to persist for awhile, as in a recent complaint from
Darafei Praliaskouski. This is a potential performance regression from
older versions, justifying back-patching at least that far. But similar
behavior may occur in other client applications, so it seems worth just
back-patching to all supported branches.
Discussion: https://postgr.es/m/CAC8Q8tJ=7cOkPePyAbJE_Pf691t8nDFhJp0KZxHvnq_uicfyVg@mail.gmail.com

NISortAffixes() compared successive compound affixes incorrectly,
thus possibly failing to merge identical affixes, or (less likely)
merging ones that shouldn't be merged. The user-visible effects
of this are unclear, to me anyway.
Per bug #15150 from Alexander Lakhin. It's been broken for a long time,
so back-patch to all supported branches.
Arthur Zakirov
Discussion: https://postgr.es/m/152353327780.31225.13445405496721177988@wrigleys.postgresql.org

We were using CurrentMemoryContext to put the partsupfunc fmgr_info
into, which isn't right, because we want the PartitionKey as a whole to
be in the isolated Relation->rd_partkeycxt context. This can cause a
crash with user-defined support functions in the operator classes used
by partitioning keys. (Maybe this can cause problems with core-supplied
opclasses too, not sure.)
This is demonstrably broken in Postgres 10, too, but the initial
proposed fix runs afoul of a problem discussed back when 8a0596cb656e
("Get rid of copy_partition_key") reorganized that code: namely that it
is possible to jump out of RelationBuildPartitionKey because of some
error and leave a dangling memory context child of CacheMemoryContext.
Also, while reviewing this I noticed that the removed-in-pg11
copy_partition_key was doing something wrong, unfixed in pg10, namely
doing memcpy() on the FmgrInfo, which is bogus (should be doing
fmgr_info_copy). Therefore, in branch pg10, the sane fix seems to be to
backpatch both the aforementioned 8a0596cb656e and its followup
be2343221fb7 ("Protect against hypothetical memory leaks in
RelationGetPartitionKey"), so do that, then apply the fmgr_info memcxt
bugfix on top.
Add a test case exercising btree-based custom operator classes, which
causes a crash prior to this fix. This is not a security problem,
because in order to create an operator class you need superuser
privileges anyway.
Authors: Álvaro Herrera and Amit Langote
Reported and diagnosed by: Amit Langote
Discussion: https://postgr.es/m/3041e853-b1dd-a0c6-ff21-7cc5633bffd0@lab.ntt.co.jp

The nextOid value is from the start of the checkpoint and may well be stale
compared to values from more recent XLOG_NEXTOID records. Previously, we
adopted it anyway, allowing the OID counter to go backwards during a crash.
While this should be harmless, it contributed to the severity of the bug
fixed in commit 0408e1ed5, by allowing duplicate TOAST OIDs to be assigned
immediately following a crash. Without this error, that issue would only
have arisen when TOAST objects just younger than a multiple of 2^32 OIDs
were deleted and then not vacuumed in time to avoid a conflict.
Pavan Deolasee
Discussion: https://postgr.es/m/CABOikdOgWT2hHkYG3Wwo2cyZJq2zfs1FH0FgX-=h4OLosXHf9w@mail.gmail.com

When selecting a new OID, we take care to avoid picking one that's already
in use in the target table, so as not to create duplicates after the OID
counter has wrapped around. However, up to now we used SnapshotDirty when
scanning for pre-existing entries. That ignores committed-dead rows, so
that we could select an OID matching a deleted-but-not-yet-vacuumed row.
While that mostly worked, it has two problems:
* If recently deleted, the dead row might still be visible to MVCC
snapshots, creating a risk for duplicate OIDs when examining the catalogs
within our own transaction. Such duplication couldn't be visible outside
the object-creating transaction, though, and we've heard few if any field
reports corresponding to such a symptom.
* When selecting a TOAST OID, deleted toast rows definitely *are* visible
to SnapshotToast, and will remain so until vacuumed away. This leads to
a conflict that will manifest in errors like "unexpected chunk number 0
(expected 1) for toast value nnnnn". We've been seeing reports of such
errors from the field for years, but the cause was unclear before.
The fix is simple: just use SnapshotAny to search for conflicting rows.
This results in a slightly longer window before object OIDs can be
recycled, but that seems unlikely to create any large problems.
Pavan Deolasee
Discussion: https://postgr.es/m/CABOikdOgWT2hHkYG3Wwo2cyZJq2zfs1FH0FgX-=h4OLosXHf9w@mail.gmail.com

This fixes a bug whereby the st_appname, st_clienthostname, and
st_activity_raw fields for auxiliary processes point beyond the end of
their respective shared memory segments. As a result, the application_name
of a backend might show up as the client hostname of an auxiliary process.
Backpatch to v10, where this bug was introduced, when the auxiliary
processes were added to the array.
Author: Edmund Horner
Reviewed-by: Michael Paquier
Discussion: https://www.postgresql.org/message-id/CAMyN-kA7aOJzBmrYFdXcc7Z0NmW%2B5jBaf_m%3D_-77uRNyKC9r%3DA%40mail.gmail.com

The other strings, application_name and query string, were snapshotted to
local memory in pgstat_read_current_status(), but we forgot to do that for
client hostnames. As a result, the client hostname would appear to change in
the local copy, if the client disconnected.
Backpatch to all supported versions.
Author: Edmund Horner
Reviewed-by: Michael Paquier
Discussion: https://www.postgresql.org/message-id/CAMyN-kA7aOJzBmrYFdXcc7Z0NmW%2B5jBaf_m%3D_-77uRNyKC9r%3DA%40mail.gmail.com

One improbable error-exit path in this function used close() where
it should have used CloseTransientFile(). This is unlikely to be
hit in the field, and I think the consequences wouldn't be awful
(just an elog(LOG) bleat later). But a bug is a bug, so back-patch
to 9.4 where this code came in.
Pan Bian
Discussion: https://postgr.es/m/152056616579.4966.583293218357089052@wrigleys.postgresql.org

Due to sloppy division of changes between f50c80dbb (which was not
back-patched) and 563a053bd, this piece of code was wrongly backpatched to
REL_10_STABLE and REL9_6_STABLE. This code never causes real error because
its condition is never satisfied, but it's a dead code, which needs to be
removed.
Alexander Korotkov per gripe from Tom Lane

This section confusingly used both "infile" and "outfile" to refer
to the same file, i.e. the textual output of pg_dump. Use "dumpfile"
for both cases, per suggestion from Jonathan Katz.
Discussion: https://postgr.es/m/152311295239.31235.6487236091906987117@wrigleys.postgresql.org

The atomics code asserts proper alignment in various places. That's
mainly because the alignment of 64bit integers is not sufficient for
atomic operations on all platforms. Some ABIs only have four byte
alignment, but don't have atomic behavior when crossing page
boundaries.
The flags code isn't affected by that however, as the type alignment
always is sufficient for atomic operations. Nevertheless the code
asserted alignment requirements. Before 8c3debbb it was only broken on
hppa, after it probably affect further platforms.
Thus remove the assertions for pg_atomic_flag operators.
Per buildfarm animal pademelon.
Discussion: https://postgr.es/m/7223.1523124425@sss.pgh.pa.us
Backpatch: 9.5-

The atomics fallback implementation for pg_atomic_flag was broken,
returning the inverted value from pg_atomic_test_set_flag(). This was
unnoticed because a) atomic flags were unused until recently b) the
test code wasn't run when the fallback implementation was in
use (because it didn't allow to test for some edge cases).
Fix the bug, and improve the fallback so it has the same behaviour as
the non-fallback implementation in the problematic edge cases. That
breaks ABI compatibility in the back branches when fallbacks are in
use, but given they were broken until now...
Author: Andres Freund
Reported-by: Daniel Gustafsson
Discussion:
https://postgr.es/m/FB948276-7B32-4B77-83E6-D00167F8EEB4@yesql.se
https://postgr.es/m/20180406233854.uni2h3mbnveczl32@alap3.anarazel.de
Backpatch: 9.5-, where the atomics abstraction was introduced.

The previous coding inadvertently checked the constraints for the
partitioned table rather than the target partition, which could
lead to data in a partition that fails to satisfy some constraint
on that partition. This problem seems to date back to when
table partitioning was introduced; prior to that, there was only
one target table for a COPY, so the problem didn't occur, and the
code just didn't get updated.
Etsuro Fujita, reviewed by Amit Langote and Ashutosh Bapat
Discussion: https://postgr.es/message-id/5ABA4074.1090500%40lab.ntt.co.jp

Avoid storing the result of PQsocket() in a pgsocket variable; it's
declared as int, and the no-socket test is properly written as "x < 0"
not "x == PGINVALID_SOCKET". This accidentally had no bad effect
because we never got to init_slot() with a bad connection, but it's
still wrong.
Actually, it seems like we should avoid storing the result for a long
period at all. The function's not so expensive that it's worth avoiding,
and the existing coding technique here would fail if anyone tried to
PQreset the connection during the life of the program. Hence, just
re-call PQsocket every time we construct a select(2) mask.
Speaking of select(), GetIdleSlot imagined that it could compute the
select mask once and continue to use it over multiple calls to
select_loop(), which is pretty bogus since that would stomp on the
mask on return. This could only matter if the function's outer loop
iterated more than once, which is unlikely (it'd take some connection
receiving data, but not enough to complete its command). But if it
did happen, we'd acquire "tunnel vision" and stop watching the other
connections for query termination, with the effect of losing parallelism.
Another way in which GetIdleSlot could lose parallelism is that once
PQisBusy returns false, it would lock in on that connection and do
PQgetResult until that returns NULL; in some cases that could result
in blocking. (Perhaps this can never happen in vacuumdb due to the
limited set of commands that it can issue, but I'm not quite sure
of that, and even if true today it's not a future-proof assumption.)
Refactor the code to do that properly, so that it risks blocking in
PQgetResult only in cases where we need to wait anyway.
Another loss-of-parallelism problem, which *is* easily demonstrable,
is that any setup queries issued during prepare_vacuum_command() were
always issued on the last-to-be-created connection, whether or not
that was idle. Long-running operations on that connection thus
prevented issuance of additional operations on the other ones, except
in the limited cases where no preparatory query was needed. Instead,
wait till we've identified a free connection and use that one.
Also, avoid core dump due to undersized malloc request in the case
that no tables are identified to be vacuumed.
The bogus no-socket test was noted by CharSyam, the other problems
identified in my own code review. Back-patch to 9.5 where parallel
vacuumdb was introduced.
Discussion: https://postgr.es/m/CAMrLSE6etb33-192DTEUGkV-TsvEcxtBDxGWG1tgNOMnQHwgDA@mail.gmail.com

M src/bin/scripts/vacuumdb.c

Fix bogus provolatile/proparallel markings on a few built-in functions.

Richard Yen reported that pg_upgrade failed if the target cluster had
force_parallel_mode = on, because binary_upgrade_create_empty_extension()
is marked parallel restricted, allowing it to be executed in parallel
mode, which complains because it tries to acquire an XID.
In general, no function that might try to modify database data should
be considered parallel safe or restricted, since execution of it might
force XID acquisition. We found several other examples of this mistake.
Furthermore, functions that execute user-supplied SQL queries or query
fragments, or pull data from user-supplied cursors, had better be marked
both volatile and parallel unsafe, because we don't know what the supplied
query or cursor might try to do. There were several tsquery and XML
functions that had the wrong proparallel marking for this, and some of
them were even mislabeled as to volatility.
All these bugs are old, dating back to 9.6 for the proparallel mistakes
and much further for the provolatile mistakes. We can't force a
catversion bump in the back branches, but we can at least ensure that
installations initdb'd in future have the right values.
Thomas Munro and Tom Lane
Discussion: https://postgr.es/m/CAEepm=2sNDScSLTfyMYu32Q=ob98ZGW-vM_2oLxinzSABGQ6VA@mail.gmail.com

After processing the filemap to build the list of chunks that will be
fetched from the source to rewing the target server, it is possible that
a file which was previously processed is removed from the source. A
simple example of such an occurence is a WAL segment which gets recycled
on the target in-between. When the filemap is processed, files not
categorized as relation files are first truncated to prepare for its
full copy of which is going to be taken from the source, divided into a
set of junks. However, for a recycled WAL segment, this would result in
a segment which has a zero-byte size. With such an empty file,
post-rewind recovery thinks that records are saved but they are actually
not because of the truncation which happened when processing the
filemap, resulting in data loss.
In order to fix the problem, make sure that files which are found as
removed on the source when receiving chunks of them are as well deleted
on the target server for consistency.
Back-patch to 9.5 where pg_rewind was added.
Author: Tsunakawa Takayuki
Reviewed-by: Michael Paquier
Reported-by: Tsunakawa Takayuki
Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8DAAA2%40G01JPEXMBYT05

tuplesort_gettupleslot() passed back tuples allocated in the tuplesort's
own memory context, even when the caller was responsible to free them.
This created a double-free hazard, because some callers might destroy
the tuplesort object (via tuplesort_end) before trying to clean up the
last returned tuple. To avoid this, change the API to specify that the
tuple is allocated in the caller's memory context. v10 and HEAD already
did things that way, but in 9.5 and 9.6 this is a live bug that can
demonstrably cause crashes with some grouping-set usages.
In 9.5 and 9.6, this requires doing an extra tuple copy in some cases,
which is unfortunate. But the amount of refactoring needed to avoid it
seems excessive for a back-patched change, especially since the cases
where an extra copy happens are less performance-critical.
Likewise change tuplesort_getdatum() to return pass-by-reference Datums
in the caller's context not the tuplesort's context. There seem to be
no live bugs among its callers, but clearly the same sort of situation
could happen in future.
For other tuplesort fetch routines, continue to allocate the memory in
the tuplesort's context. This is a little inconsistent with what we now
do for tuplesort_gettupleslot() and tuplesort_getdatum(), but that's
preferable to adding new copy overhead in the back branches where it's
clearly unnecessary. These other fetch routines provide the weakest
possible guarantees about tuple memory lifespan from v10 on, anyway,
so this actually seems more consistent overall.
Adjust relevant comments to reflect these API redefinitions.
Arguably, we should change the pre-9.5 branches as well, but since
there are no known failure cases there, it seems not worth the risk.
Peter Geoghegan, per report from Bernd Helmle. Reviewed by Kyotaro
Horiguchi; thanks also to Andreas Seltenreich for extracting a
self-contained test case.
Discussion: https://postgr.es/m/1512661638.9720.34.camel@oopsware.de

The listed numbers disagreed with the ones being used in the symbols;
but instead of just fixing the numbers in the comment, use the symbolic
name instead, which seems clearer.
This has been wrong all along, so apply back to 9.5 where BRIN was
introduced.
Reported-by: Tomas Vondra
Discussion: https://postgr.es/m/5ff514f2-8b1e-6366-b11c-8e2ed442562d@2ndquadrant.com

Section 10.5 didn't say explicitly that multiple UNIONs are resolved
pairwise. Since the resolution algorithm is described as taking any
number of inputs, readers might well think that a query like
"select x union select y union select z" would be resolved by
considering x, y, and z in one resolution step. But that's not what
happens (and I think that behavior is per SQL spec). Add an example
clarifying this point.
Per bug #15129 from Philippe Beaudoin.
Discussion: https://postgr.es/m/152196085023.32649.9916472370480121694@wrigleys.postgresql.org

For years, our makefiles have correctly observed that "there is no correct
way to write a rule that generates two files". However, what we did is to
provide empty rules that "generate" the secondary output files from the
primary one, and that's not right either. Depending on the details of
the creating process, the primary file might end up timestamped later than
one or more secondary files, causing subsequent make runs to consider the
secondary file(s) out of date. That's harmless in a plain build, since
make will just re-execute the empty rule and nothing happens. But it's
fatal in a VPATH build, since make will expect the secondary file to be
rebuilt in the build directory. This would manifest as "file not found"
failures during VPATH builds from tarballs, if we were ever unlucky enough
to ship a tarball with apparently out-of-date secondary files. (It's not
clear whether that has ever actually happened, but it definitely could.)
To ensure that secondary output files have timestamps >= their primary's,
change our makefile convention to be that we provide a "touch $@" action
not an empty rule. Also, make sure that this rule actually gets invoked
during a distprep run, else the hazard remains.
It's been like this a long time, so back-patch to all supported branches.
In HEAD, I skipped the changes in src/backend/catalog/Makefile, because
those rules are due to get replaced soon in the bootstrap data format
patch, and there seems no need to create a merge issue for that patch.
If for some reason we fail to land that patch in v11, we'll need to
back-fill the changes in that one makefile from v10.
Discussion: https://postgr.es/m/18556.1521668179@sss.pgh.pa.us

Count the number of tuples in the index honestly, instead of assuming
that it's the same as the number of tuples in the heap. (It might be
different if the index is partial.)
Back-patch to all supported versions.
Tomas Vondra
Discussion: https://postgr.es/m/3b3d8eac-c709-0d25-088e-b98339a1b28a@2ndquadrant.com

Count the number of tuples in the index honestly, instead of assuming
that it's the same as the number of tuples in the heap. (It might be
different if the index is partial.)
Fix counting of tuples in current index page, too. This error would
have led to failing to write out the final page of the index if it
contained exactly one tuple, so that the last tuple of the relation
would not get indexed.
Back-patch to 9.6 where contrib/bloom was added.
Tomas Vondra and Tom Lane
Discussion: https://postgr.es/m/3b3d8eac-c709-0d25-088e-b98339a1b28a@2ndquadrant.com

Code that prints out the contents of setconfig or proconfig arrays in
SQL format needs to handle GUC_LIST_QUOTE variables differently from
other ones, because for those variables, flatten_set_variable_args()
already applied a layer of quoting. The value can therefore safely
be printed as-is, and indeed must be, or flatten_set_variable_args()
will muck it up completely on reload. For all other GUC variables,
it's necessary and sufficient to quote the value as a SQL literal.
We'd recognized the need for this long ago, but mis-analyzed the
need slightly, thinking that all GUC_LIST_INPUT variables needed
the special treatment. That's actually wrong, since a valid value
of a LIST variable might include characters that need quoting,
although no existing variables accept such values.
More to the point, we hadn't made any particular effort to keep the
various places that deal with this up-to-date with the set of variables
that actually need special treatment, meaning that we'd do the wrong
thing with, for example, temp_tablespaces values. This affects dumping
of SET clauses attached to functions, as well as ALTER DATABASE/ROLE SET
commands.
In ruleutils.c we can fix it reasonably honestly by exporting a guc.c
function that allows discovering the flags for a given GUC variable.
But pg_dump doesn't have easy access to that, so continue the old method
of having a hard-wired list of affected variable names. At least we can
fix it to have just one list not two, and update the list to match
current reality.
A remaining problem with this is that it only works for built-in
GUC variables. pg_dump's list obvious knows nothing of third-party
extensions, and even the "ask guc.c" method isn't bulletproof since
the relevant extension might not be loaded. There's no obvious
solution to that, so for now, we'll just have to discourage extension
authors from inventing custom GUCs that need GUC_LIST_QUOTE.
This has been busted for a long time, so back-patch to all supported
branches.
Michael Paquier and Tom Lane, reviewed by Kyotaro Horiguchi and
Pavel Stehule
Discussion: https://postgr.es/m/20180111064900.GA51030@paquier.xyz

If there were multiple grouping sets, none of them empty, all of which
were unsortable, then an oversight in consider_groupingsets_paths led
to a null pointer dereference. Fix, and add a regression test for this
case.
Per report from Dang Minh Huong, though I didn't use their patch.
Backpatch to 10.x where hashed grouping sets were added.

The original coding of the SP-GiST scan traversalValue feature (commit
ccd6eb49a) arranged for traversal values to be stored in the query's main
executor context. That's fine if there's only one index scan per query,
but if there are many, we have a memory leak as successive scans create
new traversal values. Fix it by creating a separate memory context for
traversal values, which we can reset during spgrescan(). Back-patch
to 9.6 where this code was introduced.
In principle, adding the traversalCxt field to SpGistScanOpaqueData
creates an ABI break in the back branches. But I (tgl) have little
sympathy for extensions including spgist_private.h, so I'm not very
worried about that. Alternatively we could stick the new field at the
end of the struct in back branches, but that has its own downsides.
Anton Dignös, reviewed by Alexander Kuzmenkov
Discussion: https://postgr.es/m/CALNdv1jb6y2Te-m8xHLxLX12RsBmZJ1f4hESX7J0HjgyOhA9eA@mail.gmail.com

refresh_by_match_merge() has some issues in the way it builds a SQL
query to construct the "diff" table:
1. It doesn't require the selected unique index(es) to be indimmediate.
2. It doesn't pay attention to the particular equality semantics enforced
by a given index, but just assumes that they must be those of the column
datatype's default btree opclass.
3. It doesn't check that the indexes are btrees.
4. It's insufficiently careful to ensure that the parser will pick the
intended operator when parsing the query. (This would have been a
security bug before CVE-2018-1058.)
5. It's not careful about indexes on system columns.
The way to fix #4 is to make use of the existing code in ri_triggers.c
for generating an arbitrary binary operator clause. I chose to move
that to ruleutils.c, since that seems a more reasonable place to be
exporting such functionality from than ri_triggers.c.
While #1, #3, and #5 are just latent given existing feature restrictions,
and #2 doesn't arise in the core system for lack of alternate opclasses
with different equality behaviors, #4 seems like an issue worth
back-patching. That's the bulk of the change anyway, so just back-patch
the whole thing to 9.4 where this code was introduced.
Discussion: https://postgr.es/m/13836.1521413227@sss.pgh.pa.us

Jeff Janes discovered that commit 7ca25b7de made one of the queries run by
REFRESH MATERIALIZED VIEW CONCURRENTLY perform badly. The root cause is
bad cardinality estimation for correlated quals, but a principled solution
to that problem is some way off, especially since the planner lacks any
statistics about whole-row variables. Moreover, in non-error cases this
query produces no rows, meaning it must be run to completion; but use of
LIMIT 1 encourages the planner to pick a fast-start, slow-completion plan,
exactly not what we want. Remove the LIMIT clause, and instead rely on
the count parameter we pass to SPI_execute() to prevent excess work if the
query does return some rows.
While we've heard no field reports of planner misbehavior with this query,
it could be that people are having performance issues that haven't reached
the level of pain needed to cause a bug report. In any case, that LIMIT
clause can't possibly do anything helpful with any existing version of the
planner, and it demonstrably can cause bad choices in some cases, so
back-patch to 9.4 where the code was introduced.
Thomas Munro
Discussion: https://postgr.es/m/CAMkU=1z-JoGymHneGHar1cru4F1XDfHqJDzxP_CtK5cL3DOfmg@mail.gmail.com

We make some changes to ModifyTableState and the EState it uses whenever
we route tuples to partitions; but we weren't restoring properly in all
cases, possibly causing crashes when partitions with different tuple
descriptors are targeted by tuples inserted in the same command.
Refactor some code, creating ExecPrepareTupleRouting, to encapsulate the
needed state changing logic, and have it invoked one level above its
current place (ie. put it in ExecModifyTable instead of ExecInsert);
this makes it all more readable.
Add a test case to exercise this.
We don't support having views as partitions; and since only views can
have INSTEAD OF triggers, there is no point in testing for INSTEAD OF
when processing insertions into a partitioned table. Remove code that
appears to support this (but which is actually never relevant.)
In passing, fix location of some very confusing comments in
ModifyTableState.
Reported-by: Amit Langote
Author: Etsuro Fujita, Amit Langote
Discussion: https://postgr/es/m/0473bf5c-57b1-f1f7-3d58-455c2230bc5f@lab.ntt.co.jp

If a view lacks an INSTEAD OF trigger, DML on it can only work by rewriting
the command into a command on the underlying base table(s). Then we will
fire triggers attached to those table(s), not those for the view. This
seems appropriate from a consistency standpoint, but nowhere was the
behavior explicitly documented, so let's do that.
There was some discussion of throwing an error or warning if a statement
trigger is created on a view without creating a row INSTEAD OF trigger.
But a simple implementation of that would result in dump/restore ordering
hazards. Given that it's been like this all along, and we hadn't heard
a complaint till now, a documentation improvement seems sufficient.
Per bug #15106 from Pu Qun. Back-patch to all supported branches.
Discussion: https://postgr.es/m/152083391168.1215.16892140713507052796@wrigleys.postgresql.org

In e170b8c8, protection against modified search_path was added. However,
PostgreSQL versions prior to 10 does not accept SQL commands over a
replication connection, so the protection would generate a syntax error.
Since we cannot run SQL commands on it, we are also not vulnerable to
the issue that e170b8c8 fixes, so we can just skip this command for
older versions.
Author: Michael Paquier <michael@paquier.xyz>

The test to exit the loop if the integer control value would overflow
an int32 turns out not to work on some ICC versions, as it's dependent
on the assumption that the compiler will execute the code as written
rather than "optimize" it. ICC lacks any equivalent of gcc's -fwrapv
switch, so it was optimizing on the assumption of no integer overflow,
and that breaks this. Rewrite into a form that in fact does not
do any overflowing computations.
Per Tomas Vondra and buildfarm member fulmar. It's been like this
for a long time, although it was not till we added a regression test
case covering the behavior (in commit dd2243f2a) that the problem
became apparent. Back-patch to all supported versions.
Discussion: https://postgr.es/m/50562fdc-0876-9843-c883-15b8566c7511@2ndquadrant.com

M src/pl/plpgsql/src/pl_exec.c

Fix WHERE CURRENT OF when the referenced cursor uses an index-only scan.

"UPDATE/DELETE WHERE CURRENT OF cursor_name" failed, with an error message
like "cannot extract system attribute from virtual tuple", if the cursor
was using a index-only scan for the target table. Fix it by digging the
current TID out of the indexscan state.
It seems likely that the same failure could occur for CustomScan plans
and perhaps some FDW plan types, so that leaving this to be treated as an
internal error with an obscure message isn't as good an idea as it first
seemed. Hence, add a bit of heaptuple.c infrastructure to let us deliver
a more on-topic message. I chose to make the message match what you get
for the case where execCurrentOf can't identify the target scan node at
all, "cursor "foo" is not a simply updatable scan of table "bar"".
Perhaps it should be different, but we can always adjust that later.
In the future, it might be nice to provide hooks that would let custom
scan providers and/or FDWs deal with this in other ways; but that's
not a suitable topic for a back-patchable bug fix.
It's been like this all along, so back-patch to all supported branches.
Yugo Nagata and Tom Lane
Discussion: https://postgr.es/m/20180201013349.937dfc5f.nagata@sraoss.co.jp

ExecHashTableCreate allocated some memory that wasn't freed by
ExecHashTableDestroy, specifically the per-hash-key function information.
That's not a huge amount of data, but if one runs a query that repeats
a hash join enough times, it builds up. Fix by arranging for the data
in question to be kept in the hashtable's hashCxt instead of leaving it
"loose" in the query-lifespan executor context. (This ensures that we'll
also clean up anything that the hash functions allocate in fn_mcxt.)
Per report from Amit Khandekar. It's been like this forever, so back-patch
to all supported branches.
Discussion: https://postgr.es/m/CAJ3gD9cFofAWGvcxLOxDHC=B0hjtW8yGmUsF2hdGh97CM38=7g@mail.gmail.com

This was not stated in so many words anywhere. Document it to make
clear that it's a design limitation and not just an oversight or
documentation omission.
Discussion: https://postgr.es/m/152089733343.1222.6927268289645380498@wrigleys.postgresql.org

Many of the objects we create during the regression tests are put in the
public schema, so that using the same names in different regression tests
creates a hazard of test failures if any two such scripts run concurrently.
This patch cleans up a bunch of latent hazards of that sort, as well as two
live hazards.
The current situation in this regard is far worse than it was a year or two
back, because practically all of the partitioning-related test cases have
reused table names with enthusiasm. I despaired of cleaning up that mess
within the five most-affected tests (create_table, alter_table, insert,
update, inherit); fortunately those don't run concurrently.
Other than partitioning problems, most of the issues boil down to using
names like "foo", "bar", "tmp", etc, without thought for the fact that
other test scripts might use similar names concurrently. I've made an
effort to make all such names more specific.
One of the live hazards was that commit 7421f4b8 caused with.sql to
create a table named "test", conflicting with a similarly-named table
in alter_table.sql; this was exposed in the buildfarm recently.
The other one was that join.sql and transactions.sql both create tables
named "foo" and "bar"; but join.sql's uses of those names date back
only to December or so.
Since commit 7421f4b8 was back-patched to v10, back-patch a minimal
fix for that problem. The rest of this is just future-proofing.
Discussion: https://postgr.es/m/4627.1521070268@sss.pgh.pa.us

Since these names are global, using the same ones in different regression
tests creates a hazard of test failures if any two such scripts run
concurrently. Let's establish a policy of not doing that. In the cases
where a conflict existed, I chose to rename both sides: in principle one
script or the other could've been left in possession of the common name,
but that seems to just invite more trouble of the same sort.
There are a number of places where scripts are using names that seem
unduly generic, but in the absence of actual conflicts I left them alone.
In addition, fix insert.sql's use of "someone_else" as a role name.
That's a flat out violation of longstanding project policy, so back-patch
that change to v10 where the usage appeared. The rest of this is just
future-proofing, as no two of these scripts are actually run concurrently
in the existing parallel_schedule.
Conflicts of schema-qualified names also exist, but will be dealt with
separately.
Discussion: https://postgr.es/m/4627.1521070268@sss.pgh.pa.us

Doing so causes a pg_upgrade of a database containing these objects to
fail whenever pg_class changes. And it's pointless anyway: we have more
interesting tables anyhow.
Discussion: https://postgr.es/m/CAD5tBc+s8pW9WvH2+_z=B4x95FD4QuzZKcaMpff_9H4rS0VU1A@mail.gmail.com

The logical replication type map seems to have been misused by its only
caller -- it would try to use the remote OID as input for local type
routines, which unsurprisingly could result in bogus "cache lookup
failed for type XYZ" errors, or random other type names being picked up
if they happened to use the right OID. Fix that, changing
Oid logicalrep_typmap_getid(Oid remoteid) to
char *logicalrep_typmap_gettypname(Oid remoteid)
which is more useful. If the remote type is not part of the typmap,
this simply prints "unrecognized type" instead of choking trying to
figure out -- a pointless exercise (because the only input for that
comes from replication messages, which are not under the local node's
control) and dangerous to boot, when called from within an error context
callback.
Once that is done, it comes to light that the local OID in the typmap
entry was not being used for anything; the type/schema names are what we
need, so remove local type OID from that struct.
Once you do that, it becomes pointless to attach a callback to regular
syscache invalidation. So remove that also.
Reported-by: Dang Minh Huong
Author: Masahiko Sawada
Reviewed-by: Álvaro Herrera, Petr Jelínek, Dang Minh Huong, Atsushi Torikoshi
Discussion: https://postgr.es/m/75DB81BEEA95B445AE6D576A0A5C9E936A6BE964@BPXM05GP.gisp.nec.co.jp
Discussion: https://postgr.es/m/75DB81BEEA95B445AE6D576A0A5C9E936A6C4B0A@BPXM05GP.gisp.nec.co.jp

Autovacuum's 'workitem' request queue is of limited size, so requests
can fail if they arrive more quickly than autovacuum can process them.
Emit a log message when this happens, to provide better visibility of
this.
Backpatch to 10. While this represents an API change for
AutoVacuumRequestWork, that function is not yet prepared to deal with
external modules calling it, so there doesn't seem to be any risk (other
than log spam, that is.)
Author: Masahiko Sawada
Reviewed-by: Fabrízio Mello, Ildar Musin, Álvaro Herrera
Discussion: https://postgr.es/m/CAD21AoB1HrQhp6_4rTyHN5kWEJCEsG8YzsjZNt-ctoXSn5Uisw@mail.gmail.com

The existing logic for updating pg_class.reltuples trusted the sampling
results only for the pages ANALYZE actually visited, preferring to
believe the previous tuple density estimate for all the unvisited pages.
While there's some rationale for doing that for VACUUM (first that
VACUUM is likely to visit a very nonrandom subset of pages, and second
that we know for sure that the unvisited pages did not change), there's
no such rationale for ANALYZE: by assumption, it's looked at an unbiased
random sample of the table's pages. Furthermore, in a very large table
ANALYZE will have examined only a tiny fraction of the table's pages,
meaning it cannot slew the overall density estimate very far at all.
In a table that is physically growing, this causes reltuples to increase
nearly proportionally to the change in relpages, regardless of what is
actually happening in the table. This has been observed to cause reltuples
to become so much larger than reality that it effectively shuts off
autovacuum, whose threshold for doing anything is a fraction of reltuples.
(Getting to the point where that would happen seems to require some
additional, not well understood, conditions. But it's undeniable that if
reltuples is seriously off in a large table, ANALYZE alone will not fix it
in any reasonable number of iterations, especially not if the table is
continuing to grow.)
Hence, restrict the use of vac_estimate_reltuples() to VACUUM alone,
and in ANALYZE, just extrapolate from the sample pages on the assumption
that they provide an accurate model of the whole table. If, by very bad
luck, they don't, at least another ANALYZE will fix it; in the old logic
a single bad estimate could cause problems indefinitely.
In HEAD, let's remove vac_estimate_reltuples' is_analyze argument
altogether; it was never used for anything and now it's totally pointless.
But keep it in the back branches, in case any third-party code is calling
this function.
Per bug #15005. Back-patch to all supported branches.
David Gould, reviewed by Alexander Kuzmenkov, cosmetic changes by me
Discussion: https://postgr.es/m/20180117164916.3fdcf2e9@engels

In databases with many tables, re-fetching the statistics takes some time,
so that this behavior seriously decreases the available concurrency for
multiple autovac workers. There's discussion afoot about more complete
fixes, but a simple and back-patchable amelioration is to claim the table
and release the lock before rechecking stats. If we find out there's no
longer a reason to process the table, re-taking the lock to un-claim the
table is cheap enough.
(This patch is quite old, but got lost amongst a discussion of more
aggressive fixes. It's not clear when or if such a fix will be
accepted, but in any case it'd be unlikely to get back-patched.
Let's do this now so we have some improvement for the back branches.)
In passing, make the normal un-claim step take AutovacuumScheduleLock
not AutovacuumLock, since that is what is documented to protect the
wi_tableoid field. This wasn't an actual bug in view of the fact that
readers of that field hold both locks, but it creates some concurrency
penalty against operations that need only AutovacuumLock.
Back-patch to all supported versions.
Jeff Janes
Discussion: https://postgr.es/m/26118.1520865816@sss.pgh.pa.us

CREATE TABLE / LIKE with a bigint identity column would fail on
platforms where long is 32 bits. Copying the sequence values used
makeInteger(), which would truncate the 64-bit sequence data to 32 bits.
To fix, use makeFloat() instead, like the parser. (This does not
actually make use of floats, but stores the values as strings.)
Bug: #15096
Reviewed-by: Michael Paquier <michael@paquier.xyz>

One of the things canonicalize_qual() does is to remove constant-NULL
subexpressions of top-level AND/OR clauses. It does that on the assumption
that what it's given is a top-level WHERE clause, so that NULL can be
treated like FALSE. Although this is documented down inside a subroutine
of canonicalize_qual(), it wasn't mentioned in the documentation of that
function itself, and some callers hadn't gotten that memo.
Notably, commit d007a9505 caused get_relation_constraints() to apply
canonicalize_qual() to CHECK constraints. That allowed constraint
exclusion to misoptimize situations in which a CHECK constraint had a
provably-NULL subclause, as seen in the regression test case added here,
in which a child table that should be scanned is not. (Although this
thinko is ancient, the test case doesn't fail before 9.2, for reasons
I've not bothered to track down in detail. There may be related cases
that do fail before that.)
More recently, commit f0e44751d added an independent bug by applying
canonicalize_qual() to index expressions, which is even sillier since
those might not even be boolean. If they are, though, I think this
could lead to making incorrect index entries for affected index
expressions in v10. I haven't attempted to prove that though.
To fix, add an "is_check" parameter to canonicalize_qual() to specify
whether it should assume WHERE or CHECK semantics, and make it perform
NULL-elimination accordingly. Adjust the callers to apply the right
semantics, or remove the call entirely in cases where it's not known
that the expression has one or the other semantics. I also removed
the call in some cases involving partition expressions, where it should
be a no-op because such expressions should be canonical already ...
and was a no-op, independently of whether it could in principle have
done something, because it was being handed the qual in implicit-AND
format which isn't what it expects. In HEAD, add an Assert to catch
that type of mistake in future.
This represents an API break for external callers of canonicalize_qual().
While that's intentional in HEAD to make such callers think about which
case applies to them, it seems like something we probably wouldn't be
thanked for in released branches. Hence, in released branches, the
extra parameter is added to a new function canonicalize_qual_ext(),
and canonicalize_qual() is a wrapper that retains its old behavior.
Patch by me with suggestions from Dean Rasheed. Back-patch to all
supported branches.
Discussion: https://postgr.es/m/24475.1520635069@sss.pgh.pa.us

The changes in the CREATE POLICY man page from commit
87c2a17fee784c7e1004ba3d3c5d8147da676783 triggered a stylesheet bug that
created some warning messages and incorrect output. This installs a
workaround.
Also improve the whitespace a bit so it looks better.

The server won't actually start with that setting anymore, not since
we raised the default max_wal_senders to 10. Per discussion, we don't
wish to back down on that default, so instead raise the effective floor
for max_connections (to 20). It's still possible to configure a smaller
setting manually, but initdb won't set it that way.
Since that change happened in v10, back-patch to v10.
Kyotaro Horiguchi
Discussion: https://postgr.es/m/20180209.170823.42008365.horiguchi.kyotaro@lab.ntt.co.jp

If a walsender exits leaving data in reorderbuffers, the next walsender
that tries to decode the same transaction would append its decoded data
in the same spill files without truncating it first, which effectively
duplicate the data. Avoid that by removing any leftover reorderbuffer
spill files when a walsender starts.
Backpatch to 9.4; this bug has been there from the very beginning of
logical decoding.
Author: Craig Ringer, revised by me
Reviewed by: Álvaro Herrera, Petr Jelínek, Masahiko Sawada

Apparently, it doesn't work to use a plain cstring as a Name datum: you
may end up having random bytes because of failing to zero the bytes
after the terminating \0, as indicated by valgrind. I introduced this
bug in 5564c1181548, so backpatch this fix to REL_10_STABLE, like that
commit.
While at it, fix a slightly misleading comment, pointed out by David
Rowley.

The LIKE INCLUDING ALL clause to CREATE TABLE intuitively indicates
cloning of extended statistics on the source table, but it failed to do
so. Patch it up so that it does. Also include an INCLUDING STATISTICS
option to the LIKE clause, so that the behavior can be requested
individually, or excluded individually.
While at it, reorder the INCLUDING options, both in code and in docs, in
alphabetical order which makes more sense than feature-implementation
order that was previously used.
Backpatch this to Postgres 10, where extended statistics were
introduced, because this is seen as an oversight in a fresh feature
which is better to get consistent from the get-go instead of changing
only in pg11.
In pg11, comments on statistics objects are cloned too. In pg10 they
are not, because I (Álvaro) was too coward to change the parse node as
required to support it. Also, in pg10 I chose not to renumber the
parser symbols for the various INCLUDING options in LIKE, for the same
reason. Any corresponding user-visible changes (docs) are backpatched,
though.
Reported-by: Stephen Froehlich
Author: David Rowley
Reviewed-by: Álvaro Herrera, Tomas Vondra
Discussion: https://postgr.es/m/CY1PR0601MB1927315B45667A1B679D0FD5E5EF0@CY1PR0601MB1927.namprd06.prod.outlook.com

pg_rewind checks whether each file is a relation data file, from its path.
Previously this check logic had the bug which made pg_rewind fail to
recognize any relation data files in tablespaces. Which also caused
an assertion failure in pg_rewind.
Back-patch to 9.5 where pg_rewind was added.
Author: Takayuki Tsunakawa
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F8D6C7A@G01JPEXMBYT05

If convert_to_scalar is passed a pair of datatypes it can't cope with,
its former behavior was just to elog(ERROR). While this is OK so far as
the core code is concerned, there's extension code that would like to use
scalarltsel/scalargtsel/etc as selectivity estimators for operators that
work on non-core datatypes, and this behavior is a show-stopper for that
use-case. If we simply allow convert_to_scalar to return FALSE instead of
outright failing, then the main logic of scalarltsel/scalargtsel will work
fine for any operator that behaves like a scalar inequality comparison.
The lack of conversion capability will mean that we can't estimate to
better than histogram-bin-width precision, since the code will effectively
assume that the comparison constant falls at the middle of its bin. But
that's still a lot better than nothing. (Someday we should provide a way
for extension code to supply a custom version of convert_to_scalar, but
today is not that day.)
While poking at this issue, we noted that the existing code for handling
type bytea in convert_to_scalar is several bricks shy of a load.
It assumes without checking that if the comparison value is type bytea,
the bounds values are too; in the worst case this could lead to a crash.
It also fails to detoast the input values, so that the comparison result is
complete garbage if any input is toasted out-of-line, compressed, or even
just short-header. I'm not sure how often such cases actually occur ---
the bounds values, at least, are probably safe since they are elements of
an array and hence can't be toasted. But that doesn't make this code OK.
Back-patch to all supported branches, partly because author requested that,
but mostly because of the bytea bugs. The change in API for the exposed
routine convert_network_to_scalar() is theoretically a back-patch hazard,
but it seems pretty unlikely that any third-party code is calling that
function directly.
Tomas Vondra, with some adjustments by me
Discussion: https://postgr.es/m/b68441b6-d18f-13ab-b43b-9a72188a4e02@2ndquadrant.com

Sloppy coding in this function could lead to leaking a VM buffer pin,
or to attempting to free the same pin twice. Repair. While at it,
reduce the code's tendency to free and reacquire the same page pin.
Back-patch to 9.6; before that, this routine did not concern itself
with VM pages.
Amit Kapila and Tom Lane
Discussion: https://postgr.es/m/CAA4eK1KJKwhc=isgTQHjM76CAdVswzNeAuZkh_cx-6QgGkSEgA@mail.gmail.com

Previously, it just returned the heap tuple count, which might be only an
estimate, and would be completely the wrong thing if the index is partial.
Since this function scans every index page anyway to find free pages,
it's practically free to count the surviving index tuples. Let's do that
and return an accurate count.
This is easily visible as a wrong reltuples value for a partial GiST
index following VACUUM, so back-patch to all supported branches.
Andrey Borodin, reviewed by Michail Nikolaev
Discussion: https://postgr.es/m/151956654251.6915.675951950408204404.pgcf@coridan.postgresql.org

These errors have been seen in the field in corrupted-data situations.
It seems worthwhile to report them with ERRCODE_DATA_CORRUPTED, rather
than the generic ERRCODE_INTERNAL_ERROR, for the benefit of log monitoring
and tools like amcheck. However, use errmsg_internal so that the text
strings still aren't translated; it seems unlikely to be worth
translators' time to do so.
Back-patch to 9.3, like the predecessor commit d70cf811f that introduced
these elog calls originally (replacing Asserts).
Peter Geoghegan
Discussion: https://postgr.es/m/CAH2-Wzmn4-Pg-UGFwyuyK-wiTih9j32pwg_7T9iwqXpAUZr=Mg@mail.gmail.com

Commit 4800f16a7ad0 added some sanity checks to ensure we don't
accidentally corrupt data, but in one of them we failed to consider the
effects of a database upgraded from 9.2 or earlier, where a tuple
exclusively locked prior to the upgrade has a slightly different bit
pattern. Fix that by using the macro that we fixed in commit
74ebba84aeb6 for similar situations.
Reported-by: Alexandre Garcia
Reviewed-by: Andres Freund
Discussion: https://postgr.es/m/CAPYLKR6yxV4=pfW0Gwij7aPNiiPx+3ib4USVYnbuQdUtmkMaEA@mail.gmail.com
Andres suspects that this bug may have wider ranging consequences, but I
couldn't find anything.

M src/backend/access/heap/heapam.c

Fix IOS planning when only some index columns can return an attribute.

Since 9.5, it's possible that some but not all columns of an index
support returning the indexed value for index-only scans. If the
same indexed column appears in index columns that behave both ways,
check_index_only() supposed that it'd be OK to do an index-only scan
testing that column; but that fails if we have to recheck the indexed
condition on one of the columns that doesn't support this.
In principle we could make this work by remapping the recheck expressions
to pull the value from a column that does support returning the indexed
value. But such cases are so weird and rare that, at least for now,
it doesn't seem worth the trouble. Instead, just teach check_index_only
that a value is returnable only if all the index columns containing it
are returnable, rather than any of them.
Per report from David Pereiro Lagares. Back-patch to 9.5 where the
possibility of this situation appeared.
Kyotaro Horiguchi
Discussion: https://postgr.es/m/1516210494.1798.16.camel@nlpgo.com

Solaris 11.4 has built-in functions named b64_encode and b64_decode.
Rename ours to something else to avoid the conflict (fortunately,
ours are static so the impact is limited).
One could wish for less duplication of code in this area, but that
would be a larger patch and not very suitable for back-patching.
Since this is a portability fix, we want to put it into all supported
branches.
Report and initial patch by Rainer Orth, reviewed and adjusted a bit
by Michael Paquier
Discussion: https://postgr.es/m/ydd372wk28h.fsf@CeBiTec.Uni-Bielefeld.DE

specscanner.l had a fixed limit of 1024 bytes on the length of
individual SQL stanzas in an isolation test script. People are
starting to run into that, so fix it by making the buffer resizable.
Once we allow this in HEAD, it seems inevitable that somebody will
try to back-patch a test that exceeds the old limit, so back-patch
this change as a preventive measure.
Daniel Gustafsson
Discussion: https://postgr.es/m/8D628BE4-6606-4FF6-A3FF-8B2B0E9B43D0@yesql.se

M src/test/isolation/specscanner.l

Fix up ecpg’s configuration so it handles “long long int” in MSVC builds.

Although configure-based builds correctly define HAVE_LONG_LONG_INT when
appropriate (in both pg_config.h and ecpg_config.h), builds using the MSVC
scripts failed to do so. This currently has no impact on the backend,
since it uses that symbol nowhere; but it does prevent ecpg from
supporting "long long int". Fix that.
Also, adjust Solution.pm so that in the constructed ecpg_config.h file,
the "#if (_MSC_VER > 1200)" covers only the LONG_LONG_INT-related
#defines, not the whole file. AFAICS this was a thinko on somebody's
part: ENABLE_THREAD_SAFETY should always be defined in Windows builds,
and in branches using USE_INTEGER_DATETIMES, the setting of that shouldn't
depend on the compiler version either. If I'm wrong, I imagine the
buildfarm will say so.
Per bug #15080 from Jonathan Allen; issue diagnosed by Michael Meskes
and Andrew Gierth. Back-patch to all supported branches.
Discussion: https://postgr.es/m/151935568942.1461.14623890240535309745@wrigleys.postgresql.org

Tom Kazimiers reported that transition tables don't work correctly when
they are scanned by more than one executor node. That's because commit
18ce3a4ab allocated separate read pointers for each executor node, as it
must, but failed to make them active at the appropriate times. Repair.
Thomas Munro
Discussion: https://postgr.es/m/20180224034748.bixarv6632vbxgeb%40dewberry.localdomain

I removed these functions altogether in HEAD, in commit db3af9feb, and
it emerges that that causes trouble for cross-branch upgrade testing.
We could put back stub functions but that seems pretty silly. Instead,
back-patch a minimal subset of db3af9feb, namely just removing the
CREATE FUNCTION commands.
Discussion: https://postgr.es/m/11927.1519756619@sss.pgh.pa.us

A before-update row trigger may choose to return the "new" or "old" tuple
unmodified. ExecBRUpdateTriggers failed to consider the second
possibility, and would proceed to free the "old" tuple even if it was the
one returned, leading to subsequent access to already-deallocated memory.
In debug builds this reliably leads to an "invalid memory alloc request
size" failure; in production builds it might accidentally work, but data
corruption is also possible.
This is a very old bug. There are probably a couple of reasons it hasn't
been noticed up to now. It would be more usual to return NULL if one
wanted to suppress the update action; returning "old" is significantly less
efficient since the update will occur anyway. Also, none of the standard
PLs would ever cause this because they all returned freshly-manufactured
tuples even if they were just copying "old". But commit 4b93f5799 changed
that for plpgsql, making it possible to see the bug with a plpgsql trigger.
Still, this is certainly legal behavior for a trigger function, so it's
ExecBRUpdateTriggers's fault not plpgsql's.
It seems worth creating a test case that exercises returning "old" directly
with a C-language trigger; testing this through plpgsql seems unreliable
because its behavior might change again.
Report and fix by Rushabh Lathia; regression test case by me.
Back-patch to all supported branches.
Discussion: https://postgr.es/m/CAGPqQf1P4pjiNPrMof=P_16E-DFjt457j+nH2ex3=nBTew7tXw@mail.gmail.com