Previously, our regex code defined CHR_MAX as 0xfffffffe, which is a
bad choice because it is outside the range of type "celt" (int32).
Characters approaching that limit could lead to infinite loops in logic
such as "for (c = a; c <= b; c++)" where c is of type celt but the
range bounds are chr. Such loops will work safely only if CHR_MAX+1
is representable in celt, since c must advance to beyond b before the
loop will exit.
Fortunately, there seems no reason not to restrict CHR_MAX to 0x7ffffffe.
It's highly unlikely that Unicode will ever assign codes that high, and
none of our other backend encodings need characters beyond that either.
In addition to modifying the macro, we have to explicitly enforce character
range restrictions on the values of \u, \U, and \x escape sequences, else
the limit is trivially bypassed.
Also, the code for expanding case-independent character ranges in bracket
expressions had a potential integer overflow in its calculation of the
number of characters it could generate, which could lead to allocating too
small a character vector and then overwriting memory. An attacker with the
ability to supply arbitrary regex patterns could easily cause transient DOS
via server crashes, and the possibility for privilege escalation has not
been ruled out.
Quite aside from the integer-overflow problem, the range expansion code was
unnecessarily inefficient in that it always produced a result consisting of
individual characters, abandoning the knowledge that we had a range to
start with. If the input range is large, this requires excessive memory.
Change it so that the original range is reported as-is, and then we add on
any case-equivalent characters that are outside that range. With this
approach, we can bound the number of individual characters allowed without
sacrificing much. This patch allows at most 100000 individual characters,
which I believe to be more than the number of case pairs existing in
Unicode, so that the restriction will never be hit in practice.
It's still possible for range() to take awhile given a large character code
range, so also add statement-cancel detection to its loop. The downstream
function dovec() also lacked cancel detection, and could take a long time
given a large output from range().
Per fuzz testing by Greg Stark. Back-patch to all supported branches.
Security: CVE-2016-0773

In 61444bfb we started to allow HAVING clauses to be fully pushed down
into WHERE, even when grouping sets are in use. That turns out not to
work correctly, because grouping sets can "produce" NULLs, meaning that
filtering in WHERE and HAVING can have different results, even when no
aggregates or volatile functions are involved.
Instead only allow pushdown of empty grouping sets.
It'd be nice to do better, but the exact mechanics of deciding which
cases are safe are still being debated. It's important to give correct
results till we find a good solution, and such a solution might not be
appropriate for backpatching anyway.
Bug: #13863
Reported-By: 'wrb'
Diagnosed-By: Dean Rasheed
Author: Andrew Gierth
Reviewed-By: Dean Rasheed and Andres Freund
Discussion: 20160113183558.12989.56904@wrigleys.postgresql.org
Backpatch: 9.5, where grouping sets were introduced

Get rid of the false implication that PRIMARY KEY is exactly equivalent to
UNIQUE + NOT NULL. That was more-or-less true at one time in our
implementation, but the standard doesn't say that, and we've grown various
features (many of them required by spec) that treat a pkey differently from
less-formal constraints. Per recent discussion on pgsql-general.
I failed to resist the temptation to do some other wordsmithing in the
same area.

The parser doesn't allow qualification of column names appearing in
these clauses, but ruleutils.c would sometimes qualify them, leading
to dump/reload failures. Per bug #13891 from Onder Kalaci.
(In passing, make stanzas in ruleutils.c that save/restore varprefix
more consistent.)
Peter Geoghegan

Commit 45f6240a8fa9d355 added an assumption in ExecHashIncreaseNumBatches
and ExecHashIncreaseNumBuckets that they could find all tuples in the main
hash table by iterating over the "dense storage" introduced by that patch.
However, ExecHashRemoveNextSkewBucket continued its old practice of simply
re-linking deleted skew tuples into the main table's hashchains. Hence,
such tuples got lost during any subsequent increase in nbatch or nbuckets,
and would never get joined, as reported in bug #13908 from Seth P.
I (tgl) think that the aforesaid commit has got multiple design issues
and should be reworked rather completely; but there is no time for that
right now, so band-aid the problem by making ExecHashRemoveNextSkewBucket
physically copy deleted skew tuples into the "dense storage" arena.
The added test case is able to exhibit the problem by means of fooling the
planner with a WHERE condition that it will underestimate the selectivity
of, causing the initial nbatch estimate to be too small.
Tomas Vondra and Tom Lane. Thanks to David Johnston for initial
investigation into the bug report.

Commit 30d7ae3c76d2de144232ae6ab328ca86b70e72c3 introduced an HJDEBUG
stanza that probably didn't compile at the time, and definitely doesn't
compile now, because it refers to a nonexistent variable. It doesn't seem
terribly useful anyway, so just get rid of it.
While I'm fooling with it, use %z modifier instead of the obsolete hack of
casting size_t to unsigned long, and include the HashJoinTable's address in
each printout so that it's possible to distinguish the activities of
multiple hashjoins occurring in one query.
Noted while trying to use HJDEBUG to investigate bug #13908. Back-patch
to 9.5, because code that doesn't compile is certainly not very helpful.

Future PL/Java versions will close CVE-2016-0766 by making these GUCs
PGC_SUSET. This PostgreSQL change independently mitigates that PL/Java
vulnerability, helping sites that update PostgreSQL more frequently than
PL/Java. Back-patch to 9.1 (all supported versions).

deparseReturningList ended up adding up RETURNING NULL to the code, but
code elsewhere saw an empty list of attributes and concluded that it
should not expect tuples from the remote side.
Etsuro Fujita and Robert Haas, reviewed by Thom Brown

Since there currently is only one possible parenthesized option, namely
VERBOSE, it's a bit pointless to show it with "{ } [, ... ]". The curly
braces are useless and therefore confusing, as seen in a recent question
from Karsten Hilbert. Remove the extra decoration for the time being;
we can put it back when and if REINDEX grows some more options.

If a view is split into CREATE TABLE + CREATE RULE to break a circular
dependency, then any triggers on the view must be dumped/reloaded after
the CREATE RULE; else the backend may reject the CREATE TRIGGER because
it's the wrong type of trigger for a plain table. This works all right
in plain dump/restore because of pg_dump's sorting heuristic that places
triggers after rules. However, when using parallel restore, the ordering
must be enforced by a dependency --- and we didn't have one.
Fixing this is a mere matter of adding an addObjectDependency() call,
except that we need to be able to find all the triggers belonging to the
view relation, and there was no easy way to do that. Add fields to
pg_dump's TableInfo struct to remember where the associated TriggerInfo
struct(s) are.
Per bug report from Dennis Kögel. The failure can be exhibited at least
as far back as 9.1, so back-patch to all supported branches.

Commit e09996ff8dee3f70 was one brick shy of a load: it didn't insist
that the detected JSON number be the whole of the supplied string.
This allowed inputs such as "2016-01-01" to be misdetected as valid JSON
numbers. Per bug #13906 from Dmitry Ryabov.
In passing, be more wary of zero-length input (I'm not sure this can
happen given current callers, but better safe than sorry), and do some
minor cosmetic cleanup.

All the other jsonb function descriptions refer to the arguments as being
"jsonb", but these two said "json". Make it consistent. Per bug #13905
from Petru Florin Mihancea.
No catversion bump --- we can't force one in the back branches, and this
isn't very critical anyway.

listForeignTables' invocation of processSQLNamePattern did not match up
with the other ones that handle potentially-schema-qualified names; it
failed to make use of pg_table_is_visible() and also passed the name
arguments in the wrong order. Bug seems to have been aboriginal in commit
0d692a0dc9f0e532. It accidentally sort of worked as long as you didn't
inquire too closely into the behavior, although the silliness was later
exposed by inconsistencies in the test queries added by 59efda3e50ca4de6
(which I probably should have questioned at the time, but didn't).
Per bug #13899 from Reece Hart. Patch by Reece Hart and Tom Lane.
Back-patch to all affected branches.

We entirely randomly chose to initialize port->remote_host just after
printing the log_connections message, when we could perfectly well do it
just before, allowing %h and %r to work for that message. Per gripe from
Artem Tomyuk.

The original code was adding double quotes to an already-quoted
identifier, leading to nonsensical results. Remove the quoting call.
I introduced the broken code in 7eca575d1c of 9.5 era, so backpatch to
9.5.
Report and patch by Elvis Pranskevichus
Reviewed by Michael Paquier

Commit e529cd4ffa605c6f introduced an Assert requiring NAMEDATALEN to be
less than MAX_LEVENSHTEIN_STRLEN, which has been 255 for a long time.
Since up to that instant we had always allowed NAMEDATALEN to be
substantially more than that, this was ill-advised.
It's debatable whether we need MAX_LEVENSHTEIN_STRLEN at all (versus
putting a CHECK_FOR_INTERRUPTS into the loop), or whether it has to be
so tight; but this patch takes the narrower approach of just not applying
the MAX_LEVENSHTEIN_STRLEN limit to calls from the parser.
Trusting the parser for this seems reasonable, first because the strings
are limited to NAMEDATALEN which is unlikely to be hugely more than 256,
and second because the maximum distance is tightly constrained by
MAX_FUZZY_DISTANCE (though we'd forgotten to make use of that limit in one
place). That means the cost is not really O(mn) but more like O(max(m,n)).
Relaxing the limit for user-supplied calls is left for future research;
given the lack of complaints to date, it doesn't seem very high priority.
In passing, fix confusion between lengths-in-bytes and lengths-in-chars
in comments and error messages.
Per gripe from Kevin Day; solution suggested by Robert Haas. Back-patch
to 9.5 where the unwanted restriction was introduced.

Putting a reference to an expanded-format value into a Const node would be
a bad idea for a couple of reasons. It'd be possible for the supposedly
immutable Const to change value, if something modified the referenced
variable ... in fact, if the Const's reference were R/W, any function that
has the Const as argument might itself change it at runtime. Also, because
datumIsEqual() is pretty simplistic, the Const might fail to compare equal
to other Consts that it should compare equal to, notably including copies
of itself. This could lead to unexpected planner behavior, such as "could
not find pathkey item to sort" errors or inferior plans.
I have not been able to find any way to get an expanded value into a Const
within the existing core code; but Paul Ramsey was able to trigger the
problem by writing a datatype input function that returns an expanded
value.
The best fix seems to be to establish a rule that varlena values being
placed into Const nodes should be passed through pg_detoast_datum().
That will do nothing (and cost little) in normal cases, but it will flatten
expanded values and thereby avoid the above problems. Also, it will
convert short-header or compressed values into canonical format, which will
avoid possible unexpected lack-of-equality issues for those cases too.
And it provides a last-ditch defense against putting a toasted value into
a Const, which we already knew was dangerous, cf commit 2b0c86b66563cf2f.
(In the light of this discussion, I'm no longer sure that that commit
provided 100% protection against such cases, but this fix should do it.)
The test added in commit 65c3d05e18e7c530 to catch datatype input functions
with unstable results would fail for functions that returned expanded
values; but it seems a bit uncharitable to deem a result unstable just
because it's expressed in expanded form, so revise the coding so that we
check for bitwise equality only after applying pg_detoast_datum(). That's
a sufficient condition anyway given the new rule about detoasting when
forming a Const.
Back-patch to 9.5 where the expanded-object facility was added. It's
possible that this should go back further; but in the absence of clear
evidence that there's any live bug in older branches, I'll refrain for now.

Coverity quite reasonably complained that this check for fout==NULL
occurred after we'd already dereferenced fout. However, the check
is just dead code since there is no code path by which CreateArchive
can return a null pointer. Errors such as can't-open-that-file are
reported down inside CreateArchive, and control doesn't return.
So let's silence the warning by removing the dead code, rather than
continuing to pretend it does something.
Coverity didn't complain about this before 5b5fea2a1, so back-patch
to 9.5 like that patch.

pg_dump's original approach to handling extension member objects was to
run around and clear (or set) their dump flags rather late in its data
collection process. Unfortunately, quite a lot of code expects those flags
to be valid before that; which was an entirely reasonable expectation
before we added extensions. In particular, this explains Karsten Hilbert's
recent report of pg_upgrade failing on a database in which an extension
has been installed into the pg_catalog schema. Its objects are initially
marked as not-to-be-dumped on the strength of their schema, and later we
change them to must-dump because we're doing a binary upgrade of their
extension; but we've already skipped essential tasks like making associated
DO_SHELL_TYPE objects.
To fix, collect extension membership data first, and incorporate it in the
initial setting of the dump flags, so that those are once again correct
from the get-go. This has the undesirable side effect of slightly
lengthening the time taken before pg_dump acquires table locks, but testing
suggests that the increase in that window is not very much.
Along the way, get rid of ugly special-case logic for deciding whether
to dump procedural languages, FDWs, and foreign servers; dump decisions
for those are now correct up-front, too.
In 9.3 and up, this also fixes erroneous logic about when to dump event
triggers (basically, they were *always* dumped before). In 9.5 and up,
transform objects had that problem too.
Since this problem came in with extensions, back-patch to all supported
versions.

Rather than passing around DumpOptions and RestoreOptions as separate
arguments, add fields to struct Archive to carry pointers to these objects,
and access them through those fields when needed. There already was a
RestoreOptions pointer in Archive, though for no obvious reason it was part
of the "private" struct rather than out where pg_dump.c could see it.
Doing this allows reversion of quite a lot of parameter-addition changes
made in commit 0eea8047bf, which is a good thing IMO because this will
reduce the code delta between 9.4 and 9.5, probably easing a few future
back-patch efforts. Moreover, the previous commit only added a DumpOptions
argument to functions that had to have it at the time, which means we could
anticipate still more code churn (and more back-patch hazard) as the
requirement spread further. I'd hit exactly that problem in my upcoming
patch to fix extension membership marking, which is what motivated me to
do this.

Commit 866566a690bb9916 is insufficient to prevent dump/reload failures
when using transform modules in a database with both plpython2 and
plpython3 installed. The reason is that the transform extension scripts
use DO blocks as a mechanism to pull in the libpython library before
creating the transform function. It's necessary to preload the library
because the dynamic loader won't do it for us on every platform, leading
to "unresolved symbol" failures when the transform library is loaded.
But it's *not* necessary to execute Python code, and doing so will
provoke a multiple-Pythons-are-loaded error even after the preceding
commit.
To fix, use LOAD instead of a DO block. That requires superuser privilege,
but creation of a C function does anyway. It also embeds knowledge of
the underlying library name for each PL language; but that's wired into
the initdb-time contents of pg_pltemplate too, so that doesn't seem like
a large problem either. Note that CREATE TRANSFORM as such doesn't call
the language module at all.
Per a report from Paul Jones. Back-patch to 9.5 where transform modules
were introduced.

Commit 803716013dc1350f installed a safeguard against loading plpython2
and plpython3 at the same time, but asserted that both could still be
used in the same database, just not in the same session. However, that's
not actually all that practical because dumping and reloading will fail
(since both libraries necessarily get loaded into the restoring session).
pg_upgrade is even worse, because it checks for missing libraries by
loading every .so library mentioned in the entire installation into one
session, so that you can have only one across the whole cluster.
We can improve matters by not throwing the error immediately in _PG_init,
but only when and if we're asked to do something that requires calling
into libpython. This ameliorates both of the above situations, since
while execution of CREATE LANGUAGE, CREATE FUNCTION, etc will result in
loading plpython, it isn't asked to do anything interesting (at least
not if check_function_bodies is off, as it will be during a restore).
It's possible that this opens some corner-case holes in which a crash
could be provoked with sufficient effort. However, since plpython
only exists as an untrusted language, any such crash would require
superuser privileges, making it "don't do that" not a security issue.
To reduce the hazards in this area, the error is still FATAL when it
does get thrown.
Per a report from Paul Jones. Back-patch to 9.2, which is as far back
as the patch applies without work. (It could be made to work in 9.1,
but given the lack of previous complaints, I'm disinclined to expend
effort so far back. We've been pretty desultory about support for
Python 3 in 9.1 anyway.)

This loop uselessly fetched the argument after the one it's currently
looking at. No real harm is done since we couldn't possibly fetch off
the end of memory, but it's confusing to the reader.
Also remove a duplicate (and therefore confusing) PG_ARGISNULL check in
jsonb_build_object.
I happened to notice these things while trolling for missed null-arg
checks earlier today. Back-patch to 9.5, not because there is any
real bug, but just because 9.5 and HEAD are still in sync in this
file and we might as well keep them so.
In passing, re-pgindent.

A scan for missed proisstrict markings in the core code turned up
these functions:
brin_summarize_new_values
pg_stat_reset_single_table_counters
pg_stat_reset_single_function_counters
pg_create_logical_replication_slot
pg_create_physical_replication_slot
pg_drop_replication_slot
The first three of these take OID, so a null argument will normally look
like a zero to them, resulting in "ERROR: could not open relation with OID
0" for brin_summarize_new_values, and no action for the pg_stat_reset_XXX
functions. The other three will dump core on a null argument, though this
is mitigated by the fact that they won't do so until after checking that
the caller is superuser or has rolreplication privilege.
In addition, the pg_logical_slot_get/peek[_binary]_changes family was
intentionally marked nonstrict, but failed to make nullness checks on all
the arguments; so again a null-pointer-dereference crash is possible but
only for superusers and rolreplication users.
Add the missing ARGISNULL checks to the latter functions, and mark the
former functions as strict in pg_proc. Make that change in the back
branches too, even though we can't force initdb there, just so that
installations initdb'd in future won't have the issue. Since none of these
bugs rise to the level of security issues (and indeed the pg_stat_reset_XXX
functions hardly misbehave at all), it seems sufficient to do this.
In addition, fix some order-of-operations oddities in the slot_get_changes
family, mostly cosmetic, but not the part that moves the function's last
few operations into the PG_TRY block. As it stood, there was significant
risk for an error to exit without clearing historical information from
the system caches.
The slot_get_changes bugs go back to 9.4 where that code was introduced.
Back-patch appropriate subsets of the pg_proc changes into all active
branches, as well.

Given syntactically wrong input, widget_in() could call atof() with an
indeterminate pointer argument, typically leading to a crash; or if it
didn't do that, it might return a NULL pointer, which again would lead
to a crash since old-style C functions aren't supposed to do things
that way. Fix that by correcting the off-by-one syntax test and
throwing a proper error rather than just returning NULL.
Also, since widget_in and widget_out have been marked STRICT for a
long time, their tests for null inputs are just dead code; remove 'em.
In the oldest branches, also improve widget_out to use snprintf not
sprintf, just to be sure.
In passing, get rid of a long-since-useless sprintf into a local buffer
that nothing further is done with, and make some other minor coding
style cleanups.
In the intended regression-testing usage of these functions, none of
this is very significant; but if the regression test database were
left around in a production installation, these bugs could amount
to a minor security hazard.
Piotr Stefaniak, Michael Paquier, and Tom Lane

These functions readily crash when passed a NULL input value. The tests
themselves do not pass NULL values to them; but when the regression
database is used as a basis for fuzz testing, they cause a lot of noise.
Also, if someone were to leave a regression database lying about in a
production installation, these would create a minor security hazard.
Andreas Seltenreich

The error message wording for AttributeError has changed in Python 3.5.
For the plpython_error test, add a new expected file. In the
plpython_subtransaction test, we didn't really care what the exception
is, only that it is something coming from Python. So use a generic
exception instead, which has a message that doesn't vary across
versions.
Back-patch of commit f16d52269a196f7f303abe3b978d95ade265f05f, which
was previously back-patched into 9.2-9.4, but missed 9.5.

Turns out the only reason initdb -X worked is that pg_mkdir_p won't
whine if you point it at something that's a symlink to a directory.
Otherwise, the attempt to create pg_xlog/ just like all the other
subdirectories would have failed. Let's be a little more explicit
about what's happening. Oversight in my patch for bug #13853
(mea culpa for not testing -X ...)

M src/bin/initdb/initdb.c

Use plain mkdir() not pg_mkdir_p() to create subdirectories of PGDATA.

When we're creating subdirectories of PGDATA during initdb, we know darn
well that the parent directory exists (or should exist) and that the new
subdirectory doesn't (or shouldn't). There is therefore no need to use
anything more complicated than mkdir(). Using pg_mkdir_p() just opens us
up to unexpected failure modes, such as the one exhibited in bug #13853
from Nuri Boardman. It's not very clear why pg_mkdir_p() went wrong there,
but it is clear that we didn't need to be trying to create parent
directories in the first place. We're not even saving any code, as proven
by the fact that this patch nets out at minus five lines.
Since this is a response to a field bug report, back-patch to all branches.

pg_ctl is using isatty() to verify whether the process is running in a
terminal, and if not it sends its output to Windows' Event Log ... which
does the wrong thing when the output has been redirected to a pipe, as
reported in bug #13592.
To fix, make pg_ctl use the code we already have to detect service-ness:
in the master branch, move src/backend/port/win32/security.c to src/port
(with suitable tweaks so that it runs properly in backend and frontend
environments); pg_ctl already has access to pgport so it Just Works. In
older branches, that's likely to cause trouble, so instead duplicate the
required code in pg_ctl.c.
Author: Michael Paquier
Bug report and diagnosis: Egon Kocjan
Backpatch: all supported branches

The order of inclusion of .o files makes a difference in linker output;
not a functional difference, but still a bitwise difference, which annoys
some packagers who would like reproducible builds.
Report and patch by Christoph Berg

A pointless and confusing error message is shown to the user when
attempting to identify a 9.3 or older remote server with a 9.5/9.6
pg_receivexlog, because the return signature of IDENTIFY_SYSTEM was
changed in 9.4. There's no good reason for the warning message, so
shuffle code around to keep it quiet.
(pg_recvlogical is also affected by this commit, but since it obviously
cannot work with 9.3 that doesn't actually matter much.)
Backpatch to 9.5.
Reported by Marco Nenciarini, who also wrote the initial patch. Further
tweaked by Robert Haas and Fujii Masao; reviewed by Michael Paquier and
Craig Ringer.