If any error occurred while we were in the middle of reading a protocol
message from the client, we could lose sync, and incorrectly try to
interpret a part of another message as a new protocol message. That will
usually lead to an "invalid frontend message" error that terminates the
connection. However, this is a security issue because an attacker might
be able to deliberately cause an error, inject a Query message in what's
supposed to be just user data, and have the server execute it.
We were quite careful to not have CHECK_FOR_INTERRUPTS() calls or other
operations that could ereport(ERROR) in the middle of processing a message,
but a query cancel interrupt or statement timeout could nevertheless cause
it to happen. Also, the V2 fastpath and COPY handling were not so careful.
It's very difficult to recover in the V2 COPY protocol, so we will just
terminate the connection on error. In practice, that's what happened
previously anyway, as we lost protocol sync.
To fix, add a new variable in pqcomm.c, PqCommReadingMsg, that is set
whenever we're in the middle of reading a message. When it's set, we cannot
safely ERROR out and continue running, because we might've read only part
of a message. PqCommReadingMsg acts somewhat similarly to critical sections
in that if an error occurs while it's set, the error handler will force the
connection to be terminated, as if the error was FATAL. It's not
implemented by promoting ERROR to FATAL in elog.c, like ERROR is promoted
to PANIC in critical sections, because we want to be able to use
PG_TRY/CATCH to recover and regain protocol sync. pq_getmessage() takes
advantage of that to prevent an OOM error from terminating the connection.
To prevent unnecessary connection terminations, add a holdoff mechanism
similar to HOLD/RESUME_INTERRUPTS() that can be used hold off query cancel
interrupts, but still allow die interrupts. The rules on which interrupts
are processed when are now a bit more complicated, so refactor
ProcessInterrupts() and the calls to it in signal handlers so that the
signal handlers always call it if ImmediateInterruptOK is set, and
ProcessInterrupts() can decide to not do anything if the other conditions
are not met.
Reported by Emil Lenngren. Patch reviewed by Noah Misch and Andres Freund.
Backpatch to all supported versions.
Security: CVE-2015-0244

This covers alterations to buffer sizing and zeroing made between imath
1.3 and imath 1.20. Valgrind Memcheck identified the buffer overruns
and reliance on uninitialized data; their exploit potential is unknown.
Builds specifying --with-openssl are unaffected, because they use the
OpenSSL BIGNUM facility instead of imath. Back-patch to 9.0 (all
supported versions).
Security: CVE-2015-0243

Most callers pass a stack buffer. The ensuing stack smash can crash the
server, and we have not ruled out the viability of attacks that lead to
privilege escalation. Back-patch to 9.0 (all supported versions).
Marko Tiikkaja
Security: CVE-2015-0243

Prevent port/snprintf() from overflowing its local fixed-size
buffer and pad to the desired number of digits with zeros, even
if the precision is beyond the ability of the native sprintf().
port/snprintf() is only used on systems that lack a native
snprintf().
Reported by Bruce Momjian. Patch by Tom Lane. Backpatch to all
supported versions.
Security: CVE-2015-0242

Previously very long localized month and weekday strings could
overflow the allocated buffers, causing a server crash.
Reported and patch reviewed by Noah Misch. Backpatch to all
supported versions.
Security: CVE-2015-0241

Previously very long field masks for floats could access memory
beyond the existing buffer allocated to hold the result.
Reported by Andres Freund and Peter Geoghegan. Backpatch to all
supported versions.
Security: CVE-2015-0241

The previous wording claimed that the file was always in /etc, but of
course this varies with the installation layout. Write instead that it
can be found via `pg_config --sysconfdir`. Even though this is still
somewhat incorrect because it doesn't account of moved installations, it
at least conveys that the location depends on the installation.

"ECHO all" is ignored for interactive input, and has been for a very long
time, though possibly not for as long as the documentation has claimed the
opposite. Fix that, and also note that empty lines aren't echoed, which
while dubious is another longstanding behavior (it's embedded in our
regression test files for one thing). Per bug #12721 from Hans Ginzel.
In HEAD, also improve the code comments in this area, and suppress an
unnecessary fflush(stdout) when we're not echoing. That would likely
be safe to back-patch, but I'll not risk it mere hours before a release
wrap.

Coverity points out that mdc_finish returns a pointer to a local buffer
(which of course is gone as soon as the function returns), leaving open
a risk of misbehaviors possibly as bad as a stack overwrite.
In reality, the only possible call site is in process_data_packets()
which does not examine the returned pointer at all. So there's no
live bug, but nonetheless the code is confusing and risky. Refactor
to avoid the issue by letting process_data_packets() call mdc_finish()
directly instead of going through the pullf_read() API.
Although this is only cosmetic, it seems good to back-patch so that
the logic in pgp-decrypt.c stays in sync across all branches.
Marko Kreen

In 804b6b6db4dcfc590a468e7be390738f9f7755fb we modified
BuildIndexValueDescription to pay attention to which columns are visible
to the user, but unfortunatley that commit neglected to consider indexes
which are built on expressions.
Handle error-reporting of violations of constraint indexes based on
expressions by not returning any detail when the user does not have
table-level SELECT rights.
Backpatch to 9.0, as the prior commit was.
Pointed out by Tom.

connectby() didn't adequately check that the constructed SQL query returns
what it's expected to; in fact, since commit 08c33c426bfebb32 it wasn't
checking that at all. This could result in a null-pointer-dereference
crash if the constructed query returns only one column instead of the
expected two. Less excitingly, it could also result in surprising data
conversion failures if the constructed query returned values that were
not I/O-conversion-compatible with the types specified by the query
calling connectby().
In all branches, insist that the query return at least two columns;
this seems like a minimal sanity check that can't break any reasonable
use-cases.
In HEAD, insist that the constructed query return the types specified by
the outer query, including checking for typmod incompatibility, which the
code never did even before it got broken. This is to hide the fact that
the implementation does a conversion to text and back; someday we might
want to improve that.
In back branches, leave that alone, since adding a type check in a minor
release is more likely to break things than make people happy. Type
inconsistencies will continue to work so long as the actual type and
declared type are I/O representation compatible, and otherwise will fail
the same way they used to.
Also, in all branches, be on guard for NULL results from the constructed
query, which formerly would cause null-pointer dereference crashes.
We now print the row with the NULL but don't recurse down from it.
In passing, get rid of the rather pointless idea that
build_tuplestore_recursively() should return the same tuplestore that's
passed to it.
Michael Paquier, adjusted somewhat by me

GetLockConflicts() has for a long time not properly terminated the
returned array. During normal processing the returned array is zero
initialized which, while not pretty, is sufficient to be recognized as
a invalid virtual transaction id. But the HotStandby case is more than
aesthetically broken: The allocated (and reused) array is neither
zeroed upon allocation, nor reinitialized, nor terminated.
Not having a terminating element means that the end of the array will
not be recognized and that recovery conflict handling will thus read
ahead into adjacent memory. Only terminating when hitting memory
content that looks like a invalid virtual transaction id. Luckily
this seems so far not have caused significant problems, besides making
recovery conflict more expensive.
Discussion: 20150127142713.GD29457@awork2.anarazel.de
Backpatch into all supported branches.

Fix bug where GIN scan keys were not initialized with gin_fuzzy_search_limit.

When gin_fuzzy_search_limit was used, we could jump out of startScan()
without calling startScanKey(). That was harmless in 9.3 and below, because
startScanKey()() didn't do anything interesting, but in 9.4 it initializes
information needed for skipping entries (aka GIN fast scans), and you
readily get a segfault if it's not done. Nevertheless, it was clearly wrong
all along, so backpatch all the way to 9.1 where the early return was
introduced.
(AFAICS startScanKey() did nothing useful in 9.3 and below, because the
fields it initialized were already initialized in ginFillScanKey(), but I
don't dare to change that in a minor release. ginFillScanKey() is always
called in gingetbitmap() even though there's a check there to see if the
scan keys have already been initialized, because they never are; ginrescan()
free's them.)
In the passing, remove unnecessary if-check from the second inner loop in
startScan(). We already check in the first loop that the condition is true
for all entries.
Reported by Olaf Gawenda, bug #12694, Backpatch to 9.1 and above, although
AFAICS it causes a live bug only in 9.4.

Commit 804b6b6db4dcfc590a468e7be390738f9f7755fb added the build of a
range table in copy.c to initialize the EState es_range_table since it
can be needed in error paths. Unfortunately, that commit didn't
appreciate that some code paths might end up not initializing the rte
which is used to build the range table.
Fix that and clean up a couple others things along the way- build it
only once and don't explicitly set it on the !is_from path as it
doesn't make any sense there (cstate is palloc0'd, so this isn't an
issue from an initializing standpoint either).
The prior commit went back to 9.0, but this only goes back to 9.1 as
prior to that the range table build happens immediately after building
the RTE and therefore doesn't suffer from this issue.
Pointed out by Robert.

While building error messages to return to the user,
BuildIndexValueDescription and ri_ReportViolation would happily include
the entire key or entire row in the result returned to the user, even if
the user didn't have access to view all of the columns being included.
Instead, include only those columns which the user is providing or which
the user has select rights on. If the user does not have any rights
to view the table or any of the columns involved then no detail is
provided and a NULL value is returned from BuildIndexValueDescription.
Note that, for key cases, the user must have access to all of the
columns for the key to be shown; a partial key will not be returned.
Back-patch all the way, as column-level privileges are now in all
supported versions.
This has been assigned CVE-2014-8161, but since the issue and the patch
have already been publicized on pgsql-hackers, there's no point in trying
to hide this commit.

Commit 145343534c153d1e6c3cff1fa1855787684d9a38 arranged to store numeric
NaN values as short-header numerics, but the field access macros did not
get the memo: they thought only "SHORT" numerics have short headers.
Most of the time this makes no difference because we don't access the
weight or dscale of a NaN; but numeric_send does that. As pointed out
by Andrew Gierth, this led to fetching uninitialized bytes.
AFAICS this could not have any worse consequences than that; in particular,
an unaligned stored numeric would have been detoasted by PG_GETARG_NUMERIC,
so that there's no risk of a fetch off the end of memory. Still, the code
is wrong on its own terms, and it's not hard to foresee future changes that
might expose us to real risks. So back-patch to all affected branches.

The "callargs" variable is modified within PG_TRY and then referenced
within PG_CATCH, which is exactly the coding pattern we've now found
to be unsafe. Marking "callargs" volatile would be problematic because
it is passed by reference to some Tcl functions, so fix the problem
by not modifying it within PG_TRY. We can just postpone the free()
till we exit the PG_TRY construct, as is already done elsewhere in this
same file.
Also, fix failure to free(callargs) when exiting on too-many-arguments
error. This is only a minor memory leak, but a leak nonetheless.
In passing, remove some unnecessary "volatile" markings in the same
function. Those doubtless are there because gcc 2.95.3 whinged about
them, but we now know that its algorithm for complaining is many bricks
shy of a load.
This is certainly a live bug with compilers that optimize similarly
to current gcc, so back-patch to all active branches.

The "pos" variable is modified within PG_TRY and then referenced
within PG_CATCH, so for strict POSIX conformance it must be marked
volatile. Superficially the code looked safe because pos's address
was taken, which was sufficient to force it into memory ... but it's
not sufficient to ensure that the compiler applies updates exactly
where the program text says to. The volatility marking has to extend
into a couple of subroutines too, but I think that's probably a good
thing because the risk of out-of-order updates is mostly in those
subroutines not asyncQueueReadAllNotifications() itself. In principle
the compiler could have re-ordered operations such that an error could
be thrown while "pos" had an incorrect value.
It's unclear how real the risk is here, but for safety back-patch
to all active branches.

strncpy() has a well-deserved reputation for being unsafe, so make an
effort to get rid of nearly all occurrences in HEAD.
A large fraction of the remaining uses were passing length less than or
equal to the known strlen() of the source, in which case no null-padding
can occur and the behavior is equivalent to memcpy(), though doubtless
slower and certainly harder to reason about. So just use memcpy() in
these cases.
In other cases, use either StrNCpy() or strlcpy() as appropriate (depending
on whether padding to the full length of the destination buffer seems
useful).
I left a few strncpy() calls alone in the src/timezone/ code, to keep it
in sync with upstream (the IANA tzcode distribution). There are also a
few such calls in ecpg that could possibly do with more analysis.
AFAICT, none of these changes are more than cosmetic, except for the four
occurrences in fe-secure-openssl.c, which are in fact buggy: an overlength
source leads to a non-null-terminated destination buffer and ensuing
misbehavior. These don't seem like security issues, first because no stack
clobber is possible and second because if your values of sslcert etc are
coming from untrusted sources then you've got problems way worse than this.
Still, it's undesirable to have unpredictable behavior for overlength
inputs, so back-patch those four changes to all active branches.

Move random() and setseed() to a separate table, to have them grouped
together. Also add a notice that random() is not cryptographically secure.
Back-patch of commit 75fdcec14543b60cc0c67483d8cc47d5c7adf1a8 into
all supported versions, per discussion of the need to document that
random() is just a wrapper around random(3).

In pg_regress, remove the temporary installation upon successful exit.

This results in a very substantial reduction in disk space usage during
"make check-world", since that sequence involves creation of numerous
temporary installations. It should also help a bit in the buildfarm, even
though the buildfarm script doesn't create as many temp installations,
because the current script misses deleting some of them; and anyway it
seems better to do this once in one place rather than expecting that
script to get it right every time.
In 9.4 and HEAD, also undo the unwise choice in commit b1aebbb6a86e96d7
to report strerror(errno) after a rmtree() failure. rmtree has already
reported that, possibly for multiple failures with distinct errnos; and
what's more, by the time it returns there is no good reason to assume
that errno still reflects the last reportable error. So reporting errno
here is at best redundant and at worst badly misleading.
Back-patch to all supported branches, so that future revisions of the
buildfarm script can rely on this behavior.

Per discussion, change the log level of this message to be LOG not WARNING.
The main point of this change is to avoid causing buildfarm run failures
when the stats collector is exceptionally slow to respond, which it not
infrequently is on some of the smaller/slower buildfarm members.
This change does lose notice to an interactive user when his stats query
is looking at out-of-date stats, but the majority opinion (not necessarily
that of yours truly) is that WARNING messages would probably not get
noticed anyway on heavily loaded production systems. A LOG message at
least ensures that the problem is recorded somewhere where bulk auditing
for the issue is possible.
Also, instead of an untranslated "pgstat wait timeout" message, provide
a translatable and hopefully more understandable message "using stale
statistics instead of current ones because stats collector is not
responding". The original text was written hastily under the assumption
that it would never really happen in practice, which we now know to be
unduly optimistic.
Back-patch to all active branches, since we've seen the buildfarm issue
in all branches.

Previously, the xml value resulting from an xpath query would not have
namespace declarations if the namespace declarations were attached to
an ancestor element in the input xml value. That means the output value
was not correct XML. Fix that by running the result value through
xmlCopyNode(), which produces the correct namespace declarations.
Author: Ali Akbar <the.apaan@gmail.com>

Commit 894459e59ffa5c7fee297b246c17e1f72564db1d revealed this option to
be broken for NLS builds on Darwin, but "make -C contrib/unaccent check"
and the buildfarm client rely on it. Fix that configuration by
redefining the option to imply LANG=C on Darwin. In passing, use LANG=C
instead of LANG=en on Windows; since only postmaster startup uses that
value, testers are unlikely to notice the change. Back-patch to 9.0,
like the predecessor commit.

Up to now, the "child" executor state trees generated for EvalPlanQual
rechecks have simply shared the ResultRelInfo arrays used for the original
execution tree. However, this leads to dangling-pointer problems, because
ExecInitModifyTable() is all too willing to scribble on some fields of the
ResultRelInfo(s) even when it's being run in one of those child trees.
This trashes those fields from the perspective of the parent tree, because
even if the generated subtree is logically identical to what was in use in
the parent, it's in a memory context that will go away when we're done
with the child state tree.
We do however want to share information in the direction from the parent
down to the children; in particular, fields such as es_instrument *must*
be shared or we'll lose the stats arising from execution of the children.
So the simplest fix is to make a copy of the parent's ResultRelInfo array,
but not copy any fields back at end of child execution.
Per report from Manuel Kniep. The added isolation test is based on his
example. In an unpatched memory-clobber-enabled build it will reliably
fail with "ctid is NULL" errors in all branches back to 9.1, as a
consequence of junkfilter->jf_junkAttNo being overwritten with $7f7f.
This test cannot be run as-is before that for lack of WITH syntax; but
I have no doubt that some variant of this problem can arise in older
branches, so apply the code change all the way back.

Previously, read() might have returned a length equal to the buffer
length, and then the subsequent store to buf[len] would write a
zero-byte one byte past the end. This doesn't seem likely to be
a security issue, but there's some chance it could result in
pg_standby misbehaving.
Spotted by Coverity; patch by Michael Paquier, reviewed by me.

Commit b94ce6e80 reordered postmaster's startup sequence so that the
tempfile directory is only cleaned up after all the necessary state
for pg_ctl is collected. Unfortunately the chosen location is after
the syslogger has been started; which normally is fine, except for
!WIN32 EXEC_BACKEND builds, which pass information to children via
files in the temp directory.
Move the call to RemovePgTempFiles() to just before the syslogger has
started. That's the first child we fork.
Luckily EXEC_BACKEND is pretty much only used by endusers on windows,
which has a separate method to pass information to children. That
means the real world impact of this bug is very small.
Discussion: 20150113182344.GF12272@alap3.anarazel.de
Backpatch to 9.1, just as the previous commit was.

I noticed the "vacuum" regression test taking really significantly longer
than it used to on a slow machine. Investigation pointed the finger at
commit e415b469b33ba328765e39fd62edcd28f30d9c3c, which added creation of
an index using an extremely expensive index function. That function was
evidently meant to be applied only twice ... but the test re-used an
existing test table, which up till a couple lines before that had had over
two thousand rows. Depending on timing of the concurrent regression tests,
the intervening VACUUMs might have been unable to remove those
recently-dead rows, and then the index build would need to create index
entries for them too, leading to the wrap_do_analyze() function being
executed 2000+ times not twice. Avoid this by using a different table
that is guaranteed to have only the intended two rows in it.
Back-patch to 9.0, like the commit that created the problem.

The corner case where a relcache invalidation tried to rebuild the
entry for a referenced relation but couldn't find it in the catalog
wasn't correct.
The code tried to RelationCacheDelete/RelationDestroyRelation the
entry. That didn't work when assertions are enabled because the latter
contains an assertion ensuring the refcount is zero. It's also more
generally a bad idea, because by virtue of being referenced somebody
might actually look at the entry, which is possible if the error is
trapped and handled via a subtransaction abort.
Instead just error out, without deleting the entry. As the entry is
marked invalid, the worst that can happen is that the invalid (and at
some point unused) entry lingers in the relcache.
Discussion: 22459.1418656530@sss.pgh.pa.us
There should be no way to hit this case < 9.4 where logical decoding
introduced a bug that can hit this. But since the code for handling
the corner case is there it should do something halfway sane, so
backpatch all the the way back. The logical decoding bug will be
handled in a separate commit.

WAL (and timeline history) files created by pg_basebackup did not
maintain the new base backup's archive status. That's currently not a
problem if the new node is used as a standby - but if that node is
promoted all still existing files can get archived again. With a high
wal_keep_segment settings that can happen a significant time later -
which is quite confusing.
Change both the backend (for the -x/-X fetch case) and pg_basebackup
(for -X stream) itself to always mark WAL/timeline files included in
the base backup as .done. That's in line with walreceiver.c doing so.
The verbosity of the pg_basebackup changes show pretty clearly that it
needs some refactoring, but that'd result in not be backpatchable
changes.
Backpatch to 9.1 where pg_basebackup was introduced.
Discussion: 20141205002854.GE21964@awork2.anarazel.de

Use the phraseology "ISO 8601 week-numbering year" in place of just
"ISO year", and make related adjustments to other terminology.
The point of this change is that it seems some people see "ISO year"
and think "standard year", whereupon they're surprised when constructs
like to_char(..., "IYYY-MM-DD") produce nonsensical results. Perhaps
hanging a few more adjectives on it will discourage them from jumping
to false conclusions. I put in an explicit warning against that
specific usage, too, though the main point is to discourage people
who haven't read this far down the page.
In passing fix some nearby markup and terminology inconsistencies.

For simple boolean variables such as ON_ERROR_STOP, psql has for a long
time recognized variant spellings of "on" and "off" (such as "1"/"0"),
and it also made a point of warning you if you'd misspelled the setting.
But these conveniences did not exist for other keyword-valued variables.
In particular, though ECHO_HIDDEN and ON_ERROR_ROLLBACK include "on" and
"off" as possible values, none of the alternative spellings for those were
recognized; and to make matters worse the code would just silently assume
"on" was meant for any unrecognized spelling. Several people have reported
getting bitten by this, so let's fix it. In detail, this patch:
* Allows all spellings recognized by ParseVariableBool() for ECHO_HIDDEN
and ON_ERROR_ROLLBACK.
* Reports a warning for unrecognized values for COMP_KEYWORD_CASE, ECHO,
ECHO_HIDDEN, HISTCONTROL, ON_ERROR_ROLLBACK, and VERBOSITY.
* Recognizes all values for all these variables case-insensitively;
previously there was a mishmash of case-sensitive and case-insensitive
behaviors.
Back-patch to all supported branches. There is a small risk of breaking
existing scripts that were accidentally failing to malfunction; but the
consensus is that the chance of detecting real problems and preventing
future mistakes outweighs this.

Document the long forms of \H \i \ir \o \p \r \w ... apparently, we have
a long and dishonorable history of leaving out the unabbreviated names of
psql backslash commands.
Avoid saying "Unix shell"; we can just say "shell" with equal clarity,
and not leave Windows users wondering whether the feature works for them.
Improve consistency of documentation of \g \o \w metacommands. There's
no reason to use slightly different wording or markup for each one.

Windows versions later than Windows Server 2003 map "localhost" to ::1.
Account for that in the generated pg_hba.conf, fixing another oversight
in commit f6dc6dd5ba54d52c0733aaafc50da2fbaeabb8b0. Back-patch to 9.0,
like that commit.
David Rowley and Noah Misch

For some reason this seems to have been missed when the lists in
src/timezone/tznames/ were first constructed. We can't put it in Default
because of the conflict with US CST, but we should certainly list it among
the alternative entries in Asia.txt. (I checked for other oversights, but
all the other abbreviations that are in current use according to the IANA
files seem to be accounted for.) Noted while responding to bug #12326.

Explain that you have to use "VARIADIC ARRAY[]" to pass an empty array
to a variadic parameter position. This was already implicit in the text
but it seems better to spell it out.
Per a suggestion from David Johnston, though I didn't use his proposed
wording. Back-patch to all supported branches.

In LWLockRelease() (and in 9.4+ LWLockUpdateVar()) we release enqueued
waiters using PGSemaphoreUnlock(). As there are other sources of such
unlocks backends only wake up if MyProc->lwWaiting is set to false;
which is only done in the aforementioned functions.
Before this commit there were dangers because the store to lwWaitLink
could become visible before the store to lwWaitLink. This could both
happen due to compiler reordering (on most compilers) and on some
platforms due to the CPU reordering stores.
The possible consequence of this is that a backend stops waiting
before lwWaitLink is set to NULL. If that backend then tries to
acquire another lock and has to wait there the list could become
corrupted once the lwWaitLink store is finally performed.
Add a write memory barrier to prevent that issue.
Unfortunately the barrier support has been only added in 9.2. Given
that the issue has not knowingly been observed in praxis it seems
sufficient to prohibit compiler reordering using volatile for 9.0 and
9.1. Actual problems due to compiler reordering are more likely
anyway.
Discussion: 20140210134625.GA15246@awork2.anarazel.de

The possibility that constant subexpressions of a CASE might be evaluated
at planning time was touched on in 9.17.1 (CASE expressions), but it really
ought to be explained in 4.2.14 (Expression Evaluation Rules) which is the
primary discussion of such topics. Add text and an example there, and
revise the <note> under CASE to link there.
Back-patch to all supported branches, since it's acted like this for a
long time (though 9.2+ is probably worse because of its more aggressive
use of constant-folding via replanning of nominally-prepared statements).
Pre-9.4, also back-patch text added in commit 0ce627d4 about CASE versus
aggregate functions.
Tom Lane and David Johnston, per discussion of bug #12273.

Use SSPI authentication to allow connections exclusively from the OS
user that launched the test suite. This closes on Windows the
vulnerability that commit be76a6d39e2832d4b88c0e1cc381aa44a7f86881
closed on other platforms. Users of "make installcheck" or custom test
harnesses can run "pg_regress --config-auth=DATADIR" to activate the
same authentication configuration that "make check" would use.
Back-patch to 9.0 (all supported versions).
Security: CVE-2014-0067

Fix off-by-one loop count in MapArrayTypeName, and get rid of static array.

MapArrayTypeName would copy up to NAMEDATALEN-1 bytes of the base type
name, which of course is wrong: after prepending '_' there is only room for
NAMEDATALEN-2 bytes. Aside from being the wrong result, this case would
lead to overrunning the statically allocated work buffer. This would be a
security bug if the function were ever used outside bootstrap mode, but it
isn't, at least not in any currently supported branches.
Aside from fixing the off-by-one loop logic, this patch gets rid of the
static work buffer by having MapArrayTypeName pstrdup its result; the sole
caller was already doing that, so this just requires moving the pstrdup
call. This saves a few bytes but mainly it makes the API a lot cleaner.
Back-patch on the off chance that there is some third-party code using
MapArrayTypeName with less-secure input. Pushing pstrdup into the function
should not cause any serious problems for such hypothetical code; at worst
there might be a short term memory leak.
Per Coverity scanning.

Fix file descriptor leak after failure of a \setshell command in pgbench.

If the called command fails to return data, runShellCommand forgot to
pclose() the pipe before returning. This is fairly harmless in the current
code, because pgbench would then abandon further processing of that client
thread; so no more than nclients descriptors could be leaked this way. But
it's not hard to imagine future improvements whereby that wouldn't be true.
In any case, it's sloppy coding, so patch all branches. Found by Coverity.

Ordinarily we can omit checking of a WHERE condition that matches a partial
index's condition, when we are using an indexscan on that partial index.
However, in SELECT FOR UPDATE we must include the "redundant" filter
condition in the plan so that it gets checked properly in an EvalPlanQual
recheck. The planner got this mostly right, but improperly omitted the
filter condition if the index in question was on an inheritance child
table. In READ COMMITTED mode, this could result in incorrectly returning
just-updated rows that no longer satisfy the filter condition.
The cause of the error is using get_parse_rowmark() when get_plan_rowmark()
is what should be used during planning. In 9.3 and up, also fix the same
mistake in contrib/postgres_fdw. It's currently harmless there (for lack
of inheritance support) but wrong is wrong, and the incorrect code might
get copied to someplace where it's more significant.
Report and fix by Kyotaro Horiguchi. Back-patch to all supported branches.

In READ COMMITTED mode, if a SELECT FOR UPDATE discovers it has to redo
WHERE-clause checking on rows that have been updated since the SELECT's
snapshot, it invokes EvalPlanQual processing to do that. If this first
occurs within a non-first child table of an inheritance tree, the previous
coding could accidentally re-return a matching row from an earlier,
already-scanned child table. (And, to add insult to injury, I think this
could make it miss returning a row that should have been returned, if the
updated row that this happens on should still have passed the WHERE qual.)
Per report from Kyotaro Horiguchi; the added isolation test is based on his
test case.
This has been broken for quite awhile, so back-patch to all supported
branches.

We were not checking to see if the supplied dscale was valid for the given
digit array when receiving binary-format numeric values. While dscale can
validly be more than the number of nonzero fractional digits, it shouldn't
be less; that case causes fractional digits to be hidden on display even
though they're there and participate in arithmetic.
Bug #12053 from Tommaso Sala indicates that there's at least one broken
client library out there that sometimes supplies an incorrect dscale value,
leading to strange behavior. This suggests that simply throwing an error
might not be the best response; it would lead to failures in applications
that might seem to be working fine today. What seems the least risky fix
is to truncate away any digits that would be hidden by dscale. This
preserves the existing behavior in terms of what will be printed for the
transmitted value, while preventing subsequent arithmetic from producing
results inconsistent with that.
In passing, throw a specific error for the case of dscale being outside
the range that will fit into a numeric's header. Before you got "value
overflows numeric format", which is a bit misleading.
Back-patch to all supported branches.

Coverity complained that the "else" added to fillPGconn() was unreachable,
which it was. Remove the dead code. In passing, rearrange the tests so as
not to bother trying to fetch values for options that can't be assigned.
Pre-9.3 did not have that issue, but it did have a "return" that should be
"goto oom_error" to ensure that a suitable error message gets filled in.

Mark Simonetti reported that libxslt sometimes crashes for him, and that
swapping xslt_process's object-freeing calls around to do them in reverse
order of creation seemed to fix it. I've not reproduced the crash, but
valgrind clearly shows a reference to already-freed memory, which is
consistent with the idea that shutdown of the xsltTransformContext is
trying to reference the already-freed stylesheet or input document.
With this patch, valgrind is no longer unhappy.
I have an inquiry in to see if this is a libxslt bug or if we're just
abusing the library; but even if it's a library bug, we'd want to adjust
our code so it doesn't fail with unpatched libraries.
Back-patch to all supported branches, because we've been doing this in
the wrong(?) order for a long time.

Allow “dbname” from connection string to be overridden in PQconnectDBParams

If the "dbname" attribute in PQconnectDBParams contained a connection string
or URI (and expand_dbname = TRUE), the database name from the connection
string could not be overridden by a subsequent "dbname" keyword in the
array. That was not intentional; all other options can be overridden.
Furthermore, any subsequent "dbname" caused the connection string from the
first dbname value to be processed again, overriding any values for the same
options that were given between the connection string and the second dbname
option.
In the passing, clarify in the docs that only the first dbname option in the
array is parsed as a connection string.
Alex Shulgin. Backpatch to all supported versions.

An out-of-memory in most of these would lead to strange behavior, like
connecting to a different database than intended, but some would lead to
an outright segfault.
Alex Shulgin and me. Backpatch to all supported versions.

In bug #12000, Andreas Kunert complained that the documentation was
misleading in saying "FROM T1 CROSS JOIN T2 is equivalent to FROM T1, T2".
That's correct as far as it goes, but the equivalence doesn't hold when
you consider three or more tables, since JOIN binds more tightly than
comma. I added a <note> to explain this, and ended up rearranging some
of the existing text so that the note would make sense in context.
In passing, rewrite the description of JOIN USING, which was unnecessarily
vague, and hadn't been helped any by somebody's reliance on markup as a
substitute for clear writing. (Mostly this involved reintroducing a
concrete example that was unaccountably removed by commit 032f3b7e166cfa28.)
Back-patch to all supported branches.

The regression test cases added in commits b2cbced9e et al depended in part
on the Russian timezone offset changes of Oct 2014. While this is of no
particular concern for a default Postgres build, it was possible for a
build using --with-system-tzdata to fail the tests if the system tzdata
database wasn't au courant. Bjorn Munch and Christoph Berg both complained
about this while packaging 9.4rc1, so we probably shouldn't insist on the
system tzdata being up-to-date. Instead, make an equivalent test using a
zone change that occurred in Venezuela in 2007. With this patch, the
regression tests should pass using any tzdata set from 2012 or later.
(I can't muster much sympathy for somebody using --with-system-tzdata
on a machine whose system tzdata is more than three years out-of-date.)

Unlogged relations are only reset when performing a unclean
restart. That means they have to be synced to disk during clean
shutdowns. During normal processing that's achieved by registering a
buffer's file to be fsynced at the next checkpoint when flushed. But
ResetUnloggedRelations() doesn't go through the buffer manager, so
nothing will force reset relations to disk before the next shutdown
checkpoint.
So just make ResetUnloggedRelations() fsync the newly created main
forks to disk.
Discussion: 20140912112246.GA4984@alap3.anarazel.de
Backpatch to 9.1 where unlogged tables were introduced.
Abhijit Menon-Sen and Andres Freund

Unlogged relations are reset at the end of crash recovery as they're
only synced to disk during a proper shutdown. Unfortunately that and
later steps can fail, e.g. due to running out of space. This reset
was, up to now performed after marking the database as having finished
crash recovery successfully. As out of space errors trigger a crash
restart that could lead to the situation that not all unlogged
relations are reset.
Once that happend usage of unlogged relations could yield errors like
"could not open file "...": No such file or directory". Luckily
clusters that show the problem can be fixed by performing a immediate
shutdown, and starting the database again.
To fix, just call ResetUnloggedRelations(UNLOGGED_RELATION_INIT)
earlier, before marking the database as having successfully recovered.
Discussion: 20140912112246.GA4984@alap3.anarazel.de
Backpatch to 9.1 where unlogged tables were introduced.
Abhijit Menon-Sen and Andres Freund

Fix breakage induced by commits d8d3d2a4f37f6df5d0118b7f5211978cca22091a
and 463f2625a5fb183b6a8925ccde98bb3889f921d9: pg_dumpall has crashed when
attempting to dump from pre-8.1 servers since then, due to faulty
construction of the query used for dumping roles from older servers.
The query was erroneous as of the earlier commit, but it wasn't exposed
unless you tried to use --binary-upgrade, which you presumably wouldn't
with a pre-8.1 server. However commit 463f2625a made it fail always.
In HEAD, also fix additional breakage induced in the same query by
commit 491c029dbc4206779cf659aa0ff986af7831d2ff, which evidently wasn't
tested against pre-8.1 servers either.
The bug is only latent in 9.1 because 463f2625a hadn't landed yet, but
it seems best to back-patch all branches containing the faulty query.
Gilles Darold

There was a window in RestoreBackupBlock where a page would be zeroed out,
but not yet locked. If a backend pinned and locked the page in that window,
it saw the zeroed page instead of the old page or new page contents, which
could lead to missing rows in a result set, or errors.
To fix, replace RBM_ZERO with RBM_ZERO_AND_LOCK, which atomically pins,
zeroes, and locks the page, if it's not in the buffer cache already.
In stable branches, the old RBM_ZERO constant is renamed to RBM_DO_NOT_USE,
to avoid breaking any 3rd party extensions that might use RBM_ZERO. More
importantly, this avoids renumbering the other enum values, which would
cause even bigger confusion in extensions that use ReadBufferExtended, but
haven't been recompiled.
Backpatch to all supported versions; this has been racy since hot standby
was introduced.

This fixes a scenario in which pgp_sym_decrypt() failed with "Wrong key
or corrupt data" on messages whose length is 6 less than a power of 2.
Per bug #11905 from Connor Penhale. Fix by Marko Tiikkaja, regression
test case from Jeff Janes.

Fix dependency searching for case where column is visited before table.

When the recursive search in dependency.c visits a column and then later
visits the whole table containing the column, it needs to propagate the
drop-context flags for the table to the existing target-object entry for
the column. Otherwise we might refuse the DROP (if not CASCADE) on the
incorrect grounds that there was no automatic drop pathway to the column.
Remarkably, this has not been reported before, though it's possible at
least when an extension creates both a datatype and a table using that
datatype.
Rather than just marking the column as allowed to be dropped, it might
seem good to skip the DROP COLUMN step altogether, since the later DROP
of the table will surely get the job done. The problem with that is that
the datatype would then be dropped before the table (since the whole
situation occurred because we visited the datatype, and then recursed to
the dependent column, before visiting the table). That seems pretty risky,
and the case is rare enough that it doesn't seem worth expending a lot of
effort or risk to make the drops happen in a safe order. So we just play
dumb and delete the column separately according to the existing drop
ordering rules.
Per report from Petr Jelinek, though this is different from his proposed
patch.
Back-patch to 9.1, where extensions were introduced. There's currently
no evidence that such cases can arise before 9.1, and in any case we would
also need to back-patch cb5c2ba2d82688d29b5902d86b993a54355cad4d to 9.0
if we wanted to back-patch this.

dict_thesaurus stored phrase IDs in uint16 fields, so it would get confused
and even crash if there were more than 64K entries in the configuration
file. It turns out to be basically free to widen the phrase IDs to uint32,
so let's just do so.
This was complained of some time ago by David Boutin (in bug #7793);
he later submitted an informal patch but it was never acted on.
We now have another complaint (bug #11901 from Luc Ouellette) so it's
time to make something happen.
This is basically Boutin's patch, but for future-proofing I also added a
defense against too many words per phrase. Note that we don't need any
explicit defense against overflow of the uint32 counters, since before that
happens we'd hit array allocation sizes that repalloc rejects.
Back-patch to all supported branches because of the crash risk.

Prevent the unnecessary creation of .ready file for the timeline history file.

Previously .ready file was created for the timeline history file at the end
of an archive recovery even when WAL archiving was not enabled.
This creation is unnecessary and causes .ready file to remain infinitely.
This commit changes an archive recovery so that it creates .ready file for
the timeline history file only when WAL archiving is enabled.
Backpatch to all supported versions.

In general, datatype I/O functions are supposed to be immutable or at
worst stable. Some contrib I/O functions were, through oversight, not
marked with any volatility property at all, which made them VOLATILE.
Since (most of) these functions actually behave immutably, the erroneous
marking isn't terribly harmful; but it can be user-visible in certain
circumstances, as per a recent bug report from Joe Van Dyk in which a
cast to text was disallowed in an expression index definition.
To fix, just adjust the declarations in the extension SQL scripts. If we
were being very fussy about this, we'd bump the extension version numbers,
but that seems like more trouble (for both developers and users) than the
problem is worth.
A fly in the ointment is that chkpass_in actually is volatile, because
of its use of random() to generate a fresh salt when presented with a
not-yet-encrypted password. This is bad because of the general assumption
that I/O functions aren't volatile: the consequence is that records or
arrays containing chkpass elements may have input behavior a bit different
from a bare chkpass column. But there seems no way to fix this without
breaking existing usage patterns for chkpass, and the consequences of the
inconsistency don't seem bad enough to justify that. So for the moment,
just document it in a comment.
Since we're not bumping version numbers, there seems no harm in
back-patching these fixes; at least future installations will get the
functions marked correctly.

The previous coding assumed that we could just let buffers for the
database's old tablespace age out of the buffer arena naturally.
The folly of that is exposed by bug #11867 from Marc Munro: the user could
later move the database back to its original tablespace, after which any
still-surviving buffers would match lookups again and appear to contain
valid data. But they'd be missing any changes applied while the database
was in the new tablespace.
This has been broken since ALTER SET TABLESPACE was introduced, so
back-patch to all supported branches.

pgp_sym_encrypt's option is spelled "sess-key", not "enable-session-key".
Spotted by Jeff Janes.
In passing, improve a comment in pgp-pgsql.c to make it clearer that
the debugging options are intentionally undocumented.

Test IsInTransactionChain, not IsTransactionBlock, in vac_update_relstats.

As noted by Noah Misch, my initial cut at fixing bug #11638 didn't cover
all cases where ANALYZE might be invoked in an unsafe context. We need to
test the result of IsInTransactionChain not IsTransactionBlock; which is
notationally a pain because IsInTransactionChain requires an isTopLevel
flag, which would have to be passed down through several levels of callers.
I chose to pass in_outer_xact (ie, the result of IsInTransactionChain)
rather than isTopLevel per se, as that seemed marginally more apropos
for the intermediate functions to know about.

VACUUM and ANALYZE update the target table's pg_class row in-place, that is
nontransactionally. This is OK, more or less, for the statistical columns,
which are mostly nontransactional anyhow. It's not so OK for the DDL hint
flags (relhasindex etc), which might get changed in response to
transactional changes that could still be rolled back. This isn't a
problem for VACUUM, since it can't be run inside a transaction block nor
in parallel with DDL on the table. However, we allow ANALYZE inside a
transaction block, so if the transaction had earlier removed the last
index, rule, or trigger from the table, and then we roll back the
transaction after ANALYZE, the table would be left in a corrupted state
with the hint flags not set though they should be.
To fix, suppress the hint-flag updates if we are InTransactionBlock().
This is safe enough because it's always OK to postpone hint maintenance
some more; the worst-case consequence is a few extra searches of pg_index
et al. There was discussion of instead using a transactional update,
but that would change the behavior in ways that are not all desirable:
in most scenarios we're better off keeping ANALYZE's statistical values
even if the ANALYZE itself rolls back. In any case we probably don't want
to change this behavior in back branches.
Per bug #11638 from Casey Shobe. This has been broken for a good long
time, so back-patch to all supported branches.
Tom Lane and Michael Paquier, initial diagnosis by Andres Freund

If you call PQreset() repeatedly, and the connection cannot be
re-established, the error messages from the failed connection attempts
kept accumulating in the error string.
Fixes bug #11455 reported by Caleb Epstein. Backpatch to all supported
versions.

1. The comparison for matching terms used only the CRC to decide if there's
a match. Two different terms with the same CRC gave a match.
2. It assumed that if the second operand has more terms than the first, it's
never a match. That assumption is bogus, because there can be duplicate
terms in either operand.
Rewrite the implementation in a way that doesn't have those bugs.
Backpatch to all supported versions.

Don't crash if an ispell dictionary definition contains flags but not
any compound affixes. (This isn't a security issue since only superusers
can install affix files, but still it's a bad thing.)
Also, be more careful about detecting whether an affix-file FLAG command
is old-format (ispell) or new-format (myspell/hunspell). And change the
error message about mixed old-format and new-format commands into something
intelligible.
Per bug #11770 from Emre Hasegeli. Back-patch to all supported branches.

The EOF-detection logic in pqReadData was a bit confused about who should
set up the error message in case the kernel gives us read-ready-but-no-data
rather than ECONNRESET or some other explicit error condition. Since the
whole point of this situation is that the lower-level functions don't know
there's anything wrong, pqReadData itself must set up the message. But
keep the assumption that if an errno was reported, a message was set up at
lower levels.
Per bug #11712 from Marko Tiikkaja. It's been like this for a very long
time, so back-patch to all supported branches.

CREATE DATABASE and ALTER DATABASE .. SET TABLESPACE copy the source
database directory on the filesystem level. To ensure the on disk
state is consistent they block out users of the affected database and
force a checkpoint to flush out all data to disk. Unfortunately, up to
now, that checkpoint didn't flush out dirty buffers from unlogged
relations.
That bug means there could be leftover dirty buffers in either the
template database, or the database in its old location. Leading to
problems when accessing relations in an inconsistent state; and to
possible problems during shutdown in the SET TABLESPACE case because
buffers belonging files that don't exist anymore are flushed.
This was reported in bug #10675 by Maxim Boguk.
Fix by Pavan Deolasee, modified somewhat by me. Reviewed by MauMau and
Fujii Masao.
Backpatch to 9.1 where unlogged tables were introduced.

Follow our usual style of providing an "extern" for a standard library
function only when we're also providing the implementation. This avoids
issues when the system headers declare the function slightly differently
than we do, as noted by Caleb Welton.
We might have to go to the extent of probing to see if the system headers
declare the function, but let's not do that until it's demonstrated to be
necessary.
Oversight in commit 9e6b1bf258170e62dac555fc82ff0536dfe01d29. Back-patch
to all supported branches, as that was.

Avoid core dump in _outPathInfo() for Path without a parent RelOptInfo.

Nearly all Paths have parents, but a ResultPath representing an empty FROM
clause does not. Avoid a core dump in such cases. I believe this is only
a hazard for debugging usage, not for production, else we'd have heard
about it before. Nonetheless, back-patch to 9.1 where the troublesome code
was introduced. Noted while poking at bug #11703.

This reverts nearly all of commit 28f6cab61ab8958b1a7dfb019724687d92722538
in favor of just using the typrelid we already have in pg_dump's TypeInfo
struct for the composite type. As coded, it'd crash if the composite type
had no attributes, since then the query would return no rows.
Back-patch to all supported versions. It seems to not really be a problem
in 9.0 because that version rejects the syntax "create type t as ()", but
we might as well keep the logic similar in all affected branches.
Report and fix by Rushabh Lathia.

Up to now, PG has assumed that any given timezone abbreviation (such as
"EDT") represents a constant GMT offset in the usage of any particular
region; we had a way to configure what that offset was, but not for it
to be changeable over time. But, as with most things horological, this
view of the world is too simplistic: there are numerous regions that have
at one time or another switched to a different GMT offset but kept using
the same timezone abbreviation. Almost the entire Russian Federation did
that a few years ago, and later this month they're going to do it again.
And there are similar examples all over the world.
To cope with this, invent the notion of a "dynamic timezone abbreviation",
which is one that is referenced to a particular underlying timezone
(as defined in the IANA timezone database) and means whatever it currently
means in that zone. For zones that use or have used daylight-savings time,
the standard and DST abbreviations continue to have the property that you
can specify standard or DST time and get that time offset whether or not
DST was theoretically in effect at the time. However, the abbreviations
mean what they meant at the time in question (or most recently before that
time) rather than being absolutely fixed.
The standard abbreviation-list files have been changed to use this behavior
for abbreviations that have actually varied in meaning since 1970. The
old simple-numeric definitions are kept for abbreviations that have not
changed, since they are a bit faster to resolve.
While this is clearly a new feature, it seems necessary to back-patch it
into all active branches, because otherwise use of Russian zone
abbreviations is going to become even more problematic than it already was.
This change supersedes the changes in commit 513d06ded et al to modify the
fixed meanings of the Russian abbreviations; since we've not shipped that
yet, this will avoid an undesirably incompatible (not to mention incorrect)
change in behavior for timestamps between 2011 and 2014.
This patch makes some cosmetic changes in ecpglib to keep its usage of
datetime lookup tables as similar as possible to the backend code, but
doesn't do anything about the increasingly obsolete set of timezone
abbreviation definitions that are hard-wired into ecpglib. Whatever we
do about that will likely not be appropriate material for back-patching.
Also, a potential free() of a garbage pointer after an out-of-memory
failure in ecpglib has been fixed.
This patch also fixes pre-existing bugs in DetermineTimeZoneOffset() that
caused it to produce unexpected results near a timezone transition, if
both the "before" and "after" states are marked as standard time. We'd
only ever thought about or tested transitions between standard and DST
time, but that's not what's happening when a zone simply redefines their
base GMT offset.
In passing, update the SGML documentation to refer to the Olson/zoneinfo/
zic timezone database as the "IANA" database, since it's now being
maintained under the auspices of IANA.

This file used __int64, which is specific to native Windows, rather than
int64. Suppress the long-unused union field of this type. Noticed on
Cygwin x86_64 with -lcrypt not installed. Back-patch to 9.0 (all
supported versions).

The code wrote a value into the caller's field[] array before checking
to see if there was room, which of course is backwards. Per report from
Michael Paquier.
I fixed the equivalent bug in the backend's version of this code way back
in 630684d3a130bb93, but failed to think about ecpg's copy. Fortunately
this doesn't look like it would be exploitable for anything worse than a
core dump: an external attacker would have no control over the single word
that gets written.

Most zones in the Russian Federation are subtracting one or two hours
as of 2014-10-26. Update the meanings of the abbreviations IRKT, KRAT,
MAGT, MSK, NOVT, OMST, SAKT, VLAT, YAKT, YEKT to match.
The IANA timezone database has adopted abbreviations of the form AxST/AxDT
for all Australian time zones, reflecting what they believe to be current
majority practice Down Under. These names do not conflict with usage
elsewhere (other than ACST for Acre Summer Time, which has been in disuse
since 1994). Accordingly, adopt these names into our "Default" timezone
abbreviation set. The "Australia" abbreviation set now contains only
CST,EAST,EST,SAST,SAT,WST, all of which are thought to be mostly historical
usage. Note that SAST has also been changed to be South Africa Standard
Time in the "Default" abbreviation set.
Add zone abbreviations SRET (Asia/Srednekolymsk) and XJT (Asia/Urumqi),
and use WSST/WSDT for western Samoa.
Also a DST law change in the Turks & Caicos Islands (America/Grand_Turk),
and numerous corrections for historical time zone data.

This updates known_abbrevs.txt to be what it should have been already,
were my -P patch not broken; and updates some tznames/ entries that
missed getting any love in previous timezone data updates because zic
failed to flag the change of abbreviation.
The non-cosmetic updates:
* Remove references to "ADT" as "Arabia Daylight Time", an abbreviation
that's been out of use since 2007; therefore, claiming there is a conflict
with "Atlantic Daylight Time" doesn't seem especially helpful. (We have
left obsolete entries in the files when they didn't conflict with anything,
but that seems like a different situation.)
* Fix entirely incorrect GMT offsets for CKT (Cook Islands), FJT, FJST
(Fiji); we didn't even have them on the proper side of the date line.
(Seems to have been aboriginal errors in our tznames data; there's no
evidence anything actually changed recently.)
* FKST (Falkland Islands Summer Time) is now used all year round, so
don't mark it as a DST abbreviation.
* Update SAKT (Sakhalin) to mean GMT+11 not GMT+10.
In cosmetic changes, I fixed a bunch of wrong (or at least obsolete)
claims about abbreviations not being present in the zic files, and
tried to be consistent about how obsolete abbreviations are labeled.
Note the underlying timezone/data files are still at release 2014e;
this is just trying to get us in sync with what those files actually
say before we go to the next update.

When there are cost-delay-related storage options set for a table,
trying to make that table participate in the autovacuum cost-limit
balancing algorithm produces undesirable results: instead of using the
configured values, the global values are always used,
as illustrated by Mark Kirkwood in
http://www.postgresql.org/message-id/52FACF15.8020507@catalyst.net.nz
Since the mechanism is already complicated, just disable it for those
cases rather than trying to make it cope. There are undesirable
side-effects from this too, namely that the total I/O impact on the
system will be higher whenever such tables are vacuumed. However, this
is seen as less harmful than slowing down vacuum, because that would
cause bloat to accumulate. Anyway, in the new system it is possible to
tweak options to get the precise behavior one wants, whereas with the
previous system one was simply hosed.
This has been broken forever, so backpatch to all supported branches.
This might affect systems where cost_limit and cost_delay have been set
for individual tables.

The page splitting code would go into infinite recursion if you try to
insert an index tuple that doesn't fit even on an empty page.
Per analysis and suggested fix by Andrew Gierth. Fixes bug #11555, reported
by Bryan Seitz (analysis happened over IRC). Backpatch to all supported
versions.

As of commit a87c72915 (which later got backpatched as far as 9.1),
we're explicitly supporting the notion that append relations can be
nested; this can occur when UNION ALL constructs are nested, or when
a UNION ALL contains a table with inheritance children.
Bug #11457 from Nelson Page, as well as an earlier report from Elvis
Pranskevichus, showed that there were still nasty bugs associated with such
cases: in particular the EquivalenceClass mechanism could try to generate
"join" clauses connecting an appendrel child to some grandparent appendrel,
which would result in assertion failures or bogus plans.
Upon investigation I concluded that all current callers of
find_childrel_appendrelinfo() need to be fixed to explicitly consider
multiple levels of parent appendrels. The most complex fix was in
processing of "broken" EquivalenceClasses, which are ECs for which we have
been unable to generate all the derived equality clauses we would like to
because of missing cross-type equality operators in the underlying btree
operator family. That code path is more or less entirely untested by
the regression tests to date, because no standard opfamilies have such
holes in them. So I wrote a new regression test script to try to exercise
it a bit, which turned out to be quite a worthwhile activity as it exposed
existing bugs in all supported branches.
The present patch is essentially the same as far back as 9.2, which is
where parameterized paths were introduced. In 9.0 and 9.1, we only need
to back-patch a small fragment of commit 5b7b5518d, which fixes failure to
propagate out the original WHERE clauses when a broken EC contains constant
members. (The regression test case results show that these older branches
are noticeably stupider than 9.2+ in terms of the quality of the plans
generated; but we don't really care about plan quality in such cases,
only that the plan not be outright wrong. A more invasive fix in the
older branches would not be a good idea anyway from a plan-stability
standpoint.)

Without this fix, parallel restore of a schema-only dump can deadlock,
because when the dump is schema-only, the dependency will still be
pointing at the TABLE item rather than the TABLE DATA item.
Robert Haas and Tom Lane

Fix VPATH builds of the replication parser from git for some !gcc compilers.

Some compilers don't automatically search the current directory for
included files. 9cc2c182fc2 fixed that for builds from tarballs by
adding an include to the source directory. But that doesn't work when
the scanner is generated in the VPATH directory. Use the same search
path as the other parsers in the tree.
One compiler that definitely was affected is solaris' sun cc.
Backpatch to 9.1 which introduced using an actual parser for
replication commands.

The documentation used to suggest setting this parameter with ALTER ROLE
SET, but that never worked, so replace it with a working suggestion.
Reported-by: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>

In psql, expanded mode was not being displayed correctly when using
the normal ascii or unicode linestyles and border set to '3'. Now,
per the documentation, border '3' is really only sensible for HTML
and LaTeX formats, however, that's no excuse for ascii/unicode to
break in that case, and provisions had been made for psql to cleanly
handle this case (and it did, in non-expanded mode).
This was broken when ascii/unicode was initially added a good five
years ago because print_aligned_vertical_line wasn't passed in the
border setting being used by print_aligned_vertical but instead was
given the whole printTableContent. There really isn't a good reason
for vertical_line to have the entire printTableContent structure, so
just pass in the printTextFormat and border setting (similar to how
this is handled in horizontal_line).
Pointed out by Pavel Stehule, fix by me.
Back-patch to all currently-supported versions.

The code for raising a NUMERIC value to an integer power wasn't very
careful about large powers. It got an outright wrong answer for an
exponent of INT_MIN, due to failure to consider overflow of the Abs(exp)
operation; which is fixable by using an unsigned rather than signed
exponent value after that point. Also, even though the number of
iterations of the power-computation loop is pretty limited, it's easy for
the repeated squarings to result in ridiculously enormous intermediate
values, which can take unreasonable amounts of time/memory to process,
or even overflow the internal "weight" field and so produce a wrong answer.
We can forestall misbehaviors of that sort by bailing out as soon as the
weight value exceeds what will fit in int16, since then the final answer
must overflow (if exp > 0) or underflow (if exp < 0) the packed numeric
format.
Per off-list report from Pavel Stehule. Back-patch to all supported
branches.

Some Sparc CPUs can be run in various coherence models, ranging from
RMO (relaxed) over PSO (partial) to TSO (total). Solaris has always
run CPUs in TSO mode while in userland, but linux didn't use to and
the various *BSDs still don't. Unfortunately the sparc TAS/S_UNLOCK
were only correct under TSO. Fix that by adding the necessary memory
barrier instructions. On sparcv8+, which should be all relevant CPUs,
these are treated as NOPs if the current consistency model doesn't
require the barriers.
Discussion: 20140630222854.GW26930@awork2.anarazel.de
Will be backpatched to all released branches once a few buildfarm
cycles haven't shown up problems. As I've no access to sparc, this is
blindly written.

psql's \s (print command history) doesn't work at all with recent libedit
versions when printing to the terminal, because libedit tries to do an
fchmod() on the target file which will fail if the target is /dev/tty.
(We'd already noted this in the context of the target being /dev/null.)
Even before that, it didn't work pleasantly, because libedit likes to
encode the command history file (to ensure successful reloading), which
renders it nigh unreadable, not to mention significantly different-looking
depending on exactly which libedit version you have. So let's forget using
write_history() for this purpose, and instead print the data ourselves,
using logic similar to that used to iterate over the history for newline
encoding/decoding purposes.
While we're at it, insert the ability to use the pager when \s is printing
to the terminal. This has been an acknowledged shortcoming of \s for many
years, so while you could argue it's not exactly a back-patchable bug fix
it still seems like a good improvement. Anyone who's seriously annoyed
at this can use "\s /dev/tty" or local equivalent to get the old behavior.
Experimentation with this showed that the history iteration logic was
actually rather broken when used with libedit. It turns out that with
libedit you have to use previous_history() not next_history() to advance
to more recent history entries. The easiest and most robust fix for this
seems to be to make a run-time test to verify which function to call.
We had not noticed this because libedit doesn't really need the newline
encoding logic: its own encoding ensures that command entries containing
newlines are reloaded correctly (unlike libreadline). So the effective
behavior with recent libedits was that only the oldest history entry got
newline-encoded or newline-decoded. However, because of yet other bugs in
history_set_pos(), some old versions of libedit allowed the existing loop
logic to reach entries besides the oldest, which means there may be libedit
~/.psql_history files out there containing encoded newlines in more than
just the oldest entry. To ensure we can reload such files, it seems
appropriate to back-patch this fix, even though that will result in some
incompatibility with older psql versions (ie, multiline history entries
written by a psql with this fix will look corrupted to a psql without it,
if its libedit is reasonably up to date).
Stepan Rutz and Tom Lane

The old claim is from my commit d06ebdb8d3425185d7e641d15e45908658a0177d of
2000-07-17, but it seems to have been a plain old thinko; sum(float4) has
been distinct from sum(float8) since Berkeley days. Noted by KaiGai Kohei.
While at it, mention the existence of sum(money), which is also of
embarrassingly ancient vintage.

In commit 45e02e3232ac7cc5ffe36f7986159b5e0b1f6fdc, we intentionally
disallowed updates on individual elements of oidvector columns. While that
still seems like a sane idea in the abstract, we (I) forgot that citext's
"upgrade from unpackaged" script did in fact perform exactly such updates,
in order to fix the problem that citext indexes should have a collation
but would not in databases dumped or upgraded from pre-9.1 installations.
Even if we wanted to add casts to allow such updates, there's no practical
way to do so in the back branches, so the only real alternative is to make
citext's kluge even klugier. In this patch, I cast the oidvector to text,
fix its contents with regexp_replace, and cast back to oidvector. (Ugh!)
Since the aforementioned commit went into all active branches, we have to
fix this in all branches that contain the now-broken update script.
Per report from Eric Malm.

Fix typos in some error messages thrown by extension scripts when fed to psql.

Some of the many error messages introduced in 458857cc missed 'FROM
unpackaged'. Also e016b724 and 45ffeb7e forgot to quote extension
version numbers.
Backpatch to 9.1, just like 458857cc which introduced the messages. Do
so because the error messages thrown when the wrong command is copy &
pasted aren't easy to understand.

The old text explained what happened if we didn't have working int64
arithmetic. Since that case has been explicitly rejected by configure
since 8.4.3, documenting it in the 9.x branches can only produce confusion.

FreeBSD hasn't made any use of kern.ipc.semmap since 1.1, and newer
releases reject attempts to set it altogether; so stop recommending
that it be adjusted. Per bug #11161.
Back-patch to all supported branches. Before 9.3, also incorporate
commit 7a42dff47, which touches the same text and for some reason
was not back-patched at the time.

Specifically this commit updates forkname_to_number() so that the HINT
message includes "init" fork, and also adds the description of "init" fork
into pg_relation_size() document.
This is a part of the commit 2d00190495b22e0d0ba351b2cda9c95fb2e3d083
which has fixed the same oversight in master and 9.4. Back-patch to
9.1 where "init" fork was added.

The initialization fork was added in 9.1, but has not been taken into
consideration in documents of get_raw_page function in pageinspect and
storage layout. This commit fixes those oversights.
get_raw_page can read not only a table but also an index, etc. So it
should be documented that the function can read any relation. This commit
also fixes the document of pageinspect that way.
Back-patch to 9.1 where those oversights existed.
Vik Fearing, review by MauMau

The user documentation was vague and not entirely accurate about how
we treat domain inputs for ambiguous operators/functions. Clarify
that, and add an example and some commentary. Per a recent question
from Adam Mackler.
It's acted like this ever since we added domains, so back-patch
to all supported branches.

Such cases are disallowed by the SQL spec, and even if we wanted to allow
them, the semantics seem ambiguous: how should the FK columns be matched up
with the columns of a unique index? (The matching could be significant in
the presence of opclasses with different notions of equality, so this issue
isn't just academic.) However, our code did not previously reject such
cases, but instead would either fail to match to any unique index, or
generate a bizarre opclass-lookup error because of sloppy thinking in the
index-matching code.
David Rowley

When autovacuum is nominally off, we will still launch autovac workers
to vacuum tables that are at risk of XID wraparound. But after we'd done
that, an autovac worker would proceed to autovacuum every table in the
targeted database, if they meet the usual thresholds for autovacuuming.
This is at best pretty unexpected; at worst it delays response to the
wraparound threat. Fix it so that if autovacuum is nominally off, we
*only* do forced vacuums and not any other work.
Per gripe from Andrey Zhidenkov. This has been like this all along,
so back-patch to all supported branches.

There were several oversights in recovery code where COMMIT/ABORT PREPARED
records were ignored:
* pg_last_xact_replay_timestamp() (wasn't updated for 2PC commits)
* recovery_min_apply_delay (2PC commits were applied immediately)
* recovery_target_xid (recovery would not stop if the XID used 2PC)
The first of those was reported by Sergiy Zuban in bug #11032, analyzed by
Tom Lane and Andres Freund. The bug was always there, but was masked before
commit d19bd29f07aef9e508ff047d128a4046cc8bc1e2, because COMMIT PREPARED
always created an extra regular transaction that was WAL-logged.
Backpatch to all supported versions (older versions didn't have all the
features and therefore didn't have all of the above bugs).

unix_socket_directories was introduced in 9.3, but the document
in older versions wrongly have mentioned it. This commit replaces
it with the correct older name unix_socket_directory.
This is applied to only 9.2 and older supported versions.
Guillaume Lelarge

findDependencyLoops() was not bright about cases where there are multiple
dependency paths between the same two dumpable objects. In most scenarios
this did not hurt us too badly; but since the introduction of section
boundary pseudo-objects in commit a1ef01fe163b304760088e3e30eb22036910a495,
it was possible for this code to take unreasonable amounts of time (tens
of seconds on a database with a couple thousand objects), as reported in
bug #11033 from Joe Van Dyk. Joe's particular problem scenario involved
"pg_dump -a" mode with long chains of foreign key constraints, but I think
that similar problems could arise with other situations as long as there
were enough objects. To fix, add a flag array that lets us notice when we
arrive at the same object again while searching from a given start object.
This simple change seems to be enough to eliminate the performance problem.
Back-patch to 9.1, like the patch that introduced section boundary objects.

Break the list of available options into an <itemizedlist> instead of
inline sentences. This is mostly motivated by wanting to ensure that the
cross-references to the FSM and VM docs don't cross page boundaries in PDF
format; but it seems to me to read more easily this way anyway. I took the
liberty of editorializing a bit further while at it.
Per complaint from Magnus about 9.0.18 docs not building in A4 format.
Patch all active branches so we don't get blind-sided by this particular
issue again in future.

This is consistent with the POSIX verdict that kill() shall not report
ESRCH for a zombie process. Back-patch to 9.0 (all supported versions).
Test code from commit d7cdf6ee36adeac9233678fb8f2a112e6678a770 depends
on it, and log messages about kill() reporting "Invalid argument" will
cease to appear for this not-unexpected condition.

get_raw_page tried to validate the supplied block number against
RelationGetNumberOfBlocks(), which of course is only right when
accessing the main fork. In most cases, the main fork is longer
than the others, so that the check was too weak (allowing a
lower-level error to be reported, but no real harm to be done).
However, very small tables could have an FSM larger than their heap,
in which case the mistake prevented access to some FSM pages.
Per report from Torsten Foertsch.
In passing, make the bad-block-number error into an ereport not elog
(since it's certainly not an internal error); and fix sloppily
maintained comment for RelationGetNumberOfBlocksInFork.
This has been wrong since we invented relation forks, so back-patch
to all supported branches.

With OpenLDAP versions 2.4.24 through 2.4.31, inclusive, PostgreSQL
backends can crash at exit. Raise a warning during "configure" based on
the compile-time OpenLDAP version number, and test the crash scenario in
the dblink test suite. Back-patch to 9.0 (all supported versions).

In commit 631dc390f49909a5c8ebd6002cfb2bcee5415a9d, we started to handle
simple numeric timezone offsets via the zic library instead of the old
CTimeZone/HasCTZSet kluge. However, we overlooked the fact that the zic
code will reject UTC offsets exceeding a week (which seems a bit arbitrary,
but not because it's too tight ...). This led to possibly setting
session_timezone to NULL, which results in crashes in most timezone-related
operations as of 9.4, and crashes in a small number of places even before
that. So check for NULL return from pg_tzset_offset() and report an
appropriate error message. Per bug #11014 from Duncan Gillis.
Back-patch to all supported branches, like the previous patch.
(Unfortunately, as of today that no longer includes 8.4.)