This oversight led to a massive memory leak --- upwards of 10KB per tuple
--- during creation-time verification of an exclusion constraint based on a
GIST index. In most other scenarios it'd just be a leak of 10KB that would
be recovered at end of query, so not too significant; though perhaps the
leak would be noticeable in a situation where a GIST index was being used
in a nestloop inner indexscan. In any case, it's a real leak of long
standing, so patch all supported branches. Per report from Harald Fuchs.

Trailing-zero stripping applied by the FM specifier could strip zeroes
to the left of the decimal point, for a format with no digit positions
after the decimal point (such as "FM999.").
Reported and diagnosed by Marti Raudsepp, though I didn't use his patch.

In pg_upgrade, disallow migration of 8.3 clusters using contrib/ltree
because its internal format was changed in 8.4.

The code in shift_jis_20042euc_jis_2004() would fetch two bytes even when
only one remained in the string. Since conversion functions aren't
supposed to assume null-terminated input, this poses a small risk of
fetching past the end of memory and incurring SIGSEGV. No such crash has
been identified in the field, but we've certainly seen the equivalent
happen in other code paths, so patch this one all the way back.
Report and patch by Noah Misch.
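For illustration, the defensive idiom looks roughly like this (a sketch,
not the committed diff; is_two_byte_lead() stands in for the real
lead-byte test):
    if (is_two_byte_lead(*src))
    {
        if (srclen < 2)         /* only one byte left: don't read past it */
            report_invalid_encoding(PG_SHIFT_JIS_2004,
                                    (const char *) src, srclen);
        c2 = src[1];            /* now known to be safe to fetch */
    }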

Avoid possibly accessing off the end of memory in examine_attribute().

Since the last couple of columns of pg_type are often NULL,
sizeof(FormData_pg_type) can be an overestimate of the actual size of the
tuple data part. Therefore memcpy'ing that much out of the catalog cache,
as analyze.c was doing, poses a small risk of copying past the end of
memory and incurring SIGSEGV. No such crash has been identified in the
field, but we've certainly seen the equivalent happen in other code paths,
so patch this one all the way back.
Per valgrind testing by Noah Misch, though this is not his proposed patch.
I chose to use SearchSysCacheCopy1 rather than inventing special-purpose
infrastructure for copying only the minimal part of a pg_type tuple.
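In outline, the change looks like this (a sketch; the TYPEOID lookup shown
is merely illustrative of examine_attribute's context):
    /* before (unsafe): the struct size can exceed the tuple's data area */
    memcpy(&typtup, GETSTRUCT(typeTuple), sizeof(FormData_pg_type));

    /* after (safe): copy the whole tuple, then point into the copy */
    HeapTuple    tp = SearchSysCacheCopy1(TYPEOID,
                                          ObjectIdGetDatum(attr->atttypid));
    Form_pg_type typtup = (Form_pg_type) GETSTRUCT(tp);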

The $(PERL) macro will be set by configure if it finds perl at all,
but $(perl_privlibexp) isn't configured unless you said --with-perl.
This results in confusing error messages if someone cd's into
src/pl/plperl and tries to build there despite the configure omission,
as reported by Tomas Vondra in bug #6198. Add simple checks to
provide a more useful report, while not disabling other use of the
makefile such as "make clean".
Back-patch to 9.0, which is as far as the patch applies easily.

">" should be ">>". This typo results in failure to use all of the bits
of the provided seed.
This might rise to the level of a security bug if we were relying on
srand48 for any security-critical purposes, but we are not --- in fact,
it's not used at all unless the platform lacks srandom(), which is
improbable. Even on such a platform the exposure seems minimal.
Reported privately by Andres Freund.
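The fix is a single character (seed-array naming as in a typical srand48
implementation):
    _rand48_seed[2] = (unsigned short) (seed > 16);   /* buggy: a comparison */
    _rand48_seed[2] = (unsigned short) (seed >> 16);  /* fixed: a shift */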

Move the line that undefines the setlocale() macro on Win32 outside the
USE_REPL_SNPRINTF ifdef block. It has nothing to do with whether the
replacement snprintf function is used. It caused no live bug, because the
replacement snprintf function is always used on Win32, but it was
nevertheless misplaced.

Examination of examples provided by Mark Kirkwood and others has convinced
me that actually commit 7f3eba30c9d622d1981b1368f2d79ba0999cdff2 was quite
a few bricks shy of a load. The useful part of that patch was clamping
ndistinct for the inner side of a semi or anti join, and the reason why
that's needed is that it's the only way that restriction clauses
eliminating rows from the inner relation can affect the estimated size of
the join result. I had not clearly understood why the clamping was
appropriate, and so mis-extrapolated to conclude that we should clamp
ndistinct for the outer side too, as well as for both sides of regular
joins. These latter actions were all wrong, and are reverted with this
patch. In addition, the clamping logic is now made to affect the behavior
of both paths in eqjoinsel_semi, with or without MCV lists to compare.
When we have MCVs, we suppose that the most common values are the ones
that are most likely to survive the decimation resulting from a lower
restriction clause, so we think of the clamping as eliminating non-MCV
values, or potentially even the least-common MCVs for the inner relation.
Back-patch to 8.4, same as previous fixes in this area.

This patch fixes an oversight in my commit
7f3eba30c9d622d1981b1368f2d79ba0999cdff2 of 2008-10-23. That patch
accounted for baserel restriction clauses that reduced the number of rows
coming out of a table (and hence the number of possibly-distinct values of
a join variable), but not for join restriction clauses that might have been
applied at a lower level of join. To account for the latter, look up the
sizes of the min_lefthand and min_righthand inputs of the current join,
and clamp with those in the same way as for the base relations.
Noted while investigating a complaint from Ben Chobot, although this in
itself doesn't seem to explain his report.
Back-patch to 8.4; previous versions used different estimation methods
for which this heuristic isn't relevant.

It is possible for VACUUM to scan no pages at all, if the visibility map
shows that all pages are all-visible. In this situation VACUUM has no new
information to report about the relation's tuple density, so it wasn't
changing pg_class.reltuples ... but it updated pg_class.relpages anyway.
That's wrong in general, since there is no evidence to justify changing the
density ratio reltuples/relpages, but it's particularly bad if the previous
state was relpages=reltuples=0, which means "unknown tuple density".
We just replaced "unknown" with "zero". ANALYZE would eventually recover
from this, but it could take a lot of repetitions of ANALYZE to do so if
the relation size is much larger than the maximum number of pages ANALYZE
will scan, because of the moving-average behavior introduced by commit
b4b6923e03f4d29636a94f6f4cc2f5cf6298b8c8.
The only known situation where we could have relpages=reltuples=0 and yet
the visibility map asserts everything's visible is immediately following
a pg_upgrade. It might be advisable for pg_upgrade to try to preserve the
relpages/reltuples statistics; but in any case this code is wrong on its
own terms, so fix it. Per report from Sergey Koposov.
Back-patch to 8.4, where the visibility map was introduced, same as the
previous change.

Actually, all of parallel restore's limitations should be tested earlier.

On closer inspection, whining in restore_toc_entries_parallel is really
much too late for any user-facing error case. The right place to do it
is at the start of RestoreArchive(), before we've done anything interesting
(such as trying to DROP all the targets ...)
Back-patch to 8.4, where parallel restore was introduced.

Be more user-friendly about unsupported cases for parallel pg_restore.

If we are unable to do a parallel restore because the input file is stdin
or is otherwise unseekable, we should complain and fail immediately, not
after having done some of the restore. Complaining once per thread isn't
so cool either, and the messages should be worded to make it clear this is
an unsupported case not some weird race-condition bug. Per complaint from
Lonni Friedman.
Back-patch to 8.4, where parallel restore was introduced.

These days, such a response is far more likely to signify a server-side
problem, such as fork failure. Reporting "server does not support SSL"
(in sslmode=require) could be quite misleading. But the results could
be even worse in sslmode=prefer: if the problem was transient and the
next connection attempt succeeds, we'll have silently fallen back to
protocol version 2.0, possibly disabling features the user needs.
Hence, it seems best to just eliminate the assumption that backing off
to non-SSL/2.0 protocol is the way to recover from an "E" response, and
instead treat the server error the same as we would in non-SSL cases.
I tested this change against a pre-7.0 server, and found that there
was a second logic bug in the "prefer" path: the test to decide whether
to make a fallback connection attempt assumed that we must have opened
conn->ssl, which in fact does not happen given an "E" response. After
fixing that, the code does indeed connect successfully to pre-7.0,
as long as you didn't set sslmode=require. (If you did, you get
"Unsupported frontend protocol", which isn't completely off base
given the server certainly doesn't support SSL.)
Since there seems no reason to believe that pre-7.0 servers exist anymore
in the wild, back-patch to all supported branches.

Ensure we discard unread/unsent data when abandoning a connection attempt.

There are assorted situations wherein PQconnectPoll() will abandon a
connection attempt and try again with different parameters (eg, SSL versus
not SSL). However, the code forgot to discard any pending data in libpq's
I/O buffers when doing this. In at least one case (server returns E
message during SSL negotiation), there is unread input data which bollixes
the next connection attempt. I have not checked to see whether this is
possible in the other cases where we close the socket and retry, but it
seems like a matter of good defensive programming to add explicit
buffer-flushing code to all of them.
This is one of several issues exposed by Daniel Farina's report of
misbehavior after a server-side fork failure.
This has been wrong since forever, so back-patch to all supported branches.

tsvector_concat() allocated its result workspace using the "conservative"
estimate of the sum of the two input tsvectors' sizes. Unfortunately that
wasn't so conservative as all that, because it supposed that the number of
pad bytes required could not grow. Which it can, as per test case from
Jesper Krogh, if there's a mix of lexemes with positions and lexemes
without them in the input data. The fix is to assume that we might add
a not-previously-present pad byte for each and every lexeme in the two
inputs; which really is conservative, but it doesn't seem worthwhile to
try to be more precise.
This is an aboriginal bug in tsvector_concat, so back-patch to all
versions containing it.
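Schematically, the allocation bound changes like this (hypothetical
variable names; a tsvector's size field is its lexeme count):
    /* truly conservative: every lexeme in either input might gain a pad
     * byte it didn't previously need */
    Size     buflen = VARSIZE(in1) + VARSIZE(in2) + in1->size + in2->size;
    TSVector out = (TSVector) palloc0(buflen);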

In pg_upgrade, fix the schema name filter so it also includes toast tables.
The bug was introduced recently when trying to filter out temp tables.

Per previous experimentation, backtracking slows down lexing performance
significantly (by about a third). It's usually pretty easy to avoid, just
need to have rules that accept an incomplete construct and do whatever the
lexer would have done otherwise.
The backtracking was introduced by the patch that added quoted variable
substitution. Back-patch to 9.0 where that was added.

For an empty index, the pgstatindex() function would compute 0.0/0.0 for
its avg_leaf_density and leaf_fragmentation outputs. On machines that
follow the IEEE float arithmetic standard with any care, that results in
a NaN. However, per report from Rushabh Lathia, Microsoft couldn't
manage to get this right, so you'd get a bizarre error on Windows.
Fix by forcing the results to be NaN explicitly, rather than relying on
the division operator to give that or the snprintf function to print it
correctly. I have some doubts that this is really the most useful
definition, but it seems better to remain backward-compatible with
those platforms for which the behavior wasn't completely broken.
Back-patch to 8.2, since the code is like that in all current releases.
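The shape of the fix (a sketch; get_float8_nan() is the backend's portable
way to manufacture a NaN):
    if (leaf_pages == 0)
    {
        /* empty index: produce NaN deliberately rather than computing
         * 0.0/0.0 and trusting the platform's arithmetic and snprintf */
        avg_leaf_density = get_float8_nan();
        leaf_fragmentation = get_float8_nan();
    }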

Due to tuple-slot mismanagement, evaluation of WHEN conditions for AFTER
ROW UPDATE triggers could crash if there had been a BEFORE ROW trigger
fired for the same update. Fix by not trying to overload the use of
estate->es_trig_tuple_slot. Per report from Yoran Heling.
Back-patch to 9.0, when trigger WHEN conditions were introduced.

As pointed out by Sergey Koposov, repeated invocations of tbm_lossify can
make building a large tidbitmap into an O(N^2) operation. To fix, make
sure we remove more than the minimum amount of information per call, and
add a fallback path to behave sanely if we're unable to fit the bitmap
within the requested amount of memory.
This has been wrong since the tidbitmap code was written, so back-patch
to all supported branches.

The TID isn't stable enough: we might queue an sinval event before a VACUUM
FULL, and then process it afterwards, when the target tuple no longer has
the same TID. So we must invalidate entries on the basis of hash value
only. The old coding can be shown to result in various bizarre,
hard-to-reproduce errors in the presence of concurrent VACUUM FULLs on
system catalogs, and could easily result in permanent catalog corruption,
up to and including complete loss of tables.
This commit is just a minimal fix that removes the unsafe comparison.
We should remove transmission of the tuple TID from sinval messages
altogether, and then arrange to suppress the extra message in the common
case of a heap_update that doesn't change the key hashvalue. But that's
going to be much more invasive, and will only produce a probably-marginal
performance gain, so it doesn't seem like material for a back-patch.
Back-patch to 9.0. Before that, VACUUM FULL refused to do any tuple moving
if it found any INSERT_IN_PROGRESS or DELETE_IN_PROGRESS tuples (and
CLUSTER would give up altogether), so there was no risk of moving a tuple
that might be the subject of an unsent sinval message.

We have to be sure that we have revalidated each nailed-in-cache relcache
entry before we try to use it to load data for some other relcache entry.
The introduction of "mapped relations" in 9.0 broke this, because although
we updated the state kept in relmapper.c early enough, we failed to
propagate that information into relcache entries soon enough; in
particular, we could try to fetch pg_class rows out of pg_class before
we'd updated its relcache entry's rd_node.relNode value from the map.
This bug accounts for Dave Gould's report of failures after "vacuum full
pg_class", and I believe that there is risk for other system catalogs
as well.
The core part of the fix is to copy relmapper data into the relcache
entries during "phase 1" in RelationCacheInvalidate(), before they'll be
used in "phase 2". To try to future-proof the code against other similar
bugs, I also rearranged the order in which nailed relations are visited
during phase 2: now it's pg_class first, then pg_class_oid_index, then
other nailed relations. This should ensure that RelationClearRelation can
apply RelationReloadIndexInfo to all nailed indexes without risking use
of not-yet-revalidated relcache entries.
Back-patch to 9.0 where the relation mapper was introduced.

This works around the problem that a catalog cache entry might contain a
toast pointer that we try to dereference just as a VACUUM FULL completes
on that catalog. We will see the sinval message on the cache entry when
we acquire lock on the toast table, but by that point we've already told
tuptoaster.c "here's the pointer to fetch", so it's difficult from a code
structural standpoint to update the pointer before we use it. Much less
painful to ensure that toast pointers are not invalidated in the first
place. We have to add a bit of code to deal with the case that a value
that previously wasn't toasted becomes so; but that should be a
seldom-exercised corner case, so the inefficiency shouldn't be significant.
Back-patch to 9.0. In prior versions, we didn't allow CLUSTER on system
catalogs, and VACUUM FULL didn't result in reassignment of toast OIDs, so
there was no problem.

The previous code tried to synchronize by unlinking the init file twice,
but that doesn't actually work: it leaves a window wherein a third process
could read the already-stale init file but miss the SI messages that would
tell it the data is stale. The result would be bizarre failures in catalog
accesses, typically "could not read block 0 in file ..." later during
startup.
Instead, hold RelCacheInitLock across both the unlink and the sending of
the SI messages. This is more straightforward, and might even be a bit
faster since only one unlink call is needed.
This has been wrong since it was put in (in 2002!), so back-patch to all
supported releases.
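The fixed ordering, in outline (a sketch of the locking pattern, not the
exact committed code):
    LWLockAcquire(RelCacheInitLock, LW_EXCLUSIVE);
    unlink(initfilename);                   /* remove the stale init file ... */
    SendSharedInvalidMessages(msgs, nmsgs); /* ... and announce it, before
                                             * anyone can rebuild the file */
    LWLockRelease(RelCacheInitLock);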

When updating or deleting a system catalog tuple, it's necessary to acquire
RowExclusiveLock on the catalog before looking up the tuple; otherwise a
concurrent VACUUM FULL on the catalog might move the tuple to a different
TID before we can apply the update. Coding patterns that find the tuple
via a table scan aren't at risk here, but when obtaining the tuple from a
catalog cache, correct ordering is important; and several routines in
foreigncmds.c got it wrong. Noted while running the regression tests in
parallel with VACUUM FULL of assorted system catalogs.
For consistency I moved all the heap_open calls to the starts of their
functions, including a couple for which there was no actual bug.
Back-patch to 8.4 where foreigncmds.c was added.
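The correct pattern, schematically (the catalog and syscache shown are
illustrative):
    /* take the lock before the lookup, so a concurrent VACUUM FULL can't
     * move the tuple to a new TID in between */
    rel = heap_open(ForeignServerRelationId, RowExclusiveLock);
    tp = SearchSysCacheCopy1(FOREIGNSERVEROID, ObjectIdGetDatum(srvId));
    if (!HeapTupleIsValid(tp))
        elog(ERROR, "cache lookup failed for foreign server %u", srvId);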

The statement start timestamp was not set before initiating the transaction
that is used to look up client authentication information in pg_authid.
In consequence, enable_sig_alarm computed a wrong value (far in the past)
for statement_fin_time. That didn't have any immediate effect, because the
timeout alarm was set without reference to statement_fin_time; but if we
subsequently blocked on a lock for a short time, CheckStatementTimeout
would consult the bogus value when we cancelled the lock timeout wait,
and then conclude we'd timed out, leading to immediate failure of the
connection attempt. Thus an innocent "vacuum full pg_authid" would cause
failures of concurrent connection attempts. Noted while testing other,
more serious consequences of vacuum full on system catalogs.
We should set the statement timestamp before StartTransactionCommand(),
so that the transaction start timestamp is also valid. I'm not sure if
there are any non-cosmetic effects of it not being valid, but the xact
timestamp is at least sent to the statistics machinery.
Back-patch to 9.0. Before that, the client authentication timeout was done
outside any transaction and did not depend on this state to be valid.

Fix nested PlaceHolderVar expressions that appear only in targetlists.

A PlaceHolderVar's expression might contain another, lower-level
PlaceHolderVar. If the outer PlaceHolderVar is used, the inner one
certainly will be also, and so we have to make sure that both of them get
into the placeholder_list with correct ph_may_need values during the
initial pre-scan of the query (before deconstruct_jointree starts).
We did this correctly for PlaceHolderVars appearing in the query quals,
but overlooked the issue for those appearing in the top-level targetlist;
with the result that nested placeholders referenced only in the targetlist
did not work correctly, as illustrated in bug #6154.
While at it, add some error checking to find_placeholder_info to ensure
that we don't try to create new placeholders after it's too late to do so;
they have to all be created before deconstruct_jointree starts.
Back-patch to 8.4 where the PlaceHolderVar mechanism was introduced.

This kluge was inserted in a spot apparently chosen at random: the lock
manager's state is not yet fully set up for the wait, and in particular
LockWaitCancel hasn't been armed by setting lockAwaited, so the ProcLock
will not get cleaned up if the ereport is thrown. This seems to not cause
any observable problem in trivial test cases, because LockReleaseAll will
silently clean up the debris; but I was able to cause failures with tests
involving subtransactions.
Fixes breakage induced by commit c85c941470efc44494fd7a5f426ee85fc65c268c.
Back-patch to all affected branches.

It was initialized in the wrong place and to the wrong value. With bad
luck this could result in incorrect query-cancellation failures in hot
standby sessions, should a HS backend be holding pin on buffer number 1
while trying to acquire a lock.

pg_backup_db.c contained a mini SQL lexer with which it tried to identify
boundaries between SQL commands, but that code was not designed to cope
with standard_conforming_strings, and would get the wrong answer if a
backslash immediately precedes a closing single quote in such a string,
as per report from Julian Mehnle. The bug only affects direct-to-database
restores from archive files made with standard_conforming_strings = on.
Rather than complicating the code some more to try to fix that, let's just
rip it all out. The only reason it was needed was to cope with COPY data
embedded into ordinary archive entries, which was a layout that was used
only for about the first three weeks of the archive format's existence,
and never in any production release of pg_dump. Instead, just rely on the
archive file layout to tell us whether we're printing COPY data or not.
This bug represents a data corruption hazard in all releases in which
standard_conforming_strings can be turned on, ie 8.2 and later, so
back-patch to all supported branches.

In many cases, pqsecure_read/pqsecure_write set up useful error messages,
which were then overwritten with useless ones by their callers. Fix this
by defining the responsibility to set an error message to be entirely that
of the lower-level function when using SSL.
Back-patch to 8.3; the code is too different in 8.2 to be worth the
trouble.

This disables an entirely unnecessary "sanity check" that causes failures
in nonblocking mode, because OpenSSL complains if we move or compact the
write buffer. The only actual requirement is that we not modify pending
data once we've attempted to send it, which we don't. Per testing and
research by Martin Pihlak, though this fix is a lot simpler than his patch.
I put the same change into the backend, although it's less clear whether
it's necessary there. We do use nonblock mode in some situations in
streaming replication, so seems best to keep the same behavior in the
backend as in libpq.
Back-patch to all supported releases.
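Presumably this is OpenSSL's moving-write-buffer mode; setting the flag
disables the sanity check (a sketch, assuming the standard OpenSSL API):
    /* let retried SSL_write() calls present a different buffer address;
     * the real rule is only that pending data must not be modified */
    SSL_CTX_set_mode(SSL_context, SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER);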

PQsetvalue unnecessarily duplicated the logic in pqAddTuple, and didn't
duplicate it exactly either --- pqAddTuple does not care what is in the
tuple-pointer array positions beyond the last valid entry, whereas the
code in PQsetvalue assumed such positions would contain NULL. This led
to possible crashes if PQsetvalue was applied to a PGresult that had
previously been enlarged with pqAddTuple, for instance one built from a
server query. Fix by relying on pqAddTuple instead of duplicating logic,
and not assuming anything about the contents of res->tuples[res->ntups].
Back-patch to 8.4, where PQsetvalue was introduced.
Andrew Chernow

This fixes SSPI login failures showing "The function
requested is not supported", often showing up when connecting
to localhost. The cause was a failure to properly update the SSPI
handle when multiple roundtrips were required to complete the
authentication sequence.
Report and analysis by Ahmed Shinwari, patch by Magnus Hagander

Fix two ancient bugs in GiST code to re-find a parent after page split:

First, when following a right-link, we incorrectly marked the current page
as the parent of the right sibling. In reality, the parent of the right page
is the same as the parent of the current page (or some page to the right of
it; gistFindCorrectParent() will sort that out).
Secondly, when we follow a right-link, we must prepend, not append, the right
page to our list of pages to visit. That's because we assume that once we
hit a leaf page in the list, all the rest are leaf pages too, and give up.
To hit these bugs, you need concurrent actions and several unlucky accidents.
Another backend must split the root page, while you're in process of
splitting a lower-level page. Furthermore, while you scan the internal nodes
to re-find the parent, another backend needs to again split some more internal
pages. Even then, the bugs don't necessarily manifest as user-visible errors
or index corruption.
While we're at it, make the error reporting a bit better if gistFindPath()
fails to re-find the parent. It used to be an assertion, but an elog() seems
more appropriate.
Backpatch to all supported branches.

There's a heuristic in estimate_rel_size() to clamp the minimum size
estimate for a table to 10 pages, unless we can see that vacuum or analyze
has been run (and set relpages to something nonzero, so this will always
happen for a table that's actually empty). However, it would be better
not to do this for inheritance parent tables, which very commonly are
really empty and can be expected to stay that way. Per discussion of a
recent pgsql-performance report from Anish Kejariwal. Also prevent it
from happening for indexes (although this is more in the nature of
documentation, since CREATE INDEX normally initializes relpages to
something nonzero anyway).
Back-patch to 9.0, because the ability to collect statistics across a
whole inheritance tree has improved the planner's estimates to the point
where this relatively small error makes a significant difference. In the
referenced report, merge or hash joins were incorrectly estimated as
cheaper than a nestloop with inner indexscan on the inherited table.
That was less likely before 9.0 because the lack of inherited stats would
have resulted in a default (and rather pessimistic) estimate of the cost
of a merge or hash join.

Fix another oversight in logging of changes in postgresql.conf settings.

We were using GetConfigOption to collect the old value of each setting,
overlooking the possibility that it didn't exist yet. This does happen
in the case of adding a new entry within a custom variable class, as
exhibited in bug #6097 from Maxim Boguk.
To fix, add a missing_ok parameter to GetConfigOption, but only in 9.1
and HEAD --- it seems possible that some third-party code is using that
function, so changing its API in a minor release would cause problems.
In 9.0, create a near-duplicate function instead.

In the example for decode(), show the bytea result in hex format,
since that's now the default. Use an E'' string in the example for
quote_literal(), so that it works regardless of the
standard_conforming_strings setting. On the functions-for-binary-strings
page, leave the examples as-is for readability, but add a note pointing out
that they are shown in escape format. Per comments from Thom Brown.
Also, improve the description for encode() and decode() a tad.
Backpatch to 9.0, where bytea_output was introduced.

handleCopyIn incremented pset.lineno for each line of COPY data read from
a file. This is correct when reading from the current script file (i.e.,
we are doing COPY FROM STDIN followed by in-line data), but it's wrong if
the data is coming from some other file. Per bug #6083 from Steve Haslam.
Back-patch to all supported versions.
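The fix amounts to a one-line guard (sketch; field names follow psql's
pset convention):
    /* bump the script line counter only when the COPY data really is
     * being read from the script file we're counting lines in */
    if (copystream == pset.cur_cmd_source)
        pset.lineno++;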

Somehow, column rolconfig got removed from the documentation of the
pg_roles view in the 9.0 cycle, although the column is actually still
there. In 9.1, we'd also forgotten to document the rolreplication column.
Spotted by Sakamoto Masahiko.

Fix EXPLAIN to handle gating Result nodes within inner-indexscan subplans.

It is possible for a NestLoop plan node to pass an OUTER Var into an
"inner indexscan" that is an Append construct (derived from an inheritance
tree or UNION ALL subquery). The OUTER tuple is then passed down at
runtime to the leaf indexscan node(s) where it will actually be used.
EXPLAIN has to likewise pass the information about the nestloop's outer
subplan down through the Append node, else it will fail to print the
outer-reference Vars (with complaints like "bogus varno: 65001").
However, there was a case missed in all this: we could also have gating
Result nodes that were inserted into the appendrel plan tree to deal with
pseudoconstant qual conditions. So EXPLAIN has to pass down the outer plan
node to a Result's subplan, too. Per example from Jon Nelson.
The problem is gone in 9.1 because we replaced the nestloop outer-tuple
kluge with a Param-based data transfer mechanism. Also, so far as I can
tell, the case can't happen before 8.4 because of restrictions on what
sorts of appendrel members could be pulled up into the parent query.
So this patch is only needed for 8.4 and 9.0.

In pg_upgrade 9.0 and 9.1, document the suggestion to use a non-default
port number to avoid unintended client connections.

Such a condition is unsatisfiable in combination with any other type of
btree-indexable condition (since we assume btree operators are always
strict). 8.3 and 8.4 had an explicit test for this, which I removed in
commit 29c4ad98293e3c5cb3fcdd413a3f4904efff8762, mistakenly thinking that
the case would be subsumed by the more general handling of IS (NOT) NULL
added in that patch. Put it back, and improve the comments about it, and
add a regression test case.
Per bug #6079 from Renat Nasyrov, and analysis by Dean Rasheed.

Reduce impact of btree page reuse on Hot Standby by fixing an off-by-one
error. WAL records of type XLOG_BTREE_REUSE_PAGE were generated using a
latestRemovedXid one higher than actually needed, because the xid used was
the page's opaque->btpo.xact rather than an actually removed xid.
Noticed on an otherwise quiet system by Noah Misch.

A password containing a character with the high bit set was misprocessed
on machines where char is signed (which is most). This could cause the
preceding one to three characters to fail to affect the hashed result,
thus weakening the password. The result was also unportable, and failed
to match some other blowfish implementations such as OpenBSD's.
Since the fix changes the output for such passwords, upstream chose
to provide a compatibility hack: password salts beginning with $2x$
(instead of the usual $2a$ for blowfish) are intentionally processed
"wrong" to give the same hash as before. Stored password hashes can
thus be modified if necessary to still match, though it'd be better
to change any affected passwords.
In passing, sync a couple other upstream changes that marginally improve
performance and/or tighten error checking.
Back-patch to all supported branches. Since this issue is already
public, no reason not to commit the fix ASAP.

When recursing after an optimization in pull_up_sublinks_qual_recurse, the
available_rels value passed down must include only the relations that are
in the righthand side of the new SEMI or ANTI join; it's incorrect to pull
up a sub-select that refers to other relations, as seen in the added test
case. Per report from BangarRaju Vadapalli.
While at it, rethink the idea of recursing below a NOT EXISTS. That is
essentially the same situation as pulling up ANY/EXISTS sub-selects that
are in the ON clause of an outer join, and it has the same disadvantage:
we'd force the two joins to be evaluated according to the syntactic nesting
order, because the lower join will most likely not be able to commute with
the ANTI join. That could result in having to form a rather large join
product, whereas the handling of a correlated subselect is not quite that
dumb. So until we can handle those cases better, #ifdef NOT_USED that
case. (I think it's okay to pull up in the EXISTS/ANY cases, because SEMI
joins aren't so inflexible about ordering.)
Back-patch to 8.4, same as for previous patch in this area. Fortunately
that patch hadn't made it into any shipped releases yet.

I mis-simplified the test where ANALYZE decided if it could get away
without doing anything: under the new regime, that's never allowed. Per
bug #6068 from Jeff Janes. Back-patch to 8.4, just like previous patch.

This is a dangerous example to provide because on machines with GNU cp,
it will silently do the wrong thing and risk archive corruption. Worse,
during the 9.0 cycle somebody "improved" the discussion by removing the
warning that used to be there about that, and instead leaving the
impression that the command would work as desired on most Unixen.
It doesn't. Try to rectify the damage by providing an example that is safe
most everywhere, and then noting that you can try cp -i if you want but
you'd better test that.
In back-patching this to all supported branches, I also added an example
command for Windows, which wasn't provided before 9.0.

For some reason, when we (I) added table lock acquisition to pg_dump,
we didn't think about making it happen as soon as possible after the
start of the transaction. What with subsequent additions, there was
actually quite a lot going on before we got around to that; which sort
of defeats the purpose. Rearrange the order of calls in dumpSchema()
to close the risk window as much as we easily can. Back-patch to all
supported branches.

The previous code went into an infinite loop after overflow. In fact,
an overflow is not really an error; it just means that the current
value is the last one we need to return. So, just arrange to stop
immediately when overflow is detected.
Back-patch all the way.
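In outline (a sketch of the pattern, not the committed code):
    /* detect wraparound while stepping: overflow means the value just
     * returned was the last one in range */
    int64   next = current + step;

    if (step > 0 ? next < current : next > current)
        done = true;            /* stop immediately instead of looping */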

Respect Hot Standby controls while recycling btree index pages. Btree
pages were recycled after VACUUM deleted all records on a page and a
subsequent VACUUM then occurred after the RecentXmin horizon was reached.
Using RecentXmin meant that we did not respond correctly to the user
controls provided to avoid Hot Standby conflicts, so spurious conflicts
could be generated in some workload combinations. We now reuse pages only
once we reach RecentGlobalXmin, which can be much later in the presence of
long-running queries and is also controlled by vacuum_defer_cleanup_age.

This oversight could result in a tuplestore using much more than the
intended amount of memory. It would only happen in a code path that loaded
a tuplestore via tuplestore_putvalues(), and many of those won't emit huge
amounts of data; but cases such as holdable cursors and plpgsql's RETURN
NEXT command could have the problem. The fix ensures that the tuplestore
will switch to write-to-disk mode when it overruns work_mem.
The potential overrun was finite, because we would still count the space
used by the tuple pointer array, so the tuplestore code would eventually
flip into write-to-disk mode anyway. When storing wide tuples we would
go far past the expected work_mem usage before that happened; but this
may account for the lack of prior reports.
Back-patch to 8.4, where tuplestore_putvalues was introduced.
Per bug #6061 from Yann Delorme.
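The fix is essentially one more accounting line (sketch; USEMEM is the
tuplestore's memory-accounting macro):
    /* charge the tuple's own space against work_mem, not just its slot
     * in the pointer array */
    USEMEM(state, GetMemoryChunkSpace(tuple));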

In pg_upgrade, document that link mode requires the data directories to be
on the same file system, and that authentication should lock out normal
users.

The previous wording wasn't explicit enough, which could mislead readers
into thinking that the locks acquired are more restricted in nature than
they really are. The resulting optimism can be damaging to morale when
confronted with reality, as has been observed in the field.
Greg Smith

ReadRecord's habit of using both direct references to tmpRecPtr and
references to *RecPtr (which is pointing at tmpRecPtr) triggers an
optimization bug in gcc 4.6.0, which apparently has forgotten about
aliasing rules. Avoid the compiler bug, and make the code more readable
to boot, by getting rid of the direct references. Improve the comments
while at it.
Back-patch to all supported versions, in case they get built with 4.6.0.
Tom Lane, with some cosmetic suggestions from Alex Hunsaker

The documentation of the columns collection_type_identifier and
dtd_identifier was wrong. This effectively reverts commits
8e1ccad51901e83916dae297cd9afa450957a36c and
57352df66d3a0885899d39c04c067e63c7c0ba30 and updates the name
array_type_identifier (the name in SQL:1999) to
collection_type_identifier.
closes bug #5926

We were trying to make that strictly an internal implementation detail,
but it turns out that it's exposed anyway when dumping a view defined
like
CREATE VIEW test_view AS VALUES (1), (2), (3) ORDER BY 1;
This comes out as
CREATE VIEW ... ORDER BY "*VALUES*".column1;
which fails to parse when reloading the dump.
Hacking ruleutils.c to suppress the column qualification looks like it'd
be a risky business, so instead promote the RTE alias to full-fledged
usability.
Per bug #6049 from Dylan Adams. Back-patch to all supported branches.

My previous commit disallowed this operation, but did nothing about
cleaning up the damage if one had already been done. With the operation
disallowed, it's okay to just forcibly clear xmax in a sequence's tuple,
since any value seen there could not represent a live transaction's lock.
So, any sequence-specific operation will repair the problem automatically,
whether or not the user has already seen "could not access status of
transaction" failures.

We can't allow this because such an operation stores its transaction XID
into the sequence tuple's xmax. Because VACUUM doesn't process sequences
(and we don't want it to start doing so), such an xmax value won't get
frozen, meaning it will eventually refer to nonexistent pg_clog storage,
and even wrap around completely. Since the row lock is ignored by nextval
and setval, the usefulness of the operation is highly debatable anyway.
Per reports of trouble with pgpool 3.0, which had ill-advisedly started
using such commands as a form of locking.
In HEAD, also disallow SELECT FOR UPDATE/SHARE on toast tables. Although
this does work safely given the current implementation, there seems no
good reason to allow it. I refrained from changing that behavior in
back branches, however.

Apparently sane-looking penalty code might return small negative values,
for example because of roundoff error. This will confuse places like
gistchoose(). Prevent problems by clamping negative penalty values to
zero. (Just to be really sure, I also made it force NaNs to zero.)
Back-patch to all supported branches.
Alexander Korotkov
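The clamp itself is simple (sketch):
    /* roundoff can yield a small negative penalty, and a NaN would
     * confuse gistchoose(); force both cases to zero */
    if (isnan(penalty) || penalty < 0.0)
        penalty = 0.0;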

Fix portability bugs in use of credentials control messages for peer auth.

Even though our existing code for handling credentials control messages has
been basically unchanged since 2001, it was fundamentally wrong: it did not
ensure proper alignment of the supplied buffer, and it was calculating
buffer sizes and message sizes incorrectly. This led to failures on
platforms where alignment padding is relevant, for instance FreeBSD on
64-bit platforms, as seen in a recent Debian bug report passed on by
Martin Pitt (http://bugs.debian.org//cgi-bin/bugreport.cgi?bug=612888).
Rewrite to do the message-whacking using the macros specified in RFC 2292,
following a suggestion from Theo de Raadt in that thread. Tested by me
on Debian/kFreeBSD-amd64; since OpenBSD and NetBSD document the identical
CMSG API, it should work there too.
Back-patch to all supported branches.

When we added the ability for vacuum to skip heap pages by consulting the
visibility map, we made it just not update the reltuples/relpages
statistics if it skipped any pages. But this could leave us with extremely
out-of-date stats for a table that contains any unchanging areas,
especially for TOAST tables which never get processed by ANALYZE. In
particular this could result in autovacuum making poor decisions about when
to process the table, as in recent report from Florian Helmberger. And in
general it's a bad idea to not update the stats at all. Instead, use the
previous values of reltuples/relpages as an estimate of the tuple density
in unvisited pages. This approach results in a "moving average" estimate
of reltuples, which should converge to the correct value over multiple
VACUUM and ANALYZE cycles even when individual measurements aren't very
good.
This new method for updating reltuples is used by both VACUUM and ANALYZE,
with the result that we no longer need the grotty interconnections that
caused ANALYZE to not update the stats depending on what had happened
in the parent VACUUM command.
Also, fix the logic for skipping all-visible pages during VACUUM so that it
looks ahead rather than behind to decide what to do, as per a suggestion
from Greg Stark. This eliminates useless scanning of all-visible pages at
the start of the relation or just after a not-all-visible page. In
particular, the first few pages of the relation will not be invariably
included in the scanned pages, which seems to help in not overweighting
them in the reltuples estimate.
Back-patch to 8.4, where the visibility map was introduced.
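The moving-average estimator, schematically (a sketch, assuming names like
those in vac_estimate_reltuples; unvisited pages are credited with the old
density):
    old_density = old_rel_tuples / old_rel_pages;
    new_density = scanned_tuples / (double) scanned_pages;
    /* weight the fresh measurement by the fraction of the table scanned */
    multiplier = (double) scanned_pages / (double) total_pages;
    new_density = old_density * (1.0 - multiplier) + new_density * multiplier;
    reltuples = floor(new_density * total_pages + 0.5);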

parse_xml_decl's header comment says you can pass NULL for any unwanted
output parameter, but it failed to honor this contract for the "standalone"
flag. The only currently-affected caller is xml_recv, so the net effect is
that sending a binary XML value containing a standalone parameter in its
xml declaration would crash the backend. Per bug #6044 from Christopher
Dillard.
In passing, remove useless initializations of parse_xml_decl's output
parameters in xml_parse.
Back-patch to 8.3, where this code was introduced.

This is necessary to avoid long-term memory leakage, because the main loop
in PostgresMain expects to be executing in MessageContext, and hence is a
bit sloppy about freeing stuff that is only needed for the duration of
processing the current client message. The known case of an actual leak
is when encoding conversion has to be done on the incoming command string,
but there might be others. Per report from Per-Olov Esgard.
Back-patch to 9.0, where the bug was introduced by the LISTEN/NOTIFY
rewrite.

We had some hacks in ruleutils.c to cope with various odd transformations
that the optimizer could do on a CASE foo WHEN "CaseTestExpr = RHS" clause.
However, the fundamental impossibility of covering all cases was exposed
by Heikki, who pointed out that the "=" operator could get replaced by an
inlined SQL function, which could contain nearly anything at all. So give
up on the hacks and just print the expression as-is if we fail to recognize
it as "CaseTestExpr = RHS". (We must cover that case so that decompiled
rules print correctly; but we are not under any obligation to make EXPLAIN
output be 100% valid SQL in all cases, and already could not do so in some
other cases.) This approach requires that we have some printable
representation of the CaseTestExpr node type; I used "CASE_TEST_EXPR".
Back-patch to all supported branches, since the problem case fails in all.

convert_tuples_by_position was rejecting attempts to coerce a record field
with -1 typmod to the same type with a non-default typmod. This is in fact
the "correct" thing to do (since we're just going to do a type relabeling,
not invoke any length-conversion cast function); but it results in
rejecting valid cases like bug #6020, because the source record's tupdesc
is built from Params that don't have typmod assigned. Since that's a
regression from previous versions, which accepted this code, we have to do
something about it. In HEAD, I've fixed the problem properly by causing
the Params to receive the correct typmods; but the potential for incidental
behavioral changes seems high enough to make it unattractive to make the
same change in released branches. (And it couldn't be fixed that way in
8.4 anyway...) Hence this patch just modifies convert_tuples_by_position
to not complain if either the input or the output tupdesc has typmod -1.
This is still a shade tighter checking than we did before 9.0, since before
that plpgsql failed to consider typmods at all when checking record
compatibility. (convert_tuples_by_position is currently used only by
plpgsql, so we're not affecting other behavior.)
Back-patch to 8.4, since we recently back-ported convert_tuples_by_position
into that branch.

The planner can sometimes compute very large values for numGroups, and in
cases where we have no alternative to building a hashtable, such a value
will get fed directly to BuildTupleHashTable as its nbuckets parameter.
There were two ways in which that could go bad. First, BuildTupleHashTable
declared the parameter as "int" but most callers were passing "long"s,
so on 64-bit machines undetected overflow could occur leading to a bogus
negative value. The obvious fix for that is to change the parameter to
"long", which is what I've done in HEAD. In the back branches that seems a
bit risky, though, since third-party code might be calling this function.
So for them, just put in a kluge to treat negative inputs as INT_MAX.
Second, hash_create can go nuts with extremely large requested table sizes
(notably, my_log2 becomes an infinite loop for inputs larger than
LONG_MAX/2). What seems most appropriate to avoid that is to bound the
initial table size request to work_mem.
This fixes bug #6035 reported by Daniel Schreiber. Although the reported
case only occurs back to 8.4 since it involves WITH RECURSIVE, I think
it's a good idea to install the defenses in all supported branches.
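The back-branch kluge, in essence (sketch):
    /* a huge "long" nbuckets from the caller can wrap to a negative "int"
     * on 64-bit machines; treat that as "as large as allowed" */
    if (nbuckets <= 0)
        nbuckets = INT_MAX;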

The code to assemble ldap_get_values_len's output into a single string
wrote the terminating null one byte past where it should. Fix that,
and make some other cosmetic adjustments to make the code a trifle more
readable and more in line with usual Postgres coding style.
Also, free the "result" string when done with it, to avoid a permanent
memory leak.
Bug report and patch by Albe Laurenz, cosmetic adjustments by me.

Shut down the WAL receiver if it's still running at end of recovery. We
used to just check that it's not running and PANIC if it was, but that can
rightfully happen if recovery stops at the recovery target.

After finding an EXISTS or ANY sub-select that can be converted to a
semi-join or anti-join, we should recurse into the body of the sub-select.
This allows cases such as EXISTS-within-EXISTS to be optimized properly.
The original coding would leave the lower sub-select as a SubLink, which
is no better and often worse than what we can do with a join. Per example
from Wayne Conrad.
Back-patch to 8.4. There is a related issue in older versions' handling
of pull_up_IN_clauses, but they're lame enough anyway about the whole area
that it seems not worth the extra work to try to fix.

We must lock out autovacuuming of the old toast table before computing the
OldestXmin horizon we will use. Otherwise, autovacuum could start on the
toast table later, compute a later OldestXmin horizon, and remove as DEAD
toast tuples that we still need (because we think their parent tuples are
only RECENTLY_DEAD). Per further thought about bug #5998.

VACUUM was willing to remove a committed-dead tuple immediately if it was
deleted by the same transaction that inserted it. The idea is that such a
tuple could never have been visible to any other transaction, so we don't
need to keep it around to satisfy MVCC snapshots. However, there was
already an exception for tuples that are part of an update chain, and this
exception created a problem: we might remove TOAST tuples (which are never
part of an update chain) while their parent tuple stayed around (if it was
part of an update chain). This didn't pose a problem for most things,
since the parent tuple is indeed dead: no snapshot will ever consider it
visible. But MVCC-safe CLUSTER had a problem, since it will try to copy
RECENTLY_DEAD tuples to the new table. It then has to copy their TOAST
data too, and would fail if VACUUM had already removed the toast tuples.
Easiest fix is to get rid of the special case for xmin == xmax. This may
delay reclaiming dead space for a little bit in some cases, but it's by far
the most reliable way to fix the issue.
Per bug #5998 from Mark Reid. Back-patch to 8.3, which is the oldest
version with MVCC-safe CLUSTER.

Convert it to use successive shifts right instead of increasing a divisor.
This is probably a tad more efficient than the original coding, and it's
nicer-looking than the previous patch because we don't need a special case
to avoid overflow in the last branch. But the real reason to do it is to
avoid a Solaris compiler bug, as per results from buildfarm member moa.
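The reworked shape (a sketch close to the committed style; the limit
values are illustrative):
    size >>= 9;                 /* keep one extra bit for rounding */
    if (size < limit2)
        snprintf(buf, sizeof(buf), INT64_FORMAT " kB", (size + 1) / 2);
    else
    {
        size >>= 10;            /* each further unit is one more 10-bit shift */
        if (size < limit2)
            snprintf(buf, sizeof(buf), INT64_FORMAT " MB", (size + 1) / 2);
        /* ... and likewise for GB and TB; since no divisor ever grows,
         * the last branch cannot overflow */
    }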

The arguments to pg_ctl kill are not optional - remove brackets in the docs.

There was already one recommendation in the documentation about writing
C functions to ensure padding bytes are zeroes, but make it stronger.
Also fix an example that was still using direct assignment to a varlena
length word, which no longer works since the varvarlena changes.

Per recent discussion, it's important for all computed datums (not only the
results of input functions) to not contain any ill-defined (uninitialized)
bits. Failing to ensure that can result in equal() reporting that
semantically indistinguishable Consts are not equal, which in turn leads to
bizarre and undesirable planner behavior, such as in a recent example from
David Johnston. We might eventually try to fix this in a general manner by
allowing datatypes to define identity-testing functions, but for now the
path of least resistance is to expect datatypes to force all unused bits
into consistent states.
Per some testing by Noah Misch, array and path functions seem to be the
only ones presenting risks at the moment, so I looked through all the
functions in adt/array*.c and geo_ops.c and fixed them as necessary. In
the array functions, the easiest/safest fix is to allocate result arrays
with palloc0 instead of palloc. Possibly in future someone will want to
look into whether we can just zero the padding bytes, but that looks too
complex for a back-patchable fix. In the path functions, we already had a
precedent in path_in for just zeroing the one known pad field, so duplicate
that code as needed.
Back-patch to all supported branches.
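For the array functions the fix is a one-word change (sketch; SET_VARSIZE
is the supported way to set the length word, per the documentation item
above):
    /* palloc0, not palloc: alignment padding inside the result is then
     * all zeroes, so equal() sees identical bytes for equal values */
    result = (ArrayType *) palloc0(nbytes);
    SET_VARSIZE(result, nbytes);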

Most commenters agreed that this is more friendly than silently failing
to match the line during actual connection attempts. Also, this will
prevent corner cases that might arise when trying to handle such a line
when the SSL code isn't turned on. An example is that specifying
clientcert=1 in such a line would formerly result in a completely
misleading complaint that root.crt wasn't present, as seen in a recent
report from Marc-Andre Laverdiere. While we could have instead fixed
that specific behavior, it seems likely that we'd have a continuing stream
of such bizarre behaviors if we keep on allowing hostssl lines when SSL is
disabled.
Back-patch to 8.4, where clientcert was introduced. Earlier versions don't
have this specific issue, and the code is enough different to make this
patch not applicable without more work than it seems worth.

The expression that tried to round the value to the nearest TB could
overflow, leading to bogus output as reported in bug #5993 from Nicola
Cossu. This isn't likely to ever happen in the intended usage of the
function (if it could, we'd be needing to use a wider datatype instead);
but it's not hard to give the expected output, so let's do so.

If we find a DELETE_IN_PROGRESS HOT-updated tuple, it is impossible to know
whether to index it or not except by waiting to see if the deleting
transaction commits. If it doesn't, the tuple might again be LIVE, meaning
we have to index it. So wait and recheck in that case.
Also, we must not rely on ii_BrokenHotChain to decide that it's possible to
omit tuples from the index. That could result in omitting tuples that we
need, particularly in view of yesterday's fixes to not necessarily set
indcheckxmin (but it's broken even without that, as per my analysis today).
Since this is just an extremely marginal performance optimization, dropping
the test shouldn't hurt.
These cases are only expected to happen in system catalogs (they're
possible there due to early release of RowExclusiveLock in most
catalog-update code paths). Since reindexing of a system catalog isn't a
particularly performance-critical operation anyway, there's no real need to
be concerned about possible performance degradation from these changes.
The worst aspects of this bug were introduced in 9.0 --- 8.x will always
wait out a DELETE_IN_PROGRESS tuple. But I think dropping index entries
on the strength of ii_BrokenHotChain is dangerous even without that, so
back-patch removal of that optimization to 8.3 and 8.4.

Set indcheckxmin true when REINDEX fixes an invalid or not-ready index.

Per comment from Greg Stark, it's less clear that HOT chains don't conflict
with the index than it would be for a valid index. So, let's preserve the
former behavior that indcheckxmin does get set when there are
potentially-broken HOT chains in this case. This change does not cause any
pg_index update that wouldn't have happened anyway, so we're not
re-introducing the previous bug with pg_index updates, and surely the case
is not significant from a performance standpoint; so let's be as
conservative as possible.

Quotes in strings injected into the bki file need to be escaped. In
particular, the "People's Republic of China" locale on Windows was causing
initdb to fail.

This fixes bug #5818 reported by yulei. On master, this makes the mapping
of "People's Republic of China" to just "China" obsolete. In 9.0 and 8.4,
just fix the escaping. Earlier versions didn't have locale names in bki
file.

There can never be a need to push the indcheckxmin horizon forward, since
any HOT chains that are actually broken with respect to the index must
pre-date its original creation. So we can just avoid changing pg_index
altogether during a REINDEX operation.
This offers a cleaner solution than my previous patch for the problem
found a few days ago that we mustn't try to update pg_index while we are
reindexing it. System catalog indexes will always be created with
indcheckxmin = false during initdb, and with this modified code we should
never try to change their pg_index entries. This avoids special-casing
system catalogs as the former patch did, and should provide a performance
benefit for many cases where REINDEX formerly caused an index to be
considered unusable for a short time.
Back-patch to 8.3 to cover all versions containing HOT. Note that this
patch changes the API for index_build(), but I believe it is unlikely that
any add-on code is calling that directly.

The places that attempt to change pg_index.indcheckxmin during a reindexing
operation cannot be executed safely if pg_index itself is the subject of
the operation. This is the explanation for a couple of recent reports of
VACUUM FULL failing with
ERROR: duplicate key value violates unique constraint "pg_index_indexrelid_index"
DETAIL: Key (indexrelid)=(2678) already exists.
However, there isn't any real need to update indcheckxmin in such a
situation, if we assume that pg_index can never contain a truly broken HOT
chain. This assumption holds if new indexes are never created on it during
concurrent operations, which is something we don't consider safe for any
system catalog, not just pg_index. Accordingly, modify the code to not
manipulate indcheckxmin when reindexing any system catalog.
Back-patch to 8.3, where HOT was introduced. The known failure scenarios
involve 9.0-style VACUUM FULL, so there might not be any real risk before
9.0, but let's not assume that.