Per discussion, it does not seem like a good idea to change the behavior of
age(xid) in a minor release, even though the old definition causes the
function to fail on hot standby slaves. Therefore, revert commit
5829387381d2e4edf84652bb5a712f6185860670 and follow-on commits in the back
branches only.

It's not very sensible to set such attributes on a handler function;
but if one were to do so, fmgr.c went into infinite recursion because
it would call fmgr_security_definer instead of the handler function proper.
There is no way for fmgr_security_definer to know that it ought to call the
handler and not the original function referenced by the FmgrInfo's fn_oid,
so it tries to do the latter, causing the whole process to start over
again.
Ordinarily such misconfiguration of a procedural language's handler could
be written off as superuser error. However, because we allow non-superuser
database owners to create procedural languages and the handler for such a
language becomes owned by the database owner, it is possible for a database
owner to crash the backend, which ideally shouldn't be possible without
superuser privileges. In 9.2 and up we will adjust things so that the
handler functions are always owned by superusers, but in existing branches
this is a minor security fix.
Problem noted by Noah Misch (after several of us had failed to detect
it :-(). This is CVE-2012-2655.

Expand the allowed range of timezone offsets to +/-15:59:59 from Greenwich.

We used to only allow offsets less than +/-13 hours, then it was +/14,
then it was +/-15. That's still not good enough though, as per today's bug
report from Patric Bechtel. This time I actually looked through the Olson
timezone database to find the largest offsets used anywhere. The winners
are Asia/Manila, at -15:56:00 until 1844, and America/Metlakatla, at
+15:13:42 until 1867. So we'd better allow offsets less than +/-16 hours.
Given the history, we are way overdue to have some greppable #define
symbols controlling this, so make some ... and also remove an obsolete
comment that didn't get fixed the last time.
Back-patch to all supported branches.

Overly tight coding caused the password transformation loop to stop
examining input once it had processed a byte equal to 0x80. Thus, if the
given password string contained such a byte (which is possible though not
highly likely in UTF8, and perhaps also in other non-ASCII encodings), all
subsequent characters would not contribute to the hash, making the password
much weaker than it appears on the surface.
This would only affect cases where applications used DES crypt() to encode
passwords before storing them in the database. If a weak password has been
created in this fashion, the hash will stop matching after this update has
been applied, so it will be easy to tell if any passwords were unexpectedly
weak. Changing to a different password would be a good idea in such a case.
(Since DES has been considered inadequately secure for some time, changing
to a different encryption algorithm can also be recommended.)
This code, and the bug, are shared with at least PHP, FreeBSD, and OpenBSD.
Since the other projects have already published their fixes, there is no
point in trying to keep this commit private.
This bug has been assigned CVE-2012-2143, and credit for its discovery goes
to Rubin Xu and Joseph Bonneau.

AbortOutOfAnyTransaction failed to do anything if the state it saw on
entry corresponded to failing partway through StartTransaction. I fixed
AbortCurrentTransaction to cope with that case way back in commit
60b2444cc3ba037630c9b940c3c9ef01b954b87b, but evidently overlooked that
AbortOutOfAnyTransaction should do likewise.
Back-patch to all supported branches. It's not clear that this omission
has any more-than-cosmetic consequences, but it's also not clear that it
doesn't, so back-patching seems the least risky choice.

Write the file to a temporary name and then rename() it into the
permanent name, to ensure it can't end up half-written and corrupt
in case of a crash during shutdown.
Unlink the file after it has been read so it's removed from the data
directory and not included in base backups going to replication slaves.

The only interesting-for-performance case wherein we force heapscan here
is when we're rebuilding the relcache init file, and the only such case
that is likely to be examining a catalog big enough to be syncscanned is
RelationBuildTupleDesc. But the early-exit optimization in that code gets
broken if we start the scan at a random place within the catalog, so that
allowing syncscan is actually a big deoptimization if pg_attribute is large
(at least for the normal case where the rows for core system catalogs have
never been changed since initdb). Hence, prevent syncscan here. Per my
testing pursuant to complaints from Jeff Frost and Greg Sabino Mullane,
though neither of them seem to have actually hit this specific problem.
Back-patch to 8.3, where syncscan was introduced.

Fix string truncation to be multibyte-aware in text_name and bpchar_name.

Previously, casts to name could generate invalidly-encoded results.
Also, make these functions match namein() more exactly, by consistently
using palloc0() instead of ad-hoc zeroing code.
Back-patch to all supported branches.
Karl Schnaitter and Tom Lane

The previous coding presented a significant bottleneck when dumping
databases containing many thousands of schemas, since the total time
spent searching would increase roughly as O(N^2) in the number of objects.
Noted by Jeff Janes, though I rewrote his proposed patch to use the
existing findObjectByOid infrastructure.
Since this is a longstanding performance bug, backpatch to all supported
versions.

If a seqscan encounters many consecutive pages containing only dead tuples,
it can remain in the loop in heapgettup for a long time, and there was no
CHECK_FOR_INTERRUPTS anywhere in that loop. This meant there were
real-world situations where a query would be effectively uncancelable for
long stretches. Add a check placed to occur once per page, which should be
enough to provide reasonable response time without adding any measurable
overhead.
Report and patch by Merlin Moncure (though I tweaked it a bit).
Back-patch to all supported branches.

If the tablespace directory is missing entirely, we allow DROP TABLESPACE
to go through, on the grounds that it should be possible to clean up the
catalog entry in such a situation. However, we forgot that the pg_tblspc
symlink might still be there. We should try to remove the symlink too
(but not fail if it's no longer there), since not doing so can lead to
weird behavior subsequently, as per report from Michael Nolan.
There was some discussion of adding dependency links to prevent DROP
TABLESPACE when the catalogs still contain references to the tablespace.
That might be worth doing too, but it's an orthogonal question, and in
any case wouldn't be back-patchable.
Back-patch to 9.0, which is as far back as the logic looks like this.
We could possibly do something similar in 8.x, but given the lack of
reports I'm not sure it's worth the trouble, and anyway the case could
not arise in the form the logic is meant to cover (namely, a post-DROP
transaction rollback having resurrected the pg_tablespace entry after
some or all of the filesystem infrastructure is gone).

The original coding failed to reset ImmediateInterruptOK before returning,
which would potentially allow a subsequent query-cancel interrupt to be
accepted at an unsafe point. This is a really nasty bug since it's so hard
to predict the consequences, but they could be unpleasant.
Also, ensure that signal handlers are serviced before this function
returns, even if the semaphore is already set. This should make the
behavior more like Unix.
Back-patch to all supported versions.

PL/pgSQL RETURN NEXT was leaking converted tuples, causing out of memory when looping through large numbers of rows. Flag the converted tuples to be freed. Complaint and patch by Joe.

Normally whole-row Vars are printed as "tabname.*". However, that does not
work at top level of a targetlist, because per SQL standard the parser will
think that the "*" should result in column-by-column expansion; which is
not at all what a whole-row Var implies. We used to just print the table
name in such cases, which works most of the time; but it fails if the table
name matches a column name available anywhere in the FROM clause. This
could lead for instance to a view being interpreted differently after dump
and reload. Adding parentheses doesn't fix it, but there is a reasonably
simple kluge we can use instead: attach a no-op cast, so that the "*" isn't
syntactically at top level anymore. This makes the printing of such
whole-row Vars a lot more consistent with other Vars, and may indeed fix
more cases than just the reported one; I'm suspicious that cases involving
schema qualification probably didn't work properly before, either.
Per bug report and fix proposal from Abbas Butt, though this patch is quite
different in detail from his.
Back-patch to all supported versions.

If it fails to open a new log file, the syslogger assumes there's something
wrong with its parameters (such as log_directory), and stops attempting
automatic time-based or size-based log file rotations. Sending it SIGHUP
is supposed to start that up again. However, the original coding for that
was really bogus, involving clobbering a couple of GUC variables and hoping
that SIGHUP processing would restore them. Get rid of that technique in
favor of maintaining a separate flag showing we've turned rotation off.
Per report from Mark Kirkwood.
Also, the syslogger will automatically attempt to create the log_directory
directory if it doesn't exist, but that was only happening at startup.
For consistency and ease of use, it should do the same whenever the value
of log_directory is changed by SIGHUP.
Back-patch to all supported branches.

Due to rather sloppy thinking (on my part, I'm afraid) about the
appropriate behavior for boundary conditions, pg_next_dst_boundary() gave
undefined, platform-dependent results when the input time is exactly the
last recorded DST transition time for the specified time zone, as a result
of fetching values one past the end of its data arrays.
Change its specification to be that it always finds the next DST boundary
*after* the input time, and adjust code to match that. The sole existing
caller, DetermineTimeZoneOffset, doesn't actually care about this
distinction, since it always uses a probe time earlier than the instant
that it does care about. So it seemed best to me to change the API to make
the result=1 and result=0 cases more consistent, specifically to ensure
that the "before" outputs always describe the state at the given time,
rather than hacking the code to obey the previous API comment exactly.
Per bug #6605 from Sergey Burladyan. Back-patch to all supported versions.

A number of utility programs were rather careless about paremeters
that can be set via both an option argument and a positional
argument. This leads to results which can violate the Principal
Of Least Astonishment. These changes refuse to use positional
arguments to override settings that have been made via positional
arguments. The changes are backpatched to all live branches.

Clamp indexscan filter condition cost estimate to be not less than zero.

cost_index tries to estimate the per-tuple costs of evaluating filter
conditions (a/k/a qpquals) by subtracting the estimated cost of the
indexqual conditions from that of the baserestrictinfo conditions. This is
correct so long as the indexquals list is a subset of the baserestrictinfo
list. However, in the presence of derived indexable conditions it's
completely wrong, leading to bogus or even negative scan cost estimates,
as seen for example in bug #6579 from Istvan Endredy. In practice the
problem isn't severe except in the specific case of a LIKE optimization on
a functional index containing a very expensive function.
A proper fix for this might change cost estimates by more than people would
like for stable branches, so in the back branches let's just clamp the cost
difference to be not less than zero. That will at least prevent completely
insane behavior, while not changing the results normally.

Fix pg_upgrade to properly upgrade a table that is stored in the cluster default tablespace, but part of a database that is in a user-defined tablespace. Caused “file not found” error during upgrade.

It's still non-deterministic in some sense ... but given fixed settings
and identical planning problems, it will now always choose the same plan,
so we probably shouldn't tar it with that brush. Per bug #6565 from
Guillaume Cottenceau. Back-patch to 9.0 where the behavior was fixed.

estimate_num_groups() gets unhappy with
create table empty();
select * from empty except select * from empty e2;
I can't see any actual use-case for such a query (and the table is illegal
per SQL spec), but it seems like a good idea that it not cause an assert
failure.

We used to only initialize the stack base pointer when starting up a regular
backend, not in other processes. In particular, autovacuum workers can run
arbitrary user code, and without stack-depth checking, infinite recursion
in e.g an index expression will bring down the whole cluster.
The comment about PL/Java using set_stack_base() is not yet true. As the
code stands, PL/java still modifies the stack_base_ptr variable directly.
However, it's been discussed in the PL/Java mailing list that it should be
changed to use the function, because PL/Java is currently oblivious to the
register stack used on Itanium. There's another issues with PL/Java, namely
that the stack base pointer it sets is not really the base of the stack, it
could be something close to the bottom of the stack. That's a separate issue
that might need some further changes to this code, but that's a different
story.
Backpatch to all supported releases.

XLOG_GIN_UPDATE_META_PAGE and XLOG_GIN_DELETE_LISTPAGE records were printed
with a list link field labeled as "blkno", which was confusing, especially
when the link was empty (InvalidBlockNumber). Print the metapage block
number instead, since that's what's actually being updated. We could
include the link values too as a separate field, but not clear it's worth
the trouble.
Back-patch to 8.4 where the dubious code was added.

The original coding of the syslogger had an arbitrary limit of 20 large
messages concurrently in progress, after which it would just punt and dump
message fragments to the output file separately. Our ambitions are a bit
higher than that now, so allow the data structure to expand as necessary.
Reported and patched by Andrew Dunstan; some editing by Tom

dblink_exec leaked temporary database connections if any error occurred
after connection setup, for example
SELECT dblink_exec('...connect string...', 'select 1/0');
Add a PG_TRY block to ensure PQfinish gets done when it is needed.
(dblink_record_internal is on the hairy edge of needing similar treatment,
but seems not to be actively broken at the moment.)
Also, in 9.0 and up, only one of the three functions using tuplestore
return mode was properly checking that the query context would allow
a tuplestore result.
Noted while reviewing dblink patch. Back-patch to all supported branches.

Fix O(N^2) behavior in pg_dump when many objects are in dependency loops.

Combining the loop workspace with the record of already-processed objects
might have been a cute trick, but it behaves horridly if there are many
dependency loops to repair: the time spent in the first step of findLoop()
grows as O(N^2). Instead use a separate flag array indexed by dump ID,
which we can check in constant time. The length of the workspace array
is now never more than the actual length of a dependency chain, which
should be reasonably short in all cases of practical interest. The code
is noticeably easier to understand this way, too.
Per gripe from Mike Roest. Since this is a longstanding performance bug,
backpatch to all supported versions.

The loop that matched owned sequences to their owning tables required time
proportional to number of owned sequences times number of tables; although
this work was only expended in selective-dump situations, which is probably
why the issue wasn't recognized long since. Refactor slightly so that we
can perform this work after the index array for findTableByOid has been
set up, reducing the time to O(M log N).
Per gripe from Mike Roest. Since this is a longstanding performance bug,
backpatch to all supported versions.

The DBLINK_GET_CONN and DBLINK_GET_NAMED_CONN macros did not set the
surrounding function's conname variable, causing errors to be incorrectly
reported as having occurred on the "unnamed" connection in some cases.
This bug was actually visible in two cases in the regression tests,
but apparently whoever added those cases wasn't paying attention.
Noted by Kyotaro Horiguchi, though this is different from his proposed
patch.
Back-patch to 8.4; 8.3 does not have the same type of error reporting
so the patch is not relevant.

Correct epoch of txid_current() when executed on a Hot Standby server. Initialise ckptXidEpoch from starting checkpoint and maintain the correct value as we roll forwards. This allows GetNextXidAndEpoch() to return the correct epoch when executed during recovery. Backpatch to 9.0 when the problem is first observable by a user.

The COPY documentation says "COPY FROM matches the input against the null
string before removing backslashes". It is therefore reasonable to presume
that null markers like E'\\0' will work ... and they did, until someone put
the tests in the wrong order during microoptimization-driven rewrites.
Since then, we've been failing if the null marker is something that would
de-escape to an invalidly-encoded string. Since null markers generally
need to be something that can't appear in the data, this represents a
nontrivial loss of functionality; surprising nobody noticed it earlier.
Per report from Jeff Davis. Backpatch to 8.4 where this got broken.

For some reason, in the original coding of the PlaceHolderVar mechanism
I had supposed that PlaceHolderVars couldn't propagate into subqueries.
That is of course entirely possible. When it happens, we need to treat
an outer-level PlaceHolderVar much like an outer Var or Aggref, that is
SS_replace_correlation_vars() needs to replace the PlaceHolderVar with
a Param, and then when building the finished SubPlan we have to provide
the PlaceHolderVar expression as an actual parameter for the SubPlan.
The handling of the contained expression is a bit delicate but it can be
treated exactly like an Aggref's expression.
In addition to the missing logic in subselect.c, prepjointree.c was failing
to search subqueries for PlaceHolderVars that need their relids adjusted
during subquery pullup. It looks like everyplace else that touches
PlaceHolderVars got it right, though.
Per report from Mark Murawski. In 9.1 and HEAD, queries affected by this
oversight would fail with "ERROR: Upper-level PlaceHolderVar found where
not expected". But in 9.0 and 8.4, you'd silently get possibly-wrong
answers, since the value transmitted into the subquery wouldn't go to null
when it should.

Fix GET DIAGNOSTICS for case of assignment to function’s first variable.

An incorrect and entirely unnecessary "safety check" in exec_stmt_getdiag()
caused the code to treat an assignment to a variable with dno zero as a
no-op. Unfortunately, that's a perfectly valid dno. This has been broken
since GET DIAGNOSTICS was invented. It's not terribly surprising that the
bug went unnoticed for so long, since in most cases you probably wouldn't
use the function's first-created variable (normally its first parameter)
as a GET DIAGNOSTICS target. Nonetheless, it's broken. Per bug #6551
from Adam Buraczewski.

Since 9.0, removing lots of large objects in a single transaction risks
exceeding max_locks_per_transaction, because we merged large object removal
into the generic object-drop mechanism, which takes out an exclusive lock
on each object to be dropped. This creates a hazard for contrib/vacuumlo,
which has historically tried to drop all unreferenced large objects in one
transaction. There doesn't seem to be any correctness requirement to do it
that way, though; we only need to drop enough large objects per transaction
to amortize the commit costs.
To prevent a regression from pre-9.0 releases wherein vacuumlo worked just
fine, back-patch commits b69f2e36402aaa222ed03c1769b3de6d5be5f302 and
64c604898e812aa93c124c666e8709fff1b8dd26, which break vacuumlo's deletions
into multiple transactions with a user-controllable upper limit on the
number of objects dropped per transaction.
Tim Lewis, Robert Haas, Tom Lane

This was never intended to be allowed, and is blocked for an ordinary
CREATE TABLE, but CREATE TABLE AS slipped through the cracks. This
commit won't do anything to fix existing cases where this has loophole
has been exploited, but it still seems prudent to lock it down going
forward.
Back-branch commit only, as this problem has been refactored away
on the master branch.
Andres Freund

When converting source files, pg_regress' inputdir and outputdir options were
ignored when computing the locations of the destination files. In consequence,
these options were effectively unusable when the regression inputs need to
be adjusted by pg_regress. This patch makes pg_regress put the converted files
in the same place that these options specify non-converted input or results
files are to be found. Backpatched to all live branches.

Due to an apparent thinko, when printing a table in expanded mode
(\x), space would be allocated for 1 slot plus 1 byte per line,
instead of 1 slot per line plus 1 slot for the NULL terminator. When
the line count is small, reading or writing the terminator would
therefore access memory beyond what was allocated.

In backup.sgml, point out that you need to be using the logging collector
if you want to log messages from a failing archive_command script. (This
is an oversimplification, in that it will work without the collector as
long as you're not sending postmaster stderr to /dev/null; but it seems
like a good idea to encourage use of the collector to avoid problems
with multiple processes concurrently scribbling on one file.)
In config.sgml, do some wordsmithing of logging_collector discussion.
Per bug #6518 from Janning Vygen

In commit 4016bdef8aded77b4903c457050622a5a1815c16 I fixed a bunch of
ginxlog.c bugs having to do with not handling XLogReadBuffer failures
correctly. However, in ginRedoUpdateMetapage and ginRedoDeleteListPages,
I unaccountably thought that failure to read the metapage would be
impossible and just put in an elog(PANIC) call. This is of course wrong:
failure is exactly what will happen if the index got dropped (or rebuilt)
between creation of the WAL record and the crash we're trying to recover
from. I believe this explains Nicholas Wilson's recent report of these
errors getting reached.
Also, fix memory leak in forgetIncompleteSplit. This wasn't of much
concern when the code was written, but in a long-running standby server
page split records could be expected to accumulate indefinitely.
Back-patch to 8.4 --- before that, GIN didn't have a metapage.