There was an incorrect Assert in hstoreValidOldFormat(), which would cause
immediate core dumps when attempting to work with pre-9.0 hstore data,
but of course only in an assert-enabled build.
Also, ghstore_decompress() incorrectly applied DatumGetHStoreP() to a datum
that wasn't actually an hstore, but rather a ghstore (ie, a gist signature
bitstring). That used to be harmless, but could now result in misbehavior
if the hstore format conversion code happened to trigger. In reality,
since ghstore is not marked toastable (and doesn't need to be), this
function is useless anyway; we can lobotomize it down to returning the
passed-in pointer.
Both bugs found by Andrew Gierth, though this isn't exactly his proposed
patch.

Add a compatibility note about plpgsql’s treatment of SELECT INTO rec.fld when fld is of composite type. Per discussion of bug #5644 from Valentine Gogichashvili.

In these cases a qual can get marked with the removable rel in its
required_relids, but this is just to schedule its evaluation correctly, not
because it really depends on the rel. We were assuming that, in effect,
we could throw away *all* quals so marked, which is nonsense. Tighten up
the logic to be a little more paranoid about which quals belong to the
outer join being considered for removal, and arrange for all quals that
don't belong to be updated so they will still get evaluated correctly.
Also fix another problem that happened to be exposed by this test case,
which was that make_join_rel() was failing to notice some cases where
a constant-false qual could be used to prove a join relation empty. If it's
a pushed-down constant false, then the relation is empty even if it's an
outer join, because the qual applies after the outer join expansion.
Per report from Nathan Grange. Back-patch into 9.0.

Don’t warn about an in-progress online backup, when we’re recovering from an online backup instead of performing one. pg_ctl can detect that by checking if recovery.conf exists.

Backup label file is renamed away early in recovery, so the window where
backup label exists during recovery is normally very small, but you can run
into it e.g if restore_command is set incorrectly and the startup process
never finds even the first WAL segment containing the checkpoint record to
start recovery from.
Fujii Masao with comments by me.

Process options from the startup packed in walsender. Only few options make sense for walsender, but for example application_name and client_encoding do. We still don’t apply per-role settings from pg_db_role_setting, because that would require connecting to a database to read the table.

A long time ago, this didn't work nicely, but it seems to work on all recent
versions of OS X. The blank-pad method is less desirable since it results
in lots of extra space in ps' output. Per Alexey Klyukin.

Clean up description of ecpg’s dtcvfmtasc function. Per KOIZUMI Satoru.

In the previous coding, we had to flush a composite-type typcache entry
whenever we discarded the corresponding relcache entry. This caused problems
at least when testing with RELCACHE_FORCE_RELEASE, as shown in recent report
from Jeff Davis, and might result in real-world problems given the kind of
unexpected relcache flush that that test mechanism is intended to model.
The new coding decouples relcache and typcache management, which is a good
thing anyway from a structural perspective. The cost is that we have to
search the typcache linearly to find entries that need to be flushed. There
are a couple of ways we could avoid that, but at the moment it's not clear
it's worth any extra trouble, because the typcache contains very few entries
in typical operation.
Back-patch to 8.2, the same as some other recent fixes in this general area.
The patch could be carried back to 8.0 with some additional work, but given
that it's only hypothetical whether we're fixing any problem observable in
the field, it doesn't seem worth the work now.

Clarify documentation of handling of null arguments for aggregates. Per discussion.

This patch changes _bt_split() and _bt_pagedel() to throw a plain ERROR,
rather than PANIC, for several cases that are reported from the field
from time to time:
* right sibling's left-link doesn't match;
* PageAddItem failure during _bt_split();
* parent page's next child isn't right sibling during _bt_pagedel().
In addition the error messages for these cases have been made a bit
more verbose, with additional values included.
The original motivation for PANIC here was to capture core dumps for
subsequent analysis. But with so many users whose platforms don't capture
core dumps by default, or who are unprepared to analyze them anyway, it's hard
to justify a forced database restart when we can fairly easily detect the
problems before we've reached the critical sections where PANIC would be
necessary. It is not currently known whether the reports of these messages
indicate well-hidden bugs in Postgres, or are a result of storage-level
malfeasance; the latter possibility suggests that we ought to try to be more
robust even if there is a bug here that's ultimately found.
Backpatch to 8.2. The code before that is sufficiently different that
it doesn't seem worth the trouble to back-port further.

Remove obsolete remark that PQprepare() is more flexible than PREPARE.

Document the existence of the socket lock file under unix_socket_directory, which is perhaps not a terribly good spot for it but there doesn’t seem to be a better place. Also add a source-code comment pointing out a couple reasons for having a separate lock file. Per suggestion from Greg Smith.

Update time zone data files to tzdata release 2010l: DST law changes in Egypt and Palestine. Added new names for two Micronesian timezones: Pacific/Chuuk is now preferred over Pacific/Truk (and the preferred abbreviation is CHUT not TRUT) and Pacific/Pohnpei is preferred over Pacific/Ponape. Historical corrections for Finland.

Improve wording for privilege description on certain failure messages; the original misleadingly suggests that only access is meant, causing confusion. Per recent trouble report by Robert McGehee on pgsql-admin.

Fix ExecMakeTableFunctionResult to verify that all rows returned by a SRF returning “record” actually do have the same rowtype. This is needed because the parser can’t realistically enforce that they will all have the same typmod, as seen in a recent example from David Wheeler.

Back-patch to 8.0, which is as far back as we have the notion of RECORD
subtypes being distinguished by typmod. Wheeler's example depends on
8.4-and-up features, but I suspect there may be ways to provoke similar
failures before 8.4.

Don’t auto-create the subdirectories holding built documentation in a VPATH build tree. If we actually build the docs in the VPATH tree, those dirs will get created then; but if they’re present and empty, they capture the vpathsearch searches in “make install”, preventing installation of prebuilt docs that might exist in the source tree. Per bug #5595 from Dmtiriy Igrishin. Fix based on idea from Peter Eisentraut.

It turns out that some platforms return ENOMEM for a request that violates
SHMALL, whereas we were assuming that ENOSPC would always be used for that.
Apparently the latter is a Linuxism while ENOMEM is the BSD tradition.
Extend the ENOMEM hint to suggest that raising SHMALL might be needed.
Per gripe from A.M.
Backpatch to 9.0, but not further, because this doesn't seem important
enough to warrant creating extra translation work in the stable branches.
(If it were, we'd have figured this out years ago.)

This is reproducibly possible in Python 2.7 if the user turned
PendingDeprecationWarning into an error, but it's theoretically also possible
in earlier versions in case of exceptional conditions.
backpatched to 8.0

There is no reason that proc.c should have to get involved in this dirty hack
for letting the postmaster know which children are walsenders. Revert that
file to the way it was, and confine the kluge to pmsignal.c and postmaster.c.

array_in discards unquoted leading and trailing whitespace in array values,
while array_out is careful to quote array elements that contain whitespace.
This is problematic when the definition of "whitespace" varies between
locales: array_in could drop characters that were meant to be part of the
value. To avoid that, lock down "whitespace" to mean only the traditional
six ASCII space characters.
This change also works around a bug in OS X and some older BSD systems, in
which isspace() could return true for character fragments in UTF8 locales.
(There may be other places in PG where that bug could cause problems, but
this is the only one complained of so far; see recent report from Steven
Schlansker.)
Back-patch to 9.0, but not further. Given the lack of previous reports
of trouble, changing this behavior in stable branches seems to offer
more risk of breaking applications than reward of avoiding problems.

The original coding tended to break down in the face of modified restore
orders, as shown in bug #5626 from Albert Ullrich, because it would flip over
into parallel-restore operation too soon. That causes problems because we
don't have sufficient dependency information in dump archives to allow safe
parallel processing of SECTION_PRE_DATA items. Even if we did, it's probably
undesirable to allow that to override the commanded restore order.
To fix the problem of omitted items causing unexpected changes in restore
order, tweak SortTocFromFile so that omitted items end up at the head of
the list not the tail. This ensures that they'll be examined and their
dependencies will be marked satisfied before we get to any interesting
items.
In HEAD and 9.0, we can easily change restore_toc_entries_parallel so that
all SECTION_PRE_DATA items are guaranteed to be processed in the initial
serial-restore loop, and hence in commanded order. Only DATA and POST_DATA
items are candidates for parallel processing. For them there might be
variations from the commanded order because of parallelism, but we should
do it in a safe order thanks to dependencies.
In 8.4 it's much harder to make such a guarantee. I settled for not
letting the initial loop break out into parallel processing mode if
it sees a DATA/POST_DATA item that's not to be restored; this at least
prevents a non-restorable item from causing premature exit from the loop.
This means that 8.4 will be more likely to fail given a badly-ordered -L
list than 9.x, but we don't really promise any such thing will work anyway.

Bring some sanity to the trace_recovery_messages code and docs. Per gripe from Fujii Masao, though this is not exactly his proposed patch. Categorize as DEVELOPER_OPTIONS and set context PGC_SIGHUP, as per Fujii, but set the default to LOG because higher values aren’t really sensible (see the code for trace_recovery()). Fix the documentation to agree with the code and to try to explain what the variable actually does. Get rid of no-op calls trace_recovery(LOG), which accomplish nothing except to demonstrate that this option confuses even its author.

Aside from being more forgiving, this prevents a rather surprising misbehavior
when the "wrong" order was used: the old code didn't throw a syntax error,
but absorbed the INTO clause into the last USING expression, which then did
strange things downstream.
Intentionally not changing the documentation; we'll continue to advertise
only the "standard" clause order.
Backpatch to 8.4, where the USING clause was added to EXECUTE.

Keep exec_simple_check_plan() from thinking “SELECT foo INTO bar” is simple.

It's not clear if this situation can occur in plpgsql other than via the
EXECUTE USING case Heikki illustrated, which I will shortly close off.
However, ignoring the intoClause if it's there is surely wrong, so let's
patch it for safety.
Backpatch to 8.3, which is as far back as this code has a PlannedStmt
to deal with. There might be another way to make an equivalent test
before that, but since this is just preventing hypothetical bugs,
I'm not going to obsess about it.

Be a bit less cavalier with both the code and the comment for UNKNOWN fix.

Revert patch to coerce ‘unknown’ type parameters in the backend. As Tom pointed out, it would need a 2nd pass after the whole query is processed to correctly check that an unknown Param is coerced to the same target type everywhere. Adding the 2nd pass would add a lot more code, which doesn’t seem worth the risk given that there isn’t much of a use case for passing unknown Params in the first place. The code would work without that check, but it might be confusing and the behavior would be different from the varparams case.

Instead, just coerce all unknown params in a PL/pgSQL USING clause to text.
That's simple, and is usually what users expect.
Revert the patch in CVS HEAD and master, and backpatch the new solution to
8.4. Unlike the previous solution, this applies easily to 8.4 too.

Allocate local buffers in a context of their own, rather than dumping them into TopMemoryContext. This makes no functional difference, but makes it easier to see what the space is being used for in MemoryContextStats dumps. Per a recent example in which I was surprised by the size of TopMemoryContext.

afterTriggerInvokeEvents failed to adjust events->tailfree when truncating
the last chunk of an event list. This could result in the data being
"de-truncated" by afterTriggerRestoreEventList during a subsequent
subtransaction abort. Even that wouldn't kill us, because the re-added data
would just be events marked DONE --- unless the data had been partially
overwritten by new events. Then we might crash, or in any case misbehave
(perhaps fire triggers twice, or fire triggers with the wrong event data).
Per bug #5622 from Thue Janus Kristensen.
Back-patch to 8.4 where the current trigger list representation was introduced.

Reset the per-output-tuple exprcontext each time through the main loop in ExecModifyTable(). This avoids memory leakage when trigger functions leave junk behind in that context (as they more or less must). Problem and solution identified by Dean Rasheed.

I'm a bit concerned about the longevity of this solution --- once a plan can
have multiple ModifyTable nodes, we are very possibly going to have to do
something different. But it should hold up for 9.0.

The implicitly created sequence was created as owned by the current user,
who could be different from the table owner, eg if current user is a
superuser or some member of the table's owning role. This caused sanity
checks in the SEQUENCE OWNED BY code to spit up. Although possibly we
don't need those sanity checks, the safest fix seems to be to make sure
the implicit sequence is assigned the same owner role as the table has.
(We still do all permissions checks as the current user, however.)
Per report from Josh Berkus.
Back-patch to 9.0. The bug goes back to the invention of SEQUENCE OWNED BY
in 8.2, but the fix requires an API change for DefineRelation(), which seems
to have potential for breaking third-party code if done in a minor release.
Given the lack of prior complaints, it's probably not worth fixing in the
stable branches.

_outPlannedStmt is only debug support, so the omission there was not very
serious, but the omission in _copyPlannedStmt is a real bug. The consequence
would be that a copied plan tree would never be marked as a transient plan,
so that we would forget we ought to replan it after some not-yet-ready index
becomes ready for use. This might explain some past complaints about indexes
created with CREATE INDEX CONCURRENTLY not being used right away. Problem
spotted by Yeb Havinga.
Back-patch to 8.3, where the field was added.

Coerce ‘unknown’ type parameters to the right type in the fixed-params parse_analyze() function. That case occurs e.g with PL/pgSQL EXECUTE … USING ‘stringconstant’.

The coercion with a CoerceViaIO node. The result is similar to the coercion
via input function performed for unknown constants in coerce_type(),
except that this happens at runtime.
Backpatch to 9.0. The issue is present in 8.4 as well, but the coerce param
hook infrastructure this patch relies on was introduced in 9.0. Given the
lack of user reports and harmlessness of the bug, it's not worth attempting
a different fix just for 8.4.

Arrange to fsync the contents of lockfiles (both postmaster.pid and the socket lockfile) when writing them. The lack of an fsync here may well explain two different reports we’ve seen of corrupted lockfile contents, which doesn’t particularly bother the running server but can prevent a new server from starting if the old one crashes. Per suggestion from Alvaro.

Fix psql’s copy of utf2ucs() to match the backend’s copy exactly; in particular, propagate a fix in the test to see whether a UTF8 character has length 4 bytes. This is likely of little real-world consequence because 5-or-more-byte UTF8 sequences are not supported by Postgres nor seen anywhere in the wild, but still we may as well get it right. Problem found by Joseph Adams.

Fix planner to make a reasonable assumption about the amount of memory space used by array_agg(), string_agg(), and similar aggregate functions that use “internal” as their transition datatype. The previous coding thought this took no extra space, since “internal” is pass-by-value; but actually these aggregates typically consume a great deal of space. Per bug #5608 from Itagaki Takahiro, and fix suggestion from Hitoshi Harada.

Fix Assert failure in PushOverrideSearchPath when trying to restore a search path that specifies useTemp, but there is no active temp schema in the current session. (This can happen if the path was saved during a transaction that created a temp schema and was later rolled back.) For existing callers it’s sufficient to ignore the useTemp flag in this case, though we might later want to offer an option to create a fresh temp schema. So far as I can tell this is just an Assert failure: in a non-assert build, the code would push a zero onto the new search path, which is useless but not very harmful. Per bug report from Heikki.

Since the only purpose of WAL-loggin SharedInvalidationMessages is to support
Hot Standby operation, they needn't be included when wal_level < hot_standby.
Back-patch to 9.0.
Review by Heikki Linnakanagas and Fujii Masao.

Fix pg_restore to complain if any arguments remain after parsing the switches and input file name, per bug #5617 from Leo Shklovskii. Rearrange the corresponding code in pg_dump and pg_dumpall so that all three programs handle this in a consistent, straightforward fashion.

Back-patch to 9.0, but no further. Although this is certainly a bug, it's
possible that people have scripts that will be broken by the added error
check, so it seems better not to change the behavior in stable branches.

Fix incorrect logic in plpgsql for cleanup after evaluation of non-simple expressions. We need to deal with this when handling subscripts in an array assignment, and also when catching an exception. In an Assert-enabled build these omissions led to Assert failures, but I think in a normal build the only consequence would be short-term memory leakage; which may explain why this wasn’t reported from the field long ago.

Add a very specific hint for the case that we’re unable to locate a function matching a call like f(x, ORDER BY y,z). It could be that what the user really wants is f(x,z ORDER BY y). We now have pretty conclusive evidence that many people won’t understand this problem without concrete guidance, so give it to them. Per further discussion of the string_agg() problem.

Remove the single-argument form of string_agg(). It added nothing much in functionality, while creating an ambiguity in usage with ORDER BY that at least two people have already gotten seriously confused by. Also, add an opr_sanity test to check that we don’t in future violate the newly minted policy of not having built-in aggregates with the same name and different numbers of parameters. Per discussion of a complaint from Thom Brown.

Without this patch, constraints inherited by children of a parent
table which itself has multiple inheritance parents can end up with
the wrong coninhcount. After dropping the constraint, the children
end up with a leftover copy of the constraint that is not dumped
and cannot be dropped. There is a similar problem with ALTER TABLE
.. ADD COLUMN, but that looks significantly more difficult to
resolve, so I'm committing this fix separately.
Back-patch to 8.4, which is the first release that has coninhcount.
Report by Hank Enting.

Fix core dump in QTNodeCompare when tsquery_cmp() is applied to two empty tsqueries. CompareTSQ has to have a guard for the case rather than blindly applying QTNodeCompare to random data past the end of the datums. Also, change QTNodeCompare to be a little less trusting: use an actual test rather than just Assert’ing that the input is sane. Problem encountered while investigating another issue (I saw a core dump in autoanalyze on a table containing multiple empty tsquery values).

Back-patch to all branches with tsquery support.
In HEAD, also fix some bizarre (though not outright wrong) coding in
tsq_mcontains().

Don’t try to force use of -no-cpp-precomp on OS X. It’s been five years since Apple shipped a compiler that needed this switch, and there’s increasing interest in using other compilers that won’t accept the switch at all. Better to let anybody who still needs the switch inject it via CPPFLAGS. Per gripe from Neil Conway.

Fix an ancient typo that prevented the detection of conflicting fields when interval input “invalid” was specified together with other fields. Spotted by Neil Conway with the help of a clang warning. Although this has been wrong since the interval code was written more than 10 years ago, it doesn’t affect anything beyond which error message you get for a wrong input, so not worth back-patching very far.

Fix ANALYZE’s ancient deficiency of not trying to collect stats for expression indexes when the index column type (the opclass opckeytype) is different from the expression’s datatype. When coded, this limitation wasn’t worth worrying about because we had no intelligence to speak of in stats collection for the datatypes used by such opclasses. However, now that there’s non-toy estimation capability for tsvector queries, it amounts to a bug that ANALYZE fails to do this.

The fix changes struct VacAttrStats, and therefore constitutes an API break
for custom typanalyze functions. Therefore we can't back-patch it into
released branches, but it was agreed that 9.0 isn't yet frozen hard enough
to make such a change unacceptable. Ergo, back-patch to 9.0 but no further.
The API break had better be mentioned in 9.0 release notes.

Fix an additional set of problems in GIN’s handling of lossy page pointers. Although the key-combining code claimed to work correctly if its input contained both lossy and exact pointers for a single page in a single TID stream, in fact this did not work, and could not work without pretty fundamental redesign. Modify keyGetItem so that it will not return such a stream, by handling lossy-pointer cases a bit more explicitly than we did before.

Per followup investigation of a gripe from Artur Dabrowski.
An example of a query that failed given his data set is
select count(*) from search_tab where
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:* | dd:*')) and
(to_tsvector('german', keywords ) @@ to_tsquery('german', 'aa:*'));
Back-patch to 8.4 where the lossy pointer code was introduced.

Rewrite the rbtree routines so that an RBNode is the first field of the struct representing a tree entry, rather than being a separately allocated piece of storage. This API is at least as clean as the old one (if not more so — there were some bizarre choices in there) and it permits a very substantial memory savings, on the order of 2X in ginbulk.c’s usage.

Also, fix minor memory leaks in code called by ginEntryInsert, in
particular in ginInsertValue and entryFillRoot, as well as ginEntryInsert
itself. These leaks resulted in the GIN index build context continuing
to bloat even after we'd filled it to maintenance_work_mem and started
to dump data out to the index.
In combination these fixes restore the GIN index build code to honoring
the maintenance_work_mem limit about as well as it did in 8.4. Speed
seems on par with 8.4 too, maybe even a bit faster, for a non-pathological
case in which HEAD was formerly slower.
Back-patch to 9.0 so we don't have a performance regression from 8.4.

Tweak tsmatchsel() so that it examines the structure of the tsquery whenever possible (ie, whenever the tsquery is a constant), even when no statistics are available for the tsvector. For example, foo @@ ‘a & b’::tsquery can be expected to be more selective than foo @@ ‘a’::tsquery, whether or not we know anything about foo. We use DEFAULT_TS_MATCH_SEL as the assumed selectivity of individual query terms when no stats are available, then combine the terms according to the query’s AND/OR structure as usual.

Per experimentation with Artur Dabrowski's example. (The fact that there
are no stats available in that example is a problem in itself, but
nonetheless tsmatchsel should be smarter about the case.)
Back-patch to 8.4 to keep all versions of tsmatchsel() in sync.

Rewrite the key-combination logic in GIN’s keyGetItem() and scanGetItem() routines to make them behave better in the presence of “lossy” index pointers. The previous coding was outright incorrect for some cases, as recently reported by Artur Dabrowski: scanGetItem would fail to return index entries in cases where one index key had multiple exact pointers on the same page as another key had a lossy pointer. Also, keyGetItem was extremely inefficient for cases where a single index key generates multiple “entry” streams, such as an @@ operator with a multiple-clause tsquery. The presence of a lossy page pointer in any one stream defeated its ability to use the opclass consistentFn, resulting in probing many heap pages that didn’t really need to be visited. In Artur’s example case, a query like WHERE tsvector @@ to_tsquery(‘a & b’) was about 50X slower than the theoretically equivalent WHERE tsvector @@ to_tsquery(‘a’) AND tsvector @@ to_tsquery(‘b’) The way that I chose to fix this was to have GIN call the consistentFn twice with both TRUE and FALSE values for the in-doubt entry stream, returning a hit if either call produces TRUE, but not if they both return FALSE. The code handles this for the case of a single in-doubt entry stream, but punts (falling back to the stupid behavior) if there’s more than one lossy reference to the same page. The idea could be scaled up to deal with multiple lossy references, but I think that would probably be wasted complexity. At least to judge by Artur’s example, such cases don’t occur often enough to be worth trying to optimize.

Improved version of patch to protect pg_get_expr() against misuse: look through join alias Vars to avoid breaking join queries, and move the test to someplace where it will catch more possible ways of calling a function. We still ought to throw away the whole thing in favor of a data-type-based solution, but that’s not feasible in the back branches.

Clean up some inconsistencies in the volatility marking of various I/O related functions. Per today’s discussion, we will henceforth assume that datatype I/O functions are either stable or immutable, never volatile. (This implies in particular that domain CHECK constraint expressions shouldn’t be volatile, since domain_in executes them.) In turn, functions that execute the I/O functions of arbitrary datatypes should always be labeled stable. This affects the labeling of array_to_string, which was unsafely marked immutable, and record_in, record_out, record_recv, record_send, domain_in, domain_recv, which were over-conservatively marked volatile. The array I/O functions were already marked stable, which is correct per this policy but would have been wrong if we maintained domain_in as volatile.

Back-patch to 9.0, along with an earlier fix to correctly mark cash_in
and cash_out as stable not immutable (since they depend on lc_monetary).
No catversion bump --- the implications of this are not currently
severe enough to justify a forced initdb.

Block elements with verbatim formatting (literallayout, programlisting,
screen, synopsis) should be aligned at column 0 independent of the surrounding
SGML, because whitespace is significant, and indenting them creates erratic
whitespace in the output. The CSS stylesheets already take care of indenting
the output.
Assorted markup improvements to go along with it.

Fix another longstanding problem in copy_relation_data: it was blithely assuming that a local char[] array would be aligned on at least a word boundary. There are architectures on which that is pretty much guaranteed to NOT be the case … and those arches also don’t like non-aligned memory accesses, meaning that log_newpage() would crash if it ever got invoked. Even on Intel-ish machines there’s a potential for a large performance penalty from doing I/O to an inadequately aligned buffer. So palloc it instead.

Work around a documentation toolchain problem by replacing the “AIX-fixlevels” table with a carrying the same information. Previously the 9.0 documentation was failing to build as a US-size PDF file. It’s quite obscure what the real problem is or why this avoids it, but we need a hack now so we can build docs for beta4.

If a zeroed page is present in the heap, ALTER TABLE .. SET TABLESPACE will
set the LSN and TLI while copying it, which is wrong, and heap_xlog_newpage()
will do the same thing during replay, so the corruption propagates to any
standby. Note, however, that the bug can't be demonstrated unless archiving
is enabled, since in that case we skip WAL logging altogether, and the LSN/TLI
are not set.
Back-patch to 8.0; prior releases do not have tablespaces.
Analysis and patch by Jeff Davis. Adjustments for back-branches and minor
wordsmithing by me.

Fix oversight in new EvalPlanQual logic: the second loop over the ExecRowMark list in ExecLockRows() forgot to allow for the possibility that some of the rowmarks are for child tables that aren’t relevant to the current row. Per report from Kenichiro Tanaka.

Fix potential failure when hashing the output of a subplan that produces a pass-by-reference datatype with a nontrivial projection step. We were using the same memory context for the projection operation as for the temporary context used by the hashtable routines in execGrouping.c. However, the hashtable routines feel free to reset their temp context at any time, which’d lead to destroying input data that was still needed. Report and diagnosis by Tao Ma.

Back-patch to 8.1, where the problem was introduced by the changes that
allowed us to work with "virtual" tuples instead of materializing intermediate
tuple values everywhere. The earlier code looks quite similar, but it doesn't
suffer the problem because the data gets copied into another context as a
result of having to materialize ExecProject's output tuple.

Remove unnecessary “Not safe to send CSV data” complaint from elog.c’s fallback path when CSV logging is configured but not yet operational. It’s sufficient to send the message to stderr, as we were already doing, and the “Not safe” gripe has already confused at least two core members …

Backpatch to 9.0, but not further --- doesn't seem appropriate to change
this behavior in stable branches.

Allow ORDER BY/GROUP BY/etc items to match targetlist items regardless of any implicit casting previously applied to the targetlist item. This is reasonable because the implicit cast, by definition, wasn’t written by the user; so we are preserving the expected behavior that ORDER BY items match textually equivalent tlist items. The case never arose before because there couldn’t be any implicit casting of a top-level SELECT item before we process ORDER BY etc. But now it can arise in the context of aggregates containing ORDER BY clauses, since the “targetlist” is the already-casted list of arguments for the aggregate. The net effect is that the datatype used for ORDER BY/DISTINCT purposes is the aggregate’s declared input type, not that of the original input column; which is a bit debatable but not horrendous, and to do otherwise would require major rework that doesn’t seem justified.

Fix several problems in pg_dump’s handling of SQL/MED objects, notably failure to dump a PUBLIC user mapping correctly, as per bug #5560 from Shigeru Hanada. Use the pg_user_mappings view rather than trying to access pg_user_mapping directly, so that the code doesn’t fail when run by a non-superuser. And clean up some minor carelessness such as unsafe usage of fmtId().

Allow full SSL certificate verification (wherein libpq checks its host name parameter against server cert’s CN field) to succeed in the case where both host and hostaddr are specified. As with the existing precedents for Kerberos, GSSAPI, SSPI, it is the calling application’s responsibility that host and hostaddr match up — we just use the host name as given. Per bug #5559 from Christopher Head.

In passing, make the error handling and messages for the no-host-name-given
failure more consistent among these four cases, and correct a lie in the
documentation: we don't attempt to reverse-lookup host from hostaddr
if host is missing.
Back-patch to 8.4 where SSL cert verification was introduced.

In pg_upgrade, prevent psql AUTOCOMMIT=off by not loading .psqlrc.
In pg_upgrade, report /bin directory checks independent of /data checks.
Remove incorrect email address for pg_upgrade bug reports.
On Win32, pg_upgrade cannot sent any server log output to the log file
because of file access limitations on that platform.

Oops, in the previous fix to prevent a cursor that’s being used in a FOR loop from being dropped, I missed subtransaction cleanup. Pinned portals must be dropped at subtransaction cleanup just as they are at main transaction cleanup.

Avoid an Assert failure in deconstruct_array() by making get_attstatsslot() use the actual element type of the array it’s disassembling, rather than trusting the type OID passed in by its caller. This is needed because sometimes the planner passes in a type OID that’s only binary-compatible with the target column’s type, rather than being an exact match. Per an example from Bernd Helmle.

Possibly we should refactor get_attstatsslot/free_attstatsslot to not expect
the caller to supply type ID data at all, but for now I'll just do the
minimum-change fix.
Back-patch to 7.4. Bernd's test case only crashes back to 8.0, but since
these subroutines are the same in 7.4, I suspect there may be variant
cases that would crash 7.4 as well.

We might at some point find we ought to back-patch this further than 9.0,
but I think that such Vars can only occur as resjunk members of upper-level
tlists, in which case the problem can't arise because prior versions didn't
print resjunk tlist items in EXPLAIN VERBOSE.

Properly report errno/out-of-disk-space error from pg_upgrade when in copy mode, per report from depstein@alliedtesting.com.

Minor kibitzing on previous patch: no need to run check more than once. (_PG_init should be called only once anyway, but as long as it’s got an internal guard against repeat calls, that should be in front of the version check.)

Fix variant float8 expected files to have exactly the expected spacing. This wasn’t important when we used diff’s -w (–ignore-all-space) option to compare regression result files, but it is now. Per buildfarm member canary, which evidently has been offline since we did that in November, but came to life again today.

Fix “cannot handle unplanned sub-select” error that can occur when a sub-select contains a join alias reference that expands into an expression containing another sub-select. Per yesterday’s report from Merlin Moncure and subsequent off-list investigation.

Back-patch to 7.4. Older versions didn't attempt to flatten sub-selects in
ways that would trigger this problem.

Adjust mbutils.c so it won’t get broken by future pgindent runs. To do that, replace L’\0’ by (WCHAR) 0. Perhaps someday we should teach pgindent about wide-character literals, but so long as this is the only use-case in the entire Postgres sources, a workaround seems easier.

Per extensive discussion on pgsql-hackers. We are deliberately not
back-patching this even though the behavior of 8.3 and 8.4 is
unquestionably broken, for fear of breaking existing users of this
parameter. This incompatibility should be release-noted.

Accept slightly grotty coding in Makefile.global in order to keep the -L flag for src/port/ in front of any -L flags placed in LDFLAGS by configure. This undoes an L-flag-ordering change that I had thought would be safe, but seems to be making at least one buildfarm member fail — the only theory for orca’s failure that I can think of is that it’s got an old copy of libpgport.a in /usr/lib. Also allow for LDFLAGS_SL to be set by contrib makefiles before they invoke Makefile.global.

Still more third thoughts: when linking shared libraries, LDFLAGS probably needs to appear before anything placed in SHLIB_LINK. This is because SHLIB_LINK is typically a subset of LIBS, and LIBS has to appear after LDFLAGS on platforms that are sensitive to the relative order of -L and -l switches.

Fix a few single-file (MODULES, not MODULE_big) contrib makefiles that were supposing that they should set SHLIB_LINK rather than LDFLAGS_SL. Since these don’t go through Makefile.shlib that was a no-op on most platforms. Also regularize the few platform-specific Makefiles that did pay attention to SHLIB_LINK: it seems that the real value of that is to pull in BE_DLLLIBS, so do that instead. Per buildfarm failures on cygwin.

Split the LDFLAGS make variable into two parts: LDFLAGS is now used for linking both executables and shared libraries, and we add on LDFLAGS_EX when linking executables or LDFLAGS_SL when linking shared libraries. This provides a significantly cleaner way of dealing with link-time switches than the former behavior. Also, make sure that the various platform-specific %.so: %.o rules incorporate LDFLAGS and LDFLAGS_SL; most of them missed that before. (I did not add these variables for the platforms that invoke $(LD) directly, however. It’s not clear if we can do that safely, since for the most part we assume these variables use CC command-line syntax.)

The previous fix in CVS HEAD and 8.4 for handling the case where a cursor being used in a PL/pgSQL FOR loop is closed was inadequate, as Tom Lane pointed out. The bug affects FOR statement variants too, because you can close an implicitly created cursor too by guessing the “” name created for it.

Don’t set recoveryLastXTime when replaying a checkpoint — that was a bogus idea from the start since the variable is only meant to track commit/abort events. This patch reverts the logic around the variable to what it was in 8.4, except that the value is now kept in shared memory rather than a static variable, so that it can be reported correctly by CreateRestartPoint (which is executed in the bgwriter).

Make vacuum_defer_cleanup_age be PGC_SIGHUP level, since it’s not sensible to have different values in different processes of the primary server. Also put it into the “Streaming Replication” GUC category; it doesn’t belong in “Standby Servers” because you use it on the master not the standby. In passing also correct guc.c’s idea of wal_keep_segments’ category.

Replace max_standby_delay with two parameters, max_standby_archive_delay and max_standby_streaming_delay, and revise the implementation to avoid assuming that timestamps found in WAL records can meaningfully be compared to clock time on the standby server. Instead, the delay limits are compared to the elapsed time since we last obtained a new WAL segment from archive or since we were last “caught up” to WAL data arriving via streaming replication. This avoids problems with clock skew between primary and standby, as well as other corner cases that the original coding would misbehave in, such as the primary server having significant idle time between transactions. Per my complaint some time ago and considerable ensuing discussion.

Backpatch to 8.3, which is as far back as we have opfamilies.
The opclass portion could probably be backpatched to 8.2, when
REASSIGN OWNED was added, but for now I have not done that.
Asko Tiidumaa, with minor adjustments by me.

Fix assorted misstatements and poor wording in the descriptions of the I/O formats for geometric types. Per bug #5536 from Jon Strait, and my own testing.

The previous commit to make copydir() interruptible prevented
postgres.exe from linking on MinGW and Cygwin, because on those
platforms libpgport_srv.a can't freely reference symbols defined
by the backend. Since that code is already backend-specific anyway,
just move the whole file into the backend rather than adding further
kludges to deal with the symbols needed by CHECK_FOR_INTERRUPTS().
This probably needs some further cleanup, but this commit just moves
the file as-is, which should hopefully be enough to turn the
buildfarm green again.

This makes ALTER DATABASE .. SET TABLESPACE and CREATE DATABASE more
sensitive to interrupts. Backpatch to 8.4, where ALTER DATABASE .. SET
TABLESPACE was introduced. We could go back further, but in the absence
of complaints about the CREATE DATABASE case it doesn't seem worth it.
Guillaume Lelarge, with a small correction by me.

stringToNode() and deparse_expression_pretty() crash on invalid input, but we have nevertheless exposed them to users via pg_get_expr(). It would be too much maintenance effort to rigorously check the input, so put a hack in place instead to restrict pg_get_expr() so that the argument must come from one of the system catalog columns known to contain valid expressions.

Add compatibility note warning that plpgsql is now stricter about the column datatypes of composite results, per gripe from Marcel Asio. Some desultory copy-editing of plpgsql-related sections of the release notes.

Improve pg_dump’s checkSeek() function to verify the functioning of ftello as well as fseeko, and to not assume that fseeko(fp, 0, SEEK_CUR) proves anything. Also improve some related comments. Per my observation that the SEEK_CUR test didn’t actually work on some platforms, and subsequent discussion with Robert Haas.

Back-patch to 8.4. In earlier releases it's not that important whether
we get the hasSeek test right, but with parallel restore it matters.

Fix pg_restore so parallel restore doesn’t fail when the input file doesn’t contain data offsets (which it won’t, if pg_dump thought its output wasn’t seekable). To do that, remove an unnecessarily aggressive error check, and instead fail if we get to the end of the archive without finding the desired data item. Also improve the error message to be more specific about the cause of the problem. Per discussion of recent report from Igor Neyman.

The revised documentation makes it more clear that these are client-side
parameters, rather than server side parameters. It also puts the main
point of each parameter first, and consolidates the conditions under which
it might be ignored in a single list at the end.

Fix thinko in tok_is_keyword(): it was looking at the wrong union variant of YYSTYPE, and hence returning the wrong answer for cases where a plpgsql “unreserved keyword” really does conflict with a variable name. Obviously I didn’t test this enough :-(. Per bug #5524 from Peter Gagarinov.

This adds four additional connection parameters to libpq: keepalives,
keepalives_idle, keepalives_count, and keepalives_interval.
keepalives default to on, per discussion, but can be turned off by
specifying keepalives=0. The remaining parameters, where supported,
can be used to adjust how often keepalives are sent and how many
can be lost before the connection is broken.
The immediate motivation for this patch is to make sure that
walreceiver will eventually notice if the master reboots without
closing the connection cleanly, but it should be helpful in other
cases as well.
Tollef Fog Heen, Fujii Masao, and me.

Add username designations to all pg_upgrade utility calls that support it.

In HEAD, emit a warning when an operator named => is defined.
In both HEAD and the backbranches (except in 8.2, where contrib
modules do not have documentation), document that hstore's text =>
text operator may be removed in a future release, and encourage the
use of the hstore(text, text) function instead. This function only
exists in HEAD (previously, it was called tconvert), so backpatch
it back to 8.2, when hstore was added. Per discussion.

In a PL/pgSQL “FOR cursor” statement, the statements executed in the loop might close the cursor, rendering the Portal pointer to it invalid. Closing the cursor in the middle of the loop is not a very sensible thing to do, but we must handle it gracefully and throw an error instead of crashing.

Fix mishandling of whole-row Vars referencing a view or sub-select. If such a Var appeared within a nested sub-select, we failed to translate it correctly during pullup of the view, because the recursive call to replace_rte_variables_mutator was looking for the wrong sublevels_up value. Bug was introduced during the addition of the PlaceHolderVar mechanism. Per bug #5514 from Marcos Castedo.

Don’t allow walsender to send WAL data until it’s been safely fsync’d on the master. Otherwise a subsequent crash could cause the master to lose WAL that has already been applied on the slave, resulting in the slave being out of sync and soon corrupt. Per recent discussion and an example from Robert Haas.

Change the interpretation of the primary_key_attnums parameter of dblink_build_sql_insert() and related functions. Now the column numbers are treated as logical not physical column numbers. This will provide saner behavior in the presence of dropped columns; furthermore, if we ever get around to allowing rearrangement of logical column ordering, the original definition would become nearly untenable from a usability standpoint. Per recent discussion of dblink’s handling of dropped columns. Not back-patched for fear of breaking existing applications.

This is not yet in any released version, so we still have the option to
backtrack. Instead, document hstore(text[], text[]). Per discussion.

Fix dblink_build_sql_insert() and related functions to handle dropped columns correctly. In passing, get rid of some dead logic in the underlying get_sql_insert() etc functions — there is no caller that will pass null value-arrays to them.

In particular, note that autovacuum does not yet understand that it might
need to vacuum inheritance parents as a result of changes to the child
tables.

Consolidate and improve checking of key-column-attnum arguments for dblink_build_sql_insert() and related functions. In particular, be sure to reject references to dropped and out-of-range column numbers. The numbers are still interpreted as physical column numbers, though, for backward compatibility.

Add new GUC categories corresponding to sections in docs, and move description for vacuum_defer_cleanup_age to the correct category. Sections in postgresql.conf are also sorted in the same order with docs.

Rearrange dblink’s dblink_build_sql_insert() and related routines to open and lock the target relation just once per SQL function call. The original coding obtained and released lock several times per call. Aside from saving a not-insignificant number of cycles, this eliminates possible race conditions if someone tries to modify the relation’s schema concurrently. Also centralize locking and permission-checking logic.

If a corrupt WAL record is received by streaming replication, disconnect and retry. If the record is genuinely corrupt in the master database, there’s little hope of recovering, but it’s better than simply retrying to apply the corrupt WAL record in a tight loop without even trying to retransmit it, which is what we used to do.

Use “replication” as the database name when constructing a connection string for a streaming replication connection. It’s ignored by the server, but allows libpq to pick up the password from .pgpass where “replication” is specified as the database name.

Return NULL instead of 0/0 in pg_last_xlog_receive_location() and pg_last_xlog_replay_location(). Per Robert Haas’s suggestion, after Itagaki Takahiro pointed out an issue in the docs. Also, some wording changes in the docs by me.

While my previous attempt seems to always produce valid YAML, it
doesn't always produce YAML that means what it appears to mean,
because of tokens like "0xa" and "true", which without quotes will
be interpreted as integer or Boolean literals. So, instead, just
quote everything that's not known to be a number, as we do for
JSON.
Dean Rasheed, with some changes to the comments by me.

In standby mode, respect checkpoint_segments in addition to checkpoint_timeout to trigger restartpoints. We used to deliberately only do time-based restartpoints, because if checkpoint_segments is small we would spend time doing restartpoints more often than really necessary. But now that restartpoints are done in bgwriter, they’re not as disruptive as they used to be. Secondly, because streaming replication stores the streamed WAL files in pg_xlog, we want to clean it up more often to avoid running out of disk space when checkpoint_timeout is large and checkpoint_segments small.

Make the walwriter close it’s handle to an old xlog segment if it’s no longer the current one. Not doing this would leave the walwriter with a handle to a deleted file if there was nothing for it to do for a long period of time, preventing the file from being completely removed.

The previous code failed to quote in many cases where quoting was necessary -
YAML has loads of special characters, including -:[]{},"'|*& - so quote much
more aggressively, and only refrain from quoting things where it seems fairly
clear that it isn't necessary.
Per report from Dean Rasheed.

Fix connection leak in dblink when dblink_connect() or dblink_connect_u() end with “duplicate connection name” errors.

Especially uninitialized fillfactor caused inefficient page usage
because we built a StdRdOptions struct in which fillfactor is zero
if any reloption is set for the toast table.
In addition, we disallow toast.autovacuum_analyze_threshold and
toast.autovacuum_analyze_scale_factor because we didn't actually
support them; they are always ignored.
Report by Rumko on pgsql-bugs on 12 May 2010.
Analysis by Tom Lane and Alvaro Herrera. Patch by me.
Backpatch to 8.4.

Replace “slave” to “standby” in documentation for consistent terminology. Almost all of the terms in docs and messages were replaced, but still remains in a few comments and README files in codes.

Add current WAL end (as seen by walsender, ie, GetWriteRecPtr() result) and current server clock time to SR data messages. These are not currently used on the slave side but seem likely to be useful in future, and it’d be better not to change the SR protocol after release. Per discussion. Also do some minor code review and cleanup on walsender.c, and improve the protocol documentation.

We must filter out hashtable entries with frequencies less than those
specified by the algorithm, else we risk emitting junk entries whose
actual frequency is much less than other lexemes that did not get
tabulated. This is bad enough by itself, but even worse is that
tsquerysel() believes that the minimum frequency seen in pg_statistic is a
hard upper bound for lexemes not included, and was thus underestimating
the frequency of non-MCEs.
Also, set the threshold frequency to something with a little bit of theory
behind it, to wit assume that the input distribution is approximately
Zipfian. This might need adjustment in future, but some preliminary
experiments suggest that it's not too unreasonable.
Back-patch to 8.4, where this code was introduced.
Jan Urbanski, with some editorialization by Tom

Change the notation for calling functions with named parameters from “val AS name” to “name := val”, as per recent discussion.

This patch catches everything in the original named-parameters patch,
but I'm not certain that no other dependencies snuck in later (grepping
the source tree for all uses of AS soon proved unworkable).
In passing I note that we've dropped the ball at least once on keeping
ecpg's lexer (as opposed to parser) in sync with the backend. It would
be a good idea to go through all of pgc.l and see if it's in sync now.
I didn't attempt that at the moment.

Add text to “Populating a Database” pointing out that bulk data load into a table with foreign key constraints eats memory. Per off-line discussion of bug #5480 with its reporter. Also do some minor wordsmithing elsewhere in the same section.

Abort a FETCH_COUNT-controlled query if we observe any I/O error on the output stream. This typically indicates that the user quit out of $PAGER, or that we are writing to a file and ran out of disk space. In either case we shouldn’t bother to continue fetching data.

Fix oversight in the previous patch that made LIKE throw error for \ at the end of the pattern: the code path that handles \ just after % should throw error too. As in the previous patch, not back-patching for fear of breaking apps that worked before.

In passing, also rename the TCHAR macro to GETCHAR, because pgindent is
messing with the formatting of the former (apparently it now thinks TCHAR
is a typedef name).
Back-patch to 8.3, where the bug was introduced.

PGDLLEXPORT is __declspec (dllexport) only on MSVC, but is __declspec (dllimport) on other compilers because cygwin and mingw don’t like dllexport.

Update High Availability docs. Clarify terms master/primary standby/slave, move two paragraphs that apply to log shipping in general from the “Alternative method for log shipping” section to the earlier sections. Add varname tags where missing. Some small wording changes.

Rejigger mergejoin logic so that a tuple with a null in the first merge column is treated like end-of-input, if nulls sort last in that column and we are not doing outer-join filling for that input. In such a case, the tuple cannot join to anything from the other input (because we assume mergejoinable operators are strict), and neither can any tuple following it in the sort order. If we’re not interested in doing outer-join filling we can just pretend the tuple and its successors aren’t there at all. This can save a great deal of time in situations where there are many nulls in the join column, as in a recent example from Scott Marlowe. Also, since the planner tends to not count nulls in its mergejoin scan selectivity estimates, this is an important fix to make the runtime behavior more like the estimate.

I regard this as an omission in the patch I wrote years ago to teach mergejoin
that tuples containing nulls aren't joinable, so I'm back-patching it. But
only to 8.3 --- in older versions, we didn't have a solid notion of whether
nulls sort high or low, so attempting to apply this optimization could break
things.

Change ps_status.c to explicitly track the current logical length of ps_buffer. This saves cycles in get_ps_display() on many popular platforms, and more importantly ensures that get_ps_display() will correctly return an empty string if init_ps_display() hasn’t been called yet. Per trouble report from Ray Stell, in which log_line_prefix %i produced junk early in backend startup.

Fix the volatility marking of textanycat() and anytextcat(): they were marked immutable, but that is wrong in general because the cast from the polymorphic argument to text could be stable or even volatile. Mark them volatile for safety. In the typical case where the cast isn’t volatile, the planner will deduce the correct expression volatility after inlining the function, so performance is not lost. The just-committed fix in CREATE INDEX also ensures this won’t break any indexing cases that ought to be allowed.

1. SQL function inlining could reduce the apparent volatility of the
expression, allowing an expression to be accepted where it previously would
not have been. As an example, polymorphic functions must be marked with the
worst-case volatility they have for any argument type, but for specific
argument types they might not be so volatile, so indexing could be allowed.
(Since the planner will refuse to inline functions in cases where the
apparent volatility of the expression would increase, this won't break
any cases that were accepted before.)
2. A nominally immutable function could have default arguments that are
volatile expressions. In such a case insertion of the defaults will increase
both the apparent and actual volatility of the expression, so it is
*necessary* to check this before allowing the expression to be indexed.
Back-patch to 8.4, where default arguments were introduced.

Mark PG_MODULE_MAGIC and PG_FUNCTION_INFO_V1 with PGDLLEXPORT independently from BUILDING_DLL. It is always __declspec(dllexport).

Make it more clear that you need to release savepoint with RELEASE SAVEPOINT to make an older savepoint with the same name accessible. It’s also possible to implicitly release the savepoint by rolling back to an earlier savepoint, but mentioning that too would make the note just more verbose and confusing.

In walsender, don’t sleep if there’s outstanding WAL waiting to be sent, otherwise we effectively rate-limit the streaming as pointed out by Simon Riggs. Also, send the WAL in smaller chunks, to respond to signals more promptly.

Rearrange libpq’s SSL initialization to simplify it and make it handle some additional cases correctly. The original coding failed to load additional (chain) certificates from the client cert file, meaning that indirectly signed client certificates didn’t work unless one hacked the server’s root.crt file to include intermediate CAs (not the desired approach). Another problem was that everything got loaded into the shared SSL_context object, which meant that concurrent connections trying to use different sslcert settings could well fail due to conflicting over the single available slot for a keyed certificate.

To fix, get rid of the use of SSL_CTX_set_client_cert_cb(), which is
deprecated anyway in the OpenSSL documentation, and instead just
unconditionally load the client cert and private key during connection
initialization. This lets us use SSL_CTX_use_certificate_chain_file(),
which does the right thing with additional certs, and is lots simpler than
the previous hacking about with BIO-level access. A small disadvantage is
that we have to load the primary client cert a second time with
SSL_use_certificate_file, so that that one ends up in the correct slot
within the connection's SSL object where it can get paired with the key.
Given the other overhead of making an SSL connection, that doesn't seem
worth worrying about.
Per discussion ensuing from bug #5468.

Fix bogus error message for SSL-cert authentication, due to lack of a uaCert entry in auth_failed(). Put the switch entries into a sane order, namely the one the enum is declared in.

HS Defer buffer pin deadlock check until deadlock_timeout has expired. During Hot Standby we need to check for buffer pin deadlocks when the Startup process begins to wait, in case it never wakes up again. We previously made the deadlock check immediately on the basis it was cheap, though clearer thinking and prima facie evidence shows that was too simple. Refactor existing code to make it easy to add in deferral of deadlock check until deadlock_timeout allowing a good reduction in deadlock checks since far few buffer pins are held for that duration. It’s worth doing anyway, though major goal is to prevent further reports of context switching with high numbers of users on occasional tests.

Tell openssl to include the names of the root certs the server trusts in requests for client certs. This lets a client with a keystore select the appropriate client certificate to send. In particular, this is necessary to get Java clients to work in all but the most trivial configurations. Per discussion of bug #5468.

1. If we receive a fast shutdown request while in the PM_STARTUP state,
process it just as we would in PM_RECOVERY, PM_HOT_STANDBY, or PM_RUN.
Without this change, an early fast shutdown followed by Hot Standby causes
the database to get stuck in a state where a shutdown is pending (so no new
connections are allowed) but the shutdown request is never processed unless
we end Hot Standby and enter normal running.
2. Avoid removing the backup label file when a smart or fast shutdown occurs
during recovery. It makes sense to do this once we've reached normal running,
since we must be taking a backup which now won't be valid. But during
recovery we must be recovering from a previously taken backup, and any backup
label file is needed to restart recovery from the right place.
Fujii Masao and Robert Haas

Fix oversight in construction of sort/unique plans for UniquePaths. If the original IN operator is cross-type, for example int8 = int4, we need to use int4 < int4 to sort the inner data and int4 = int4 to unique-ify it. We got the first part of that right, but tried to use the original IN operator for the equality checks. Per bug #5472 from Vlad Romascanu.

Backpatch to 8.4, where the bug was introduced by the patch that unified
SortClause and GroupClause. I was able to take out a whole lot of on-the-fly
calls of get_equality_op_for_ordering_op(), but failed to realize that
I needed to put one back in right here :-(

This was broken by the following commmit. Although the original commit was
backpatched all the way to 7.4, this particular bug exists only in the version
applied to HEAD.
http://archives.postgresql.org/pgsql-committers/2010-05/msg00058.php

Ecpg now accepts “long long” datatypes even if “long” is 64bit wide. This used to cover the equally long “long long” type. This patch closes bug #5464.

This avoids a formatting problem in the PDF output. In the HTML output this
isn't necessary, but we've done similar things elsewhere in the documentation
so I think it's OK to do it here, too. I've refrained from breaking a longish
error message which also causes problems for the PDF output, because that would
make the HTML output look wrong.
Erik Rijkers

Ensure that pg_restore -l will output DATABASE entries whether or not -C is specified. Per bug report from Russell Smith and ensuing discussion. Since this is a corner case behavioral change, I’m going to be conservative and not back-patch it.

Fix bug in processing of checkpoint time for max_standby_delay. Latest log time was incorrectly set, typically leading to dates in the past, which would cause more cancellations in Hot Standby on a quiet server.

Prevent PL/Tcl from loading the “unknown” module from pltcl_modules unless that is a regular table or view owned by a superuser. This prevents a trojan horse attack whereby any unprivileged SQL user could create such a table and insert code into it that would then get executed in other users’ sessions whenever they call pltcl functions.

Worse yet, because the code was automatically loaded into both the "normal"
and "safe" interpreters at first use, the attacker could execute unrestricted
Tcl code in the "normal" interpreter without there being any pltclu functions
anywhere, or indeed anyone else using pltcl at all: installing pltcl is
sufficient to open the hole. Change the initialization logic so that the
"unknown" code is only loaded into an interpreter when the interpreter is
first really used. (That doesn't add any additional security in this
particular context, but it seems a prudent change, and anyway the former
behavior violated the principle of least astonishment.)
Security: CVE-2010-1170

Abandon the use of Perl’s Safe.pm to enforce restrictions in plperl, as it is fundamentally insecure. Instead apply an opmask to the whole interpreter that imposes restrictions on unsafe operations. These restrictions are much harder to subvert than is Safe.pm, since there is no container to be broken out of. Backported to release 7.4.

In releases 7.4, 8.0 and 8.1 this also includes the necessary backporting of
the two interpreters model for plperl and plperlu adopted in release 8.2.
In versions 8.0 and up, the use of Perl's POSIX module to undo its locale
mangling on Windows has become insecure with these changes, so it is
replaced by our own routine, which is also faster.
Nice side effects of the changes include that it is now possible to use perl's
"strict" pragma in a natural way in plperl, and that perl's $a and
$b variables now work as expected in sort routines, and that function
compilation is significantly faster.
Tim Bunce and Andrew Dunstan, with reviews from Alex Hunsaker and
Alexey Klyukin.
Security: CVE-2010-1169

* There is no chmod() on Windows.
* Must always use the 3-parameter version of open()
* There is no dynloader.h - but it also appears unnecessary on all platforms
* Don't include shlobj.h because it causes compile errors, and from what I can
see it's not actually used. This may need to be added back for mingw
and/or cygwin in the worst case.

Ensure that top level aborts call XLogSetAsyncCommit(). Not doing so simply leads to data waiting in wal_buffers which then causes later commits to potentially do emergency writes and for all forms of replication to be potentially delayed without need or benefit. Issue pointed out exactly by Fujii Masao, following bug report by Robert Haas on a separate though related topic.

Cleanup initialization of Hot Standby. Clarify working with reanalysis of requirements and documentation on LogStandbySnapshot(). Fixes two minor bugs reported by Tom Lane that would lead to an incorrect snapshot after transaction wraparound. Also fix two other problems discovered that would give incorrect snapshots in certain cases. ProcArrayApplyRecoveryInfo() substantially rewritten. Some minor refactoring of xact_redo_apply() and ExpireTreeKnownAssignedTransactionIds().

Clean up unnecessary unportability and compiler warnings by removing the cmp parameter for pg_scandir(). The code failed to support this anyway for Sun/Windows, so pretending we could accept a parameter other than NULL was just asking for trouble.

Fixes a complaint from src/tools/pginclude/cpluspluscheck reported by
Peter Eisentraut.

Cause the archiver process to adopt new postgresql.conf settings (particularly archive_command) as soon as possible, namely just before issuing a new call of archive_command, even when there is a backlog of files to be archived. The original coding would only absorb new settings after clearing the backlog and returning to the outer loop. Per discussion.

When adding a “target IS NOT NULL” indexqual to the plan for an index-optimized MIN or MAX, we must take care to insert the added qual in a legal place among the existing indexquals, if any. The btree index AM requires the quals to appear in index-column order. We didn’t have to worry about this before because “target IS NOT NULL” was just treated as a plain scan filter condition; but as of 9.0 it can be an index qual and then it has to follow the rule. Per report from Ian Barwick.

Adjust comments about avoiding use of printf’s %.*s. My initial impression that glibc was measuring the precision in characters (which is what the Linux man page says it does) was incorrect. It does take the precision to be in bytes, but it also tries to truncate the string at a character boundary. The bottom line remains the same: it will mess up if the string is not in the encoding it expects, so we need to avoid %.*s anytime there’s a significant risk of that. Previous code changes are still good, but adjust the comments to reflect this knowledge. Per research by Hernan Gonzalez.

Work around a subtle portability problem in use of printf %s format. Depending on which spec you read, field widths and precisions in %s may be counted either in bytes or characters. Our code was assuming bytes, which is wrong at least for glibc’s implementation, and in any case libc might have a different idea of the prevailing encoding than we do. Hence, for portable results we must avoid using anything more complex than just “%s” unless the string to be printed is known to be all-ASCII.

This patch fixes the cases I could find, including the psql formatting
failure reported by Hernan Gonzalez. In HEAD only, I also added comments
to some places where it appears safe to continue using "%.*s".

ECPG connect routine only checked for NULL to find empty parameters, but user and password can also be “”.

Fix psql to not go into infinite recursion when expanding a variable that refers to itself (directly or indirectly). Instead, print a message when recursion is detected, and don’t expand the repeated reference. Per bug #5448 from Francis Markham.

Need to hold ControlFileLock while updating control file. Update minRecoveryPoint in control file when replaying a parameter change record, to ensure that we don’t allow hot standby on WAL generated without wal_level=‘hot_standby’ after a standby restart.

Add cross-reference from wal_level to hot_standby setting. Update the PITR documentation to mention that you need to set wal_level to ‘archive’ or ‘hot_standby’, to enable WAL archiving. Per Simon’s request.

Fix replay of XLOG_HEAP_NEWPAGE WAL records to pay attention to the forknum field of the WAL record. The previous coding always wrote to the main fork, resulting in data corruption if the page was meant to go into a non-default fork.

At present, the only operation that can produce such WAL records is
ALTER TABLE/INDEX SET TABLESPACE when executed with archive_mode = on.
Data corruption would be observed on standby slaves, and could occur on the
master as well if a database crash and recovery occurred after committing
the ALTER and before the next checkpoint. Per report from Gordon Shannon.
Back-patch to 8.4; the problem doesn't exist in earlier branches because
we didn't have a concept of multiple relation forks then.

Update standbycheck test output with new ERROR message changes. No changes to tests and no changes in accepted server behaviour.

Clean up some awkward, inaccurate, and inefficient processing around MaxStandbyDelay. Use the GUC units mechanism for the value, and choose more appropriate timestamp functions for performing tests with it. Make the ps_activity manipulation in ResolveRecoveryConflictWithVirtualXIDs have behavior similar to ps_activity code elsewhere, notably not updating the display when update_process_title is off and not truncating the display contents at an arbitrarily-chosen length. Improve the docs to be explicit about what MaxStandbyDelay actually measures, viz the difference between primary and standby servers’ clocks, and the possible hazards if their clocks aren’t in sync.

Add code to InternalIpcMemoryCreate() to handle the case where shmget() returns EINVAL for an existing shared memory segment. Although it’s not terribly sensible, that behavior does meet the POSIX spec because EINVAL is the appropriate error code when the existing segment is smaller than the requested size, and the spec explicitly disclaims any particular ordering of error checks. Moreover, it does in fact happen on OS X and probably other BSD-derived kernels. (We were able to talk NetBSD into changing their code, but purging that behavior from the wild completely seems unlikely to happen.) We need to distinguish collision with a pre-existing segment from invalid size request in order to behave sensibly, so it’s worth some extra code here to get it right. Per report from Gavin Kistner and subsequent investigation.

Back-patch to all supported versions, since any of them could get used
with a kernel having the debatable behavior.

Install hack workaround for failure of ‘make all’ in VPATH builds. It appears that gmake gets confused if postgres.sgml is not present in the working directory, and instantiates some default rule or other that would let postgres.sgml be built from postgres.xml. I haven’t been able to track down exactly where that’s coming from, but the problem can be dodged by specifying srcdir explicitly in the rule for postgres.xml. Per report from Vladimir Kokovic.

Adjust postgres.xml rule so that make will notice a failure exit from osx. The previous coding had it in a pipe, which on most shells won’t report the error. Per experimentation with a bug report from Vladimir Kokovic. This doesn’t actually fix his problem, but it does explain why make didn’t report that there was a problem.

Update our information about OS X shared memory configuration: it’s now possible to set most of the SHM kernel parameters without a reboot. Also, reorder the paragraph to explain the modern configuration method first. There are probably not too many people who still care about how to do it on OS X 10.3 or older.

Fix multiple memory leaks in PLy_spi_execute_fetch_result: it would leak memory if the result had zero rows, and also if there was any sort of error while converting the result tuples into Python data. Reported and partially fixed by Andres Freund.

Back-patch to all supported versions. Note: I haven't tested the 7.4 fix.
7.4's configure check for python is so obsolete it doesn't work on my
current machines :-(. The logic change is pretty straightforward though.

Fix a couple of places where the result of fgets() wasn’t checked. This is mostly to suppress compiler warnings, although in principle the cases could result in undesirable behavior.

Adjust error checks in pg_start_backup and pg_stop_backup to make it possible to perform a backup without archive_mode being enabled. This gives up some user-error protection in order to improve usefulness for streaming-replication scenarios. Per discussion.

Rename the parameter recovery_connections to hot_standby, to reduce possible confusion with streaming-replication settings. Also, change its default value to “off”, because of concern about executing new and poorly-tested code during ordinary non-replicating operation. Per discussion.

Install a workaround for ‘TeX capacity exceeded’ problem when building PDF output for recent versions of the documentation. There is probably a better answer out there somewhere, but we need something now so we can build beta releases.

We had to do something about this because many call sites were failing to
check for NULL; and changing it like this seems a lot more useful and
mistake-proof than adding checks to the call sites without them.

Make pg_stats example query result a bit less wide, and add comment about pg_stats.inherited

Introduce wal_level GUC to explicitly control if information needed for archival or hot standby should be WAL-logged, instead of deducing that from other options like archive_mode. This replaces recovery_connections GUC in the primary, where it now has no effect, but it’s still used in the standby to enable/disable hot standby.

Remove the WAL-logging of "unlogged operations", like creating an index
without WAL-logging and fsyncing it at the end. Instead, we keep a copy of
the wal_mode setting and the settings that affect how much shared memory a
hot standby server needs to track master transactions (max_connections,
max_prepared_xacts, max_locks_per_xact) in pg_control. Whenever the settings
change, at server restart, write a WAL record noting the new settings and
update pg_control. This allows us to notice the change in those settings in
the standby at the right moment, they used to be included in checkpoint
records, but that meant that a changed value was not reflected in the
standby until the first checkpoint after the change.
Bump PG_CONTROL_VERSION and XLOG_PAGE_MAGIC. Whack XLOG_PAGE_MAGIC back to
the sequence it used to follow, before hot standby and subsequent patches
changed it to 0x9003.

Modify the built-in text search parser to handle URLs more nearly according to RFC 3986. In particular, these characters now terminate the path part of a URL: ‘“’, ‘<‘, ‘>’, ‘\‘, ‘^’, ‘`‘, ‘{’, ‘|’, ‘}’. The previous behavior was inconsistent and depended on whether a “?” was present in the path. Per gripe from Donald Fraser and spec research by Kevin Grittner.

Replace the KnownAssignedXids hash table with a sorted-array data structure, and be more tense about the locking requirements for it, to improve performance in Hot Standby mode. In passing fix a few bugs and improve a number of comments in the existing HS code.

If a base backup is cancelled by server shutdown or crash, throw an error in WAL recovery when it sees the shutdown checkpoint record. It’s more user-friendly to find out about it at that point than at the end of recovery, and you’re not left wondering why your hot standby server never opens up for read-only connections.

Normal superuser processes are allowed to connect even when the database
system is shutting down, or when fewer than superuser_reserved_connection
slots remain. This is intended to make sure an administrator can log in
and troubleshoot, so don't extend these same courtesies to users connecting
for replication.

Make CheckRequiredParameterValues() depend upon correct combination of parameters. Fix bug report by Robert Haas that error message and hint was incorrect if wrong mode parameters specified on master. Internal changes only. Proposals for parameter simplification on master/primary still under way.

Further reductions in Hot Standby conflict processing. These come from the realistion that HEAP2_CLEAN records don’t always remove user visible data, so conflict processing for them can be skipped. Confirm validity using Assert checks, clarify circumstances under which we log heap_cleanup_info records. Tuning arises from bug fixing of earlier safety check failures.

Fix encoding issue when lc_monetary or lc_numeric are different encoding from lc_ctype, that could happen on Windows. We need to change lc_ctype together with lc_monetary or lc_numeric, and convert strings in lconv from lc_ctype encoding to the database encoding.

The bug reported by Mikko, original patch by Hiroshi Inoue,
with changes by Bruce and me.

Enforce superuser permissions checks during ALTER ROLE/DATABASE SET, rather than during define_custom_variable(). This entails rejecting an ALTER command if the target variable doesn’t have a known (non-placeholder) definition, unless the calling user is superuser. When the variable is known, we can correctly apply the rule that only superusers can issue ALTER for SUSET parameters. This allows define_custom_variable to apply ALTER’s values for SUSET parameters at module load time, secure in the knowledge that only a superuser could have set the ALTER value. This change fixes a longstanding gotcha in the usage of SUSET-level custom parameters; which is a good thing to fix now that plpgsql defines such a parameter.

Only send cleanup_info messages if VACUUM removes any tuples. There is no other purpose for this message type than to report the latestRemovedXid of removed tuples, prior to index scans. Removes overlooked path for sending invalid latestRemovedXid. Fixes buildfarm failure on centaur.

Relax locking during GetCurrentVirtualXIDs(). Earlier improvements to handling of btree delete records mean that all snapshot conflicts on standby now have a valid, useful latestRemovedXid. Our earlier approach using LW_EXCLUSIVE was useful when we didnt always have a valid value, though is no longer useful or necessary. Asserts added to code path to prove and ensure this is the case. This will reduce contention and improve performance of larger Hot Standby servers.

Fix oversight in collecting values for cleanup_info records. vacuum_log_cleanup_info() now generates log records with a valid latestRemovedXid set in all cases. Also be careful not to zero the value when we do a round of vacuuming part-way through lazy_scan_heap(). Incidentally, this reduces frequency of conflicts in Hot Standby.

Fix pg_hba.conf matching so that replication connections only match records with database = replication. The previous coding would allow them to match ordinary records too, but that seems like a recipe for security breaches. Improve the messages associated with no-such-pg_hba.conf entry to report replication connections as such, since that’s now a critical aspect of whether the connection matches. Make some cursory improvements in the related documentation, too.

Arrange for client authentication to occur before we select a specific database to connect to. This is necessary for the walsender code to work properly (it was previously using an untenable assumption that template1 would always be available to connect to). This also gets rid of a small security shortcoming that was introduced in the original patch to eliminate the flat authentication files: before, you could find out whether or not the requested database existed even if you couldn’t pass the authentication checks.

The changes needed to support this are mainly just to treat pg_authid and
pg_auth_members as nailed relations, so that we can read them without having
to be able to locate real pg_class entries for them. This mechanism was
already debugged for pg_database, but we hadn't recognized the value of
applying it to those catalogs too.
Since the current code doesn't have support for accessing toast tables before
we've brought up all of the relcache, remove pg_authid's toast table to ensure
that no one can store an out-of-line toasted value of rolpassword. The case
seems quite unlikely to occur in practice, and was effectively unsupported
anyway in the old "flatfiles" implementation.
Update genbki.pl to actually implement the same rules as bootstrap.c does for
not-nullability of catalog columns. The previous coding was a bit cheesy but
worked all right for the previous set of bootstrap catalogs. It does not work
for pg_authid, where rolvaliduntil needs to be nullable.
Initdb forced due to minor catalog changes (mainly the toast table removal).

Fix code that doesn’t work on machines with strict alignment requirements: must use memcpy here rather than struct assignment.

Also, make the name of the GUC and the name of the backing variable match.
Alnong the way, clean up a couple of slight typographical errors in the
related docs.

Move the responsibility for calling StartupXLOG into InitPostgres, for those process types that go through InitPostgres; in particular, bootstrap and standalone-backend cases. This ensures that we have set up a PGPROC and done some other basic initialization steps (corresponding to the if (IsUnderPostmaster) block in AuxiliaryProcessMain) before we attempt to run WAL recovery in a standalone backend. As was discovered last September, this is necessary for some corner-case code paths during WAL recovery, particularly end-of-WAL cleanup.

Add wrapper function libpqrcv_PQexec() in the walreceiver that uses async libpq to send queries, making the waiting for responses interruptible on platforms where PQexec() can’t normally be interrupted by signals, such as win32.

The logic for determining whether to materialize has been significantly
overhauled for 9.0. In case there should be any doubt about whether
materialization is a win in any particular case, this should provide a
convenient way of seeing what happens without it; but even with enable_material
turned off, we still materialize in cases where it is required for
correctness.
Thanks to Tom Lane for the review.

Fix bogus order of cleanup steps in plperl_inline_handler. Per Alex Hunsaker

Improve sequence and sense of messages from pg_stop_backup(). Now doesn’t report it is waiting until it actually is waiting, plus message doesn’t appear until at least 5 seconds wait, so we avoid reporting the wait before we’ve given the archiver a reasonable time to wake up and archive the file we just created earlier in the function. Also add new unconditional message to confirm safe completion. Now a normal, healthy execution does not report waiting at all, just safe completion.

Tune GetSnapshotData() during Hot Standby by avoiding loop through normal backends. Makes code clearer also, since we avoid various Assert()s. Performance of snapshots taken during recovery no longer depends upon number of read-only backends.

On Windows, syslogger runs in two threads. The main thread processes config reload and rotation signals, and a helper thread reads messages from the pipe and writes them to the log file. However, server code isn’t generally thread-safe, so if both try to do e.g palloc()/pfree() at the same time, bad things will happen. To fix that, use a critical section (which is like a mutex) to enforce that only one the threads are active at a time.

So far as I can tell, the only caller that actually fails when a garbage
OID is returned is exec_stmt_case(), which is new in 8.4 --- in all other
cases, we might make a useless trip through casting logic, but we won't
fail since the isnull flag will be set. Hence, backpatch only to 8.4,
just in case there are apps out there that aren't expecting an error to
be thrown if the query returns more or less than one column. (Which seems
unlikely, since the error would be thrown if the query ever did return a
row; but it's possible there's some never-exercised code out there.)
Per report from Mario Splivalo.

Fix a problem introduced by my patch of 2010-01-12 that revised the way relcache reload works. In the patched code, a relcache entry in process of being rebuilt doesn’t get unhooked from the relcache hash table; which means that if a cache flush occurs due to sinval queue overrun while we’re rebuilding it, the entry could get blown away by RelationCacheInvalidate, resulting in crash or misbehavior. Fix by ensuring that an entry being rebuilt has positive refcount, so it won’t be seen as a target for removal if a cache flush occurs. (This will mean that the entry gets rebuilt twice in such a scenario, but that’s okay.) It appears that the problem can only arise within a transaction that has previously reassigned the relfilenode of a pre-existing table, via TRUNCATE or a similar operation. Per bug #5412 from Rusty Conover.

Back-patch to 8.2, same as the patch that introduced the problem.
I think that the failure can't actually occur in 8.2, since it lacks the
rd_newRelfilenodeSubid optimization, but let's make it work like the later
branches anyway.
Patch by Heikki, slightly editorialized on by me.

Update the location of last removed WAL segment in shared memory only after actually removing one, so that if we can’t remove segments because WAL archiving is lagging behind, we don’t unnecessarily forbid streaming the old not-yet-archived segments that are still perfectly valid. Per suggestion from Fujii Masao.

Change the logic to decide when to delete old WAL segments, so that it doesn’t take into account how far the WAL senders are. This way a hung WAL sender doesn’t prevent old WAL segments from being recycled/removed in the primary, ultimately causing the disk to fill up. Instead add standby_keep_segments setting to control how many old WAL segments are kept in the primary. This also makes it more reliable to use streaming replication without WAL archiving, assuming that you set standby_keep_segments high enough.

At present, killing the startup process does not release any locks it holds,
so we must wait to stop the startup and walreceiver processes until all
read-only backends have exited. Without this patch, the startup and
walreceiver processes never exit, so the server gets permanently stuck in
a half-shutdown state.
Fujii Masao, with review, docs, and comment adjustments by me.

Fix to_char YYY, YY, Y format codes so that FM zero-suppression really works, rather than only sort-of working as the previous attempt had left it. Clean up some unnecessary differences between the way these were coded and the way the YYYY case was coded. Update the regression test cases that proved that it wasn’t working.

Allow quotes to be escaped in recovery.conf, by doubling them. This patch also makes the parsing a little bit stricter, rejecting garbage after the parameter value and values with missing ending quotes, for example.

Forbid using pg_xlogfile_name() and pg_xlogfile_name_offset() during recovery. We might want to relax this in the future, but ThisTimeLineID isn’t currently correct in backends during recovery, so the filename returned was wrong.

Arrange to remove pg_default_acl entries completely if their ACL setting is changed to match the hard-wired default. This avoids accumulating useless catalog entries, and also provides a path for dropping the owning role without using DROP OWNED BY. Per yesterday’s complaint from Jaime Casanova, the need to use DROP OWNED BY for that is less than obvious, so providing this alternative method might save some user frustration.

Fix updateAclDependencies() to not assume that ACL role dependencies can only be added during GRANT and can only be removed during REVOKE; and fix its callers to not lie to it about the existing set of dependencies when instantiating a formerly-default ACL. The previous coding accidentally failed to malfunction so long as default ACLs contain only references to the object’s owning role, because that role is ignored by updateAclDependencies. However this is obviously pretty fragile, as well as being an undocumented assumption. The new coding is a few lines longer but IMO much clearer.

\ddp should be recognized as such even if user appends S or + to it. Those options do nothing right now, but might be wanted later, and in any case it’s confusing for the command to be interpreted as \dd if anything is appended. Per Jaime Casanova.

The endterm attribute is mainly useful when the toolchain does not support
automatic link target text generation for a particular situation. In the
past, this was required by the man page tools for all reference page links,
but that is no longer the case, and it now actually gets in the way of
proper automatic link text generation. The only remaining use cases are
currently xrefs to refsects.

Allow for more room in the man page title, so that “CREATE TEXT SEARCH CONFIGURATION” is not truncated.

The previous coding failed in various scenarios possibly including vpath
builds and doing make install without preceding make all.

Move system startup message prior to any calls out of data directory. This allows us to see what mode the server is in before it starts to perform actions that can block or hang. Otherwise server messages may not appear until after messages that say FATAL the database server is starting up.

FATAL errors are meant to stop ecpg immediately, e.g. because the syntax is corrupted. This error, however, does is not a compilation problem but a runtime one, so we can keep compiling but still have to declare ERROR.

Backpatch to 8.2, which is the oldest version supported on Windows. In
8.2 and 8.3 also backpatch the earlier change to use DEVNULL instead of
NULL_DEV #define for a /dev/null-like device. NULL_DEV was hard-coded to
"/dev/null" regardless of platform, which didn't work on Windows, while
DEVNULL works on all platforms. Restarting syslogger didn't work on
Windows on versions 8.3 and below because of that.

Use a file of patterns of filenames to exclude from pgindent runs, instead if using multiple invocations of egrep. Add perl ppport.h to the current list.

The error message now makes explicit reference to the GUC that must be changed
to fix the problem, using wording suggested by Tom Lane. Along the way,
rename the GUC from MaxWalSenders to max_wal_senders for consistency and
grep-ability.

Move sections around to make the high availability chapter more
coherent: the most prominent part is now a "Log-Shipping Standby Servers"
section that describes what a standby server is (like the old
"Warm Standby Servers for High Availability" section), and how to
set up a warm standby server, including streaming replication, using the
built-in standby mode. The pg_standby method is desribed in another
section called "Alternative method for log shipping", with the added
caveat that it doesn't work with streaming replication.

Change recovery.conf.sample to match postgresql.conf by showing only default values, with example comments.

Fix “constraint_exclusion = partition” logic so that it will also attempt constraint exclusion on an inheritance set that is the target of an UPDATE or DELETE query. Per gripe from Marc Cousin. Back-patch to 8.4 where the feature was introduced.

Change the retry-loop in standby mode to also try restoring files from pg_xlog directory. This is essential for replaying WAL records that were streamed from the master, after a standby server restart.

If a corrupt record is seen in a file restored from the archive or
streamed from the master, log it as a WARNING and keep retrying. If the
corruption is permanent, and not just a glitch in the whatever copies the
files to the archive or a network error not caught by CRC checks in TCP
for example, we will keep retrying and logging the WARNING indefinitely.
But that's better than shutting down completely, the standby is still
useful for running read-only queries. In PITR the recovery ends at such a
corrupt record, which is a bit questionable, but that's the behavior we
had in previous releases and we don't feel like chaning it now. It does
make sense for tools like pg_standby.

It is no longer installed by default, but included in "make world"/"make
install-world". Documentation updated accordingly.
Also, fix vpathsearch function to work when calling make install-docs
without previous make docs.

Rework join-removal logic as per recent discussion. In particular this fixes things so that it works for cases where nested removals are possible. The overhead of the optimization should be significantly less, as well.

Derive latestRemovedXid for btree deletes by reading heap pages. The WAL record for btree delete contains a list of tids, even when backup blocks are present. We follow the tids to their heap tuples, taking care to follow LP_REDIRECT tuples. We ignore LP_DEAD tuples on the understanding that they will always have xmin/xmax earlier than any LP_NORMAL tuples referred to by killed index tuples. Iff all tuples are LP_DEAD we return InvalidTransactionId. The heap relfilenode is added to the WAL record, requiring API changes to pass down the heap Relation. XLOG_PAGE_MAGIC updated.

Fix ginint4_queryextract() to actually do what it was intended to do for an unsatisfiable query, such as indexcol && empty_array. It should return -1 to tell GIN no scan is required; but silly typo disabled the logic for that, resulting in unnecessary “GIN indexes do not support whole-index scans” error. Per bug report from Jeff Trout.

Fix an oversight in join-removal optimization: we have to check not only for plain Vars that are generated in the inner rel and used above the join, but also for PlaceHolderVars. Per report from Oleg K.

ECPG only copied #include statements instead of processing them according to commandline option “-i”. This change fixes this and adds a test case. It also honors #include_next, although this is probably never used for embedded SQL.

Further corrections of mismatching struct and btree SizeOf macros. In this case, correction is to remove now unused fields from struct. Since these were unused and full of garbage anyway, no version change.

Clear error_context_stack and debug_query_string at the beginning of proc_exit, so that we won’t try to attach any context printouts to messages that get emitted while exiting. Per report from Dennis Koegel, the context functions won’t necessarily work after we’ve started shutting down the backend, and it seems possible that debug_query_string could be pointing at freed storage as well. The context information doesn’t seem particularly relevant to such messages anyway, so there’s little lost by suppressing it.

Modify error context callback functions to not assume that they can fetch catalog entries via SearchSysCache and related operations. Although, at the time that these callbacks are called by elog.c, we have not officially aborted the current transaction, it still seems rather risky to initiate any new catalog fetches. In all these cases the needed information is readily available in the caller and so it’s just a matter of a bit of extra notation to pass it to the callback.

Per crash report from Dennis Koegel. I've concluded that the real fix for
his problem is to clear the error context stack at entry to proc_exit, but
it still seems like a good idea to make the callbacks a bit less fragile
for other cases.
Backpatch to 8.4. We could go further back, but the patch doesn't apply
cleanly. In the absence of proof that this fixes something and isn't just
paranoia, I'm not going to expend the effort.

Fix oversight in btpo.xact patch; it was in fact installing garbage in the xact field on replay, due to not writing out all the data in the wal log struct.

Clarify docs about database parameter in streaming replication primary_conninfo. Docs were unclear on whether or not database=replication was required, nor did they mention the FATAL error this causes if database parameter is mentioned explicitly, whatever its value.

Add connection messages for streaming replication. log_connections was broken for a replication connection and no messages were displayed on either standby or primary, at any debug level. Connection messages needed to diagnose session drop/reconnect events. Use LOG mode for now, discuss lowering in later releases.

Adjust comment in .history file to match recovery target specified. Comment present since 8.0 was never fully meaningful, since two recovery targets cannot be specified. Refactor recovery target type to make this change and associated code easier to understand. No change in function.

In PLy_spi_execute_plan, use the data-type specific Python-to-PostgreSQL
conversion function instead of passing everything through InputFunctionCall
as a string. The equivalent fix was already done months ago for function
parameters and return values, but this other gateway between Python and
PostgreSQL was apparently forgotten. As a result, data types that need
special treatment, such as bytea, would misbehave when used with
plpy.execute.

Add restartpoint_command option to recovery.conf. Fix bug in %r handling in recovery_end_command, it always came out as 0 because InRedo was cleared before recovery_end_command was executed. Also, always take ControlFileLock when reading checkpoint location for %r.

This variable is apparently only for Python internally. In newer releases
of Python this variable pulls in more and more libraries that users are
less likely to have, leading to potential build failures.

Pass incompletely-transformed aggregate argument lists as separate parameters to transformAggregateCall, instead of abusing fields in Aggref to carry them temporarily. No change in functionality but hopefully the code is a bit clearer now. Per gripe from Gokulakannan Somasundaram.

Add some logging code for unexpected cases in pgstat.c, particularly being unable to read a stats file for reasons other than ENOENT, and having to reset last_statrequest because it’s later than current time in the collector. Not clear if this will shed any light on the “pgstat wait timeout” business, but it seems like a good idea in general.

Sync timezone code with tzcode 2010c from the Olson group. This fixes some corner cases that come up in certain timezones (apparently, only those with lots and lots of distinct TZ transition rules, as far as I can gather from a quick scan of their archives). Per suggestion from Jeevan Chalke.

Simplify a couple of pg_dump and psql \d queries about index constraints by joining to pg_constraint.conindid, instead of the former technique of joining indirectly through pg_depend. This is much more straightforward and probably faster as well. I had originally desisted from changing these queries when conindid was added because I was worried about losing performance, but if we join on conrelid as well as conindid then the index on conrelid can be used when pg_constraint is large.

ecpg now adds a unique counter to its varchar struct definitions to make these definitions unique, too. It used to use the linenumber but in the rare case of two definitions in one line this was not unique.

Return proper exit code (3) from psql when ON_ERROR_STOP=on and –single-transaction are both used and the failure happens in commit, e.g. failed deferred trigger. Also properly free BEGIN/COMMIT result structures from –single-transaction.

Fix warning messages in restrict_and_check_grant() to include the column name when warning about column-level privileges. This is more useful than before and makes the apparent duplication complained of by Piyush Newe not so duplicate. Also fix lack of quote marks in a related message text.

When reading pg_hba.conf and similar files, do not treat @file as an inclusion unless (1) the @ isn’t quoted and (2) the filename isn’t empty. This guards against unexpectedly treating usernames or other strings in “flat files” as inclusion requests, as seen in a recent trouble report from Ed L. The empty-filename case would be guaranteed to misbehave anyway, because our subsequent path-munging behavior results in trying to read the directory containing the current input file.

I think this might finally explain the report at
http://archives.postgresql.org/pgsql-bugs/2004-05/msg00132.php
of a crash after printing "authentication file token too long, skipping",
since I was able to duplicate that message (though not a crash) on a
platform where stdio doesn't refuse to read directories. We never got
far in investigating that problem, but now I'm suspicious that the trigger
condition was an @ in the flat password file.
Back-patch to all active branches since the problem can be demonstrated in all
branches except HEAD. The test case, creating a user named "@", doesn't cause
a problem in HEAD since we got rid of the flat password file. Nonetheless it
seems like a good idea to not consider quoted @ as a file inclusion spec,
so I changed HEAD too.

In case the connection magically disappears libecpg only returns an internal error sqlstate. This change makes it return a correct value..

Fix a couple of places that would loop forever if attempts to read a stdio file set ferror() but never set feof(). This is known to be the case for recent glibc when trying to read a directory as a file, and might be true for other platforms/cases too. Per report from Ed L. (There is more that we ought to do about his report, but this is one easily identifiable issue.)

Make contrib/xml2 use core xml.c’s error handler, when available (that is, in versions >= 8.3). The core code is more robust and efficient than what was there before, and this also reduces risks involved in swapping different libxml error handler settings.

Before 8.3, there is still some risk of problems if add-on modules such as
Perl invoke libxml without setting their own error handler. Given the lack
of reports I'm not sure there's a risk in practice, so I didn't take the
step of actually duplicating the core code into older contrib/xml2 branches.
Instead I just tweaked the existing code to ensure it didn't leave a dangling
pointer to short-lived memory when throwing an error.

Export xml.c’s libxml-error-handling support so that contrib/xml2 can use it too, instead of duplicating the functionality (badly).

I renamed xml_init to pg_xml_init, because the former seemed just a bit too
generic to be safe as a global symbol. I considered likewise renaming
xml_ereport to pg_xml_ereport, but felt that the reference to ereport probably
made it sufficiently PG-centric already.

Instead of trying (and failing) to allow <> at the end of a DECLARE section, throw an error message saying explicitly that the label must go before DECLARE. Per investigation of a recent pgsql-novice question, this code did not work as intended in any modern PG version, maybe not ever. Allowing such a thing would only create ambiguity anyway, so it seems better to remove it than fix it.

Cause plpgsql to throw an error if “INTO rowtype_var” is followed by a comma. Per bug #5352, this helps to provide a useful error message if the user tries to do something presently unsupported, namely use a rowtype variable as a member of a multiple-item INTO list.

Fix numericlocale psql option when used with a null string and latex and troff formats; a null string must not be formatted as a numeric. The more exotic formats latex and troff also incorrectly formatted all strings as numerics when numericlocale was on.

This involves modifying the module to have a stable ABI, that is, the
xslt_process() function still exists even without libxslt. It throws a
runtime error if called, but doesn't prevent executing the CREATE FUNCTION
call. This is a good thing anyway to simplify cross-version upgrades.

It’s clearly now pointless to do backwards compatible parsing of this, since we released a version without it, so remove the comment that says we might want to do that.

These are unnecessary and probably dangerous. I don't see any immediate
risk situations in the core XML support or contrib/xml2 itself, but there
could be issues with external uses of libxml2, and in any case it's an
accident waiting to happen.

add EPERM to the list of return codes to expect from opening directories based on Vista results

Get rid of the code that attempted to funnel libxml2's memory allocations
into palloc. We already knew from experience with the core xml datatype
that trying to do this is simply not reliable. Unlike the core code, I
did not bother adding a lot of PG_TRY/PG_CATCH logic to try to ensure that
everything is cleaned up on error exit. Hence, we might leak some memory
if one of these functions fails partway through. Given the deprecated
status of this contrib module and the fact that errors partway through
the functions shouldn't be too common, it doesn't seem worth worrying about.
Also fix a separate bug in xpath_table, that it did the wrong things
if given a result tuple descriptor with less than 2 columns. While
such a case isn't very useful in practice, we shouldn't fail or stomp
memory when it occurs.
Add some simple regression tests based on all the reported crash cases
that I have on hand.
This should be back-patched, but let's see if the buildfarm likes it first.

Second try at fsyncing directories in CREATE DATABASE. Let’s see what the build farm says of opening directories read-only and ignoring EBADF from fsync of directories

We had originally made the stronger assumption that NOT A refutes any B
if B implies A, but this fails in three-valued logic, because we need to
prove B is false not just that it's not true. However the logic does
go through if B is equal to A.
Recognizing this limited case is enough to handle examples that arise when
we have simplified "bool_var = true" or "bool_var = false" to just "bool_var"
or "NOT bool_var". If we had not done that simplification then the
btree-operator proof logic would have been able to prove that the expressions
were contradictory, but only for identical expressions being compared to the
constants; so handling identical A and B covers all the same cases.
The motivation for doing this is to avoid unexpected asymmetrical behavior
when a partitioned table uses a boolean partitioning column, as in today's
gripe from Dominik Sander.
Back-patch to 8.2, which is as far back as predicate_refuted_by attempts to
do anything at all with NOTs.

Add configuration parameter ssl_renegotiation_limit to control how often we do SSL session key renegotiation. Can be set to 0 to disable renegotiation completely, which is required if a broken SSL library is used (broken patches to CVE-2009-3555 a known cause) or when using a client library that can’t do renegotiation.

The main motivation for changing this is bug #4921, in which it's pointed out
that it's no longer safe to apply ltree operations to the result of
ARRAY(SELECT ...) if the sub-select might return no rows. Before 8.3,
the ARRAY() construct would return NULL, which might or might not be helpful
but at least it wouldn't result in an error. Now it returns an empty array
which results in a failure for no good reason, since the ltree operations
are all perfectly capable of dealing with zero-element arrays.
As far as I can find, these ltree functions are the only places where zero
array dimensionality is rejected unnecessarily.
Back-patch to 8.3 to prevent behavioral regression of queries that worked
in older releases.

Make pg_regress use CREATE OR REPLACE LANGUAGE, so that –load-language will work whether or not the specified language is preinstalled. This responds to some complaints about having to change test scripts because plpgsql is preinstalled as of 9.0.

This operates in the same way as other CREATE OR REPLACE commands, ie,
it replaces everything but the ownership and ACL lists of an existing
entry, and requires the caller to have owner privileges for that entry.
While modifying an existing language has some use in development scenarios,
in typical usage all the "replaced" values come from pg_pltemplate so there
will be no actual change in the language definition. The reason for adding
this is mainly to allow programs to ensure that a language exists without
triggering an error if it already does exist.
This commit just adds and documents the new option. A followon patch
will use it to clean up some unpleasant cases in pg_dump and pg_regress.

Minor style policing for error messages in pg_dump tar code. Notably, change “dumping data out of order is not supported” to “restoring data out of order is not supported”, because you get that error during pg_restore not pg_dump. Also fix some comments that didn’t look so good after being pgindented as perhaps they did originally.

Let’s try forcing errno to zero before issuing fsync. The current buildfarm results claiming EBADF seem improbable enough that I’m not convinced fsync is really returning that — could it be failing to set errno at all?

Adjust pg_fsync_writethrough so that it will set errno when failing on a platform that doesn’t support this operation. The former coding would allow an unrelated errno to be reported, which would be quite misleading. Not sure if this has anything to do with the current buildfarm failures, but it’s certainly bogus as-is.

Add some checks that seem logically necessary, in particular let's make
real sure that HS slave sessions cannot create temp tables. (If they did
they would think that temp tables belonging to the master's session with
the same BackendId were theirs. We *must* not allow myTempNamespace to
become set in a slave session.)
Change setval() and nextval() so that they are only allowed on temp sequences
in a read-only transaction. This seems consistent with what we allow for
table modifications in read-only transactions. Since an HS slave can't have a
temp sequence, this also provides a nicer cure for the setval PANIC reported
by Erik Rijkers.
Make the error messages more uniform, and have them mention the specific
command being complained of. This seems worth the trifling amount of extra
code, since people are likely to see such messages a lot more than before.

Make ‘include_realm’ ordering consistent in the docs, to match recent doc change.

Reduce the rescan cost estimate for Materialize nodes to cpu_operator_cost per tuple, instead of the former cpu_tuple_cost. It is sane to charge less than cpu_tuple_cost because Materialize never does any qual-checking or projection, so it’s got less overhead than most plan node types. In particular, we want to have the same charge here as is charged for readout in cost_sort. That avoids the problem recently exhibited by Teodor wherein the planner prefers a useless sort over a materialize step in a context where a lot of rescanning will happen. The rescan costs should be just about the same for both node types, so make their estimates the same.

Not back-patching because all of the current logic for rescan cost estimates
is new in 9.0. The old handling of rescans is sufficiently not-sane that
changing this in that structure is a bit pointless, and might indeed cause
regressions.

- The message "server stopped" should be affected by the -s option, just
like "server started" already was.
- The message "could not start server" should consistently go to stderr.

Don’t use O_DIRECT when writing WAL files if archiving or streaming is enabled. Bypassing the kernel cache is counter-productive in that case, because the archiver/walsender process will read from the WAL file soon after it’s written, and if it’s not cached the read will cause a physical read, eating I/O bandwidth available on the WAL drive.

Fix STOP WAL LOCATION in backup history files no to return the next segment of XLOG_BACKUP_END record even if the the record is placed at a segment boundary. Furthermore the previous implementation could return nonexistent segment file name when the boundary is in segments that has “FE” suffix; We never use segments with “FF” suffix.

Volatile-ize all five places where we expect a PG_TRY block to restore old memory context in plpython. Before only one of them was marked volatile, but per report from Zdenek Kotala, some compilers do the wrong thing here.

Provide some rather hokey ways for EXPLAIN to print FieldStore and assignment ArrayRef expressions that are not in the immediate context of an INSERT or UPDATE targetlist. Such cases never arise in stored rules, so ruleutils.c hadn’t tried to handle them. However, they do occur in the targetlists of plans derived from such statements, and now that EXPLAIN VERBOSE tries to print targetlists, we need some way to deal with the case.

I chose to represent an assignment ArrayRef as "array[subscripts] := source",
which is fairly reasonable and doesn't omit any information. However,
FieldStore is problematic because the planner will fold multiple assignments
to fields of the same composite column into one FieldStore, resulting in a
structure that is hard to understand at all, let alone display comprehensibly.
So in that case I punted and just made it print the source expression(s).
Backpatch to 8.4 --- the lack of functionality exists in older releases,
but doesn't seem to be important for lack of anything that would call it.

Fix ExecEvalArrayRef to pass down the old value of the array element or slice being assigned to, in case the expression to be assigned is a FieldStore that would need to modify that value. The need for this was foreseen some time ago, but not implemented then because we did not have arrays of composites. Now we do, but the point evidently got overlooked in that patch. Net result is that updating a field of an array element doesn’t work right, as illustrated if you try the new regression test on an unpatched backend. Noted while experimenting with EXPLAIN VERBOSE, which has also got some issues in this area.

Force READY portals into FAILED state when a transaction or subtransaction is aborted, if they were created within the failed xact. This prevents ExecutorEnd from being run on them, which is a good idea because they may contain references to tables or other objects that no longer exist. In particular this is hazardous when auto_explain is active, but it’s really rather surprising that nobody has seen an issue with this before. I’m back-patching this to 8.4, since that’s the first version that contains auto_explain or an ExecutorEnd hook, but I wonder whether we shouldn’t back-patch further.

Fix up pg_dump’s treatment of large object ownership and ACLs. We now emit a separate archive entry for each BLOB, and use pg_dump’s standard methods for dealing with its ownership, ACL if any, and comment if any. This means that switches like –no-owner and –no-privileges do what they’re supposed to. Preliminary testing says that performance is still reasonable even with many blobs, though we’ll have to see how that shakes out in the field.

When updating ShmemVariableCache from a checkpoint record, be sure to set all the values derived from oldestXid, not just that field. Brain fade in one of my patches associated with flat file removal, exposed by a report from Fujii Masao.

Make NOTIFY_PAYLOAD_MAX_LENGTH depend explicitly on BLCKSZ and NAMEDATALEN, so this code doesn’t go nuts with smaller than default BLCKSZ or larger than default NAMEDATALEN. The standard value is still exactly 8000.

This implementation should be significantly more efficient than the old one,
and is also more compatible with Hot Standby usage. There is not yet any
facility for HS slaves to receive notifications generated on the master,
although such a thing is possible in future.
Joachim Wieland, reviewed by Jeff Davis; also hacked on by me.