Currently all schemas are owned by the super-user because they are copied from the template1 database. However, since all objects are inherited from the template database, it is not clear that changing schema ownership to the database owner would be correct.

Allow log_min_messages to be specified on a per-module basis

This would allow administrators to see more detailed information from specific sections of the backend, e.g. checkpoints, autovacuum, etc. Another idea is to allow separate configuration files for each module, or allow arbitrary SET commands to be passed to them. See also Logging Brainstorm.
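
One possible per-module syntax, purely hypothetical and shown only as a sketch, might look like this in postgresql.conf:

log_min_messages = warning             # global default
log_min_messages.checkpoints = debug1  # hypothetical per-module override
log_min_messages.autovacuum = debug2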

Simplify ability to create partitioned tables

This would allow creation of partitioned tables without requiring creation of triggers or rules for INSERT/UPDATE/DELETE, and constraints for rapid partition selection. Options could include range and hash partition selection. See also Table partitioning.
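
For reference, a minimal sketch of the trigger-based setup this item would make unnecessary, loosely following the documentation's partitioning example (all names illustrative):

CREATE TABLE measurement (logdate date, value int);
CREATE TABLE measurement_y2008 (
    CHECK (logdate >= DATE '2008-01-01' AND logdate < DATE '2009-01-01')
) INHERITS (measurement);

CREATE FUNCTION measurement_insert() RETURNS trigger AS $$
BEGIN
    IF NEW.logdate >= DATE '2008-01-01' AND NEW.logdate < DATE '2009-01-01' THEN
        INSERT INTO measurement_y2008 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for date %', NEW.logdate;
    END IF;
    RETURN NULL;  -- suppress the insert into the parent table
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER measurement_insert_trig
    BEFORE INSERT ON measurement
    FOR EACH ROW EXECUTE PROCEDURE measurement_insert();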

Currently ALTER USER and ALTER DATABASE support per-user and per-database defaults. Consider adding per-user-and-database defaults so things like search_path can be defaulted for a specific user connecting to a specific database.
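
A sketch of the existing per-user and per-database commands, plus one possible combined form (the IN DATABASE syntax is hypothetical):

ALTER USER alice SET search_path = accounting;      -- per-user default
ALTER DATABASE sales SET search_path = reporting;   -- per-database default
ALTER USER alice IN DATABASE sales SET search_path = accounting;  -- hypothetical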

Configuration files

Allow pg_hba.conf to specify host names along with IP addresses

Host name lookup could occur when the postmaster reads the pg_hba.conf file, or when the backend starts. Another solution would be to do a reverse lookup of the connection IP address and check the resulting host name against the host names in pg_hba.conf; we could then also verify that the host name maps back to the connection's IP address.
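
For illustration, pg_hba.conf entries currently take IP addresses or CIDR masks; the second line shows the hypothetical host-name form this item proposes:

host    all    all    192.168.1.0/24        md5
host    all    all    client.example.com    md5    # hypothetical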

Tablespaces

Allow a database in tablespace t1 with tables created in tablespace t2 to be used as a template for a new database created with default tablespace t2

Currently all objects in the default database tablespace must have default tablespace specifications. This is because new databases are created by copying directories. If default-tablespace tables and tablespace-specified tables were mixed in the same directory, the copy would produce a new database whose tables carry explicit tablespace specifications that are incorrect for it. Fixing this would require modifying pg_class in the newly copied database, which we don't currently do.

Allow reporting of which objects are in which tablespaces

This item is difficult because a tablespace can contain objects from multiple databases. There is a server-side function that returns the databases which use a specific tablespace, so this requires a tool that will call that function and connect to each database to find the objects in each database for that tablespace.

Allow WAL replay of CREATE TABLESPACE to work when the directory structure on the recovery computer is different from the original

Currently subtracting one date from another across a daylight-saving time adjustment can return '1 day 1 hour', but adding that interval back to the first date yields a time one hour in the future. This happens because '25 hours' is adjusted to '1 day 1 hour', and '1 day' means the same local time on the next day, even when a daylight-saving adjustment intervenes.
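
A sketch of the anomaly, using the US Eastern fall-back transition of 2006-10-29 (results shown as comments):

SET timezone = 'America/New_York';
SELECT timestamptz '2006-10-29 12:00' - timestamptz '2006-10-28 12:00';
-- => '1 day 01:00:00'  (25 elapsed hours, adjusted to '1 day 1 hour')
SELECT timestamptz '2006-10-28 12:00' + interval '1 day 1 hour';
-- => '2006-10-29 13:00:00-05', one hour past the original endpoint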

Inline ORDER BY for XMLAGG. Example: "... XMLAGG(XMLELEMENT(...) ORDER BY col1) ..." (should be made to work with all aggregate functions)

XMLNAMESPACES support in XMLELEMENT and elsewhere

XSLT support; already available in contrib/xml2, but it needs API fixes and adaptation to the xml type.

XML Canonical: Convert XML documents to canonical form to compare them. libxml2 has support for this.

Pretty-printing XML: Parse a document and serialize it back in some indented form. libxml2 might support this.

XMLQUERY (from SQL/XML standard)

In some cases shredding could be a better option (if there is no need to keep the XML documents intact, if we have already developed tools that understand only relational data, etc.) -- this would be a separate module implementing an annotated schema decomposition technique, similar to DB2 and SQL Server functionality.

Allow to_char() on interval values to accumulate the highest unit requested

Some special format flag would be required to request such accumulation, as the examples below illustrate. Such functionality could also be added to EXTRACT. Accumulation that crosses the month/day boundary must be prevented because of the uneven number of days in a month.

to_char(INTERVAL '1 hour 5 minutes', 'MI') => 65

to_char(INTERVAL '43 hours 20 minutes', 'MI' ) => 2600

to_char(INTERVAL '43 hours 20 minutes', 'WK:DD:HR:MI') => 0:1:19:20

to_char(INTERVAL '3 years 5 months','MM') => 41

Allow SQL-language functions to reference parameters by parameter name

Currently SQL-language functions can refer to their parameters only by number, e.g. $1
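
A sketch of the current positional style next to the proposed named style (the named form is hypothetical here):

CREATE FUNCTION add_tax(amount numeric) RETURNS numeric AS
    'SELECT $1 * 1.08' LANGUAGE sql;      -- works today
CREATE FUNCTION add_tax2(amount numeric) RETURNS numeric AS
    'SELECT amount * 1.08' LANGUAGE sql;  -- proposed: reference by name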

Add SPI_gettypmod() to return the typmod for a TupleDesc

Enforce typmod for function inputs, function results and parameters for spi_prepare'd statements called from PLs

Right now materialized views require the user to create triggers on the main table to keep the summary table current. SQL syntax should be able to manage the triggers and summary table automatically. A more sophisticated implementation would automatically retrieve from the summary table when the main table is referenced, if possible. See Materialized Views for implementation details.
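
A minimal sketch of the hand-maintained setup this item would automate; it covers only INSERT (UPDATE and DELETE need similar triggers, and concurrent inserts of a new region would need extra locking), and all names are illustrative:

CREATE TABLE sales (region text, amount numeric);
CREATE TABLE sales_summary (region text PRIMARY KEY, total numeric);

CREATE FUNCTION sales_summary_ins() RETURNS trigger AS $$
BEGIN
    UPDATE sales_summary SET total = total + NEW.amount
        WHERE region = NEW.region;
    IF NOT FOUND THEN
        INSERT INTO sales_summary VALUES (NEW.region, NEW.amount);
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sales_summary_trig AFTER INSERT ON sales
    FOR EACH ROW EXECUTE PROCEDURE sales_summary_ins();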

Currently anonymous (un-named) queries prepared via the wire protocol are replanned every time bind parameters are supplied --- allow SQL PREPARE to do the same. Also, allow control over replanning prepared queries either manually or automatically when statistics for execute parameters differ dramatically from those used during planning.
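
For illustration (table name hypothetical), a statement prepared via SQL PREPARE is planned once, before any parameter values are known:

PREPARE q(int) AS SELECT * FROM orders WHERE customer_id = $1;
EXECUTE q(42);  -- reuses the plan chosen at PREPARE time, whatever the value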

When this is done, backslash-quote should be prohibited in non-E'' strings because of possible confusion over how such strings treat backslashes. Basically, '' is always safe for a literal single quote, while \' might or might not be based on the backslash handling rules.
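
For example:

SELECT 'don''t';   -- always a literal single quote
SELECT E'don\'t';  -- explicit escape-string syntax, unambiguous
SELECT 'don\'t';   -- meaning depends on the backslash rules; the form to prohibit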

Simplify dropping roles that have objects in several databases

Allow the count returned by SELECT, etc., to be represented as an int64 to allow a higher range of values

Lateral subqueries can reference columns of tables defined outside the subquery at the same level, i.e. laterally. For example, a LATERAL subquery in a FROM clause could reference tables defined earlier in the same FROM clause. Currently a subquery can reference only columns of tables defined in outer query levels.
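
A sketch of the intended usage (syntax and names illustrative):

SELECT c.name, o.total
FROM customers c,
     LATERAL (SELECT sum(amount) AS total
              FROM orders
              WHERE orders.customer_id = c.id) o;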

Currently NOT NULL constraints are stored in pg_attribute without any record of their origin, e.g. a primary key. One visible problem is that dropping a PRIMARY KEY constraint does not remove the column's NOT NULL designation, as shown below. Another issue is that we should probably force NOT NULL to be propagated from parent tables to children, just as CHECK constraints are. (But then does dropping PRIMARY KEY affect children?)
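
The first problem can be demonstrated directly (the catalog query shows the leftover marking):

CREATE TABLE t (id int PRIMARY KEY);
ALTER TABLE t DROP CONSTRAINT t_pkey;
SELECT attnotnull FROM pg_attribute
    WHERE attrelid = 't'::regclass AND attname = 'id';
-- => t   (the column is still marked NOT NULL)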

CLUSTER

Automatically maintain clustering on a table

This might require some background daemon to maintain clustering during periods of low usage. It might also require tables to be only partially filled for easier reorganization. Another idea would be to create a merged heap/index data file so an index lookup would automatically access the heap data too. A third idea would be to store heap rows in hashed groups, perhaps using a user-supplied hash function.

On crash recovery, the table involved in the COPY would be removed, or have its heap and index files truncated. One issue is that no other backend should be able to add to the table at the same time, which is something that is currently allowed. This requirement is currently met when the table is created inside the same transaction block as the COPY, because no other backend can see the table.

Integrity Constraints

Keys

[D] Allow DEFERRABLE UNIQUE constraints

This would allow UPDATE tab SET col = col + 1 to work if col has a unique index. Currently, uniqueness checks are done while the command is being executed, rather than at the end of the statement or transaction.
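
A sketch of the failure (whether it errors depends on the order in which rows are visited):

CREATE TABLE tab (col int UNIQUE);
INSERT INTO tab VALUES (1), (2);
UPDATE tab SET col = col + 1;
-- can fail with a duplicate-key error when 1 becomes 2 before 2 becomes 3,
-- even though the final state would be unique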

You have to drop and re-create the function(s) each time their arguments are modified (not nice), or not cache the input and output functions (slower?), or check whether the structure of the argument has been altered (is this possible, easy, quick?) and re-create the cache.

Clients

pg_ctl

Allow pg_ctl to work properly with configuration files located outside the PGDATA directory

pg_ctl cannot read the PID file because it is located in the PGDATA directory rather than the configuration directory. The solution is to allow pg_ctl to read and understand postgresql.conf so it can find the data_directory value.
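
A sketch of the intended behavior, assuming the configuration lives in a hypothetical /etc/postgresql directory:

# in /etc/postgresql/postgresql.conf:
data_directory = '/var/lib/postgresql/data'

$ pg_ctl -D /etc/postgresql status   # should locate postmaster.pid via data_directory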

Modify pg_ctl behavior and exit codes to make it easier to write an LSB conforming init script

It may be desirable to condition some of the changes on a command-line switch, to avoid breaking existing scripts. A Linux shell (sh) script is referenced which has been tested and seems to provide a high degree of conformance in multiple environments. Study of this script might suggest areas where pg_ctl could be modified to make writing an LSB conforming script easier; however, some aspects of that script would be unnecessary with other suggested changes to pg_ctl, and discussion on the lists did not reach consensus on support for all aspects of this script. Further discussion of particular changes is needed before beginning any work.

Add auto-expanded mode so expanded output is used if the row length is wider than the screen width.

Consider using auto-expanded mode for backslash commands like \df+.

Prevent tab completion of SET TRANSACTION from querying the database, since that query prevents the transaction isolation level from being set.

Currently SET <tab> causes a database lookup to check all supported session variables. This query causes problems because setting the transaction isolation level must be the first statement of a transaction.

Add a \set variable to control whether \s displays line numbers

Another option is to add \# which lists line numbers, and allows command execution.

Using native locale names puts roadblocks in the way of porting a dump to another platform. One possible solution is to get CREATE DATABASE to accept some agreed-on set of locale names and fix them up to meet the platform's requirements.

In a selective dump, allow dumping of an object and all its dependencies

Add options like pg_restore -l and -L to pg_dump

Add support for multiple pg_restore -t options, like pg_dump

pg_restore's -t switch is less useful than pg_dump's in quite a few ways: no multiple switches, no pattern matching, no ability to pick up indexes and other dependent items for a selected table. It should be made to handle this switch just like pg_dump does.

Stop dumping CASCADE on DROP TYPE commands in clean mode

Allow pg_dump --clean to drop roles that own objects or have privileges

tgl says: if this is about pg_dumpall, it's done as of 8.4. If it's really about pg_dump, what does it mean? pg_dump has no business dropping roles.

Change pg_dump so that a comment on the dumped database is applied to the loaded database, even if the database has a different name. This will require new backend syntax, perhaps COMMENT ON CURRENT DATABASE.

Allow pg_restore to load different parts of the COPY data for a single table simultaneously

Remove support for dumping from pre-7.3 servers

In 7.3 and later, we can get accurate dependency information from the server. pg_dump still contains a lot of crufty code to deal with the lack of dependency information in older servers, but the value of maintaining that code keeps shrinking.

Allow pre/data/post files when schema and data are dumped separately, for performance reasons

Memory leaks in a short-running application like ecpg are not really a problem, but they make debugging more complicated

libpq

Add PQescapeIdentifierConn()

Prevent PQfnumber() from lowercasing unquoted column names

PQfnumber() should never have been doing lowercasing, but historically it has, so we need a way to prevent it

Allow statement results to be automatically batched to the client

Currently all statement results are transferred to the libpq client before libpq makes the results available to the application. This feature would allow the application to make use of the first result rows while the rest are transferred, or held on the server waiting for them to be requested by libpq. One complexity is that a statement like SELECT 1/col could error out mid-way through the result set.

Triggers

Improve storage of deferred trigger queue

Right now all deferred trigger information is stored in backend memory, which a very large trigger queue could exhaust. This item involves spilling large queues to files, or processing the triggers with some kind of join, a bulk operation, or a bitmap.

This is currently possible by starting a multi-statement transaction, modifying the system tables, performing the desired SQL, restoring the system tables, and committing the transaction. ALTER TABLE ... TRIGGER requires a table lock so it is not ideal for this usage.

System tables are modified in many places in the backend without going through the executor and therefore not causing triggers to fire. To complete this item, the functions that modify system tables will have to fire triggers.

The main difficulty with this item is the problem of creating an index that can span multiple tables.

Determine whether ALTER TABLE / SET SCHEMA should work on inheritance hierarchies (and thus support ONLY). If yes, implement it.

ALTER TABLE variants sometimes support recursion and sometimes do not, but this is poorly documented or not documented at all, and in the unsupported cases an ONLY marker is silently ignored. Clarify the documentation, and reject ONLY where it is not supported.

Indexes

Add UNIQUE capability to non-btree indexes

Prevent index uniqueness checks when UPDATE does not modify the column

Uniqueness (index) checks are done when updating a column even if the column is not modified by the UPDATE. However, HOT already short-circuits this in common cases, so more work might not be helpful.

Allow the creation of on-disk bitmap indexes which can be quickly combined with other bitmap indexes

Such indexes could be more compact if there are only a few distinct values, and they can also be compressed. Keeping them updated, however, can be costly.

This is difficult because you must upgrade to an exclusive table lock to replace the existing index file. CREATE INDEX CONCURRENTLY does not have this complication. This would allow index compaction without downtime.

Cache Usage

Speed up COUNT(*)

We could use a fixed row count and a +/- count to follow MVCC visibility rules, or a single cached value could be used and invalidated if anyone modifies the table. Another idea is to get a count directly from a unique index, but for this to be faster than a sequential scan it must avoid access to the heap to obtain tuple visibility information.

Provide a way to calculate an "estimated COUNT(*)"

Perhaps by using the optimizer's cardinality estimates or random sampling.
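
For example, one existing low-cost approximation reads the planner's row estimate, which VACUUM and ANALYZE keep roughly current ('mytable' is illustrative):

SELECT reltuples::bigint AS estimated_rows
    FROM pg_class WHERE relname = 'mytable';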

Currently indexes do not have enough tuple visibility information to allow data to be pulled from the index without also accessing the heap. One way to allow this is to set a bit on index tuples to indicate if a tuple is currently visible to all transactions when the first valid heap lookup happens. This bit would have to be cleared when a heap tuple is expired.
Another idea is to maintain a bitmap of heap pages where all rows are visible to all backends, and allow index lookups to reference that bitmap to avoid heap lookups, perhaps the same bitmap we might add someday to determine which heap pages need vacuuming. Frequently accessed bitmaps would have to be stored in shared memory. One 8k page of bitmaps could track 512MB of heap pages.
A third idea would be for a heap scan to check if all rows are visible and if so set a per-table flag which can be checked by index scans. Any change to the table would have to clear the flag. To detect changes during the heap scan a counter could be set at the start and checked at the end --- if it is the same, the table has not been modified --- any table change would increment the counter.

Startup Time Improvements

Experiment with multi-threaded backend for backend creation

This would prevent the overhead associated with process creation. Most operating systems have trivial process creation time compared to database startup overhead, but a few operating systems (Win32, Solaris) might benefit from threading. Also explore the idea of a single session using multiple threads to execute a statement faster.

Write-Ahead Log

Eliminate need to write full pages to WAL before page modification

Currently, to protect against partial disk page writes, we write full page images to WAL before they are modified so we can correct any partial page writes during recovery. These pages can also be eliminated from point-in-time archive files.

When full page writes are off, write CRC to WAL and check file system blocks on recovery

If CRC check fails during recovery, remember the page in case a later CRC for that page properly matches.

Write full pages during file system write and not when the page is modified in the buffer cache

This would allow most full-page writes to happen in the background writer. It might cause problems when WAL is applied on recovery to a partially-written page, but a later full-page image from WAL would replace that page.

[D] Allow WAL traffic to be streamed to another server for stand-by replication

Reduce WAL traffic so only modified values are written rather than entire rows

Allow WAL logging to be turned off for a table, but the table might be dropped or truncated during crash recovery

Allow tables to bypass WAL writes and just fsync() dirty pages on commit. This should be implemented using ALTER TABLE, e.g. ALTER TABLE PERSISTENCE [ DROP | TRUNCATE | DEFAULT ]. Tables using non-default logging should not use referential integrity with default-logging tables. A table without dirty buffers during a crash could perhaps avoid the drop/truncate.

Allow WAL logging to be turned off for a table, but the table would avoid being truncated/dropped

To do this, only a single writer can modify the table, and writes must happen only on new pages so the new pages can be removed during crash recovery. Readers can continue accessing the table. Such tables probably cannot have indexes. One complexity is the handling of indexes on TOAST tables.

Experiment with multi-threaded backend for better I/O utilization

This would allow a single query to make use of multiple I/O channels simultaneously. One idea is to create a background reader that can pre-fetch sequential and index scan pages needed by other backends. This could be expanded to allow concurrent reads from multiple devices in a partitioned table.

Experiment with multi-threaded backend for better CPU utilization

This would allow several CPUs to be used for a single query, such as for sorting or query execution.

Miscellaneous Performance

This would remove the requirement for SYSV SHM but would introduce portability issues. Anonymous mmap (or mmap to /dev/zero) is required to prevent I/O overhead.

Consider mmap()'ing files into a backend?

Doing I/O to large tables would consume a lot of address space or require frequent mapping/unmapping. Extending the file also causes mapping problems that might require mapping only individual pages, leading to thousands of mappings. Another problem is that there is no way to _prevent_ I/O to disk from the dirty shared buffers so changes could hit disk before WAL is written.

Add a script to ask system configuration questions and tune postgresql.conf

Consider ways of storing rows more compactly on disk:

Reduce the row header size?

Consider reducing on-disk varlena length from four bytes to two because a heap row cannot be more than 64k in length

Exotic Features

Add a pre-parsing phase that converts non-ISO syntax to supported syntax

This could allow SQL written for other databases to run without modification.

Allow plug-in modules to emulate features from other databases

Add features of Oracle-style packages

A package would be a schema with session-local variables, public/private functions, and initialization functions. It is also possible to implement these capabilities in any schema and not use a separate "packages" syntax at all.
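
A sketch of the schema-based emulation (names illustrative; the session-local "package variable" uses a custom GUC, which currently requires listing the class in custom_variable_classes):

CREATE SCHEMA payroll;
CREATE FUNCTION payroll.net_pay(numeric) RETURNS numeric AS
    'SELECT $1 * 0.75' LANGUAGE sql;
SELECT payroll.net_pay(1000);

-- assuming custom_variable_classes = 'payroll' in postgresql.conf:
SET payroll.tax_rate = '0.25';
SHOW payroll.tax_rate;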