PostgreSQL's statistics collector is a subsystem that supports
collection and reporting of information about server activity.
Presently, the collector can count accesses to tables and indexes
in both disk-block and individual-row terms. It also supports
determining the exact command currently being executed by other
server processes.

Since collection of statistics adds some overhead to query
execution, the system can be configured to collect or not
collect information. This is controlled by configuration
parameters that are normally set in postgresql.conf. (See Chapter 17 for details about setting
configuration parameters.)

The parameter stats_start_collector
must be set to true for the statistics
collector to be launched at all. This is the default and
recommended setting, but it may be turned off if you have no
interest in statistics and want to squeeze out every last drop
of overhead. (The savings is likely to be small, however.) Note
that this option cannot be changed while the server is
running.

The parameters stats_command_string,
stats_block_level,
and stats_row_level
control how much information is actually sent to the collector
and thus determine how much run-time overhead occurs. These
respectively determine whether a server process sends its
current command string, disk-block-level access statistics, and
row-level access statistics to the collector. Normally these
parameters are set in postgresql.conf
so that they apply to all server processes, but it is possible
to turn them on or off in individual sessions using the
SET command. (To prevent
ordinary users from hiding their activity from the
administrator, only superusers are allowed to change these
parameters with SET.)

Note: Since the parameters stats_command_string, stats_block_level, and stats_row_level default to false, very few statistics are collected in
the default configuration. Enabling one or more of these
configuration variables will significantly enhance the
amount of useful data produced by the statistics collector,
at the expense of additional run-time overhead.

Several predefined views, listed in Table
24-1, are available to show the results of statistics
collection. Alternatively, one can build custom views using the
underlying statistics functions.

When using the statistics to monitor current activity, it is
important to realize that the information does not update
instantaneously. Each individual server process transmits new
block and row access counts to the collector just before going
idle; so a query or transaction still in progress does not
affect the displayed totals. Also, the collector itself emits a
new report at most once per PGSTAT_STAT_INTERVAL milliseconds (500 unless
altered while building the server). So the displayed
information lags behind actual activity. Current-query
information is reported to the collector immediately, but is
still subject to the PGSTAT_STAT_INTERVAL delay before it becomes
visible.

Another important point is that when a server process is
asked to display any of these statistics, it first fetches the
most recent report emitted by the collector process and then
continues to use this snapshot for all statistical views and
functions until the end of its current transaction. So the
statistics will appear not to change as long as you continue
the current transaction. This is a feature, not a bug, because
it allows you to perform several queries on the statistics and
correlate the results without worrying that the numbers are
changing underneath you. But if you want to see new results
with each query, be sure to do the queries outside any
transaction block.

Table 24-1. Standard Statistics Views

View Name

Description

pg_stat_activity

One row per server process, showing database OID,
database name, process ID, user OID, user name, current
query, time at which the current query began execution,
time at which the process was started, and client's
address and port number. The columns that report data
on the current query are only available if the
parameter stats_command_string
has been turned on. Furthermore, these columns read as
null unless the user examining the view is a superuser
or the same as the user owning the process being
reported on. (Note that because of the collector's
reporting delay, the current query will only be
up-to-date for long-running queries.)

pg_stat_database

One row per database, showing database OID,
database name, number of active server processes
connected to that database, number of transactions
committed and rolled back in that database, total disk
blocks read, and total buffer hits (i.e., block read
requests avoided by finding the block already in buffer
cache).

pg_stat_all_tables

For each table in the current database (including
TOAST tables), the table OID, schema and table name,
number of sequential scans initiated, number of live
rows fetched by sequential scans, number of index scans
initiated (over all indexes belonging to the table),
number of live rows fetched by index scans, and numbers
of row insertions, updates, and deletions.

pg_stat_sys_tables

Same as pg_stat_all_tables, except that only
system tables are shown.

pg_stat_user_tables

Same as pg_stat_all_tables, except that only
user tables are shown.

pg_stat_all_indexes

For each index in the current database, the table
and index OID, schema, table and index name, number of
index scans initiated on that index, number of index
entries returned by index scans, and number of live
table rows fetched by simple index scans using that
index.

pg_stat_sys_indexes

Same as pg_stat_all_indexes, except that only
indexes on system tables are shown.

pg_stat_user_indexes

Same as pg_stat_all_indexes, except that only
indexes on user tables are shown.

pg_statio_all_tables

For each table in the current database (including
TOAST tables), the table OID, schema and table name,
number of disk blocks read from that table, number of
buffer hits, numbers of disk blocks read and buffer
hits in all indexes of that table, numbers of disk
blocks read and buffer hits from that table's auxiliary
TOAST table (if any), and numbers of disk blocks read
and buffer hits for the TOAST table's index.

pg_statio_sys_tables

Same as pg_statio_all_tables, except that
only system tables are shown.

pg_statio_user_tables

Same as pg_statio_all_tables, except that
only user tables are shown.

pg_statio_all_indexes

For each index in the current database, the table
and index OID, schema, table and index name, numbers of
disk blocks read and buffer hits in that index.

pg_statio_sys_indexes

Same as pg_statio_all_indexes, except that
only indexes on system tables are shown.

pg_statio_user_indexes

Same as pg_statio_all_indexes, except that
only indexes on user tables are shown.

pg_statio_all_sequences

For each sequence object in the current database,
the sequence OID, schema and sequence name, numbers of
disk blocks read and buffer hits in that sequence.

pg_statio_sys_sequences

Same as pg_statio_all_sequences, except that
only system sequences are shown. (Presently, no system
sequences are defined, so this view is always
empty.)

pg_statio_user_sequences

Same as pg_statio_all_sequences, except that
only user sequences are shown.

The per-index statistics are particularly useful to
determine which indexes are being used and how effective they
are.

Beginning in PostgreSQL
8.1, indexes can be used either directly or via "bitmap scans". In a bitmap scan the output of
several indexes can be combined via AND or OR rules; so it is
difficult to associate individual heap row fetches with
specific indexes when a bitmap scan is used. Therefore, a
bitmap scan increments the pg_stat_all_indexes.idx_tup_read count(s) for the index(es) it
uses, and it increments the pg_stat_all_tables.idx_tup_fetch count for the table, but it
does not affect pg_stat_all_indexes.idx_tup_fetch.

Note: Before PostgreSQL 8.1, the idx_tup_read and idx_tup_fetch counts were essentially
always equal. Now they can be different even without
considering bitmap scans, because idx_tup_read counts index entries
retrieved from the index while idx_tup_fetch counts live rows fetched
from the table; the latter will be less if any dead or
not-yet-committed rows are fetched using the index.

The pg_statio_ views are
primarily useful to determine the effectiveness of the buffer
cache. When the number of actual disk reads is much smaller
than the number of buffer hits, then the cache is satisfying
most read requests without invoking a kernel call. However,
these statistics do not give the entire story: due to the way
in which PostgreSQL handles
disk I/O, data that is not in the PostgreSQL buffer cache may still reside
in the kernel's I/O cache, and may therefore still be fetched
without requiring a physical read. Users interested in
obtaining more detailed information on PostgreSQL I/O behavior are advised to use
the PostgreSQL statistics
collector in combination with operating system utilities that
allow insight into the kernel's handling of I/O.

Other ways of looking at the statistics can be set up by
writing queries that use the same underlying statistics access
functions as these standard views do. These functions are
listed in Table
24-2. The per-database access functions take a database OID
as argument to identify which database to report on. The
per-table and per-index functions take a table or index OID.
(Note that only tables and indexes in the current database can
be seen with these functions.) The per-server-process access
functions take a server process number, which ranges from one
to the number of currently active server processes.

Table 24-2. Statistics Access Functions

Function

Return Type

Description

pg_stat_get_db_numbackends(oid)

integer

Number of active server processes for database

pg_stat_get_db_xact_commit(oid)

bigint

Transactions committed in database

pg_stat_get_db_xact_rollback(oid)

bigint

Transactions rolled back in database

pg_stat_get_db_blocks_fetched(oid)

bigint

Number of disk block fetch requests for
database

pg_stat_get_db_blocks_hit(oid)

bigint

Number of disk block fetch requests found in cache
for database

pg_stat_get_numscans(oid)

bigint

Number of sequential scans done when argument is a
table, or number of index scans done when argument is
an index

pg_stat_get_tuples_returned(oid)

bigint

Number of rows read by sequential scans when
argument is a table, or number of index entries
returned when argument is an index

pg_stat_get_tuples_fetched(oid)

bigint

Number of table rows fetched by bitmap scans when
argument is a table, or table rows fetched by simple
index scans using the index when argument is an
index

pg_stat_get_tuples_inserted(oid)

bigint

Number of rows inserted into table

pg_stat_get_tuples_updated(oid)

bigint

Number of rows updated in table

pg_stat_get_tuples_deleted(oid)

bigint

Number of rows deleted from table

pg_stat_get_blocks_fetched(oid)

bigint

Number of disk block fetch requests for table or
index

pg_stat_get_blocks_hit(oid)

bigint

Number of disk block requests found in cache for
table or index

pg_stat_get_backend_idset()

setof integer

Set of currently active server process numbers
(from 1 to the number of active server processes). See
usage example in the text

pg_backend_pid()

integer

Process ID of the server process attached to the
current session

pg_stat_get_backend_pid(integer)

integer

Process ID of the given server process

pg_stat_get_backend_dbid(integer)

oid

Database ID of the given server process

pg_stat_get_backend_userid(integer)

oid

User ID of the given server process

pg_stat_get_backend_activity(integer)

text

Active command of the given server process (null if
the current user is not a superuser nor the same user
as that of the session being queried, or stats_command_string is not on)

pg_stat_get_backend_activity_start(integer)

timestamp with time zone

The time at which the given server process'
currently executing query was started (null if the
current user is not a superuser nor the same user as
that of the session being queried, or stats_command_string is not on)

pg_stat_get_backend_start(integer)

timestamp with time zone

The time at which the given server process was
started, or null if the current user is not a superuser
nor the same user as that of the session being
queried

pg_stat_get_backend_client_addr(integer)

inet

The IP address of the client connected to the given
server process. Null if the connection is over a Unix
domain socket. Also null if the current user is not a
superuser nor the same user as that of the session
being queried

pg_stat_get_backend_client_port(integer)

integer

The TCP port number of the client connected to the
given server process. -1 if the connection is over a
Unix domain socket. Null if the current user is not a
superuser nor the same user as that of the session
being queried

pg_stat_reset()

boolean

Reset all currently collected statistics

Note:blocks_fetched minus blocks_hit gives the number of kernel
read() calls issued for the
table, index, or database; but the actual number of
physical reads is usually lower due to kernel-level
buffering.

The function pg_stat_get_backend_idset provides a
convenient way to generate one row for each active server
process. For example, to show the PIDs and current queries of all server
processes:

SELECT pg_stat_get_backend_pid(s.backendid) AS procpid,
pg_stat_get_backend_activity(s.backendid) AS current_query
FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;