Although indexes in PostgreSQL do not need maintenance or
tuning, it is still important to check which indexes are actually
used by the real-life query workload. Examining index usage for
an individual query is done with the EXPLAIN command; its application for this
purpose is illustrated in Section 13.1. It is
also possible to gather overall statistics about index usage in a
running server, as described in Section 23.2.
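As a rough sketch of both approaches, the statements below first show the plan for a single query and then list cumulative per-index scan counts from the pg_stat_user_indexes statistics view. The table orders and its column customer_id are hypothetical names used only for illustration; substitute your own.

```sql
-- Assumption: a table "orders" with an indexed column "customer_id" exists.
-- Show the plan chosen for one query, without executing it.
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- List indexes by how often they have been scanned since the statistics
-- were last reset. Indexes with idx_scan = 0 have not been used at all.
SELECT schemaname, relname, indexrelname,
       idx_scan, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, relname, indexrelname;
```

An index that shows zero scans over a representative period is a candidate for closer inspection, though indexes that only enforce uniqueness can legitimately show few scans.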

It is difficult to formulate a general procedure for
determining which indexes to set up. There are a number of
typical cases that have been shown in the examples throughout the
previous sections. A good deal of experimentation will be
necessary in most cases. The rest of this section gives some tips
for that.

Always run ANALYZE first. This command collects
statistics about the distribution of the values in the table.
This information is required to guess the number of rows
returned by a query, which is needed by the planner to assign
realistic costs to each possible query plan. In the absence of
any real statistics, some default values are assumed, which
are almost certain to be inaccurate. Examining an
application's index usage without having run ANALYZE is therefore a lost cause.
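A minimal sketch of this step follows, assuming a hypothetical table named orders. It refreshes the statistics for that table and then checks when statistics were last gathered, either manually or by autovacuum.

```sql
-- Assumption: "orders" is a placeholder for the table under test.
ANALYZE orders;

-- Confirm that statistics exist and see how recent they are.
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'orders';
```

Running ANALYZE without a table name collects statistics for every table in the current database that the current user is permitted to analyze.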

Use real data for experimentation. Using test data for
setting up indexes will tell you what indexes you need for
the test data, but that is all.

It is especially misleading to use very small test data sets.
While selecting 1000 out of 100000 rows could be a candidate
for an index, selecting 1 out of 100 rows will hardly be,
because the 100 rows will probably fit within a single disk
page, and there is no plan that can beat sequentially
fetching 1 disk page.
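The effect of a very small table can be observed with a throwaway experiment such as the one below. The names tiny_demo and tiny_demo_id_idx are made up for this sketch. With only 100 narrow rows the table is expected to occupy a single page, so the planner will typically prefer a sequential scan even though a matching index exists.

```sql
-- Illustrative temporary table; it disappears at the end of the session.
CREATE TEMPORARY TABLE tiny_demo AS
    SELECT g AS id FROM generate_series(1, 100) AS g;
CREATE INDEX tiny_demo_id_idx ON tiny_demo (id);
ANALYZE tiny_demo;

-- relpages shows how many disk pages the table and the index occupy.
SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname IN ('tiny_demo', 'tiny_demo_id_idx');

-- Inspect which plan the planner picks for a single-row lookup.
EXPLAIN SELECT * FROM tiny_demo WHERE id = 42;
```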

Also be careful when making up test data, which is often
unavoidable when the application is not in production use
yet. Values that are very similar, completely random, or
inserted in sorted order will skew the statistics away from
the distribution that real data would have.

When indexes are not used, it can be useful for testing to
force their use. There are run-time parameters that can turn
off various plan types (described in Section 16.4). For instance,
turning off sequential scans (enable_seqscan) and nested-loop joins
(enable_nestloop), which are the
most basic plans, will force the system to use a different
plan. If the system still chooses a sequential scan or
nested-loop join, then there is probably a more fundamental
reason why the index is not being used; for example, the query
condition does not match the index. (What kind of query can
use what kind of index is explained in the previous
sections.)
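One way to try this without affecting other sessions is to change the parameters inside a transaction, as sketched below. The table orders and the column customer_id are hypothetical. Note that these parameters do not forbid a plan type outright; they only make the planner strongly disfavor it, so a sequential scan can still appear when no alternative exists.

```sql
-- Assumption: "orders" has an index on "customer_id".
BEGIN;

-- SET LOCAL limits the change to the current transaction.
SET LOCAL enable_seqscan = off;
SET LOCAL enable_nestloop = off;

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Rolling back restores the previous settings.
ROLLBACK;
```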

If forcing index usage does use the index, then there are
two possibilities: either the system is right and using the
index is indeed not appropriate, or the cost estimates of the
query plans are not reflecting reality. So you should time
your query with and without indexes. The EXPLAIN ANALYZE command can be useful
here.
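A comparison along these lines might look like the following sketch, again using the hypothetical table orders. Keep in mind that EXPLAIN ANALYZE actually executes the statement, and that repeated runs can be faster simply because the data is already cached, so each variant should be run several times.

```sql
-- Assumption: "orders" has an index on "customer_id".
-- 1. Time the plan the planner chooses by default.
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- 2. Time the same query with sequential scans discouraged.
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
ROLLBACK;

-- 3. Time the same query with index-based scans discouraged.
BEGIN;
SET LOCAL enable_indexscan = off;
SET LOCAL enable_bitmapscan = off;
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
ROLLBACK;
```

Comparing the estimated row counts with the actual row counts in each output also hints at whether the problem lies in the selectivity estimate or in the cost parameters.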

If it turns out that the cost estimates are wrong, there
are, again, two possibilities. The total cost is computed
from the per-row costs of each plan node times the
selectivity estimate of the plan node. The costs of the plan
nodes can be tuned with run-time parameters (described in
Section 16.4). An
inaccurate selectivity estimate is due to insufficient
statistics. It may be possible to improve this by tuning the
statistics-gathering parameters (see ALTER TABLE).
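Both kinds of adjustment are sketched below. The specific values are arbitrary examples rather than recommendations, and orders and customer_id are hypothetical names. The first statement changes a planner cost parameter for the current session only; the remaining statements raise the statistics target for one column and regather its statistics.

```sql
-- Cost side: lower the estimated cost of a non-sequential page fetch
-- for this session (the built-in default is 4.0).
SET random_page_cost = 2.0;

-- Selectivity side: collect more detailed statistics for one column
-- (the built-in default statistics target is 100).
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 500;

-- The new target takes effect the next time statistics are gathered.
ANALYZE orders;
```

After either change, rerun EXPLAIN on the affected queries to see whether the chosen plan and its estimates moved closer to what is observed.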

If you do not succeed in adjusting the costs to be more
appropriate, then you may have to resort to forcing index
usage explicitly. You may also want to contact the
PostgreSQL developers to
examine the issue.