Partitioning Large Tables

Table partitioning enables supporting very large tables, such as fact tables, by logically
dividing them into smaller, more manageable pieces. Partitioned tables can improve query
performance by allowing the Greenplum Database query optimizer to scan only the data needed to
satisfy a given query instead of scanning all the contents of a large table.

About Table Partitioning

Partitioning does not change the physical distribution of table data across the segments.
Table distribution is physical: Greenplum Database physically divides partitioned tables and
non-partitioned tables across segments to enable parallel query processing. Table
partitioning is logical: Greenplum Database logically divides big tables to improve
query performance and facilitate data warehouse maintenance tasks, such as rolling old data
out of the data warehouse.

Greenplum Database supports:

range partitioning: division of data based on a numerical range,
such as date or price.

list partitioning: division of data based on a list of values, such
as sales territory or product line.

A combination of both types.

Figure 1. Example Multi-level Partition Design

Table Partitioning in Greenplum Database

Greenplum Database divides tables into parts (also known as partitions) to enable massively
parallel processing. Tables are partitioned during CREATE TABLE using the
PARTITION BY (and optionally the SUBPARTITION BY)
clause. Partitioning creates a top-level (or parent) table with one or more levels of
sub-tables (or child tables). Internally, Greenplum Database creates an inheritance
relationship between the top-level table and its underlying partitions, similar to the
functionality of the INHERITS clause of PostgreSQL.

Greenplum uses the partition criteria defined during table creation to create each
partition with a distinct CHECK constraint, which limits the data that
table can contain. The query optimizer uses CHECK constraints to determine
which table partitions to scan to satisfy a given query predicate.

The Greenplum system catalog stores partition hierarchy information so that rows inserted
into the top-level parent table propagate correctly to the child table partitions. To change
the partition design or table structure, alter the parent table using ALTER
TABLE with the PARTITION clause.

To insert data into a partitioned table, you specify the root partitioned table, the table
created with the CREATE TABLE command. You also can specify a leaf child
table of the partitioned table in an INSERT command. An error is returned
if the data is not valid for the specified leaf child table. Specifying a non-leaf or a
non-root partition table in the DML command is not supported.

Deciding on a Table Partitioning Strategy

Not all tables are good candidates for partitioning. If the answer is yes to all or
most of the following questions, table partitioning is a viable database design strategy for
improving query performance. If the answer is no to most of the following questions,
table partitioning is not the right solution for that table. Test your design strategy to
ensure that query performance improves as expected.

Is the table large enough? Large fact tables are good candidates
for table partitioning. If you have millions or billions of records in a table, you may
see performance benefits from logically breaking that data up into smaller chunks. For
smaller tables with only a few thousand rows or less, the administrative overhead of
maintaining the partitions will outweigh any performance benefits you might see.

Are you experiencing unsatisfactory performance? As with any
performance tuning initiative, a table should be partitioned only if queries against that
table are producing slower response times than desired.

Do your query predicates have identifiable access patterns? Examine
the WHERE clauses of your query workload and look for table columns that
are consistently used to access data. For example, if most of your queries tend to look up
records by date, then a monthly or weekly date-partitioning design might be beneficial. Or
if you tend to access records by region, consider a list-partitioning design to divide the
table by region.

Does your data warehouse maintain a window of historical data?
Another consideration for partition design is your organization's business requirements
for maintaining historical data. For example, your data warehouse may require that you
keep data for the past twelve months. If the data is partitioned by month, you can easily
drop the oldest monthly partition from the warehouse and load current data into the most
recent monthly partition.

Can the data be divided into somewhat equal parts based on some
defining criteria? Choose partitioning criteria that will divide your data as evenly
as possible. If the partitions contain a relatively equal number of records, query
performance improves based on the number of partitions created. For example, by dividing a
large table into 10 partitions, a query will execute 10 times faster than it would against
the unpartitioned table, provided that the partitions are designed to support the query's
criteria.

Do not create more partitions than are needed. Creating too many partitions can slow down
management and maintenance jobs, such as vacuuming, recovering segments, expanding the
cluster, checking disk usage, and others.

Partitioning does not improve query performance unless the query optimizer can eliminate
partitions based on the query predicates. Queries that scan every partition run slower than
if the table were not partitioned, so avoid partitioning if few of your queries achieve
partition elimination. Check the explain plan for queries to make sure that partitions are
eliminated. See Query Profiling for more about
partition elimination.

Warning: Be very careful with multi-level partitioning because the number of
partition files can grow very quickly. For example, if a table is partitioned by both day
and city, and there are 1,000 days of data and 1,000 cities, the total number of partitions
is one million. Column-oriented tables store each column in a physical table, so if this
table has 100 columns, the system would be required to manage 100 million files for the
table.

Before settling on a multi-level partitioning strategy, consider a single level partition
with bitmap indexes. Indexes slow down data loads, so performance testing with your data and
schema is recommended to decide on the best strategy.

Creating Partitioned Tables

You partition tables when you create them with CREATE TABLE. This topic
provides examples of SQL syntax for creating a table with various partition designs.

To partition a table:

Decide on the partition design: date range, numeric range, or list of
values.

Choose the column(s) on which to partition the table.

Decide how many levels of partitions you want. For example, you can create
a date range partition table by month and then subpartition the monthly partitions by
sales region.

Defining Date Range Table Partitions

A date range partitioned table uses a single date or
timestamp column as the partition key column. You can use the same
partition key column to create subpartitions if necessary, for example, to partition by
month and then subpartition by day. Consider partitioning by the most granular level. For
example, for a table partitioned by date, you can partition by day and have 365 daily
partitions, rather than partition by year then subpartition by month then subpartition by
day. A multi-level design can reduce query planning time, but a flat partition design runs
faster.

You can have Greenplum Database automatically generate partitions by giving a
START value, an END value, and an
EVERY clause that defines the partition increment value. By default,
START values are always inclusive and END values are
always exclusive. For example:
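A sketch of such a definition (the table and column names are illustrative):

```sql
CREATE TABLE sales (id int, date date, amt decimal(10,2))
DISTRIBUTED BY (id)
PARTITION BY RANGE (date)
( START (date '2016-01-01') INCLUSIVE
  END   (date '2017-01-01') EXCLUSIVE
  EVERY (INTERVAL '1 month') );
```

With this specification, Greenplum Database generates twelve monthly partitions; each partition's START bound is inclusive and its END bound is exclusive.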

Defining List Table Partitions

A list partitioned table can use any data type column that allows equality comparisons as
its partition key column. A list partition can also have a multi-column (composite)
partition key, whereas a range partition only allows a single column as the partition key.
For list partitions, you must declare a partition specification for every partition (list
value) you want to create. For example:
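A minimal sketch of a list-partitioned table (the rank table and its columns are assumptions for illustration):

```sql
CREATE TABLE rank (id int, rank int, year int, gender char(1), count int)
DISTRIBUTED BY (id)
PARTITION BY LIST (gender)
( PARTITION girls VALUES ('F'),
  PARTITION boys VALUES ('M'),
  DEFAULT PARTITION other );
```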

Note: The current Greenplum Database legacy optimizer allows list partitions with
multi-column (composite) partition keys. A range partition only allows a single column as
the partition key. The Greenplum Query Optimizer does not support composite keys, so you
should not use composite partition keys.

Defining Multi-level Partitions

You can create a multi-level partition design with subpartitions of partitions. Using a
subpartition template ensures that every partition has the same subpartition
design, including partitions that you add later. For example, the following SQL creates
the two-level partition design shown in Figure 1:
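One way to sketch such a design, assuming a sales table partitioned by month and subpartitioned by region (column names are illustrative):

```sql
CREATE TABLE sales (trans_id int, date date, amount decimal(9,2), region text)
DISTRIBUTED BY (trans_id)
PARTITION BY RANGE (date)
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE
( SUBPARTITION usa VALUES ('usa'),
  SUBPARTITION asia VALUES ('asia'),
  SUBPARTITION europe VALUES ('europe'),
  DEFAULT SUBPARTITION other_regions )
( START (date '2012-01-01') INCLUSIVE
  END   (date '2013-01-01') EXCLUSIVE
  EVERY (INTERVAL '1 month'),
  DEFAULT PARTITION outlying_dates );
```

Because the regional subpartitions are declared in a SUBPARTITION TEMPLATE, any monthly partition added later automatically receives the same regional subpartitions.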

The following example shows a three-level partition design where the
sales table is partitioned by year, then
month, then region. The SUBPARTITION
TEMPLATE clauses ensure that each yearly partition has the same subpartition
structure. The example declares a DEFAULT partition at each level of the
hierarchy.
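A sketch of such a three-level design (table name, column names, and year range are assumptions for illustration):

```sql
CREATE TABLE p3_sales (id int, year int, month int, day int, region text)
DISTRIBUTED BY (id)
PARTITION BY RANGE (year)
  SUBPARTITION BY RANGE (month)
    SUBPARTITION TEMPLATE (
      START (1) END (13) EVERY (1),
      DEFAULT SUBPARTITION other_months )
  SUBPARTITION BY LIST (region)
    SUBPARTITION TEMPLATE (
      SUBPARTITION usa VALUES ('usa'),
      SUBPARTITION europe VALUES ('europe'),
      SUBPARTITION asia VALUES ('asia'),
      DEFAULT SUBPARTITION other_regions )
( START (2010) END (2013) EVERY (1),
  DEFAULT PARTITION outlying_years );
```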

When you create multi-level partitions on ranges, it is easy to create
a large number of subpartitions, some containing little or no data. This can add many
entries to the system tables, which increases the time and memory required to optimize and
execute queries. Increase the range interval or choose a different partitioning strategy
to reduce the number of subpartitions created.

Partitioning an Existing Table

Tables can be partitioned only at creation. If you have a table that you want to
partition, you must create a partitioned table, load the data from the original table into
the new table, drop the original table, and rename the partitioned table with the original
table's name. You must also re-grant any table permissions. For example:
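A sketch of this sequence, assuming an existing non-partitioned sales table and illustrative role names:

```sql
-- Create a partitioned table with the same columns as the original
CREATE TABLE sales2 (LIKE sales)
PARTITION BY RANGE (date)
( START (date '2016-01-01') INCLUSIVE
  END   (date '2017-01-01') EXCLUSIVE
  EVERY (INTERVAL '1 month') );

-- Move the data, then swap the names
INSERT INTO sales2 SELECT * FROM sales;
DROP TABLE sales;
ALTER TABLE sales2 RENAME TO sales;

-- Re-grant table permissions (roles shown are assumptions)
GRANT ALL PRIVILEGES ON sales TO admin;
GRANT SELECT ON sales TO guest;
```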

Limitations of Partitioned Tables

For each partition level, a partitioned table can have a maximum of 32,767 partitions.

A primary key or unique constraint on a partitioned table must contain all the
partitioning columns. A unique index can omit the partitioning columns; however, it is
enforced only on the parts of the partitioned table, not on the partitioned table as a
whole.

GPORCA, the Greenplum next generation query optimizer, supports uniform multi-level
partitioned tables. If GPORCA is enabled (the default) and the multi-level partitioned
table is not uniform, Greenplum Database executes queries against the table with the
legacy query optimizer. For information about uniform multi-level partitioned tables, see
About Uniform Multi-level Partitioned Tables.

COPY commands that attempt to copy from an external table
partition return an error unless you specify the IGNORE EXTERNAL
PARTITIONS clause with the COPY command. If you specify the
clause, data is not copied from external table partitions.

To use the COPY command against a partitioned table that has a leaf child
table that is an external table, use an SQL query to copy the data. For example,
if the table my_sales has a leaf child table that is
an external table, this command sends the data to
stdout:

COPY (SELECT * FROM my_sales) TO stdout;

VACUUM commands skip external table partitions.

The following operations are supported if no data is changed on the external table
partition. Otherwise, an error is returned.

Adding or dropping a column.

Changing the data type of a column.

These ALTER PARTITION operations are not supported if the partitioned
table contains an external table partition:

Setting a subpartition template.

Altering the partition properties.

Creating a default partition.

Setting a distribution policy.

Setting or dropping a NOT NULL constraint of a column.

Adding or dropping constraints.

Splitting an external partition.

The Greenplum Database utility gpcrondump does not back up data from
a leaf child partition of a partitioned table if the leaf child partition is a readable
external table.

Loading Partitioned Tables

After you create the partitioned table structure, top-level parent tables are empty. Data
is routed to the bottom-level child table partitions. In a multi-level partition design,
only the subpartitions at the bottom of the hierarchy can contain data.

Rows that cannot be mapped to a child table partition are rejected and the load fails. To
avoid unmapped rows being rejected at load time, define your partition hierarchy with a
DEFAULT partition. Any rows that do not match a partition's
CHECK constraints load into the DEFAULT partition. See
Adding a Default Partition.

At runtime, the query optimizer scans the entire table inheritance hierarchy and uses the
CHECK table constraints to determine which of the child table partitions
to scan to satisfy the query's conditions. The DEFAULT partition (if your
hierarchy has one) is always scanned. DEFAULT partitions that contain data
slow down the overall scan time.

When you use COPY or INSERT to load data into a parent
table, the data is automatically rerouted to the correct partition, just like a regular
table.

Best practice for loading data into partitioned tables is to create an intermediate staging
table, load it, and then exchange it into your partition design. See Exchanging a Partition.

Verifying Your Partition Strategy

When a table is partitioned based on the query predicate, you can use
EXPLAIN to examine the query plan and verify that the query optimizer scans
only the relevant data.

For example, suppose a sales table is date-range partitioned by month and
subpartitioned by region as shown in Figure 1. For the following query:

EXPLAIN SELECT * FROM sales WHERE date='01-07-12' AND
region='usa';

The query plan for this query should show a table scan of only the partitions that can satisfy the predicate: the single monthly partition and regional subpartition that match the date and region values, plus the default partition, if the design includes one.

Ensure that the query optimizer does not scan unnecessary partitions or subpartitions (for
example, scans of months or regions not specified in the query predicate), and that scans of
the top-level tables return 0-1 rows.

Troubleshooting Selective Partition Scanning

The following limitations can result in a query plan that shows a non-selective scan of
your partition hierarchy.

The query optimizer can selectively scan partitioned tables only when
the query contains a direct and simple restriction of the table using immutable
operators such as:

=, <, <=, >, >=, and <>

Selective scanning recognizes STABLE and
IMMUTABLE functions, but does not recognize VOLATILE
functions within a query. For example, WHERE clauses such as
date > CURRENT_DATE cause the query optimizer to selectively scan
partitioned tables, but time > TIMEOFDAY() does not.

Viewing Your Partition Design

You can look up information about your partition design using the pg_partitions
view. For example, to see the partition design of the sales table:
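A query along these lines lists each partition's boundary, table name, partition name, level, and rank (column names follow the pg_partitions view):

```sql
SELECT partitionboundary, partitiontablename, partitionname,
       partitionlevel, partitionrank
FROM pg_partitions
WHERE tablename = 'sales';
```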

Other views that show information about partitioned tables include:

pg_partition_templates - Shows the subpartitions created using a
subpartition template.

pg_partition_columns - Shows the partition key columns used in a
partition design.

For information about Greenplum Database system catalog tables and views, see the
Greenplum Database Reference Guide.

Maintaining Partitioned Tables

To maintain a partitioned table, use the ALTER TABLE command against the
top-level parent table. The most common scenario is to drop old partitions and add new ones
to maintain a rolling window of data in a range partition design. You can convert
(exchange) older partitions to the append-optimized compressed storage format to
save space. If you have a default partition in your partition design, you add a partition by
splitting the default partition.

Important: When defining and altering partition designs, use the given partition
name, not the table object name. Although you can query and load any table (including
partitioned tables) directly using SQL commands, you can only modify the structure of a
partitioned table using the ALTER TABLE...PARTITION clauses.

Partitions
are not required to have names. If a partition does not have a name, use one of the
following expressions to specify a part: PARTITION FOR (value) or
PARTITION FOR(RANK(number)).

Adding a Partition

You can add a partition to a partition design with the ALTER TABLE
command. If the original partition design included subpartitions defined by a
subpartition template, the newly added partition is subpartitioned according to
that template. For example:
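A sketch of adding a monthly partition to a date-range design (the bounds are illustrative):

```sql
ALTER TABLE sales ADD PARTITION
  START (date '2017-02-01') INCLUSIVE
  END   (date '2017-03-01') EXCLUSIVE;
```

If the partition design includes a default partition, add new partitions by splitting the default partition instead.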

If incoming data does not match a partition's CHECK constraint and there
is no default partition, the data is rejected. Default partitions ensure that incoming
data that does not match a partition is inserted into the default partition.

Dropping a Partition

You can drop a partition from your partition design using the ALTER
TABLE command. When you drop a partition that has subpartitions, the
subpartitions (and all data in them) are automatically dropped as well. For range
partitions, it is common to drop the older partitions from the range as old data is rolled
out of the data warehouse. For example:

ALTER TABLE sales DROP PARTITION FOR (RANK(1));

Truncating a Partition

You can truncate a partition using the ALTER TABLE command. When you
truncate a partition that has subpartitions, the subpartitions are automatically truncated
as well.

ALTER TABLE sales TRUNCATE PARTITION FOR (RANK(1));

Exchanging a Partition

You can exchange a partition using the ALTER TABLE command. Exchanging a
partition swaps one table in place of an existing partition. You can exchange partitions
only at the lowest level of your partition hierarchy (only partitions that contain data
can be exchanged).

Partition exchange can be useful for data loading. For example, load a staging table and
swap the loaded table into your partition design. You can use partition exchange to change
the storage type of older partitions to append-optimized tables. For example:
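A sketch of the load-and-exchange pattern, assuming a single-level monthly range design whose January 2012 partition has the generated table name sales_1_prt_1:

```sql
-- Stage the partition's data in an append-optimized table
CREATE TABLE jan12 (LIKE sales) WITH (appendonly = true);
INSERT INTO jan12 SELECT * FROM sales_1_prt_1;

-- Swap the staged table in place of the January 2012 partition
ALTER TABLE sales EXCHANGE PARTITION FOR ('2012-01-01')
WITH TABLE jan12;
```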

Note: This example refers to the single-level definition of the table
sales, before partitions were added and altered in the previous
examples.

Warning: If you specify the WITHOUT VALIDATION clause, you must
ensure that the data in table that you are exchanging for an existing partition is valid
against the constraints on the partition. Otherwise, queries against the partitioned table
might return incorrect results.

The Greenplum Database server configuration parameter
gp_enable_exchange_default_partition controls availability of the
EXCHANGE DEFAULT PARTITION clause. The default value for the parameter
is off: the clause is not available, and Greenplum Database returns an
error if the clause is specified in an ALTER TABLE command.

For information about the parameter, see "Server Configuration Parameters" in the
Greenplum Database Reference Guide.

Warning: Before you exchange
the default partition, you must ensure that the data in the table to be exchanged (the
new default partition) is valid for the default partition. For example, the new
default partition must not contain data that would be valid in other leaf child
partitions of the partitioned table. Otherwise, queries against the partitioned table
with the exchanged default partition that are executed by GPORCA might return incorrect
results.

Splitting a Partition

Splitting a partition divides a partition into two partitions. You can split a partition
using the ALTER TABLE command. You can split partitions only at the
lowest level of your partition hierarchy (partitions that contain data). For a multi-level
partition, only range partitions can be split, not list partitions. The split value you
specify goes into the latter partition.

For example, to split a monthly partition into two with the first partition containing
dates January 1-15 and the second partition containing dates January 16-31:
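A sketch of such a split, assuming a January 2017 monthly partition (the AT value becomes the first value of the second partition):

```sql
ALTER TABLE sales SPLIT PARTITION FOR ('2017-01-01')
AT ('2017-01-16')
INTO (PARTITION jan171to15, PARTITION jan1716to31);
```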

Modifying a Subpartition Template

Use ALTER TABLE SET SUBPARTITION TEMPLATE to modify the subpartition
template of a partitioned table. Partitions added after you set a new subpartition
template have the new partition design. Existing partitions are not modified.

The following example alters the subpartition template of this partitioned
table:
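A sketch of setting a new template that adds an africa list subpartition to a regional design (the region values are illustrative):

```sql
ALTER TABLE sales SET SUBPARTITION TEMPLATE
( SUBPARTITION usa VALUES ('usa'),
  SUBPARTITION asia VALUES ('asia'),
  SUBPARTITION europe VALUES ('europe'),
  SUBPARTITION africa VALUES ('africa'),
  DEFAULT SUBPARTITION other );
```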

When you add a date-range partition to the table sales, it includes the new regional list
subpartition for Africa. For example, the following command creates the subpartitions
usa, asia, europe,
africa, and a default partition named other:
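For instance, an ADD PARTITION command along these lines creates those subpartitions under a new monthly partition (the partition name and bounds are assumptions):

```sql
ALTER TABLE sales ADD PARTITION "4"
  START ('2017-04-01') INCLUSIVE
  END   ('2017-05-01') EXCLUSIVE;
```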

To view the tables created for the partitioned table sales, you can use
the command \dt sales* from the psql command line.

To remove a subpartition template, use SET SUBPARTITION TEMPLATE with
empty parentheses. For example, to clear the sales table subpartition template:

ALTER TABLE sales SET SUBPARTITION TEMPLATE ();

Exchanging a Leaf Child Partition with an External Table

You can exchange a leaf child partition of a partitioned table with a readable external
table. The external table data can reside on a host file system, an NFS mount, or a Hadoop
file system (HDFS).

For example, if you have a partitioned table that is created with monthly partitions and
most of the queries against the table only access the newer data, you can copy the older,
less accessed data to external tables and exchange older partitions with the external
tables. For queries that only access the newer data, you could create queries that use
partition elimination to prevent scanning the older, unneeded partitions.

Exchanging a leaf child partition with an external table is not supported if the
partitioned table contains a column with a check constraint or a NOT NULL
constraint.

For information about exchanging and altering a leaf child partition, see the
ALTER TABLE command in the Greenplum Database Command
Reference.
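For example, a sales table partitioned by year might be created like this sketch (the columns are assumptions; four yearly partitions starting with 2010 produce the leaf child tables discussed next):

```sql
CREATE TABLE sales (id int, year int, qtr int, day int, region text)
DISTRIBUTED BY (id)
PARTITION BY RANGE (year)
( PARTITION yr START (2010) END (2014) EVERY (1) );
```

This definition generates the leaf child partition tables sales_1_prt_yr_1 through sales_1_prt_yr_4.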

There are four leaf child partitions for the partitioned table. Each leaf child
partition contains the data for a single year. The leaf child partition table
sales_1_prt_yr_1 contains the data for the year 2010. These steps
exchange the table sales_1_prt_yr_1 with an external table that uses the
gpfdist protocol:

Ensure that the external table protocol is enabled for the Greenplum Database
system.

This example uses the gpfdist protocol. This command
starts a gpfdist instance.

$ gpfdist

Create a writable external table.

This CREATE WRITABLE EXTERNAL TABLE command creates a writable external
table with the same columns as the partitioned table.
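A sketch of such a command, followed by copying the partition's rows to the external location (the gpfdist host, port, path, and the table name sales_2010 are assumptions):

```sql
CREATE WRITABLE EXTERNAL TABLE sales_2010 (LIKE sales_1_prt_yr_1)
LOCATION ('gpfdist://etlhost-1:8081/sales_2010')
FORMAT 'csv';

-- Write the 2010 partition's data to the external location
INSERT INTO sales_2010 SELECT * FROM sales_1_prt_yr_1;
```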

The external table becomes the leaf child partition with the table name
sales_1_prt_yr_1 and the old leaf child partition becomes the
table sales_2010_ext.
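The exchange itself might look like this sketch: create a readable external table over the written files, then swap it in for the partition (the location and the names sales_2010_ext and yr_1 are assumptions):

```sql
CREATE EXTERNAL TABLE sales_2010_ext (LIKE sales_1_prt_yr_1)
LOCATION ('gpfdist://etlhost-1:8081/sales_2010')
FORMAT 'csv';

-- Validation is skipped; the data must satisfy the partition's
-- CHECK constraints (see the warning below)
ALTER TABLE sales EXCHANGE PARTITION yr_1
WITH TABLE sales_2010_ext WITHOUT VALIDATION;
```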

Warning: To ensure that
queries against the partitioned table return the correct results, the external table
data must be valid against the CHECK constraints on the leaf child
partition. In this case, the data was taken from the leaf child partition table on
which the CHECK constraints were defined.

Drop the table that was rolled out of the partitioned
table.

DROP TABLE sales_2010_ext;

You can rename the leaf child partition to indicate that
sales_1_prt_yr_1 is an external table.

This example command changes the partition name to
yr_1_ext and the name of the leaf child partition table to
sales_1_prt_yr_1_ext.
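A sketch of the rename (the partition name yr_1 is an assumption carried over from this example):

```sql
ALTER TABLE sales RENAME PARTITION yr_1 TO yr_1_ext;
```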