Redistributing Tables

Redistributing Tables

After creating an expansion schema, you can bring Greenplum Database back online and
redistribute tables across the entire array with gpexpand. Target low-use
hours when the utility's CPU usage and table locks have minimal impact on operations. Rank
tables to redistribute the largest or most critical tables in preferential order.

Note: When redistributing data, Greenplum Database must be running in production mode. Greenplum
Database cannot be restricted mode or in master mode. The gpstart options
-R or -m cannot be specified to start Greenplum
Database.

While table redistribution is underway any new tables or partitions created are distributed
across all segments exactly as they would be under normal operating conditions. Queries can
access all segments, even before the relevant data is redistributed to tables on the new
segments. The table or partition being redistributed is locked and unavailable for read or
write operations. When its redistribution completes, normal operations resume.

Ranking Tables for Redistribution

For large systems, control the table redistribution order. Adjust
tables' rank values in the expansion schema to prioritize heavily-used
tables and minimize performance impact. Available free disk space can affect table ranking;
see Managing Redistribution in Large-Scale Greenplum Systems.

To rank tables for redistribution by updating rank values in
gpexpand.status_detail, connect to Greenplum Database using psql
or another supported client. Update gpexpand.status_detail with commands such as:

These commands lower the priority of all tables to 10 and then assign a
rank of 1 to lineitem and a rank of 2 to
orders. When table redistribution begins, lineitem is
redistributed first, followed by orders and all other tables in
gpexpand.status_detail. To exclude a table from redistribution, remove the table
from gpexpand.status_detail.

Redistributing Tables Using gpexpand

To redistribute tables with gpexpand

Log in on the master host as the user who will run your Greenplum
Database system, for example, gpadmin.

Run the gpexpand utility. You can use the
-d or -e option to define the expansion session time
period. For example, to run the utility for up to 60 consecutive
hours:

$ gpexpand -d 60:00:00

The utility redistributes tables
until the last table in the schema completes or it reaches the specified duration or
end time. gpexpand updates the status and time in
gpexpand.status when a session starts and finishes.

Monitoring Table Redistribution

You can query the expansion schema during the table redistribution process. The view
gpexpand.expansion_progress provides a current progress summary, including the
estimated rate of table redistribution and estimated time to completion. You can query the
table gpexpand.status_detail for per-table status information.

Viewing Expansion Status

After the first table completes redistribution, gpexpand.expansion_progress
calculates its estimates and refreshes them based on all tables' redistribution rates.
Calculations restart each time you start a table redistribution session with
gpexpand. To monitor progress, connect to Greenplum Database using
psql or another supported client; query
gpexpand.expansion_progress with a command like the following:

Viewing Table Status

The table gpexpand.status_detail stores status, time of last update, and more
facts about each table in the schema. To see a table's status, connect to Greenplum
Database using psql or another supported client and query
gpexpand.status_detail: