New versions of Pachyderm often require a migration for some or all of the
on-disk objects that persist Pachyderm’s metadata for commits, jobs, etc.
This document describes how Pachyderm migration works and the best
practices surrounding it.

As of 1.7, Pachyderm’s migration works by extracting objects into a stream of
API requests and replaying those requests onto the newer version of pachd.
This process happens automatically using Kubernetes’ “rolling update”
functionality. All you need to do is upgrade Pachyderm (with pachctl deploy),
as described below. Generally, you will need to:

1. Have version 1.6.10 or later of Pachyderm up and running in Kubernetes.
2. (Optional, but recommended) Create a backup of your cluster state with
   pachctl extract (see below).
3. Run pachctl deploy ... with whatever arguments you used to deploy
   Pachyderm previously.
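As a minimal sketch (not part of Pachyderm itself), the backup and redeploy steps above can be written out as a dry-run plan in Python that only prints the commands. The helper name upgrade_plan, the backup file name "backup", and the ["local"] deploy arguments are illustrative assumptions; you must pass the arguments you actually used for your original deploy. It also assumes you upgrade the pachctl binary on your PATH between the two commands, as the 1.6.10 upgrade path in this document describes.

```python
import shlex
from typing import List, Sequence


def upgrade_plan(deploy_args: Sequence[str], backup_path: str = "backup") -> List[str]:
    """Return the shell commands for the backup and redeploy steps, in order.

    Hypothetical helper: it prints nothing and runs nothing. deploy_args must
    repeat the arguments of the original deploy (e.g. ["local"]); nothing here
    can discover them for you.
    """
    return [
        # Backup (optional but recommended): run with the pachctl that matches
        # the running cluster, redirecting the request stream into a file.
        shlex.join(["pachctl", "extract"]) + " > " + shlex.quote(backup_path),
        # Redeploy with the upgraded pachctl; Kubernetes' rolling update then
        # starts the new pachd and the automatic migration.
        shlex.join(["pachctl", "deploy", *deploy_args]),
    ]


if __name__ == "__main__":
    for line in upgrade_plan(["local"]):
        print(line)
```

Run as a script, this prints pachctl extract > backup followed by pachctl deploy local. Building the lines with shlex.join and shlex.quote keeps arguments that contain spaces safely quoted if you paste the output into a shell.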

While the migration is running, you will see two pachd pods: the one that was
already running and the new one. The original pachd pod (deployed with the
previous version of Pachyderm) will still respond to requests. However, write
operations sent to it will race with the migration and may not make it to the
new cluster. Thus, make sure that all external processes that write data to
repos (e.g., calls to put-file) or create new pipelines are stopped before the
migration begins. You don’t need to worry about pipelines running during the
migration process.
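The two-pod window can be watched from a script. The following Python sketch counts pachd pods in the output of kubectl get pods -o json. It assumes the pods are named with a pachd- prefix (the usual naming for pods of a Kubernetes Deployment called pachd) and that kubectl points at the namespace Pachyderm runs in. The function names are hypothetical; treat this as a rough aid for noticing when the old pod has gone away, not as an official migration-status check.

```python
import json
import subprocess
from typing import Any, Dict, List


def pachd_pods(pod_list: Dict[str, Any]) -> List[str]:
    """Names of listed pods that start with 'pachd-' (assumed naming scheme)."""
    return [
        item["metadata"]["name"]
        for item in pod_list.get("items", [])
        if item["metadata"]["name"].startswith("pachd-")
    ]


def migration_in_progress(pod_list: Dict[str, Any]) -> bool:
    """True while more than one pachd pod is listed (old and new side by side)."""
    return len(pachd_pods(pod_list)) > 1


def current_pods() -> Dict[str, Any]:
    """Fetch the pod list for the current kubectl context and namespace."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Keeping the parsing separate from the kubectl call means the counting logic can be checked against a saved pod list without a live cluster; call migration_in_progress(current_pods()) against the real one.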

It is highly recommended that you back up your cluster before you perform
a migration. You can do this with the pachctl extract command. Running this
command generates a stream of API requests, similar to the stream used by the
migration described above. This stream can then be used to reconstruct your
cluster by running pachctl restore. See the documentation for pachctl extract
and pachctl restore for further usage details.
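A minimal Python sketch of wrapping the two commands, assuming that pachctl extract writes the request stream to standard output and pachctl restore reads it from standard input. The function names backup and restore and the pachctl parameter (useful for pointing at a version-specific binary) are hypothetical.

```python
import os
import subprocess


def backup(path: str, pachctl: str = "pachctl") -> int:
    """Save `pachctl extract` output to `path`; return the file size in bytes.

    Output goes to a temporary '.partial' file first, so a failed extract
    never leaves a truncated file under the final name.
    """
    tmp = path + ".partial"
    try:
        with open(tmp, "wb") as out:
            # Assumes extract streams the API requests to stdout.
            subprocess.run([pachctl, "extract"], stdout=out, check=True)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
    os.replace(tmp, path)  # atomic rename on the same filesystem
    return os.path.getsize(path)


def restore(path: str, pachctl: str = "pachctl") -> None:
    """Replay a stream saved by backup() through `pachctl restore`."""
    with open(path, "rb") as src:
        # Assumes restore reads the request stream from stdin.
        subprocess.run([pachctl, "restore"], stdin=src, check=True)
```

For example, backup("backup") before redeploying, and restore("backup") later if you need to reconstruct the cluster. Because check=True raises on a non-zero exit status, a failed extract surfaces as an exception instead of a silently incomplete backup.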

Pachyderm 1.7 is the first version to support extract and restore, which are
necessary for migration. To bridge the gap to previous Pachyderm versions,
we’ve made a final 1.6 release, 1.6.10, which backports the extract and
restore functionality to the 1.6 series of releases. 1.6.10 requires no
migration from other 1.6.x versions: after upgrading pachctl to version
1.6.10, you can simply run pachctl undeploy and then pachctl deploy. After
1.6.10 is deployed, you should make a backup using pachctl extract and then
upgrade pachctl again, to 1.7.0. Finally, run pachctl deploy ... with
pachctl 1.7.0 to trigger the migration.
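The version logic in this paragraph can be captured in a small Python helper. This is a sketch under the assumption that versions are plain X.Y.Z strings (optionally prefixed with v); upgrade_hops and parse_version are hypothetical names, and the helper deliberately covers only the 1.6.x to 1.7.0 path described here.

```python
from typing import List, Tuple

BRIDGE = (1, 6, 10)  # final 1.6 release, with extract/restore backported


def parse_version(v: str) -> Tuple[int, int, int]:
    """Parse 'X.Y.Z' (optionally 'vX.Y.Z') into a comparable integer tuple."""
    major, minor, patch = (int(p) for p in v.lstrip("v").split(".")[:3])
    return major, minor, patch


def upgrade_hops(current: str) -> List[Tuple[str, str]]:
    """Return (target_version, how) hops needed to go from `current` to 1.7.0."""
    cur = parse_version(current)
    if cur[:2] != (1, 6):
        raise ValueError(f"{current}: only 1.6.x clusters are covered here")
    hops = []
    if cur < BRIDGE:
        # 1.6.10 needs no migration from other 1.6.x releases.
        hops.append(("1.6.10", "pachctl undeploy, then pachctl deploy with 1.6.10"))
    # Back up with the 1.6.10 pachctl, then redeploy with 1.7.0 to migrate.
    hops.append(("1.7.0", "pachctl extract with 1.6.10, then pachctl deploy with 1.7.0"))
    return hops
```

The helper compares integer tuples rather than strings because, as strings, "1.6.3" sorts after "1.6.10", which would wrongly skip the 1.6.10 hop.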