Shadow Migration

A common task for administrators is to move data from one location
to another. In the most abstract sense, this problem encompasses a large number
of use cases, from replicating data between servers to keeping user data on
laptops in sync with servers. There are many external tools available to
do this, but the Sun Storage 7000 series of appliances has two integrated
solutions for migrating data that address the most common use cases. The
first, remote replication, is intended for replicating data between one or more appliances, and is
covered separately. The second, shadow migration, is described here.

Shadow migration is a process for migrating data from external NAS sources
with the intent of replacing or decommissioning the original once the migration is complete.
This is most often used when introducing a Sun Storage 7000 appliance
into an existing environment in order to take over file sharing duties of
another server, but a number of other novel uses are possible, outlined below.

Traditional Data Migration

Traditional file migration typically works in one of two ways: repeated synchronization
or external interposition.

Migration via synchronization

This method works by taking an active host X and migrating data
to the new host Y while X remains active. Clients still read
and write to the original host while this migration is underway. Once
the data is initially migrated, incremental changes are repeatedly sent until the delta
is small enough to be sent within a single downtime window. At
this point the original share is made read-only, the final delta is sent
to the new host, and all clients are updated to point to the
new location. The most common way of accomplishing this is through the
rsync tool, though other integrated tools exist. This mechanism has several drawbacks:

The anticipated downtime, while small, is not easily quantified. If a user makes a large number of changes immediately before the scheduled downtime, this can lengthen the downtime window.

During migration, the new server is idle. Since new servers typically come with new features or performance improvements, this represents a waste of resources during a potentially long migration period.

Coordinating across multiple filesystems is burdensome. When migrating dozens or hundreds of filesystems, each migration will take a different amount of time, and downtime will have to be scheduled across the union of all filesystems.
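The synchronization loop described above can be sketched in Python. This is a minimal illustration only; the in-memory "hosts", the writer callback, and the downtime budget are stand-ins, not the interface of rsync or any real tool:

```python
# Minimal sketch of migration via repeated synchronization. Hosts are modeled
# as dicts mapping filename to contents; `writer` simulates ongoing client
# writes to the active source. All names here are illustrative.

def sync_delta(source, target):
    """Copy every file that differs from source to target; return the count."""
    delta = {name: data for name, data in source.items()
             if target.get(name) != data}
    target.update(delta)
    return len(delta)

def migrate_by_synchronization(source, target, writer, downtime_budget=2):
    """Repeat incremental syncs until the delta fits one downtime window."""
    while True:
        copied = sync_delta(source, target)
        if copied <= downtime_budget:
            break                  # delta now small enough for the window
        writer(source)             # clients keep changing the source meanwhile
    # Downtime window: source made read-only, final delta sent, clients repointed.
    sync_delta(source, target)
    return target
```

Once the delta fits within the budget, the final synchronization is the step performed during the scheduled downtime, after which clients are repointed at the new host.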

Migration via external interposition

This method works by taking an active host X and inserting a
new appliance M that migrates data to a new host Y. All
clients are updated at once to point to M, and data is automatically
migrated in the background. This provides more flexibility in migration options (for
example, being able to migrate to a new server in the future without
downtime), and leverages the new server for already migrated data, but also has
significant drawbacks:

The migration appliance represents a new physical machine, with associated costs (initial investment, support costs, power and cooling) and additional management overhead.

The migration appliance represents a new point of failure within the system.

The migration appliance interposes on already migrated data, incurring extra latency, often permanently. These appliances are typically left in place, though it would be possible to schedule another downtime window and decommission the migration appliance.

Shadow Migration

Shadow migration uses interposition, but is integrated into the appliance and doesn't require
a separate physical machine. When shares are created, they can optionally "shadow"
an existing directory, either locally or over NFS. In this scenario, downtime
is scheduled once, where the source appliance X is placed into read-only mode,
a share is created with the shadow property set, and clients are updated
to point to the new share on the Sun Storage 7000 appliance.
Clients can then access the appliance in read-write mode.

Once the shadow property is set, data is transparently migrated in the background
from the source to the local appliance. If a request comes from a
client for a file that has not yet been migrated, the appliance will
automatically migrate this file to the local server before responding to the request.
This may incur some initial latency for some client requests, but once
a file has been migrated all accesses are local to the appliance and
have native performance. It is often the case that the current working
set for a filesystem is much smaller than the total size, so once
this working set has been migrated, regardless of the total native size on
the source, there will be no perceived impact on performance.
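The migrate-on-access behavior described above can be sketched as a simplified Python model. The class and method names are invented for illustration; the real appliance implements this inside the filesystem, fetching data over NFS:

```python
# Illustrative model of shadow interposition: reading a not-yet-migrated file
# first pulls it from the (read-only) source, after which all accesses are
# local. The source is modeled as a dict; names are invented.

class ShadowShare:
    def __init__(self, source):
        self.source = source          # read-only source filesystem
        self.local = {}               # data already migrated locally
        self.pending = set(source)    # files not yet migrated

    def _migrate(self, name):
        if name in self.pending:
            self.local[name] = self.source[name]   # fetched over NFS in reality
            self.pending.discard(name)

    def read(self, name):
        self._migrate(name)           # first access pays the migration latency
        return self.local[name]       # later accesses are purely local

    def background_step(self):
        """Migrate one outstanding file; driven by the background service."""
        if self.pending:
            self._migrate(next(iter(self.pending)))
```

Between client reads, the background service repeatedly takes steps like background_step() until the pending set is empty and the migration is complete.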

The downside to shadow migration is that it requires a commitment before the
data has finished migrating, though this is the case with any interposition method.
During the migration, portions of the data exist in two locations, which
means that backups are more complicated, and snapshots may be incomplete and/or exist
only on one host. Because of this, it is extremely important that
any migration between two hosts first be tested thoroughly to make sure that
identity management and access controls are set up correctly. This need not test
the entire data migration, but it should be verified that files or directories
that are not world readable are migrated correctly, ACLs (if any) are preserved,
and identities are properly represented on the new system.

Shadow migration is implemented using on-disk data within the filesystem, so there is
no external database and no data stored locally outside the storage pool. If
a pool is failed over in a cluster, or both system disks
fail and a new head node is required, all data necessary to continue
shadow migration without interruption will be kept with the storage pool.

Shadow migration behavior

Restrictions on shadow source

In order to properly migrate data, the source filesystem or directory *must be read-only*. Changes made to files on the source may or may not be propagated, depending on timing, and changes to the directory structure can result in unrecoverable errors on the appliance.

Shadow migration supports migration only from NFS sources. NFSv4 shares will yield the best results. NFSv2 and NFSv3 migration are possible, but ACLs will be lost in the process and files that are too large for NFSv2 cannot be migrated using that protocol. Migration from SMB sources is not supported.

Shadow migration of LUNs is not supported.

Shadow filesystem semantics during migration

If the client accesses a file or directory that has not yet
been migrated, there is an observable effect on behavior:

For directories, client requests are blocked until the entire directory is migrated. For files, only the portion of the file being requested is migrated, and multiple clients can migrate different portions of a file at the same time.

Files and directories can be arbitrarily renamed, removed, or overwritten on the shadow filesystem without any effect on the migration process.

For files that are hard links, the hard link count may not match the source until the migration is complete.

The majority of file attributes are migrated when the directory is created, but the on-disk size (st_nblocks in the UNIX stat structure) is not available until a read or write operation is done on the file. The logical size will be correct, but a du(1) or other command will report a zero size until the file contents are actually migrated.

If the appliance is rebooted, the migration will pick up where it left off originally. While it will not have to re-migrate data, it may have to traverse some already-migrated portions of the local filesystem, so there may be some impact to the total migration time due to the interruption.

Data migration makes use of private extended attributes on files. These are generally not observable except on the root directory of the filesystem or through snapshots. Adding, modifying, or removing any extended attribute that begins with SUNWshadow will have undefined effects on the migration process and will result in incomplete or corrupt state. In addition, filesystem-wide state is stored in the .SUNWshadow directory at the root of the filesystem. Any modification to this content will have a similar effect.

Once a filesystem has completed migration, an alert will be posted, and the shadow attribute will be removed, along with any applicable metadata. After this point, the filesystem will be indistinguishable from a normal filesystem.

Data can be migrated across multiple filesystems into a single filesystem, through the use of NFSv4 automatic client mounts (sometimes called "mirror mounts") or nested local mounts.
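The st_nblocks behavior noted above can be reproduced with an ordinary sparse file, which likewise reports its full logical size while having few or no allocated blocks until data is written:

```python
# A quick demonstration of logical size vs. on-disk size: truncating a file
# to 1 MiB sets st_size without allocating blocks, which is how a
# migrated-but-unread file appears to du(1) during shadow migration.

import os
import tempfile

def make_sparse(path, size):
    with open(path, "wb") as f:
        f.truncate(size)              # sets logical size without writing data

tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "unread")
make_sparse(path, 1 << 20)            # 1 MiB logical size
st = os.stat(path)
print(st.st_size)                     # logical size: 1048576
print(st.st_blocks * 512)             # allocated bytes: typically 0
```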

Identity and ACL migration

In order to properly migrate identity information for files, including ACLs, the
following rules must be observed:

The migration source and target appliance must have the same name service configuration.

The migration source and target appliance must have the same NFSv4 mapid domain.

The migration source must support NFSv4. Use of NFSv3 is possible, but some loss of information will result. Basic identity information (owner and group) and POSIX permissions will be preserved, but any ACLs will be lost.

The migration source must be exported with root permissions to the appliance.

If you see files or directories owned by "nobody", it likely means
that the appliance does not have name services set up correctly, or that
the NFSv4 mapid domain is different. If you get 'permission denied'
errors while traversing filesystems that the client should otherwise have access to,
the most likely problem is failure to export the migration source with
root permissions.

Shadow Migration Management

Creating a shadow filesystem

The shadow migration source can only be set when a filesystem is
created. In the BUI, this is available in the filesystem creation
dialog. In the CLI, it is available as the shadow property.
The property takes one of the following forms:

Local - file:///<path>

NFS - nfs://<host>/<path>

The BUI also allows the alternate form <host>:/<path> for NFS mounts, which
matches the syntax used in UNIX systems. The BUI also sets
the protocol portion of the setting (file:// or nfs://) via a
pull-down menu. When creating a filesystem, the server
will verify that the path exists and can be mounted.
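The accepted forms can be illustrated with a small parser. This is a hypothetical sketch; the appliance performs its own validation, including verifying that the path can be mounted, server-side:

```python
# Illustrative parser for the shadow property forms described above:
#   file:///<path>, nfs://<host>/<path>, and the BUI's <host>:/<path>.
# Function name and error messages are invented for this sketch.

from urllib.parse import urlparse

def parse_shadow(value):
    """Return ('file', path) or ('nfs', host, path), or raise ValueError."""
    u = urlparse(value)
    if u.scheme == "file":
        if not u.path:
            raise ValueError("file:// source needs a path")
        return ("file", u.path)
    if u.scheme == "nfs":
        if not u.hostname or not u.path:
            raise ValueError("nfs:// source needs a host and a path")
        return ("nfs", u.hostname, u.path)
    # Alternate BUI form: <host>:/<path>
    host, sep, path = value.partition(":/")
    if sep and host and "/" not in host:
        return ("nfs", host, "/" + path)
    raise ValueError("unrecognized shadow source: " + value)
```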

Managing background migration

When a share is created, it will automatically begin migrating in the
background, in addition to servicing inline requests. This migration is controlled
by the shadow migration service. There is a single global tunable which is
the number of threads dedicated to this task. Increasing the number
of threads will result in greater parallelism at the expense of additional
resources.
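The effect of the thread tunable can be sketched as a simple worker pool. The function and parameter names here are illustrative, not the appliance's actual implementation:

```python
# Sketch of tunable background-migration parallelism: more worker threads
# migrate more files concurrently, at the cost of additional load on both
# the appliance and the source. All names are invented.

from concurrent.futures import ThreadPoolExecutor

def background_migrate(pending, fetch, threads=8):
    """Migrate every entry in `pending` using `threads` worker threads."""
    migrated = {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for name, data in zip(pending, pool.map(fetch, pending)):
            migrated[name] = data
    return migrated
```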

The shadow migration service can be disabled, but this should only be
used for testing purposes, or when the activity of shadow migration is
overwhelming the system to the point where it needs to be temporarily
stopped. When the shadow migration service is disabled, synchronous requests are
still migrated as needed, but no background migration occurs. With the
service disabled, no shadow migration will ever complete, even if all the
contents of the filesystem are read manually. It is highly recommended
to always leave the service enabled.

Handling errors

Because shadow migration requires committing new writes to the server prior to
migration being complete, it is very important to test migration and monitor
for any errors. Errors encountered during background migration are kept and
displayed in the BUI as part of shadow migration status. Errors
encountered during synchronous migration are not tracked, but will be accounted
for once the background process accesses the affected file. For each
file, the remote filename as well as the specific error are kept.
Clicking on the information icon next to the error count will
bring up this detailed list. The error list is not updated
as errors are fixed, but simply cleared by virtue of the migration
completing successfully.

Shadow migration will not complete until all files are migrated successfully.
If there are errors, the background migration will continually retry the migration
until it succeeds. This allows the administrator to fix any errors
(such as permission problems), let the migration complete, and be assured of
success. If the migration cannot complete due to persistent errors, the
migration can be canceled, leaving the local filesystem with whatever data was
able to be migrated. This should only be used as a
last resort - once migration has been canceled, it cannot be resumed.
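The retry behavior described above can be sketched as follows. This is an invented in-memory model; the appliance tracks this state on disk within the storage pool:

```python
# Sketch of retry-until-success: each background pass retries files that
# previously failed, so fixing the underlying problem (e.g. permissions)
# lets the migration finish on a later pass. Names are invented.

def run_background_migration(files, fetch, max_passes=100):
    """Retry failed files each pass; return (migrated, error_log)."""
    errors = {}
    remaining = list(files)
    migrated = {}
    for _ in range(max_passes):
        still_failing = []
        for name in remaining:
            try:
                migrated[name] = fetch(name)
                errors.pop(name, None)     # cleared once migration succeeds
            except OSError as e:
                errors[name] = str(e)      # kept for display to administrator
                still_failing.append(name)
        remaining = still_failing
        if not remaining:
            break                          # migration complete
    return migrated, errors
```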

Monitoring progress

Monitoring progress of a shadow migration is difficult given the context in
which the operation runs. A single filesystem can shadow all or
part of a filesystem, or multiple filesystems with nested mountpoints. As
such, there is no way to request statistics about the source and
have any confidence in them being correct. In addition, even with
migration of a single filesystem, the methods used to calculate the available
size are not consistent across systems. For example, the remote filesystem
may use compression, or it may or may not include metadata overhead.
For these reasons, it's impossible to display an accurate progress bar for
any particular migration.

The appliance provides the following information that is guaranteed to be
accurate:

Size of the local filesystem so far

Logical size of the data copied so far

Time spent migrating data so far

These values are made available in the BUI and CLI through both
the standard filesystem properties as well as properties of the shadow migration
node (or UI panel). If you know the size of the
remote filesystem, you can use this to estimate progress. The size
of the data copied consists only of plain file contents that needed
to be migrated from the source. Directories, metadata, and extended attributes
are not included in this calculation. While the size of the
data migrated so far includes only remotely migrated data, resuming background migration
may traverse parts of the filesystem that have already been migrated.
This can cause it to run fairly quickly while processing these initial
directories, and slow down once it reaches portions of the filesystem that
have not yet been migrated.

While there is no accurate measurement of progress, the appliance does attempt
to make an estimation of remaining data based on the assumption of
a relatively uniform directory tree. This estimate can range from fairly
accurate to completely worthless depending on the dataset, and is for information
purposes only. For example, one could have a relatively shallow filesystem
tree but have large amounts of data in a single directory that
is visited last. In this scenario, the migration will appear almost
complete, and then rapidly drop to a very small percentage as this
new tree is discovered. Conversely, if that large directory was processed
first, then the estimate may assume that all other directories have a
similarly large amount of data, and when it finds them mostly empty
the estimate quickly rises from a small percentage to nearly complete.
The best way to measure progress is to set up a test migration,
let it run to completion, and use that value to estimate progress
for filesystems of similar layout and size.
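The uniform-tree assumption can be sketched as a simple extrapolation. The formula below is an illustration of the idea, not the appliance's actual estimator:

```python
# Sketch of a uniform-tree estimate: if the directories visited so far held
# `seen_bytes` in total, assume each unvisited directory holds a similar
# amount. Function and parameter names are invented.

def estimate_remaining(seen_bytes, seen_dirs, total_dirs):
    """Projected bytes still to copy, assuming a uniform directory tree."""
    if seen_dirs == 0:
        return None                   # nothing visited yet, no estimate
    per_dir = seen_bytes / seen_dirs
    return per_dir * max(total_dirs - seen_dirs, 0)
```

Under this assumption, ten visited directories holding 100 bytes with ten directories left projects another 100 bytes; if the remaining directories actually hold most of the data, the estimate is correspondingly far off, exactly the failure mode described above.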

Canceling migration

Migration can be canceled, but should only be done in extreme circumstances
when the source is no longer available. Once migration has been
canceled, it cannot be resumed. The primary purpose is to allow
migration to complete when there are uncorrectable errors on the source.
If the filesystem has finished migrating except for a few files
or directories, and there is no way to correct these errors (i.e.
the source is permanently broken), then canceling the migration will allow the
local filesystem to resume status as a 'normal' filesystem.

To cancel migration in the BUI, click the close icon next to
the progress bar in the left column of the share in question.
In the CLI, navigate to the shadow node beneath the filesystem
and run the cancel command.

Snapshots of shadow filesystems

Shadow filesystems can be snapshotted; however, the state of what is included
in the snapshot is arbitrary. Files that have not yet been
migrated will not be present, and implementation details (such as SUNWshadow extended
attributes) may be visible in the snapshot. This snapshot can be
used to restore individual files that have been migrated or modified since
the original migration began. Because of this, it is recommended that
any snapshots be kept on the source until the migration is completed,
so that unmigrated files can still be retrieved from the source if
necessary. Depending on the retention policy, it may be necessary to
extend retention on the source in order to meet service requirements.

While snapshots can be taken, these snapshots cannot be rolled back to,
nor can they be the source of a clone. This reflects
the inconsistent state of the on-disk data during the migration.

Backing up shadow filesystems

Filesystems that are actively migrating shadow data can be backed up using NDMP
as with any other filesystem. The shadow setting is preserved with
the backup stream, but will be restored only if a complete restore
of the filesystem is done and the share doesn't already exist.
Restoring individual files from such a backup stream or restoring into existing
filesystems may result in inconsistent state or data corruption. During the
full filesystem restore, the filesystem will be in an inconsistent state (beyond
the normal inconsistency of a partial restore) and shadow migration will not
be active. Only when the restore is completed is the shadow
setting restored. If the shadow source is no longer present or
has moved, the administrator can observe any errors and correct them as
necessary.

Replicating shadow filesystems

Filesystems that are actively migrating shadow data can be replicated using the
normal mechanism, but only the migrated data is sent in the data
stream. As such, the remote side contains only partial data that
may represent an inconsistent state. The shadow setting is sent along
with the replication stream, so when the remote target is failed over,
it will keep the same shadow setting. As with restoring an
NDMP backup stream, this setting may be incorrect in the context of
the remote target. After failing over the target, the administrator can
observe any errors and correct the shadow setting as necessary for the
new environment.

Shadow migration analytics

In addition to standard monitoring on a per-share basis, it's also possible
to monitor shadow migration system-wide through Analytics. The shadow migration analytics are
available under the "Data Movement" category. The following statistics are available:

Shadow migration requests

This statistic tracks requests for files or directories that are not cached
and known to be local to the filesystem. It accounts
for both migrated and unmigrated files and directories, and can be used
to track the latency incurred as part of shadow migration, as well
as track the progress of background migration. It can be broken
down by file, share, project, or latency. It currently encompasses both
synchronous and asynchronous (background) migration, so it's not possible to view only
latency visible to clients.

Shadow migration bytes

This statistic tracks bytes transferred as part of migrating file or directory
contents. This does not apply to metadata (extended attributes, ACLs, etc).
It gives a rough approximation of the data transferred, but source datasets
with a large amount of metadata will show a disproportionately small bandwidth.
The complete bandwidth can be observed by looking at network analytics.
This statistic can be broken down by local filename, share, or
project.

Shadow migration operations

This statistic tracks operations that require going to the source filesystem.
This can be used to track the latency of requests from the
shadow migration source. It can be broken down by file, share,
project, or latency.

Migration of local filesystems

In addition to its primary purpose of migrating data from remote sources,
the same mechanism can also be used to migrate data from one local filesystem
to another on the appliance. This can be used to change settings
that otherwise can't be modified, such as creating a compressed version of a
filesystem, or changing the recordsize for a filesystem after the fact. In
this model, the old share (or subdirectory within a share) is made read-only
or moved aside, and a new share is created with the shadow property
set using the file protocol. Clients access this new share, and data
is written using the settings of the new share.

Tasks

Testing potential shadow migration

Before attempting a complete migration, it is important to test the migration to
make sure that the appliance has appropriate permissions and that security attributes
are translated correctly.

Configure the source so that the Sun Storage 7000 appliance has root access to the share. This typically involves adding an NFS host-based exception, or setting the anonymous user mapping (the latter having more significant security implications).

Create a share on the local filesystem with the shadow attribute set to 'nfs://<host>/<snapshotpath>' in the CLI or just '<host>/<snapshotpath>' in the BUI (with the protocol selected as 'NFS'). The snapshot should be a read-only copy of the source. If no snapshots are available, a read-write source can be used, but doing so may result in undefined errors.

Validate that file contents and identity mapping are correctly preserved by traversing the file structure.

If the data source is read-only (as with a snapshot), let the migration complete and verify that there were no errors in the transfer.

Migrating data from an active NFS server

Once you are confident that the basic setup is functional, the shares
can be set up for the final migration.

Schedule downtime during which clients can be quiesced and reconfigured to point to a new server.

Configure the source so that the Sun Storage 7000 appliance has root access to the share. This typically involves adding an NFS host-based exception, or setting the anonymous user mapping (the latter having more significant security implications).

Configure the source to be read-only. This step is technically optional, but it is much easier to guarantee compliance if it's impossible for misconfigured clients to write to the source while migration is in progress.

Create a share on the local filesystem with the shadow attribute set to 'nfs://<host>/<path>' in the CLI or just '<host>/<path>' in the BUI (with the protocol selected as 'NFS').

Reconfigure clients to point at the local share on the SS7000.

At this point shadow migration should be running in the background, and
client requests should be serviced as necessary. You can observe the
progress as described above. Multiple shares can be created during a
single scheduled downtime through scripting the CLI.
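Such scripting might begin with a helper like the following. This is hypothetical Python that only derives the shadow property value for each share; the actual CLI commands to create the shares are appliance-specific and are not shown here:

```python
# Hypothetical helper for batching many migrations in one downtime window:
# given a source host and its exported paths, build the nfs:// shadow value
# for each new share, keyed by a share name taken from the last path
# component. All names are illustrative.

def shadow_sources(host, paths):
    """Map each share name to the nfs:// shadow value for its source path."""
    return {p.strip("/").split("/")[-1]: "nfs://%s/%s" % (host, p.strip("/"))
            for p in paths}
```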