Introducing Asynchronous Cross-Account Copy Blob

We are excited to introduce some changes to the Copy Blob API in version 2012-02-12 that allow you to copy blobs between storage accounts. This enables some interesting scenarios, such as:

Back up your blobs to another storage account without having to retrieve the content and save it yourself

Migrate your blobs from one account to another efficiently with respect to cost and time

NOTE: To allow cross-account copy, the destination storage account must have been created on or after June 7th 2012. This limitation applies only to cross-account copy; accounts created earlier can still copy within the same account. If the destination account was created before June 7th 2012, a cross-account Copy Blob operation will fail with HTTP status code 400 (Bad Request) and the storage error code “CopyAcrossAccountsNotSupported.”

In this blog post, we will go over some of the changes that were made, along with some best practices for using this API. We will also show sample code that uses the new Copy Blob APIs with SDK 1.7.1, which is available on GitHub.

In versions prior to 2012-02-12, the x-ms-copy-source request header was specified as “/<account name>/<fully qualified blob name with container name and snapshot time if applicable>”. With version 2012-02-12, x-ms-copy-source must instead be specified as a URL. This is a versioned change: specifying the old format with the new version fails with 400 (Bad Request). The new format allows users to specify a shared access signature or use a custom storage domain name. When the source blob is in a different account than the destination, the source blob must be either:

A publicly accessible blob (i.e. the container ACL is set to be public)

A private blob, provided the source URL is pre-authenticated with a Shared Access Signature (i.e., a pre-signed URL) that grants read permission on the source blob
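The header change and the two cross-account rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not the client library's API: build_copy_blob_request is a hypothetical helper that only describes the request (request signing is omitted), it treats different host names as different accounts, and it treats a sig query parameter as evidence of a Shared Access Signature.

```python
from typing import Dict
from urllib.parse import parse_qs, urlsplit

API_VERSION = "2012-02-12"


def build_copy_blob_request(
    dest_url: str, source_url: str, source_is_public: bool = False
) -> Dict[str, object]:
    """Describe (but do not send) a Copy Blob request for version 2012-02-12.

    Authorization for the destination account is omitted on purpose;
    request signing is outside the scope of this sketch.
    """
    src = urlsplit(source_url)
    dst = urlsplit(dest_url)
    if src.scheme not in ("http", "https") or not src.netloc:
        # The old "/account/container/blob" form is rejected with
        # 400 (Bad Request) when x-ms-version is 2012-02-12.
        raise ValueError("x-ms-copy-source must be an absolute URL")

    # Heuristic (assumption): different hosts mean different accounts, and a
    # "sig" query parameter means the URL carries a Shared Access Signature.
    # Custom domains would need a real account lookup instead.
    cross_account = src.netloc.lower() != dst.netloc.lower()
    has_sas = "sig" in parse_qs(src.query)
    if cross_account and not (source_is_public or has_sas):
        raise ValueError("cross-account source must be public or SAS-authenticated")

    return {
        "method": "PUT",
        "url": dest_url,
        "headers": {"x-ms-version": API_VERSION, "x-ms-copy-source": source_url},
    }
```

For a private source in another account, the source URL would carry its SAS query string, for example `https://sourceaccount.blob.core.windows.net/photos/cat.jpg?sv=2012-02-12&sr=b&sp=r&sig=<signature>` (account, container, and signature are placeholders).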

A copy operation preserves the type of the blob: a block blob will be copied as a block blob and a page blob will be copied to the destination as a page blob. If the destination blob already exists, it will be overwritten. However, if the destination type (for an existing blob) does not match the source type, the operation fails with HTTP status code 400 (Bad Request).

Note: The source blob can even be outside of Windows Azure, as long as it is publicly accessible or accessible through some form of signed URL. Such external sources are always copied to block blobs.
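The blob-type rules can be captured in a few lines. The names below are hypothetical, and one assumption goes beyond the text: an external source is treated as a block-blob source when checking the type of an existing destination.

```python
from typing import Optional

BLOCK_BLOB = "BlockBlob"
PAGE_BLOB = "PageBlob"


class CopyTypeMismatch(Exception):
    """Raised where the service would answer 400 (Bad Request)."""


def destination_type(source_type: Optional[str], existing_dest_type: Optional[str] = None) -> str:
    """Return the blob type the destination ends up with.

    source_type is BLOCK_BLOB, PAGE_BLOB, or None for a source outside
    Windows Azure (such sources are copied to block blobs).
    existing_dest_type is None when the destination does not exist yet.
    """
    resulting = BLOCK_BLOB if source_type is None else source_type
    if existing_dest_type is not None and existing_dest_type != resulting:
        raise CopyTypeMismatch(f"cannot overwrite {existing_dest_type} with {resulting}")
    # A missing destination is created; a matching one is simply overwritten.
    return resulting
```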

Making copy asynchronous is a major change from previous versions. Previously, the Blob service returned a successful response only after the copy operation had completed. With version 2012-02-12, the Blob service instead schedules the copy operation to be completed asynchronously: a success response only indicates that the copy operation has been successfully scheduled. As a consequence, a successful Copy Blob call now returns HTTP status code 202 (Accepted) instead of 201 (Created).

A few important points:

There can be only one pending copy operation to a given destination blob URL at a time. However, a single source blob can be the source of many outstanding copies at once.

The asynchronous copy blob runs in the background using spare bandwidth capacity, so there is no SLA in terms of how fast a blob will be copied.

Currently there is no limit on the number of pending copy operations that can be queued for a storage account, but a pending copy can live in the system for at most 2 weeks; a copy still pending after that is terminated.

If the source storage account is in a different location from the destination storage account, the source storage account is charged egress for the copy at the published bandwidth rates.

When a copy is pending, any attempt to modify, snapshot, or lease the destination blob will fail.
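A migration tool may want to mirror these constraints on the client so that it avoids conflicts and notices copies that exceed the two-week limit. The CopyTracker class below is a hypothetical, in-memory sketch; the service remains the authority on copy state.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# The service terminates copies that stay pending longer than this.
MAX_PENDING_AGE = timedelta(weeks=2)


class CopyTracker:
    """In-memory mirror of the pending-copy rules above (illustrative only)."""

    def __init__(self) -> None:
        # destination URL -> (source URL, time the copy was scheduled)
        self._pending: Dict[str, Tuple[str, datetime]] = {}

    def can_start(self, dest_url: str) -> bool:
        # One pending copy per destination; a source may feed many copies.
        return dest_url not in self._pending

    def start(self, source_url: str, dest_url: str, now: datetime) -> None:
        if not self.can_start(dest_url):
            raise RuntimeError(f"a copy to {dest_url} is already pending")
        self._pending[dest_url] = (source_url, now)

    def finish(self, dest_url: str) -> None:
        # Call when polling reports success, failed, or aborted.
        self._pending.pop(dest_url, None)

    def overdue(self, now: datetime) -> List[str]:
        """Destinations whose copy has been pending beyond the two-week limit."""
        return sorted(
            dest
            for dest, (_, started) in self._pending.items()
            if now - started > MAX_PENDING_AGE
        )
```

Keying the tracker by destination rather than by source matches the rule that one source blob may feed many concurrent copies.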

Below we break down the key concepts of the new Copy Blob API.

Copy Blob Scheduling: when the Blob service receives a Copy Blob request, it first ensures that the source exists and can be accessed. If the source does not exist or cannot be accessed, HTTP status code 400 (Bad Request) is returned. Any source access conditions that are provided are validated as well; if they do not match, HTTP status code 412 (Precondition Failed) is returned. Once the source is validated, the service validates any conditions provided for the destination blob (if it exists); if these checks fail, HTTP status code 412 (Precondition Failed) is returned. If a copy operation is already pending on the destination, the service returns HTTP status code 409 (Conflict). Once validation is complete, the service initializes the destination blob, schedules the copy, and returns a success response to the user. If the source is a page blob, the service creates a page blob with the same length as the source, with all bytes zeroed out. If the source is a block blob, the service commits a zero-length block blob for the pending copy operation. The service maintains a few copy-specific properties during the copy operation so that clients can poll the status and progress of their copies.
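The order of checks can be modeled as a pure function. This is an illustrative model of the sequence described above, not service code; the field names in CopyRequestChecks are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CopyRequestChecks:
    """Facts the service evaluates when a Copy Blob request arrives (a model)."""
    source_accessible: bool               # source exists and may be read
    source_conditions_met: bool = True    # e.g. x-ms-source-if-match
    dest_conditions_met: bool = True      # e.g. If-Match on an existing destination
    dest_has_pending_copy: bool = False
    source_is_page_blob: bool = False
    source_length: int = 0


def schedule_copy(checks: CopyRequestChecks) -> Tuple[int, Optional[dict]]:
    """Return (HTTP status, initial destination state) in the documented order."""
    if not checks.source_accessible:
        return 400, None
    if not checks.source_conditions_met:
        return 412, None
    if not checks.dest_conditions_met:
        return 412, None
    if checks.dest_has_pending_copy:
        return 409, None
    if checks.source_is_page_blob:
        # Same length as the source; every byte reads as zero until copied.
        initial = {"type": "PageBlob", "length": checks.source_length}
    else:
        # A committed zero-length block blob acts as the placeholder.
        initial = {"type": "BlockBlob", "length": 0}
    return 202, initial
```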

Copy Blob Response: when a copy blob operation returns success to the client, this indicates the Blob service has successfully scheduled the copy operation to be completed. Two new response headers are introduced:

x-ms-copy-status: The status of the copy operation at the time the response was sent. It can be one of the following:

success: The copy operation has completed. This is analogous to previous versions, in which the copy operation completed synchronously.

pending: Copy operation is still pending and the user is expected to poll the status of the copy. (See “Polling for Copy Blob properties” below.)

x-ms-copy-id: The string token that is associated with the copy operation. This can be used when polling the copy status, or if the user wishes to abort a “pending” copy operation.
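A client might interpret the Copy Blob response roughly as follows. This sketch assumes the response headers are available as a plain mapping; interpret_copy_response is a hypothetical helper, and treating 201 as a synchronous completion reflects the pre-2012-02-12 behavior described earlier.

```python
from typing import Mapping, NamedTuple, Optional


class CopyStarted(NamedTuple):
    copy_id: Optional[str]  # None for a legacy synchronous response
    status: str             # "success" or "pending"


def interpret_copy_response(http_status: int, headers: Mapping[str, str]) -> CopyStarted:
    """Map a Copy Blob response onto (copy id, status)."""
    if http_status == 201:
        # Pre-2012-02-12 behavior: the copy already finished synchronously.
        return CopyStarted(copy_id=None, status="success")
    if http_status != 202:
        raise RuntimeError(f"Copy Blob was not scheduled (HTTP {http_status})")
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    status = lowered["x-ms-copy-status"]
    if status not in ("success", "pending"):
        raise ValueError(f"unexpected x-ms-copy-status: {status}")
    return CopyStarted(copy_id=lowered["x-ms-copy-id"], status=status)
```

A "pending" result means the copy ID should be kept for polling or for a later abort.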

Polling for Copy Blob properties: we now provide the following additional properties that allow users to track the progress of the copy, using Get Blob Properties, Get Blob, or List Blobs:

x-ms-copy-status (CopyStatus): The current status of the copy operation. It can be one of the following:

pending: Copy operation is pending.

success: Copy operation completed successfully.

aborted: Copy operation was aborted by a client.

failed: Copy operation failed to complete due to an error.

x-ms-copy-id (CopyId): The ID returned by the Copy Blob operation, which can be used to monitor the progress of the copy or to abort it.

x-ms-copy-status-description (CopyStatusDescription): Additional error information that can be used for diagnostics.

x-ms-copy-progress (CopyProgress): The amount of the blob copied so far, in the format X/Y, where X is the number of bytes copied and Y is the total number of bytes.

x-ms-copy-completion-time (CopyCompletionTime): The completion time of the last copy.
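Here is a sketch of turning these properties into a typed record, assuming they arrive as a mapping of header names to values (for example, from a Get Blob Properties response). All names are hypothetical, and the completion time is kept as the raw header string rather than parsed.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

COPY_STATUSES = {"pending", "success", "aborted", "failed"}


@dataclass
class CopyState:
    copy_id: str
    status: str
    bytes_copied: int
    total_bytes: int
    description: Optional[str]      # diagnostics, e.g. why a copy is still pending
    completion_time: Optional[str]  # raw header value, not parsed here


def parse_copy_progress(value: str) -> Tuple[int, int]:
    """Split an "X/Y" progress value into (bytes copied, total bytes)."""
    copied, sep, total = value.partition("/")
    if not sep:
        raise ValueError(f"malformed x-ms-copy-progress: {value!r}")
    x, y = int(copied), int(total)
    if not 0 <= x <= y:
        raise ValueError(f"inconsistent x-ms-copy-progress: {value!r}")
    return x, y


def copy_state_from_headers(headers: Mapping[str, str]) -> CopyState:
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    status = h["x-ms-copy-status"]
    if status not in COPY_STATUSES:
        raise ValueError(f"unknown x-ms-copy-status: {status}")
    copied, total = parse_copy_progress(h["x-ms-copy-progress"])
    return CopyState(
        copy_id=h["x-ms-copy-id"],
        status=status,
        bytes_copied=copied,
        total_bytes=total,
        description=h.get("x-ms-copy-status-description"),
        completion_time=h.get("x-ms-copy-completion-time"),
    )
```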

Copy Blob operations are retried on intermittent failures such as network errors or server-busy responses, and each failure is recorded in x-ms-copy-status-description, which lets users know why the copy is still pending.

While the copy operation is pending, writes to the destination blob are disallowed and fail with HTTP status code 409 (Conflict). You must abort the copy before writing to the destination.

Data integrity during asynchronous copy: The Blob service will lock onto a version of the source blob by storing the source blob ETag at the time of copy. This is done to ensure that any source blob changes can be detected during the course of the copy operation. If the source blob changes during the copy, the ETag will no longer match its value at the start of the copy, causing the copy operation to fail.
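The ETag lock can be illustrated with an in-memory simulation. This is a toy model of the behavior described above, not how the service is implemented: the copy remembers the ETag it started with and fails as soon as the source's ETag differs. The on_chunk parameter is a hypothetical hook used only to simulate a concurrent writer.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class SourceBlob:
    data: bytes
    etag: str


def copy_with_etag_lock(
    source: SourceBlob,
    chunk_size: int,
    on_chunk: Optional[Callable[[int], None]] = None,
) -> Tuple[str, bytes]:
    """Copy source.data in chunks, failing if the source ETag changes.

    Returns (final copy status, bytes written to the destination so far).
    """
    locked_etag = source.etag  # the source version this copy is bound to
    copied = bytearray()
    for offset in range(0, len(source.data), chunk_size):
        if source.etag != locked_etag:
            # The post reports this case as a 412 (Precondition Failed) failure.
            return "failed", bytes(copied)
        copied += source.data[offset:offset + chunk_size]
        if on_chunk is not None:
            on_chunk(offset)  # lets a test modify the source mid-copy
    return "success", bytes(copied)
```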

Aborting the Copy Blob operation: To allow canceling a pending copy, we have introduced the Abort Copy Blob operation in the 2012-02-12 version of REST API. The Abort operation takes the copy-id returned by the Copy operation and will cancel the operation if it is in the “pending” state. An HTTP status code 409 (Conflict) is returned if the state is not pending or the copy-id does not match the pending copy. The blob’s metadata is retained but the content is zeroed out on a successful abort.
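An Abort Copy Blob request can be described as below. The sketch assumes the REST form `PUT <blob URL>?comp=copy&copyid=<id>` with an `x-ms-copy-action: abort` header, plus an `x-ms-lease-id` header when the destination holds an active infinite lease; signing is omitted and the helper name is hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

API_VERSION = "2012-02-12"


def build_abort_copy_request(
    blob_url: str, copy_id: str, lease_id: Optional[str] = None
) -> Dict[str, object]:
    """Describe an Abort Copy Blob request for a pending copy (signing omitted)."""
    query = urlencode({"comp": "copy", "copyid": copy_id})
    headers = {"x-ms-version": API_VERSION, "x-ms-copy-action": "abort"}
    if lease_id is not None:
        # Needed when the destination blob holds an active infinite lease.
        headers["x-ms-lease-id"] = lease_id
    return {"method": "PUT", "url": f"{blob_url}?{query}", "headers": headers}
```

A 409 (Conflict) reply means the copy was no longer pending or the copy ID did not match, so the client should re-read the copy properties rather than retry blindly.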

Example: Monitoring code without error handling for brevity. NOTE: This sample assumes that no one else will start a different copy operation on the same destination blob. If that assumption does not hold for your scenario, please see “How do I prevent someone else from starting a new copy operation to overwrite my successful copy?” below.
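A minimal polling loop in Python, without error handling. Here fetch_properties is a hypothetical callable standing in for Get Blob Properties on the destination, and it must return the x-ms-copy-* values; the loop also checks that the copy ID still matches and stops if another copy has taken over the destination.

```python
import time
from typing import Callable, Mapping


def wait_for_copy(
    copy_id: str,
    fetch_properties: Callable[[], Mapping[str, str]],
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the copy is no longer pending; return its final status."""
    while True:
        props = fetch_properties()
        if props["x-ms-copy-id"] != copy_id:
            # Another copy operation now owns the destination blob.
            raise RuntimeError("destination is tied to a different copy operation")
        status = props["x-ms-copy-status"]
        if status != "pending":
            return status  # "success", "aborted" or "failed"
        sleep(interval_s)
```

Injecting sleep keeps the loop testable and lets callers substitute a backoff policy.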

In an asynchronous copy, once authorization is verified on the source, the service locks onto that version of the source by using its ETag value. If the source blob is modified while the copy operation is pending, the service fails the copy operation with HTTP status code 412 (Precondition Failed). To ensure that the source blob is not modified, the client can acquire and maintain a lease on the source blob. (See the Lease Blob REST API.)

With version 2012-02-12, we have introduced the concept of a lock (i.e., an infinite lease), which makes it easy for a client to hold on to a lease. A good option is for the copy job to acquire an infinite lease on the source blob before issuing the copy operation; the monitor job can then break the lease when the copy completes.
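The lock, copy, and unlock sequence could be planned as three REST requests. This sketch assumes the Lease Blob headers x-ms-lease-action, x-ms-lease-duration (-1 for infinite), and x-ms-proposed-lease-id, plus the optional x-ms-source-lease-id header on Copy Blob; the requests are only described, not signed or sent, and plan_source_locked_copy is hypothetical.

```python
from typing import Dict, Tuple

API_VERSION = "2012-02-12"
Request = Dict[str, object]


def plan_source_locked_copy(
    source_blob_url: str, dest_url: str, lease_id: str, source_sas: str = ""
) -> Tuple[Request, Request, Request]:
    """Describe lock, copy, and unlock requests (signing omitted)."""
    base = {"x-ms-version": API_VERSION}
    # The copy source may carry a SAS; the lease calls use the bare blob URL.
    copy_source = f"{source_blob_url}?{source_sas}" if source_sas else source_blob_url
    lock = {
        "method": "PUT",
        "url": f"{source_blob_url}?comp=lease",
        "headers": {**base, "x-ms-lease-action": "acquire",
                    "x-ms-lease-duration": "-1",  # -1 requests an infinite lease
                    "x-ms-proposed-lease-id": lease_id},
    }
    copy = {
        "method": "PUT",
        "url": dest_url,
        "headers": {**base, "x-ms-copy-source": copy_source,
                    # Optional: proceed only while our lease on the source is active.
                    "x-ms-source-lease-id": lease_id},
    }
    # Issued later by the monitor job, once polling reports completion.
    unlock = {
        "method": "PUT",
        "url": f"{source_blob_url}?comp=lease",
        "headers": {**base, "x-ms-lease-action": "break"},
    }
    return lock, copy, unlock
```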

How do I prevent someone else from starting a new copy operation to overwrite my successful copy?

During a pending copy, the Blob service ensures that no client request can write to the destination blob. The copy properties are kept on the blob after the copy finishes (whether it failed, was aborted, or succeeded). However, they are removed when any write command, such as Put Blob, Put Block List, Set Blob Metadata, or Set Blob Properties, is issued on the destination blob. The following operations retain the copy properties: Lease Blob, Put Page, and Put Block. A monitoring component that must confirm a copy has completed therefore needs these properties to be retained until it has verified the copy.

To prevent any writes to the destination blob once the copy is completed, the copy job should acquire an infinite lease on the destination blob and provide it as the destination access condition when starting the copy blob operation. The copy operation only allows infinite leases on the destination blob: the service already prevents writes to the destination, and a fixed-duration lease would force the client to keep issuing Renew Lease requests on the destination blob. Because acquiring a lease requires the blob to exist, the client needs to create an empty blob before issuing the copy. To terminate an infinite lease on a destination blob with a pending copy operation, you must abort the copy operation before issuing the break request on the lease.
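The destination-side steps map onto three requests. This is a sketch only: it assumes the source is a block blob (so the empty placeholder has a matching type), omits signing, and plan_protected_copy is a hypothetical helper.

```python
import uuid
from typing import Dict, List, Optional, Tuple

API_VERSION = "2012-02-12"


def plan_protected_copy(
    source_url: str, dest_url: str, lease_id: Optional[str] = None
) -> Tuple[str, List[Dict[str, object]]]:
    """Return (lease id, ordered requests) that pin the destination with an
    infinite lease before copying into it."""
    lease_id = lease_id or str(uuid.uuid4())
    base = {"x-ms-version": API_VERSION}
    return lease_id, [
        # 1. A lease needs an existing blob, so create an empty block blob first.
        {"method": "PUT", "url": dest_url,
         "headers": {**base, "x-ms-blob-type": "BlockBlob", "Content-Length": "0"}},
        # 2. Take an infinite lease (-1); only infinite leases are accepted
        #    on the destination of a copy.
        {"method": "PUT", "url": f"{dest_url}?comp=lease",
         "headers": {**base, "x-ms-lease-action": "acquire",
                     "x-ms-lease-duration": "-1",
                     "x-ms-proposed-lease-id": lease_id}},
        # 3. Start the copy, passing the lease as the destination access condition.
        {"method": "PUT", "url": dest_url,
         "headers": {**base, "x-ms-copy-source": source_url,
                     "x-ms-lease-id": lease_id}},
    ]
```

Holding the lease after the copy succeeds keeps the copy properties intact until the monitor has verified them.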

At this point, we have released the 1.7.1 source on GitHub so that developers can compile it as part of their projects and start using it. It is not part of the SDK 1.7 release, but it will be included in our next SDK release (we do not have an ETA yet).

One of the issues with Copy Blob is that the destination blob cannot be accessed with a SAS connection string.

This means all my code can work with SAS, except that to copy a blob I have to connect to the raw account. Can we please fix this? If the SAS connection string is the same for the source and destination blob account, what is the issue? Even if they are not the same, with the new features above, as long as we have write permission, why can we not do a copy blob?

Are you planning a similar feature for copying tables (or is there something already)?

I'm looking for the ability to create backups of my tables in table storage, so that I can protect my data against accidental damage (ideally scheduled, incremental transfers to another storage account or blob).

What happens to the destination while the copy is in progress? Does the copy complete atomically, so that a reader sees a changed destination only after all changes are complete? Or can a reader observe partial changes? Similarly, when a copy operation fails, does it fail without making any observable changes? You also mention the possibility of multiple copies or multiple writers to the same destination. Could you explain what happens? Does the copy lock the destination, or can writes be lost in some orderings?

@TanjB – when copy begins, the destination cannot be written to and only one copy operation is allowed for the destination at a time. When a copy begins, the destination is overwritten.

Page blobs: the system issues a Put Blob request with length equal to the length of the source blob, and the copy does not complete atomically.

Block blobs: the system commits a 0-length blob. It then copies the blocks, and the blob is finally committed when the entire blob has been copied. The best way to tell whether the copy has completed, and hence whether the blob can be read, is to poll for copy completion (relying on the x-ms-copy-status and x-ms-copy-id headers). Does this help?

@Arthur, since we list blobs, the response from the server already contains the copy state, so we need not invoke FetchAttributes again. FetchAttributes is required only if you obtained the reference with GetBlockBlobReference or GetPageBlobReference, which do not issue any request to the server, so the state is not populated.

Is this released yet? I see a method, BeginCopyFromBlob, but it doesn't take a URI as a parameter; instead it takes the source blob reference, a callback method, and state. The method does not work as described in the blog.

@Raj, Azure Storage Client Library 1.7.0 did not have this support. It was added in 1.7.1, which was a source-only release on GitHub at the time. However, we have recently released 3.2.0, which introduced many enhancements including new features and performance improvements. We strongly recommend upgrading to the latest version, which can always be found on NuGet: http://www.nuget.org/…/WindowsAzure.Storage-Preview

I have multiple Azure accounts. I want to be able to do asynchronous copy from one to all the other accounts whenever something is updated on any of them. Would async copy work perfectly for me? The examples shown above, are they also supported in the Java API?