Some commands tend to be significantly slower with object stores than when invoked against
HDFS or other filesystems. This includes renaming files, listing files, find,
mv, cp, and rm.

Renaming Files

Unlike in a normal filesystem, renaming a directory in an object store usually takes
time at least proportional to the number of objects being manipulated. Because many of the
filesystem shell operations use renaming as the final stage of the operation, skipping that
stage can avoid long delays. Amazon S3's time to rename is proportional to the amount of data
being renamed, so the larger the files being worked on, the longer it will take. This can
become a significant delay.

We recommend that when using the hadoop fs -put and hadoop fs
-copyFromLocal commands, you set the -d option for a direct upload.
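A minimal sketch of a direct upload follows. The local file and bucket paths are hypothetical placeholders, and the upload_direct helper is introduced here only for illustration. The -d option tells hadoop fs -put to skip the temporary ._COPYING_ file, which avoids the final rename.

```shell
# Hypothetical paths: replace with your own local file and bucket.
SRC=/tmp/example.orc
DEST=s3a://bucket1/datasets/

# Upload straight to the destination path. With -d, hadoop fs -put does not
# write a temporary ._COPYING_ file first, so no rename is needed at the end.
upload_direct() {
  hadoop fs -put -d "$1" "$2"
}

# Usage, on a host with a configured Hadoop client:
#   upload_direct "$SRC" "$DEST"
```

The same -d option can be passed to hadoop fs -copyFromLocal.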

In Amazon S3, the time to rename a file depends on its size. The time to rename a
directory depends on the number and size of all files beneath that directory. For WASB, GCS
and ADLS, the time to rename is simply proportional to the number of files. If a rename
operation is interrupted, the object store may be left in an undefined state, with some of
the source files renamed and others still in their original paths. There may also be
duplicate copies of the data.

hadoop fs -mv s3a://bucket1/datasets s3a://bucket1/historical
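If a rename like this one is interrupted, one way to see what state the store was left in is to count the files under both paths. The following is only a sketch: rename_state is a hypothetical helper, and the paths passed to it are placeholders. It relies on hadoop fs -count, which reports the directory count, file count, content size, and path name for each path.

```shell
# Print file counts for the source and destination of a rename.
# Both arguments are hypothetical paths supplied by the caller.
rename_state() {
  for dir in "$1" "$2"; do
    # -count prints DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME.
    hadoop fs -count "$dir"
  done
}

# Usage, on a host with a configured Hadoop client:
#   rename_state s3a://bucket1/datasets s3a://bucket1/historical
```

If files are still reported under both paths, the rename did not complete and the data may be split or duplicated between them.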

Copy

The hadoop fs -cp operation reads each file and then writes it back to the
object store; the time to complete depends on the amount of data to copy, and on the
bandwidth between the local computer and the object store.

As an example, this copy
command will perform the copy by downloading all the data and uploading it again.
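A command of that kind might look like the following sketch. The bucket and directory names are hypothetical, and copy_via_client is a helper introduced only for illustration.

```shell
# Hypothetical source and destination paths within one bucket.
SRC=s3a://bucket1/datasets
DEST=s3a://bucket1/historical

# The client reads every file under SRC and writes it to DEST, so all the
# data crosses the network twice: once down to this host, once back up.
copy_via_client() {
  hadoop fs -cp "$1" "$2"
}

# Usage, on a host with a configured Hadoop client:
#   copy_via_client "$SRC" "$DEST"
```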