May 22, 2016

I am probably ten years too late in writing about the oldest AWS service, but the power of asynchronous file transfer in the AWS SDK is irresistible. Moving a 300 MB file takes just a few seconds, and it is very easy to plug this API into reactive backend services. For this exercise we assume that one service uploads a file to S3 and another service periodically checks for new files in that location.

Fundamentally, asynchronous operations require an instance of the TransferManager class and a callback to process status notifications. I wrapped the whole process into a few classes that abstract uploading to, downloading from, and detecting newly uploaded files in a pre-configured location in an S3 bucket.

The TransferManager API typically takes a Request object and an asynchronous status listener. It returns a Transfer instance that can be used to retrieve an error message in case of failure. Polling an S3 location for available files requires a loop because the results are returned in batches. Checking whether a file exists at a given S3 path is implemented as an attempt to fetch the corresponding file metadata, treating a thrown exception as "FileNotFound".
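The batch loop and the existence check can be sketched without touching AWS at all. The snippet below is only an illustration: ObjectStore, Page, S3Poller and InMemoryStore are hypothetical names invented for this post, not classes from the AWS SDK. ObjectStore stands in for the S3 client, with list playing the role of a paged listing call and fetchSize the role of a metadata fetch. In the v1 Java SDK the real counterparts would, as far as I know, be listObjects/listNextBatchOfObjects and getObjectMetadata.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeMap;

// Hypothetical stand-in for the S3 client; NOT an AWS SDK type.
interface ObjectStore {
    // Returns one batch of keys under the prefix, starting after the marker
    // (pass null to get the first batch).
    Page list(String prefix, String marker);

    // Returns the object size, or throws NoSuchElementException if the key is missing.
    long fetchSize(String key);
}

// One batch of listing results plus the marker needed to request the next one.
final class Page {
    final List<String> keys;
    final String nextMarker; // null when this is the last batch

    Page(List<String> keys, String nextMarker) {
        this.keys = keys;
        this.nextMarker = nextMarker;
    }
}

final class S3Poller {
    // Keeps requesting batches until the store reports that there are no more.
    static List<String> listAll(ObjectStore store, String prefix) {
        List<String> all = new ArrayList<>();
        String marker = null;
        do {
            Page page = store.list(prefix, marker);
            all.addAll(page.keys);
            marker = page.nextMarker;
        } while (marker != null);
        return all;
    }

    // "Exists" means "the metadata fetch did not throw".
    static boolean exists(ObjectStore store, String key) {
        try {
            store.fetchSize(key);
            return true;
        } catch (NoSuchElementException e) {
            return false;
        }
    }
}

// In-memory fake that returns keys in sorted order, batchSize at a time.
final class InMemoryStore implements ObjectStore {
    private final TreeMap<String, Long> objects = new TreeMap<>();
    private final int batchSize;

    InMemoryStore(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
    }

    void put(String key, long size) {
        objects.put(key, size);
    }

    @Override
    public Page list(String prefix, String marker) {
        // Skip everything up to and including the marker from the previous batch.
        Iterable<String> candidates = (marker == null)
                ? objects.keySet()
                : objects.tailMap(marker, false).keySet();
        List<String> keys = new ArrayList<>();
        String last = null;
        for (String key : candidates) {
            if (!key.startsWith(prefix)) {
                continue;
            }
            if (keys.size() == batchSize) {
                // The batch is full and more matches remain: hand back a marker.
                return new Page(keys, last);
            }
            keys.add(key);
            last = key;
        }
        return new Page(keys, null);
    }

    @Override
    public long fetchSize(String key) {
        Long size = objects.get(key);
        if (size == null) {
            throw new NoSuchElementException(key);
        }
        return size;
    }
}
```

With a batch size of 2 and five stored keys under a prefix, listAll makes three calls to list and still returns all five keys, which is the whole point of the loop. A production version would catch the SDK's own exception type in exists and probably distinguish "not found" from other failures such as missing permissions.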

S3 is a simple (duh!) service so there are only two additional notes. First, it is a good idea to encode some metadata into file names on S3, such as tenant id, version, or video resolution. A consumer can then decide how to handle a downloaded file just by parsing its name. Second, it's convenient to superimpose a "directory structure" onto the flat namespace of the S3 bucket abstraction.

So a reasonable file naming convention might include a three-part prefix prepended to all relative file paths: a namespace representing either an environment (e.g. PROD) or a developer (in development deployments), a backend service name, and a file schema version. The version part in particular makes upgrades much easier in production. For example,
"$BUCKET/prod/somesvc/v2/relative_path/file.ext".
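A small helper keeps this convention in one place instead of scattering string concatenation across services. This is a minimal sketch under my own assumptions: S3Key is a hypothetical class name, the part order (namespace, service, version) is taken from the example path above, and the bucket name is left out because it is not part of the object key.

```java
// Hypothetical helper for the namespace/service/version/relative-path convention.
final class S3Key {
    final String namespace;    // environment or developer name, e.g. "prod"
    final String service;      // backend service name, e.g. "somesvc"
    final String version;      // file schema version, e.g. "v2"
    final String relativePath; // everything after the three-part prefix

    S3Key(String namespace, String service, String version, String relativePath) {
        this.namespace = requireNonEmpty(namespace, "namespace");
        this.service = requireNonEmpty(service, "service");
        this.version = requireNonEmpty(version, "version");
        this.relativePath = requireNonEmpty(relativePath, "relativePath");
    }

    // Builds the full object key, without the bucket name.
    String toKey() {
        return String.join("/", namespace, service, version, relativePath);
    }

    // Splits a key back into its parts; the relative path may itself contain slashes.
    static S3Key parse(String key) {
        String[] parts = key.split("/", 4);
        if (parts.length < 4) {
            throw new IllegalArgumentException("Key does not follow the convention: " + key);
        }
        return new S3Key(parts[0], parts[1], parts[2], parts[3]);
    }

    private static String requireNonEmpty(String value, String name) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException(name + " must not be empty");
        }
        return value;
    }
}
```

A polling service can call S3Key.parse on each key it discovers and branch on the version field, which is what makes side-by-side upgrades manageable.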

The diagram below shows a typical sequence of operations for uploading a file and then finding it with S3 polling from a different service.