Working with Amazon S3 and FME

Question

What is Amazon S3?

Answer

FME and Cloud Storage Services

Users are shifting to storing their data in the cloud to take advantage of its benefits (security, durability, performance, scalability), often at a lower cost than on-premises storage. Amazon S3 is one of the most widely used object storage services.

Amazon S3

Amazon Simple Storage Service (S3) is a cloud storage service that can store an unlimited number of files, each up to 5 TB in size. This makes S3 a good candidate for storing GIS data in the cloud.

It has the following features:

It can store objects (i.e. files) from 1 byte to 5 TB each, and there is no limit on the number of objects.

By default, an account can have up to 100 buckets. A bucket is like a top-level directory, and its name must be unique across the entire S3 service.

Each object is stored in a bucket (a bucket can hold an unlimited number of objects) and is identified by a user-assigned key.

You can share an object with the public or restrict access to specific AWS users. Access control can be applied to buckets and to objects separately.

S3 has no concept of a directory. Within a bucket, all objects are stored in a flat structure, and a bucket cannot contain another bucket. You can, however, use a delimiter of your own choice in object keys to form a "hierarchy".

An object can have metadata associated with it.

You can enable versioning on a bucket. From then on, whenever an object in that bucket is changed (updated or deleted), the old version of the object is kept.
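
The flat key space and delimiter-based "hierarchy" described above can be illustrated with a short sketch. It is plain Python that only imitates, in memory, how listing a bucket by prefix and delimiter behaves; it does not call S3, and the function name list_level and the object keys are made up for illustration.

```python
def list_level(keys, prefix="", delimiter="/"):
    """Return (objects, common_prefixes) found directly under `prefix`.

    Keys that contain the delimiter after the prefix are rolled up into a
    single "folder-like" prefix; the rest are reported as objects.
    """
    objects, prefixes = [], set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something deeper exists: report only the next "folder" level
            prefixes.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return sorted(objects), sorted(prefixes)


# Hypothetical object keys, all stored flat in one bucket
keys = [
    "readme.txt",
    "parcels/2014/north.shp",
    "parcels/2014/south.shp",
    "parcels/2015/north.shp",
    "roads/highways.gml",
]

print(list_level(keys))                   # top level of the bucket
print(list_level(keys, "parcels/"))       # one level down
print(list_level(keys, "parcels/2014/"))  # the objects themselves
```

The "folders" exist only because the keys share a prefix up to a delimiter; removing the last object under a prefix makes that "folder" disappear.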

Accessing using FME Desktop 2014 and later

FME Desktop includes several transformers for working with files on S3: S3Uploader, S3Downloader, S3ObjectLister and S3Deleter. To use these transformers, you will need the Access Key ID and Secret Access Key for your S3 account, as well as the bucket name and object key. As of build 2015.1.1, all AWS transformers support multiple regions. You don't have to do anything; the region is detected automatically. Please see S3ObjectLister, S3Downloader, S3Uploader Transformers for more information on how to set up these transformers.

S3Uploader

When uploading, you can specify the path to a file or folder, or you can upload the result of an expression, which can include the @Value() of an attribute.

S3Downloader

You can download to a file or to an attribute. If you are downloading to an attribute and the files are large, ensure you have enough memory.

S3ObjectLister

This is useful if you are not familiar with the contents of the bucket. You can list all of the objects and then use the S3Downloader to download the files using the paths it returns.

If the bucket policy grants public READ access, you can simply take the object's URL and use it as the source for any file-based dataset.
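
As a rough illustration of what such a URL looks like, the sketch below builds one from a bucket name and an object key. It assumes the common virtual-hosted-style address (bucket name as a subdomain of s3.amazonaws.com) and a bucket name without dots; region-specific endpoints also exist. The function name, bucket and key are hypothetical.

```python
from urllib.parse import quote


def public_object_url(bucket, key):
    """Build a virtual-hosted-style URL for a publicly readable object.

    Assumption: the object is served from <bucket>.s3.amazonaws.com.
    """
    # Percent-encode the key but keep "/" so the key "hierarchy" stays readable
    return "https://{}.s3.amazonaws.com/{}".format(bucket, quote(key, safe="/"))


# Hypothetical bucket and key
url = public_object_url("example-gis-data", "parcels/2014/north parcels.shp")
print(url)
```

The resulting string can be used wherever a file-based reader accepts a URL as its source dataset.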

Accessing using FME Server 2014 and later

S3 Watch Publisher

FME Server 2014 introduces an Amazon Simple Storage Service publisher. This means you can watch any S3 bucket you have permission to access and trigger an event when a file is added, changed or deleted. The publisher polls the bucket for activity and publishes messages about that activity to FME Server topics.
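
The polling idea can be sketched as a comparison of two bucket listings taken at different times. This is only an illustration of the general technique, not FME Server's actual implementation; the function name and the listing data (object key mapped to ETag) are hypothetical.

```python
def diff_listings(previous, current):
    """Compare two {key: etag} snapshots and return a sorted list of events."""
    events = []
    for key, etag in current.items():
        if key not in previous:
            events.append(("added", key))
        elif previous[key] != etag:
            # Same key, different ETag: the object's content was replaced
            events.append(("changed", key))
    for key in previous:
        if key not in current:
            events.append(("deleted", key))
    return sorted(events)


# Hypothetical snapshots from two consecutive polls
previous = {"a.shp": "etag-1", "b.shp": "etag-2"}
current = {"b.shp": "etag-9", "c.shp": "etag-3"}

for event, key in diff_listings(previous, current):
    print(event, key)
```

Each resulting event corresponds to one message that a poller could publish to a topic.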

S3 buckets now natively support event notifications, so rather than polling with the FME S3 Watch publisher you can configure S3 to send messages to an SQS queue or a Lambda function. The AWS documentation on S3 event notifications has more details on how to do that. The S3 Watch publisher is still useful when you want to watch a public bucket or a bucket that you don't own.
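
For the native event notifications mentioned above, the following sketch shows how a consumer might pull the event type, bucket and object key out of a notification message. It assumes the JSON layout that S3 uses for these messages (a Records list with eventName and s3.bucket.name / s3.object.key fields, with the key URL-encoded); the sample message is hand-written and heavily trimmed, and the function name is hypothetical.

```python
import json
from urllib.parse import unquote_plus

# Hand-written, trimmed sample of a notification body (not captured from AWS)
SAMPLE_MESSAGE = json.dumps({
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-gis-data"},
                "object": {"key": "parcels/2014/north+parcels.shp", "size": 1024},
            },
        }
    ]
})


def summarize_events(message_body):
    """Return (event name, bucket, decoded key) for each record in a message."""
    records = json.loads(message_body).get("Records", [])
    return [
        (
            record["eventName"],
            record["s3"]["bucket"]["name"],
            # Keys arrive URL-encoded, with spaces sent as "+"
            unquote_plus(record["s3"]["object"]["key"]),
        )
        for record in records
    ]


print(summarize_events(SAMPLE_MESSAGE))
```

A message body read from an SQS queue, or the event passed to a Lambda function, could be handled along these lines before triggering further processing.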

S3 Watch Subscriber

FME Server 2014 introduces an Amazon Simple Storage Service subscriber. This means that any notification can trigger an upload to S3. As with the FME Desktop transformers, you will need the credentials for the account and the bucket/key. The file to be uploaded can be selected from the Shared Resources configured in FME Server, or can be dynamically selected from the notification content using email template language. These Notification Keywords specific to the S3 Subscriber may also be useful.

Accessing with older versions of FME

In previous versions of FME (2013 and earlier) the method for interacting with S3 was to mount your S3 storage on your computer as a network drive or local drive.

Various third-party software tools can mount S3 storage as a drive. These products usually provide advanced options such as using a proxy, setting a maximum upload or download rate, and limiting the number of concurrent transfers.

Below are some applications which we have tried (though of course there may be others, and we don't officially endorse one above any other).

Microsoft Windows

TntDrive and Gladinet have been tested to work well with FME. The setup steps are simple: you can mount S3 to a particular drive (e.g. Z:\) by supplying your S3 access key (i.e. user ID) and secret key (i.e. password) in these applications.

After mounting, you can select any file or folder on the mounted S3 drive as if your data were stored locally. You can use your mounted S3 drive as your dataset source, dataset destination, or both. The only noticeable difference is that the translation takes longer to finish because data has to be downloaded from or uploaded to the S3 server.

Linux/Mac

You need to use FUSE (Filesystem in Userspace) to mount a drive on your system. There are a couple of open source projects that mount S3 as a local file system: s3fs-fuse and S3Backer. FME can then access data on S3 through the mounted file system. Detailed setup instructions can be found in their respective source code repositories.