Experimenting With The Amazon Simple Storage Service (S3) API Using ColdFusion

Before I say anything, I should probably mention that as of ColdFusion 9.0.1, ColdFusion has had native file-support for Amazon S3 using the "s3://" protocol. That said, I wanted to try experimenting with the Amazon S3 REST API using ColdFusion's CFHttp functionality. I know that I'm like 5 years (at least) behind everyone else on this topic; so, this blog post won't add much to the conversation - really, this is just here for my own reference.

Amazon Simple Storage Service (S3) is a hugely scalable data storage system. But, it is not a file system; it is a key-value store. You can have it mimic a file system by using storage keys that look like file paths; many applications, including ColdFusion's native S3 integration, present S3 as a file hierarchy. But at the end of the day, that's just a user-friendly abstraction built on top of the resource key that identifies a stored object.
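To make the "keys, not folders" idea a little more concrete, here is a minimal sketch in which a "directory listing" is nothing more than a prefix match over flat strings. The key names and the getKeysWithPrefix() helper are hypothetical, made up for illustration; they are not part of S3 or of ColdFusion.

```cfml
<cfscript>
	// Hypothetical object keys, for illustration only. Each one is a single,
	// flat string; the slashes carry no special meaning to the store itself.
	objectKeys = [
		"signed-urls/helena.jpg",
		"signed-urls/notes.txt",
		"thumbnails/helena.jpg"
	];

	// Return the keys that start with the given (non-empty) prefix.
	function getKeysWithPrefix( required array keys, required string prefix ) {
		var matches = [];
		var key = "";
		for ( key in keys ) {
			// compare() is case-sensitive, unlike the == operator.
			if ( compare( left( key, len( prefix ) ), prefix ) == 0 ) {
				arrayAppend( matches, key );
			}
		}
		return( matches );
	}

	// "Listing the signed-urls folder" is just filtering on a prefix.
	writeDump( getKeysWithPrefix( objectKeys, "signed-urls/" ) );
</cfscript>
```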

The "not a file system" nature of Amazon S3 has other implications, as well, such as consistency. In some regions (but not all), S3 provides "eventual consistency." I don't have a full grasp of how "eventual" eventual consistency is; but in the US Standard region, due to the cross-country latency, Amazon does not guarantee read-after-write access.

Right now, I don't know whether this eventual consistency applies to every client of your application, or only across clients. Meaning, if I PUT an object into S3, can I (as the one who executed the PUT) read that object from S3 immediately? I'll have to do some more reading on this.

Ok, enough with the background, let's do some experimenting. For this post, all I want to do is try to upload an object to Amazon Simple Storage Service (S3), read it out as a binary, and provide an authenticated, public URL to the object.

Uploading Objects To Amazon Simple Storage Service (S3)

Amazon S3 can store just about anything with only the loosest of size constraints. It simply stores bytes. Those bytes can represent text files; those bytes can also represent images. We're going to try uploading an image of the beautiful and talented Helena Bonham Carter.

When posting the file to S3, we'll post its binary value as the Body of the post.

<!---
	Creates a structure with the secretKey and accessID so that I
	don't have to have them in the blog post.
--->
<cfinclude template="credentials.cfm" />

<!---
	This is the file we are going to upload. We need to read in the
	binary file since we aren't posting it like a form field - we're
	posting it as the BODY of the PUT request.
--->
<cfset content = fileReadBinary( expandPath( "./helena.jpg" ) ) />

<!---
	When uploading the file, we are going to save it at the
	following "Key". NOTE: S3 is NOT A FILE SYSTEM. It's a key/value
	store. While this resource address looks like a file path, it is
	a single key.
--->
<cfset resource = "/testing.bennadel.com/signed-urls/helena.jpg" />


<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->


<!---
	All requests to the S3 API have to be authenticated. Here, we are
	going to create the "signature" to be used in the Authorization
	header of the PUT request.
--->

<!---
	A timestamp is required for all authenticated requests (NOTE: This
	does not apply to query-string-authentication based requests).
--->
<cfset currentTime = getHttpTimeString( now() ) />

<!---
	The content type is not required; but it will be stored as meta-
	data with the object if supplied.
--->
<cfset contentType = "image/jpeg" />

<!---
	Set up the parts of the string to sign - we are not including any
	X-AMZ headers in this.
--->
<cfset stringToSignParts = [
	"PUT",
	"",
	contentType,
	currentTime,
	resource
] />

<!--- Collapse the parts into a newline-delimited list. --->
<cfset stringToSign = arrayToList( stringToSignParts, chr( 10 ) ) />

<!---
	The target string is then signed using Hmac-Sha1 hashing, and
	must be encoded as Base64. For this, I am using my Crypto.cfc
	component.

	NOTE: If you have ColdFusion 10, the hmac() function will now
	do this with a single function call.
--->
<cfset signature = new Crypto().hmacSha1(
	aws.secretKey,
	stringToSign,
	"base64"
) />


<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->


<!---
	Post the actual binary to the S3 bucket at the given resource.

	NOTE: Since we have not provided any ACL (Access Control List)
	permissions, the resource will be stored as *private* by default.
--->
<cfhttp
	result="put"
	method="put"
	url="https://s3.amazonaws.com#resource#">

	<cfhttpparam
		type="header"
		name="Authorization"
		value="AWS #aws.accessID#:#signature#"
		/>

	<cfhttpparam
		type="header"
		name="Content-Length"
		value="#arrayLen( content )#"
		/>

	<cfhttpparam
		type="header"
		name="Content-Type"
		value="#contentType#"
		/>

	<cfhttpparam
		type="header"
		name="Date"
		value="#currentTime#"
		/>

	<cfhttpparam
		type="body"
		value="#content#"
		/>

</cfhttp>

<!--- Dump out the Amazon S3 response. --->
<cfdump
	var="#put#"
	label="S3 Response"
	/>

By default, the object is stored with private access settings. This means that only authenticated users can view the object using the resource URL. You can pass a lot of additional settings with the PUT command, including access control permissions; but, for this blog post, I'll keep it as simple as possible.
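As one example of those additional settings, here is a rough sketch of how the string to sign would change if the upload asked for a canned ACL. It assumes the "public-read" ACL and the same header-based signing scheme used in the upload code; the acl variable is introduced only for this illustration. Any x-amz-* header that gets sent also has to appear in the string to sign, lowercased and written as "name:value", between the Date line and the resource.

```cfml
<cfscript>
	// Same values as in the upload request.
	contentType = "image/jpeg";
	currentTime = getHttpTimeString( now() );
	resource = "/testing.bennadel.com/signed-urls/helena.jpg";

	// Assumption for this sketch: make the object publicly readable.
	acl = "public-read";

	// The x-amz-acl header is added as its own line, after the date
	// and before the resource.
	stringToSignParts = [
		"PUT",
		"",
		contentType,
		currentTime,
		"x-amz-acl:#acl#",
		resource
	];

	// Collapse the parts into a newline-delimited list.
	stringToSign = arrayToList( stringToSignParts, chr( 10 ) );
</cfscript>
```

The request itself would then need one more CFHttpParam of type "header", with the name "x-amz-acl" and the value of acl, so that what S3 receives matches what was signed.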

Reading Objects From Amazon Simple Storage Service (S3)

Now that we've uploaded our image, let's read it back out. Like the PUT command, the GET command also has to be authenticated with the Hmac signature.

<!---
	Creates a structure with the secretKey and accessID so that I
	don't have to have them in the blog post.
--->
<cfinclude template="credentials.cfm" />

<!--- This is the resource that we want to read as a binary. --->
<cfset resource = "/testing.bennadel.com/signed-urls/helena.jpg" />


<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->


<!---
	All requests to the S3 API have to be authenticated. Here, we are
	going to create the "signature" to be used in the Authorization
	header of the GET request.
--->

<!---
	A timestamp is required for all authenticated requests (NOTE: This
	does not apply to query-string-authentication based requests).
--->
<cfset currentTime = getHttpTimeString( now() ) />

<!--- Set up the parts of the string to sign. --->
<cfset stringToSignParts = [
	"GET",
	"",
	"",
	currentTime,
	resource
] />

<!--- Collapse the parts into a newline-delimited list. --->
<cfset stringToSign = arrayToList( stringToSignParts, chr( 10 ) ) />

<!---
	The target string is then signed using Hmac-Sha1 hashing, and
	must be encoded as Base64. For this, I am using my Crypto.cfc
	component.

	NOTE: If you have ColdFusion 10, the hmac() function will now
	do this with a single function call.
--->
<cfset signature = new Crypto().hmacSha1(
	aws.secretKey,
	stringToSign,
	"base64"
) />


<!--- ----------------------------------------------------- --->
<!--- ----------------------------------------------------- --->


<!--- Read the S3 resource AS A BINARY object. --->
<cfhttp
	result="get"
	method="get"
	url="https://s3.amazonaws.com#resource#"
	getasbinary="yes">

	<cfhttpparam
		type="header"
		name="Authorization"
		value="AWS #aws.accessID#:#signature#"
		/>

	<cfhttpparam
		type="header"
		name="Date"
		value="#currentTime#"
		/>

</cfhttp>

<!---
	Reset the output buffer and then stream the content to the
	screen as an image.
--->
<cfcontent
	type="image/jpeg"
	variable="#get.fileContent#"
	/>

Notice that both the PUT and the GET actions required the current date to be set as part of the request headers. This date/time value needs to be within 15 minutes of Amazon S3 system time, or the request will be rejected. In addition to being current, the date/time value also has to be posted in a specific format. Luckily, ColdFusion's native getHttpTimeString() function makes this super easy as well.
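Both requests also build their signature the same way, and the code comments mention that ColdFusion 10 can do this natively. Here is a minimal sketch of what that might look like, assuming ColdFusion 10 or later. The signWithHmacSha1() name is hypothetical; it is not part of Crypto.cfc. Since hmac() returns the digest as a hexadecimal string, the sketch converts it to binary and then to Base64.

```cfml
<cfscript>
	// Sign a message with HMAC-SHA1 and return the digest as Base64.
	// Requires ColdFusion 10+, where the hmac() function is built in.
	function signWithHmacSha1( required string secretKey, required string message ) {
		// hmac() returns the digest as a hexadecimal string.
		var hexDigest = hmac( message, secretKey, "HmacSHA1", "utf-8" );

		// Convert hex to binary, and then binary to Base64.
		return( binaryEncode( binaryDecode( hexDigest, "hex" ), "base64" ) );
	}
</cfscript>
```

With that in place, the signature in either request could be produced with signWithHmacSha1( aws.secretKey, stringToSign ).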

Now that we've seen that we, as authenticated S3 users, can write-to and read-from the REST API, let's look at how to provide public URLs to our uploaded objects. Using "Query String Request Authentication," we can put our authentication signature directly into the request URL, removing the need for our end-users to provide the Authorization request header.

These generated URLs are time-sensitive. That is, we define an expiration date as part of the URL definition. Once the URL has expired, Amazon S3 will start returning "Access Denied" responses. The expiration is defined as the number of seconds since Epoch. In our demo, we'll provide a URL that is valid for only 10 seconds.

NOTE: I am using the undocumented .getTime() method of the Java Date object. You could be a bit more "proper" and use the dateDiff() function.
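As a rough sketch of how such a URL could be put together, the following follows the same string-to-sign pattern as the PUT and GET requests, with the expiration taking the place of the Date value. The credential values are placeholders, the variable names are my own for this illustration, and the signing step assumes the ColdFusion 10 hmac() function rather than Crypto.cfc.

```cfml
<cfscript>
	// Placeholder credentials, for illustration only.
	aws = { accessID = "EXAMPLE-ACCESS-ID", secretKey = "EXAMPLE-SECRET-KEY" };
	resource = "/testing.bennadel.com/signed-urls/helena.jpg";

	// The expiration is expressed in seconds since Epoch. getTime() gives
	// milliseconds, so convert to seconds and then add a 10-second window.
	nowInSeconds = fix( now().getTime() / 1000 );
	expiration = ( nowInSeconds + 10 );

	// For query string authentication, the expiration is used where the
	// header-based requests use the Date value.
	stringToSignParts = [ "GET", "", "", expiration, resource ];
	stringToSign = arrayToList( stringToSignParts, chr( 10 ) );

	// HMAC-SHA1 digest, converted from hex to Base64 (ColdFusion 10+).
	hexDigest = hmac( stringToSign, aws.secretKey, "HmacSHA1", "utf-8" );
	signature = binaryEncode( binaryDecode( hexDigest, "hex" ), "base64" );

	// Base64 can contain "+", "/", and "=", so the signature has to be
	// URL-encoded before it goes into the query string.
	urlEncodedSignature = urlEncodedFormat( signature );

	signedUrl = "https://s3.amazonaws.com#resource#?AWSAccessKeyId=#aws.accessID#&Expires=#expiration#&Signature=#urlEncodedSignature#";
</cfscript>
```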

After we generate this URL, we can then use it to populate an IMG "src" attribute presented to our users. In this way, we can provide "secure" content to our users without making our S3 objects public.

This is only a taste of what the Amazon Simple Storage Service (S3) can do. There's a ton of stuff left to explore.

I have coincidentally been beating my head against the S3 API for the last week or so. One big "gotcha" I had to work around was file names and paths containing spaces. Remember to URL Encode your request!

If you don't, the signature will be for the non-encoded value while the browser will auto-URL encode the returned presigned URL. This will result in a signature mismatch error being returned by S3.
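A minimal sketch of one way to handle that, under a few assumptions: the key is encoded one segment at a time (so the "/" separators are left alone), Java's URLEncoder does the encoding, and its "+" output for spaces is converted to "%20". The encodeS3Key() helper and the file name are hypothetical.

```cfml
<cfscript>
	// Percent-encode each segment of an S3 key, leaving the "/" separators
	// alone. NOTE: listToArray() drops empty segments, so leading or doubled
	// slashes are not preserved by this sketch.
	function encodeS3Key( required string key ) {
		var encoder = createObject( "java", "java.net.URLEncoder" );
		var segments = listToArray( key, "/" );
		var i = 0;
		for ( i = 1 ; i <= arrayLen( segments ) ; i++ ) {
			// URLEncoder uses form encoding, which turns spaces into "+".
			// A literal "+" has already become "%2B", so this swap is safe.
			segments[ i ] = replace( encoder.encode( segments[ i ], "UTF-8" ), "+", "%20", "all" );
		}
		return( arrayToList( segments, "/" ) );
	}

	// Hypothetical key containing spaces. The encoded value is what would
	// be used both in the string to sign and in the URL.
	resource = "/testing.bennadel.com/" & encodeS3Key( "signed-urls/helena bonham carter.jpg" );
</cfscript>
```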

Glad you like! Hopefully I'll have some more interesting stuff coming. This morning, I blogged a bit more about generating the pre-signed, query string authenticated URLs; but, then deemed that my exploration probably was not very fruitful (other than an increased understanding of the technology).

Oh, super interesting! I had only thought to url-encode the signature; but I think that's because the S3 docs actually have a special NOTE telling you to do so. It would have never occurred to me that url-encoding would be necessary for the file names when generating the signature. Dang! I don't have any idea how I would have even debugged that.

In the past, I know that debugging Hmac values is a wicked super pain. I remember when I was dealing with the Twilio API (I think), I wasn't converting to Hex properly and the leading "0" would always be stripped off... so it failed like 20% of the time :D Talk about frustrating! Took me like a week going back and forth with their support before I figured out what the problem was.

Right, but I'm still not 100% sure I understand the implications of that. Meaning, let's say that I am on the East coast of the US and I PUT an object into S3. When I think about "eventual consistency," it makes me think that if someone on the West coast of the US then immediately tried to request it, it *may* not be available yet, due to the latency in distribution across data centers. But, does that also mean that if I (on the East coast) make a request to the Object I just uploaded, it will be available immediately?

Or maybe I'm just not understanding how the data is distributed across areas.

In my cursory testing uploading to US Standard, I've been able to access the files immediately after upload. My uploader performs processing on the file on upload success. So it appears that the uploader can hit the file, but perhaps not clients hitting nodes in other regions.

It remains to be seen if that process survives QA. If we don't get consistency, I may have to roll my processing over to a polling type system that keeps an eye on my temp S3 storage location.

I assume the data is distributed as with any other kind of CDN. It gets placed onto a single node immediately and then propagates through the rest of the network.

I was at a conference last week talking to John Mancuso, who is a "Solutions Architect at Amazon Web Services". He had mentioned the eventual consistency to me at the time. BUT, he said that the latency was only on the order of 1 second. So, at the very least, if it's eventually consistent, at least "eventual" is super fast.

In my particular scenario, that could be OK, because we don't really need to read directly after write. What we do need to do is:

* Upload.
* Create a pre-signed URL.
* Send that URL to the browser.
* Have the client use an IMG tag with that URL.

So, hopefully the 1s delay (if it happens at all), will be offset by the workflow and server-client communication and HTML rendering overhead.

Thanks for the links; I poked around in Barney's S3; and I've actually used Joe's in a previous project. But, I'd not really gotten my hands dirty with the nitty-gritty of how everything was put together.

Kudos Ben, I read this at just the right time. I have recently migrated two client CF sites to AWS (one Windows, one Ubuntu) using the CF 10 AMI that came out a few months ago. I need to convert the static assets to S3 next. Thanks for the code and the hmac() function. I'll take a look at your Crypto for my CF9 clients.

You are a great asset to the ColdFusion community. I've been coding out here in Seattle for years mostly under the radar admiring your blog from afar.

BTW I've been happy with CF on AWS so far and the pricing beats traditional hosting.

Excellent timing! And, funny you mention the hmac() stuff. I actually, just yesterday, posted a bit more about generating the signatures and the Content-MD5 hash in both ColdFusion 9 and ColdFusion 10:
