Amazon SimpleDB released a new version last week. With this new version, developers can now sort query results and use a new does-not-start-with operator in their queries, the two most frequently requested features.

I am very excited about the new sort feature because now all the processing will happen in-the-cloud and I will be able to execute scenarios like:

I would also like to highlight that if you are storing your data on Amazon S3 and indexing the corresponding metadata in Amazon SimpleDB, this feature will be highly useful. Use this excellent library to index all your Amazon S3 object metadata in Amazon SimpleDB.
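To make the two new features concrete, here is a minimal local sketch of what "exclude by prefix, then sort" means for S3 metadata indexed as SimpleDB-style items. It emulates the semantics in plain Python and does not call the SimpleDB API; the item names, attribute names, and the `query` helper are all assumptions made up for illustration. One thing it does reflect faithfully is that SimpleDB compares values as strings, so numbers need zero-padding to sort in numeric order.

```python
# Local emulation of the two new query features; this is NOT the SimpleDB API.
# Item names, attribute names, and values below are made up for illustration.

def does_not_start_with(value, prefix):
    """Mirror of the new operator: true when value lacks the given prefix."""
    return not value.startswith(prefix)

def query(items, attribute, excluded_prefix, sort_by, descending=False):
    """Filter items on one attribute, then sort the matches on another.

    Values are compared as strings (lexicographically), so the numeric
    sizes used below are zero-padded to sort in numeric order.
    """
    matches = [
        (name, attrs) for name, attrs in items.items()
        if attribute in attrs and sort_by in attrs
        and does_not_start_with(attrs[attribute], excluded_prefix)
    ]
    matches.sort(key=lambda pair: pair[1][sort_by], reverse=descending)
    return [name for name, _ in matches]

# Hypothetical metadata for objects stored in Amazon S3.
s3_metadata = {
    "item1": {"key": "photos/beach.jpg", "size": "0000204800"},
    "item2": {"key": "logs/2008-08-01.txt", "size": "0000001024"},
    "item3": {"key": "photos/city.jpg", "size": "0000102400"},
    "item4": {"key": "docs/paper.pdf", "size": "0000512000"},
}

# Everything outside the logs/ prefix, largest object first.
largest_first = query(s3_metadata, "key", "logs/", "size", descending=True)
print(largest_first)
```

With the service doing the filtering and sorting, the client no longer has to pull every item down and order the results itself, which is the point of moving this processing into the cloud.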

Every day, we hear a new story about a cool new startup and its success.

Today, it was WalkScore.com. The website offers some great information about which neighborhoods and cities are more walkable than the rest (San Francisco was #1 and Seattle was #6). Walk Score calculates the walkability of an address by locating nearby stores, restaurants, schools, parks, etc. I think it is a great site for those who are considering moving to a new neighborhood or those who simply like the car-free lifestyle, especially because gas prices are setting new records every day.

We would never have weathered being the #1 story on Yahoo! yesterday if it weren't for Amazon! THANK YOU!

We're a hybrid-philanthropy business which means we prioritize social good over profit--and therefore we're on a pretty tight infrastructure budget :-) What's so great about Amazon cloud computing is that it was very cheap from an infrastructure and dev standpoint for us to scale up quickly. In a nutshell we have only one physical web server and didn't want to deal with the expense of a hardware upgrade so:

We set up 4 EC2 instances to serve the walkability heat map tiles you see overlaid on top of the Google maps. Here is Seattle for example.

We moved all of our images, CSS, and JS files to Amazon S3 which took a big load off of our one web server.

We were able to accommodate a spike of about 80K unique visitors during a three-hour period thanks to Amazon.

Does your desk look like this photo? No comment on where the photo was taken, of course... There's hope!

Pixily just launched, with a business model that could be described as "Netflix in reverse". They offer a plan that allows you to send them one envelope per month (envelopes can contain up to 50 items) filled with documents that you want scanned and made searchable. This base plan costs $14.95 per month, and of course higher volume plans are available.

Prasad Thammineni, CEO of Pixily, came to our AWS Startup Event last fall in Boston, where I had the opportunity to meet him. Pixily is based in Waltham, MA and is a big user of AWS. In fact, Prasad says, "We use EC2, S3 and SQS. AWS has helped us democratize expensive technology and make it accessible to consumers and small businesses. This technology until now was available to only large enterprises."

"Pixily has economized by building the entire website atop Amazon's Web services infrastructure, which allows a company to rent servers and storage space as needed. 'That gives us the flexibility to add more servers based on our demand, as traffic increases, instead of paying for them at the outset,' says chief technology officer Vikram Kumar."

We thought of trying out a new idea. For a change, instead of working from our Amazon offices, we will work for a few hours on the last Tuesday of every month from an offsite location.

We like to call it AWS “Office Hours”.

The offsite will be at the StartPad co-working office space in Pioneer Square in Seattle. This will be your chance to chat with an AWS technical evangelist and a technical support engineer and get your questions answered. Plus, there is free internet and desk space if you want to camp out for the afternoon.

If you are new to the cloud, the first section of the paper will help you understand the benefits of building applications in-the-cloud. If you are using the cloud already, the second section of the paper will help you to use the cloud more effectively by utilizing some of the best practices.

In this paper, I discuss a new way to design architectures. Cloud Architectures are Services-Oriented Architectures that are designed to use on-demand infrastructure more effectively. Applications built on Cloud Architectures use the underlying computing infrastructure only when it is needed (for example, to process a user request): they draw the necessary resources on-demand (like compute servers or storage), perform a specific job, and then relinquish the unneeded resources after the job is done. While in operation, the application scales up or down elastically based on the actual need for resources. Everything is automated and operates without any human intervention.
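The acquire, work, relinquish lifecycle described above can be sketched in a few lines. This is only an illustration under stated assumptions: `FakeCloud` is an in-memory stand-in invented for this example, not a real provider API, and `workers_needed` is one hypothetical way to size a pool from demand.

```python
from contextlib import contextmanager
import math

class FakeCloud:
    """In-memory stand-in for an on-demand provider (hypothetical, not a real API)."""

    def __init__(self):
        self.running = set()
        self._next_id = 0

    def launch(self, count):
        """Start `count` workers and return their identifiers."""
        ids = []
        for _ in range(count):
            self._next_id += 1
            ids.append(f"worker-{self._next_id}")
        self.running.update(ids)
        return ids

    def terminate(self, ids):
        """Give the listed workers back to the provider."""
        self.running.difference_update(ids)

def workers_needed(pending_jobs, jobs_per_worker, max_workers):
    """Size the pool from actual demand, capped at max_workers."""
    if pending_jobs <= 0:
        return 0
    return min(max_workers, math.ceil(pending_jobs / jobs_per_worker))

@contextmanager
def on_demand_workers(cloud, count):
    """Acquire workers for one job and always release them afterwards."""
    ids = cloud.launch(count)
    try:
        yield ids
    finally:
        cloud.terminate(ids)

cloud = FakeCloud()
count = workers_needed(pending_jobs=25, jobs_per_worker=10, max_workers=8)
with on_demand_workers(cloud, count) as workers:
    in_use = len(cloud.running)  # resources are held only while the job runs
idle = len(cloud.running)        # nothing is left running after the job
print(count, in_use, idle)
```

The `finally` clause is the important design choice here: resources are relinquished even if the job fails, so nothing keeps running (and billing) without a person stepping in.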

As an example of a Cloud Architecture, I discuss the GrepTheWeb application. This application runs a regular expression against millions of documents from the web and returns the filtered results that match the query. The architecture is interesting because it runs completely on-demand in an automated fashion. Triggered by a regex request, hundreds of Amazon EC2 instances are launched, a Hadoop cluster is started on them, transient messages are stored in Amazon SQS queues and statuses in Amazon SimpleDB, and all Map/Reduce jobs are run in parallel. Each Map task fetches a file from Amazon S3 and runs the regular expression against it; the results are aggregated in the Reduce/Combine phase, and all the infrastructure is then released back into the cloud once the Hadoop job has been processed.
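The map and reduce steps can be illustrated at toy scale. The sketch below is not the GrepTheWeb implementation: it uses a local thread pool in place of a Hadoop cluster on Amazon EC2 and an in-memory dictionary in place of Amazon S3, and the document contents and function names are assumptions made up for the example.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory documents standing in for files stored in Amazon S3.
documents = {
    "doc1": "Amazon EC2 launches instances on demand.\nNothing to see here.",
    "doc2": "Hadoop runs Map/Reduce jobs.\nAmazon S3 stores the input files.",
    "doc3": "No matching lines in this document.",
}

def map_task(doc_id, pattern):
    """Map step: fetch one document and keep the lines that match the regex."""
    regex = re.compile(pattern)
    text = documents[doc_id]  # a real system would fetch this from storage
    return doc_id, [line for line in text.splitlines() if regex.search(line)]

def reduce_task(mapped):
    """Reduce step: merge per-document matches, dropping empty results."""
    return {doc_id: lines for doc_id, lines in mapped if lines}

def grep_documents(pattern, max_workers=4):
    """Run all map tasks in parallel, then aggregate them in one reduce."""
    doc_ids = sorted(documents)
    # The pool exists only for the duration of the job and is shut down
    # when the with-block exits.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        mapped = list(pool.map(lambda d: map_task(d, pattern), doc_ids))
    return reduce_task(mapped)

results = grep_documents(r"Amazon (EC2|S3)")
print(results)
```

Each map task is independent of the others, which is what lets the job spread across as many workers as are available; only the reduce step needs to see all of the results.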

GrepTheWeb is one of many applications built by Amazon that uses all our services (Amazon EC2, Amazon SimpleDB, Amazon SQS, Amazon S3) together.

A wide variety of applications can be built using this design approach, from nightly batch processing systems to media processing pipelines.

An excerpt:

Cloud Architectures address key difficulties surrounding large-scale data processing. First, in traditional data processing, it is difficult to get as many machines as an application needs. Second, it is difficult to get the machines when one needs them. Third, it is difficult to distribute and co-ordinate a large-scale job on different machines, run processes on them, and provision another machine to recover if one machine fails. Fourth, it is difficult to auto-scale up and down based on dynamic workloads. Fifth, it is difficult to get rid of all those machines when the job is done. Cloud Architectures solve such difficulties.

Applications built on Cloud Architectures run in-the-cloud where the physical location of the infrastructure is determined by the provider. They take advantage of simple APIs of Internet-accessible services that scale on-demand, that are industrial-strength, where the complex reliability and scalability logic of the underlying services remains implemented and hidden inside-the-cloud. The usage of resources in Cloud Architectures is as needed, sometimes ephemeral or seasonal, thereby providing the highest utilization and optimum bang for the buck.

In the first section I discuss the advantages and business benefits of Cloud Architectures and how each service was used. In the second section, I discuss best practices for the various Amazon Web Services.

I talked about this briefly at the Hadoop Summit 2008 and QCon 2007. I got some good reviews after the talks, and hence I decided to put all my thoughts in this paper along with some best practices for the use of Amazon Web Services (Amazon EC2, Amazon SQS, Amazon S3 and Amazon SimpleDB together). Many developers from our community have been asking for a real-world example of a complex, large-scale application. I will be presenting this paper at the 2008 NSF Data-Intensive Scalable Computing Workshop at UW and the 9th IEEE/NATEA Conference on Cloud Computing later this week.

I believe this new and emerging way of building applications that run in-the-cloud is going to change the way we do business.

Andras wrote to tell me about Jollat, a new graphical cross-platform (Windows, Mac, and Linux) management client for Amazon EC2 and S3. Available for free download (with a purchase option), the client includes a number of interesting features.

On the S3 side, Jollat handles bucket creation in both the US and EU zones, upload and download of multiple files, log file configuration and management, and an access control list (ACL) editor.

On the EC2 side, Jollat's image manager makes it easy to find and launch any AMI (Amazon Machine Image). Once launched, instances can be accessed using an embedded SSH client. The tool also manages availability zones, IP addresses, and key pairs.