We've removed the distinction between Amazon EC2 running Windows and Amazon EC2 running Windows with Authentication Services, allowing all of our Windows instances to make use of Authentication Services such as LDAP, RADIUS, and Kerberos. With this change, any Windows instance can host a Domain Controller or join an existing domain. File sharing services such as SMB between instances will now automatically default to SMB-over-TCP in all cases, and will also be able to negotiate more secure authentication.

Existing Windows with Authentication Services instances will now be charged the same price as Windows instances, a savings of 50% on the hourly rate. All newly launched instances will be charged the new, lower price (starting at 12.5 cents per hour for a 32-bit instance in the US). Applications requiring logins can now be run on the Amazon EC2 running Windows AMIs.
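For those keeping score at home, here's a quick back-of-the-envelope sketch of what the new rate means over a month of continuous use. The old rate is inferred from the stated 50% savings; your actual bill will depend on instance type and usage:

```python
# Back-of-the-envelope: monthly cost at the new Windows rate versus the
# old one (inferred from the stated 50% savings); 32-bit US instance.
new_rate = 0.125                 # USD per hour, from the announcement
old_rate = new_rate * 2          # "a savings of 50% on the hourly rate"
hours = 24 * 30                  # one month of continuous use

print("old: $%.2f/month" % (old_rate * hours))   # $180.00
print("new: $%.2f/month" % (new_rate * hours))   # $90.00
```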

If you are using Amazon DevPay in conjunction with Amazon EC2 running Windows with Authentication Services you will need to create new AMIs and adjust your pricing plan before November 1, 2009.

We continue to strive for simplicity and cost effectiveness; this is a good example of both!

-- Jeff;

PS - I know that a lot of you have been asking us to support Windows Server 2008. I don't have a release date for you yet, but I can assure you that we've prioritized the work needed to properly support it.

Public cloud computing has evolved into a mainstream approach for building out components of an IT infrastructure. Cost saving opportunities make the development of a public cloud strategy absolutely critical. Even before taking on pilot projects in the cloud, however, you should have a solid understanding of the security implications and opportunities in public cloud computing. Amazon Web Services and enStratus have teamed up for a webinar detailing the security issues that businesses moving into the cloud will face and how to secure a public cloud infrastructure.

Among the most critical components in cloud security is transparency from your cloud providers. AWS has built out an infrastructure and established processes to mitigate common vulnerabilities and offer a safe compute and storage environment. enStratus operates outside of the AWS cloud, watching over its operations, and keeping your authentication and encryption credentials safe outside the cloud while encrypting the data inside the cloud both in transit and at rest.

Steve Riley from AWS and George Reese from enStratus will discuss common cloud security concerns and show you how to take advantage of the security features AWS and enStratus provide you to build a secure public cloud infrastructure.

Key Learnings

How does AWS protect its infrastructure and, by extension, your data?

What can you do with tools like enStratus to further protect your data?

How can you use enStratus to protect your data from third-party subpoenas or subpoenas targeted at AWS?

How can you manage user access to your AWS infrastructure?

What issues impact compliance with various standards/regulations in the AWS cloud?

Speakers

George Reese, O'Reilly cloud computing author and CTO for enStratus, a leading cloud management platform.

I think it is really interesting to see how breakthroughs and process improvements in one scientific or technical discipline can drive that discipline forward while also enabling progress in other seemingly unrelated disciplines.

The bioinformatics field is rife with examples of this pattern. Declining hardware costs, cloud computing, the ability to do parallel processing, and algorithmic advances have driven down the cost and time of gene sequencing by multiple orders of magnitude in the space of a decade or two. Processing that was once measured in years and megabucks is now denominated in hours and dollars.

My colleague Deepak Singh pointed out a number of recent AWS-related developments in this space:

There's a getting-started guide for the JCVI AMI. Graphical and command-line bioinformatics tools can be launched from a shell window connected to a running instance of the AMI.

CloudBurst

CloudBurst is described as a "new parallel read-mapping algorithm optimized for mapping next-generation sequence data to the human genome and other reference genomes, for use in a variety of biological analyses including SNP discovery, genotyping, and personal genomics."

In layman's terms, CloudBurst uses Hadoop to implement a linearly scalable search tool. Once loaded with a reference genome, it maps the "short reads" (snippets of sequenced DNA approximately 30 base pairs long) to a location (or locations) on the reference genome. Think of it as a very advanced form of string matching, with support for partial matches, insertions, deletions, and subtle differences. This is a highly parallelizable operation; CloudBurst reduces operations involving millions of short reads from hours to minutes when run on a large-scale cluster of EC2 instances.
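CloudBurst itself is implemented as a Hadoop pipeline, so the sketch below is just a single-machine toy illustration of the underlying seed-and-extend idea, not CloudBurst's actual code: index the k-mers of a reference, use the first k bases of a read as a seed, then verify each candidate position while tolerating a few mismatches. In the real system, the indexing and matching are distributed across map and reduce phases.

```python
# Toy illustration of seed-and-extend read mapping (not CloudBurst itself):
# index every k-mer of the reference, look up the first k bases of a read
# as a seed, then verify the full read at each candidate position,
# allowing a small number of mismatches.
from collections import defaultdict

def build_index(reference, k=10):
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def map_read(read, reference, index, k=10, max_mismatches=2):
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read runs off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits

reference = "ACGTACGTTAGCCGATACGGATCCACGTTAGC"
print(map_read("TAGCCGATAC", reference, build_index(reference)))  # [(8, 0)]
```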

You can read more about CloudBurst in the research paper, which includes benchmarks of CloudBurst on EC2 along with performance and scaling information.

Crossbow

Crossbow was built to do "Whole Genome Resequencing in the Clouds." It combines Bowtie for ultra-fast short read alignment and SOAPsnp for sequence assembly and high quality SNP calling. The Crossbow home page claims that it can sequence an entire genome in an afternoon on EC2, for less than $250. Crossbow is so new that the papers and the code distribution are still a little ways off. There's a lot of good information in this poster.

Michael Schatz (the principal author of CloudBurst) wrote a really interesting note on Hadoop for Computational Biology. He states that "CloudBurst is just the beginning of the story, not the end," and endorses the Map/Reduce model for processing 100+ GB datasets. I will echo Mike's conclusion to wrap up this somewhat long post:

In short, there is no shortage of opportunities for utilizing MapReduce/Hadoop for computational biology, so if your users are skeptical now, I just ask that they are patient for a little bit longer and reserve judgment on MapReduce/Hadoop until we can publish a few more results.

I really learned a lot while putting this post together and I hope that you will learn something by reading it. If you are using EC2 in a bioinformatics context, I'd love to hear from you. Leave a comment or send me some mail.

Weighing in at a whopping 500 GB (388 GB of data and 112 GB of free space to allow for some in-place decompression), the Wikipedia XML data is our newest Public Data Set.

This data set contains all of the Wikimedia wikis in the form of wikitext source and metadata embedded in XML. We'll be updating this data set every month and we'll keep the sets for the previous three months around.
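Because the individual files are tens of gigabytes, you'll want to stream-parse them rather than load them whole. Here's a minimal Python sketch using the standard library's incremental XML parser; it assumes the standard Wikimedia export schema (page, title, and revision text elements), and the namespace version string should be checked against your copy of the dump:

```python
# Stream-parse a Wikimedia XML dump without loading it all into memory.
# Assumes the standard export schema (<page> elements containing <title>
# and revision <text>); verify the namespace version against your dump.
import xml.etree.ElementTree as ET

NS = "{http://www.mediawiki.org/xml/export-0.3/}"  # version may differ

def iter_pages(path):
    for event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == NS + "page":
            title = elem.findtext(NS + "title")
            text = elem.findtext("%srevision/%stext" % (NS, NS)) or ""
            yield title, text
            elem.clear()  # release the parsed element to keep memory flat

for title, text in iter_pages("enwiki-pages-articles.xml"):  # placeholder file name
    print("%s (%d bytes of wikitext)" % (title, len(text)))
    break
```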

As you can see from this screen shot of my PuTTY window, there are some pretty beefy files in this data set.

This 20 GB data set incorporates daily weather measurements (temperature, dew point, wind speed, humidity, barometric pressure, and so forth) from over 9000 weather stations around the world. The data was originally collected as part of the Global Surface Summary of the Day (GSOD) by the National Climatic Data Center and is available from 1929 to the present, with the data from 1973 to the present being the most complete.

The map at right contains one yellow dot for each data collection station.
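As a quick sketch of working with the data, here's how you might compute a station's mean temperature by year in Python. The file name and column names are placeholders, and the 9999.9 missing-data sentinel should be verified against the GSOD documentation:

```python
# Sketch: mean temperature per year for one station's GSOD records.
# Column names (YEAR, TEMP) and the file name are placeholders; check
# the GSOD documentation for the actual record layout.
import csv
from collections import defaultdict

totals = defaultdict(lambda: [0.0, 0])          # year -> [sum, count]
with open("gsod_station.csv") as f:
    for row in csv.DictReader(f):
        if row["TEMP"] not in ("", "9999.9"):   # 9999.9 marks missing data
            totals[row["YEAR"]][0] += float(row["TEMP"])
            totals[row["YEAR"]][1] += 1

for year in sorted(totals):
    total, count = totals[year]
    print(year, round(total / count, 1))
```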

Weighing in at 180 GB, the Sloan Digital Sky Survey (SDSS) is the most ambitious astronomical survey ever undertaken. The researchers have used a 2.5 meter, 120 megapixel telescope located at Apache Point, New Mexico to capture images of over one quarter of the sky, covering about 230 million celestial objects. They have also created 3-dimensional maps containing more than 930,000 galaxies and 120,000 quasars.

This new public data set (which is a subset of the entire SDSS) will be of interest to students, educators, hobby astronomers, and researchers. From a standing start, it is possible to launch an EC2 instance, create an Elastic Block Store volume with this data, attach the volume to the instance and start examining and processing the data in less than ten minutes.

The data set takes the form of a Microsoft SQL Server MDF file. Once you have created your EBS volume and attached it to your Windows EC2 instance, you can access the data using SQL Server Enterprise Manager or SQL Server Management Studio. The SDSS makes use of stored procedures, user defined functions, and a spatial indexing library, so porting it to another database would be a fairly complex undertaking.
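As a rough sketch (not an official recipe), here's how you might query the attached database from Python with pyodbc. The connection details are placeholders, and the PhotoObj view and its columns should be checked against the SDSS schema documentation:

```python
# Sketch: query the SDSS database from a Windows EC2 instance after
# attaching the EBS volume and mounting the MDF in SQL Server. The DSN
# details and the PhotoObj view name are assumptions to verify.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={SQL Server};SERVER=localhost;"
    "DATABASE=SDSS;Trusted_Connection=yes"
)
cursor = conn.cursor()

# Ten bright objects: ra/dec are sky coordinates, r is the red-band
# magnitude. DEC is a reserved word in T-SQL, hence the brackets.
cursor.execute("SELECT TOP 10 objID, ra, [dec], r FROM PhotoObj WHERE r < 15")
for obj_id, ra, dec, r in cursor.fetchall():
    print(obj_id, ra, dec, r)
```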

I know from experience (my son Andy is studying Astronomy at the University of Washington and is always showing me the "please delete your unnecessary files" emails from the department's administrator) that storage space is always at a premium in academic settings, due in part to the existence of large scale data sets like this. The combination of EC2, EBS, this public data set, and our AWS in Education program should enable students and educators to analyze, process, display, and study the universe in revolutionary ways.

Today we are adding a new feature which significantly improves the flexibility of EC2's Elastic Block Store (EBS) snapshot facility. You now have the ability to share your snapshots with other EC2 customers using a new set of fine-grained access controls. You can keep the snapshot to yourself (the default), share it with a list of EC2 customers, or share it publicly. Here's a visual overview of the data flow (in the diagram, the word Partner refers to anyone that you choose to share your data with).

The Amazon Elastic Block Store lets you create block storage volumes in sizes ranging from 1 GB to 1 TB. You can create empty volumes or you can pre-populate them using one of our Public Data Sets. Once created, you attach each volume to an EC2 instance and then reference it like any other file system. The new volumes are ready in seconds. Last week I created a 180 GB volume from a Public Data Set, attached it to my instance, and started examining it, all in about 15 seconds.
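Here's what that workflow looks like in code, using the open source boto library as one possible client. The snapshot and instance IDs below are placeholders:

```python
# Sketch: create a volume from a Public Data Set snapshot, attach it to
# a running instance, and later snapshot it. IDs are placeholders.
import boto

conn = boto.connect_ec2()  # credentials from the environment or boto config

# Create a 180 GB volume pre-populated from a Public Data Set snapshot.
volume = conn.create_volume(size=180, zone="us-east-1a",
                            snapshot="snap-12345678")
conn.attach_volume(volume.id, "i-12345678", "/dev/sdf")

# Back up the volume to Amazon S3 as a snapshot of your own.
backup = conn.create_snapshot(volume.id, "my reference data, v1")
```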

You can use the AWS Management Console, the command line tools, or the EC2 API to create a snapshot backup of an EBS volume at any time. The snapshots are stored in Amazon S3. Once created, a snapshot can be used to create a new EBS volume in the same AWS region. Sharing these snapshots, as we are now letting you do, makes it possible for other users to create an identical copy of the volume.

The new ModifySnapshotAttribute function gives you the ability to set and change the createVolumePermission attribute on any of your snapshots. We've also added the ResetSnapshotAttribute function to clear snapshot attributes and the DescribeSnapshotAttribute function to get the value of a particular attribute.

The DescribeSnapshots function now lists all of the snapshots that have been shared with you. You can also use this function to retrieve a list of all of our Public Data Sets.
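Again using boto as an illustrative client, the new calls look something like this (the snapshot ID and account number are placeholders):

```python
# Sketch of the snapshot-sharing calls via boto. IDs are placeholders.
import boto

conn = boto.connect_ec2()

# ModifySnapshotAttribute: grant another account permission to create
# volumes from the snapshot (use groups=["all"] to share it publicly).
conn.modify_snapshot_attribute("snap-12345678",
                               attribute="createVolumePermission",
                               operation="add",
                               user_ids=["111122223333"])

# DescribeSnapshotAttribute: inspect the current permissions.
print(conn.get_snapshot_attribute("snap-12345678",
                                  attribute="createVolumePermission").attrs)

# DescribeSnapshots: list the snapshots you are allowed to restore from,
# including those shared with you.
for snap in conn.get_all_snapshots(restorable_by="self"):
    print(snap.id, snap.description)

# ResetSnapshotAttribute: revert to the default (private) setting.
conn.reset_snapshot_attribute("snap-12345678",
                              attribute="createVolumePermission")
```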

How can you use this? Off the top of my head, here are a number of ideas:

If you are a teacher or professor, create and share a volume of reference data for use in a classroom setting (and take a look at the AWS in Education program too).

If you are a researcher, share your data and your results with your colleagues, both within your own organization and at other organizations.

If you are a developer, share your development and test environments with your teammates. Snapshot the environments before each release to make it easy to regenerate the environment later for regression tests.

If you are a business, you can use shared snapshots to exchange data internally, with external clients, or with partners. This could be reference data, results of a lengthy and expensive computation, a set of test cases (and expected results), or even a set of pre-populated database tables.

I'm sure you have some ideas of your own; please feel free to share them in a comment!

If you are interested in both IBM software and Cloud Computing, there’s an online event that you need to know about. It’s no secret that large enterprises are rapidly setting up shop in the Cloud, with applications that range from CRM to ERP to in-house infrastructure. Cloud computing removes so many barriers that adoption seems self-evident: time to market, agility, cost, and resiliency are four of the top reasons.

Last February we blogged about the arrival of IBM software in the AWS cloud, and I have to say that I am impressed by IBM’s innovative business pricing model, which makes it easy for businesses to choose their operational environment based on technical elegance rather than licensing considerations. Enterprise software on demand definitely lowers the bar for software developers to evaluate, and especially to build, scalable applications.

On October 1st IBM and Amazon Web Services are teaming up to present a one-day online session for developers that offers a view into both worlds. That’s important, in my opinion, because while both companies have made access easy, there are still best practices that make your applications work even better in the cloud. You’ll hear directly from product teams about ways to use IBM software in an environment that is familiar yet just a little different. You’ll also hear from developers and ISVs currently using IBM software on the cloud.

You’ll also have the opportunity to try out IBM software on the AWS platform via hands-on labs. There is no cost for these labs, which will run live on Amazon EC2.
Register here.

Auto Scaling - Automated scaling of EC2 instances based on rules that you define.
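As a minimal sketch of what defining such rules looks like, here's how you might create an Auto Scaling group in the new region using boto. All names and IDs are placeholders, and the scaling triggers themselves are configured separately:

```python
# Sketch: define an Auto Scaling group in the European region with boto.
# The AMI ID, names, zone, and sizes are all placeholders.
import boto.ec2.autoscale
from boto.ec2.autoscale import LaunchConfiguration, AutoScalingGroup

conn = boto.ec2.autoscale.connect_to_region("eu-west-1")

lc = LaunchConfiguration(name="web-config", image_id="ami-12345678",
                         instance_type="m1.small")
conn.create_launch_configuration(lc)

group = AutoScalingGroup(group_name="web-group", launch_config=lc,
                         availability_zones=["eu-west-1a"],
                         min_size=1, max_size=4)
conn.create_auto_scaling_group(group)
```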

All of the services work just the same way in Europe as they do in the US. Existing applications and management tools should be able to access the services in this region after a simple change of the service endpoint. As is the case with S3 and EC2, these services are independent of their US counterparts.
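For example, with boto the endpoint change is a one-liner; the region name below assumes the EU region's standard identifier:

```python
# Sketch: point an EC2 client at the European region instead of the US.
import boto.ec2

conn = boto.ec2.connect_to_region("eu-west-1")
for reservation in conn.get_all_instances():
    for instance in reservation.instances:
        print(instance.id, instance.state)
```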

Our full slate of infrastructure services is now available in Europe. With the European debut of these services, developers can now build reliable and scalable applications in both of the AWS regions (US and Europe).

September 28 - SmartPaaS with Pega On the Cloud, featuring Pegasystems. Hear how Pegasystems is building smart provisioning services for its customers on AWS. Pega will discuss its background and how it is innovating business processes in the cloud. The webinar will feature Pega customer PAREXEL, who is actively using AWS today.