Windows Azure Storage

Windows Azure Overview

Before I begin to build the application, a quick overview of Windows Azure and Roles is necessary. There are many resources available that describe these in detail, so I won’t go into much depth here.

Windows Azure is Microsoft’s Cloud Computing offering and serves as the development, service hosting, and service management environment for the Windows Azure Platform. The Platform is made up of three pieces: Windows Azure, SQL Azure, and AppFabric.

SQL Azure: Cloud-based relational database management system that includes reporting and analytics.

AppFabric: Service bus and access control for connecting distributed applications, both on-premises and in the cloud.

Windows Azure Roles

Unlike the security-related roles that most developers are familiar with, Windows Azure Roles are used to provision and configure the virtual environment for the application when it is deployed. The figure below shows the Roles currently available in Visual Studio 2010.

Except for the CGI Web Role, these should be self-explanatory. The CGI Web Role is used to provide an environment for running non-ASP.NET web applications such as PHP. This provides a means for customers to move existing applications to the cloud without the cost and time associated with rewriting them in .NET.

Building the Azure application

The first step is, of course, to create the Windows Azure application used in this demonstration. After the prerequisites have been installed and configured, open Visual Studio and create a new project as usual. In the New Project dialog, expand the Visual C# node if it isn’t already expanded, and click Cloud. You will see one template available, Windows Azure Cloud Service. Note that although .NET Framework 4 is selected, Windows Azure does not support .NET 4.0 yet, so the projects will default to .NET Framework 3.5.

After selecting this template, the New Cloud Service Project dialog will be displayed, listing the available Windows Azure Roles. For this application, select an ASP.NET Web Role and a Worker Role. After the roles have been added to the Cloud Service Solution list, you can rename them by hovering over the role to display the edit link. You can, of course, add additional Roles after the solution has been created.

After the solution has been created, you will see three projects in the Solution Explorer.

As this article is about Azure Storage rather than Windows Azure itself, I’ll briefly cover some of the settings but leave more in-depth coverage for other articles or resources.

Under the Roles folder, you can see two items, one for each of the roles added in the previous step. Double-clicking an item, or right-clicking it and selecting Properties from the context menu, opens the Properties page for that role. The image below shows the page for the AzureStorageWeb Role.

The first section in the Configuration tab lets you select the trust level for the application; these settings should be familiar to most .NET developers. The Instances section tells the Windows Azure Platform how many instances of this role to create and the size of the Virtual Machine to provision for each. If this Web Role were for a high-volume web application, increasing the number of instances would improve its availability, and Windows Azure handles the load balancing across all of the instances that are created. The VM sizes are as follows:

Small: 1 CPU core, 1.75 GB RAM, 250 GB hard drive

Medium: 2 CPU cores, 3.5 GB RAM, 500 GB hard drive

Large: 4 CPU cores, 7 GB RAM, 1000 GB hard drive

Extra large: 8 CPU cores, 15 GB RAM, 2000 GB hard drive

The Startup action is specific to Web Roles and, as you can see, lets you choose whether the browser is launched against the HTTP or the HTTPS endpoint when the application is started from Visual Studio.

The Settings tab should be familiar to .NET developers; it is where any additional settings for the application are created. Settings added here are declared in the ServiceDefinition file and given values in the ServiceConfiguration file, both of which belong to the cloud service project rather than to the role project. The role projects still have their own web.config and app.config files.

The Endpoints tab lets you configure the endpoints the Role exposes. In this case, the Web Role can be configured for HTTP or HTTPS, with a specific port and, for HTTPS, an SSL certificate.

As you can see here, the Worker Role has a different Endpoints screen. The Type dropdown offers Input and Internal, and the Protocol dropdown includes http, https, and tcp, with tcp as the default. This allows you to connect to the Worker Role over any of these protocols and, with an Input endpoint, expose its functionality externally if necessary.

Web Role

Since this article focuses on Azure Storage, I’ll keep the UI simple. However, thanks to jQuery and some styles, even a simple interface can look good with little effort.

There is nothing special about the web application; it is just like any other web app you have built. There is one class that is unique to Azure, however: the WebRole class. Every Role in Windows Azure must have a class that derives from RoleEntryPoint, which Windows Azure uses to initialize and control the role. The default implementation overrides the OnStart method and assigns a handler for the RoleEnvironment.Changing event, which restarts the role instance when a configuration setting changes so the new value takes effect. Any other actions that must be taken before the application starts should be handled in OnStart as well. Likewise, the Run and OnStop methods can be overridden: Run contains the role’s main processing, and OnStop runs before the instance is stopped.

Azure Storage

As I’ve said, there are three types of storage available with the Windows Azure Platform: blob, table, and queue.

Blob Storage

Binary Large Object, or blob, should be familiar to most developers; blobs are used to store things like images, documents, or videos, anything larger than a name or ID. Blobs are organized into containers, and a container can hold two types of blob: Block and Page. The type of blob needed depends on its usage and size. Block blobs are limited to 200 GB, while Page blobs can be up to 1 TB. Note, however, that in the development storage environment, blobs are limited to 2 GB. Blob storage can be accessed RESTfully with a URL such as http://myapp.blob.core.windows.net/container_name/blob_name.

Although blob storage isn’t hierarchical, a hierarchy can be simulated through naming, because blob names can contain /. For example, blobs named 2009/10/4/photo1, 2009/10/4/photo2, and 2008/6/25/photo1 appear to be organized by year, month, and day; in reality, each name is a single flat string within the container.
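Because the hierarchy lives only in the names, “listing a folder” amounts to filtering names by a prefix and cutting each remainder at the next /. The sketch below models that over an in-memory list of names, assuming / as the delimiter; it does not call the storage API, and VirtualDirectory and ListChildren are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: reads a flat list of blob names as "folders" by treating
// the delimiter as a path separator. No Azure API is involved.
public static class VirtualDirectory
{
    // Returns the immediate children of 'prefix': blob names directly under it,
    // plus sub-"folders" (ending in the delimiter) that group deeper names.
    public static List<string> ListChildren(IEnumerable<string> blobNames, string prefix, char delimiter = '/')
    {
        var children = new SortedSet<string>(StringComparer.Ordinal);
        foreach (var name in blobNames)
        {
            if (!name.StartsWith(prefix, StringComparison.Ordinal) || name.Length == prefix.Length)
                continue;

            string rest = name.Substring(prefix.Length);
            int next = rest.IndexOf(delimiter);
            // A delimiter in the remainder means the name lives in a deeper "folder".
            children.Add(next < 0 ? name : prefix + rest.Substring(0, next + 1));
        }
        return children.ToList();
    }
}
```

For the three names above, ListChildren(names, "") yields 2008/ and 2009/, while ListChildren(names, "2009/10/4/") yields the two photo blobs.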

Block Blob

Although a Block blob can be up to 200 GB, a blob larger than 64 MB must be uploaded in multiple blocks of no more than 4 MB each. Storing a Block blob this way is a two-step process: the blocks are uploaded first and then committed, and the blob doesn’t become available until the commit. The blocks can be uploaded in any order; the order of the block IDs in the commit call determines how the blob is assembled. Thankfully, as we’ll see later, the Azure Storage API hides these details, so you won’t have to worry about them unless you want to.
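To make the chunking rule concrete, the sketch below plans the blocks for a payload of a given length: each block is at most 4 MB, and each gets a Base64 block ID of uniform length, since the service requires all block IDs within a blob to be the same length. BlockPlanner and its members are hypothetical; the storage client performs the equivalent work for you, and the actual upload and commit calls are omitted.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Illustration only: plans how a payload would be split into blocks of at most
// 4 MB, each with a Base64 block ID of uniform length. The ordered list of IDs
// is what the commit call would receive; nothing is uploaded here.
public static class BlockPlanner
{
    public const int MaxBlockSize = 4 * 1024 * 1024; // 4 MB per block
    public const int MaxBlocks = 50000;               // 50,000 x 4 MB, roughly 200 GB

    public sealed class Block
    {
        public string Id;
        public long Offset;
        public int Length;
    }

    public static List<Block> Plan(long totalLength, int blockSize = MaxBlockSize)
    {
        if (blockSize <= 0 || blockSize > MaxBlockSize)
            throw new ArgumentOutOfRangeException(nameof(blockSize));
        if ((totalLength + blockSize - 1) / blockSize > MaxBlocks)
            throw new ArgumentOutOfRangeException(nameof(totalLength), "Too many blocks for one blob.");

        var blocks = new List<Block>();
        for (long offset = 0; offset < totalLength; offset += blockSize)
        {
            // Zero-padding the index keeps every ID the same length once encoded.
            string rawId = blocks.Count.ToString("d6");
            blocks.Add(new Block
            {
                Id = Convert.ToBase64String(Encoding.UTF8.GetBytes(rawId)),
                Offset = offset,
                Length = (int)Math.Min(blockSize, totalLength - offset)
            });
        }
        return blocks;
    }
}
```

A 10 MB payload, for instance, becomes three blocks of 4 MB, 4 MB, and 2 MB; committing their IDs in that order reassembles the original bytes.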

Page Blob

A Page blob can be up to 1 TB in size and is organized into 512-byte pages. This means any point in the blob can be read or written by specifying its offset from the start of the blob, which is the advantage of a Page blob over a Block blob, whose contents can only be changed by uploading and committing blocks.
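Because page writes must start and end on 512-byte boundaries, writing an arbitrary byte range means widening it to whole pages. The helper below is a hypothetical sketch, not part of the storage API, that computes that aligned range.

```csharp
using System;

// Illustration only: widens an arbitrary byte range to the smallest range of
// whole 512-byte pages that covers it, as a page write would require.
public static class PageMath
{
    public const long PageSize = 512;

    // Returns the start offset and length of the page-aligned range that fully
    // covers [offset, offset + count).
    public static (long Start, long Length) AlignToPages(long offset, long count)
    {
        if (offset < 0 || count <= 0) throw new ArgumentOutOfRangeException();
        long start = offset / PageSize * PageSize;                        // round down
        long end = (offset + count + PageSize - 1) / PageSize * PageSize; // round up
        return (start, end - start);
    }
}
```

For example, a 100-byte write at offset 1000 touches the pages covering bytes 512 through 1535, so AlignToPages(1000, 100) returns a start of 512 and a length of 1024.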

Table Storage

Azure tables are not like tables in an RDBMS such as SQL Server. A table is a collection of entities, and each entity is a collection of properties, each with a name, type, and value. What may cause problems for some developers is that Azure tables can’t be queried with SQL or the classic ADO.NET provider classes. As with all other Azure storage types, RESTful access is provided: http://myapp.table.core.windows.net/TableName.

I’ll cover tables in-depth later when getting to the actual code.

Queue Storage

Queues are used to transport messages between applications, whether or not they are Azure-based. Think of them as Microsoft Message Queuing (MSMQ) for the cloud. As with the other storage types, RESTful access is available: http://myapp.queue.core.windows.net/Queuename.
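All three storage types share the same addressing scheme: the account name is the first part of the host name and the service (blob, table, or queue) is the second. The sketch below simply builds the example URLs from this section; “myapp” and the resource names are placeholders, and StorageEndpoints is a hypothetical helper. Real requests also need authentication headers, which the storage client adds for you.

```csharp
using System;

// Illustration only: builds the REST endpoint URLs for each storage type.
// Account and resource names are placeholders.
public static class StorageEndpoints
{
    public static Uri Blob(string account, string container, string blob) =>
        new Uri($"http://{account}.blob.core.windows.net/{container}/{blob}");

    public static Uri Table(string account, string table) =>
        new Uri($"http://{account}.table.core.windows.net/{table}");

    public static Uri Queue(string account, string queue) =>
        new Uri($"http://{account}.queue.core.windows.net/{queue}");
}
```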

Queue messages can be no larger than 8 KB; remember, queues aren’t meant to transport large objects, only messages. However, a message can contain the URI of a blob or table entity. Where Azure Queues differ from traditional queue implementations is that reading a message does not remove it: the message remains in the queue until it is explicitly deleted. When a process reads a message, the message becomes invisible to other processes for a configurable period, which defaults to 30 seconds and can be no more than 2 hours; if the message hasn’t been deleted by then, it becomes visible again and is available for processing. Because of this behavior, there is also no guarantee that messages will be processed in any particular order.
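The following in-memory model illustrates the read, hide, and delete cycle, assuming a caller-supplied clock so the timing is easy to follow. VisibilityQueue is hypothetical and is not the Azure API; among other simplifications, the real service also returns a pop receipt that must accompany the delete request.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustration only: a message that is read is hidden for a visibility
// timeout and reappears if it has not been deleted by then. Only an explicit
// Delete removes it.
public sealed class VisibilityQueue
{
    private sealed class Entry
    {
        public string Id;
        public string Body;
        public DateTime VisibleAt;
    }

    private readonly List<Entry> _entries = new List<Entry>();
    private int _nextId;

    public void Add(string body, DateTime now) =>
        _entries.Add(new Entry { Id = (_nextId++).ToString(), Body = body, VisibleAt = now });

    // Returns (id, body) of a visible message and hides it until now + timeout,
    // or null when nothing is currently visible.
    public Tuple<string, string> Get(DateTime now, TimeSpan visibilityTimeout)
    {
        var entry = _entries.FirstOrDefault(e => e.VisibleAt <= now);
        if (entry == null) return null;
        entry.VisibleAt = now + visibilityTimeout;
        return Tuple.Create(entry.Id, entry.Body);
    }

    public bool Delete(string id) => _entries.RemoveAll(e => e.Id == id) > 0;

    public int Count => _entries.Count;
}
```

With a 30-second timeout, a message read at time zero is invisible ten seconds later but can be read again at 31 seconds, because no consumer deleted it.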

Building the Storage Methods

To start with, I’ll add another project to the solution, a Class Library project. This project will serve as a container for the storage methods and implementation used in this solution. After creating the project, you’ll need to add a reference to the Windows Azure Storage assembly, Microsoft.WindowsAzure.StorageClient.dll, which can be found in the Windows Azure SDK folder, C:\Program Files\Windows Azure SDK\v1.1\ref.

Since a CloudStorageAccount is necessary for any access, I’ll create a base class, StorageBase, to contain a property for it.

using Microsoft.WindowsAzure;

public class StorageBase
{
    public static CloudStorageAccount Account
    {
        get
        {
            // For development, the local storage account can be returned directly:
            // return CloudStorageAccount.DevelopmentStorageAccount;

            // Reading the account from configuration means the code doesn't
            // need to change before deployment.
            return CloudStorageAccount.FromConfigurationSetting("DiagnosticsConnectionString");
        }
    }
}

You’ll see here that there are two ways to return the CloudStorageAccount object. Since the application is being run in a development environment, we could use the first and return the static property DevelopmentStorageAccount, but that line would have to be changed to an actual account before deployment. The second way retrieves the account information from the configuration file, similar to database connection strings in an app.config or web.config file. Before the FromConfigurationSetting method can be used, though, we must add some code to the OnStart method of the WebRole class.

This code tells the runtime to use the service configuration file for setting values, and also registers a handler for the RoleEnvironment.Changed event to detect any changes to that file. If a change is detected, the Role will be restarted so the changes can take effect. This code also makes the default RoleEnvironment.Changing event handler unnecessary, since both do the same thing: restart the role when a configuration change is made.
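Both ways of obtaining the account come down to a connection string. The sketch below is a hypothetical helper, not part of the storage client, that parses the key=value pairs to show that switching between the development account (“UseDevelopmentStorage=true”) and a real account (“DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...”) is purely a configuration change. In the role itself, CloudStorageAccount handles this parsing.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: storage connection strings are semicolon-separated
// key=value pairs.
public static class ConnectionStringInfo
{
    public static Dictionary<string, string> Parse(string connectionString)
    {
        var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var part in connectionString.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = part.IndexOf('=');
            if (eq <= 0) throw new FormatException("Expected key=value but found: " + part);
            // Split only on the first '=' because Base64 account keys end in '=' padding.
            values[part.Substring(0, eq).Trim()] = part.Substring(eq + 1).Trim();
        }
        return values;
    }

    public static bool UsesDevelopmentStorage(string connectionString) =>
        Parse(connectionString).TryGetValue("UseDevelopmentStorage", out var v) &&
        string.Equals(v, "true", StringComparison.OrdinalIgnoreCase);
}
```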

Implementing Blob Storage

The first thing we need is a reference to a CloudBlobClient object to access the methods. There are two ways to do this, and both produce the same result; one simply requires less typing, while the other gives more control over how the client is created.

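// Client (a CloudBlobClient) and CONTAINER_NAME are members of the containing class.
// Requires: using System.IO; using Microsoft.WindowsAzure.StorageClient;
public void PutBlobBlock(Stream stream, string fileName)
{
    CloudBlobContainer container = Client.GetContainerReference(CONTAINER_NAME);

    // CreateIfNotExist returns true if the container did not exist and was created,
    // but for this purpose the result doesn't matter.
    container.CreateIfNotExist();

    // Now that the container is known to exist, upload the blob.
    // An existing blob with the same name is overwritten.
    container.GetBlobReference(fileName).UploadFromStream(stream);
}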

As you can see, the first step is to retrieve a reference to the container. The CreateIfNotExist method is a convenience that, as the name implies, will create the container if it doesn’t already exist. An alternative approach would be as follows:

After you have a reference to the container, the next step is to get a reference to the blob. If a blob already exists with the specified name, it will be overwritten. With a reference to the CloudBlob object, it’s just a matter of calling the appropriate method to upload the blob. In this case, I’ll use the UploadFromStream method, since the file comes from the ASP.NET FileUpload control as a stream; there are other methods depending on the environment and usage, such as UploadFile, which takes the path of a physical file. All of the upload and download methods also have asynchronous counterparts.

One thing to note here is that container names must be lowercase. If you try a name containing uppercase letters, you will receive a rather cryptic StorageClientException with the message “One of the request inputs is out of range.” Further, its InnerException will be a WebException with the message “The remote server returned an error: (400) Bad Request.”
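Validating a name before calling the service gives a clearer error than the generic 400 response. The sketch below checks the container naming rules as I understand them (3 to 63 characters; lowercase letters, digits, and hyphens; beginning with a letter or digit; every hyphen surrounded by letters or digits). ContainerNames is a hypothetical helper, and the special $root container is not handled.

```csharp
using System.Text.RegularExpressions;

// Illustration only: client-side check of container naming rules.
public static class ContainerNames
{
    // Starts with a letter or digit; a hyphen is allowed only when followed by
    // a letter or digit, which rules out "--" and a trailing hyphen.
    private static readonly Regex Pattern =
        new Regex(@"^[a-z0-9](?:[a-z0-9]|-(?=[a-z0-9]))*\z", RegexOptions.CultureInvariant);

    public static bool IsValid(string name) =>
        name != null && name.Length >= 3 && name.Length <= 63 && Pattern.IsMatch(name);
}
```

ContainerNames.IsValid("photos") and IsValid("my-photos") pass, while "Photos", "my--photos", and "photos-" are rejected.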

Implementing Table Storage

Of the three storage types, Azure Table Storage requires the most setup. The first thing necessary is to create a model for the data that will be stored in the table.

For this demonstration, the model is very simple, but, most importantly, it derives from TableServiceEntity, which tells Azure the class represents a table entity. Although Azure Table Storage is not a relational database, there must be some mechanism to uniquely identify the rows stored in a table. The PartitionKey and RowKey properties inherited from TableServiceEntity serve this purpose: together they form each entity’s unique key, so no two entities in the same partition can share a RowKey. The PartitionKey is also used to distribute the table data across multiple storage nodes, and although an application can use a single partition for all of its table data, that may not be the best choice for scalability and performance.
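The composite key can be illustrated without the storage client at all. In the sketch below, a plain PhotoEntity class stands in for the TableServiceEntity-derived model (its Description and BlobUri properties are assumptions), and a dictionary keyed on (PartitionKey, RowKey) stands in for the table, rejecting a second insert with the same keys.

```csharp
using System;
using System.Collections.Generic;

// Illustration only: a plain stand-in for the table model.
public sealed class PhotoEntity
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public string Description { get; set; } // hypothetical extra property
    public string BlobUri { get; set; }     // hypothetical extra property
}

// Illustration only: an in-memory "table" keyed on (PartitionKey, RowKey).
public sealed class InMemoryTable
{
    private readonly Dictionary<(string, string), PhotoEntity> _rows =
        new Dictionary<(string, string), PhotoEntity>();

    // A second entity with the same PartitionKey and RowKey is rejected.
    public void Insert(PhotoEntity entity)
    {
        var key = (entity.PartitionKey, entity.RowKey);
        if (_rows.ContainsKey(key))
            throw new InvalidOperationException($"Entity {key} already exists.");
        _rows.Add(key, entity);
    }

    public int Count => _rows.Count;
}
```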

Windows Azure Table Storage is based on WCF Data Services (formerly, ADO.NET Data Services), so there needs to be some context for the table. The TableServiceContext class represents this, so I’ll derive a class from it.

Within the constructor, I’ll make sure the table has been created, so it will be available when needed. This could, of course, also be done in the RoleEntryPoint.OnStart method if the table is used by multiple classes.

Adding to the table should be very familiar to anyone who has worked with LINQ to SQL or Entity Framework: you add the object to the data context, then save all the changes. Note the RowKey naming here. Since I’m using the date for the filename, I need to make a slight modification, because a RowKey can’t contain “/” characters (nor “\”, “#”, “?”, or control characters).
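A sketch of that kind of modification, assuming the replacement character is simply '-': the hypothetical TableKeys.Sanitize helper below replaces every character that is not allowed in PartitionKey and RowKey values.

```csharp
using System.Text;

// Illustration only: replaces characters that are not allowed in PartitionKey
// or RowKey values: '/', '\', '#', '?', and control characters.
public static class TableKeys
{
    public static string Sanitize(string value, char replacement = '-')
    {
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            // char.IsControl covers U+0000-U+001F and U+007F-U+009F.
            bool disallowed = c == '/' || c == '\\' || c == '#' || c == '?' || char.IsControl(c);
            sb.Append(disallowed ? replacement : c);
        }
        return sb.ToString();
    }
}
```

Sanitize("10/4/2009") returns "10-4-2009". Alternatively, formatting the date with a pattern such as "yyyy-MM-dd" avoids the disallowed characters and also sorts chronologically, since keys are compared as strings.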

Getting to the contents of the table is a matter of creating a DataServiceQuery for the model and specifying the EntitySet you are interested in. From there, you can use LINQ to access a particular item.
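The shape of such a query is ordinary LINQ. To keep it runnable without a storage account, the sketch below runs the same kind of Where filter against an in-memory list; PhotoRow and PhotoQueries are hypothetical. Against Table storage, filtering on both PartitionKey and RowKey is the cheapest lookup because it identifies exactly one entity.

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustration only: a minimal row type for the in-memory example.
public sealed class PhotoRow
{
    public string PartitionKey;
    public string RowKey;
    public string Description;
}

public static class PhotoQueries
{
    // Point lookup: both keys together identify at most one row.
    public static PhotoRow FindPhoto(IEnumerable<PhotoRow> rows, string partitionKey, string rowKey) =>
        rows.Where(r => r.PartitionKey == partitionKey && r.RowKey == rowKey).FirstOrDefault();

    // Partition scan: every row that shares a PartitionKey.
    public static List<PhotoRow> InPartition(IEnumerable<PhotoRow> rows, string partitionKey) =>
        rows.Where(r => r.PartitionKey == partitionKey).ToList();
}
```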

Implementing Queue Storage

Queue storage is probably the easiest part to implement. Unlike Table storage, there is no need to set up a model and context, and unlike Blob storage, there is no need to be concerned with blocks and pages. Queue storage is only meant to store small messages, 8 KB or less. Adding a message to a Queue follows the same pattern as the other storage mechanisms: first get a reference to the Queue, creating it if necessary, then add the message.
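Because the message only needs to tell the worker which image to process, it can carry the blob’s URI rather than the image itself. The hypothetical helper below builds such a message and enforces the 8 KB limit; counting UTF-8 bytes is a simplification, since the client library may encode the payload (for example, as Base64) before sending it, which reduces the usable size.

```csharp
using System;
using System.Text;

// Illustration only: the message body is just the blob URI, checked against
// the 8 KB message limit.
public static class PhotoMessages
{
    public const int MaxMessageBytes = 8 * 1024;

    public static string Create(Uri blobUri)
    {
        string body = blobUri.AbsoluteUri;
        if (Encoding.UTF8.GetByteCount(body) > MaxMessageBytes)
            throw new ArgumentException("Message exceeds 8 KB; store the data in a blob instead.");
        return body;
    }

    // The worker turns the message body back into the blob's address.
    public static Uri Read(string body) => new Uri(body, UriKind.Absolute);
}
```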

Worker Role

Now we can finally get to the Worker Role. To demonstrate how a Worker Role can be incorporated into a project, I’ll use it to add a watermark to the images that have been uploaded. The Queue that was previously created will be used to notify this Worker Role when it needs to process an image and which one to process.

Just as with the Web Role, the OnStart method is used to set up and configure the environment. The Worker Role template also overrides the Run method, which simply loops indefinitely. It’s somewhat odd not to have an exit condition; instead, when the role instance is stopped, the loop is terminated forcibly, which may cause issues for any code running in it at the time.
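One way to give the loop an exit condition is to check a cancellation token on each pass, as in the sketch below. The delegates tryGetMessage, process, and deleteMessage are hypothetical stand-ins for the queue and blob calls, and signaling the token from OnStop is an assumption about how you might wire it up. The sketch uses CancellationToken from .NET Framework 4; on 3.5, a volatile flag set in OnStop serves the same purpose.

```csharp
using System;
using System.Threading;

// Illustration only: a polling loop with an explicit exit condition.
public static class WorkerLoop
{
    public static void Run(
        Func<string> tryGetMessage,   // returns null when the queue is empty
        Action<string> process,
        Action<string> deleteMessage,
        CancellationToken token,
        TimeSpan idleDelay)
    {
        while (!token.IsCancellationRequested)
        {
            string message = tryGetMessage();
            if (message == null)
            {
                // Nothing to do: wait (or wake early on cancellation) instead of spinning.
                token.WaitHandle.WaitOne(idleDelay);
                continue;
            }

            process(message);
            // Delete only after processing succeeds, so a failure leaves the
            // message in the queue to reappear after its visibility timeout.
            deleteMessage(message);
        }
    }
}
```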

You can view the sample code for this article to see the details of PhotoProcessing.Run. It simply gets the blob indicated in the QueueMessage, adds a watermark, and updates the Blob storage.

Putting it all Together

Now that everything has been implemented, it’s just a matter of putting it all together. Using the Click event for the Upload button on the ASPX page, I’ll get the file that is being uploaded and the other pertinent details. The first step is to upload the blob so we can get the URI that points to it and add it to Table storage along with the description and date. The final step is adding a message to the Queue to trigger the worker process.
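The ordering matters: the blob must exist before its URI can be stored, and the worker should be notified only after the table row is written. The sketch below captures that sequence, with the three storage operations passed in as delegates (putBlob, addPhotoRecord, and enqueueForProcessing are hypothetical stand-ins for the helpers built earlier) so the flow can be exercised without a storage account.

```csharp
using System;
using System.IO;

// Illustration only: the order of operations in the upload handler.
public static class UploadWorkflow
{
    public static Uri Upload(
        Stream content, string fileName, string description, DateTime uploaded,
        Func<Stream, string, Uri> putBlob,
        Action<Uri, string, DateTime> addPhotoRecord,
        Action<Uri> enqueueForProcessing)
    {
        Uri blobUri = putBlob(content, fileName);        // 1. blob first, to obtain its URI
        addPhotoRecord(blobUri, description, uploaded);  // 2. table row points at the blob
        enqueueForProcessing(blobUri);                   // 3. only now notify the worker
        return blobUri;
    }
}
```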

As I said, the UI is very simple, with the focus being on the underlying processes for Azure Storage.

Conclusion

Hopefully, this article has given you an overview of what Windows Azure Storage is and how it can be used. There is, of course, much more to this topic, some of which may be covered in follow-up articles. In the meantime, here are some resources that can provide additional information and insight about Windows Azure and Windows Azure Storage.