My previous post in this series, on the use of the Service Bus within the Azure Photo Mosaics application, for all intents and purposes completed the explanation of the application's features. There is, however, an alternative implementation that I had planned from the start, specifically to demonstrate the (then new) Caching feature, and that's what I'll focus on in this post.

Caching Overview

Windows Azure Caching provides a distributed, in-memory cache as a service that you can leverage from applications running in the Windows Azure cloud. The cache is essentially a massive property bag of name/value pairs. The data is stored across multiple nodes (machines) in a Windows Azure data center and managed on your behalf. Caching is a true service: to set it up, you only have to pick the data center that will host it, specify the size of the cache, and choose an endpoint name in the Windows Azure Management Portal. A discrete set of cache sizes is available (from 128MB to 4GB). You pay the same amount for the cache whether it's 0% or 100% utilized, but you can increase or decrease its size when needed (though only once per day).

You may also have noticed that a similarly named feature, Windows Server AppFabric Caching, provides an analogous capability for on-premises applications. Windows Azure Caching shares a common core with it, and both grew out of the project codenamed Velocity. Even so, there are some notable differences to be aware of when developing applications that run both on-premises and in the cloud.

Usage Scenario

Within the Azure Photo Mosaics application, I included a configuration option to enable Caching for storing the image library tiles. Recall the flow of the application: when a user creates a photomosaic, one of the inputs is a library of images stored in a Windows Azure blob container (say from Flickr, his or her own vacation pictures, or what have you). Those are raw images, not yet resized to the tile dimensions requested for the final image, since the tile size is itself another input. The same base images might therefore be used to generate one mosaic with 16x16-pixel tiles and another with 32x32. Rather than store a version of each tile for every available tile size, the ImageProcessor Worker Role generates the tiles dynamically.

With the default implementation, each instance of the ImageProcessor Worker Role creates an in-memory “Image Library” that contains each of the tiles, resized from the original image in the selected image library. Although the in-memory implementation works fine for the application, there are a couple of drawbacks:

Scalability – since the entire image library is held in the RAM of the virtual machine hosting the given Worker Role, there's an absolute limit on how large an image library can be. If the storage requirements for the generated tiles exceed the RAM allocation, your only option is to scale up by selecting a larger VM size. For example, choosing medium instead of small doubles the RAM allocation to 3.5GB. You can only scale up so far, however. Recall that Windows Azure data centers house homogeneous, commodity hardware, so once you reach the largest option (currently extra-large, with 14GB) there's nowhere else to go!

Performance – each instance of the ImageProcessor Worker Role creates an in-memory tile library that it uses to generate a slice of the final image, and that complete library is re-created for each slice. For instance, say you generate a mosaic for a given image and specify that it be processed in six slices. Six tasks are then queued, and processing each task involves recreating the image library. This seeming redundancy is required because the application is multi-tenant and stateless, so you cannot rely on the same worker role instance processing all of the slices for a given image.
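To put the scalability limit in numbers, here is a rough back-of-the-envelope calculation, written in Python as a language-neutral sketch. It assumes each resized tile is held as an uncompressed 32-bit bitmap (4 bytes per pixel) and ignores object overhead. The function names and the 100,000-image library are hypothetical, and the 1.75GB figure is simply half of the medium VM's 3.5GB mentioned above.

```python
MB = 1024 ** 2
GB = 1024 ** 3


def tile_library_bytes(tile_count: int, tile_size: int, bytes_per_pixel: int = 4) -> int:
    """RAM needed for tile_count square tiles, each tile_size x tile_size pixels.

    Assumes uncompressed bitmaps (4 bytes per pixel for 32-bit color) and ignores
    per-object overhead, so the real footprint will be somewhat larger.
    """
    return tile_count * tile_size * tile_size * bytes_per_pixel


def max_tiles(ram_bytes: int, tile_size: int, bytes_per_pixel: int = 4) -> int:
    """Upper bound on how many tiles fit in ram_bytes under the same assumptions."""
    return ram_bytes // (tile_size * tile_size * bytes_per_pixel)


if __name__ == "__main__":
    # A hypothetical 100,000-image library rendered as 32x32 tiles:
    print(tile_library_bytes(100_000, 32) / MB)  # 390.625 (MB)
    # 64x64 tiles that fit in a small VM's 1.75GB, before the OS and role take their share:
    print(max_tiles(int(1.75 * GB), 64))  # 114688
```

Doubling the tile edge quadruples the memory per tile, which is why larger tile sizes exhaust a single VM's RAM so quickly.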

With Caching enabled, the ImageProcessor first consults the cache to see whether a tile for the requested image library in the requested dimensions already exists. If so, it uses the cached tile rather than regenerating it (and recalculating its average color). If the tile is not in the cache, the ImageProcessor has to retrieve the original image from the image library blob container and resize it to the requested dimensions. It then stores the result in the cache, so the next request has near-immediate access to it.
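The lookup-then-populate flow just described is commonly called the cache-aside pattern. The Python sketch below assumes a plain dict standing in for the distributed cache. The key format (library, image name, and tile dimensions) and the `make_tile` callback are hypothetical, not the application's actual naming. The demo shows six slice tasks sharing one cache, so each tile is resized once rather than once per slice.

```python
from typing import Callable, MutableMapping


def tile_key(library: str, image_name: str, tile_size: int) -> str:
    # One entry per (library, source image, tile dimension), so the 16x16 and
    # 32x32 tiles of the same photo never collide.
    return f"{library}/{image_name}/{tile_size}x{tile_size}"


def get_tile(cache: MutableMapping[str, bytes], library: str, image_name: str,
             tile_size: int, make_tile: Callable[[str, int], bytes]) -> bytes:
    """Cache-aside lookup: use the cached tile if present, else build and store it."""
    key = tile_key(library, image_name, tile_size)
    tile = cache.get(key)
    if tile is None:
        # Miss: fetch the original from blob storage and resize it (make_tile),
        # then store the result for the next request.
        tile = make_tile(image_name, tile_size)
        cache[key] = tile
    return tile


if __name__ == "__main__":
    resizes = 0

    def fake_resize(image_name: str, tile_size: int) -> bytes:
        global resizes
        resizes += 1
        return bytes(tile_size * tile_size)  # placeholder pixel data

    shared_cache: dict[str, bytes] = {}
    library = [f"photo{i}.jpg" for i in range(100)]
    for _slice in range(6):  # six slice tasks, each needing every tile
        for name in library:
            get_tile(shared_cache, "vacation", name, 16, fake_resize)
    print(resizes)  # 100 resizes instead of 600
```

In practice, slices processed concurrently on different instances may each miss on the same tile before the first copy lands in the cache, so the saving is somewhat smaller than this sequential demo suggests.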

Creating a Cache

Creating a cache with the Windows Azure Management Portal is simple and straightforward. After you log in to the portal, select the Service Bus, Access Control & Caching option on the left sidebar, and then select the Caching service at the top left. You'll then see a list of the existing Caching namespaces.

The properties pane on the right shows information about existing caches, including the current size and the peak sizes over the past month and year. To create a new cache, simply select the New option on the ribbon, and you'll be prompted for four pieces of information:

A namespace, which forms the first part of the URL by which your cache is accessed. The URL is {namespace}.cache.windows.net, so {namespace} must be unique across all of Windows Azure,

The region where your cache is located; you’ll pick one of the six Windows Azure data centers here,

The Windows Azure subscription that owns this cache,

The desired cache size, ranging from 128MB to 4GB in six discrete steps, each double the previous one.
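The sizing rule and naming scheme in the list above are easy to capture in a few lines. This is a minimal Python sketch with hypothetical helper names and a made-up namespace in the test. It only formats the host name and does not check global uniqueness, which only the portal can do.

```python
# The six offered sizes, in MB: each step doubles the previous one.
CACHE_SIZES_MB = [128 * 2 ** step for step in range(6)]


def cache_endpoint(namespace: str) -> str:
    """Host name clients use to reach a cache created with the given namespace."""
    return f"{namespace}.cache.windows.net"


def is_offered_size(size_mb: int) -> bool:
    """True only for one of the six discrete sizes."""
    return size_mb in CACHE_SIZES_MB
```

For example, `cache_endpoint("mosaics")` yields `mosaics.cache.windows.net`, and 1000MB is rejected because only the power-of-two steps are offered.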

Configuring the Cache in Code

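The easiest way to leverage a Windows Azure cache in your code is via configuration. You can generate the necessary entries from the Windows Azure Management Portal and simply cut and paste them into the web.config file of your Web Role or the app.config file of your Worker Role. With the configuration in place, obtaining a reference to the cache takes only a few lines of code. If the cache can't be instantiated, the code logs a warning and sets the cache reference to Nothing: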

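Imports System.Diagnostics
Imports Microsoft.ApplicationServer.Caching

Try
    Dim factory As New DataCacheFactory()   ' reads the cache client settings from the configuration file
    theCache = factory.GetDefaultCache()
Catch ex As Exception
    Trace.TraceWarning("Cache could not be instantiated: {0}", ex.Message)
    theCache = Nothing
End Try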

Two classes do most of the work here. DataCacheFactory creates cache references based on the configuration settings. DataCache is a reference to the cache itself and provides the Add, Get, Put, and Remove methods for manipulating objects in the cache. Each of these methods deals with the cached item as a System.Object. If you want to retrieve the object along with additional metadata such as its timeout and version, use the GetCacheItem method, which returns a DataCacheItem instance.
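The four methods differ in how they treat existing and missing keys. The hypothetical Python class below is a local model of that behavior, not the .NET API:

- Add fails if the key already exists.
- Put inserts or overwrites.
- Get returns nothing (Nothing in VB) on a miss.
- Remove reports whether anything was removed.

The real DataCache signals a duplicate Add with a DataCacheException and also tracks timeouts and versions, none of which this sketch attempts.

```python
from typing import Any, Optional


class KeyAlreadyExistsError(Exception):
    """Stands in for the exception DataCache.Add raises on a duplicate key."""


class DataCacheModel:
    """Hypothetical in-process model of Add/Get/Put/Remove semantics."""

    def __init__(self) -> None:
        self._items: dict[str, Any] = {}

    def add(self, key: str, value: Any) -> None:
        # Add only succeeds for a key that isn't already cached.
        if key in self._items:
            raise KeyAlreadyExistsError(key)
        self._items[key] = value

    def put(self, key: str, value: Any) -> None:
        # Put inserts a new item or overwrites an existing one.
        self._items[key] = value

    def get(self, key: str) -> Optional[Any]:
        # Get returns None (Nothing in VB) on a miss instead of raising.
        return self._items.get(key)

    def remove(self, key: str) -> bool:
        # Remove reports whether an item was actually removed.
        if key in self._items:
            del self._items[key]
            return True
        return False
```

Because Add refuses duplicates, Put is usually the simpler choice when several instances may populate the same key concurrently.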

Within the Azure Photo Mosaics application, the ImageProcessor retrieves the tile thumbnail images from the default cache. If an image is not found in the cache, the thumbnail is generated and then stored in the cache, ready to service the next request.
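The initialization code sets the cache reference to Nothing when the cache can't be instantiated, so any lookup also needs a path that works without a cache. Here is a minimal Python sketch of that lookup, assuming a dict-like cache that may be `None`. The function name and the `generate` callback are hypothetical.

```python
from typing import Callable, MutableMapping, Optional


def get_or_generate(cache: Optional[MutableMapping[str, bytes]], key: str,
                    generate: Callable[[], bytes]) -> bytes:
    """Cache-aside lookup that still works when the cache couldn't be created."""
    if cache is None:
        # Initialization failed (the reference is Nothing/None), so generate
        # every thumbnail on demand, much like the in-memory implementation.
        return generate()
    thumbnail = cache.get(key)
    if thumbnail is None:
        thumbnail = generate()
        cache[key] = thumbnail  # ready to service the next request
    return thumbnail
```

This keeps the mosaic working, just more slowly, if the cache endpoint is misconfigured or unavailable.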

If you want to collect additional metrics on cache utilization, consider wrapping calls to Add, Put, Get, and the other relevant DataCache methods in methods of your own that maintain utilization counters. In the Azure Photo Mosaics application, I added some simple properties to track cache hits and misses in each ImageProcessor Worker Role instance.

The “requests” variables indicate the number of times an item (a tile thumbnail or color value) was requested. The “retrieves” variables indicate the number of times the item had to be retrieved from the original source, which is the same as a cache miss. Cache hits are therefore calculated as requests minus retrieves.
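The counter arithmetic can be sketched as a small wrapper around the lookup. This Python version is hypothetical: `CacheStats` and `get_counted` are illustrative names, and a dict stands in for the cache. It increments "requests" on every call and "retrieves" only on a miss, then derives hits as the difference.

```python
from dataclasses import dataclass
from typing import Any, Callable, MutableMapping


@dataclass
class CacheStats:
    """Hypothetical per-instance counters mirroring "requests" and "retrieves"."""
    requests: int = 0
    retrieves: int = 0

    @property
    def hits(self) -> int:
        # Requests that did not have to go back to the original source.
        return self.requests - self.retrieves

    @property
    def hit_ratio(self) -> float:
        return self.hits / self.requests if self.requests else 0.0


def get_counted(cache: MutableMapping[str, Any], key: str,
                load: Callable[[], Any], stats: CacheStats) -> Any:
    stats.requests += 1
    value = cache.get(key)
    if value is None:
        stats.retrieves += 1  # cache miss: retrieve from the original source
        value = load()
        cache[key] = value
    return value
```

For example, requesting keys a, b, a, a yields 4 requests, 2 retrieves, and therefore 2 hits.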

In the next post, I’ll continue on this theme of caching by comparing several implementations of the ImageProcessor component that use various approaches to caching.

Some FAQs about Windows Azure Caching

Can I control how long an item will be cached? By default, items expire after 48 hours. You cannot override the expiration policy for the cache as a whole (in Windows Azure), but you can specify an expiration time for individual items when adding them to the cache. There is no guarantee that an item will stay cached for the requested duration, though, since under memory pressure the least recently used items are evicted first.
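An item can leave the cache in two ways: it expires, or it is evicted to make room. The toy Python model below shows both. Each item carries its own expiration time, defaulting to the 48 hours described above, and when the model is full it evicts the least recently used item, whether or not that item has expired. The class is hypothetical. Capacity is counted in items rather than bytes, and the injectable clock exists only to make the example testable; neither reflects how the service actually measures memory.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional

DEFAULT_TTL_SECONDS = 48 * 60 * 60  # default expiration described above


class ExpiringLruCache:
    """Toy model: per-item expiration plus least-recently-used eviction."""

    def __init__(self, capacity: int, clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.clock = clock
        self._items: OrderedDict = OrderedDict()  # key -> (value, expires_at)

    def put(self, key: str, value: Any, ttl_seconds: float = DEFAULT_TTL_SECONDS) -> None:
        if key in self._items:
            self._items.pop(key)  # re-inserting moves it to the most recent position
        elif len(self._items) >= self.capacity:
            self._items.popitem(last=False)  # evict the least recently used item
        self._items[key] = (value, self.clock() + ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._items[key]  # expired: treat as a miss
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value
```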

Can I clear the cache manually or programmatically? Windows Azure Caching does not provide this capability at this time.

How much does caching cost? The cost of caching is based wholly on the size of the cache; data transfer charges out of the data center, if applicable, are not included. As of this writing (December 2011), the cost schedule is as follows:

Cache Size    Monthly Cost
128 MB        $45.00
256 MB        $55.00
512 MB        $75.00
1 GB          $110.00
2 GB          $180.00
4 GB          $325.00
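Because you pay for the whole cache regardless of utilization, sizing comes down to a simple lookup against the table. This Python sketch uses the December 2011 prices copied from the table. The function names are hypothetical, and the calculation ignores data transfer charges and the throttling limits discussed below.

```python
# Monthly prices from the table above (December 2011), keyed by size in MB.
MONTHLY_PRICE_USD = {128: 45.00, 256: 55.00, 512: 75.00,
                     1024: 110.00, 2048: 180.00, 4096: 325.00}


def cost_per_mb(size_mb: int) -> float:
    """Monthly price divided by capacity, for comparing tiers."""
    return MONTHLY_PRICE_USD[size_mb] / size_mb


def cheapest_size_for(required_mb: int) -> tuple[int, float]:
    """Smallest (and therefore cheapest) size that fits required_mb, with its price."""
    for size in sorted(MONTHLY_PRICE_USD):
        if size >= required_mb:
            return size, MONTHLY_PRICE_USD[size]
    raise ValueError(f"{required_mb}MB exceeds the largest cache size")
```

At these rates, 4GB works out to roughly 8 cents per MB per month versus about 35 cents for 128MB. Even so, the smallest size that holds your data is still the cheapest choice.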

Beyond cache size, are there other constraints? Yes, each cache size comes with associated bandwidth, transaction, and connection limits. Since caching is a shared resource, your usage may be throttled to stay within the limits listed below (current as of December 2011):