Announcing Storage Client Library 2.1 RTM & CTP for Windows Phone

We are pleased to announce that the storage client for .NET 2.1 has RTM’d. This release includes several notable features such as Async Task methods, IQueryable for Tables, buffer pooling support, and much more. In addition, we are releasing the CTP of the storage client for Windows Phone 8. Together with the existing support for Windows Runtime, clients can now leverage Windows Azure Storage via a consistent API surface across multiple Windows platforms. As usual, all of the source code is available via GitHub (see the Resources section below). You can download the latest binaries via NuGet.

The remainder of this blog post covers some of the new features and scenarios in additional detail and provides supporting code samples. As always, we appreciate your feedback, so please feel free to add comments below.

Fundamentals

For this release we focused heavily on fundamentals by dramatically expanding test coverage, and building an automated performance suite that let us benchmark performance behaviors across various high scale scenarios.

Performance

We are always looking for ways to improve the performance of client applications by improving the storage client itself and by exposing new features that better allow clients to optimize their applications. In this release we have done both and the results are dramatic.

For example, in one of the test scenarios we execute, a single XL VM round trips 30 256 MB blobs simultaneously (7.5 GB in total). There are dramatic improvements in both latency and CPU usage compared to SDK 1.7: CPU drops almost 40%, while latency is reduced by 16.5% for uploads and 23.2% for downloads. Additionally, you may note that the actual latency improvements between 2.0.5.1 and 2.1 are only a few percentage points. This is because we have successfully removed the client from the critical path, resulting in an application that is now entirely dependent on the network. Further, even though latency in this scenario is now network bound, CPU usage has dropped another 13% on average compared to SDK 2.0.5.1.

This is just one example of the performance improvements we have made. For more on performance as well as best practices, please see the Tech Ed Presentation in the Resources section below.

Async Task Methods

Each public API now exposes an Async method that returns a task for a given operation. Additionally, these methods support pre-emptive cancellation via an overload which accepts a CancellationToken. If you are running under .NET 4.5, or using the Async Targeting Pack for .NET 4.0, you can easily leverage the async / await pattern when writing your applications against storage.
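As an illustration of the async / await pattern described above, here is a minimal sketch that uploads a stream to a block blob. The class name, method name, container name, and blob name are placeholders chosen for this example, and the connection string is assumed to be supplied by the caller.

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class AsyncBlobSample
{
    // Hypothetical helper: "samplecontainer" and "sampleblob" are placeholder names.
    public static async Task UploadAsync(string connectionString, Stream source, CancellationToken token)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudBlobClient client = account.CreateCloudBlobClient();
        CloudBlobContainer container = client.GetContainerReference("samplecontainer");

        // Each *Async method has an overload that accepts a CancellationToken.
        await container.CreateIfNotExistsAsync(token);

        CloudBlockBlob blob = container.GetBlockBlobReference("sampleblob");
        await blob.UploadFromStreamAsync(source, token);
    }
}
```

Cancelling the token while the upload is in flight should surface as an exception from the awaited call, so callers will typically wrap the await in a try / catch.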

Buffer Pooling

For high scale applications, Buffer Pooling is a great strategy to allow clients to re-use existing buffers across many operations. In a managed environment such as .NET, this can dramatically reduce the number of cycles spent allocating and subsequently garbage collecting semi-long lived buffers.

To address this scenario, each service client now exposes a BufferManager property of type IBufferManager. This property allows clients to share a given buffer pool across all objects associated with that service client instance. For example, all CloudTable objects created via CloudTableClient.GetTableReference() would make use of the associated service client's BufferManager. The IBufferManager is patterned after the BufferManager in System.ServiceModel.dll to allow desktop clients to easily leverage an existing implementation provided by the framework. (Clients running on other platforms such as Windows Runtime or Windows Phone may implement a pool against the IBufferManager interface.)

For desktop applications to leverage the built in BufferManager provided by the System.ServiceModel.dll a simple adapter is required:

using Microsoft.WindowsAzure.Storage;
using System.ServiceModel.Channels;
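The adapter itself is small. The sketch below is one way to write it, assuming the three members of IBufferManager (TakeBuffer, ReturnBuffer, and GetDefaultBufferSize); the class name WCFBufferManagerAdapter, the helper class, and the pool sizes are illustrative choices rather than values recommended by the library.

```csharp
using System.ServiceModel.Channels;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

// Illustrative adapter that forwards IBufferManager calls to the WCF BufferManager.
public class WCFBufferManagerAdapter : IBufferManager
{
    private readonly BufferManager manager;
    private readonly int defaultBufferSize;

    public WCFBufferManagerAdapter(BufferManager manager, int defaultBufferSize)
    {
        this.manager = manager;
        this.defaultBufferSize = defaultBufferSize;
    }

    public byte[] TakeBuffer(int bufferSize)
    {
        return this.manager.TakeBuffer(bufferSize);
    }

    public void ReturnBuffer(byte[] buffer)
    {
        this.manager.ReturnBuffer(buffer);
    }

    public int GetDefaultBufferSize()
    {
        return this.defaultBufferSize;
    }
}

public static class BufferPoolSetup
{
    // Hypothetical helper: the sizes below are assumptions chosen for the example.
    public static void UsePool(CloudBlobClient blobClient)
    {
        BufferManager pool = BufferManager.CreateBufferManager(64 * 1024 * 1024, 4 * 1024 * 1024);
        blobClient.BufferManager = new WCFBufferManagerAdapter(pool, 64 * 1024);
    }
}
```

Objects created from blobClient after this point would draw their buffers from the shared pool.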

Multi-Buffer Memory Stream

During the course of our performance investigations we uncovered a few performance issues with the MemoryStream class provided in the BCL (specifically regarding async operations, dynamic length behavior, and single byte operations). To address these issues we have implemented a new multi-buffer memory stream which provides consistent performance even when the length of the data is unknown. If the client provides an IBufferManager, this class uses the buffer pool when allocating additional buffers. As a result, any operation on any service that potentially buffers data (Blob Streams, Table Operations, etc.) now consumes less CPU and makes optimal use of a shared memory pool.

.NET MD5 is now default

Our performance testing highlighted a slight performance degradation when utilizing the FISMA compliant native MD5 implementation compared to the built-in .NET implementation. As such, for this release the .NET MD5 is now used by default. Any clients requiring FISMA compliance can re-enable the native implementation as shown below:

CloudStorageAccount.UseV1MD5 = false;

New Range Based Overloads

In 2.1, the Blob upload APIs include an overload which allows clients to upload only a given range of the byte array or stream to the blob. This feature allows clients to avoid potentially pre-buffering data prior to uploading it to the storage service. Additionally, there are new download range APIs for both streams and byte arrays that allow efficient, fault tolerant range downloads without the need to buffer any data on the client side.
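A short sketch of the range based overloads follows. The helper class and method names are hypothetical, and the example assumes the blob ends up holding at least 512 bytes so that the requested download ranges exist.

```csharp
using System.IO;
using Microsoft.WindowsAzure.Storage.Blob;

public static class RangeSample
{
    // Hypothetical helper: assumes data.Length is at least 1024 bytes.
    public static int RoundTripRange(CloudBlockBlob blob, byte[] data)
    {
        // Upload only the second half of the array, without copying it to a temporary buffer.
        int half = data.Length / 2;
        blob.UploadFromByteArray(data, half, data.Length - half);

        // Download the first 512 bytes of the blob into the start of a caller-owned array.
        byte[] target = new byte[512];
        int bytesRead = blob.DownloadRangeToByteArray(target, 0, 0, 512);

        // Download a range starting at blob offset 256 directly into a stream.
        using (MemoryStream stream = new MemoryStream())
        {
            blob.DownloadRangeToStream(stream, 256, 256);
        }

        return bytesRead;
    }
}
```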

Client Tracing

The 2.1 release implements .NET Tracing, allowing users to enable log information regarding request execution and REST requests (See below for a table of what information is logged). Additionally, Windows Azure Diagnostics provides a trace listener that can redirect client trace messages to the WADLogsTable if users wish to persist these traces to the cloud.

Logged Data

Each log line will include the following data:

Client Request ID: Per request ID that is specified by the user in OperationContext

Event: Free-form text

As part of each request the following data will be logged to make it easier to correlate client-side logs to server-side logs:

Request: Request URI

Response: Request ID, HTTP status code

Trace Levels

Level: Events

Off: Nothing will be logged.

Error: If an exception cannot or will not be handled internally and will be thrown to the user, it will be logged as an error.

Warning: If an exception is caught and handled internally, it will be logged as a warning. The primary use case for this is the retry scenario, where an exception is not thrown back to the user so that the operation can be retried. It can also happen in operations such as CreateIfNotExists, where we handle the 404 error silently.

Informational: The following info will be logged:

- Right after the user calls a method to start an operation, request details such as URI and client request ID will be logged.
- Important milestones such as Sending Request Start/End, Upload Data Start/End, Receive Response Start/End, Download Data Start/End will be logged to mark the timestamps.
- Right after the headers are received, response details such as request ID and HTTP status code will be logged.
- If an operation fails and the storage client decides to retry, the reason for that decision will be logged along with when the next retry is going to happen.
- All client-side timeouts will be logged when the storage client decides to abort a pending request.

Verbose: The following info will be logged:

- String-to-sign for each request
- Any extra details specific to operations (this is up to each operation to define and use)

Enabling Tracing

A key concept is the opt-in / opt-out model that the client provides for tracing. In typical applications it is customary to enable tracing at a given verbosity for a specific class. This works fine for many client applications; however, for cloud applications that are executing at scale this approach may generate much more data than the user requires. As such, we have provided the ability for clients to work in an opt-in model for tracing, which allows clients to configure listeners at a given verbosity but only log specific requests if and when they choose. Essentially, this design lets users perform “vertical” logging across layers of the stack targeted at specific requests, rather than “horizontal” logging which would record all traffic seen by a specific class or layer.

To enable tracing in .NET you must add a trace source for the storage client to the app.config and set the verbosity:
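A configuration along the following lines is one way to do this. The trace source name Microsoft.WindowsAzure.Storage is the name used by the storage client; the listener name, the choice of a TextWriterTraceListener, and the log file name are assumptions made for this sketch, and any standard .NET trace listener can be substituted.

```xml
<configuration>
  <system.diagnostics>
    <sources>
      <!-- Trace source published by the storage client -->
      <source name="Microsoft.WindowsAzure.Storage">
        <listeners>
          <add name="storageListener"/>
        </listeners>
      </source>
    </sources>
    <switches>
      <!-- Verbosity for the storage client trace source -->
      <add name="Microsoft.WindowsAzure.Storage" value="Verbose"/>
    </switches>
    <sharedListeners>
      <!-- Placeholder listener that writes to a local file -->
      <add name="storageListener"
           type="System.Diagnostics.TextWriterTraceListener"
           initializeData="storage-client.log"/>
    </sharedListeners>
  </system.diagnostics>
</configuration>
```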

The application is now set to log all trace messages created by the storage client up to the Verbose level. However, if a client wishes to enable logging only for specific clients or requests they can further configure the default logging level in their application by setting OperationContext.DefaultLogLevel and then opt-in any specific requests via the OperationContext object:
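The opt-in model can be sketched as follows: logging is turned off by default and then enabled for a single request through its own OperationContext. The helper class and method names are hypothetical, and the blob and stream are assumed to be supplied by the caller.

```csharp
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class TracingSample
{
    // Hypothetical helper illustrating opt-in tracing for one request.
    public static void UploadWithTracing(CloudBlockBlob blob, Stream source)
    {
        // Disable logging by default, so only requests that opt in are traced.
        OperationContext.DefaultLogLevel = LogLevel.Off;

        // Opt this specific request in at the Verbose level.
        OperationContext tracedContext = new OperationContext { LogLevel = LogLevel.Verbose };

        // No access condition or request options; only the traced context is supplied.
        blob.UploadFromStream(source, null, null, tracedContext);
    }
}
```

Requests issued without an explicit OperationContext pick up the default level and, in this sketch, produce no trace output.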

With client side tracing used in conjunction with storage logging, clients can now get a complete view of their application from both the client and server perspectives.

Blob Features

Blob Streams

In the 2.1 release, we improved the blob streams that are created by the OpenRead and OpenWrite APIs of CloudBlockBlob and CloudPageBlob. When the parallel upload functionality is enabled, the write stream returned by OpenWrite can now upload much faster by keeping the number of active writers at a certain level. Moreover, the return type has changed from Stream to a new type named CloudBlobStream, which is derived from Stream. CloudBlobStream offers the following new APIs:

Flush already exists in Stream itself, so CloudBlobStream only adds an asynchronous version. However, Commit is a completely new API that allows the caller to commit before disposing the Stream. This allows much easier exception handling during commit and also the ability to commit asynchronously.
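To illustrate why an explicit Commit simplifies error handling, here is a minimal sketch that writes a payload and commits it before the stream is disposed. The helper class and method names are hypothetical, and how the exception is handled is left to the caller.

```csharp
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

public static class BlobStreamSample
{
    // Hypothetical helper: returns false if the commit fails.
    public static bool TryWrite(CloudBlockBlob blob, byte[] payload)
    {
        using (CloudBlobStream stream = blob.OpenWrite())
        {
            stream.Write(payload, 0, payload.Length);

            try
            {
                // Committing explicitly surfaces failures here rather than during Dispose.
                stream.Commit();
                return true;
            }
            catch (StorageException)
            {
                return false;
            }
        }
    }
}
```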

The read stream returned by OpenRead does not have a new type, but it now has true synchronous and asynchronous implementations. Clients can now get the stream synchronously via OpenRead or asynchronously using [Begin|End]OpenRead. Moreover, after the stream is opened, all synchronous calls such as querying the length or the Read API itself are truly synchronous, meaning that they do not call any asynchronous APIs internally.

Table Features

IgnorePropertyAttribute

When persisting POCO objects to Windows Azure Tables, in some cases clients may wish to omit certain client-only properties. In this release we are introducing the IgnorePropertyAttribute to give clients an easy way to ignore a given property during serialization and de-serialization of an entity. The following snippet illustrates how to ignore the FirstName property of an entity via the IgnorePropertyAttribute:
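A minimal sketch of such an entity is shown here; the CustomerEntity type and its LastName property are placeholders for this example.

```csharp
using Microsoft.WindowsAzure.Storage.Table;

// Placeholder entity type used for illustration.
public class CustomerEntity : TableEntity
{
    public string LastName { get; set; }

    // Skipped during both serialization and de-serialization.
    [IgnoreProperty]
    public string FirstName { get; set; }
}
```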

Compiled Serializers

When working with POCO types, previous releases of the SDK relied on reflection to discover all applicable properties for serialization / de-serialization at runtime. This process was both repetitive and computationally expensive. In 2.1 we are introducing support for compiled expressions, which allow the client to dynamically generate a LINQ expression at runtime for a given type. This lets the client perform the reflection process once and then compile a lambda at runtime which handles all future reads and writes of a given entity type. In performance micro-benchmarks this approach is roughly 40x faster computationally than the reflection based approach.

All compiled expressions for read and write are held in static concurrent dictionaries on TableEntity. If you wish to disable this feature, simply set TableEntity.DisableCompiledSerializers = true;

Serialize 3rd Party Objects

In some cases clients wish to serialize objects for which they do not control the source, for example framework objects or objects from 3rd party libraries. In previous releases clients were required to write custom serialization logic for each type they wished to serialize. In the 2.1 release we are exposing the core serialization and de-serialization logic for any CLR type. This allows clients to easily persist and read back entity objects for types that do not derive from TableEntity or implement the ITableEntity interface. This pattern can also be especially useful when exposing DTO types via a service, as the client will no longer be required to maintain two entity types and marshal between them.

A general purpose adapter pattern can be used which allows clients to simply wrap an object instance in a generic adapter that handles serialization for a given type. The example below illustrates this pattern:

public class EntityAdapter<T> : ITableEntity where T : new()
{
    public EntityAdapter()
    {
        // If you would like to work with objects that do not have a default Ctor you can use (T)Activator.CreateInstance(typeof(T));
        this.InnerObject = new T();
    }

    /// <summary>
    /// Gets or sets the entity's current ETag. Set this value to '*' in order to blindly overwrite an entity as part of an update operation.
    /// </summary>
    /// <value>The ETag of the entity.</value>
    public string ETag { get; set; }

Note, the Compiled Serializer functionality will be utilized for any types serialized or deserialized via TableEntity.[Read|Write]UserObject.
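Since the listing above shows only part of the adapter, here is a fuller sketch of how such a class might be completed. It assumes the ITableEntity members (PartitionKey, RowKey, Timestamp, ETag, ReadEntity, WriteEntity) and delegates to TableEntity.ReadUserObject and TableEntity.WriteUserObject; the second constructor and the InnerObject property shape are assumptions made for this example.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

public class EntityAdapter<T> : ITableEntity where T : new()
{
    public EntityAdapter()
    {
        this.InnerObject = new T();
    }

    // Assumed convenience constructor for wrapping an existing instance.
    public EntityAdapter(T innerObject)
    {
        this.InnerObject = innerObject;
    }

    // The wrapped object whose public properties are persisted.
    public T InnerObject { get; set; }

    public string PartitionKey { get; set; }

    public string RowKey { get; set; }

    public DateTimeOffset Timestamp { get; set; }

    public string ETag { get; set; }

    public void ReadEntity(IDictionary<string, EntityProperty> properties, OperationContext operationContext)
    {
        // Populate the wrapped object from the property dictionary.
        TableEntity.ReadUserObject(this.InnerObject, properties, operationContext);
    }

    public IDictionary<string, EntityProperty> WriteEntity(OperationContext operationContext)
    {
        // Serialize the wrapped object's properties.
        return TableEntity.WriteUserObject(this.InnerObject, operationContext);
    }
}
```

An instance can then be passed to the usual table operations, for example TableOperation.Insert(new EntityAdapter<Customer>(customer) { PartitionKey = "users", RowKey = "jdoe" }), where Customer and the key values are hypothetical.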

Table IQueryable

In 2.1 we are adding IQueryable support for the Table Service layer on desktop and phone. This will allow users to construct and execute queries via LINQ similar to WCF Data Services, however this implementation has been specifically optimized for Windows Azure Tables and NoSQL concepts. The snippet below illustrates constructing a query via the new IQueryable implementation:
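A minimal sketch of such a query follows. The CustomerEntity type, the helper class, and the filter values are placeholders for this example; the table is assumed to already exist.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.Storage.Table;

// Placeholder entity type used for illustration.
public class CustomerEntity : TableEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class QueryableSample
{
    // Hypothetical helper: returns all customers in the "users" partition named "Doe".
    public static List<CustomerEntity> FindCustomers(CloudTable table)
    {
        IQueryable<CustomerEntity> query =
            from ent in table.CreateQuery<CustomerEntity>()
            where ent.PartitionKey == "users" && ent.LastName == "Doe"
            select ent;

        // Enumerating the query executes it against the table service.
        return query.ToList();
    }
}
```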

The IQueryable implementation transparently handles continuations, and has support for adding RequestOptions, OperationContext, and client side EntityResolvers directly into the expression tree. Additionally, since this makes use of existing infrastructure, optimizations such as IBufferManager, Compiled Serializers, and Logging are fully supported.

Note, to support IQueryable projections the type constraint on TableQuery of ITableEntity, new() has been removed. Instead, any TableQuery objects not created via the new CloudTable.CreateQuery<T>() method will enforce this constraint at runtime.

Conceptual model

We are committed to backwards compatibility, and as such we strive to introduce as few breaking changes as possible for existing clients. Therefore, in addition to supporting the new IQueryable mode of execution, we continue to support the 2.x “fluent” mode of constructing queries via the Where, Select, and Take methods. However, these modes are not strictly interoperable while constructing queries, as they store data in different forms.

Aside from query construction, a key difference between the two modes is that the IQueryable interface requires that the query object be able to execute itself, as compared to the previous model of executing queries via a CloudTable object. A brief summary of these two modes of execution is listed below:

Fluent Mode (2.0.x)

- Queries are created by directly calling a constructor
- Queries are executed against a CloudTable object via ExecuteQuery[Segmented] methods
- EntityResolver specified in execute overload
- Fluent methods Where, Select, and Take are provided

IQueryable Mode (2.1+)

- Queries are created by an associated table, i.e. CloudTable.CreateQuery<T>()
- Queries are executed by enumerating the results, or by Execute[Segmented] methods on TableQuery

Note the three extension methods which allow a TableRequestOptions, an OperationContext, and an EntityResolver to be associated with a given query. These extensions are available by including a using statement for the Microsoft.WindowsAzure.Storage.Table.Queryable namespace.

The extension .AsTableQuery() is also provided; however, unlike in the WCF implementation, it is no longer mandatory. It simply gives clients more flexibility in query execution by providing additional methods such as Task, APM, and segmented execution methods.
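The sketch below shows how these extensions might be chained onto a query. It assumes the extension methods are named WithOptions, WithContext, and AsTableQuery in the Microsoft.WindowsAzure.Storage.Table.Queryable namespace; the entity type, timeout, and client request ID are placeholders for this example.

```csharp
using System;
using System.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;
using Microsoft.WindowsAzure.Storage.Table.Queryable;

// Placeholder entity type used for illustration.
public class CustomerEntity : TableEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class QueryExtensionsSample
{
    // Hypothetical helper: fetches the first segment of results for one partition.
    public static TableQuerySegment<CustomerEntity> FirstSegment(CloudTable table)
    {
        TableQuery<CustomerEntity> query =
            (from ent in table.CreateQuery<CustomerEntity>()
             where ent.PartitionKey == "users"
             select ent)
            .WithOptions(new TableRequestOptions { ServerTimeout = TimeSpan.FromSeconds(30) })
            .WithContext(new OperationContext { ClientRequestID = "sample-request" })
            .AsTableQuery();

        // A null continuation token requests the first segment.
        return query.ExecuteSegmented(null);
    }
}
```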

Projection

In traditional LINQ providers, projection is handled via the select new keywords, which essentially perform two separate actions. The first is to analyze any properties that are accessed and send them to the server so that it returns only the desired columns; this is considered server side projection. The second is to construct a client side action which is executed for each returned entity, essentially instantiating an object and populating its properties with the data returned by the server; this is considered client side projection. In the implementation released in 2.1, clients can specify these two types of projection separately in the expression tree. (Note, you can still use the traditional approach via select new if you prefer.)

Server Side Projection Syntax

For a simple scenario where you only wish to filter the properties returned by the server, a convenient helper is provided. This does not provide any client side projection functionality; it simply limits the properties returned by the service. Note, by default PartitionKey, RowKey, Timestamp, and ETag are always requested to allow for subsequent updates to the resulting entity.
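A sketch of server side projection is shown here, assuming the helper is the static TableQuery.Project method taking the entity and a list of column names. The entity type and the helper class are placeholders for this example.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.Storage.Table;

// Placeholder entity type used for illustration.
public class CustomerEntity : TableEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
}

public static class ServerProjectionSample
{
    // Hypothetical helper: only FirstName and LastName (plus the system properties) are requested.
    public static List<CustomerEntity> NamesOnly(CloudTable table)
    {
        IQueryable<CustomerEntity> query =
            from ent in table.CreateQuery<CustomerEntity>()
            select TableQuery.Project(ent, "FirstName", "LastName");

        // Email is not requested, so it remains unset on the returned entities.
        return query.ToList();
    }
}
```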

Client Side Projection Syntax with resolver

For scenarios where you wish to perform custom client processing during deserialization, the EntityResolver is provided to allow the client to inspect the data prior to determining its type or return value. This essentially provides an open ended hook for clients to control deserialization in any way they wish. The example below performs both a server side and a client side projection, projecting into a concatenated string of the “FirstName” and “LastName” properties.
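A sketch of this combination follows, assuming the TableQuery.Project helper for the server side part and the Resolve extension from the Microsoft.WindowsAzure.Storage.Table.Queryable namespace for the client side part. The entity type and the helper class are placeholders for this example.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.Storage.Table;
using Microsoft.WindowsAzure.Storage.Table.Queryable;

// Placeholder entity type used for illustration.
public class CustomerEntity : TableEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public static class ResolverProjectionSample
{
    // Hypothetical helper: returns one concatenated name per entity.
    public static List<string> FullNames(CloudTable table)
    {
        IQueryable<string> query =
            (from ent in table.CreateQuery<CustomerEntity>()
             select TableQuery.Project(ent, "FirstName", "LastName"))
            .Resolve((partitionKey, rowKey, timestamp, properties, etag) =>
                properties["FirstName"].StringValue + properties["LastName"].StringValue);

        return query.ToList();
    }
}
```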

The EntityResolver can read the data directly off of the wire which avoids the step of de-serializing the data into the base entity type and then selecting out the final result from that “throw away” intermediate object. Since EntityResolver is a delegate type any client side projection logic can be implemented here (See the NoSQL section here for a more in depth example).

Type-Safe DynamicTableEntity Query Construction

The DynamicTableEntity type allows for clients to interact with schema-less data in a simple straightforward way via a dictionary of properties. However constructing type-safe queries against schema-less data presents a challenge when working with the IQueryable interface and LINQ in general as all queries must be of a given type which contains relevant type information for its properties. So for example, let’s say I have a table that has both customers and orders in it. Now if I wish to construct a query that filters on columns across both types of data I would need to create some dummy CustomerOrder super entity which contains the union of properties between the Customer and Order entities.

This is not ideal, and this is where the DynamicTableEntity comes in. The IQueryable implementation has provided a way to check for property access via the DynamicTableEntity Properties dictionary in order to provide for type-safe query construction. This allows the user to indicate to the client the property it wishes to filter against and its type. The sample below illustrates how to create a query of type DynamicTableEntity and construct a complex filter on different properties:
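A sketch of such a query is shown here. The property names “customerid” and “orderdate” come from the discussion in this section; the filter values and the helper class are placeholders for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.Storage.Table;

public static class DynamicQuerySample
{
    // Hypothetical helper: filters on properties that belong to different entity shapes.
    public static List<DynamicTableEntity> CustomerOrRecentOrders(CloudTable table, DateTimeOffset startDate)
    {
        IQueryable<DynamicTableEntity> query =
            from ent in table.CreateQuery<DynamicTableEntity>()
            where ent.Properties["customerid"].StringValue == "customer_1"
                || ent.Properties["orderdate"].DateTimeOffsetValue > startDate
            select ent;

        return query.ToList();
    }
}
```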

In the example above the IQueryable was smart enough to infer that the client is filtering on the “customerid” property as a string, and the “orderdate” as a DateTimeOffset and constructed the query accordingly.

Windows Phone Known Issue

The current CTP release contains a known issue where in some cases calling HttpWebRequest.Abort() may not result in the HttpWebRequest’s callback being called. As such, it is possible when cancelling an outstanding request the callback may be lost and the operation will not return. This issue will be addressed in a future release.

Summary

We are continuously making improvements to the developer experience for Windows Azure Storage and very much value your feedback. Please feel free to leave comments and questions below.

The Task / Task<T> methods all use the Async suffix, for example blobref.DeleteAsync(). These can be awaited via async/await if you like, or executed with standard TPL methods. If you do not see these *Async methods, please confirm you have the 2.1 bits and not the 2.0.x.x package. For more code samples please reference the Task test methods in the GitHub repo above (for example github.com/…/CloudBlockBlobTest.cs).