Remote BLOB Store Provider Library Implementation Specification

Summary: This is a specification to be used by
those creating a storage provider plug-in library for the SQL Server 2008
Remote BLOB Store feature.

Remote BLOB
Store (RBS) is designed to move the storage of large binary data (BLOBs) from
database servers to commodity storage solutions.

With RBS,
BLOB data is stored in storage solutions such as Content Addressable Stores
(CAS), commodity hardware with data integrity
and fault-tolerance systems, or mega service storage
solutions like MSN Blue. A reference to the BLOB is stored in the database. An
application stores and accesses BLOB data by calling into the RBS client
library. RBS manages the life cycle of the BLOB, for example by performing garbage
collection when needed.

RBS is an
add-on that can be applied to Microsoft SQL Server 2008 and later. It uses
auxiliary tables, stored procedures, and an executable to provide its services.
A reference to the BLOB (provided by the BLOB Store) is stored in RBS auxiliary
tables and an RBS BLOB ID is generated. Applications store this RBS BLOB ID in
a column in application tables. These columns in application tables are called RBS Columns in this specification. The
RBS Column is not a new data type; it is just a simple binary(20).

RBS Provider Requirements

The
requirements of RBS are covered in Functional Description, later in this paper.
The requirements of RBS providers are listed here.

Goals of an RBS Provider

The main goal
of an RBS provider is to enable the use of a particular type of BLOB store
(called a target BLOB store) to store
RBS BLOB data.

Typically,
target BLOB stores offer large storage space at a low cost, including hardware
costs, maintenance, and expandability. The technical requirements and
recommendations for RBS providers are listed here.

Required

An RBS
provider must:

Provide
an implementation of the BlobStore
abstract class that uses the target BLOB store to store BLOB data. Honor the
semantics specified by RBS.

Allow
multiple instances of the provider (pointing to the same or different instances
of the target store, and using the same or different credentials) to be used
simultaneously from one or more client machines.

Recommended

An RBS
provider should optimally:

Allow
the use of the features of the target BLOB store through RBS interfaces and
configuration options wherever possible. It should also minimize the need for
custom configuration options to exploit features of the target BLOB store.

Implement
optional optimizations and capabilities if possible. These help improve
performance and provide extra functionality.

Guarantees Provided by RBS
Providers

Required

An RBS
provider must guarantee:

Link-level consistency. This means that there are no dangling references: if the
provider gives out a StoreBlobId to represent a newly stored BLOB, the BLOB can be
accessed later using the same StoreBlobId as long as it is not deleted.

That the BLOB persists when a Store() call returns. BLOB data and any metadata that
the provider associates with a BLOB must be persisted by the BLOB store before the
call to store the BLOB returns successfully. This means that if the BLOB store goes
down after a Store() operation completes successfully, because of a power outage or
for any other reason, the BLOB is available once the BLOB store comes back online.

Recommended

An RBS
provider should optimally guarantee:

BLOB data immutability. This means that BLOB data cannot be changed after a BLOB is
stored initially. This guarantees that the data returned on reading a BLOB is
the same as the data that was given to the provider when the BLOB was
stored; no changes are allowed after that.

Deliverables

Each provider
must deliver the following pieces, together known as a Provider Pack:

Provider
library (set of managed DLLs and dependencies, such as native libraries)

Documentation

Sample
configuration files

Installer

Optional:
Provider source code if this is a sample provider

Functional Description

Overview and Component Descriptions

An RBS
provider consists of a managed library and, optionally, a set of native
libraries that communicate with the BLOB store. The basic components and their
interactions are as follows:

Application
– RBS Maintainer or an application that uses RBS, such as Microsoft SharePoint.

RBS
Client Library – In the case of applications other than RBS Maintainer, the
provider library is called by RBS client library and not the application
directly.

BLOB
Store – An entity which is used to store BLOB data. This can be a CAS storage
solution (such as EMC Centera or Microsoft SRS), SMB file server, a mega
storage service (such as MSN XStore) or even a SQL Server database.

Provider Library – Managed library that implements the BlobStore abstract class. This
is also referred to as the provider. It knows how to use the BLOB
store for storing BLOBs.

Native
Library for BLOB Store – Any libraries used by the provider library to
communicate with the BLOB store. This is optional.

Figure 1: Provider Architecture

Figure 2: Provider Architecture with
Native Library

Figure 3: Provider Architecture with RBS
Client Library

Sample Control Flow

Following is
a sample control flow for a simple operation.

The
application calls the provider library to perform an operation.

The
provider library calls into the native library to perform the operation.

Provider Abstract Class

RBS defines an abstract class named BlobStore that must be inherited and implemented
by provider writers. The reasons for using an abstract class instead of an interface
are as follows:

It
is easy to extend an abstract class in future versions without breaking
backward compatibility, which is not possible with interfaces. For example, in
an abstract class, new methods (with default implementations) can be added
without breaking compatibility with previous versions.

The
core function of the provider library is that it is an RBS provider, so it
makes sense to have it inherit an abstract class.

Some common code that may be useful to many providers can be included in the
abstract class. The derived providers can choose to either use it or write their
own code.
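
The first reason can be illustrated with a minimal sketch. The types below (SketchBlobStoreBase, Version1Provider) and the DeletePool method are hypothetical stand-ins invented for this illustration; they are not the real RBS BlobStore class or its members.

```csharp
using System;

// Hypothetical stand-in for an abstract provider base class (not the real RBS BlobStore).
public abstract class SketchBlobStoreBase
{
    // Present since "version 1": every provider must implement it.
    public abstract byte[] CreatePool();

    // Added in a later "version 2" with a default implementation, so providers
    // written against version 1 continue to work without changes.
    public virtual void DeletePool(byte[] storePoolId)
    {
        throw new NotSupportedException("DeletePool is not implemented by this provider.");
    }
}

// A provider written against "version 1"; it knows nothing about DeletePool.
public sealed class Version1Provider : SketchBlobStoreBase
{
    public override byte[] CreatePool()
    {
        // A GUID gives a 16-byte pool ID in this sketch.
        return Guid.NewGuid().ToByteArray();
    }
}
```

Version1Provider still compiles after DeletePool is added to the base class, and callers that invoke DeletePool on it get the default behavior.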

Overview

Following is
an overview of the steps performed by the application (RBS maintainer or RBS
client library) on a provider library.

RBS
loads the provider library managed DLL and uses configuration information to
find the required class within that DLL that is derived from BlobStore.

RBS
gets information about the provider through configuration information that is
added to the machine-wide CLR configuration file when a provider library is
installed.

Using
this provider information, it associates zero or more BLOB stores with this
provider class.

When
a BLOB store associated with this provider class needs to be used, one object
of the class is instantiated.

The
object is initialized with information about the BLOB store.

Operations
(such as storing and fetching BLOBs, creating pools, and so on) are performed
using this object. The object may be cached for use later and operations may be
performed again after long pauses.

Dispose()
is called on the provider object and it is not used after that.

Multiple
instances of the same class can be used simultaneously to access the same or
different BLOB stores.
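
The steps above can be sketched as follows. This is only an illustration under stated assumptions: SketchBlobStore, SketchProvider, and SketchHost are hypothetical stand-ins, and configuration is modeled as a plain dictionary rather than the ConfigItemList objects that RBS actually passes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the abstract provider base class.
public abstract class SketchBlobStore : IDisposable
{
    public bool IsInitialized { get; private set; }
    public bool IsDisposed { get; private set; }

    public virtual void Initialize(IDictionary<string, string> configuration)
    {
        IsInitialized = true;
    }

    public virtual void Dispose()
    {
        IsDisposed = true;
    }
}

// Hypothetical provider class with the empty constructor that the host requires.
public sealed class SketchProvider : SketchBlobStore
{
    public string Location { get; private set; }

    public SketchProvider() : base() { }

    public override void Initialize(IDictionary<string, string> configuration)
    {
        // StoreLocation is one of the common configuration items named in this specification.
        Location = configuration["StoreLocation"];
        base.Initialize(configuration);
    }
}

// Hypothetical host code playing the role of RBS.
public static class SketchHost
{
    public static SketchBlobStore Load(Type providerType, IDictionary<string, string> configuration)
    {
        if (!typeof(SketchBlobStore).IsAssignableFrom(providerType))
        {
            throw new ArgumentException("Type does not derive from SketchBlobStore.", nameof(providerType));
        }
        // Instantiate through the empty constructor, then initialize once before use.
        var store = (SketchBlobStore)Activator.CreateInstance(providerType);
        store.Initialize(configuration);
        return store;
    }
}
```

A caller would invoke SketchHost.Load(typeof(SketchProvider), config), perform operations on the returned object, and finally call Dispose() on it.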

The next
section lists what must be implemented by the provider class. They are
discussed in groups.

Exceptions

BLOB store
providers are only expected to throw exceptions of type BlobStoreException. A valid exception code must be specified while
throwing an exception. Each operation has a set of expected exception codes.
Throwing any other exceptions or codes indicates a bug in the provider or that
exceptions have occurred outside the provider’s control. The valid exception
codes are:

AccessDenied.
The caller or application does not have permissions to perform the requested
action.

NoMoreSpace.
No more storage space is available on the BLOB store or pool.

PoolNotFound.
Specified pool does not exist on the BLOB store.

BlobNotFound.
Specified BLOB does not exist on the BLOB store or pool.

BlobIdAlreadyExists.
A BLOB with the specified StoreBlobId
already exists in the same pool, so a new one cannot be created.

BlobInUse.
A BLOB is currently being used, so it cannot be deleted or expunged.

ConfigurationDoesNotAllowOperation.
Current configuration of the BLOB store does not allow the requested operation.

OperationFailedAuthoritative

The
requested operation failed for a reason not included in other codes.

The
failure is authoritative - no part of the operation was performed.

OperationFailedMaybe

The
requested operation may have failed for a reason not included in other codes.

The failure is not authoritative: all, some, or no part of the operation may
have been performed.

NotImplemented.
The requested operation is not
implemented by this BLOB store provider.

Providers are
encouraged to include descriptive messages while throwing any exception.
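
One way a provider might keep to these rules is to funnel every store operation through a small wrapper that translates foreign exceptions into coded ones. The sketch below assumes hypothetical stand-in types (SketchExceptionCode, SketchBlobStoreException, SketchErrorMapping); the real BlobStoreException class and its constructor may look different. Mapping unknown failures to OperationFailedMaybe is a design choice of this sketch, made because the wrapper cannot know whether part of the operation was performed.

```csharp
using System;

// Hypothetical mirror of the exception codes listed in this specification.
public enum SketchExceptionCode
{
    AccessDenied, NoMoreSpace, PoolNotFound, BlobNotFound, BlobIdAlreadyExists,
    BlobInUse, ConfigurationDoesNotAllowOperation, OperationFailedAuthoritative,
    OperationFailedMaybe, NotImplemented
}

// Hypothetical stand-in for BlobStoreException.
public class SketchBlobStoreException : Exception
{
    public SketchExceptionCode Code { get; private set; }

    public SketchBlobStoreException(SketchExceptionCode code, string message, Exception inner = null)
        : base(message, inner)
    {
        Code = code;
    }
}

public static class SketchErrorMapping
{
    // Runs a store operation and converts any foreign exception into a coded one
    // with a descriptive message.
    public static T Run<T>(Func<T> operation, string description)
    {
        try
        {
            return operation();
        }
        catch (SketchBlobStoreException)
        {
            throw; // Already coded; pass through unchanged.
        }
        catch (UnauthorizedAccessException ex)
        {
            throw new SketchBlobStoreException(
                SketchExceptionCode.AccessDenied, "Access denied while " + description + ".", ex);
        }
        catch (Exception ex)
        {
            // Unknown failure: part of the operation may have been performed.
            throw new SketchBlobStoreException(
                SketchExceptionCode.OperationFailedMaybe, "Unexpected failure while " + description + ".", ex);
        }
    }
}
```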

Initialization

Constructor()

After
a provider class is picked for a store registered with RBS, an object of the
provider class is instantiated to use that store. RBS instantiates an object of
the provider class by using the empty constructor. Within this constructor, the
provider must call the base constructor (base()).

RBS calls Initialize() once on an object of the provider class before using it for
any operations.

Configuration
information is passed in the form of ConfigItemList
objects that contain multiple ConfigItem
objects. ConfigItems are explained
in the RBS Functional Description. They are essentially (key, value) pairs.
There is a pre-defined list of ConfigItems
that RBS client library defines. In addition, providers can define their own ConfigItems that are used for
provider-specific configuration.

CommonConfiguration contains configuration information
that is understood by the RBS client library. ConfigItems present in this are: StoreMajorVersion, StoreMinorVersion,
and StoreLocation. CoreConfiguration and ExtendedConfiguration contain
provider-specific configuration items associated with this BLOB store in the
RBS database. The core configuration consists of configuration information that
is required to access existing BLOBs in the back-end BLOB store. The extended
configuration consists of configuration information that is not needed to
access existing BLOBs, but is needed for other operations, such as create pool,
store BLOB, and so on. Extended configuration information is optional and may
not be present. This is because extended configuration information is not
included in BLOB Locators, which can be used to access a BLOB. BlobStoreCredentials is optional (it
may be null). If specified, the specified credentials should be used to connect
to the store.

Providers
are encouraged to check validity of the passed configuration items and
credentials and build internal structures as part of initialization. They may
optionally connect to the store as well.

Allowed
exception codes are: AccessDenied, ConfigurationMissing.

void Dispose()

This is the opposite of Initialize(), described previously. RBS calls it to indicate
that internal structures, connections, and so on can be cleaned up. An object is not
used after Dispose() is called on it.

Pool Operations

Pool operations are operations performed on pools. None of these operations are
performed by RBS in parallel (on multiple threads) on the same provider object.
For each operation, a list of expected exceptions is specified. If an
exception other than those specified is thrown, it indicates either a bug or
extraordinary circumstances.

byte[] storePoolId
CreatePool(ConfigItemList configuration)

This
creates a new pool on the BLOB store. A byte array representing the StorePoolId for that pool is returned.

If
the OptimizationSpecifiedIds
capability is TRUE, StorePoolId must
be less than or equal to 16 bytes.

This method is called to start enumerating the list of BLOBs in a particular pool.
Because the number of BLOBs expected in a pool is very high, paging (retrieving a
few entries at a time) must be supported. This method is called to set
up any context and internal structures to represent such an enumeration.

The provider is free to create any type of object to store its enumeration state.
The method should then return that object. RBS keeps this object and
uses it in subsequent method calls to enumerate BLOBs.

This
method must be implemented even if OptimizationSortedEnumeration
is TRUE.

This
method is called by RBS to get a sorted enumeration of BLOBs in a pool. The
provider is expected to return a ResumeObject
that can be used to enumerate BLOBs in that pool in sorted order of StoreBlobId. The enumeration should
return BLOBs belonging to that pool with (StoreBlobId
>= StartingStoreBlobId).
Comparison of BLOB IDs is a binary comparison of all the bytes of the ID. In
addition, all BLOBs returned should have a CreateTime
such that (CreateTime >= CreateTimeFilterStart) and (CreateTime <= CreateTimeFilterEnd). If CreateTimeFilterStart
or CreateTimeFilterEnd is set to DateTime.MinValue or DateTime.MaxValue respectively, the clause for that parameter should be
skipped (that clause is assumed to be satisfied). Both times are specified in
UTC.

This
method is equivalent to retrieving BLOB entries from a completely sorted list of BLOBs belonging to the specified pool,
starting at the lowest entry that satisfies (StoreBlobId >= StartingStoreBlobId).
For any two consecutive entries in the returned array B1 and B2, the following
conditions hold:

B1
< B2

CreateTimeFilterStart
<= B1 CreateTime <= CreateTimeFilterEnd

CreateTimeFilterStart
<= B2 CreateTime <= CreateTimeFilterEnd

There
is no BLOB Bk belonging to the specified pool such that (B1 < Bk < B2)
and (CreateTimeFilterStart <= Bk CreateTime <= CreateTimeFilterEnd)

This
method must be implemented if OptimizationSortedEnumeration
is TRUE.
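
The selection and ordering rules for sorted enumeration can be modeled in a few lines. This is a sketch over an in-memory list, not a provider implementation: SketchBlobEntry and SketchSortedEnumeration are hypothetical names, bytes are compared as unsigned values, and the rule that a shorter ID sorts before a longer ID sharing its prefix is an assumption, since this specification does not say how IDs of different lengths compare.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a BLOB entry.
public sealed class SketchBlobEntry
{
    public byte[] StoreBlobId;
    public DateTime CreateTime; // UTC
    public long Length;
}

public static class SketchSortedEnumeration
{
    // Binary comparison of all bytes of two IDs.
    // Assumption: a shorter ID sorts before a longer ID with the same prefix.
    public static int CompareIds(byte[] a, byte[] b)
    {
        int common = Math.Min(a.Length, b.Length);
        for (int i = 0; i < common; i++)
        {
            if (a[i] != b[i])
            {
                return a[i] < b[i] ? -1 : 1;
            }
        }
        return a.Length.CompareTo(b.Length);
    }

    // Returns the entries with StoreBlobId >= startingStoreBlobId that pass the
    // CreateTime filter, in ascending StoreBlobId order.
    public static List<SketchBlobEntry> Select(
        IEnumerable<SketchBlobEntry> pool,
        byte[] startingStoreBlobId,
        DateTime createTimeFilterStart,
        DateTime createTimeFilterEnd)
    {
        return pool
            .Where(b => CompareIds(b.StoreBlobId, startingStoreBlobId) >= 0)
            // A bound set to MinValue or MaxValue means "skip this clause".
            .Where(b => createTimeFilterStart == DateTime.MinValue || b.CreateTime >= createTimeFilterStart)
            .Where(b => createTimeFilterEnd == DateTime.MaxValue || b.CreateTime <= createTimeFilterEnd)
            .OrderBy(b => b.StoreBlobId, Comparer<byte[]>.Create(CompareIds))
            .ToList();
    }
}
```

A real provider would typically push the filter and ordering down to the BLOB store instead of materializing the whole pool in memory.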

This
method is called by RBS, specifying a ResumeObject
that was previously returned by the provider. The provider is expected to
return an array of BLOB entries belonging to that pool. MaxNum is the maximum number of entries to be returned from this
call.

The next time this method is called, the provider should continue enumerating BLOBs in
the pool at the point where the current call stopped. No BLOB should be returned
twice and no BLOB should be missed. Returning fewer than MaxNum entries indicates
that there are no more BLOBs left to enumerate.

BlobInformation includes the StoreBlobId, the CreateTime
of the BLOB (this should be the same value that was returned when the BLOB was
stored) and Length of the BLOB.

Allowed
exception codes are: OperationFailedAuthoritative.
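
The paging contract can be sketched with an opaque resume object that remembers where the previous call stopped. The names SketchResumeState and SketchPaging are hypothetical, and taking an in-memory snapshot of the pool is an assumption made only to keep the example short; a real store would usually keep a cursor or a last-seen key instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enumeration state; the caller only ever sees it as 'object'.
public sealed class SketchResumeState
{
    public readonly List<byte[]> Snapshot;
    public int NextIndex;

    public SketchResumeState(IEnumerable<byte[]> blobIds)
    {
        Snapshot = new List<byte[]>(blobIds);
        NextIndex = 0;
    }
}

public static class SketchPaging
{
    // Sets up the enumeration state and hands it back as an opaque object.
    public static object Begin(IEnumerable<byte[]> blobIds)
    {
        return new SketchResumeState(blobIds);
    }

    // Returns up to maxNum entries, continuing where the previous call stopped.
    public static byte[][] GetNext(object resumeObject, int maxNum)
    {
        var state = (SketchResumeState)resumeObject;
        int count = Math.Min(maxNum, state.Snapshot.Count - state.NextIndex);
        var page = new byte[count][];
        for (int i = 0; i < count; i++)
        {
            page[i] = state.Snapshot[state.NextIndex + i];
        }
        state.NextIndex += count;
        return page;
    }
}
```

Because a page shorter than maxNum signals the end, a pool whose size is an exact multiple of maxNum yields one final empty page in this sketch.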

void EndEnumerateBlobs(object
resumeHandle)

This
method is called to end enumerating BLOBs in a pool. The provider can clean up
any internal state related to this enumeration.

Allowed
exception codes are: None.

BLOB Operations

These are
operations that are performed on BLOBs within pools. These operations may be
performed by RBS in parallel (on multiple threads) on the same provider object.
So, they must be thread-safe. For each operation, a list of expected exceptions
is specified. If some exception other than those specified is thrown, it
indicates either a bug or extraordinary circumstances.
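
A minimal sketch of a thread-safe store follows, assuming an in-memory dictionary in place of a real BLOB store. SketchThreadSafeStore and its Store and Fetch methods are hypothetical names; they are not members of the BlobStore abstract class.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical in-memory store whose operations may be called from many threads.
public sealed class SketchThreadSafeStore
{
    private readonly ConcurrentDictionary<Guid, byte[]> blobs =
        new ConcurrentDictionary<Guid, byte[]>();

    public int Count
    {
        get { return blobs.Count; }
    }

    // Stores a private copy of the data and returns a new 16-byte ID.
    public byte[] Store(byte[] data)
    {
        Guid id = Guid.NewGuid();
        byte[] copy = (byte[])data.Clone();
        if (!blobs.TryAdd(id, copy))
        {
            throw new InvalidOperationException("Duplicate BLOB ID generated.");
        }
        return id.ToByteArray();
    }

    // Returns a copy, so callers cannot alter the stored bytes.
    public byte[] Fetch(byte[] storeBlobId)
    {
        byte[] data;
        if (!blobs.TryGetValue(new Guid(storeBlobId), out data))
        {
            throw new KeyNotFoundException("BLOB not found.");
        }
        return (byte[])data.Clone();
    }
}
```

Copying on both store and fetch also keeps the stored data unchanged after the initial store, which matches the recommended immutability guarantee.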

BlobStoreWriterStream
CreateNewBlob(byte[] storePoolId)

This is the “Push” version of storing a BLOB: the provider is expected to
return a writable stream, into which RBS or the application writes the data that
must be stored in the BLOB store. The BLOB should be stored in the specified
pool.

BlobStoreWriterStream is inherited from System.IO.Stream and has one additional
method: Commit(). When RBS calls Commit() on this object, the provider
should commit the BLOB on the back-end BLOB store and return the BlobInformation for the stored BLOB.
The stream cannot be used after that.

This
method must be implemented even if the OptimizationSpecifiedIds
capability is TRUE.

This is the “Pull” version of storing a BLOB: a stream containing the data to
be stored is given. The BLOB should be stored in the specified pool.

The
specified stream supports reading (CanRead
is TRUE) and supports querying the Length
property. No other assumptions (including assumptions related to CanSeek) should be made about this
stream object.

This
method must be implemented even if the OptimizationSpecifiedIds
capability is TRUE.
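
The stream restrictions for the “Pull” version of storing can be honored with a forward-only copy loop such as the sketch below. SketchPullStore and ReadAll are hypothetical names, and the sketch assumes that the stream is positioned at the start of the BLOB data. It uses only Read and Length, and never touches Seek or Position.

```csharp
using System;
using System.IO;

public static class SketchPullStore
{
    // Reads exactly source.Length bytes using only CanRead, Length, and Read.
    public static byte[] ReadAll(Stream source)
    {
        if (!source.CanRead)
        {
            throw new ArgumentException("Source stream must be readable.", nameof(source));
        }

        long expected = source.Length;
        var result = new MemoryStream();
        var buffer = new byte[81920];
        long total = 0;

        while (total < expected)
        {
            int toRead = (int)Math.Min(buffer.Length, expected - total);
            int read = source.Read(buffer, 0, toRead);
            if (read == 0)
            {
                throw new EndOfStreamException("Source ended before Length bytes were read.");
            }
            result.Write(buffer, 0, read);
            total += read;
        }

        return result.ToArray();
    }
}
```

A real provider would write each chunk to the BLOB store instead of buffering the whole BLOB in memory.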

This is the “Pull” version of fetching a BLOB: the provider returns a readable
stream that contains the BLOB data.

The
returned stream object must allow reading and seeking (CanRead and CanSeek are
TRUE) and must support querying the Length property (correct length should be
returned). It must disallow writing (CanWrite
is FALSE).
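
For a provider that already holds the BLOB bytes in memory, a read-only MemoryStream satisfies these stream requirements. The sketch below is only an illustration; SketchFetch and its methods are hypothetical names, and a provider backed by a remote store would more likely return its own Stream subclass.

```csharp
using System.IO;

public static class SketchFetch
{
    // Wraps the BLOB bytes in a stream that can be read and sought but not written.
    public static Stream OpenReadOnly(byte[] blobData)
    {
        return new MemoryStream(blobData, writable: false);
    }

    // Checks the capabilities required of a stream returned when fetching a BLOB.
    public static bool MeetsFetchContract(Stream stream)
    {
        return stream.CanRead && stream.CanSeek && !stream.CanWrite;
    }
}
```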

BlobStoreWriterStream Class

BlobStoreWriterStream is inherited from System.IO.Stream and has one additional
method: Commit(). The members are briefly outlined in the table below. Important
methods are described after the table.

Method                               Return Value
-----------------------------------  ---------------
Write(buffer, offset, count)
Get CanRead                          Optional
Get CanSeek                          Optional
Get CanWrite                         TRUE
Get Length
Get Position
Set Position, Seek(), SetLength()    Optional
Flush()
Close()
Read(buffer, offset, count)          Optional
Commit()                             BlobInformation

Table 1

void Close()

If
this method is called, the BLOB data should be discarded and the BLOB should
not be stored in the BLOB store.

BlobInformation Commit()

When
this method is called, the provider should ensure the BLOB is stored in the
BLOB store, and return the details of the BLOB (StoreBlobId, CreateTime,
and Length). This method should do
an implicit Close().
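
The Commit() and Close() semantics can be sketched with a small Stream subclass that buffers writes in memory. Everything here is a hypothetical stand-in (SketchBlobInfo, SketchWriterStream, and the dictionary acting as the BLOB store); the real BlobStoreWriterStream base class is not used, and generating the ID from a GUID is an assumption of the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for BlobInformation.
public sealed class SketchBlobInfo
{
    public byte[] StoreBlobId;
    public DateTime CreateTime;
    public long Length;
}

// Hypothetical writer stream: data reaches the store only when Commit() is called.
public sealed class SketchWriterStream : Stream
{
    private readonly IDictionary<string, byte[]> store;
    private readonly MemoryStream buffer = new MemoryStream();
    private long written;
    private bool finished;

    public SketchWriterStream(IDictionary<string, byte[]> store) { this.store = store; }

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return !finished; } }
    public override long Length { get { return written; } }
    public override long Position
    {
        get { return written; }
        set { throw new NotSupportedException(); }
    }

    public override void Flush() { }
    public override int Read(byte[] b, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }

    public override void Write(byte[] data, int offset, int count)
    {
        if (finished) throw new ObjectDisposedException(nameof(SketchWriterStream));
        buffer.Write(data, offset, count);
        written += count;
    }

    // Stores the buffered data, then closes the stream implicitly.
    public SketchBlobInfo Commit()
    {
        if (finished) throw new ObjectDisposedException(nameof(SketchWriterStream));
        byte[] id = Guid.NewGuid().ToByteArray();
        store[Convert.ToBase64String(id)] = buffer.ToArray();
        var info = new SketchBlobInfo { StoreBlobId = id, CreateTime = DateTime.UtcNow, Length = written };
        Close();
        return info;
    }

    // Close() without a prior Commit() ends up here and discards the buffered data.
    protected override void Dispose(bool disposing)
    {
        finished = true;
        if (disposing) buffer.Dispose();
        base.Dispose(disposing);
    }
}
```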

Supporting Objects

These objects
are all defined by the RBS client library infrastructure and are used by the
provider library. Details on each of these objects are in the RBS class library
documentation.

BlobInformation

Method                                          Return Value
----------------------------------------------  -------------------
Constructor()                                   BlobInformation
Constructor(StoreBlobId, CreateTime, Length)    BlobInformation
Get StoreBlobId                                 StoreBlobId
Get StoreBlobCreateTime                         StoreBlobCreateTime
Get BlobLength                                  BlobLength
Set StoreBlobId
Set StoreBlobCreateTime
Set BlobLength

Table 2

BlobStoreCredentials

Method                             Return Value
---------------------------------  --------------------
Constructor(Credentials)           BlobStoreCredentials
Constructor(Username, Password)    BlobStoreCredentials
Get Credentials                    Credentials
Get Username                       Username
Get Password                       Password

Table 3

Data Types Used

Friendly Name            C# Type
-----------------------  ---------------------
StorePoolID              byte[]
StoreBlobID              byte[]
StoreBlobCreateTime      DateTime
BlobLength               long
InStream                 Stream
OutStream                Stream
ConfigItem               ConfigItem
ConfigItemList           ConfigItemList
Config Value             string
Config Key               string
BlobInformation          BlobInformation
BlobStoreWriterStream    BlobStoreWriterStream
ResumeObject             Object

Table 4

Setup

As part of setup for the provider library, Setup must register the DLL and class
names to be used by the RBS client library. In addition, configuration information
about the provider needs to be registered. This is done through the machine-wide CLR
XML configuration file.

The different pieces of information needed are described below. Helper classes
present in the RBS client library can be used to set this configuration during setup.
See the sample provider in the RBS SDK for examples of how to use these helper classes
to specify the XML elements.

BlobStoreType

Type:
string.

This
is a Unicode string of up to 128 characters. This uniquely identifies the type
of this provider. This is the same string that is used by applications and DBAs
in the BlobStoreType field when
configuring RBS BLOB stores for a database. Examples are “EMC Centera”,
“Microsoft SRS”. Provider writers are encouraged to start the type with the
name of the company so as to avoid collisions with other provider writers.

DllFile

Type: string.

This
specifies the path to locate the assembly in which the provider class is
present.

ClassName

Type:
string.

This
specifies the name of the class implementing the BlobStore abstract class within the specified assembly.

ProviderVersion

Type:
string.

These
fields indicate the version number for this provider class. The provider writer
is free to pick any non-negative values for these fields. It is expected that
these numbers increase over a period of time as new versions are released.

MinSupportedBackendStoreVersion

Type:
string.

These
fields indicate the minimum version number of the backend BLOB store that is
supported by this provider library.

ImplementedCommonBlobStoreSpecificationVersion

Type:
string.

These
fields indicate the version number of the RBS specification (RBS client library
and BlobStore abstract class) that
is implemented by this provider library. This means that the provider
understands and complies with all the requirements of the specified version of
RBS specification.

This
property is not used currently, but may be used in the future. Providers are
required to set this correctly.

ProviderSpecificConfigKey

This
describes ConfigItems that are
specific to this provider. Provider-specific configuration items can be used to
store configuration information about the back-end BLOB store. This
configuration is passed to the provider in the Initialize method.

Multiple
instances of this element are allowed. One such element needs to be specified
for each ConfigItem key that the
provider class understands (only provider-specific keys, not common keys
defined by RBS). It has the following fields:

name

Type:
string

Key
name of the provider-specific configuration item.

format

Type:
string

The format of this configuration item. It must be one of the following: Name,
Boolean, Number, Binary, Duration.

Provider/Store Version Picking Algorithm

RBS uses
standard four-part version numbers, that is, w.x.y.z where each of the terms is
progressively decreasing in significance.

The RBS
client library uses the above set of version numbers to determine which
provider libraries to use with which back-end BLOB stores. The algorithm used
is described below.

Build
a list of provider libraries available for each BlobStoreType.
Current_RbsVersion is the version of this RBS client library. Load all the
provider libraries available and for each provider class:

Add this provider class with version {ProviderVersion} to the list of providers
available for type {BlobStoreType}. Maintain the list in sorted
order, that is, descending order of {ProviderVersion}.

For
a BLOB store that is registered as an RBS BLOB store in the database, find a
suitable provider class. BackendStoreVersion is the version of the backend BLOB
store as specified in the database.

Find
the list of providers available for this {BlobStoreType}. Process each entry in
the list in order (highest version first):

If (MinSupportedBackendStoreVersion
> BackendStoreVersion) skip to the next entry. The store is too old for this
provider class.

Else pick this provider class for this
store.
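
The picking algorithm above can be sketched with System.Version, which models four-part version numbers and compares them part by part. SketchProviderEntry and SketchVersionPicker are hypothetical names. Matching BlobStoreType with an exact, case-sensitive string comparison and returning null when no provider fits are assumptions of this sketch, since the specification does not describe either case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record of one installed provider class.
public sealed class SketchProviderEntry
{
    public string BlobStoreType;
    public string ClassName;
    public Version ProviderVersion;
    public Version MinSupportedBackendStoreVersion;
}

public static class SketchVersionPicker
{
    // Picks the highest-versioned provider of the given type whose minimum
    // supported back-end version does not exceed the store's version.
    public static SketchProviderEntry Pick(
        IEnumerable<SketchProviderEntry> installed,
        string blobStoreType,
        Version backendStoreVersion)
    {
        return installed
            .Where(p => p.BlobStoreType == blobStoreType)
            .OrderByDescending(p => p.ProviderVersion)
            // Skip entries for which the store is too old; take the first that fits.
            .FirstOrDefault(p => p.MinSupportedBackendStoreVersion <= backendStoreVersion);
    }
}
```

For example, with provider versions 2.0.0.0 (minimum store version 5.0.0.0) and 1.0.0.0 (minimum store version 3.0.0.0) installed for the same type, a store at version 4.0.0.0 is served by provider 1.0.0.0.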

This
specification should be used to guide the development of provider plug-in
libraries for the Remote BLOB Store feature of SQL Server 2008.