This chapter discusses two main areas of concern when you build and deploy multi-tenant applications. The first is related to application lifecycle management (ALM) and covers topics such as testing, deployment, management, and monitoring. The second is related specifically to independent software vendors (ISVs) that are building multi-tenant applications, and discusses topics such as how to enable onboarding, customization, and billing for tenants and customers that use the application.

ALM Considerations for Multi-Tenant Applications

All applications require a consistent policy for their lifecycle to ensure that development, testing, deployment, and management are integrated into a reliable and repeatable process. This helps to ensure that applications work as expected, provide all the required features, and operate efficiently and reliably.

Jana Says:

Managing the application lifecycle for multi-tenant applications is usually more complex than that for other types of application because some tasks must be carried out on a per-tenant basis.

However, there are some additional considerations for multi-tenant applications. For example, you may need to implement more granular management, monitoring, and update procedures based around the separation for each tenant. This may include backing up each tenant's data separately to minimize security concerns, or being able to update specific instances of the application that are reserved for some tenants.

Goals and Requirements

Tailspin’s ALM goals and requirements for the Surveys application encompass those that are applicable to most multi-tenant applications.

In terms of testability, there are two main areas that must be addressed: unit testing of application components both during and after development, and functional testing of the application or sections of it in a realistic runtime environment. Tailspin must address both of these areas by designing the application to maximize testability. The developers at Tailspin want to implement the application in a way that allows them to use mock objects to simplify unit tests. They also want to support testing for sections of the application, such as the background tasks carried out by worker roles, as well as being able to quickly and easily deploy the application with a test configuration to a staging platform for functional and acceptance testing.

Tailspin wants to be able to perform as much testing as possible using the local compute and storage emulators. Of course, Tailspin will still do a complete test pass in Windows Azure before the final deployment, but during development it is more convenient to test locally. This is one of the factors that Tailspin will consider when it makes technology choices. For example, if Tailspin uses Windows Azure Caching instead of Windows Azure Shared Caching, it can test the caching behavior of the application locally. However, if Tailspin chooses to use SQL Database federations then it cannot test that part of the application locally.

Markus Says:

You should check the latest documentation for both the compute and storage emulators to identify any differences in their behavior from the Windows Azure services.

In addition to the unit tests and functional testing, Tailspin wants to verify that the Surveys application will scale to meet higher levels of demand, and to determine what scale units it should use. For example, how many worker role instances and message queues should there be for every web role instance when Tailspin scales out the application? There is little point in performing stress testing using the local compute and storage emulators; you must deploy the application to the cloud in order to determine how it performs under realistic load conditions.

When testing is complete, Tailspin’s administrators want to be able to deploy the application in a way that minimizes the chance of error. This means that they must implement reliable and repeatable processes that apply the correct configuration settings and carry out the deployment automatically. They also want to be able to update the application while it is running, and roll back changes if something goes wrong.

After the application has been successfully deployed, the administrators and operators at Tailspin must be able to manage the application while it is running. This includes tasks such as backing up data, adjusting configuration settings, managing the number of instances of roles, handling requests for customization, and more. In a multi-tenant application, administrators and operators may also be responsible for all or part of the onboarding process for trials and new subscribers.

Finally, administrators and operators must be able to monitor the application to ensure that it is operating correctly, meeting its SLAs, and fulfilling business requirements. For a multi-tenant application, administrators will also want to be able to monitor both the operation and the runtime costs on a per-tenant basis. Where the application supports different levels of SLA or feature set for different types of tenants, monitoring requirements can become considerably more complicated. For example, heavy loading on specific instances will often require the deployment of additional instances to ensure that SLAs are met. Tasks such as this will typically require some kind of automation that combines the results of monitoring with the appropriate management actions.

Poe Says:

Procedures for managing and monitoring multi-tenant applications and the data they use must take into account the requirements of individual tenants to maintain security and to meet SLAs.

Overview of the Solution

This section describes the options Tailspin considered for testing, deploying, managing, and monitoring the Surveys application, and identifies their chosen solutions.

Testing Strategies

One of the advantages of a multi-tenant application compared to multiple different application implementations is that there is only a single code base. Every tenant runs the same core application code, and so there is only one application and one set of components to test.

However, because most multi-tenant applications support user customization through configuration, the test processes must exercise all of the configurable options. Unit tests, functional tests, and acceptance testing must include passes that exercise every possible combination of configurable options to ensure that all work correctly and do not conflict with each other.
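As a minimal sketch of how a test pass might enumerate option combinations, the following Python example (used here for brevity rather than the C# of the Surveys solution) builds every combination from a set of options. The option names, their values, and the `is_valid` rule are hypothetical and are not Tailspin's actual settings.

```python
from itertools import product

# Hypothetical tenant-configurable options; not Tailspin's actual settings.
OPTIONS = {
    "subscription": ["standard", "premium"],
    "custom_logo": [False, True],
    "export_to_sql": [False, True],
}


def all_configurations(options):
    """Yield one dict per combination of option values."""
    names = list(options)
    for values in product(*(options[name] for name in names)):
        yield dict(zip(names, values))


def is_valid(config):
    """Illustrative rule (an assumption): only premium tenants may export."""
    return config["subscription"] == "premium" or not config["export_to_sql"]


# Each configuration would drive one parameterized test pass.
configs = list(all_configurations(OPTIONS))
conflicting = [c for c in configs if not is_valid(c)]
```

The number of combinations is the product of the number of values for each option, so it grows quickly as options are added. A test suite built this way makes that growth visible.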

If tenants can upload their own components or resources (such as style sheets and scripts) testing should encompass this to as wide an extent as possible. While it will not be possible to test every component or resource, the tests should ensure that these cannot affect execution of the core application code, or raise security issues by exposing data or functionality that should not be available.

When testing features that allow users to upload code, scripts, or style sheets, your test process should mimic the behavior of a malicious user by uploading items that intentionally attempt to access data and interfere with execution. While you cannot expect to cover every eventuality, this will help to expose possible areas where specific security measures must be applied.

In terms of test environment requirements, multi-tenant applications are generally no different from any other application. Developers and testers run unit tests on local computers and on the build server to validate individual parts of the application. This includes using the local Windows Azure compute and storage emulators on the development and test computers. The one area that may require additional test environment capacity is to provide separate resources, such as databases, to represent multiple tenant resources during testing.

Functional tests are run in a local test environment that mirrors the Windows Azure runtime environment as closely as possible, or in a staging environment on Windows Azure. Typically this will use a different subscription from the live environment to ensure that only administrators and responsible personnel have access to the keys needed for deployment and access to live services and resources such as databases.

Acceptance tests occur after final deployment, and the Windows Azure capability for rolling back changes through a virtual IP swap allows rapid reversion to a previous version should a failure occur. Acceptance testing must include testing the application from the user’s perspective, including applying any customizations that are supported.

Other types of tests such as performance, throughput, and stress testing are carried out as part of the functional tests, and throughout final testing, to ensure that the application can meet its SLAs.

Poe Says:

Stress testing should include verifying that any autoscaling rules add or remove sufficient resources to support the level of demand and control your costs.

Designing to Support Unit Testing

The Surveys application uses Windows Azure table and blob storage, and the developers at Tailspin were concerned about how this would affect their unit testing strategy. From a testing perspective, a unit test should focus on the behavior of a specific class and not on the interaction of that class with other components in the application. From the perspective of Windows Azure, any test that depends on Windows Azure storage requires complex setup and tear-down logic to make sure that the correct data is available for the test to run.

For both of these reasons, the developers at Tailspin designed the data access functionality in the Surveys application with testability in mind, and specifically to make it possible to run unit tests against their data store classes without a dependency on Windows Azure storage.

Markus Says:

The Surveys application uses the Unity Application Block to decouple its components and facilitate testing.

The solution adopted by the developers at Tailspin was to wrap the Windows Azure storage components in such a way as to facilitate replacing them with mock objects during unit tests, and use the Unity Application Block to instantiate them. A unit test should be able to instantiate a suitable mock storage component, use it for the duration of the test, and then discard it. Any integration tests can continue to use the original data access components to test the functionality of the application.

Unity is a lightweight, extensible dependency injection container that supports interception, constructor injection, property injection, and method call injection. You can use Unity in a variety of ways to help decouple the components of your applications, to maximize coherence in components, and to simplify design, implementation, testing, and administration of these applications. You can learn more about Unity and download the application block from “Unity Container.”

Tailspin also wanted to be able to separately unit test the background tasks implemented as Windows Azure worker roles. Tailspin’s developers created a generic worker role framework for the worker roles that makes it easy to add and update background task routines, and also supports unit testing.

The worker role framework Tailspin implemented allows individual jobs to override the PreRun, Run, and PostRun methods to set up, run, and tear down each job. The support in this framework for operators such as For, Do, and Every to execute tasks in a worker also makes it easy to write unit tests for jobs that will be processed by a worker role. Chapter 4, “Partitioning Multi-Tenant Applications,” describes the implementation of these operators, and the section “Testing Worker Roles” later in this chapter illustrates how they facilitate designing unit tests.

Stress Testing and Performance Tuning

Tailspin performed stress testing on the Surveys application running in the cloud in order to uncover any bottlenecks that limit the application’s scalability, and to understand how to scale out the application. Bottlenecks might include limits on the throughput that can be achieved with Windows Azure storage, limits on the amount of processing that the application can perform given the available CPU and memory resources and the algorithms used by the application, and limits on the number of web requests that the web roles can handle.

After a stress test identified a bottleneck, the team at Tailspin evaluated the available options for removing the bottleneck, made a change to the application, and then re-ran the stress test to verify that the change had the expected effect on the application.

To perform the stress testing, Tailspin used Visual Studio Load Test running in Windows Azure to simulate different volumes of survey response submissions to the public Surveys web site. For more information about how to run load tests in Windows Azure roles that exercise another Windows Azure application, see “Using Visual Studio Load Tests in Windows Azure Roles” on MSDN.

Application Deployment and Update Strategies

Multi-tenant application deployment and updating follows the same process as other types of applications. There should be only one core code package for the application because all customization for individual tenants should be accomplished solely through configuration settings for each tenant. These configuration settings should ideally be stored in a separate location, such as a database or Windows Azure storage, and not in the service configuration files. This removes the requirement to upload different versions of the application.

Markus Says:

Although you can modify settings in your Windows Azure application’s service configuration file (.cscfg) on the fly, you cannot add new settings without redeploying the role. In a multi-tenant application, you typically require new settings for each tenant. Therefore, Tailspin chose to store tenant configuration data in blob storage and read it from there whenever the application needs it.

Where tenants have the ability to upload additional resources, such as style sheets and logos, these resources should be stored outside of the application. Therefore, deploying a new application or updating an existing one will not impact individual tenants' resources. The updated application will continue to read configuration settings from the central configuration store and access each tenant's resources from the central location where they are stored.
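The idea of keeping tenant settings in a central store can be sketched as follows. This is an illustration in Python, not the Surveys implementation: `InMemoryBlobContainer` is a hypothetical stand-in for a blob container (it is not a Windows Azure API), and the tenant names and setting keys are invented for the example.

```python
import json


class InMemoryBlobContainer:
    """Hypothetical stand-in for a blob container; not a Windows Azure API."""

    def __init__(self):
        self._blobs = {}

    def upload_text(self, name, text):
        self._blobs[name] = text

    def download_text(self, name):
        return self._blobs[name]


class TenantConfigStore:
    """Stores each tenant's settings as one JSON document per tenant."""

    def __init__(self, container):
        self._container = container

    def save(self, tenant_id, settings):
        self._container.upload_text(f"{tenant_id}.json", json.dumps(settings))

    def load(self, tenant_id):
        return json.loads(self._container.download_text(f"{tenant_id}.json"))


container = InMemoryBlobContainer()
store = TenantConfigStore(container)
store.save("adatum", {"logo": "adatum.png", "theme": "blue"})

# Onboarding a new tenant only writes a new document; no redeployment needed.
store.save("fabrikam", {"logo": "fabrikam.png", "theme": "green"})

# A freshly deployed application version reads the same central store.
redeployed_store = TenantConfigStore(container)
```

Because the settings live outside the deployment package, a new application version created over the same container sees exactly the same tenant data.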

To provide a reliable and repeatable deployment and update experience, Tailspin uses scripts that run as part of the build process when deploying to the test environment, and separate scripts accessible only to administrators that are executed to deploy or update the application in the live runtime environment. This prevents the possibility of errors that could occur when using the Windows Azure Management Portal.

The scripts modify the settings in the Web.config file that cannot be stored in the service configuration files, such as the settings for Windows Identity Foundation (WIF) authentication. The scripts also accept a parameter that defines which of the service configuration files will be uploaded to Windows Azure during deployment or update. There are separate service configuration files in the source code project for use when deploying to the local and cloud test environments.

Application Management Strategies

Windows Azure incorporates several features that are useful for managing applications after deployment. These include the Windows Azure Management Portal, the Windows Azure Management API, the Windows Azure PowerShell cmdlets, and many Microsoft and third party tools and services.

Administrators can use the Windows Azure Management Portal to modify configuration settings in the service configuration file while the application is deployed, and to stop and start roles, change the number of instances, and see basic runtime information about roles. All of these tasks can also be accomplished by using the Windows Azure Management API, and administrators can use a series of PowerShell cmdlets that provide a wide range of methods for interacting with the API.

Note:

You can obtain the Windows Azure PowerShell cmdlets from the Windows Azure Download Page.

Tailspin uses the Windows Azure PowerShell cmdlets to interact with the Windows Azure Management API for almost all management tasks. This provides a reliable and repeatable process for common tasks that administrators must carry out. However, for some tasks administrators will use the Windows Azure Management Portal—particularly for tasks that are not carried out very often.

Reliability and Availability

One of the major concerns for administrators and operators who manage the application is to ensure that it is available at all times and is meeting its SLAs. It is very difficult to estimate the workload for a multi-tenant application because the tenants are unlikely to provide detailed estimates of usage, and the peaks can occur at various times if users are located in many different time zones.

Tailspin realized that one of the core factors for meeting SLAs would be to ensure sufficient instances of the application are running at all times, while minimizing costs by removing instances when not required. To achieve this Tailspin will incorporate the Enterprise Library Autoscaling Application Block into the application to automatically add or remove role instances based on changes in average load and usage of the application.

Poe Says:

Tailspin plans to keep detailed usage information that it can analyze for trends. If Tailspin can identify times of the day, week, month, or year when usage is regularly higher or lower it can preemptively add or remove resources. The Autoscaling Application Block enables Tailspin to perform this type of autoscaling in addition to reactive scaling based on average load or usage.

To achieve a base level of reliability Tailspin always ensures that a minimum of two instances of each role are deployed so that failures, or reorganization of resources within the Windows Azure datacenter, will not prevent users from accessing the application.
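A reactive scaling rule that respects the two-instance floor might look like the following Python sketch. It does not use or imitate the Autoscaling Application Block API; the function name, the CPU thresholds, and the upper limit are all assumptions chosen for illustration.

```python
MIN_INSTANCES = 2  # floor described above: never fewer than two instances


def target_instance_count(current, avg_cpu_percent,
                          scale_out_at=80, scale_in_at=30, max_instances=10):
    """Return the desired instance count for one evaluation of the rule.

    The thresholds and max_instances are hypothetical values.
    """
    if avg_cpu_percent > scale_out_at:
        desired = current + 1      # heavy load: add an instance
    elif avg_cpu_percent < scale_in_at:
        desired = current - 1      # light load: remove an instance
    else:
        desired = current          # within the comfortable band
    # Clamp so the floor and the cost ceiling are always respected.
    return max(MIN_INSTANCES, min(max_instances, desired))
```

Clamping as the last step means that no combination of rules can take the role below the minimum needed for reliability or above the limit set to control costs.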

Backup and Restore for Data

The most significant changes to administrative and management tasks for a multi-tenant application when compared to a standard business application are concerned with the processes used to back up and restore data. Each tenant’s data will typically be isolated from all others through the use of a separate database, table, partition, blob container, or storage account. It is vital that this isolation is not compromised during the backup and restore process.

If the data is held in separate databases, the backup procedures can simply address each database in turn, and store the backup in separate files or blobs. However, these files or blobs must also be securely stored so that only responsible staff and the appropriate tenant can access them. A tenant must obviously not be able to access a backup that contains other tenants’ data.

If tenants’ data is stored in separate Windows Azure subscriptions you must consider whether your responsibility includes backup and restore processes. One of the reasons that tenants may want to use their own subscription and account for their data storage is to maximize security of the data or to meet regulatory limitations (such as on the location or storage of the data). In most cases, the tenant should be responsible for backing up and restoring this data.

Bharath Says:

Subscribers who export their survey data to SQL Database can use the SQL Database Import/Export Service to back up their data. This service enables you to export your data to blob storage and then optionally download it to an on-premises location for safekeeping.

If all of the tenants’ data is held in shared storage, such as in a shared database or in a single Windows Azure storage account, specific care must be taken when designing the backup and restore processes. While it is possible to create a single backup of the database or storage account, doing so means that administrators must exercise extra care in storing the backup and when restoring some or all tenants’ data. One solution may be to offer the capability in the application for tenants to create a backup containing just their own data on demand, and allow them to store it in a location of their choosing.

In the Surveys application, data for each tenant is stored in Windows Azure tables. Tailspin will implement a mechanism that allows tenants to back up the data for each tenant separately, and store each tenant’s backup in a separate Windows Azure storage blob to maintain isolation. For examples illustrating how to back up Windows Azure table storage, see “Table Storage Backup & Restore for Windows Azure” on CodePlex, and the blog post “Protecting Your Tables Against Application Errors.”
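The following Python sketch illustrates the principle of one backup per tenant. It assumes, for the purpose of the example only, that each row's partition key identifies the tenant; the blob naming scheme and the sample rows are hypothetical.

```python
import json
from collections import defaultdict


def backup_by_tenant(rows):
    """Group rows by tenant and serialize each group separately.

    Assumes (hypothetically) that the PartitionKey of each row is the
    tenant ID. Returns a dict that maps a backup blob name to its content.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["PartitionKey"]].append(row)
    return {
        f"backups/{tenant}.json": json.dumps(tenant_rows, sort_keys=True)
        for tenant, tenant_rows in grouped.items()
    }


rows = [
    {"PartitionKey": "adatum", "RowKey": "survey-1"},
    {"PartitionKey": "fabrikam", "RowKey": "survey-1"},
    {"PartitionKey": "adatum", "RowKey": "survey-2"},
]
backups = backup_by_tenant(rows)
```

Each resulting document contains rows for a single tenant, so it can be stored and access-controlled independently of the others.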

There are several other management-related concerns particularly applicable to ISVs and software vendors, rather than to the development of in-house applications and services. These include how the application supports onboarding and configuration for subscribers, per tenant customization, and financial goals. These topics are discussed in more detail in the section “ISV Considerations for Multi-Tenant Applications” later in this chapter.

Application Monitoring Strategies

Windows Azure incorporates several features that are useful for monitoring applications. These include the Windows Azure Management Portal, the Windows Azure Management API, the Windows Azure diagnostics mechanism, and many tools and services available from Microsoft and from third parties. For example, Microsoft System Center can be used to monitor a Windows Azure application and raise alerts when significant events occur.

Developers should make use of the Windows Azure diagnostics mechanism to generate error and trace messages within the application code. In addition, administrators can configure Windows Azure diagnostics to record operating system events and logs, and other useful information. All of the monitoring information collected by the diagnostics mechanism can be accessed using a range of tools to view the Windows Azure tables and blobs where this data is stored, or by using scripts to download the data for further analysis.

Bharath Says:

In a multi-tenant application you must take special care when dealing with log files because diagnostic data can include tenant specific data. If you allow tenants to see log files, perhaps for troubleshooting purposes, you must be sure that diagnostic data is not shared accidentally with the wrong tenant. You must either keep separate logs for each tenant, or be sure that you can filter logs by tenant before sharing them.

Where tenants can upload resources to customize the application, administrators can take advantage of Windows Azure endpoint protection to guard against the occurrence of malicious code such as viruses and Trojans finding their way onto the server. You can install endpoint protection into each web and worker role instance in your application and then configure Windows Azure diagnostics to read error and warning messages from the Microsoft Antimalware source in the system event log.

Note:

See “Microsoft Endpoint Protection for Windows Azure” on the Microsoft download site for more information. On this page you can also download the document “Monitoring Microsoft Endpoint Protection for Windows Azure,” which describes how to collect diagnostic data from Windows Azure endpoint protection.

Tailspin implements code in the application that writes events to Windows Azure diagnostics by using a custom helper class and the Windows Azure Diagnostics listener. This includes a range of events and covers common error situations within the application. A configuration setting in the service configuration file controls the level of event logging, allowing administrators to turn on extended logging when debugging the application and turn it off during ordinary runtime scenarios.
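As an illustration of driving the logging level from a configuration value, the following Python sketch uses the standard `logging` module rather than Windows Azure diagnostics. The `TraceLevel` setting name, the level names, and the logger name are assumptions and do not come from the Surveys configuration files.

```python
import logging

# Hypothetical mapping from configuration values to logging levels.
LEVELS = {
    "ERROR": logging.ERROR,
    "WARNING": logging.WARNING,
    "INFORMATION": logging.INFO,
    "VERBOSE": logging.DEBUG,
}


def configure_logging(settings, logger_name="surveys"):
    """Apply the level named by the (hypothetical) 'TraceLevel' setting.

    Unknown or missing values fall back to WARNING.
    """
    name = str(settings.get("TraceLevel", "")).upper()
    logger = logging.getLogger(logger_name)
    logger.setLevel(LEVELS.get(name, logging.WARNING))
    return logger
```

An administrator could then switch between, for example, `{"TraceLevel": "Verbose"}` while debugging and `{"TraceLevel": "Error"}` during ordinary operation without changing application code.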

Inside the Implementation

Now is a good time to walk through some of the code in the Tailspin Surveys application in more detail. As you go through this section, you may want to download the Visual Studio solution for the Tailspin Surveys application from http://wag.codeplex.com/.

Unit Testing

Tailspin designed many classes of the Surveys application to support unit testing by taking advantage of the dependency injection design pattern. This allows mock objects to be used when testing individual classes without requiring the complex setup and teardown processes often needed to use the real objects.

For example, this section describes how the design of the Surveys application supports unit testing of the SurveyStore class that provides access to Windows Azure table storage. This description focuses on tests for one specific class, but the application uses the same approach with other store classes.

The following code example shows the IAzureTable interface and the AzureTable class that are at the heart of the implementation.

The Add method that takes an IEnumerable parameter should check the number of items in the batch and the size of the payload before calling the SaveChanges method with the SaveChangesOptions.Batch option. For more information about batches and Windows Azure table storage, see “Performing Entity Group Transactions” on MSDN.

The generic interface and class have a type parameter T that derives from the Windows Azure TableServiceEntity type you use to create your own table types. For example, in the Surveys application the SurveyRow and QuestionRow types derive from the TableServiceEntity class. The IAzureTable interface defines several operations: the Query method returns an IQueryable collection of the type T, and the Add, AddOrUpdate, and Delete methods each take a parameter of type T. In the AzureTable class the Query method returns a TableServiceQuery object, the Add and AddOrUpdate methods save the object to table storage, and the Delete method deletes the object from table storage.
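The shape of this abstraction can be sketched in Python as follows. This is not the Surveys C# code: the `Table`, `InMemoryTable`, and `TableEntity` names are hypothetical, and the in-memory class plays the role that a mock or fake would play in a unit test. The batch check reflects the entity group transaction rules of at most 100 entities that all share one partition key; the payload size check is omitted for brevity.

```python
from abc import ABC, abstractmethod
from typing import Dict, Generic, Iterable, List, Tuple, TypeVar

MAX_BATCH_ENTITIES = 100  # entity group transaction limit


class TableEntity:
    """Base type for rows, identified by partition key and row key."""

    def __init__(self, partition_key: str, row_key: str):
        self.partition_key = partition_key
        self.row_key = row_key


T = TypeVar("T", bound=TableEntity)


class Table(ABC, Generic[T]):
    """Hypothetical counterpart of the table interface described above."""

    @abstractmethod
    def query(self) -> List[T]: ...

    @abstractmethod
    def add(self, entity: T) -> None: ...

    @abstractmethod
    def add_or_update(self, entity: T) -> None: ...

    @abstractmethod
    def delete(self, entity: T) -> None: ...


class InMemoryTable(Table[T]):
    """Test double that keeps rows in a dict instead of table storage."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], T] = {}

    def query(self) -> List[T]:
        return list(self._rows.values())

    def add(self, entity: T) -> None:
        key = (entity.partition_key, entity.row_key)
        if key in self._rows:
            raise KeyError(f"entity {key} already exists")
        self._rows[key] = entity

    def add_or_update(self, entity: T) -> None:
        self._rows[(entity.partition_key, entity.row_key)] = entity

    def delete(self, entity: T) -> None:
        self._rows.pop((entity.partition_key, entity.row_key), None)

    def add_batch(self, entities: Iterable[T]) -> None:
        batch = list(entities)
        # Validate the whole batch before storing anything.
        if len(batch) > MAX_BATCH_ENTITIES:
            raise ValueError("a batch may contain at most 100 entities")
        if len({e.partition_key for e in batch}) > 1:
            raise ValueError("a batch must use a single partition key")
        for entity in batch:
            self.add(entity)
```

Code that depends only on the abstract `Table` type can be given either a storage-backed implementation or the in-memory one, which is the property that makes the store classes testable.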

To create a mock object for unit testing, you must instantiate an object that implements the interface type IAzureTable. The following code example from the SurveyStore class shows the constructor. Because the constructor takes parameters of type IAzureTable, you can pass in either real or mock objects that implement this interface.

This parameterized constructor is invoked in two different scenarios. The Surveys application invokes it indirectly when the application uses the SurveysController MVC class. The application uses the Unity dependency injection framework to instantiate MVC controllers. The Surveys application replaces the standard MVC controller factory with the UnityControllerFactory class in the OnStart method in both web roles, so when the application requires a new MVC controller instance Unity is responsible for instantiating that controller. The following code example shows part of the ContainerBootstrapper class from the TailSpin.Web project that the Unity container uses to determine how to instantiate objects.

The constructor that Unity invokes to create a SurveysController instance takes a number of parameters including a SurveyStore object. The third call to the RegisterType method in the previous sample defines how Unity instantiates a SurveyStore object to pass to the SurveysController constructor. The first two calls to the RegisterType method define the rules that tell the Unity container how to instantiate the two IAzureTable instances that it must pass to the SurveyStore constructor shown earlier.

Markus Says:

To see how the web role uses Unity when it instantiates MVC controllers, examine the code in the Global.asax file that creates a UnityControllerFactory instance.

In the second usage scenario for the parameterized SurveyStore constructor, you create unit tests for the SurveyStore class by directly invoking the constructor and passing in mock objects created using the Moq mocking library. The following code example shows a unit test method that uses the constructor in this way.

The test creates a mock IAzureTable<SurveyRow> instance, uses it to instantiate a SurveyStore object, invokes the GetSurveyByTenantAndSlugName method, and checks the result. It performs this test without touching Windows Azure table storage.

The Surveys application uses a similar approach to enable unit testing of the other store components that use Windows Azure blob and table storage.
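The pattern of injecting mock tables through the constructor can be illustrated with Python's `unittest.mock` in place of Moq. The `SurveyStore` class below is a simplified, hypothetical stand-in written for this sketch; its method body, the row fields, and the tenant and slug values are assumptions and do not reproduce the Surveys code.

```python
from unittest.mock import Mock


class SurveyStore:
    """Simplified stand-in: depends only on the injected table objects."""

    def __init__(self, survey_table, question_table):
        self._survey_table = survey_table
        self._question_table = question_table

    def get_survey_by_tenant_and_slug_name(self, tenant, slug_name):
        for row in self._survey_table.query():
            if row["PartitionKey"] == tenant and row["SlugName"] == slug_name:
                return {"tenant": tenant, "slug_name": slug_name,
                        "title": row["Title"]}
        return None


def test_get_survey_returns_matching_row():
    # Arrange: mock tables return canned rows; no storage is involved.
    survey_table = Mock()
    survey_table.query.return_value = [
        {"PartitionKey": "adatum", "SlugName": "slug", "Title": "A survey"},
        {"PartitionKey": "fabrikam", "SlugName": "slug", "Title": "Other"},
    ]
    store = SurveyStore(survey_table, Mock())

    # Act
    survey = store.get_survey_by_tenant_and_slug_name("adatum", "slug")

    # Assert: the right tenant's row came back, from a single query.
    assert survey == {"tenant": "adatum", "slug_name": "slug",
                      "title": "A survey"}
    survey_table.query.assert_called_once_with()
```

The test exercises only the store's own logic. The same constructor accepts real table objects in production, which is why no special test-only code path is needed.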

Testing Worker Roles

Tailspin also considered how to implement background tasks in worker roles so as to minimize the effort required for unit testing. The implementation of the “plumbing” code in the worker role, and the use of Unity, makes it possible to run unit tests on the worker role components using mock objects instead of Windows Azure queues and blobs. The following code from the BatchProcessingQueueHandlerFixture class shows two example unit tests.

The ForCreateHandlerForGivenQueue unit test verifies that the static For method instantiates a BatchProcessingQueueHandler correctly by using a mock queue. The DoRunsGivenCommandForEachMessage unit test verifies that the Do method causes the command to be executed against every message in the queue by using mock queue and command objects.
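The two kinds of test described above can be sketched in Python with mock queue and command objects. The `QueueHandler` class is a much-simplified, hypothetical analogue of a handler with For and Do operations; its method names, the queue methods it calls, and the delete-after-run behavior are assumptions made for this example.

```python
from unittest.mock import Mock


class QueueHandler:
    """Hypothetical handler: created for a queue, runs a command per message."""

    def __init__(self, queue):
        self.queue = queue

    @classmethod
    def for_queue(cls, queue):
        """Factory in the style of a static For method."""
        if queue is None:
            raise ValueError("a queue is required")
        return cls(queue)

    def do(self, command, batch_size=32):
        """Run the command against each message, then delete the message."""
        messages = self.queue.get_messages(batch_size)
        for message in messages:
            command.run(message)
            self.queue.delete_message(message)
        return len(messages)


def test_for_creates_handler_for_given_queue():
    queue = Mock()
    handler = QueueHandler.for_queue(queue)
    assert handler.queue is queue


def test_do_runs_command_for_each_message():
    queue = Mock()
    queue.get_messages.return_value = ["message-1", "message-2"]
    command = Mock()

    processed = QueueHandler.for_queue(queue).do(command)

    assert processed == 2
    assert [c.args[0] for c in command.run.call_args_list] == [
        "message-1", "message-2"]
    assert queue.delete_message.call_count == 2
```

Because the handler receives its queue and command from the caller, neither test needs a storage account or an emulator.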

Testing Multi-Tenant Features and Tenant Isolation

The developers at Tailspin included tests to verify that the application preserves the isolation of tenants. The following code sample shows a test in the SurveysControllerFixture class that verifies that the private tenant web site uses the correct tenant details when a tenant chooses to export survey data to a SQL Database instance.

Performance and Stress Testing

The test team at Tailspin conducted a set of high volume stress tests in order to determine the expected throughput with a given number of role and queue instances, and to understand how to scale the application to meet higher levels of demand. This section focuses on the specific results of stress testing the Surveys application. However, most of the factors will be relevant to the majority of Windows Azure applications.

Markus Says:

The results we got from our stress tests may be specific to the Surveys application, but the factors involved and our solutions are likely to be relevant to the majority of Windows Azure applications.

During the stress testing exercise, the team identified a number of issues with the code that limited the scalability of the application and, as a result, the developers proposed a number of changes to overcome these limitations.

Optimistic and Pessimistic Concurrency Control

The application saves the survey summary statistics data and the list of survey responses to blob storage. A worker role collects the data, processes it, and writes it back to blob storage. When more than one worker role instance is running they could try to write to the same blob simultaneously, and so the application must use either an optimistic or a pessimistic approach to managing concurrent access issues.

Markus Says:

Often, the only way you can make a sensible choice between optimistic and pessimistic concurrency is by testing the application to the limits, counting failures, and measuring actual performance under realistic conditions with realistic data.

As part of the stress testing, Tailspin evaluated both optimistic and pessimistic concurrency approaches when the application writes to these blobs to determine which approach enabled the highest throughput. With a heavily loaded system, and running three worker role instances, the test team saw approximately one optimistic concurrency exception per 2,000 saved survey responses. Therefore, Tailspin decided to use the optimistic concurrency approach when the application writes to these blobs.
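One exception per 2,000 saves is a conflict rate of about 0.05 percent, so retrying the occasional failed write costs little. The following Python sketch shows the general read, modify, conditional write, and retry loop that optimistic concurrency implies. `InMemoryBlob`, its integer version tag, and the retry limit are hypothetical; they stand in for a blob whose writes are conditional on an unchanged ETag.

```python
class PreconditionFailed(Exception):
    """Raised when the blob changed since it was read."""


class InMemoryBlob:
    """Hypothetical blob with a version tag that changes on every write."""

    def __init__(self, content):
        self.content = content
        self.etag = 0

    def read(self):
        return self.content, self.etag

    def write_if_match(self, content, etag):
        if etag != self.etag:
            raise PreconditionFailed()
        self.content = content
        self.etag += 1


def update_with_retry(blob, transform, max_attempts=5):
    """Apply transform to the blob content, retrying on conflicts.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        content, etag = blob.read()
        try:
            blob.write_if_match(transform(content), etag)
            return attempt
        except PreconditionFailed:
            continue  # another writer won; re-read and try again
    raise RuntimeError("gave up after repeated concurrency conflicts")
```

No lock is held between the read and the write, so writers never wait for each other. The cost is repeated work when a conflict does occur, which is why the measured conflict rate matters when choosing between the two approaches.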

Maintaining a List of Survey Answers

To support paging through survey answers in the order they were received by the system, and exporting to a SQL Database instance, the application maintains a list of survey responses for each survey in a blob. Chapter 3, “Choosing a Multi-Tenant Data Architecture,” describes this mechanism in detail.

However, stress testing revealed that this can lead to a bottleneck in the system as the number of survey responses for a survey grows. Every time the system saves a new set of survey responses, it must read the whole list of existing responses from blob storage, append the new answers to the list, and then save the list back to blob storage.

The developers at Tailspin plan to address this problem by introducing a paging mechanism, so that the application uses multiple blobs to store the list of survey responses for each survey. Each blob will hold a list of survey responses, but once the list reaches a certain size the application will create a new list. In this way, the size of the list that the application is currently writing to will never grow beyond a fixed size.

This will also require some changes in the logic that enables paging through survey responses in the UI and reading survey responses for export to SQL Database.
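The planned paging mechanism might look something like the following sketch. It is written in Python for brevity and is not taken from the sample; the PagedResponseList class, the page naming scheme, and the page size of three are assumptions chosen only to make the behavior visible, and a dictionary stands in for the blob container.

```python
PAGE_SIZE = 3  # assumed cap; a real value would come from measurement


class PagedResponseList:
    """Keeps response IDs in fixed-size pages, one hypothetical blob per page."""

    def __init__(self, blobs, survey, page_size=PAGE_SIZE):
        self.blobs = blobs          # dict standing in for a blob container
        self.survey = survey
        self.page_size = page_size

    def _page_name(self, index):
        return f"{self.survey}/page-{index:06d}"

    def _last_index(self):
        # Linear probe for the newest page; a stored counter would avoid this.
        index = 0
        while self._page_name(index + 1) in self.blobs:
            index += 1
        return index

    def append(self, response_id):
        index = self._last_index()
        page = self.blobs.get(self._page_name(index), [])
        if len(page) >= self.page_size:
            index, page = index + 1, []   # current page is full: start a new one
        self.blobs[self._page_name(index)] = page + [response_id]

    def read_page(self, index):
        return list(self.blobs.get(self._page_name(index), []))
```

Each append rewrites at most one page, so the cost of saving a response no longer grows with the total number of responses. The UI paging and export logic would read pages by index instead of reading a single list.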

Azure Queues Throughput

According to the information in the post “Windows Azure Storage Abstractions and their Scalability Targets” on the Windows Azure Storage Team blog, a Windows Azure queue has a performance target of processing 500 messages per second. The Tailspin Surveys application uses two queues to deliver survey responses from the public web site to the worker role for processing (one for responses to surveys published by tenants with a standard subscription, and one for responses to surveys published by tenants with a premium subscription). It’s possible, with a high volume of users responding to surveys, that the number of messages that these queues need to process could exceed 500 per second.

Markus Says:

Sometimes performance bottlenecks aren’t the fault of your code; they are limitations of services or systems you rely on. In this case you must either live with the limits, or redesign your code to find a workaround. But take care that the additional complexity you introduce does not have a greater impact on your application’s performance than the limitation you originally encountered.

Tailspin plans to partition these queues, and modify the application to work with multiple instances of these queues in order to support higher rates of throughput. For example, the web role could use a round-robin approach to write messages to the multiple queue instances in turn and the worker role could use a separate thread to handle each of the queue instances. However, care is required in designing this kind of feature to ensure an appropriate number of queue instances are available when you scale the application (either manually or automatically) and the number of role instances changes.
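A minimal sketch of the round-robin idea follows, in Python. The PartitionedQueue class is hypothetical, and in-memory deques stand in for the individual queue instances; it shows only how a writer could rotate across partitions and how a consumer could own one of them.

```python
import itertools
from collections import deque


class PartitionedQueue:
    """Spreads messages across several queue instances in round-robin order."""

    def __init__(self, partition_count):
        # deques stand in for individual queue instances
        self.partitions = [deque() for _ in range(partition_count)]
        self._next = itertools.cycle(range(partition_count))

    def send(self, message):
        index = next(self._next)
        self.partitions[index].append(message)
        return index

    def receive_from(self, index):
        """Called by the consumer responsible for one partition."""
        queue = self.partitions[index]
        return queue.popleft() if queue else None
```

The partition count is fixed at construction here. Changing it while role instances scale up or down is exactly the part that needs the care described above, and this sketch does not attempt it.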

Synchronous and Asynchronous Calls to Windows Azure Storage

The stress tests indicated that synchronously writing first to blob storage and then synchronously posting a message to a queue took up a significant portion of execution time in the web role. Typically, you can improve the throughput when you write to Windows Azure storage by using asynchronous calls to avoid blocking the application while the I/O operation completes. For example, if you need to write to storage and send a message to a queue you can initiate both operations asynchronously.

However, there are some issues that would make it difficult to convert these into asynchronous write operations in the Surveys application. For example, the web role must finish writing a survey response to blob storage before it sends a message to the queue that instructs the worker role to process it. Performing the writes to blob storage and the queue concurrently by using asynchronous code could result in errors if writing to the blob fails, or if the message arrives in the worker role before the web role finishes writing the survey response to storage.
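The ordering constraint can be made concrete with a small sketch in Python. The save_response function and the blob naming are hypothetical; a dictionary and a list stand in for blob storage and the queue. The point is only that the message is sent after the write has succeeded, and not at all if the write fails.

```python
def save_response(blobs, queue, survey, response_id, body):
    """Write the response first; enqueue the notification only if that succeeded."""
    blob_name = f"{survey}/{response_id}"
    blobs[blob_name] = body    # may raise; if so, no message is sent
    queue.append(blob_name)    # the worker can now safely read the blob
    return blob_name
```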

Markus Says:

Just because you can do things asynchronously and concurrently doesn’t always mean that you should. Some processes in an application need to be performed in a predetermined or controlled order, or must finish before the next task starts. This is particularly the case if you need to check for an error before starting the next process.

Tailspin also considered whether it should use asynchronous calls when the application saves summary statistics and answer lists to blob storage. These write operations take place as part of the processing cycle in the worker role that consists of reading the blob data, making changes to that blob data, and then writing the data back to blob storage.

The application uses an optimistic concurrency approach that checks the data in the blob hasn’t changed between the time it was read and the time that the application attempts to write it back. If the application used an asynchronous call to write the data back to blob storage, it’s possible that the read operation in the next cycle will start before the previous write operation is complete—increasing the likelihood of an optimistic concurrency exception occurring.

Tailspin decided not to use asynchronous calls when the application writes summary statistics data and survey answer response lists to blob storage.

Additional Performance Tuning Options

Further performance tuning options that Tailspin will consider and test include:

Managing the Surveys Application

Tailspin stores all the configuration data used to manage tenants of the Surveys application in blob storage. The private web site, defined in the Tailspin.Web project, includes a set of pages that are only available to Tailspin administrators for managing Tailspin Surveys tenants.

The sample application currently enables Tailspin administrators to add new tenants and update the details of existing tenants. It does not currently enable administrators to delete tenants.

The “Subscribers list” screen shows the Tailspin administrator a list of the current tenants in the Tailspin Surveys application. A Tailspin administrator can edit the details of existing subscribers from the subscribers list screen and add a new subscriber on the “Add a new subscriber” screen.

Tailspin plans to implement a process to enable administrators to remove a subscriber. A Delete hyperlink on the subscribers list screen will trigger this process, which must perform the following steps:

Delete the tenant blob that contains the subscriber’s configuration data from the tenants blob container.

Delete all of the subscriber’s survey questions from the Questions table and survey headers from the Surveys table. In the case of the Surveys table, each subscriber’s surveys are stored on a separate partition. In the case of the Questions table, the partition key is a combination of the subscriber name and survey name: the delete process must find all of the partitions where the partition key starts with the subscriber’s ID.

Delete all the blob containers that contain the subscriber’s survey answers (every survey has its own blob container for storing survey responses). The subscriber’s ID is part of the container name.

Delete all the blobs in the surveyanswerssummaries and surveyanswerslists blob containers that belong to the subscriber (every survey will have its own blob in each of these containers). The subscriber’s ID is part of the blob names.

Delete any data used for customizing the subscriber’s surveys such as logos in the logos blob container.

If the subscription includes a SQL Database, de-provision the database.

Delete the subscriber’s configuration data and survey definitions from cache.

If the subscriber uses the Tailspin identity provider, delete any accounts belonging to the subscriber from the store used by the identity provider.

Some of the actions in the previous list can be performed quickly and Tailspin plans to perform these actions synchronously when the administrator has confirmed that the subscriber must be deleted. These actions are to delete the cached data, to delete the data from the Surveys table, and to delete the subscriber’s configuration data from the tenants blob container. When these items have been deleted the subscriber will not be able to access the private tenant site, and the subscriber’s surveys will not be listed on the public site.

Tailspin can delete a subscriber’s configuration data from blob storage quickly because the subscriber’s ID is the name of the blob, it can delete the entries from the Surveys table quickly because all the subscriber’s surveys are stored in the same partition, and it can delete the cached data quickly because the application uses a separate cache region for each tenant.

The remaining actions, which may take longer to perform, can be performed asynchronously. Deleting a subscriber’s entries in the Questions table may take time because the entries span multiple partitions and therefore the process must scan the entire table to locate all the entries to delete. Deleting the subscriber’s blobs from the surveyanswerssummaries and surveyanswerslists blob containers may take time because the process must iterate over all the blobs in the container to identify which ones belong to the subscriber.
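The split between quick and slow deletion steps could be sketched as follows, in Python. Nothing here comes from the sample: the store dictionary, the task names, and in particular the underscore separator between subscriber ID and survey name in the Questions partition key are assumptions made for illustration.

```python
def delete_subscriber(subscriber_id, store, background_work):
    """Remove quick-to-delete data now and queue the slow clean-up for later."""
    # Fast, synchronous steps: each item is addressed directly by the subscriber's ID.
    store["cache_regions"].pop(subscriber_id, None)
    store["surveys_partitions"].pop(subscriber_id, None)
    store["tenants"].pop(subscriber_id, None)
    # Slow steps need a scan, so they are deferred to a background task.
    background_work.append(("purge_questions", subscriber_id))
    background_work.append(("purge_answer_blobs", subscriber_id))


def purge_questions(questions_table, subscriber_id):
    """Scan every partition key and drop those that start with the subscriber's ID."""
    prefix = f"{subscriber_id}_"   # assumed separator between ID and survey name
    doomed = [key for key in questions_table if key.startswith(prefix)]
    for key in doomed:
        del questions_table[key]
    return len(doomed)
```

Including the separator in the prefix matters: without it, deleting "adatum" would also match a different subscriber whose ID merely begins with the same characters.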

Tailspin can quickly delete some subscriber data and disable access for that subscriber. It can delete all of the remaining data later to free up storage space.

Monitoring the Surveys Application

Tailspin uses Windows Azure diagnostics to collect information from the Surveys application at runtime. Tailspin administrators can then monitor these log files for any unexpected events or behavior. For example, the administrators can monitor the messages from the Transient Fault Handling Application Block to identify if there are any changes in Windows Azure that are affecting how the application is using Windows Azure storage or SQL Database. These types of retries will happen from time to time, which is why Tailspin uses the Transient Fault Handling Application Block. However, if the administrators see a large number of retries occurring they can take steps to investigate the status of the Windows Azure services or other dependent services.
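The general pattern of retrying a transient fault and logging each retry so that administrators can watch the retry rate can be sketched in Python as follows. This is not the API of the Transient Fault Handling Application Block; call_with_retry, its parameters, and the logger name are hypothetical, and the sketch omits the delay between attempts that a real retry policy would apply.

```python
import logging

logger = logging.getLogger("surveys.storage")  # hypothetical logger name


def call_with_retry(operation, is_transient, max_attempts=3, on_retry=None):
    """Run operation, retrying transient failures and logging each retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception as error:
            if attempt == max_attempts or not is_transient(error):
                raise
            # This is the message an administrator would watch for in the logs.
            logger.warning("Transient fault, retry %d: %s", attempt, error)
            if on_retry:
                on_retry(attempt, error)
```

An occasional warning is expected; a sustained burst of them is the signal to investigate the underlying service.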

The AzureTable, AzureQueue, and AzureBlobContainer classes in the application all inherit from the AzureObjectWithRetryPolicyFactory class that specifies the message that the application writes to the Windows Azure logs when the block detects a transient fault. The following code sample shows the AzureObjectWithRetryPolicyFactory class.

ISV Considerations for Multi-Tenant Applications

Questions such as how to handle the onboarding process for new subscribers, how to manage per user customization, and how to implement billing are relevant to both single tenant and multi-tenant architectures. However, they require some special consideration in a multi-tenant model.

Goals and Requirements

Tailspin’s goals and requirements for supporting tenants and customers that pay to use the Surveys application encompass those that are applicable to most multi-tenant applications created by ISVs.

When a new subscriber signs up for a multi-tenant application, the application must undergo configuration and other changes to enable the new account. The onboarding process must typically be automated, and it touches many components of the application. Tailspin wants to automate as much of this process as possible to simplify the onboarding process for new subscribers, and to minimize the costs associated with setting up a new subscriber.

Markus Says:

The onboarding process touches many components in your applications.

It is common for ISVs to offer different levels of subscription, such as standard and premium subscriptions, which may vary in terms of functionality, support, and service level (for example, guaranteed availability and response times). This can make both the onboarding and the daily operation more complex to manage. Tailspin intends to offer different levels of service, and so must consider how this will affect the design of the application.

Another common feature of multi-tenant applications is enabling subscribers to customize parts of the application for their customers, such as the appearance of the UI or the availability of specific features and capabilities. The amount of customization required will vary for different scenarios and different types of application, and it is another factor that can have a large impact on the complexity of designing and managing multi-tenant applications. Tailspin intends to offer some levels of UI customization to tenants, but will limit this to simple changes such as style sheets and logos. Tailspin also wants to enable premium subscribers to add metadata, such as a product ID or an owner, to survey definitions. Premium subscribers will be able to use this contextual data as links to other data within their own systems.

Poe Says:

ISVs will typically want to allow tenants to customize the application, but this can add complexity to the solution and may increase security concerns if not properly controlled.

Finally, ISVs will need to be able to bill tenants based on their usage of the application. While Windows Azure does provide billing information for an application, calculating the costs for each tenant is less easy to achieve. Tailspin wants to be able to bill tenants at different rates based on both usage and the type of subscription that tenant has.

Overview of the Solution

This section describes the options Tailspin considered for managing individual tenants in the Surveys application, and identifies the solutions Tailspin chose.

Onboarding for Trials and New Subscribers

For Tailspin, the key issue related to onboarding is how much of the process it should automate. Building a system that handles self-service sign up is complex, but it does make it easier for potential subscribers to try out the system. The self-service onboarding process must include a number of steps, including the following:

Validate the tenant. Tailspin must ensure that paying subscribers have a valid payment method such as a credit card.

Create any tenant specific configuration settings. It should be possible to create (and change) tenant configuration values without restarting any part of the application. For Tailspin Surveys, tenant configuration values are stored in Windows Azure blob storage using one blob per tenant. This includes all of the information that the Tailspin federation provider needs to establish a trust relationship with the tenant’s identity provider. If the tenant has chosen to use the Tailspin identity provider, the application will also need to add user accounts to the membership database. In addition, the Surveys application will use the tenant configuration data when it adds tenant identifiers to data collected at runtime by logging mechanisms, and when it performs any tenant specific backup operations.

Provision any tenant specific resources. Tenants with premium subscriptions can choose to have their own SQL Database server to store their exported data. The SQL Database Management REST API enables you to create server instances. If you need to provision any other Windows Azure resources, such as storage accounts or cloud services, you can use the Windows Azure Service Management API.

Notify Tailspin administrators of any additional steps that must be completed on behalf of the tenant. Tailspin does not anticipate the need for any manual steps for its administrators as part of the onboarding process.

Notify the subscriber of any additional steps that it must take. For example, Tailspin Surveys subscribers can use a custom DNS name to access their surveys.

Notify the subscriber of any applicable terms and conditions including the SLA for the subscription type.

Configuring Subscribers

Tailspin chose to store all of the configuration data for each tenant in Windows Azure blob storage. Tailspin uses one blob per tenant and uses the JSON serializer to write the Tenant object to the blob. Almost all of the tenant configuration data is stored in this way, making it easy for Tailspin to manage the details of its subscribers. The only exceptions to storing tenant configuration data in blobs in the tenants blob container are that tenant logos are stored in the logos blob container, and those tenants who use the Tailspin identity provider store their users’ account details in the identity provider’s membership database.
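The one-blob-per-tenant arrangement can be pictured with a short Python sketch. The Tenant fields shown are illustrative and may not match the sample's Tenant class, the save_tenant and load_tenant helpers are hypothetical, and a dictionary stands in for the tenants blob container.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Tenant:
    # Field names are illustrative; the sample's Tenant class may differ.
    name: str
    subscription_kind: str
    issuer_thumbprint: str = ""
    logo_url: str = ""


def save_tenant(container, alias, tenant):
    """Serialize the tenant to JSON and store it under a blob named after the alias."""
    container[alias.lower()] = json.dumps(asdict(tenant))


def load_tenant(container, alias):
    """Return the Tenant stored for this alias, or None if there is no such blob."""
    raw = container.get(alias.lower())
    return Tenant(**json.loads(raw)) if raw is not None else None
```

Because the blob name is derived from the tenant's alias, reading or replacing one tenant's configuration never touches another tenant's data.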

Supporting Per Tenant Customization

Tailspin Surveys includes three ways that subscribers can customize the application.

Each tenant can customize the UI seen by survey respondents to add tenant specific branding. Initially, each tenant will be able to upload a logo that displays on every survey page. Tailspin also plans to enable tenants to use CSS style sheets to further customize the UI. The application enables this UI customization by allowing subscribers to upload the necessary files to Windows Azure blob storage. Enabling support for custom CSS style sheets is more complex than for logos because a poorly designed style sheet could make the surveys unreadable; Tailspin plans to develop some validation and filtering functionality to minimize this risk.

Markus Says:

We limit the types of custom CSS style selectors we accept to prevent the UI from being rendered unusable, and to protect the application from malicious attack or other unexpected side effects.

Premium tenants can add their own custom metadata to their surveys to enable linking with their own applications and services. The application uses a custom schema for each tenant to store this additional data in table storage. It also uses a custom assembly for each tenant that takes advantage of this feature, which enables the tenant to save and view this custom data in the private tenant web site. For more information about how Tailspin implemented this feature see the section “Accessing Custom Data Associated with a Survey” in Chapter 3, “Choosing a Multi-Tenant Data Architecture.”

Subscribers can also customize how to authenticate with Tailspin Surveys. They can choose to use their own identity provider, Tailspin’s identity provider, or a third party identity provider. This configuration data is stored in the tenant blob. For more information about how the different authentication schemes work see Chapter 6, “Securing Multi-Tenant Applications.”

Financial Goals and Billing Subscribers

Tailspin developed the Surveys application as a commercial service from which it hopes to make a profit. The revenue from the application will come from tenants who sign up for one of the paid services. The costs can be broken down into the following categories:

Tailspin incurred costs during the project to develop the Surveys application. These costs included developer salaries, software licenses, hardware, and training.

Tailspin incurs running costs. Windows Azure bills Tailspin monthly for the resources it consumes, such as web and worker role usage, data transfer, and data storage.

The costs associated with the first two categories may be difficult to identify, especially because some of the items may be associated with other projects and applications; for example, an administrator may be responsible for multiple applications. The costs in the third category are very easy for Tailspin to identify from the monthly billing statements. If the application consumes a significant quantity of Windows Azure resources, these running costs may be the most significant costs associated with the application.

The revenue that Tailspin receives from its tenants should be sufficient to generate a suitable return on investment, enabling Tailspin to recoup its initial investment costs and generate a surplus.

Tailspin evaluated two alternative pricing strategies for the Tailspin Surveys application. The first is to charge subscribers a fixed monthly amount for the package they subscribe to; the second is to charge subscribers based on their resource consumption.

Charging subscribers a fixed monthly fee has the following advantages and disadvantages:

Subscribers know in advance what their costs will be every month.

Tailspin knows, based on subscriber numbers, what its income will be every month.

There is a risk for Tailspin that, if it doesn’t sign up enough subscribers, it won’t cover its costs.

For Tailspin, implementing such a billing scheme is relatively straightforward.

It may be perceived as unfair, with some users effectively subsidizing others depending on their usage pattern.

Tailspin must set limits that prevent subscribers from using resources excessively. With no limits in place, Tailspin may face unexpectedly large bills at the end of a month, or the performance of the application may suffer.

Jana Says:

Using the Autoscaling Application Block is not just a great way to scale applications automatically—it can also be used to set upper limits on your use of cloud resources.

Charging subscribers based on their monthly resource usage has the following advantages and disadvantages:

Tailspin can pass on its Windows Azure running costs to its tenants, plus a percentage to ensure that it always covers its monthly running costs.

Subscribers cannot predict their monthly costs so easily.

Subscribers may want to set a cap on their potential monthly costs, or receive notifications if they exceed a particular amount.

Tailspin must ensure full transparency in the way that it calculates subscribers’ monthly bills.

Tailspin must add suitable monitoring to the application to accurately capture each subscriber’s usage.

This approach may be viewed as fairer because there is no cross subsidization between tenants.

This approach is more complex to implement.

Tailspin opted for the first approach, where subscribers pay a fixed monthly fee for their subscription. Subscribers prefer this approach because their costs are predictable, and Tailspin prefers it because it can implement it relatively easily.

Note:

Windows Azure Marketplace can provide you with a channel for marketing your hosted service. It can also provide billing services to collect payments from subscribers. For more information, see Windows Azure Marketplace on MSDN.

Tailspin will set different monthly limits for the different subscription levels. Initially, Tailspin plans to implement the following restrictions on subscribers:

It will set different limits for premium and standard subscribers on the number of surveys they can have active at any one time. Tailspin can enforce this by checking how many surveys the subscriber currently has active whenever the subscriber tries to publish a new survey.

It will set different limits on the duration of a survey. Tailspin can enforce this by recording, as part of the survey definition, when the subscriber published the survey. The application can check whether the maximum duration that a survey can be available for has been reached whenever it loads the list of available surveys for a subscriber.
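The two checks just described could be sketched as follows, in Python. The limit values are invented for illustration (the source gives no numbers), and the survey representation, is_active, and can_publish are hypothetical names.

```python
from datetime import datetime, timedelta

# Hypothetical limits; the source does not give actual numbers.
LIMITS = {
    "Standard": {"max_active": 5, "max_duration": timedelta(days=30)},
    "Premium": {"max_active": 20, "max_duration": timedelta(days=90)},
}


def is_active(survey, subscription, now):
    """A survey is active until its subscription's maximum duration has elapsed."""
    return now - survey["published_on"] < LIMITS[subscription]["max_duration"]


def can_publish(surveys, subscription, now):
    """Allow a new survey only while the active count is below the limit."""
    active = sum(1 for s in surveys if is_active(s, subscription, now))
    return active < LIMITS[subscription]["max_active"]
```

The application would call can_publish when a subscriber tries to publish a survey, and is_active when it loads the list of available surveys.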

Tailspin will also consider placing different limits on the maximum number of responses that can be collected for the different subscription levels. This will require the application to track the number of survey responses each tenant and survey receives and notify the subscriber when it is approaching the limit. The application already collects this data as part of the summary statistics it calculates.

Tailspin will monitor the application to see if any subscriber surveys result in poor performance for other users. If this occurs, it will investigate additional ways to limit the way that subscribers can consume resources.

Note:

The sample application does not currently impose any limits on the different types of subscriber.

Inside the Implementation

Now is a good time to walk through some of the code in the Tailspin Surveys application in more detail. As you go through this section, you may want to download the Visual Studio solution for the Tailspin Surveys application from http://wag.codeplex.com/.

Onboarding for Trials and New Subscribers

The following sections describe how Tailspin handles onboarding for new subscribers. The onboarding process collects the information described in this section and then persists it to blob storage using one blob per tenant. The web and worker roles in the Tailspin Surveys application use the tenant information in blob storage to configure the application dynamically at runtime.

Basic Subscription Information

The following table describes the basic information that every subscriber provides when they sign up for the Surveys service.

| Information | Example | Notes |
| --- | --- | --- |
| Subscriber Name | Adatum Ltd. | The commercial name of the subscriber. The application uses this as part of customization of the subscriber's pages on the Surveys websites. The Subscriber can also provide a corporate logo. |
| Subscriber Alias | adatum | A unique alias used within the application to identify the subscriber. For example, it forms part of the URL for the subscriber's web pages. The application generates a value based on the subscriber name, but it allows the subscriber to override this suggestion. |
| Subscription Type | Trial, Individual, Standard, Premium | The subscription type determines the feature set available to the subscriber and may affect what additional onboarding information must be collected from the subscriber. |
| Payment Details | Credit card details | Apart from a trial subscription, all other subscription types are paid subscriptions. The application uses a third-party solution to handle credit card payments. |

Apart from credit card details, all this information is stored in Windows Azure storage; it is used throughout the onboarding process and while the subscription is active.
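The source does not say how the application derives the suggested alias from the subscriber name. One plausible approach is sketched below in Python, purely as an assumption: lowercase the name, keep only letters and digits, drop a trailing company suffix, and add a number if the alias is already taken. The suggest_alias function and the suffix list are hypothetical.

```python
import re

# Assumed list of company suffixes to drop; not taken from the sample.
_SUFFIXES = {"ltd", "inc", "llc", "corp", "co"}


def suggest_alias(subscriber_name, taken=()):
    """Suggest a URL-safe, unique alias derived from the subscriber name."""
    words = re.findall(r"[a-z0-9]+", subscriber_name.lower())
    while len(words) > 1 and words[-1] in _SUFFIXES:
        words.pop()
    base = "".join(words) or "tenant"
    alias, n = base, 2
    while alias in taken:          # keep the alias unique
        alias, n = f"{base}{n}", n + 1
    return alias
```

Under these assumptions, the name in the table above produces the alias in the table above, and the subscriber remains free to override the suggestion.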

Authentication and Authorization Information

Chapter 6 of this guide, “Securing Multi-Tenant Applications,” describes the three alternatives for managing access to the application. Each of these alternatives requires different information from the subscriber as part of the onboarding process. For example, the Standard subscription type uses a social identity provider to authenticate a user’s Microsoft or Google account credentials, and the Premium subscription type can use either the subscriber's own identity provider or Tailspin’s identity provider.

Provisioning a Trust Relationship with the Subscriber's Identity Provider

One of the features of the Premium subscription type is integration with the subscriber's identity provider. The onboarding process collects the information needed to configure the trust relationship between subscriber's Security Token Service (STS) and the Tailspin federation provider (FP) STS. The following table describes this information.

The Surveys application creates a rule in its FP to map this identifier to the administrator role in the Surveys application.

| Information | Example | Notes |
| --- | --- | --- |
| User identifier claim type | http://schemas.xmlsoap.org/ws/2005/05/identity/claims/name | This is the claim type that the subscriber's STS will issue to identify a user. |
| Thumbprint of subscriber's token signing key | d2316c731b39683b743109278c81e2684523d17e | The federation provider STS compares this to the thumbprint of the certificate included in the security token sent by the subscriber’s STS. If they match, the Tailspin federation provider can trust the security token. |
| Claims transformation rules | Group:Domain Users => Role:Survey Creator | These rules map a subscriber's claim types to claim types understood by the Surveys application. |
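A claims transformation rule such as the example above can be read as a lookup from an incoming claim to an outgoing claim. The following Python sketch illustrates that reading only; parse_rule and transform_claims are hypothetical helpers, and the textual rule format is assumed from the single example shown.

```python
def parse_rule(rule):
    """Parse 'Group:Domain Users => Role:Survey Creator' into two (type, value) pairs."""
    source, target = (side.strip() for side in rule.split("=>"))
    return tuple(source.split(":", 1)), tuple(target.split(":", 1))


def transform_claims(claims, rules):
    """Return the output claims produced by every rule whose input claim is present."""
    mapping = dict(parse_rule(rule) for rule in rules)
    return [mapping[claim] for claim in claims if claim in mapping]
```

Claims that match no rule are dropped rather than passed through, which is the more conservative choice when the incoming token comes from another organization.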

The sample code includes the Tailspin.SimulatedIssuer project, which includes a simple federation provider that manages the federation with Tailspin’s subscribers. This federation provider reads the information it needs from the tenant’s configuration data in blob storage. The following code sample from the FederationSecurityTokenService class in the Tailspin.SimulatedIssuer project shows how this simple federation provider uses the tenant information to perform the claims transformation from the tenant’s claim into a claim that the Tailspin Surveys application recognizes.

The following code sample from the TenantStoreBasedIssuerNameRegistry in the Tailspin.SimulatedIssuer project shows how the Tailspin federation provider verifies that a security token is from a trusted source. It compares the subscriber’s thumbprint stored in the tenant configuration data with the thumbprint of the signing certificate in the security token received from the tenant’s STS.
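The comparison itself can be illustrated independently of the sample's class. The Python sketch below assumes that the thumbprint is the SHA-1 hash of the DER-encoded signing certificate, written as hexadecimal, which is consistent with the 40-character example value shown earlier. The function names are hypothetical.

```python
import hashlib
import hmac


def thumbprint(certificate_der):
    """SHA-1 hash of the DER-encoded certificate, as lowercase hex."""
    return hashlib.sha1(certificate_der).hexdigest()


def is_trusted_issuer(certificate_der, stored_thumbprint):
    """Compare the token's certificate with the thumbprint in the tenant configuration."""
    expected = stored_thumbprint.replace(" ", "").lower()
    return hmac.compare_digest(thumbprint(certificate_der), expected)
```

Normalizing case and spaces before comparing avoids rejecting a valid certificate just because the thumbprint was pasted in a different format during onboarding.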

In the future Tailspin could decide to use ADFS, Windows Azure Access Control, or a different custom STS as its federation provider STS. As part of the onboarding process, the Surveys application would have to programmatically create the trust relationship between the Tailspin federation provider STS and the subscriber's identity provider, and programmatically add any claims transformation rules to the Tailspin federation provider.

Provisioning Authentication and Authorization for Basic Subscribers

Subscribers to the Standard subscription type cannot integrate the Surveys application with their own STS. Instead, they define their own users in the Surveys application. During the onboarding process they provide details for the administrator account that will have full access to everything in their account, including billing information. They can later define additional users as members of the Survey Creator role, who can only create surveys and analyze the results.

Individual subscribers use a third-party social identity such as a Microsoft account, Open ID credentials, or Google ID credentials to authenticate with the Surveys application. During the onboarding process they must provide details of the identity they will use. This identity has administrator rights for the account and is the only identity that can be used to access the account.

Geo-location Information

During the onboarding process, the subscriber selects the geographic location where the Surveys application will host its account. The list of available locations is a subset, chosen by Tailspin, of the locations where there are currently Windows Azure data centers. This geographic location identifies the location of the Subscriber website instance that the subscriber will use, and where the application stores data associated with the account. It is also the default location for hosting the subscriber's surveys, although the subscriber can opt to host individual surveys in alternate geographical locations. For more information about how Tailspin plans to implement this behavior, see Chapter 5, “Maximizing Availability, Scalability, and Elasticity.” Currently, the sample application allows a subscriber to select a hosting location and saves it in the tenant configuration, but does not use it.

Database Information

During the sign-up process, a subscriber can also opt to provision a Windows Azure SQL Database instance to store and analyze its survey data. The application creates this database on a SQL Database server in the same geographical location as the subscriber’s account. The application uses the subscriber alias to generate the database name and the database user name. The application also generates a random password. The application saves the database connection string in Windows Azure storage, together with the other subscriber account data.
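The derivation of the database settings from the subscriber alias could be sketched as follows, in Python. The naming scheme, the password length, and the connection string layout are assumptions for illustration and are not taken from the sample; database_settings is a hypothetical helper, and the server name passed to it is whatever server Tailspin has provisioned in the subscriber's region.

```python
import secrets
import string


def database_settings(alias, server):
    """Derive database and user names from the alias and generate a random password."""
    alphabet = string.ascii_letters + string.digits
    password = "".join(secrets.choice(alphabet) for _ in range(24))
    database = f"{alias}_surveys"   # naming scheme is an assumption
    user = f"{alias}_user"
    connection_string = (
        f"Server=tcp:{server};Database={database};User ID={user};Password={password};"
    )
    return {
        "database": database,
        "user": user,
        "password": password,
        "connection_string": connection_string,
    }
```

The secrets module is used rather than random because the password protects tenant data and must not be predictable.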

At the time of writing, there is a soft limit of 150 databases per SQL Database server. Tailspin could monitor manually how many databases are created on each SQL Database server, and then add new server instances as required. Alternatively, Tailspin could automate this process using the SQL Database Management REST API. For more information, see “Operations on Windows Azure SQL Database Servers.”

Customizing the Surveys Application for Each Subscriber

A common feature of multi-tenant applications is that they enable subscribers to customize aspects of the application for their own users, such as the appearance of the application and the availability of selected UI features and functionality.

How Tailspin Allows Subscribers to Customize the User Interface

The current version of the Surveys application enables subscribers to customize the appearance of their pages by using a custom logo image. Subscribers can upload an image to their account, and the Surveys application saves the image as part of the subscriber's account data in blob storage. The application can then display the image on pages in the public and private web sites.

The current solution allows a subscriber to upload a single image to a public blob container named logos. As part of the upload process, the application adds the URL for the logo image to the tenant's blob data stored in the blob container named tenants. The following code sample from the TenantStore class shows how the application saves the subscriber’s logo image to blob storage and then updates the tenant’s configuration data with the URL of the image:
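In outline, the two steps are to store the image and then to record its URL in the tenant data. The Python sketch below illustrates that flow only; it is not the TenantStore code. The `InMemoryContainer` class is a hypothetical stand-in for the logos and tenants blob containers, and the `logo_url` key and the `example.invalid` URLs are assumptions.

```python
class InMemoryContainer:
    """Hypothetical stand-in for a blob container; keeps blobs in a dictionary."""

    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")
        self.blobs = {}

    def upload(self, name: str, content) -> str:
        """Store the content under the given name and return its URL."""
        self.blobs[name] = content
        return f"{self.base_url}/{name}"


def upload_logo(tenant_name: str, image: bytes,
                logos: InMemoryContainer, tenants: InMemoryContainer) -> str:
    """Save the logo, then write its URL into the tenant's configuration data."""
    if tenant_name not in tenants.blobs:
        raise KeyError(f"unknown tenant: {tenant_name}")

    # Step 1: save the image in the public logos container.
    logo_url = logos.upload(tenant_name, image)

    # Step 2: update a copy of the tenant data and write it back.
    tenant = dict(tenants.blobs[tenant_name])
    tenant["logo_url"] = logo_url
    tenants.upload(tenant_name, tenant)
    return logo_url
```

Naming the logo blob after the tenant means a later upload replaces the earlier image, which matches the single-image limit described above.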

Tailspin plans to extend the customization options available to subscribers in future versions of the application. These planned extensions, which are not included in the sample, will enable subscribers to customize the appearance of their survey pages to follow corporate branding by using cascading style sheets (CSS) technology.

Tailspin is concerned about the security implications of allowing subscribers to upload custom .css files, and plans to limit the CSS features that the site will support. To do this, Tailspin plans to provide a UI where subscribers can provide custom definitions for a predefined list of CSS selectors that are applied to the HTML elements used to display the survey page and its questions. The Surveys application will store these custom CSS selector definitions as part of each tenant’s configuration data, enabling each subscriber to customize its surveys using its own style. The following code sample shows a selection of the CSS selectors that the application currently uses and that could, potentially, be overridden using this approach.

The Surveys application will construct a custom style sheet dynamically at runtime using the custom definitions saved by the subscriber, and link to it in the HTML pages. The following code sample shows how the Survey Display page in the public site might apply the custom CSS selectors defined by the Adatum subscriber.

The page imports the custom styles generated by the DynamicStyle.aspx page after the default styles so that any customizations defined by the subscriber override the base styles.
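As a rough illustration of how a dynamic style sheet could be assembled from a tenant's saved definitions, the following Python sketch emits rules only for selectors on a predefined list. The selector names in `ALLOWED_SELECTORS`, the dictionary shape of the saved definitions, and the function name `build_stylesheet` are all hypothetical; the planned feature is not part of the sample, so this is a sketch under those assumptions.

```python
# Hypothetical allow-list; the selectors Tailspin would expose are not shown here.
ALLOWED_SELECTORS = {"body", "h1", ".question", ".answer"}


def build_stylesheet(custom: dict) -> str:
    """Render saved definitions of the form {selector: {property: value}}
    as CSS text, skipping any selector that is not on the allow-list."""
    rules = []
    for selector in sorted(custom):
        if selector not in ALLOWED_SELECTORS:
            continue  # ignore anything outside the predefined list
        declarations = " ".join(
            f"{prop}: {value};" for prop, value in custom[selector].items()
        )
        rules.append(f"{selector} {{ {declarations} }}")
    return "\n".join(rules)
```

Because the generated rules are served after the default style sheet, a tenant rule for an allowed selector takes precedence over a base rule with the same selector.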

Tailspin will implement a scanning mechanism to verify that the CSS customizations provided by the tenants do not include any of the CSS features that the Surveys site does not support, or that could compromise the application’s security.

Poe Says:

Cascading style sheets behaviors are one feature that the Surveys site will not support.
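A scanning step of this kind could start from a simple deny-list check, sketched below in Python. Only the exclusion of CSS behaviors comes from the text; the other patterns (`expression(`, `@import`, `url(`, and `javascript:`) are assumptions about what a site might choose to reject, and the function name `find_unsupported_features` is hypothetical.

```python
import re

# Only "behavior" is named in the text; the remaining entries are assumptions.
FORBIDDEN_PATTERNS = {
    "behavior": re.compile(r"\bbehavior\s*:", re.IGNORECASE),
    "expression": re.compile(r"expression\s*\(", re.IGNORECASE),
    "import": re.compile(r"@import", re.IGNORECASE),
    "script-url": re.compile(r"javascript\s*:", re.IGNORECASE),
    "url": re.compile(r"url\s*\(", re.IGNORECASE),
}


def find_unsupported_features(css: str) -> list:
    """Return the sorted names of forbidden features found in the CSS text."""
    return sorted(
        name for name, pattern in FORBIDDEN_PATTERNS.items() if pattern.search(css)
    )
```

A pattern scan like this can be evaded with CSS escapes or comments, so it is better treated as a first filter in front of a proper CSS parser than as a complete defense.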

Billing Subscribers in the Surveys Application

Tailspin plans to bill each subscriber a fixed monthly fee to use the Surveys application. Subscribers will be able to subscribe to one of several packages, such as those outlined in the following table.

| Subscription type | User accounts | Maximum survey duration | Maximum active surveys |
| --- | --- | --- | --- |
| Trial | A single user account linked to a social identity provider, such as Windows Live or OpenID. | 5 days | 1 |
| Basic | A single user account linked to a social identity provider, such as Windows Live or OpenID. | 14 days | 1 |
| Standard | Up to five user accounts provided by the Surveys application. | 28 days | 10 |
| Premium | Unlimited user accounts linked from the subscriber's own identity provider. | 56 days | 20 |

The advantage of this approach is simplicity for both Tailspin and the subscribers, because the monthly charge is fixed for each subscriber. Tailspin must undertake some market research to estimate the number of monthly subscribers at each level so that it can set appropriate charges for each subscription level.
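The package limits in the table lend themselves to a small lookup structure that the application could consult before publishing a survey. The Python sketch below uses the durations and survey counts from the table; the `Plan` class and the `can_publish` function are hypothetical names, and the user account rules are left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    name: str
    max_duration_days: int
    max_active_surveys: int


# Limits taken from the subscription table.
PLANS = {
    "Trial": Plan("Trial", 5, 1),
    "Basic": Plan("Basic", 14, 1),
    "Standard": Plan("Standard", 28, 10),
    "Premium": Plan("Premium", 56, 20),
}


def can_publish(plan_name: str, active_surveys: int, duration_days: int) -> bool:
    """Return True if one more survey of the given duration fits the plan."""
    plan = PLANS[plan_name]
    return (
        active_surveys < plan.max_active_surveys
        and duration_days <= plan.max_duration_days
    )
```

Keeping the limits in data rather than in code paths makes it easier to adjust a package after the market research is complete.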

Bharath Says:

Tailspin must have good estimates of expected usage to be able to estimate costs, revenue, and profit.

In the future, Tailspin wants to be able to offer extensions to the basic subscription types. For example, Tailspin wants to enable subscribers to extend the duration of a survey beyond the current maximum, or increase the number of active surveys beyond the current maximum. To do this, Tailspin will need to be able to capture usage metrics from the application to help it calculate any additional charges incurred by a subscriber.

Note:

At the time of writing, the best approach to capturing usage metrics is via logging. Several log files are useful. You can use the Internet Information Services (IIS) logs to determine which tenant generated the web role traffic. Your application can write custom messages to the WADLogsTable in response to events such as a survey being completed. The sys.bandwidth_usage view in the master database of each Windows Azure SQL Database server shows bandwidth consumption by database.
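As an illustration of attributing web traffic to tenants, the following Python sketch counts requests per tenant from log lines in the W3C extended format, using the `#Fields:` directive to locate the `cs-uri-stem` column. The URL layout it relies on, with the tenant name as the path segment after `/survey/`, is an assumption made for this example, as is the function name `requests_per_tenant`.

```python
from collections import Counter


def requests_per_tenant(log_lines) -> Counter:
    """Count requests per tenant, assuming paths of the form /survey/<tenant>/..."""
    fields = []
    counts = Counter()
    for raw in log_lines:
        line = raw.strip()
        if line.startswith("#Fields:"):
            # The directive lists the column names for the entries that follow.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or "cs-uri-stem" not in fields:
            continue
        values = line.split()
        if len(values) != len(fields):
            continue  # skip malformed entries
        path = values[fields.index("cs-uri-stem")]
        segments = [segment for segment in path.split("/") if segment]
        if len(segments) >= 2 and segments[0].lower() == "survey":
            counts[segments[1].lower()] += 1
    return counts
```

The same grouping idea applies to other numeric columns, such as bytes sent, if a charge is to be based on bandwidth rather than request counts.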