Implementing Support and Monitoring for a Business-Critical Application Migrated to Windows Azure

Technical Case Study

Published: August 2011

The following content may no longer reflect Microsoft’s current
position or infrastructure. This content should be viewed as reference
documentation only, to inform IT business decisions within your own company or
organization.

Microsoft IT had recently migrated BCWeb—a complex,
business-critical application—to the Windows Azure platform. To ensure ongoing
application availability, the team needed to implement a reliable and
comprehensive monitoring and support solution for BCWeb. Microsoft IT accomplished
this by combining the Windows Azure integration and monitoring capabilities
with the Microsoft System Center Operations Manager management capabilities.

Microsoft IT needed to create and implement a support and monitoring solution for BCWeb, an enterprise application that was recently migrated to the Windows Azure platform.

Microsoft IT leveraged the Windows Azure platform's flexibility and extensibility with the System Center Operations Manager 2007 R2 integration capabilities to provide a comprehensive, centralized, and manageable support and monitoring system for BCWeb.

A consolidated management and support environment under System Center Operations Manager 2007 R2

Accurate and timely monitoring and alerting for BCWeb critical components

A large number of reusable monitoring components that can be leveraged in future Windows Azure applications

Best practices to apply to future Windows Azure applications

Introduction

Business Case Web (BCWeb) is an internal, web-based
application that Microsoft uses to create the business case for product pricing
exemptions. BCWeb is composed of three distinct application components: the
core BCWeb component, the Workflow Routing and Approval system (WRAP), and
Rapport. The core BCWeb component is responsible for providing a user interface,
and for the underlying functionality that enables users to generate business
cases for pricing exceptions. WRAP routes the pricing exception requests for
approval within the Microsoft corporate infrastructure. Rapport provides a user
interface for the WRAP approval process.

BCWeb has a user base of 2,500 internal Microsoft
employees. In 2010, Microsoft used BCWeb to process approximately 27,000
pricing exception requests.

BCWeb Platform Overview

BCWeb was migrated to Windows Azure as a pilot project to
develop and capture best practices for migrating enterprise applications to
Windows Azure. The core BCWeb components are hosted on the Windows Azure
platform. However, BCWeb is also integrated with a number of components that
are hosted on the Microsoft IT corporate network, and are external to the Windows
Azure platform.

Situation

BCWeb was migrated to Windows Azure primarily as a
migration pilot project. However, BCWeb was also experiencing performance
and reliability issues in its previous environment. Although the Windows Azure
migration brought increased reliability and performance to BCWeb, ongoing
tuning of the application environment was required. Microsoft IT realized that
it needed a comprehensive monitoring solution to enable ongoing reliability,
and to measure internally established service level agreements (SLAs).

BCWeb Architecture

BCWeb is divided into three distinct Windows Azure
Services, which in turn house the main application components: BCWeb, WRAP, and
Rapport. The three applications are separated by design to enable a modular
approach to application updates and refactoring.

Windows Azure Components

The first component application—the BCWeb core—is
implemented as a Windows Azure Web role that hosts the UI for generating
business case documents. BCWeb uses two Worker roles: the first Worker role
hosts the core BCWeb Service and other Windows Communication Foundation
(WCF)–based services, and the second Worker role hosts background and
notification processes used by the BCWeb application. The WRAP application is
implemented as a multi–instance Worker role that contains all of the necessary
services required to perform the routing and approval operations for
BCWeb–generated business case documents. The Rapport Windows Azure Service
hosts the Rapport application. Rapport is composed of a Web role that hosts the
UI, and a Worker role that hosts the Rapport WCF
Service. SQL Azure databases host native data storage for the entire BCWeb
application infrastructure.

On-Premises Distributed Components

BCWeb includes several critical components that are not
hosted on the Windows Azure platform. These components primarily provide access
to external data that is required for BCWeb functionality. The two primary
external components are SAP (for business data), and the Microsoft corporate
Active Directory® Domain Services database (for infrastructure and
organizational data). Both of these components are outside the management scope
of BCWeb, but are critical to its functionality. Both components are also
hosted on-premises within the Microsoft corporate network. An on-premises
database—the Licensing Information Repository (LIR)—hosts information used for
data warehousing. The BCWeb transactional SQL Azure databases export
information on an ongoing basis to the on-premises LIR database (hosted on
Microsoft SQL Server)—for reporting purposes.


Figure 1. BCWeb Windows Azure Architecture

Solution

Microsoft IT knew that implementing a support and
monitoring solution for BCWeb would be a challenging task. The BCWeb migration
to Windows Azure meant that the support and monitoring processes used with the
previous BCWeb version would require reassessment and redesigning to
accommodate the new application infrastructure.

Design Goals

Microsoft IT began planning for the BCWeb support and
monitoring solution with several general design goals in mind:

The solution must provide support and monitoring for all
critical aspects of BCWeb functionality, including components
hosted on the Windows Azure platform, and components hosted
on-premises that are external to Windows Azure.

BCWeb monitoring should be centralized and consolidated into
one management console.

The solution should leverage existing Microsoft IT
infrastructure as much as possible.

Windows Azure–based monitoring components should be used as
much as possible.

Providing Support for a Distributed Application

The new version of BCWeb contained components from both the
Microsoft corporate network and the Windows Azure platform. As
a result, several changes to the previous support model were required.

The distributed nature of BCWeb on the Windows Azure
platform forced Microsoft IT to reassess the methods used to support the
application. In the previous BCWeb version, the scope of support was limited to
the Microsoft corporate network. One of the important considerations when
leveraging Windows Azure for internal enterprise applications is that corporate
network users connect to resources outside of the network (Windows
Azure) to run "internal" applications.

In the BCWeb Windows Azure version, the following components
and their associated support teams became part of the application's support
infrastructure:

These systems would need to be incorporated into the BCWeb
support model, and the previously established SLAs would require reassessment
to reflect the increased complexity of the BCWeb support requirements.

The BCWeb team was still the contact point for end users,
but BCWeb support now relied on the Windows Azure platform support team, the AD FS
support team, and the Microsoft IT network support team, to provide support for
their associated systems.

As a result, the following areas needed reassessment:

SLAs for response and resolution time. The
BCWeb support team had to include the response times for the
other support teams in its overall response and resolution time
SLAs.

SLAs for performance and availability.
BCWeb application SLAs needed to integrate performance and
availability benchmarks from all integrated components.
Performance and availability for BCWeb were now subject to the
performance and availability of several components outside the
control of the BCWeb team.

The support team quickly discovered that with a hybrid
application, support complexity and dependencies increase as more third-party
components are involved. All of these components had an impact on the BCWeb end-to-end
SLAs.
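
The effect of these dependencies on an end-to-end SLA can be illustrated with some simple arithmetic. The sketch below assumes that the components are serial dependencies that fail independently, so the composite availability is the product of the individual availabilities, and that in the worst case each support team responds in turn, so response times add. The component names and figures are hypothetical and are not the SLA values that BCWeb used.

```python
from math import prod


def composite_availability(availabilities):
    """Availability of a chain of independent, serial dependencies."""
    return prod(availabilities)


def worst_case_response_hours(response_hours):
    """Worst-case response time if each support team must respond in turn."""
    return sum(response_hours)


# Hypothetical SLA figures, for illustration only.
component_availability = {
    "windows_azure_platform": 0.999,
    "ad_fs": 0.999,
    "corporate_network": 0.9995,
    "bcweb_application": 0.999,
}

end_to_end = composite_availability(component_availability.values())
print(f"End-to-end availability: {end_to_end:.4%}")
print(f"Worst-case response: {worst_case_response_hours([1, 2, 4])} hours")
```

Under these assumptions the end-to-end figure is always lower than that of the weakest single component, which is why SLAs written for a self-contained application need reassessment once external dependencies are added.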

Determining Key Points of Failure

The first task in establishing a reliable and
comprehensive monitoring solution for BCWeb was to determine the key points of
failure for the application. The BCWeb support team identified the key points
of failure within BCWeb, and then put the appropriate monitoring processes in
place to either prevent failure, or quickly identify when a failure occurred.

When Microsoft IT designed the monitoring solution, these
points of failure were the first aspects of BCWeb that they addressed.

Designing Operational Monitoring for BCWeb

Microsoft IT outlined the following general monitoring
requirements for BCWeb:

Error logging. Record warning and error-related messages from
all applicable components.

When considering monitoring methods for BCWeb, Microsoft
IT identified that the Windows Azure platform could not natively support the
level of monitoring that BCWeb would require. Additionally, the on-premises components
outside of Windows Azure would need monitoring. Thus, Microsoft IT required a
monitoring solution that would allow the BCWeb support team to accurately
assess the application's condition based on all of its various components.

Leveraging System Center Operations Manager 2007 R2 to Consolidate
Monitoring and Support

Microsoft IT decided to use System Center Operations
Manager 2007 R2 to monitor the new version of BCWeb. Microsoft IT chose System
Center Operations Manager for the following reasons:

Monitoring could be centralized into one console, and
consolidated to include Windows Azure and on-premises
components.

Microsoft IT approached each of these categories differently
using System Center Operations Manager.

End-User Perspective and SLA Requirements

Microsoft IT used the System Center Operations Manager
Web Application template to enable scripted website navigation that mimicked
typical end-user interactions with the different BCWeb UI components. This
enabled the team to monitor true availability of the web applications and implement
alerts. It also enabled Microsoft IT to collect historical availability data to
compare with established SLAs.
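
The idea behind this kind of scripted, end-user-perspective check can be sketched in a few lines. The example below is not the Web Application template itself; it is a minimal illustration that assumes a probe is a fixed sequence of page requests, that a run counts as available only if every step succeeds, and that availability is the fraction of successful runs. The step names, URLs, and the `fetch` callable are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    step: str
    ok: bool


def run_scripted_probe(steps, fetch):
    """Run each (name, url) step in order, stopping at the first failure."""
    results = []
    for name, url in steps:
        try:
            status = fetch(url)  # fetch returns an HTTP status code
            ok = 200 <= status < 400
        except Exception:
            ok = False
        results.append(ProbeResult(name, ok))
        if not ok:
            break  # a real user could not continue past a failed page
    return results


def availability(probe_runs):
    """Fraction of probe runs in which every step succeeded."""
    if not probe_runs:
        return 0.0
    good = sum(1 for run in probe_runs if run and all(r.ok for r in run))
    return good / len(probe_runs)


# Hypothetical navigation script and a fake fetcher standing in for HTTP calls.
steps = [
    ("open home page", "https://bcweb.example/"),
    ("create business case", "https://bcweb.example/case/new"),
]
healthy_run = run_scripted_probe(steps, lambda url: 200)
failed_run = run_scripted_probe(steps, lambda url: 503)
print(f"Availability: {availability([healthy_run, healthy_run, failed_run]):.1%}")
```

Passing the fetcher in as a parameter keeps the sketch testable without network access; a real probe would issue HTTP requests and handle authentication.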

Web and Worker Role Performance

The development team discovered that the built-in Windows
Azure Diagnostics feature could provide a large amount of diagnostic
information regarding the state of the Windows Azure Compute roles—the Web and
Worker roles in the case of BCWeb. When the development team combined System
Center Operations Manager with the Windows Azure Management Pack, they were
able to access a large number of performance counters and events that contained
the information they needed about the Web and Worker roles. By building
trending and alerting functionality, the team was able to monitor the health of
the Compute roles. The team used the Windows Azure Management Pack to:

Discover each Windows Azure application.

Provide status of each Windows Azure role instance.

Collect and monitor Windows Azure performance information.

Collect and monitor Windows events.

Collect and monitor the Microsoft .NET Framework trace
messages from each Windows Azure role instance.
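
The trending and alerting logic described above can be illustrated with a small example. This is a minimal sketch of threshold alerting on a rolling average of performance counter samples, not the mechanism that the Windows Azure Management Pack uses internally. The class name, the threshold, the window size, and the sample values are all assumptions made for illustration.

```python
from collections import deque
from statistics import mean


class CounterMonitor:
    """Raises an alert when the rolling average of a counter exceeds a threshold."""

    def __init__(self, threshold, window=3):
        self.threshold = threshold
        self.samples = deque(maxlen=window)

    def add_sample(self, value):
        """Record a sample; return True if the full window's average breaches the threshold."""
        self.samples.append(value)
        window_full = len(self.samples) == self.samples.maxlen
        return window_full and mean(self.samples) > self.threshold


# Hypothetical processor-time samples (percent) from one role instance.
monitor = CounterMonitor(threshold=80, window=3)
alerts = [monitor.add_sample(v) for v in (40, 55, 95, 97, 99)]
print(alerts)
```

Averaging over a window instead of alerting on single samples is one common way to avoid alerts on short spikes; the appropriate window and threshold depend on the counter being monitored.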

Application Health

The overall health of BCWeb depends on several
components, including Windows Azure. To monitor the Windows Azure part of
BCWeb, and address some of the aspects of the BCWeb application that were not natively
monitored by the Windows Azure Management Pack—especially monitoring
on-premises components—the development team extended the capabilities of the
Windows Azure Management Pack to monitor key aspects of application health. Specifically,
they created performance counters that monitored application-specific items such
as requests to ASP.NET Application objects and .NET Framework CLR exceptions.
The development team also extended the Windows Azure management pack to monitor
business logic exception events when accessing on-premises components.
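
One way to picture an application-specific exception counter is as a wrapper around calls to a dependency that counts failures by component and exception type before re-raising them. The sketch below shows that pattern in Python; it is an illustration only, not the team's .NET implementation, and the component name, exception class, and lookup function are hypothetical.

```python
from collections import Counter
from functools import wraps

exception_counts = Counter()


def count_exceptions(component):
    """Decorator that counts exceptions raised by the wrapped call, then re-raises."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                exception_counts[(component, type(exc).__name__)] += 1
                raise
        return wrapper
    return decorator


class PricingDataUnavailable(Exception):
    """Hypothetical business logic exception."""


@count_exceptions("on_premises_lookup")
def lookup_pricing(customer_id):
    # Hypothetical stand-in for a call to an on-premises component.
    if customer_id is None:
        raise PricingDataUnavailable("no customer id supplied")
    return {"customer_id": customer_id}


for cid in ("C-100", None, None):
    try:
        lookup_pricing(cid)
    except PricingDataUnavailable:
        pass

print(dict(exception_counts))
```

A monitoring system can then sample the counts at intervals and alert when the rate of a given exception type rises.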

For on-premises components, the development team also
leveraged built-in .NET Framework components to monitor application health
through performance and historical trends. For example, the team planned to use
the Stopwatch class to time calls to the
SAP web service, and then represent the results as a performance counter that
System Center Operations Manager could then monitor.
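
The timing pattern described here can be sketched as follows. The example uses Python, where `time.perf_counter` plays the role that the .NET Stopwatch class plays in the approach above. The `LatencyCounter` class and the `call_sap_service` function are hypothetical stand-ins; a real implementation would publish the value as a performance counter for the monitoring system to collect.

```python
import time
from contextlib import contextmanager


class LatencyCounter:
    """In-memory stand-in for a performance counter that tracks call durations."""

    def __init__(self, name):
        self.name = name
        self.samples_ms = []

    def record(self, elapsed_ms):
        self.samples_ms.append(elapsed_ms)

    @property
    def average_ms(self):
        if not self.samples_ms:
            return 0.0
        return sum(self.samples_ms) / len(self.samples_ms)


@contextmanager
def timed(counter):
    """Time the enclosed block and record the duration, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        counter.record((time.perf_counter() - start) * 1000.0)


def call_sap_service():
    """Hypothetical stand-in for the SAP web service call."""
    time.sleep(0.01)
    return "ok"


sap_latency = LatencyCounter("SAP web service call time (ms)")
with timed(sap_latency):
    call_sap_service()

print(f"{sap_latency.name}: {sap_latency.average_ms:.1f}")
```

Recording the duration in a `finally` block means that failed calls are timed as well, which is useful when slow responses and errors tend to occur together.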

SQL Azure Performance and State

One significant gap in the monitoring options available through
System Center Operations Manager was the lack of any monitoring capability for
SQL Azure.

In the previous version of BCWeb, a large portion of system
monitoring used tools native to SQL Server. Unfortunately, three key legacy BCWeb
tools were not available on SQL Azure:

Table 1. SQL Azure Component Comparison

SQL Server Component | Feature Purpose | Feature Status on SQL Azure
SQL Agent | Manage and execute automated tasks (SQL Server jobs) | Not Available
SQL Profiler | Capture and analyze SQL Server performance data | Not Available
DMVs | Provide diagnostic and configuration information about SQL Server | Partially Available

As a result of these discrepancies, the development team elected
to build a custom management pack using both historical trending and threshold
alerting to monitor the health and performance of SQL Azure.

For example, the team created a performance counter that
measured the size of a SQL Azure database using a Transact-SQL (T-SQL) query.
System Center Operations Manager collected this data daily, using the following
script.

The result of this script was a performance counter that
System Center Operations Manager monitored every five minutes.
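
As an illustration of how a database-size counter of this kind might be derived, the sketch below converts a reserved page count into megabytes and applies threshold alerting. It is a minimal sketch under stated assumptions, not the team's script: it assumes the page count comes from the `sys.dm_db_partition_stats` view and that pages are 8 KB. The query text is shown only as a string and is not executed here, and the size limit, thresholds, and sample page count are hypothetical.

```python
PAGE_SIZE_KB = 8  # assumption: SQL Server data pages are 8 KB

# Assumed T-SQL for retrieving the reserved page count (not executed in this sketch).
DATABASE_SIZE_QUERY = """
SELECT SUM(reserved_page_count) AS reserved_pages
FROM sys.dm_db_partition_stats;
"""


def pages_to_mb(reserved_pages):
    """Convert a reserved page count to megabytes."""
    return reserved_pages * PAGE_SIZE_KB / 1024.0


def size_status(size_mb, max_size_mb, warn_ratio=0.8, critical_ratio=0.95):
    """Classify database size against its limit using hypothetical thresholds."""
    ratio = size_mb / max_size_mb
    if ratio >= critical_ratio:
        return "critical"
    if ratio >= warn_ratio:
        return "warning"
    return "healthy"


# Hypothetical sample: a page count as it might be returned by the query above.
size_mb = pages_to_mb(117_760)
print(f"Database size: {size_mb:.0f} MB, status: {size_status(size_mb, 1024)}")
```

Alerting on the ratio of current size to the database's size limit gives the support team time to act before writes start to fail.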

Additionally, the development team examined the
application code for references to DMV information that was not available in
SQL Azure, and then refactored the code to remove the references and retrieve
the information from alternate DMV locations in SQL Azure.

Benefits

Microsoft IT used System Center Operations Manager 2007
R2, the Windows Azure Management Pack for System Center Operations Manager, and
custom-designed performance counters within Windows Azure to realize the
following benefits:

A consolidated management and support environment within System
Center Operations Manager 2007 R2

Accurate and timely monitoring and alerting for BCWeb
critical components

A large number of reusable monitoring components that can be
leveraged in future Windows Azure applications

Best Practices

Microsoft IT established the following best practices
when implementing Windows Azure monitoring:

Use System Center Operations Manager 2007 R2 and the
Windows Azure Management Pack for consolidated and centralized
application monitoring.

Develop applications with the most recent version of the
Windows Azure Software Development Kit (SDK) to implement the
newest monitoring features.

Conclusion

By using System Center Operations Manager 2007 R2, the
Windows Azure Management Pack for System Center Operations Manager, and custom-designed
management pack components, Microsoft IT was able to provide a robust and
centralized monitoring environment for BCWeb.

The solution included monitoring of the BCWeb Windows
Azure-based components, and the critical aspects of on-premises components that
were not native to Windows Azure. Microsoft IT also captured numerous best
practices that will be used in future distributed application migrations.

Products & Technologies

Windows Azure Web role

Windows Azure Worker role

Windows Azure AppFabric

SQL Azure

Microsoft SQL Server 2008 R2

Microsoft Visual Studio® 2010

Windows Azure SDK 1.4

System Center Operations Manager 2007 R2

Windows Azure Management Pack for Operations Manager

