Introduction

Audience

This document is aimed at IT professionals and personnel evaluating enterprise application integration (EAI) technology. We assume a technical audience with little prior familiarity with EAI. Although we do not describe the high-level issues of EAI exhaustively, we do discuss the main issues that we recognize so that the reader understands our perspective on the problem.

The document itself is not intended to define a solution; rather it is intended to provide the building blocks from which IT professionals can design and develop solutions tailored to their individual problems. It provides guidance about how to use those building blocks to meet the requirements of common EAI scenarios.

About This Document

Enterprises across all industry sectors need to implement EAI. Different industries and sectors have different requirements and use concepts that are specific to their individual problem domains. This complicates the design of EAI solutions, and means that solution architects must develop industry-specific EAI offerings.

However, as one examines EAI solutions more closely, it is clear that there are common concepts across industry solutions. In fact, many of the underlying functional concepts, services, and products are common across different industries. The differences between industries tend to focus on business-specific scenarios that fit into common categories, rather than on functional EAI issues.

With this in mind, this document seeks to distill the essential template or pattern for a common (and thus reusable) EAI architecture. This pattern extends to the use of Microsoft technologies to meet the generic EAI solution needs.

The advantage of creating a pattern for the common elements of the EAI offering is that it allows industry-specific solution providers, partners, and customers to build tailored EAI offerings on the reusable architecture components without having to redefine the common implementation patterns and practices for each solution.

By building this pattern we can focus on the logical services that are required to implement the generic or pattern EAI architecture. We can then examine the implementation specifics as separate topics in the "Physical Services" and "Implementation" sections.

This document is based on the implementation experiences of Microsoft Consulting Services (MCS), as well as those of our key system integration partners, in over 1000 deployments of Microsoft® BizTalk® Server over the last 18 months. Over 600 of these projects have been focused exclusively on EAI integration challenges, and all of them have involved an EAI interface to some degree. This document aims to capture the current best-practice approaches for designing and deploying EAI solutions in popular scenarios and to provide guidance to the reader about appropriate approaches using the pattern solution template.

Defining EAI

For the purposes of this paper, we define EAI as:

EAI is the discipline of integrating applications and data within the enterprise into automated business processes.

Specifically, when we use the term "EAI," we are referring to the integration of systems within the enterprise — for example, application, data, and process integration. We will use the term "B2B" to refer to external integration needs. We will distinguish between these services because they have significantly different non-functional requirements at this time. In fact, however, we view these as different points on a continuum of the same service requirements, and we believe that the distinction will disappear over time. Therefore we aim to provide a technical solution that will support both needs and thus simplify the future life of enterprise architects.

Benefits of EAI

The major business benefits of EAI are as follows:

Reduced IT costs due to a more productive EAI toolset. Historically, most organizations have solved EAI challenges by writing large amounts of code. Using better tools can reduce the initial financial and time outlays, as well as the ongoing maintenance costs of this effort.

Reduced administrative costs through automation of manual processes. Automating the many manual processes that exist in every organization can eliminate large areas of personnel costs.

Reduced operational costs through more efficient value-chain processes. Automating key value-chain processes that reduce business process cycle times can reduce costs in many ways. For example, a more efficient supply chain can reduce the cost of carrying inventory.

Higher customer satisfaction and loyalty through new services and programs. EAI projects are essential for offering new information and business services more quickly than your competitors. For example, key online customer "self-service" operations can be done more easily when using EAI tools to connect the appropriate systems.

Better and faster business decisions. Aggregating business information and making it available in near-real time can fundamentally improve your ability to make better business decisions more quickly than your competitors.

Pattern Reference Model

This document relies heavily upon, and assumes that the reader is familiar with, the Microsoft Pattern Reference Model and, at a high level, with the Microsoft Patterns work.

Throughout this document, we will use a diagram like the one shown here to indicate the Microsoft Pattern Reference Model layer to which each section of the document refers.

Microsoft EAI Product Family

A key point is that EAI involves solving a number of distinct but related problems that are best addressed by several products rather than a single one. This document provides guidance about the implementation patterns of the products that together represent the comprehensive EAI platform from Microsoft.

Microsoft has a number of products that support the development and implementation of comprehensive EAI solutions. These products cover different parts of the EAI landscape, and can be integrated to provide a complete solution. There is no single-product answer to the problem of EAI; EAI is a solution built from different products, each with its own particular functional capabilities.

Microsoft BizTalk Server provides the broadest coverage of services within the EAI platform. BizTalk Server is supported by the following products and technologies that can be used with it to address additional service requirements:

Microsoft Host Integration Server

Microsoft SQL Server™

Windows core services including:

XML Web services

Microsoft Data Access Components (MDAC) data connectivity services

COM+ application services

These products are described in more detail throughout this document.

EAI Business Pattern

To build an IT pattern, it is first important to capture the business requirements of the problem domain and define how they will be solved by IT. As mentioned earlier, one of the aims of this document is to provide guidance that can be utilized across industry sectors; it is therefore important that we specify the business problem that we are addressing in generic terms.

Problem

What are the business drivers for EAI, and how can we (IT) best support needs for business agility?

Context

To amplify the problem statement, consider the context of the business problem, which shapes the patterns that make up the EAI problem space. The following business forces define the context of the problem:

Importance of cost reduction

Increasingly competitive market conditions require organizations to be more aggressive in reducing administrative, operational, and information technology costs wherever possible. Automating manual and error-prone processes can deliver immediate cost and cycle-time benefits, and the benefits of streamlining value-chain operational processes commonly include reduction of inventory carrying costs as well as reduction in write-offs through obsolescence. A common EAI services environment can directly contribute to the bottom line by reducing the costs associated with application integration and system maintenance. It can also play a more strategic role in the organization by supporting key business initiatives through the appropriate application of information technology.

Responsiveness to business strategy change

It can be very difficult to quickly and easily support sea changes in business strategy by using existing IT systems. The classic example for EAI was the change from a product-oriented view of the world to a customer-oriented view. The enterprise's IT systems needed to be significantly re-architected to support such a change, but the required wholesale rewrite of existing mission-critical systems made that course of action impossible. Therefore, a data and application mechanism was required to provide the business with the new views that it needed, while working within most of the constraints of the existing systems.

Moving to Web services

This is the next change that businesses expect to have to cope with. In this movement, the classical concept of how a business delivers its services dramatically shifts. Instead of delivery through its own staff or other human agents, the business can start to deliver its services through direct interaction with the systems of other businesses. The big questions will be what services to expose and under what contracts they need to operate; after a business resolves these issues, the next question will be how to implement the change — which is where EAI will play a crucial role.

Supporting business collaboration

This is a superset of the problem sometimes referred to as "business process integration." The requirement is for business processes to act collaboratively across a network between two or more business systems, and to be managed through automated services. The collaboration is event-based, requiring automated interaction between one business process and another in a sequential or parallel fashion but not tied into a rigid workflow.

Supporting the rate of business process change

The rate of business process change can drive the need for an agile EAI solution that is capable of supporting the changing needs of the business. Such a system must be correctly designed to allow the required degree of agility without compromising the security and manageability of the services provided. The rate of change within a business can rapidly cause poorly architected EAI solutions to fail, or become bogged down in the sheer amount of effort required to add or update business services.

Providing secure integration

It is essential to most businesses that integration solutions follow the business security needs, and do not open up holes in the security domains.

Solution

The business requirements will be met by providing these functions:

Integrating business processes

Integrating heterogeneous applications

Integrating heterogeneous data sources

To best address the business requirements, these goals will drive the solution provided:

Speed of application integration

Reuse of resulting components and services

Ease of access to and flexibility of integration services

Conceptual Solution

In this section, we provide an IT solution that meets the needs of the business problem that was defined in the preceding section.

Problem

How can we provide an EAI solution that can quickly and flexibly integrate business applications and data within a secure and managed framework, and can support business process integration, both internally and externally?

Context

The proposed solution needs to solve two distinct IT problems:

Integrating disparate applications within an enterprise

Automating business processes

The context of the proposed solution is shaped by internal IT forces as well as by constraints imposed by the environment. The following topics discuss the forces, constraints, and solutions in more detail.

IT Forces

The following IT forces affect the solution context:

Application integration

One of the principal requirements for integration is application integration. There are many tools available from vendors that support, to different degrees, the specific requirements of application integration. Integration with an application requires an interface to a process within the application. The interface can be one-way or two-way, and must allow interaction between the application and the integration framework. This kind of interface requires highly disciplined software engineering techniques to minimize impact on the operation of the application. The interface also needs to be secure and manageable.

Managed integration

Integration mechanisms and interfaces normally evolve over a long period of time. Many of the mechanisms (and interfaces to a lesser degree) are very poorly managed. As the number of interfaces increases — particularly interfaces that connect mission-critical processes — the risk and impact of failure of these interfaces also increases. Ultimately, it becomes impossible to efficiently manage the interfaces manually. Businesses require that integration mechanisms and their interfaces are managed within the enterprise operations management service and that they maintain the same service quality levels.

Data exchange

A business may require data to be exchanged directly with a data source or through an interface with a business application. Again, the interface requirement may be one-way or two-way, and requires highly disciplined software engineering techniques to minimize the impact on the operation of the application or data source. The interface also needs to be secure and manageable.

Internal Constraints

Before embarking on any integration project, designers should recognize the internal constraints of their business and technical infrastructure and environment. These factors can present challenges that impact several aspects of the design and delivery of an integration project. We cannot address these issues specifically in the pattern because they essentially describe the trade-offs you will need to make in implementing the pattern in your environment. The purpose of including them here is to note these tensions and to assert that the pattern is designed to be flexible enough to cope with them.

The following illustration shows how some of these forces and constraints might interact. See Appendix A for a short summary of each constraint.

Fundamental IT Problem: Integrating Disparate Applications Within an Enterprise

As enterprises have grown and evolved, new application systems and data stores have increased the heterogeneity of operational systems and hence the challenge of integrating them. This was characterized for many years by a strategy of adding stand-alone application systems to service particular solution needs within organizations. The result has been the creation of "islands" of business function encapsulated in numerous applications, processes, and data sources.

To make the most effective use of this information, the organization must be able to integrate these systems to allow the sharing of the application functionality, the business processes, and the data held in the different applications.

The following illustration shows some of the different applications that might exist within an organization. These valuable IT services need to be used fully and effectively to provide the best return on investment, and they also need to support business agility. EAI is the process of integrating these disparate applications.

In the past, the typical approach to integrating applications within an enterprise was to generate the programming code to create point-to-point integration interfaces between the applications as the need arose. This was often the most financially appealing option (in the short term) because it meant that the larger (and initially more costly) problem of architecting a common application integration infrastructure did not need to be addressed. A fundamental problem with this approach is that the practice of point-to-point integration between applications naturally results in higher maintenance costs, lower observed scalability and availability for the distributed system, and the inability to quickly change the business processes based on those applications when business and competitive needs arise.

The absence of any comprehensive and deliberate integration strategy will lead to a complex lattice of point-to-point integration services. The following illustration shows a possible result of this strategy. This solution is not reusable, flexible, or scalable. It presents an integrity, integration, and management problem, and its total cost is high.

The problem with such systems (even systems that currently operate successfully) is that they are susceptible to the development of a complex lattice of inter-application connections and dependencies. In such scenarios, it is not uncommon for the scale and the complexity of the interconnections to become a barrier to the addition of new business functions.

Also, in such scenarios, the nature and complexity of the interconnections — coupled with the fact that, in many cases, not all of the dependencies are well understood or well documented — can lead to a situation where the architecture becomes highly brittle. By brittle, we mean that although the system continues to function in its present (stable) state, changes to any of the component applications can have unforeseen and unpredictable consequences on other applications and processes in the enterprise. This is commonly referred to as the challenge of "tightly coupled" application connections, where changes to one system or application cause failure in other systems that call or depend upon it.

Additional IT Problem: Business Process Integration

The IT organization is now increasingly being challenged to provide an automated business process integration system with a cross-IT system scope. This pressure will become greater as businesses want to expose their business services as Web services. In many instances, these Web services will operate in the context of the management of an inter-organizational business process, and will be managed according to contractual business-service levels.

Solution

A principal goal of an EAI solution should be to quickly and flexibly integrate business applications and data within a secure and managed framework, providing application-to-application (A2A) services. The purpose of an EAI platform is to insert a simple, conceptually central service hub that offers a variety of secure connectivity services. Thus, any application can send a service request to the hub and not worry about what application will service the request. This simple concept will then be extended to provide many powerful services, which are described later in this document.

Additionally, a key goal of using EAI technology is to automate tasks that are currently manual, by using business process integration. The use of standardized EAI services within the enterprise makes this cost effective. The use of such services between enterprises is essential to support managed business-to-business (B2B) collaboration processes.
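The hub concept can be illustrated with a minimal sketch: applications register handlers for message types with a central broker, and senders address only the broker, never the target application. All class, method, and message names here are invented for illustration; this is not the API of any Microsoft product.

```python
# Minimal sketch of a conceptually central service hub (illustrative names).
# A sender submits a request by message type; the hub decides which
# registered application services the request.

class IntegrationHub:
    def __init__(self):
        self._handlers = {}  # message type -> handler callable

    def register(self, message_type, handler):
        """An application registers to service a given message type."""
        self._handlers[message_type] = handler

    def send(self, message_type, payload):
        """A sender submits a request without knowing the target application."""
        handler = self._handlers.get(message_type)
        if handler is None:
            raise LookupError(f"no application services '{message_type}'")
        return handler(payload)

# Example: an order system asks the hub to check stock; the hub routes the
# request to whichever application registered for that message type.
hub = IntegrationHub()
hub.register("check_stock", lambda sku: {"sku": sku, "in_stock": True})
result = hub.send("check_stock", "A-1001")
```

Because the sender names only a message type, the application behind "check_stock" can later be replaced without changing any sender.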

About the EAI Solution

The following illustration shows how EAI might work to simplify the connection between services — in this case, between a Web server and mainframe applications.

Enterprise application integration is often difficult, and typically:

Requires connectivity between heterogeneous technology platforms.

Involves complex business rules and processes.

Involves business processes whose logical units of work may be extremely short, or may run for days or weeks as they move through different processes within the organization.

Is driven by the need to extend/enhance an existing automated business process or to introduce an entirely new automated business process.

Hence, an EAI solution will have these characteristics:

Exposes a technology-independent interface. Uses "business semantics" to request a service by using a document format such as XML or flat files. As systems become more XML-aware, XML will become the preferred approach to requesting services in the future.

Allows non-XML service requests at the functional or data levels to support systems that cannot expose their service needs in XML format.

Uses a common and shared set of process rules and service rules to ensure consistency and reuse of integration services.

Is capable of reusing the existing transport protocols that typically already exist in the enterprise.

Insulates itself from existing technologies by using the concept of service interfaces commonly known as "adapters."
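To illustrate the first two characteristics, the following sketch parses a hypothetical XML service request expressed in "business semantics." The element names and the parse_service_request helper are invented for illustration; a real solution would validate against an agreed schema.

```python
# Sketch of a technology-independent, XML-based service request.
# The document describes a business service and its data, not a technology.
import xml.etree.ElementTree as ET

request_xml = """
<ServiceRequest>
  <Service>UpdateCustomerAddress</Service>
  <Customer id="C42">
    <Street>1 Main St</Street>
    <City>Seattle</City>
  </Customer>
</ServiceRequest>
"""

def parse_service_request(xml_text):
    """Extract the requested service name and its business data from XML."""
    root = ET.fromstring(xml_text)
    service = root.findtext("Service")
    customer = root.find("Customer")
    data = {
        "id": customer.get("id"),
        "street": customer.findtext("Street"),
        "city": customer.findtext("City"),
    }
    return service, data

service, data = parse_service_request(request_xml)
```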

The following section, "Logical Services," provides the details of the set of services that an EAI solution should offer.

This example illustrates the conceptual EAI solution.

Synchronous or Asynchronous Interface

For any EAI solution, a major design decision is whether the integration being designed needs to be synchronous or asynchronous. The following topics describe our definitions of these terms. In EAI, we expect most integration needs to be met by functions built on asynchronous, loosely coupled solutions.

Synchronous Processing

The following illustration shows the concept of synchronous processing. The processes are said to be synchronized because each step waits for the next step to complete before it continues.

Process A is initiated.

Process A calls/invokes Process B.

Process B is initiated.

Process B calls/invokes Process C.

Process C is initiated.

Process C finishes and returns to Process B.

Process B can now continue.

Process B finishes and returns to Process A.

Process A can now continue.

Process A finishes.

More accurately, the issue is that no process can proceed until the dependent process has finished. This restriction often results in "blocking" behavior in the technical and business implementations of the solution.

In EAI solutions, synchronous processing is associated with the need to provide sequential, coordinated data requests and forwards between the coupled applications. Scaling out a synchronous process is often expensive because the only option is to add hardware to the solution.
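The synchronous sequence above can be sketched as nested, blocking calls; the trace list simply records the order in which the steps run. The process names are those used in the steps above.

```python
# Synchronous processing: each process blocks until the process it
# invoked has finished, so the steps complete strictly inside-out.
trace = []

def process_c():
    trace.append("C starts")
    trace.append("C finishes")

def process_b():
    trace.append("B starts")
    process_c()          # B blocks here until C returns
    trace.append("B finishes")

def process_a():
    trace.append("A starts")
    process_b()          # A blocks here until B (and hence C) returns
    trace.append("A finishes")

process_a()
```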

Asynchronous Processing

The following illustration shows the simplest representation of asynchronous processing. It assumes that no process depends on the processes that it calls or invokes or that call or invoke it. The processes are "loosely coupled" by design.

Process A is initiated.

Process A calls/invokes Process B.

Process A finishes.

Process B is initiated.

Process B calls/invokes Process C.

Process B finishes.

Process C is initiated.

Process C finishes.
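The asynchronous sequence above can be sketched with a simple in-process queue: each process posts a message and finishes immediately, without waiting for the process it invoked. The queue and dispatcher here are illustrative stand-ins for a real message-queuing service.

```python
# Asynchronous processing: each process hands work to a queue and
# finishes at once; a dispatcher later delivers the queued messages.
from collections import deque

queue = deque()
trace = []

def process_a():
    trace.append("A starts")
    queue.append("run B")   # fire-and-forget: A does not wait for B
    trace.append("A finishes")

def process_b():
    trace.append("B starts")
    queue.append("run C")   # likewise, B does not wait for C
    trace.append("B finishes")

def process_c():
    trace.append("C starts")
    trace.append("C finishes")

handlers = {"run B": process_b, "run C": process_c}

process_a()
while queue:                # the dispatcher drains the queue
    handlers[queue.popleft()]()
```

Note that Process A finishes before Process B even starts, matching the loosely coupled sequence described above.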

A loosely coupled solution built primarily on asynchronous, message-based interactions between systems allows the distributed system to be more highly available and scalable. This is because work can be performed independently by different nodes in the distributed system, allowing the observed availability of given services in the distributed system to increase. Simultaneously, the collective work performed by the distributed nodes increases the overall processing throughput capabilities by reducing inefficient blocking behaviors found in synchronous interactions.

A fundamental characteristic of such an architectural approach is that the interactions are normally based on store-and-forward or message queuing services. Queuing services provide the following benefits:

Service can continue in cases where the server application is not available at the exact time its service is requested.

Workloads can be smoothed during peak demands because queues can grow and then be worked through in slower times, if that is acceptable.

-Or-

Workloads can be smoothed by adding servers to pull more work from the queues in peak demand periods. After the queues are cleared, the servers can be detached.

The number of queues can be varied to better manage workload.

The service requester is never tightly bound to the server; it is bound only to the queuing system, which can route the request anywhere it needs to in order to satisfy the service. Thus, the servers can begin as a legacy server set today, and can later be switched to a new application server set without the service requester knowing or caring.
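The workload-smoothing benefit can be sketched as follows: requests accumulate in a queue during a peak, and servers are attached to drain it. The worker function stands in for a server pulling work from the queue; all names are illustrative.

```python
# Sketch of workload smoothing through a queue. Requests pile up during
# a peak; attaching more workers drains the backlog. The requester sees
# only the queue, never the individual servers.
from collections import deque

work_queue = deque(f"request-{n}" for n in range(6))  # a peak of requests
completed = []

def worker(batch_size):
    """A server pulls up to batch_size items of work from the queue."""
    done = 0
    while work_queue and done < batch_size:
        completed.append(work_queue.popleft())
        done += 1
    return done

worker(2)   # one server works through part of the backlog
worker(4)   # a second server is attached for the peak, then detached
```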

When EAI services are provided on top of this asynchronous integration infrastructure, a very flexible solution is generated.

Logical Services

The Logical Services layer of the pattern describes the logical IT services that are required to support the EAI conceptual solution. These services provide the logical building blocks of the EAI pattern variant and the details of how these services are deployed. The products that support the particular services are explored in the following sections.

Problem

What common services should an integration broker provide in order to support the needs of EAI as defined in the conceptual solution?

Context

The need for the functions provided by the services will depend upon the EAI strategy that you adopt for your enterprise.

In this generic EAI pattern we identify the principal services that are needed to support most EAI solutions. In practice, each EAI solution will require a different mix of the services described in this solution. It is unlikely that any business will implement all these services in their solution. The services identified here should therefore be thought of as a pick-list, not a check-list.

The purpose of identifying the services is so that you can identify whether and how well the service is provided by the technology of your choice, and hence how much effort you will have to put into using that technology. We use them to identify the capabilities of Microsoft technology to meet the functional needs of EAI, and demonstrate how the various Microsoft products in the EAI platform map directly to the comprehensive set of required logical services.

Solution

In accordance with our observations in the "Conceptual Solution" section, a distinction has been made when defining the logical services for enterprise application integration. The services have been divided into:

EAI services. Services that are required to resolve the heterogeneity of the ontology and semantics of different enterprise applications.

EAI services have evolved because of the limitations of relying simply on transport technologies to connect applications. Businesses may have a large IT infrastructure, and a complex mixture of applications, mission-critical business processes, and business data supported by heterogeneous legacy applications. To integrate these applications and data, they need services that abstract developers and business users away from the complexity of the transport layer, and that focus on solving business problems.

Transport services. Services that are required to enable and manage the transportation of messages to and from participating enterprise applications.

Security services. Services that are often required across all aspects of the infrastructure, including directly at the enterprise applications and in the transport and EAI services. Security services provide key functionality such as authentication, authorization, and message encryption.

Management services. Services that provide functions that are vital to keeping the EAI infrastructure agile, and functions that are used to monitor system health and business data.

The following illustration shows all the logical services that EAI solutions will require, depending upon the unique needs of each scenario.

EAI services require transport services in order to implement enterprise application integration, but applications can be integrated without EAI services. For example, if several applications that all utilize (and can therefore communicate in) a consistent data format are interconnected by a message-queuing technology such as MSMQ or IBM MQSeries, that data can be shared without EAI services. However, in most EAI implementation scenarios, message delivery is just one of many required services, and therefore the EAI solution relies upon the transport services for the ultimate delivery of messages to and from the respective applications. Traditionally, transport technologies alone were commonly used to integrate applications, but more is required to truly deliver on the benefits that EAI solutions can offer. To do so, organizations require solutions that make integration easier, cheaper, more flexible, and easier to manage. The provision of EAI services aims to meet these requirements by providing common, reusable services for accomplishing application integration tasks.

By dividing the model in this manner, the EAI services can process and manipulate the messages/data/processes/etc., and the transport services can deliver the messages/data/processes/etc. to the EAI services and dispatch the responses. Thus the transport services can be considered as providing the transport mechanisms for the messages and data flowing to and from the EAI services. The transport services also provide the "glue" to allow the interconnection of the different applications and business processes in the enterprise.

In this model, security and management services apply to all aspects of the infrastructure. Often security technologies are driven from the business rather than from the EAI technology. This model assumes the position that the EAI services live within a larger security model for the enterprise.

These services are briefly described in the following topics. Detailed information about each service is available in Appendix B.

EAI Services

EAI services are divided into three main groups:

Integration. Resolving semantics and data formats among applications.

Orchestration. Integrating applications at the process level.

Metadata. Storing and managing the data that the EAI service requires.

Integration

The services for integration focus on the process of resolving the differing semantics and data formats of different applications. Each service performs different tasks that may be required to integrate one application with another. Not all services will be invoked for every interface that is established between every application. The services that are used will depend on the differences between the applications established across an interface.

All of these services assume that data has been passed from at least one application, known in this pattern as the source application, and is destined for at least one application, the target application, although the target may not be explicitly known. This does not mean that the services are limited to data integration, merely that data is the basic currency used by any EAI tool to integrate applications.

Integration services are as follows (see Appendix B for details):

Parse. Takes a stream of input data from the network and creates structured data from it.

Map. After structured data has been validated, the Map service tries to map it to the output data.

Filter. Provides a mechanism for users to filter out information from certain data.

Validate. Can be used to validate many elements of the data, such as syntax, format, and range.

Transform. Uses the rules specified in the map of each data element to transform the contents of each element of input data to the corresponding element of output data.

Format. Moves the content of input data elements to the corresponding elements of output data as specified in the map.

Compose/Decompose. The Compose service composes new data from elements of other input data, using information from the Map service. The Decompose service decomposes input data into the appropriate output data.

Enrich. Allows the business owner to specify from where the EAI tool should acquire information to add to the input data to create the required output data.

Route. Determines where data should be sent in order to facilitate integration.

Publish. Collects information from applications and publishes it.
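Taken together, these services form a pipeline through which data flows from source to target. The following minimal sketch shows a parse, validate, transform, route sequence in Python; all names and formats here are illustrative, not taken from any actual EAI product.

```python
# Minimal sketch of an integration pipeline: parse -> validate -> transform -> route.
# The delimited record format and field names are invented for illustration.

def parse(raw: bytes) -> dict:
    """Parse a delimited byte stream into structured data."""
    order_id, amount = raw.decode("ascii").split("|")
    return {"order_id": order_id, "amount": float(amount)}

def validate(record: dict) -> dict:
    """Validate range before mapping; reject bad data early."""
    if record["amount"] <= 0:
        raise ValueError("amount must be positive")
    return record

def transform(record: dict) -> dict:
    """Map input elements to the target application's element names."""
    return {"OrderNumber": record["order_id"], "Total": record["amount"]}

def route(record: dict) -> str:
    """Choose a target destination based on message content."""
    return "high-value-queue" if record["Total"] >= 1000 else "standard-queue"

record = transform(validate(parse(b"ORD-17|1500.00")))
target = route(record)
```

Not every interface would invoke every step; as noted above, the services used depend on the differences between the applications on either side of the interface.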

Orchestration

The integration services focus predominantly on resolving the heterogeneity of applications at a data level. Orchestration services, on the other hand, address the requirements of integrating applications at the process level. They are concerned with factors such as time, order, correlation, integrity, and events, as well as long-running transactions.

Orchestration services are as follows (see Appendix B for details):

Schedule. Examines queues for data whose processing needs to be scheduled for specific times, and adjusts queue processing as appropriate.

Transaction Integrity. Manages resources so that units of work are processed in an ACID (atomicity, consistency, isolation, durability) fashion.

Process Flow. Executes and manages a defined sequence of events.

Non-delivery. Manages data when the data cannot be routed to a target.
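The interplay between Process Flow and Non-delivery can be sketched as follows: steps execute in a defined order, and a failure routes the message to a dead-letter collection rather than losing it. The step functions and message shape here are hypothetical.

```python
# Sketch of Process Flow plus Non-delivery: run steps in sequence;
# a failed message goes to a dead-letter list instead of being lost.
# Step names and message fields are illustrative only.

def run_process(message, steps, dead_letters):
    """Execute steps in order; route failures to non-delivery."""
    try:
        for step in steps:
            message = step(message)
        return message
    except Exception as exc:
        dead_letters.append((message, str(exc)))
        return None

def debit(msg):
    if msg["amount"] > msg["balance"]:
        raise ValueError("insufficient funds")
    return {**msg, "balance": msg["balance"] - msg["amount"]}

def notify(msg):
    return {**msg, "notified": True}

dead = []
ok = run_process({"amount": 50, "balance": 100}, [debit, notify], dead)
bad = run_process({"amount": 500, "balance": 100}, [debit, notify], dead)
```

A real orchestration engine adds persistence, correlation, and compensation for long-running transactions on top of this basic flow.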

Metadata

A significant amount of metadata must be stored and managed to support an EAI service. Most EAI tools hold this data internally, often within proprietary data stores. However, businesses often require access to this metadata for purposes such as:

Replication to other instances of an EAI service

Programmable access by other services

Backing up and restoring for resilience

Sharing metadata with other businesses

Data mining and reuse

Change management

Metadata services therefore store and manage all the data required to support an EAI service. The metadata services are as follows (see Appendix B for details):

Data Models. Define the structure, syntax, and owners of data that is received from source applications and sent to target applications or published.

Names. Supports an interface for defining names and identifying the entities that they reference.

Message Database Search and Query. Allows searching for messages and querying data in the message database.

Transport Services

Transport is often referred to as the "glue" that connects two or more applications together. The applications can be network applications, operating systems, file management systems, database management systems, transaction processing management systems, or business applications.

The main goal of transport services should be to simplify the connection of programs. These services should insulate the task of connecting programs from the complexity of the underlying operating system and communications network.

The transport services defined here are not complete. There are other relevant transport services such as screen scraping that are not included. In this pattern we have only included those services that are particularly relevant to supporting enterprise application integration services.

Interfacing

Interfacing supports synchronous or asynchronous inter-application communication. The transport services that support interfacing are as follows (see Appendix B for details):

Dispatch. Manages the dispatch of procedure or method calls.

Delivery. Uses the protocol of the network to send and receive data.

Message Queue. Manages and orders the persistence of messages.

Serialize/Deserialize. The Serialize service takes the output data structure and serializes it into a flat file that can be transmitted across a network. The Deserialize service does the reverse.

Address Translate. Translates between the logical business address assigned to the destination of an integration service and the network address required by the network protocol.

Decode/Encode. The Decode service converts data to the same code page as the platform on which the EAI tool is running. If the EAI tool knows the code page of the target system, the Encode service can encode the character set of the output data into the code page of that system.
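For example, data arriving from a mainframe is often in an EBCDIC code page; the Decode service converts it to the platform's code page, and Encode does the reverse for the target system. A small sketch, assuming the source system uses EBCDIC code page 037 and the EAI platform works in UTF-8:

```python
# Sketch of Decode/Encode between code pages. We assume the source
# mainframe uses EBCDIC code page 037 and the platform uses UTF-8;
# Python's codec machinery supports both.

def decode_from_source(raw: bytes, source_codepage: str = "cp037") -> str:
    """Decode inbound bytes from the source system's code page."""
    return raw.decode(source_codepage)

def encode_for_target(text: str, target_codepage: str = "utf-8") -> bytes:
    """Encode outbound text into the target system's code page."""
    return text.encode(target_codepage)

ebcdic_hello = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])  # "HELLO" in cp037
text = decode_from_source(ebcdic_hello)
```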

Security Services

With the proliferation of data sources, applications, networks, and methods of access to all of these, every business must have a security strategy that summarizes the security risks of its infrastructure and prescribes a security model to mitigate those security risks.

In this pattern we have placed security services in a logical plane separate from the enterprise application integration and transport services. This is not to suggest that the EAI and transport services do not require security services. On the contrary, integrating applications can increase security risks and the complexity of the solutions needed to mitigate them. We therefore strongly recommend that any security measures implemented in the EAI system are aligned with the enterprise security strategy of the business.

The problem we are trying to address is that many EAI solutions still have proprietary security implementations that are difficult to integrate with the overall enterprise security services. The services defined here are the generic services that we believe an EAI system will need, but we recognize that each business will have specific implementation requirements that we have not covered. This design approach acknowledges from the outset that the EAI system should align with the security requirements of the business rather than the other way around.

The security services are as follows (see Appendix B for details):

Authenticate. Validates the identity of the user or interface that wants to access the service.

Authorize. Manages what the user or interface is allowed to do.

Encrypt/Decrypt. The Encrypt service encrypts the output data for security reasons before it is transmitted across a network. When the EAI service receives encrypted data, the Decrypt service is invoked to decrypt the data for processing.

Manage Certificates. Manages digital certificates that are used to establish the credentials of a user, interface, or application.

Sign. Handles digital signatures that can be used to authenticate the sender or requester of a message or service.

Audit. Tracks activity within the EAI systems. The Audit service is focused specifically on events that are detected by the Authenticate and Authorize services.
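As a simple stand-in for the Sign service, the sketch below authenticates a message with an HMAC over its body using a shared key. A real Sign service would use certificate-based asymmetric signatures as described above, but the sign-then-verify flow is the same.

```python
import hashlib
import hmac

# Sketch of signing and verification using an HMAC with a shared
# secret. This is a stand-in: a production Sign service would use
# asymmetric keys and certificates managed by Manage Certificates.

SECRET = b"shared-key-known-to-both-parties"  # illustrative only

def sign(message: bytes) -> str:
    """Produce a signature that authenticates the sender of a message."""
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()

def verify(message: bytes, signature: str) -> bool:
    """Check the signature in constant time before trusting the message."""
    return hmac.compare_digest(sign(message), signature)

sig = sign(b"purchase order 42")
```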

Management Services

One of the major financial drivers facing business today is the need to drive down cost, including the cost of IT. Many studies have shown that, particularly for medium to large enterprises, the implementation of an enterprise IT management strategy can produce significant cost savings. Management of IT therefore needs to have an enterprise-wide perspective. One of the ways to reduce cost of IT management is to develop common tools, common processes, and a common organization for managing all IT. This includes EAI systems.

We have therefore placed management services in a logical plane separate from the enterprise application integration and transport services. This is because we recommend that many businesses should consider how to reuse existing management services to manage the EAI and transport services, or develop an enterprise-level set of common management services. EAI products that require proprietary management tools specific to their environment will correspondingly increase the cost of that overall solution.

The services defined here are not complete; we assume that enterprises will need other services for other systems. These services are the ones we believe are necessary to support the services that we have identified as enterprise application integration and transport services.

Audit. Records EAI system and service events, both normal and abnormal.

Recovery. For a process that fails, this service recovers the state of the process to a previously known status that preserves data integrity.

Physical Services

This physical pattern proposes a generic physical implementation of the logical services defined in the EAI logical pattern. Physical solution categories are identified and described, and the logical service components identified in the preceding section are related to products and technologies.

The mapping of these services is proposed based on the assumption that the EAI solution will be built by using the Microsoft® Windows Server System and technologies.

Problem

How does EAI technology from Microsoft support the logical services defined in the EAI logical pattern, and demonstrate appropriate architectural attributes (such as scalability, reliability, and manageability) for those designing a solution based on the architecture?

Context

Each organization has its own integration requirements shaped by both its business requirements and the history of its previous investments in IT infrastructure and applications. A consequence of these factors is that there is no single "all-embracing" EAI solution that an organization simply implements to resolve its integration needs. Rather, a solution is assembled from various products and technologies that provide the required logical services, based on an organization's unique business requirements.

In the "Logical Services" section, we described a collection of services that can be used to address different EAI solutions; the logical services layer provides a "palette" of services that different EAI solutions can use. Different EAI solutions will use different sets of services from the logical layer in this EAI pattern.

With this in mind, the physical layer is divided into a number of sub-patterns, one for each of the three main "modes" of integration in an EAI solution. Again, as with the logical services, different EAI solutions will utilize different sub-patterns and combinations of the different sub-patterns.

By subdividing the physical layer in this manner we can provide self-contained scenarios that give a context in which the mapping between logical services and physical products and technologies can be explored.

The three main modes of integration in an EAI solution can be generalized as follows:

Data. Integration of heterogeneous data sources

Application. Integration of heterogeneous applications

Process. Integration of business processes

Higher layers of integration are built upon the lower layers.

The integration of a business process requires that the participating data sources or applications must first be integrated. The integration of data sources and/or applications must be considered as the foundation for implementing the higher-level services of process integration and the increased business value that this brings. Before an organization can consider moving to these higher-level services, a solid framework must first be established for integrating data sources and applications.

For many organizations, significant business value can be achieved simply through the establishment of a core EAI service, addressing specific and common application or data integration requirements, without ever moving to the next-level integration of business processes.

In developing an integration solution, a common design decision is whether to integrate with an application at the data layer or at the application layer. In many cases the degree of choice is dictated by the availability of an appropriate application programming interface (API). Where genuine choice is available, it is strongly recommended that integration is achieved at the application layer rather than directly at the data layer, because data-layer integration can potentially bypass application logic that protects the data, and lead to adverse effects on the application itself.
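The risk of data-layer integration can be seen in a small sketch: writing to the data store directly bypasses the validation the application enforces at its API. The account example below is hypothetical.

```python
# Sketch contrasting application-layer and data-layer integration.
# The Account class stands in for an application whose API protects
# an invariant (no negative balances); the raw dict is its data store.

class Account:
    def __init__(self):
        self.store = {"balance": 0}

    def withdraw(self, amount):
        """Application-layer API: enforces the business rule."""
        if amount > self.store["balance"]:
            raise ValueError("insufficient funds")
        self.store["balance"] -= amount

acct = Account()

# Application-layer integration: the business rule is enforced.
try:
    acct.withdraw(100)
    app_layer_rejected = False
except ValueError:
    app_layer_rejected = True

# Data-layer integration: writing to the store directly bypasses
# the application logic and silently corrupts the invariant.
acct.store["balance"] -= 100
```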

Building a Solution with Microsoft EAI Products

Microsoft has a number of products that provide the comprehensive set of logical EAI services introduced in the previous section. The product providing the greatest coverage of logical services is Microsoft BizTalk® Server. BizTalk Server is complemented by the following products and technologies that can be used in conjunction with BizTalk Server to fulfill additional logical services:

Microsoft Host Integration Server

Microsoft SQL Server™

Windows core services including:

XML Web services

MDAC data connectivity services

COM+ application services

A key point here is that EAI is a solution, not a product. Microsoft has several products that support the development of an EAI solution; these products cover different parts of the EAI landscape, and can be integrated to provide a complete EAI solution.

The following products can be used together to support the development of comprehensive EAI solutions:

BizTalk Server. An application integration server designed to support complex, distributed business processes. In particular, BizTalk Server offers a core messaging engine providing an integration framework together with "orchestration" — the ability to graphically depict a business process in a flowchart-like manner and link it to executable components. Orchestration technology is a powerful tool when addressing long-running and complex procedural problems.

Host Integration Server. A comprehensive set of transactional gateways, services, and connectors that facilitate integration between the Microsoft Windows® platform and host environments such as CICS, IMS, AS/400, and UNIX. It includes advanced features such as password mapping, XA transaction support, CICS application access, and virtual private networking (VPN) for Internet-based applications.

SQL Server. A powerful relational database that also offers a number of data integration features through the ability to replicate databases, transform and import data by using Data Transformation Services (DTS), and execute queries across heterogeneous data sources.

Windows core services. The Windows platform ships with several core services that can be leveraged in EAI scenarios. Windows core services include the following services:

XML Web services. The fundamental building block in the move to distributed computing on the Internet. Open standards and the focus on communication and collaboration among people and applications have created an environment where XML Web services are becoming the platform for application integration. Applications can be constructed from multiple XML Web services that work together regardless of where they reside or how they were implemented. XML Web services are successful for two reasons: first, they are based on open standards, making them interoperable; and second, the technology used to implement them is ubiquitous.

MDAC data connectivity services. Microsoft Data Access Components (MDAC) provide connectivity to a broad range of relational and non-relational data sources. MDAC is also supplemented with data connectivity components in Host Integration Server 2000 to provide access to mainframe data sources.

COM+ application services. COM+ services provide transaction and security boundaries around business logic components that can be used to manipulate data as it flows through an EAI process.

How Microsoft Products Satisfy EAI Integration Requirements

As described earlier, an EAI solution can be broken down into three types of integration: data, application, and process. The following appendices describe the EAI solution in detail as it relates to these three types of integration:

Appendix C: Integration of Heterogeneous Data Sources

Appendix D: Integration of Heterogeneous Applications

Appendix E: Integration of Business Processes

In addition, these appendices describe how each of the various Microsoft products has a different part to play in the development of an integrated EAI solution.

The following illustration shows the main integration areas into which each product fits. Note that there is a separate connectivity layer around the individual networking technologies that provide the low-level connectivity for the upper integration layers. Appendices C, D, and E describe this connectivity layer in context with the relevant integration layer — for example, data connectivity falls under data integration.

Implementation

In this pattern document so far, we have identified the services required to support EAI solutions in a general sense. In this section we will describe some implementation scenarios that are representative of the many different types of EAI solutions.

We will use scenarios to describe typical implementation patterns and to focus on specific implementation topics. Because the scope of EAI is so substantial, we recognize that we do not cover every implementation pattern here. We have selected the ones that we have observed as the most common in practice, reflecting the intersection of common EAI deployment challenges with the experience of the thousands of organizations using the Microsoft platform for EAI today.

The EAI deployments of these customers are primarily based on Microsoft® BizTalk® Server, which acts as the foundation upon which EAI solutions are deployed on the Microsoft platform. When we look closely at these deployments, we see specific scenarios that are quite common and that address integration challenges that are common and well known in organizations that possess heterogeneous infrastructures.

We will define these observed deployment patterns as classes of implementation scenarios, which we will call variations.

Problem

How are Microsoft EAI technologies used to implement all or parts of common EAI implementation scenarios?

Context

In the "Physical Services" section we segmented the problem into three modes of integration: data, application, and process.

In our implementation solutions we will focus on application integration and process integration.

Implementations that support application and process integration are addressed directly in the following variations.

Data integration that is related to "pure" data implementations (where no application or process integration is involved in driving the data integration needs) is not dealt with in this pattern's implementation section. That implementation is dealt with in the Microsoft Data Warehouse pattern or the Autonomous Computing pattern (depending on whether data needs are Consolidated Informational or Reference Data).

Solution Variations

We will define our solutions in these major implementation variation classes. As previously described, these implementation variations are based on an analysis of organizational deployments using BizTalk Server and the additional complementary services that are represented in the Microsoft EAI platform solution.

A classic hub-and-spoke messaging integration solution

This addresses EAI integration at the application layer. This solution is characterized as a central logical hub that provides data transformation, routing, receipt, and delivery services, among other complementary functions, to address the common challenges of heterogeneous application integration. In this approach, various distributed applications communicate with the hub through messages, and the hub performs all intelligent processing and state management of those processes.

A hub-and-intelligent-spoke messaging integration solution

This also addresses application integration. As in the preceding solution, a central hub provides data transformation, routing, receipt, and delivery services as well as other complementary functions, but in this variation distributed nodes also possess these capabilities. The result is a centralized primary hub with distributed processing nodes that can perform application integration locally as well as communicate in a structured and automated way with the central hub.

Business process orchestration solution

This addresses process integration in addition to application integration. In this variation, orchestration services are used to perform higher-level business process automation functions, leveraging the services of the environment to handle challenges of state management, transaction management, error and exception handling, concurrency, and other rule-based considerations of business processes. Processes built on these more sophisticated "state engine" services may or may not also utilize the core data transformation and routing services present in the platform, though in practice they usually do.

Web Services solution

This addresses application or process integration with the ability to expose the service outside of the normal trusted domain within which EAI has traditionally operated. Web Services solutions are normally characterized by utilizing standards-based messaging technologies to expose an existing (often legacy) application as a Web service, as well as combining Web service interactions (and often non-Web service interactions) into higher-level business processes.

In each of these variations BizTalk Server will be a core component of the overall solution. In each case where BizTalk Server is used, BizTalk Server itself can be implemented in a set of architectural patterns that depend on the non-functional requirements of the implementation.

The following Microsoft white papers describe various scales of BizTalk Server implementations, from smaller to larger configurations. These white papers are downloadable from the Web and are based on best-practice advice from Microsoft. They also advise on how best to implement resilience in your BizTalk Server implementation.

Solution: Hub-and-Spoke Messaging Integration

This is a common implementation and addresses most application integration scenarios with acceptable scalability and availability, while based on a model that predictably minimizes complexity as well as deployment and maintenance costs. Examples are:

Integration of systems within an IT environment, as described in the "EAI Business Pattern" section.

Integration of remote environments — such as stores for a retailer or branches for a financial institution — to the central IT systems.

The key design consideration for this environment is that there is only one (logical) broker environment, which connects to the systems that it integrates by using the interface language and protocol that is appropriate for each system. The following illustration shows this environment.

Often this approach is facilitated by adapters that support integration with various ISV software products. These adapters can be application adapters or technology adapters.

As a common example, an application adapter for SAP supports the SAP APIs and can expose these in a structured fashion. This allows organizations to integrate with SAP without having to duplicate the integration work required to leverage those APIs, and without the broker vendor having to support hundreds of common systems individually.
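The value of an adapter is that it hides a system-specific API behind a generic interface the broker understands. A minimal sketch follows; the `SapLikeAdapter` class and its "BAPI-style" call are hypothetical, not an actual vendor API.

```python
# Sketch of the adapter concept: the broker speaks one generic
# interface, and each adapter translates to a system-specific API.
# SapLikeAdapter and its BAPI-style call name are hypothetical.

class Adapter:
    def send(self, message: dict) -> dict:
        raise NotImplementedError

class SapLikeAdapter(Adapter):
    """Hypothetical application adapter wrapping a BAPI-style call."""
    def send(self, message: dict) -> dict:
        # A real adapter would invoke the vendor API here.
        return {"call": "BAPI_SALESORDER_CREATE", "payload": message}

class FileAdapter(Adapter):
    """Technology adapter: renders messages in a flat-file format."""
    def send(self, message: dict) -> dict:
        line = ",".join(f"{k}={v}" for k, v in sorted(message.items()))
        return {"written": line}

def broker_dispatch(adapter: Adapter, message: dict) -> dict:
    """The broker uses the same interface for every target system."""
    return adapter.send(message)

result = broker_dispatch(SapLikeAdapter(), {"order": 7})
```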

When the target system is not a package, but a specific technology, a technology adapter can be utilized. An example is the adapter for integration with IBM CICS transactions. Using this adapter avoids the "plumbing" problem of connecting to CICS to pass the service request or receive the reply. However, all the core work of understanding how to construct the request, and hence how to define the mapping between requester and service provider, has to be done by the EAI service installer.

The broker primarily takes responsibility for mapping requests from one format to another, for routing requests to the correct service providers, and for integrity of the end-to-end transaction. However, this mapping and routing can potentially result in challenges with scaling up the service when the load becomes substantial. While a single broker can deal with hundreds or thousands of interfaces depending upon the volume involved in accessing those interfaces, the difficulty can quickly become one of management. This can be addressed by introducing multiple brokers to manage subsets of the systems to be integrated. Subsequently integrating the intelligent hubs themselves is the subject of the next implementation solution described (hub-and-intelligent spoke).

How BizTalk Server Supports a Hub-and-Spoke Configuration

BizTalk Server Messaging

By using the deployment patterns described in the following topics, BizTalk Server can be scaled to efficiently process extremely large amounts of message traffic. The pattern also provides redundancy and fail-over in the architecture, and has no single point of failure.

BizTalk Server itself can be adapted to communicate with a variety of different systems by using common standards and protocols, or by creating custom components to allow the adaptation to other systems.

When using BizTalk Server to meet the messaging requirements of an integration, it helps to break its operation down into the steps by which it processes messages. This happens in two discrete stages: receiving a message and processing a message.

Receiving a Message

An inbound message can be posted to an Active Server Pages (ASP) page or processed directly by an ISAPI filter that passes the message to an inbound message queue. The BizTalk Server receive function removes the message from the queue and puts it in the BizTalk Server shared queue, which resides in a SQL Server database.

This is only one example of receiving a message; the message can be received in many different ways, including:

File drops

The file is placed at a specified point on a file system (an FTP directory or similar) and BizTalk Server polls the location at set intervals until it finds the file. The content of the file is then placed in the BizTalk Server shared queue.

Message queues

The message is placed in a specified message queue and BizTalk Server polls the location at set intervals until it finds a message. The content of the message is then placed in the BizTalk Server shared queue.
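Both receive mechanisms follow the same poll-and-forward loop: check a drop location at set intervals, and move anything found onto the shared queue. A sketch of one polling pass is shown below; BizTalk Server implements this loop internally, so the names here are illustrative only.

```python
# Sketch of a polling receive function: drain a drop location
# (a file directory or a message queue, modeled here as a list)
# into the shared queue. BizTalk Server implements this loop
# internally; this sketch only illustrates the mechanism.

def poll_once(drop_location: list, shared_queue: list) -> int:
    """Move everything found at the drop location onto the shared
    queue; return the number of items moved. A real receive function
    repeats this at set intervals."""
    moved = 0
    while drop_location:
        shared_queue.append(drop_location.pop(0))
        moved += 1
    return moved

drop = ["order-001.xml", "order-002.xml"]
shared = []
count = poll_once(drop, shared)
```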

Processing a Message

When a message is placed in the shared queue, it can then be accessed by the BizTalk Server processing server to be passed on to a destination system; this is done by BizTalk Messaging Services.

BizTalk messaging can send the message out by using several methods, including:

Message queues

External applications

When sending or receiving messages to or from external applications, an application integration component may be used. These components should reside on the processing server and not the receive servers because they are a core part of the BizTalk Server processing functionality. These components can be purchased or created to adapt to the application that is being used.

BizTalk messaging not only allows the simple delivery of messages, but also provides a level of transformation (mapping) from one message format to another. This translation ability allows different types of messages to become interchangeable.
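A map of this kind can be thought of as a declarative table of source-to-target element correspondences, which the transformation engine then applies. A minimal sketch, with field names invented for illustration:

```python
# Sketch of map-driven transformation: the map declares which source
# element feeds each target element, optionally with a conversion
# function. Field names are invented for illustration.

ORDER_MAP = {
    "OrderNumber": ("order_id", str),
    "TotalCents": ("amount", lambda dollars: int(round(dollars * 100))),
}

def apply_map(source: dict, field_map: dict) -> dict:
    """Produce the target message by applying the map to the source."""
    return {
        target: convert(source[src])
        for target, (src, convert) in field_map.items()
    }

target_msg = apply_map({"order_id": 42, "amount": 19.99}, ORDER_MAP)
```

Because the map is data rather than code, the same engine can serve any pair of formats by loading a different map, which is what makes different message types interchangeable.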

BizTalk Server Configuration

In order to provide scaling and resilience to BizTalk Server, the configuration of a BizTalk Server solution must be understood.

BizTalk Server utilizes a database to store both its configuration and the shared message queues. These two items can be located in separate databases if desired to provide greater scaling on a larger deployment. Further, clustering services can be employed to guarantee a highly available integration service.

The following illustration shows an example BizTalk Server deployment.

In this deployment, we have used two receiving servers and six processing servers. The database server is running in a fail-over cluster configuration to provide resilience.

BizTalk Server Deployment

Deploying BizTalk Server into a server farm environment (specifically represented by a BizTalk Server Group in the deployment guide) is normally recommended when high scalability as well as complete fault tolerance is required. To accomplish this, a three-layer architecture is recommended to deliver the application functionality for Internet-type client support. These layers are described in the following paragraphs:

Access/acceleration resources layer. Devices at this layer sit closest to the incoming network resource and provide functions such as load balancing, intrusion detection, and caching.

The load-balancing layer presents a single system image to clients in the form of a virtual host name and distributes client requests across multiple application servers. This approach works well for load balancing incoming HTTP requests across multiple physical servers, providing scalability and true fault tolerance, as well as organized management for the array of application servers. There are a variety of approaches to load balancing, including round robin DNS and various intelligent load-balancing technologies.

Round Robin DNS (RRDNS) is a method of configuring a Domain Name Service (DNS) server so that DNS lookups of a particular host name are sequentially distributed across a pool of IP addresses instead of a single address. Each successive client visiting a site is directed to a different application server, thus achieving a rudimentary load-balancing effect.
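The sequential distribution of RRDNS can be sketched as cycling through the address pool, with no awareness of server state. The host name and addresses below are illustrative.

```python
from itertools import cycle

# Sketch of round robin DNS resolution: each lookup returns the next
# address in the pool, with no feedback about server load or health.
# Addresses are illustrative.

POOL = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
_next_address = cycle(POOL)

def rrdns_lookup(hostname: str) -> str:
    """Return the next address in rotation, ignoring server state."""
    return next(_next_address)

answers = [rrdns_lookup("app.example.com") for _ in range(4)]
```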

RRDNS is inexpensive and straightforward to implement, but has shortcomings that make it a poor choice for load balancing in a server farm environment. The principal problem with RRDNS is that there is no mechanism that enables the DNS server to receive feedback about the current load and availability of the application servers. If a server is overloaded or offline due to a failure, the DNS server will unwittingly continue to send clients to the failed server until an administrator manually updates the DNS configuration. Even then, there is often a considerable delay as the changes propagate through the Internet DNS system. During this time, clients directed to the failed server will experience frustrating downtime.

Another challenge with RRDNS is that it makes poor use of application servers that have different processing power and I/O characteristics. The round robin assignment of clients to servers assumes that all server resources are equal. If one application server has a 166 MHz processor and the others have 400 MHz processors, RRDNS will quickly overload the slower server and cause it to deliver poor service to the end users.

In response to the deficiencies of RRDNS, many vendors have developed intelligent load-balancing products. Examples of such products include Cisco Content Switch and Microsoft Windows® Load Balancing Service (WLBS).

These products typically present a virtual IP address to clients and use sophisticated algorithms to intelligently distribute load across an array of application servers. These products can balance the load across application servers with different processing power and I/O capabilities without overloading any individual server. This allows administrators to mix and match servers of various capabilities to optimally leverage their aggregate capacity.
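One way to see the difference from round robin is a capacity-aware selection sketch: the balancer weighs current load against each server's capacity, so a slower server is never loaded as heavily as a faster one. The server names and capacity figures below are illustrative, and real products use more sophisticated algorithms.

```python
# Sketch of capacity-aware load balancing: pick the server with the
# lowest ratio of active connections to capacity. Server names and
# capacity figures are illustrative only.

servers = [
    {"name": "fast-1", "capacity": 400, "active": 0},
    {"name": "fast-2", "capacity": 400, "active": 0},
    {"name": "slow-1", "capacity": 166, "active": 0},
]

def pick_server(pool):
    """Choose the least relatively loaded server and record the
    new connection, so load stays proportional to capacity."""
    best = min(pool, key=lambda s: s["active"] / s["capacity"])
    best["active"] += 1
    return best["name"]

chosen = [pick_server(servers) for _ in range(5)]
```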

Unlike RRDNS, intelligent load balancers automatically detect failure of a server in the array and instantaneously route traffic to another server. Most products also support fail-over configurations so that a backup unit can quickly take over if the primary unit fails. This prevents the load-balancing layer of a server farm environment from becoming a single point of failure.

A firewall at this layer will provide the first line of defense, for the underlying site, against specific types of intrusion into the site.

Web/application resources layer. These resources are servers that process most of the business logic for the application. Typically, they run a Web server and supporting services. An example of this is a server that runs specific application logic built into scripted Web pages and COM objects.

Data resources layer. The data resources layer is where the application data is stored, accessed, and managed. The server farm architecture avoids replication problems by providing a highly available centralized file server rather than replicated data on application server drives. This layer contains the database that stores e-commerce data, such as product catalogs, user registration information, shipping information and site activity logs. It also may provide connectivity to other systems that hold application data resources, such as SAP, Siebel, and legacy applications.

In addition to these three layers, three global components of a server farm architecture — security, operations/management, and integration — require special attention.

Security. The server farm architecture allows for integration of required security components based on customer business requirements. Server farm architectures allow for strategic deployment of multiple firewalls between the distinct layers, such as a firewall between the access/acceleration resources layer and the Web/application resources layer, and another between the Web/application resources layer and the data resources layer.

Operations/management. Server farm environments involve many different servers that work together to deliver a complete application. A robust approach to systems management is required in order to keep all the systems online and running at peak performance. Internet applications deployed by using a server farm architecture can be integrated with all popular management frameworks.

Integration. Enterprise environments contain many best-of-breed applications. Frequently, these applications do not natively talk to one another. Integration provides a comprehensive set of services to facilitate connectivity between applications.

Suggested Architecture

The following illustration shows the general architecture that is suggested for a BizTalk Server deployment of this type.

This provides a full network topology for the BizTalk messaging deployment. In this deployment, the receiving BizTalk Servers are placed in the DMZ of the Web site, with the processing servers sitting behind them. The firewalls give a double level of protection, both on the inside of the network and within the DMZ. The network is held within two separate domains to provide extra security. Cisco CSS load balancers can provide the load balancing between the processing servers, and full fail-over is implemented between the twin links to the Internet. Each server uses its dual-ported Network Interface Cards (NICs) to allow communication in the event that one of the switches fails. Although not shown on the diagram, the BizTalk Server receive processors should also be dual-homed; that is, one NIC for front access (Internet side) and one for back access (processing side).

All systems can be managed from two management platforms (one on the DMZ and one in the back-end domain).

Smaller deployments are also possible, although they will not have the same level of availability as this larger type of deployment. In the following descriptions of the deployments, the networking infrastructure is described only at a high level and does not include full resilience, because the right configuration depends entirely on the balance between availability and performance.

Small-Scale Implementation

Logical View

In this deployment, all of the systems can reside on a single computer. This solution does not have much resilience or scalability, but can process a small number of messages and would be adequate for a non-business-critical path.

Physical View

This server would not need any interconnecting switches because it is running on a single computer. It would, however, need to be connected to a network to accommodate the clients and destination applications; this would depend entirely on the deployment environment.

If required by load demands, SQL Server could be put on a separate computer.

Medium-Scale Configuration

In this configuration BizTalk Server receive services, BizTalk Server processing services, and SQL Server databases are placed on separate computers. This gives improved processing throughput for either large documents or a larger number of messages.

Logical View

If any of the three servers fails, the system will no longer be available.

Physical View

Highly Available and Scalable Solution

This configuration provides for a highly available and scalable solution.

Logical View

This arrangement provides for availability in the event of an issue with either of the BizTalk Servers. At this point the single point of failure lies in the database server, which could be clustered to eliminate the risk.

Physical View

Note that this diagram does not include all the necessary networking pieces. To provide a fully available system, the appropriate switches and other pieces must be included. This configuration is highly scalable by increasing the number of BizTalk receive servers and adding the appropriate number of BizTalk processing servers in the ratio of one receive server for every three processing servers.

Scaling the System

To increase the message throughput of this system, the system can be scaled in the following ways, depending on the volume of traffic required:

SQL Server cluster (scale up). The number of processors in the SQL Server cluster can be increased to allow for additional processing. If required, an additional cluster can be added to deal with the shared queues separately from the BizTalk Server application's databases.

Processing and receive servers (scale out). These can be added in varying ratios depending upon the processing load and its characteristics.
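A simple sizing sketch can make the scale-out ratio concrete. The per-server throughput figure below is hypothetical; measure your own workload before sizing, and note that the 1:3 receive-to-processing ratio comes from the highly available configuration described earlier.

```python
import math

def servers_needed(msgs_per_sec, per_receive_server_rate=50):
    """Estimate receive and processing server counts for a given load.

    per_receive_server_rate is a hypothetical throughput figure
    (messages per second per receive server), used for illustration only.
    """
    receive = max(1, math.ceil(msgs_per_sec / per_receive_server_rate))
    processing = receive * 3   # one receive server per three processing servers
    return receive, processing

print(servers_needed(120))   # (3, 9)
```

For 120 messages per second at the assumed rate, three receive servers and nine processing servers would be deployed.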

Example: Retailer

The classic hub-and-spoke architecture provides a pattern that is commonly used by retailers, as shown in the following illustration.

Store sales information is collected and forwarded to the EAI service, either in a trickle-feed or batched-file approach. This is typically done by using MSMQ or MQSeries to deliver the data to a BizTalk Server receive service in the EAI hub. The store sales information is often formatted in XML so that it is easier for the EAI service to understand and manage the transformations and routings that it will perform.
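A trickle-feed sales record of the kind described above might look like the following. The element and attribute names (StoreSale, StoreID, SKU, and so on) are invented for illustration; a real deployment would use the retailer's own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical trickle-feed sales record; element names are illustrative.
sale = ET.Element("StoreSale", StoreID="0042")
ET.SubElement(sale, "SKU").text = "A-1001"
ET.SubElement(sale, "Quantity").text = "2"
ET.SubElement(sale, "UnitPrice").text = "14.99"

message = ET.tostring(sale, encoding="unicode")
# This string would be placed on an MSMQ or MQSeries queue for the
# BizTalk Server receive service to pick up.
print(message)
```

Because the message is already XML, the EAI service can apply its transformations and routing rules without a format-conversion step.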

The EAI service can now route to legacy systems that manage provisioning and other services. Because these are usually mainframe based, Microsoft Host Integration Server is used to connect to the legacy system.

The EAI service can also route the data to new services. We illustrate this with a Data Warehouse service. If this is implemented on Microsoft SQL Server™ then the DBMS itself understands XML and no transformation is required to store the data. With a trickle-feed system from the stores, near-real-time decisions can be made in the warehouse regarding supplier requests that should be sent out through the EAI systems for restocking of specific stores and store contents. A major retailer has implemented a fraud-detection process that works from this warehouse content and has proven very successful in practice.

Solution: Hub-and-Intelligent-Spoke Messaging Integration

This is a more recent implementation variation. Examples are:

Cases where the number of systems to be integrated in the preceding solution grows so large that it is best to manage them with several brokers.

Integration of large departments with autonomous IT environments that are significantly complex in their own right. The distinguishing characteristic of a department is that it must behave as part of the corporate whole, while requiring autonomy due to size and possibly business requirements.

Integration of remote environments such as divisions for a conglomerate, or companies within a holding group. The characteristic that distinguishes these environments from the "department" concept is that these are probably totally autonomous and very likely have no requirement to behave as part of a corporate whole. Nevertheless they want to leverage each other's strengths for competitive advantage.

The key design consideration for this environment is that there is one (logical) broker environment for each of the participating major entities. This broker accesses the systems that it integrates by using the interface language and protocol that is appropriate to each system. But then it also talks to the brokers of the other entities in the group. In this way, the entities can preserve complete autonomy of their own domains. There is no intrusion into their domains by adapters from other domains, which would create a tight coupling. Brokers request services from other brokers; brokers maintain a boundary that insulates the other systems in their domain. Services can be protected and managed through this isolating interface.

This can be implemented in more than one way. One is the network of brokers, where all brokers are peers and each has full knowledge of all the other brokers in the network and what services they offer. This approach works well as long as scaling up the number of brokers is not a requirement. However if the number of brokers is likely to be tens or hundreds, then the "metadata" problem of knowing what brokers exist and what services they offer becomes a non-trivial management problem.
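The network-of-brokers approach and its metadata burden can be sketched as follows. This is an illustrative model (broker names and service names are invented): every peer must hold a routing entry for every other peer, so the metadata grows with the square of the broker count.

```python
class Broker:
    """A peer broker that knows every other broker and its services."""

    def __init__(self, name, services):
        self.name = name
        self.services = set(services)
        self.peers = {}                      # broker name -> Broker

    def join(self, network):
        # Full-mesh knowledge: each broker records every other broker.
        for other in network:
            if other is not self:
                self.peers[other.name] = other
                other.peers[self.name] = self

    def request(self, service):
        """Return the name of the broker that can fulfill the service."""
        if service in self.services:
            return self.name
        for peer in self.peers.values():
            if service in peer.services:
                return peer.name
        return None

hr = Broker("hr", {"payroll"})
sales = Broker("sales", {"orders"})
network = [hr, sales]
for b in network:
    b.join(network)

print(sales.request("payroll"))   # prints hr
```

With two brokers this is trivial; with a hundred, keeping every peer's service catalog current becomes the non-trivial management problem noted above, which motivates the hub-of-hubs configuration that follows.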

The following illustration shows the configuration that we see as a common pattern for this solution.

In this configuration the top EAI service has two roles. It functions as an integration service for any local services that it needs to support. These initially tend to be minimized to the set that is useful in its role of "manager of hubs," but very soon other benefits emerge. One key benefit is that these services can now act as "virtual organization entities," as we will explain later.

This is the pattern used by the UK Government Gateway, an extremely large BizTalk Server deployment for the central UK government that is the integration foundation for over 200 central government departments and over 482 local authorities. This pattern has since been developed and defined in much more detail: it has been packaged by Microsoft Europe and is now offered as the Microsoft Gateway Solution Offering (GSO).

Example: Gateway Solution Offering by Microsoft EMEA

GSO Goals

The Gateway Solution Offering (GSO) aims to reduce the complexity and cost of enabling electronic channels for delivery of electronic relationships and services between an organization's internal departments, partners, and customers. This complexity and cost prevent customers and businesses from receiving the benefits of "virtually integrated organizations."

A successful solution does the following:

Allows the quick and easy adoption of electronic services; for example, leveraging existing departmental back-end systems as appropriate to reduce time to market.

Loosely couples the front-end services from the back-end systems to allow each to be implemented and evolve at its own pace.

Avoids duplication of the facilities and services necessary to connect individual departments within an organization to customers over the Internet.

Provides a basis for delivering joined-up services by centralizing an authentication service and enabling a customer to interact with many departments within an organization.

Enables the provision of customer-driven applications that can interact with organizations in a consistent manner.

GSO Relationship to the EAI Pattern

The following illustration shows the services with which the Gateway Solution Offering implements the pattern.

GSO Implementation

The GSO expresses the core functionality of its instantiation of the pattern as shown in the following illustrations. The departmental interface server (DIS) can provide interface services not only from the hub to the spokes, but also to other department servers through the hub. This architectural approach avoids the complexity of the many-to-many connections that would be required between interface servers if they were allowed to talk directly to each other (for example, in a networked-hubs model), while still allowing each distributed hub to leverage the benefits of the pure hub-and-spoke model.

The following terminology and abbreviations are used:

DIS = departmental interface server (GSO terminology)

R&E = registration and enrollment

TxE = transaction engine

The high-level configuration of the system is as follows:

The DIS interface services are implemented as follows:

Solution: Business Process Orchestration (BPO)

EAI needs are driven by business requirements to deliver functions from multiple application or data services in ways that were not originally planned when those services were created. It is logical that the EAI service should also ensure that the business requirement is actually fulfilled — that is, it should manage the business process that requires integration.

This is a complicated issue because there is a question of how fine-grained this process management should be. That question leads to the question of how fine-grained the EAI hub service integration should be. Some designers start with the goal of making the EAI hub the conduit for every future application-to-application or component-to-component request for service. Successful implementations resolve this goal in one of two ways:

Drop the idea when they realize the size of the EAI service required to support the goal, and the challenges of meeting the variety of Service Level Agreements (SLAs) that arise, or

Accept that the EAI service is in fact going to be a huge and complex service and set about building it accordingly.

The fundamental architectural decision for BPO is whether it will aim for full-blown fine-grained workflow services or whether it will operate in a clearly-defined subset. It is normally prudent to start with the subset. BPO provides the inter-domain process integration that ensures that the interactions between the systems being integrated are coordinated with integrity. At the same time, the systems are allowed to handle their internal integrity so that they are treated as trusted domains in providing the service. In the "Logical Services" section this was described as "orchestration of business process."

The typical pattern of use of orchestration is to coordinate events and the consequences of those events occurring. The event is the receipt of a service request. The consequences are the set of other events that it triggers, all of which have clearly defined outcomes that must be achieved for the process to achieve a defined state. Finally, an action is taken to close the contract that is implied by the set of events that happened.

The orchestration process requires great flexibility to deliver this capability. Intermediary events can have different outcomes that lead to different final results. The results may or may not be in response to the original event. Some of the intermediary events may require action by a service that is provided by a different company. Some events may take a long time to respond. BizTalk Orchestration is aimed at supporting this type of BPO.

BizTalk Server Processing

BizTalk Messaging Services are designed to support the receipt of messages that flow into a business process, or to send messages that flow out of a business process. BizTalk Orchestration is designed to manage business processes. Therefore, the two services are designed to work together, with BizTalk Messaging Services providing a receipt and delivery support layer for BizTalk Orchestration Services.

BizTalk Orchestration Services also can use BizTalk Messaging Services to integrate one business process with another by sending or receiving messages between the two business processes.

To send or receive messages between two business processes, you must:

Use BizTalk Orchestration Services to create an XLANG schedule that sends a message and an XLANG schedule that receives it.

Use BizTalk Messaging Services to create a messaging port. This messaging port must be configured to instantiate a new instance of the receiving XLANG schedule and deliver a message to a specified port in that schedule.

Use BizTalk Messaging Services to create a channel for the messaging port that you created. This channel must be configured to receive a message from the sending XLANG schedule.

A common scenario for integrating the two services is the correlation of messages within a single running XLANG schedule instance — that is, to have an XLANG schedule instance send a message to an internal application or a trading partner, and to expect a message in return. An example is sending a purchase order and expecting a purchase order acknowledgement in return.
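The correlation scenario above can be sketched with a simple correlation-ID table. This is a conceptual model only (the function names and message shapes are invented); BizTalk implements correlation through per-instance message queues, not application code.

```python
import uuid

pending = {}   # correlation ID -> waiting schedule instance

def send_purchase_order(instance, order):
    """Send a PO, recording which schedule instance expects the reply."""
    corr_id = str(uuid.uuid4())
    pending[corr_id] = instance
    # The ID travels with the outbound message and must be echoed back
    # in the acknowledgement by the partner or internal application.
    return {"type": "PO", "correlation_id": corr_id, "order": order}

def receive_acknowledgement(message):
    """Route the ack to the schedule instance that sent the original PO."""
    return pending.pop(message["correlation_id"])

po = send_purchase_order("schedule-1", {"sku": "A-1001", "qty": 2})
ack = {"type": "POAck", "correlation_id": po["correlation_id"]}
print(receive_acknowledgement(ack))   # prints schedule-1
```

Without the correlation ID, the acknowledgement could not be delivered to the specific running XLANG schedule instance that issued the purchase order.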

If you need to apply encryption, digital signatures, mapping, or tracking to message contents, use BizTalk Messaging Services.

New Business Processes with BizTalk Server Processing

BizTalk Orchestration Services is ideally suited for developing business processes. Business-process design and implementation have traditionally been performed in two distinct phases: the visual-design phase and the coding phase. The visual-design phase typically consisted of the analysis of an existing business process (such as corporate procurement) and the creation of a workflow diagram or an interaction diagram to describe the process. The coding phase was usually performed separately. In this paradigm, you would build an abstract visual model of a business process and then map the model to an implementation framework.

One of the important features of BizTalk Orchestration Services is the integration of these previously distinct phases within a unified design environment. This design environment provides a versatile drawing surface and a comprehensive set of implementation tools. BizTalk Orchestration Services enables you to:

Create XLANG schedule drawings that describe business processes.

Implement business processes by connecting specific actions within a drawing to ports that represent locations to which messages are sent or from which messages are received. Ports are named locations, and messages represent the data sent or received between actions and ports.

BizTalk Orchestration Services is also designed to manage business processes that might need to be altered quickly or often. In the past, developers have created COM+ components that controlled the business processes, and more traditional COM+ components that did the work. BizTalk Orchestration Services enables you to replace the business-process-control components with XLANG schedules. However, it is not recommended that you use BizTalk Orchestration Services to define processes at the work level. Instead, use your existing traditional COM+ components. The value of BizTalk Orchestration Services diminishes if it is used to control small portions of a larger business process. Ideally, it is recommended that you migrate all of your business processes to BizTalk Orchestration Services.

Long-Running Transactions with BizTalk Server Processing

In addition to the integration of design and implementation functionality, BizTalk Orchestration Services provides another important feature: the ability to create and manage robust, long-running, loosely coupled business processes that span organizations, platforms, and applications. During an asynchronous, loosely coupled, long-running business process, a product that is ordered over the Internet might have to be built from parts that are in inventory. Some of these parts might even be temporarily out of stock. The entire business process might take weeks or months to complete. In contrast, a tightly coupled business process involves the synchronous exchange of messages. For example, when a customer withdraws money from a bank account, the debiting of the account is immediately followed by the delivery of the money.

By providing an integrated, graphical modeling environment, BizTalk Orchestration Services provides the following important benefits:

When business processes change, the implementation can be quickly and easily redefined.

Concurrent processes can be easily designed, implemented, and maintained.

Transactions (long-running, short-lived, and nested) can be easily structured and maintained.

One of the key strengths of BizTalk Orchestration Services is to manage and maintain the state of long-running transactions. If you already have a means for controlling state, you should migrate that entire process into BizTalk Orchestration Services. Controlling state in multiple places is not recommended.

BizTalk Orchestration Services is a business process automation tool. It is not intended to be a complete workflow system replacement. In particular, it is not intended to define role-based, hierarchical escalation in person-to-person processes. Business processes with role-based aspects that are escalated in a no-response situation are more appropriately implemented as Microsoft Exchange workflows, which can be integrated with BizTalk Orchestration Services.

BizTalk Server Orchestration

The pattern describes the use of BizTalk Orchestration to implement a business process automation engine (or orchestration hub). The hub exposes COM interfaces and allows multiple enterprise applications and clients to invoke and participate in heavily sequenced business processes. Responses may be returned from the hub synchronously or asynchronously. The following illustration shows this process.

Pattern Context

This pattern addresses the need to leverage highly orchestrated business processes, made up of multiple calls to potentially multiple enterprise applications. Such processes often return messages whose content is supplied by more than one data source, usually in the format of an XML document.

An example of such a business process might be "GetOrderStatus" for a B2C e-commerce site that retrieves the customer's details from a CRM application, retrieves the order's pick status from a warehouse management application, and may even interrogate a Web service provided by the transportation company to get the estimated delivery time.

Synchronous responses are often undesirable in EAI patterns, but in a user-driven Web model as described above it would probably be unacceptable for the user to request their order status and then return later to collect the results of that request. It would be possible to simulate synchronous processing through the use of middleware between the Web server and the business process engine, but this complexity can be avoided by employing a synchronous processing model through COM.

Thus, in the given example, requests are received through a Web server that makes requests of the orchestration engine for the required order status. These requests, through an orchestrated business process, are fulfilled by interrogating the appropriate data sources, consolidating the responses, and returning the response synchronously to the Web server.
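The GetOrderStatus consolidation can be sketched as below. The back-end lookup functions are stand-ins for the CRM, warehouse, and carrier systems named above, and their return values are invented for illustration.

```python
# Hypothetical back-end lookups standing in for the CRM system,
# the warehouse management application, and the carrier's Web service.
def crm_customer(order_id):
    return {"name": "A. Customer"}

def warehouse_pick_status(order_id):
    return "picked"

def carrier_eta(order_id):
    return "2 days"

def get_order_status(order_id):
    """Consolidate several sources into one synchronous response,
    as the orchestrated business process does for the Web server."""
    return {
        "order": order_id,
        "customer": crm_customer(order_id),
        "pick_status": warehouse_pick_status(order_id),
        "eta": carrier_eta(order_id),
    }

status = get_order_status("PO-17")
```

The caller sees a single consolidated answer; the fan-out to multiple enterprise applications is entirely hidden inside the orchestrated process.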

EAI Services Supported

Integration

Data Format

Through orchestration, application-specific request and response formatting can be abstracted from the client, allowing the hub to communicate by using an open format such as an XML-based SOAP envelope. This approach means that existing enterprise applications should require little or no modification, and that third-party Web services can be accommodated.

Furthermore, message metadata can be added and handled through orchestration. This could include timestamps, process audit information, and standardized error reporting.
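The metadata handling described above might look like the following sketch. The field names (metadata, audit, step) are invented for illustration; a real hub would define these in its message schema.

```python
from datetime import datetime, timezone

def add_metadata(payload, process_step):
    """Attach hub metadata to a message: a running audit trail of
    process steps, each stamped with a UTC timestamp."""
    meta = payload.setdefault("metadata", {"audit": []})
    meta["audit"].append({
        "step": process_step,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return payload

msg = {"body": "<Order/>"}
add_metadata(msg, "received")
add_metadata(msg, "transformed")
steps = [entry["step"] for entry in msg["metadata"]["audit"]]
# steps records the path the message took through the hub
```

Standardized error reporting can use the same mechanism: a failed step appends an audit entry rather than a success entry, so the trail shows exactly where processing stopped.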

Orchestration

Transaction Integrity

COM-based interaction with the orchestrated business process allows for full use of the Distributed Transaction Coordinator in the process call. The entire orchestration may be treated as a single transaction to be committed or rolled back as a single unit, or individual transactions may be implemented within the orchestration.

Middleware

There may well be requirements to employ some middleware technology to enable the communication between the business process and each of the back-end data sources. This will almost certainly be the case in cross-platform environments or when communicating with legacy data sources such as CICS mainframe systems.

In addition to the MSMQ port supported as standard by BizTalk Orchestration (and MQSeries supported through the BizTalk Adapter for MQSeries), any COM-enabled middleware technology, or any custom or third-party BizTalk adapter could be used to enable the business process to interact with almost any back-end data source.

Since the solution may or may not employ BizTalk Server Messaging, the Messaging service and the DTA (Tracking), SQ (Shared Queue), and BTM (Messaging Management) databases are not necessarily required. Also, in the absence of BizTalk Messaging, the overhead created by per-orchestration-instance MSMQ queues is also removed.

A business process is created as an XLANG schedule file in BizTalk Orchestration Designer and is compiled for deployment locally to the business process engine. Each schedule implements a port bound to a generic COM component that maintains schedule state and creates a COM interface through which parameters can be passed into the process and a response returned.

The following illustration shows an XLANG schedule.

Process Wrapper

A generic COM+ application abstracts the interaction between COM and the schedule. This gives the client a consistent and simplified view of the process. This COM+ application can also take advantage of COM+ services such as object pooling to allow for schedules to be pre-instantiated, reducing the performance overhead associated with schedule startup.
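The pooling benefit can be sketched as follows. This is a conceptual model only (the class names are invented); COM+ object pooling manages the pool for you, but the effect is the same: expensive instances are created once and reused.

```python
from collections import deque

class ScheduleInstance:
    """Stand-in for a pre-instantiated XLANG schedule (hypothetical)."""
    started = 0

    def __init__(self):
        # Counts how many times the expensive startup actually ran.
        ScheduleInstance.started += 1

    def run(self, request):
        return f"processed {request}"

class ProcessWrapper:
    """Pools schedule instances so callers avoid startup cost,
    mirroring COM+ object pooling."""

    def __init__(self, pool_size):
        self.pool = deque(ScheduleInstance() for _ in range(pool_size))

    def invoke(self, request):
        instance = self.pool.popleft()      # borrow from the pool
        try:
            return instance.run(request)
        finally:
            self.pool.append(instance)      # return for reuse

wrapper = ProcessWrapper(pool_size=2)
results = [wrapper.invoke(i) for i in range(5)]
# Five requests were served, but only two instances were ever created.
```

The pool absorbs the per-request startup overhead, which is exactly the performance benefit the ProcessWrapper gains from COM+ object pooling.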

Where the business process engine is made available through the Internet, Microsoft Internet Information Services and Active Server Pages will provide client access to the ProcessWrapper component.

Implementation Considerations

Performance and Scalability

The solution can be scaled out for performance by using Microsoft Application Center 2000 to component-load-balance the ProcessWrapper COM+ components across multiple BizTalk Server Orchestration servers. In this instance, all XLANG schedule files and orchestration components are installed locally to each BizTalk Server.

Where client access to the business process engine is over the Internet, further scale-out can be achieved by implementing IP load-balancing across multiple Microsoft Internet Information Services (IIS) servers, which may then use component-load-balancing in their requests to the orchestration hub.

When it is necessary to process high volumes of messages in a synchronous manner, then the use of BizTalk Server may not be suitable because this is not one of the primary use cases for which the product was designed. This needs to be evaluated on a case-by-case basis. Synchronous message processing in BizTalk Server requires the synchronous invocation of an instance of the orchestration engine. When this occurs in high volume situations it could lead to a performance bottleneck.

Resilience

Given the requisite hardware and platform, Microsoft BizTalk Server Orchestration can be made resilient by using Microsoft Cluster Services. This process is described in the document "Microsoft BizTalk Server: High-Availability Solutions Using Microsoft Windows® 2000 Cluster Service."

In the COM-based business process engine solution, all COM components may also be made to fail over by using Microsoft Cluster Services.

Security

In addition to the normal considerations given to security for a Web-enabled application, through COM+ services a fine-grained security policy can be implemented for client access to the business process engine. COM+ security services allow access permissions to be assigned to roles and applied at application, component, and method call levels. This can be especially useful when integrating the orchestration hub into an existing server environment within an organization.

Solution: Web Services

Web services based on XML and SOAP provide a key way forward for integration applications, especially when those applications are separated by the Internet or by a firewall.

There are fundamentally two types of Web services:

Application integration. This uses a Web service in much the same way as current EAI techniques to provide integration between applications or to provide access to defined data from applications or other data sources.

Service-based integration. This extends the EAI model to provide a specific service which may include aggregation, transformation, or some other value-added service to the raw data.

The key considerations for using Web services to integrate applications at the services or process level are:

Finding a service. The Web service publishes itself with a UDDI (Universal Description, Discovery, and Integration) directory. In its simplest form this is provided by a DISCO document that resides on the Web server and lists all the services available on that server. A more resilient approach is to provide a dedicated UDDI server that provides a directory to all Web services, which may be running on many servers within the organization.

Identifying the structure of the service. The WSDL (Web Services Description Language) standard describes the service that is being offered and the data format of all parameters that are passed into and out of the service.

Connecting to the service. Requests are sent to the exposed Web service in the form of a SOAP envelope over HTTP. The SOAP envelope consists of two parts: a SOAP header and a SOAP body. Both of these are formatted in XML. The main payload (request or response) is carried in the SOAP body. The SOAP header is extensible, and can be used to carry system- or enterprise-specific data. Because Web services use HTTP, requests can be made across a firewall without impacting the security envelope of the domain. A Web server receives the incoming SOAP message, identifies the service that is requested, and instantiates the component that fulfills the service.

The following illustration shows a typical logical structure for deploying a Web service.

The Web server exposes the internal systems as Web services. The connection to the internal system can include:

Databases

Host/legacy systems

Other Web services

Integration hub

Where multiple internal systems are involved in providing the Web service, the connection would either be to another Web service or to an integration hub that consolidates the data from the back-end systems as described in the following section.

Exposing Legacy System Applications as Web Services

The following implementation provides the function of exposing legacy services as Web services. It assumes that the legacy services are not capable of exposing Web services without an intermediary software service — which we have been calling a hub. In our implementation variations BizTalk Server provides the hub, and hence it will expose an XML interface for the legacy services that are connected to it through adapters. It will map the XML interface to a format that the legacy system understands. It may or may not use Host Integration Server to connect to that legacy system.

This is an important area for EAI where we expect to see much activity as GXA drives the WSDL standards forward and hence more implementation options emerge.

Web services provide a simple way of implementing a specific business function that may be required by an application. An example is a Web service that returns a foreign exchange rate when a request is made, passing two foreign currency identifiers.
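The service logic behind such an exchange-rate Web service might be no more than the following. The rates table and function name are invented for illustration; in practice this function body would be exposed through the Web service layer described earlier.

```python
# Hypothetical rates table, for illustration only.
RATES = {("USD", "EUR"): 0.92, ("EUR", "USD"): 1.09}

def get_exchange_rate(from_currency, to_currency):
    """Return the rate for the given currency pair, or None if unknown."""
    if from_currency == to_currency:
        return 1.0
    return RATES.get((from_currency, to_currency))

print(get_exchange_rate("USD", "EUR"))   # prints 0.92
```

The caller passes the two currency identifiers in the SOAP request and receives the rate in the SOAP response, with no knowledge of where the rate data actually lives.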

To avoid modifying applications and therefore provide a non-intrusive environment, BizTalk Server can provide the interfaces to existing systems.

Web Service Security

There are two aspects of security that need to be considered:

Authentication of both the service provider by the service consumer, and the service consumer by the service provider.

Encryption of the data and/or authentication credentials while in transit.

A number of options exist to implement authentication:

Digital certificates embedded in the SOAP message. This is most useful where a PKI infrastructure already exists. This requires code in the client and server applications to handle the certificates.

Web server authentication. This can use any of the methods supported by the Web server and requires no modifications to either the server or client code other than error handling on the client.

Passing authentication credentials as part of the SOAP packet. This would typically consist of user name and password, either embedded in the SOAP header or as part of the SOAP payload. Because these credentials would typically be in plain text, the link would need to be encrypted to protect the data.

Kerberos ticket. Again, this would be embedded in the SOAP headers.

The SOAP message will usually need to be encrypted to protect the contents from being changed or revealed. There are a number of options for achieving this:

Encrypt the SOAP body and/or defined data within the SOAP header fields. This requires the SOAP message to be intercepted before being sent on the wire and before being passed to the Web service. This can be implemented within the .NET framework by using the SOAP extensions. Depending on the encryption technology being used, this can occur either before or after serialization.

Synchronous vs. Asynchronous Modes

Using the .NET implementation, Web services can be implemented in either a synchronous or asynchronous mode.
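In the .NET implementation of the era, generated proxy classes typically expose each operation both as a blocking call and as a Begin/End pair. As a language-neutral sketch of the difference, assuming a hypothetical call_service operation, the same pattern can be shown with a thread pool:

```python
# Sketch of synchronous vs. asynchronous invocation, using Python's
# concurrent.futures as a stand-in for the Begin/End pattern that .NET
# proxy classes generate. The service operation is invented.
from concurrent.futures import ThreadPoolExecutor
import time

def call_service(payload: str) -> str:
    time.sleep(0.1)              # simulate network latency
    return payload.upper()

# Synchronous: the caller blocks until the result arrives.
result_sync = call_service("order-123")

# Asynchronous: start the call, do other work, collect the result later.
with ThreadPoolExecutor() as pool:
    future = pool.submit(call_service, "order-456")
    # ... other work can run here while the call is in flight ...
    result_async = future.result()
```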

Web Service Versioning

As Web services become a core part of the application infrastructure, more and more processes will rely on them. It is important to maintain existing interfaces and use versioning of the Web services to add incremental functionality.

Handling Binary Payloads

SOAP payloads are carried as XML, which is a text-based standard. In order to carry a binary payload, such as a scanned image of a document, the binary must be encoded as text. This significantly increases the size of the payload and degrades performance, because the XML parser has to scan through the entire encoded document.

A more efficient way is to carry the SOAP message and the binary payload as parts of a MIME multipart message. This approach is described in the W3C Note "SOAP Messages with Attachments".
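The size inflation described above is easy to quantify: base64, the usual text encoding, emits 4 output bytes for every 3 input bytes. A small sketch:

```python
# Demonstrates the ~33% size inflation that base64-encoding a binary
# payload incurs when it is carried inside an XML/SOAP document.
import base64
import os

binary = os.urandom(30_000)          # stand-in for a scanned document
encoded = base64.b64encode(binary)

print(len(binary), len(encoded))     # 4 output bytes per 3 input bytes
```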

Physical Implementation

The physical implementation for a Web service infrastructure is the same as for any Web-based delivery. The incoming SOAP messages are distributed among a number of Web servers in a Web farm by a load-balancing mechanism. Because many designs already exist for these, this will not be described in detail here.

UDDI – EAI Service Registry

An enterprise-scoped UDDI registry provides the ideal mechanism for the publication and discovery of the EAI services exposed within an enterprise. In combination with the supporting Web service technologies (SOAP, WSDL, etc.) this can form the foundation of an open EAI infrastructure within an organization.

Web Services Summary

SOAP and the associated Web service protocols offer a clear opportunity to create open EAI standards within an organization, thus facilitating the use of the EAI services across platforms and applications.

Appendix A: Constraints

The following constraints impact an EAI solution:

Cost. Characteristics of the infrastructure and environment (such as specific platforms) or specific business or technical requirements (such as real-time integration) may have a significant impact on the cost of integration.

Time to delivery. The complexity of the integration requirements (for example, the complexity of the business processes that need to be integrated, or change dependencies on legacy applications) can significantly extend the time required to deliver an integration solution.

Agility. Agility is a fundamental principle of good integration, but it may be compromised by technical challenges in the environment or business challenges in the processes to be integrated.

Technical risk. Ultimately all IT change incurs some technical risk and the impact on technical risk due to the complexity of integration should never be underestimated.

The following constraints will have an impact on any number of these aspects of an integration project. There are predominantly two categories of constraints that we have focused on:

Technical constraints. Constraints arising from the implementation and operation of technology.

Process constraints. Constraints arising from the processes, skills, and governance that surround the systems being integrated.

Both categories are explained in detail below.

Technical Constraints

Overburdened Applications

Many business applications that have grown beyond their original planned capacity are already overburdened with demands for resources. Opening up new interfaces can add extra burden to the limited application platform and have a serious impact on its operation. For example, if we add a database extract/load tool to a live database application that is already reaching capacity, the load phase of the tool in particular could exhaust the remaining resources of the application platform. Designers should take great care to plan for the performance and capacity impact of the integration tools they wish to implement. Whatever the technology, some existing resources will be consumed.

Secure Integration

Legacy applications and databases often have security mechanisms that were designed specifically for trusted users authenticated within a secure and trusted environment. It is often possible to extend these authentication procedures to integration technology, the integration connector being recognized as a user. However, integration technology often extends that environment beyond the secure and trusted zone. The risk is that the business assumes that the integration architecture that secures the exchange of business documents also extends to the existing application or database. If the security of the application or database cannot be stepped up, then this will represent a security hole in the entire integration architecture. In such cases it may be necessary to design a new security model and implement new security technology.

Brittle Applications

Some legacy applications are extremely brittle. Changes to the application can cause the application to fail. This can be for many reasons, for example, because the application has reached the limit of its performance capacity or the application was poorly implemented and is extremely susceptible to failure. Experience shows that businesses have often tried to address the limitations of these applications by implementing EAI. Unfortunately, this strategy alone is rarely successful unless the problems of the legacy application are either resolved or insulated by the EAI technology. Non-invasive middleware can be used to offload processes from the application.

Interface Mechanisms

Some applications or databases do not have open interfaces that can be used easily by EAI technology or its adapters. This can severely constrain the options for integrating the application or database, particularly if it is constrained by other factors in this list. In such cases it is often necessary to develop and implement invasive middleware connectors or develop and implement an API.

Process Constraints

Lack of Skills

The complexity of the implementation of EAI has been greatly reduced in recent years. However, this EAI technology often still requires some specialized skills that businesses do not have easy access to. Many of these skills may be commonly available if the technology is not proprietary and makes the greatest possible use of open standards. One of the more common standards is XML, and the market for XML skills is growing rapidly.

Management of EAI architectures is not a trivial task and much EAI technology has complex management interfaces or poorly implemented interfaces that require specialized management skills.

Poorly Documented Data Model

To integrate directly to a database or to an application with an underlying data store and at the same time maintain the integrity of the data, it is extremely important that the data model is well understood. This is often not the case and can represent a significant risk when implementing an interface. Today there are many tools that can help to build a schema of the data model, often part of the database itself. However, these tools are unable to express the business semantics of the organization of the data or explain the business relationships. Furthermore, many businesses implement business logic in the data store by using database triggers or stored procedures. The business rules of this logic also need to be understood.

Ever-Shrinking Batch Window

Many applications rely on a batch window to execute processes that cannot be executed when the application is online. These processes can be essential housekeeping or administrative tasks, or even data uploads or downloads. Integration to some legacy applications is possible only during these batch windows when the application is offline. The implementation of new integration technology can have a significant impact on the completion of tasks within the batch window.

The typical scenario is a business-critical application that was originally designed to operate during working hours of 08:00 to 18:00 but is now running live between 05:00 and 23:00. This means that the offline window has shrunk from 14 hours to 6 hours. Also, because of the increased workload during the longer online day, the housekeeping activities have grown from an average of 3 hours to an average of 5 hours. The design and implementation of integration technology must have a limited impact on this essential batch window. A typical solution is to prioritize parallel tasks and allocate resources accordingly.

Poorly Defined Contract

When two or more database applications are integrated and exchange data, the exchange should be governed by an integration contract. This contract defines the interfaces and the service levels that are expected from the integration. When contracts are poorly defined, the service levels of the integration will be unpredictable and more difficult to manage. If the interfaces are poorly defined, then the integrity of the data that is exchanged is not guaranteed.

Unregulated Integration

When integrating any application or database it is extremely important that the interfaces and integration contracts between the business systems are well-documented and regulated. Unfortunately, because much legacy integration has evolved over time, the principles of quality management may not have been exercised, so the existing interfaces are poorly regulated.

Closed Operations Management

Because integration architectures extend the reach of applications and databases, connecting different operating domains that were previously unconnected, it is necessary to consider a management environment that spans all the applications and databases that are connected. This is often not possible because the legacy or proprietary system does not have an open management interface.

Poorly Defined Security Strategy

Integration extends the security domain of an application or database. A business that is integrating applications and databases must develop an integrated security strategy (if not an enterprise security strategy) that at least includes the applications and databases that are being integrated. Many businesses do not have such a security strategy, and integrating applications and databases that are not governed by a security strategy may compromise security.

Poorly Defined Governance Strategy

Integration extends the governance domain of an application or database. A business that is integrating applications and databases must develop an integrated governance strategy of the applications and databases that are being integrated. Many businesses do not have such a governance strategy, and integrating applications and databases that are not properly governed can introduce unpredicted operational events, making systems management as a whole extremely unpredictable.

Appendix B: Logical Services

EAI Services

Integration

Parse

After the data has been decrypted, decoded, and queued, the EAI tool is ready to start processing the business information. First the data has to be parsed so that it can be processed by the other EAI services. The Parse service takes the serialized data, an input stream from the network that has been stored by the Queue service, and constructs structured data out of the stream. To do this the Parse service first needs to recognize the data; this is done by inspecting the stream and identifying the data type. After the data has been identified, the schema that corresponds to the data can be found and the data structure can be constructed out of the queued data. If a corresponding schema cannot be found, the service has to raise an exception that will be managed by the Management services. A likely outcome will be to send a notification to the sender, or often to return the data to the sender.
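A minimal sketch of this behaviour, with an invented schema registry and a deliberately simple "type|field;field" wire format standing in for a real message stream:

```python
# Sketch of the Parse service: identify the incoming stream, look up a
# schema for it, build structured data, and raise an exception when no
# schema is registered. Schema names and the wire format are invented.
SCHEMAS = {
    "ORDER": ["order_id", "customer", "amount"],
}

def parse(stream: str) -> dict:
    doc_type, _, body = stream.partition("|")   # identify the data type
    schema = SCHEMAS.get(doc_type)
    if schema is None:
        # Would be routed to the Management services in a real EAI tool.
        raise ValueError(f"no schema registered for {doc_type!r}")
    return dict(zip(schema, body.split(";")))

record = parse("ORDER|A42;Contoso;199.95")
```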

Map

After the data has been validated, the Map service tries to map the input data to the output data. This service uses the map created with the Map service in the Configure services. The Map service knows which map to apply because the EAI tool has inspected the input data; from that inspection it knows where the data came from, and the identity and schema of the input data. Depending on the intelligence of the EAI tool, it may also be able to determine from the input data the map identity or the identity of the output data. After the Map service has identified the output data and found the schemas, control is passed to the next Data Manipulation service.

Filter

The Filter service provides a mechanism for users to filter out information from certain data. There may be occasions when it is difficult to filter the information at source, or when the filter applies only to certain targets. This service is convenient for filtering out sensitive information that a business user does not want to send to certain partners. The service therefore uses the map to identify the receiver of the data and filters on that basis.

Validate

One of the values of EAI tools that many businesses exploit is the ability to add validation to the processing of data. Legacy systems often produce data that may not have the quality that the business now requires, or perhaps changes in business context mean that data that was once correct is now incorrect. The Validate service can be used to validate all elements of the data. The complexity of the validation depends on the EAI tool itself but businesses can expect:

Format validation. A date field requires dates in the UK date format (24/03/2001).

Range validation. A field requires a number in the range from 10 to 10000.

Dependency validation. A field must contain a certain value when another field contains a certain value.

Mandatory validation. A field must contain a value.
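The four kinds of validation above can be sketched as plain rules applied to one record. The field names, formats, and ranges here are illustrative, not taken from any real schema:

```python
# Sketch of format, range, dependency, and mandatory validation.
import re

def validate(record: dict) -> list[str]:
    errors = []
    # Mandatory: the field must contain a value.
    if not record.get("order_id"):
        errors.append("order_id is mandatory")
    # Format: UK-style date, e.g. 24/03/2001.
    if not re.fullmatch(r"\d{2}/\d{2}/\d{4}", record.get("date", "")):
        errors.append("date must be DD/MM/YYYY")
    # Range: quantity between 10 and 10000.
    if not 10 <= record.get("quantity", 0) <= 10000:
        errors.append("quantity out of range")
    # Dependency: a discount code requires a discount rate.
    if record.get("discount_code") and "discount_rate" not in record:
        errors.append("discount_code requires discount_rate")
    return errors
```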

Transform

The Transform service uses the rules specified in the map of each data element to transform the contents of each element of the input data to the element of the output data. Today's EAI tools have extremely rich Transform services; the following are some examples:

String manipulation. The data string in a field is manipulated according to some transformation rules, e.g., capitalization of alphabetic characters, or more complex parsing of strings and replacement/modification with substitute characters.

Mathematical conversion. A numeric value is converted by using a mathematical or arithmetic equation.

Date/time conversion. Dates and times are converted to different formats.

Scientific conversion. A numeric value is converted to a scientific value, e.g., a value is converted to the cosine value.

Cumulative manipulation. A field is used as a value in a cumulative calculation such as a sum or an average, or the field itself is the value of the cumulative calculation.

Logical manipulation. Depending on the value in a field a rule is executed and the product of the execution is used; e.g., ranges of values, exact match values can be used to return TRUE or FALSE values.
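A few of these transformation kinds, sketched as plain functions. The VAT rate, formats, and thresholds are invented for the example:

```python
# Sketches of string, mathematical, date/time, and logical transformations.
from datetime import datetime

def to_upper(value: str) -> str:
    """String manipulation: capitalization of alphabetic characters."""
    return value.upper()

def add_vat(net: float, rate: float = 0.175) -> float:
    """Mathematical conversion: apply an arithmetic equation (rate invented)."""
    return round(net * (1 + rate), 2)

def uk_to_iso(value: str) -> str:
    """Date/time conversion: UK format (24/03/2001) to ISO 8601."""
    return datetime.strptime(value, "%d/%m/%Y").strftime("%Y-%m-%d")

def is_bulk_order(quantity: int) -> bool:
    """Logical manipulation: a range rule returning TRUE or FALSE."""
    return quantity >= 1000
```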

Format

The Format service is the service that actually moves the content of input data elements to the corresponding elements of the output data as specified in the map.

Compose/Decompose

Some data needs to be merged together before it can be sent on. The Compose service uses the Map service to identify the data to be merged and composes new data out of elements of other input data. The map tells the Compose service where the data elements come from.

The Decompose service is the opposite of the Compose service. A particular piece of data may need to be broken up into several output data items. The Map service will specify the output data and the Decompose service decomposes the input data into the output data.

Decomposition will make use of a multicast service. This low-level service will have two important modes:

Send and Forget multicast. All messages are only sent and no receipt acknowledgement is required.

Reliable multicast. Receipt acknowledgements are required for each multicast message.

Enrich

When input data is being formatted into output data it is possible that the input data does not hold all the information that the output data requires. The Enrich service allows the business user to specify from where the EAI tool should acquire this information to enrich the output data. It is likely that this information is acquired from a data store. The EAI tool may use the Discovery service to find this information source.

Route

The Delivery service actually executes the transmission or receipt of data, but the Routing service allows the EAI service to represent a route to facilitate integration. The Routing service may use an address such as a business name. This would be something like the name of the person and the name of the company. This could then be mapped to an appropriate network address by the Delivery and Address Translate services.

Routing can be:

Explicit. The Routing service uses an explicit address supplied with the input data or in the map.

Content based. The Routing service analyzes the content of the input data or the output data to determine where the output needs to be routed. This latter form of routing is supported to different degrees of complexity by different EAI tools, but is usually based on rules for analyzing the content.
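Both modes can be sketched together: explicit routing honours an address supplied with the data, while content-based routing applies rules to the message content. The rules and queue names below are invented for the example:

```python
# Sketch of explicit and content-based routing. Each rule pairs a
# predicate over the message content with a destination.
ROUTING_RULES = [
    (lambda msg: msg.get("amount", 0) > 10000, "approvals-queue"),
    (lambda msg: msg.get("country") == "UK", "uk-orders-queue"),
]
DEFAULT_ROUTE = "default-queue"

def route(msg: dict) -> str:
    if "destination" in msg:                        # explicit routing
        return msg["destination"]
    for predicate, destination in ROUTING_RULES:    # content-based routing
        if predicate(msg):
            return destination
    return DEFAULT_ROUTE
```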

Publish

Some EAI solutions collect information from applications and publish it. The data may be pulled or extracted from applications or pushed to the EAI service by the applications. After the data arrives on the EAI node the EAI services can process it. For example, the data can be transformed as necessary into a format for publication. Part of the publication process will involve identifying a subject or a category under which to publish the data. The functionality of this service may resemble the Routing service in that the content of the data is inspected to determine the subject or category. After this has been determined the data will be published. Publication involves storing the data in a form and in a data store that can be accessed by subscribers. A subscriber is a user or an application that has permission to access the publication. They will normally subscribe to a subject or category.
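A minimal sketch of this publish-and-subscribe behaviour, with invented names and an in-memory dictionary standing in for the publication data store:

```python
# Sketch of the Publish service: published data is stored under a subject,
# and subscribers read only the subjects they have subscribed to.
from collections import defaultdict

class PublicationStore:
    def __init__(self):
        self._store = defaultdict(list)          # subject -> published items
        self._subscriptions = defaultdict(set)   # subscriber -> subjects

    def publish(self, subject: str, data: dict) -> None:
        self._store[subject].append(data)

    def subscribe(self, subscriber: str, subject: str) -> None:
        self._subscriptions[subscriber].add(subject)

    def read(self, subscriber: str) -> list:
        return [item
                for subject in self._subscriptions[subscriber]
                for item in self._store[subject]]
```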

Orchestration

Schedule

The Schedule service continually examines queues managed by the Queue service and checks for data whose processing needs to be scheduled for certain times. If the data is at the head of a queue but its time has not come, the Schedule service will hold back processing of this data and allow the Queue service to process data behind it.

Transaction Integrity

Transactions can be extremely complex to manage, and distributed transactions are even more so. EAI tools are rarely able to manage transactions in an XA-compliant fashion. Nevertheless, the EAI tool still needs to manage resources so that units of work are processed in an ACID fashion. For some instances of EAI, transaction integrity needs to be maintained so that events in the process are isolated and units of work are not lost.

Process Flow

Execution and management of a defined sequence of events.

Non-Delivery

When data cannot be routed to a target because the Routing service has been unable to locate the target, the Non-delivery service needs to manage the data. Typically the sender will have established some rules or a policy for undelivered data. The Non-delivery service will execute the rule for the particular data or the policy. For example, this might require the sender being notified by e-mail.

Integration Events

There are several levels of events in any system composed of EAI and Middleware services. At this level we are focusing on EAI events or integration events. For example, a schedule for a long-running transaction is due or a message on a failed delivery queue needs to be processed. This service effectively monitors integration events and invokes the correct process for handling each integration event.

Metadata

Data Models

Definition of the structure and syntax of data that is received from source applications and sent to target applications or published. The data models will also define the owners of the data.

Names

Some EAI services do not have a full Repository service but they do need to reconcile names that are used by the services to system names or network addresses. This service will support an interface for defining names and the identities of the entities to which they refer.

Discovery

The Discovery service is used to access Web services that the Data Manipulation services may require, e.g., supplementary information required by the Enrich service.

Configurations

Configurations of any service in the EAI system may need to be persisted in such a way that they can be accessed programmatically or replicated for other systems or cloned systems.

Repository

The Repository service provides a direct lookup service for specific resources that may be required by the EAI tool, e.g., routes for output data, or sources of supplementary information for the Enrich service. It contains definitions of the characteristics of all entities, resources, and services within the EAI system.

User Profiles

Definition of user characteristics.

Interface Profiles

When implementing an EAI service, an interface catalog usually needs to be maintained to define the characteristics of the interfaces that the EAI service needs to support. Some implementations support the definition of these interfaces through a tool. The Interface Profiles service stores this configuration information.

Subscriptions

Users and interfaces may have subscriptions to certain publications. The Subscriptions service holds the details of those subscriptions.

Message Database

Queuing is used for persisting messages during processing, messages that are awaiting processing, or messages that cannot be processed. Some messages may need to be stored in another data store that can be accessed by other services. A typical example is a "publish and subscribe" service, where published messages are often stored in a data store for access by subscribers.

Message Database Index

If messages are stored in the message database it is highly probable that messages will need to be indexed. It is important that a rich indexing service is available on the data store for the messages.

Message Database Search and Query

To complement the Index service the database will also need a service that allows searching for messages and querying the data.

Transport Services

Interfacing

Dispatch

A low-level service that manages the dispatch of procedure or method calls.

Delivery

Ultimately the destination of any data will be represented by a network address. The Delivery service uses the protocol of the network to send and receive data. The network address will be determined by the network protocol that is used. However, the EAI services may not directly use the protocol and the addressing of the network. To simplify the interface between the business user and the underlying communication infrastructure, the EAI service will often provide an interface to allow the business user to assign business names to destinations. Therefore the Delivery service will require an Address Translate service that will resolve these business names into network addresses and vice versa.

There are several modes of messaging; for example:

Single cast. Sending units of data from one node to another.

Multicast. Sending units of data from one node to many nodes.

Request and reply

Conversational

Send and forget

Reliable

Guaranteed delivery

Message Queue

Queuing is the management and ordering of the persistence of messages.

When data is delivered to the communication infrastructure by the Network Communications services, the Queue service persists the data to a queue. A queue is essentially a mechanism to ensure that data can be reliably stored until the EAI tool is ready to process the data further.

This service is a fundamental element of the asynchronous characteristic of the EAI tool. The Queue service allows the EAI tool to manage resources efficiently and makes the EAI tool more resilient and scalable than synchronous integration mechanisms.

Queues can be made resilient by storing the queue to disk, though many queues will be memory resident for performance reasons. Some queue mechanisms use both disk storage and memory storage systems to ensure resilience and optimize performance.

Queues feed the EAI services with data on demand. Depending on the architecture of the EAI tool, many queues can serve many EAI services to balance load. Queues can also be distributed if necessary. In a sophisticated EAI product implementation that is capable of handling multiple queues, the EAI tool requires a management service for managing the queues. This Queue service allows the system administrator to administer queues. The best service is one that allows the administrator to administer queues dynamically so that changes take immediate effect while minimizing negative impacts on the processing of data. Clearly, queue administration requires some understanding of the EAI tool architecture and its operating qualities, but the interface to the Queue service should make the task of administration much easier than administration through a command-line interface.

The service may have the following functions, but it heavily depends on the architecture of the EAI tool itself:

Create. A new queue is created, ordinarily to overcome a bottleneck caused by the ratio of data awaiting processing to the number of queues available.

Pause. Processing of data from a queue is paused for a specified time after which the queue will restart processing automatically.

Start. Processing of data from a queue is started from the head of the queue.

Stop. Processing of data from a queue is stopped until explicitly restarted.

Flush. Process all the data in a queue and stop the queue from processing more data.

Copy. Data that is in a queue is copied to another queue, for example, when a particular queue is overloaded.

Destroy. A queue is removed because it is no longer needed, but processing of all data in the queue must be completed first.
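A few of these functions can be sketched with a simple in-memory queue. The class is illustrative and is not the interface of any real EAI product:

```python
# Sketch of a managed queue supporting the Start, Stop, and Flush
# administrative functions listed above.
from collections import deque

class ManagedQueue:
    def __init__(self):
        self._items = deque()
        self.running = True

    def enqueue(self, item) -> None:
        self._items.append(item)

    def process_next(self, handler):
        """Process the item at the head of the queue, if running."""
        if not self.running or not self._items:
            return None
        return handler(self._items.popleft())

    def stop(self) -> None:
        """Stop processing until explicitly restarted."""
        self.running = False

    def start(self) -> None:
        """Resume processing from the head of the queue."""
        self.running = True

    def flush(self, handler) -> None:
        """Process all remaining data, then stop the queue."""
        while self._items:
            handler(self._items.popleft())
        self.running = False
```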

Serialize/Deserialize

The Serialize service takes the output data structure and serializes the data into a stream or file that can be transmitted across a network. The Deserialize service is the reverse.

Address Translate

The service that translates between the (logical) business addresses assigned to the destination of an integration interface and the network address required by the network protocol used by the Delivery service. These addresses may be stored in the Repository, and the Address Translate service may need to use the Repository service to translate them.

Decode/Encode

Different operating systems use different code pages for representing characters that people can read. When the EAI tool receives data from a different operating system, the Decode service is invoked to convert the data to the same code page as the platform that the EAI tool is running on.

If the EAI tool knows the code page of the target system, the EAI tool may encode the character set of the output data into the code page of that system.
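For example, using Python's standard codecs as a stand-in for the Decode/Encode services, data arriving from an EBCDIC host (code page cp500 here) can be decoded on the EAI platform and re-encoded for a UTF-8 target:

```python
# Sketch of the Decode/Encode services: convert between the source
# system's code page and the EAI platform's local representation.
ebcdic_bytes = "HELLO".encode("cp500")      # as sent by an EBCDIC source
local_text = ebcdic_bytes.decode("cp500")   # Decode: to the local platform

utf8_bytes = local_text.encode("utf-8")     # Encode: for a UTF-8 target
```

Note that the EBCDIC byte values differ from their ASCII counterparts, which is exactly why the Decode step is needed.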

Security Services

Security

Authenticate

Validation of the identity of the user or interface that wants to access the service. Users may be authenticated by supplying a user account and password; they would typically be authenticated to access the configuration or administration services of the EAI service. The interface may be an API that passes a token, certificate, or login data from an application, or a message that contains such credentials; in that case the interface would probably be the means of communicating with the application that needs to be integrated. Authentication is required whenever the EAI service has to ensure that users or applications that want to access any services are who or what they say they are.

Authorize

After a user or interface has been authenticated (in other words, the EAI system is satisfied that the user or interface is who they claim to be), the system will want to manage what the user or interface is allowed to do. The permissions that the system grants may depend on the role of the user or interface. For example, a system administrator will have permission to perform system administration tasks, which may be denied to an application that is integrated with the EAI service. The permissions that the user possesses are also known as privileges.

Encrypt/Decrypt

The EAI service may need to encrypt the output data for security reasons before it is transmitted across a network. Data is usually encrypted to prevent unauthorized access to the information in the data. A cipher is used to encrypt data. There are many types of ciphers, all of varying strengths; the stronger ones will use a key to encrypt the data. The strength of the cipher will determine the ease with which the data can be "deciphered" or understood by unauthorized users or applications.

When the EAI service receives encrypted data, the Decrypt service will need to be invoked to decrypt the data so that the EAI tool can process it. The Decrypt service will use an algorithm to decipher the data, effectively reversing the cipher that was applied when the data was encrypted. Increasingly, encryption services use keys to encrypt data, so the Decrypt service will also need to know the key that was used.

The use of keys (a digital value combined with an algorithm to encrypt data) is becoming common. Public key infrastructures (PKIs) are being adopted more and more by businesses wanting to strengthen their digital security. These infrastructures make use of encryption, certificates, and digital signatures.

Manage Certificates

Digital certificates are used to establish the credentials of a user, interface, or application. They are usually issued by a certificate authority that is trusted both by that user, interface, or application and by the EAI system. The certificate will conform to a standard that specifies the information required as credentials. X.509 is a common standard and requires information such as the name of the entity whose credentials need to be verified, a digital signature of the certificate authority, and the period of validity of the certificate.

Certificates therefore need to be managed: they need to be issued and stored, and the credentials verified. In EAI, certificates may be used to verify the credentials of an application that wants to integrate to the EAI system. The EAI system may issue the certificate, but it is more likely that a mutually trusted authority will issue it.

Sign

Digital signatures may be used to authenticate the sender of a message to the EAI service or the requester of a service from the EAI service. Unlike a certificate, which is used to verify the credentials of a user, interface, or application that is connecting or initially authenticating itself to the EAI service, a signature can be sent in every communication with the EAI service so that authentication can be performed at a more granular level. If digital certificates are used, the EAI service signs messages or data with its private key, and receivers verify the signature against the public key in the certificate issued by the certificate authority.
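The text describes certificate-based signatures; as a simpler stand-in that still illustrates per-message authentication, the sketch below signs and verifies messages with a shared-secret HMAC. The key is invented, and a real deployment would use asymmetric signatures with certificates:

```python
# Per-message authentication with a shared-secret HMAC, standing in for
# the certificate-based digital signatures described above.
import hashlib
import hmac

SECRET = b"shared-secret-key"   # invented; would be securely provisioned

def sign(message: bytes) -> str:
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()

def verify(message: bytes, signature: str) -> bool:
    return hmac.compare_digest(sign(message), signature)
```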

Audit

As businesses become more sensitive to security, they need to be able to see the access and activity that users and applications have had with their systems over time. The requirements for tracking activity within EAI systems are increasing, particularly where there is a requirement for non-repudiation: a business needs proof that an event happened, such as an application sending a message via the EAI system. Some Audit services may also be capable of taking copies of input and output data to support non-repudiation, for example, to resolve disputes between a trader who is alleged to have sent data and one who is alleged to have received it.

The Audit service in the Security service is focused specifically on the events that are detected by the Authenticate and Authorize services.

Management Services

Management

State Management

This service is responsible for maintaining the integrity of process state.

There are several mechanisms that ensure the resilience of the EAI tool, and the State service is one of them. The Queue service ensures that messages are not lost while waiting to be processed. Transaction Integrity ensures that the EAI system manages units of work as atomic, isolated, and consistent. The State service ensures that the status of an EAI process is known. As the EAI services process a message or data element, the state of that message or data changes, and the status of the EAI system itself changes continually as messages and data pass through.

This State Management service can be used to recover processes or messages and data that are being processed. The "in-flight" process is recovered to a suitable checkpoint where processing can be replayed without losing any integrity. For example, as a data element is being transformed by the Transform service the State service will take suitable checkpoints of the transformation to ensure that if the process fails then the transformation can be rerun properly.

State Management can improve the efficiency of the EAI system. The status of any process may be stored in memory or to disk for further resilience. There is always a natural trade-off with performance when storing to disk, however.
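The checkpoint-and-replay behavior described above can be sketched as follows; the `StateService` class and `transform` function are hypothetical illustrations, with serialization standing in for persistence to memory or disk.

```python
import pickle

class StateService:
    """Checkpoint process state so a failed run can resume from the last
    known-good point instead of starting over."""
    def __init__(self):
        self._checkpoint = None
    def save(self, state):
        self._checkpoint = pickle.dumps(state)   # stand-in for memory or disk
    def restore(self, default=None):
        return pickle.loads(self._checkpoint) if self._checkpoint else default

def transform(messages, svc, fail_on=None):
    state = svc.restore(default={"done": []})
    for msg in messages:
        if msg in state["done"]:
            continue                  # already processed before the failure
        if msg == fail_on:
            raise RuntimeError("simulated mid-process failure")
        state["done"].append(msg)
        svc.save(state)               # checkpoint after each message

svc = StateService()
try:
    transform(["a", "b", "c"], svc, fail_on="c")
except RuntimeError:
    pass                              # "a" and "b" survive in the checkpoint
transform(["a", "b", "c"], svc)       # replay: only "c" is reprocessed
print(svc.restore()["done"])          # ['a', 'b', 'c']
```

Checkpointing after every message maximizes resilience at the cost of throughput, which is the memory-versus-disk trade-off noted above.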

Resource Management

A resource manager is a technical component that enlists resources for the transaction manager. Depending on the integration architecture the resource manager may be a fairly simple component that is integrated quite closely with the transaction manager. This is typical of transaction processing managers for many applications. The following diagram shows how resource managers work with the transaction processing manager in this scenario.

All the major components above may be on the same platform or on different platforms connected by a network. The transaction processing manager ensures that all the operations that are to be executed on the applications are bound within a transaction. The resource managers are therefore components that enlist the operations in the transaction but also ensure that they participate in the rules of the transaction and can be recovered if necessary. A resource manager is therefore typically a component that is adapted to the application and the transaction processing manager.
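The coordination between a transaction processing manager and its resource managers can be sketched as a two-phase commit; the class and method names below are hypothetical, and real resource managers would wrap actual application operations.

```python
class ResourceManager:
    """Enlists one application's operations in the transaction and can undo them."""
    def __init__(self, name, can_commit=True):
        self.name, self.can_commit, self.committed = name, can_commit, False
    def prepare(self):
        return self.can_commit        # vote in phase one
    def commit(self):
        self.committed = True
    def rollback(self):
        self.committed = False

class TransactionManager:
    """Coordinates a two-phase commit across the enlisted resource managers."""
    def run(self, resources):
        if all(rm.prepare() for rm in resources):   # phase one: all must vote yes
            for rm in resources:
                rm.commit()                         # phase two: commit everywhere
            return True
        for rm in resources:
            rm.rollback()                           # any "no" vote aborts all work
        return False

rms = [ResourceManager("orders"), ResourceManager("billing", can_commit=False)]
ok = TransactionManager().run(rms)
print(ok, [rm.committed for rm in rms])   # False [False, False]: atomic abort
```

A single "no" vote in the prepare phase causes every enlisted resource to roll back, which is how the transaction manager keeps the operations bound within one transaction.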

Resource Management differs from State Management in that the State Management service is concerned with maintaining the status of the internal processes of the EAI system, whereas the resource manager is focused on managing the status of operations on the applications that are integrated by the EAI system.

Event Monitor

There are many levels of events in any system, from low-level hardware events through operating system events to application events. The Management services will have components that monitor various system events. This monitor serves as a "catch-all" for any system event excluding integration events.

Error Raising

The Event Monitor monitors all system events. The Error Raising service, which is usually a component integrated with the Event Monitor, detects events that are abnormal and raises them as Error Events. After an event is raised as an Error Event it can be processed by the Error Handling service.

Error Handling

When the Error Raising service has raised an event as an error the Error Handling service will process the event. The rules may require invocation of recovery services and so on. Error Handling should be configurable and preferably should support programmatic access.
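A configurable, programmatically accessible error handler might look like the following sketch; the rule table, categories, and recovery actions are hypothetical.

```python
# Sketch of a configurable Error Handling service: a rule table maps an
# error category to a recovery action, and can be changed programmatically.
handled = []

def retry(event):
    handled.append(("retry", event["id"]))     # e.g. re-queue the message

def alert(event):
    handled.append(("alert", event["id"]))     # e.g. invoke the Notify service

RULES = {"transient": retry, "fatal": alert}   # configurable rules

def handle_error(event):
    """Process an event the Error Raising service has raised as an error."""
    RULES.get(event["category"], alert)(event)

handle_error({"id": 1, "category": "transient"})
handle_error({"id": 2, "category": "fatal"})
print(handled)   # [('retry', 1), ('alert', 2)]
```

Because the rules live in data rather than code, operations staff can change recovery behavior without redeploying the handler.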

Notify

The Notify service is used to electronically inform a business user of a particular event in the EAI tool service through a standard communication channel such as e-mail, telephone, fax, or pager. This could be the completion of processing of particular data, normal or abnormal. The Notify service can also be used to notify system users and administrators of system events, for example, when the Error Raising service raises an event as an error the Notify service can send an alert.

Configuration Management/Versions

Any configuration change to an EAI component or modification of EAI metadata may need to be tracked and old versions maintained. Messages can have versions. Data schemas can be versioned. This service should satisfy the requirements for version management in line with the Change Management service of the organization responsible for the EAI system. The umbrella for all these services is Configuration Management.

Track

The Track service essentially provides a means of analyzing the log created by the Audit service in order to provide a trace of a complex series of related data. This kind of facility is extremely important for businesses that depend on the EAI tool service for supporting complex business collaborations where data go back and forth between businesses.

Monitor

The Monitor service monitors all processes in the EAI tool and tracks the service levels of the processes dynamically, either through a Monitor interface or by recording them to a log. Through this service, traps and ranges can be set on specific processes; an alert is sent when the rules in a trap are triggered or a service level goes outside the range that was set. This alert will be detected as an event by the Event service.

Archive

An Archive service is required to periodically archive data from the Metadata services.

Report

The EAI tool should be capable of providing a number of reports to both business users and system administrators. The best Report facility is one that allows both business users and system administrators to design and implement their own custom reports. The reports will probably run principally against the log produced by the Audit service, though some custom system reports may run against the Monitor service.

Audit

The Audit service records EAI system and service events, both normal and abnormal. This is different from the Audit service in the Security services, which audits security events: system events are of interest to operations staff, whereas security events are of more interest to security staff. Most EAI systems have some facility for logging events but may not meet the requirements of an Audit service. A full Audit service provides all the logging necessary to support a rich history of EAI tool processing events. In practice the Audit service may rarely be used to its full capacity, partly because it can affect the performance of the EAI tool. Nevertheless, it should be possible to create a detailed log of events.

At the level of EAI processing the most likely characteristics for a log are:

Recover

Any process that was "in-flight" and fails will need to be recovered to a previously known status that maintains the integrity of the data. The Recovery service working with the State service is responsible for recovering state.

Appendix C: Integration of Heterogeneous Data Sources

The lowest level of EAI integration is the integration of the data used within the organization. In some scenarios this type of integration will provide the entire solution; however, it does not provide for more complex EAI requirements and so is likely to be used in conjunction with application and/or process integration.

All methods of data integration are applied directly to the data store, bypassing typical application-based business logic. Sometimes business rules are imposed at the database level by the use of database triggers or stored procedures. Data integration differs from application-level integration, where the integration components communicate with business applications at a process or API level and leave the business applications to control changes to their own data stores.

The main objective of data integration is to replicate changes to data sources that are caused by the execution of a business transaction.

Examining the data integration requirements more closely, we can see that a number of distinct elements are involved in the process of integration at the data level. These elements are:

Data connectivity. Provides the basic connectivity to the data source, and enables reading and updating the data source. Data connectivity can be provided by proprietary interfaces to the data or through the implementation of industry standards such as Open Database Connectivity (ODBC), designed specifically for interoperating with SQL-accessible relational database management systems. For non-relational data stores, the OLE DB technology provides access to various semi-structured and unstructured data stores. The connectivity layer may include additional services such as code-page translation or connection pooling. Data connectivity is a fundamental element of data integration and facilitates all the other types of integration.

Transaction management. Building upon the basic connectivity to the data source, additional transaction management services can be provided to ensure that updates are applied to multiple data sources in an atomic manner. For example, an online bookstore would not want to schedule the shipment of books without doing the proper billing, and it would be equally wrong to bill a customer without scheduling delivery. Without support from the system, it is difficult to coordinate the work so that either all of it happens or none of it happens.

Transaction management facilities are provided either by the data source itself (for example, a relational database) or through an external transaction management system.
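The bookstore example above can be sketched with Python's built-in sqlite3 module standing in for any transactional data source; the tables and the `place_order` function are hypothetical.

```python
import sqlite3

# Two updates that must succeed or fail together: bill the customer and
# schedule the shipment.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE billing  (customer TEXT, amount REAL);
    CREATE TABLE shipping (customer TEXT, item TEXT);
""")

def place_order(customer, item, amount, fail=False):
    try:
        with db:  # one transaction: commits on success, rolls back on error
            db.execute("INSERT INTO billing VALUES (?, ?)", (customer, amount))
            if fail:
                raise RuntimeError("shipment scheduling failed")
            db.execute("INSERT INTO shipping VALUES (?, ?)", (customer, item))
    except RuntimeError:
        pass

place_order("ann", "book", 20.0, fail=True)
# The billing row was rolled back along with the failed shipment:
print(db.execute("SELECT COUNT(*) FROM billing").fetchone()[0])   # 0
```

Without this atomicity, the customer would have been billed even though no delivery was scheduled.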

Data replication. Data replication is a common means of applying changes in one data source to other data sources. This provides an alternative approach to the use of synchronous transaction management to ensure that changes are applied across the data sources. Most proprietary database management systems support replication services. Many support replication services between different proprietary systems.

Extract, transform, and load (ETL). ETL solutions were developed to add value to basic data replication services. Most database vendors provided data replication, but they did not account for those businesses that needed to change the data that was being replicated — for example, because of differences between database schemas, data organization, or data syntax. Vendors therefore introduced a bulk transformation service that allowed businesses to map data from one source to the data definition and rules of another.
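A minimal ETL pass can be sketched with sqlite3 as both source and target; the schemas and the cents-to-dollars conversion are hypothetical examples of schema and syntax differences between the two stores.

```python
import sqlite3

# Extract rows from a source schema, transform them to the target's
# definitions (renamed columns, different currency unit), then load.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (order_no TEXT, total_cents INTEGER)")
src.executemany("INSERT INTO orders VALUES (?, ?)",
                [("A1", 1250), ("A2", 99)])

dst = sqlite3.connect(":memory:")
dst.execute("CREATE TABLE sales (reference TEXT, total_dollars REAL)")

rows = src.execute("SELECT order_no, total_cents FROM orders")   # extract
transformed = [(no, cents / 100.0) for no, cents in rows]        # transform
dst.executemany("INSERT INTO sales VALUES (?, ?)", transformed)  # load

print(dst.execute("SELECT * FROM sales ORDER BY reference").fetchall())
# [('A1', 12.5), ('A2', 0.99)]
```

The transform step is what distinguishes ETL from plain replication: the data is mapped to the target's definition and rules in flight.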

There is a great deal of rich technology supporting the integration of heterogeneous data sources from the Microsoft Windows® platform and other software vendors. The extent of this support makes the Windows platform a logical design choice as the center for data integration.

As previously defined, the main objective of data integration is to replicate the changes to data sources that were caused by the execution of a business transaction. As we have seen in preceding discussions, this can be achieved through data connectivity in conjunction with transaction management, data replication, or ETL and the various products that support them. However, as previously highlighted, integration at the data layer is not always appropriate because it can bypass important application logic that requires integration to be performed at the application layer.

Data integration solutions are also typically synchronous in nature, and the failure of a single component of the solution has implications for the availability of the solution as a whole. To address this, additional logical services such as queuing must be introduced. Synchronous versus asynchronous integration was discussed in more detail in the "Conceptual Solution" section.

While the simple integration of two or more data sources in the development of a business solution may meet the immediate business need, the approach may not provide the level of business flexibility and cost savings that can be delivered by using additional logical services, typically provided by "integration brokers" such as BizTalk Server. This is not to say that data integration should not be used, but rather that the technologies and products described above are often used to provide integration with the data layer, and these are combined with other products and technologies to deliver the overall EAI services for an organization.

Data Integration: Supporting Logical Services

The following illustration shows the typical logical services that are provided by technologies and products that provide integration at the data layer. They concentrate on the data-related services around dispatch, data extraction, and transformation.

Products that provide basic data connectivity essentially provide the logical Dispatch service, providing delivery of data to and from the data source. Basic connectivity may be supplemented with transaction services provided either through the data source itself or in conjunction with a transaction monitor. However, these transactions are typically restricted to traditional synchronous two-phase commit-style transactions and do not address the issue of long-running transactions.

Products providing ETL capabilities will also provide the related logical services to support these capabilities. The products in this space typically do not implement these logical services as generic mechanisms, but rather implement them as specific to their respective data sources. For example, the ETL facility provided by a database will be aimed primarily at the extraction and transformation of relational data rather than being a purely generic service suitable for any data type.

As we consider the different levels of data integration in the following sections, we will explore in more detail the relationship between the logical services and the products and technologies that provide the physical realization for each service.

Data Connectivity

Data connectivity in the Microsoft EAI pattern is addressed through Microsoft Universal Data Access. Universal Data Access is a unified framework for data access across the Microsoft platform, the core services of which are built directly into Windows.

Universal Data Access provides high-performance access to a variety of relational and non-relational information sources, and an easy-to-use programming interface that is tool and language independent. These technologies enable you to integrate diverse data sources, create easy-to-maintain solutions, and use your choice of tools, applications, and platform services.

Universal Data Access does not require expensive and time-consuming movement of data into a single data store, nor does it require commitment to a single vendor's products. Universal Data Access is based on open industry specifications with broad industry support, and works with all major established database platforms.

Universal Data Access is provided through a base set of components known as Microsoft Data Access Components (MDAC). This base set of components provides access to data sources for the Windows platform, in addition to some external data sources including Oracle. MDAC is supplemented by additional drivers, typically provided by the data source vendor, that are specific to each heterogeneous data source. Microsoft also ships a number of drivers for accessing host data for IBM mainframe and AS/400 systems as part of Host Integration Server. In addition, SQL Server offers some advanced data connectivity features.

Universal Data Access also embraces the access of data from a managed code environment provided by the Microsoft .NET Framework with the introduction of ADO.NET and the native managed code data provider for SQL Server.

Logical Service Mapping

The logical services provided by MDAC are centered on connectivity to data sources; this connectivity enables data consumption by higher-level services.

Logical service: Integration + Parse + Map + Filter + Validate + Transform + Format + (De)Compose + Enrich

Physical realization:

MDAC

These services are provided by MDAC in the course of building parameters for queries or stored procedures and the manipulation of data records returned from the data store.

The ADO and OLE DB models within MDAC provide rich mechanisms for the parsing, formatting, filtering, validation, composition, and mapping of data to and from the data stores.

Host Integration Server

Host Integration Server data integration services provide data access services through MDAC by exposing mechanisms to support the invocation of queries and stored procedures on the connected data stores.

SQL Server

The integration services of SQL Server are surfaced through the OLE DB and ADO component models in the MDAC components, which provide the logical services.

In addition, the HTTP/XML capabilities of SQL Server 2000 provide a number of transformation, filtering, composition, mapping, and parsing services in conjunction with the processing of XML requests.

Logical service: Orchestration + Transaction Integrity

Physical realization:

MDAC

The components within MDAC do not directly provide transaction services; rather, they provide access to the transaction services of the data stores themselves and to the transaction services provided by COM+ or another transaction monitor, allowing integration into process orchestration.

The transaction services support the dispatch services, facilitating their enrollment in transactions.

These transactions are typically restricted to traditional synchronous two-phase commit-style transactions and do not address the issue of long-running transactions.

Host Integration Server

The components within Host Integration Server data integration services do not directly provide transaction services; rather, they provide access (through MDAC) to the transaction services of the data stores themselves and to the transaction services provided by the Windows platform (COM+) or to mainframe-based transaction monitors, allowing integration into process orchestration.

SQL Server

SQL Server 2000 provides transactional support for data operations carried out on the SQL Server 2000 data store. The transactions can also be extended to enroll other systems, or to enroll in transactions initiated on other systems.

Logical service: Dispatch

Physical realization:

MDAC

MDAC components provide Dispatch services through the invocation of queries and stored procedures on the connected data stores.

Host Integration Server

Host Integration Server data integration services provide Dispatch services through MDAC by exposing mechanisms to support the invocation of queries and stored procedures on the connected data stores.

SQL Server

The integration with heterogeneous data stores through the SQL Server Distributed Query Processor provides the service of dispatching relevant queries to separate data stores and composing the results.

The Dispatch service is also supported through the XML interface to SQL Server, which allows queries to be executed in response to an XML document delivered over an HTTP connection direct to SQL Server. The results are then formatted and returned by using the same protocol.

Product Mapping

Microsoft Data Access Components (MDAC)

The Microsoft Data Access Components (MDAC) are the key technologies that enable Universal Data Access. Data-driven client/server applications deployed over the Web or over a LAN can use these components to easily integrate information from a variety of sources, both relational (SQL) and non-relational. These components include the following:

Microsoft ActiveX® Data Objects (ADO). ADO is the application programming interface (API) to data and information. ADO provides consistent, high-performance access to data and supports a variety of development needs, including the creation of front-end database clients and middle-tier business objects that use applications, tools, languages, or Internet browsers. ADO is designed to be the one data interface needed for single- and multi-tier client/server and Web-based data-driven solution development. The primary benefits of ADO are ease of use, high speed, low memory overhead, and a small disk footprint.

ADO provides an easy-to-use interface to OLE DB, which provides the underlying access to data. ADO is implemented with minimal network traffic in key scenarios, and a minimal number of layers between the front end and data store — all to provide a lightweight, high-performance interface. ADO is easy to use because it uses a familiar metaphor — the COM automation interface, available from all leading Rapid Application Development (RAD) tools, database tools, and languages on the market today.

OLE DB. OLE DB is the Microsoft system-level programming interface to data across the organization. OLE DB is an open specification designed to build on the success of ODBC by providing an open standard for accessing all kinds of data. Whereas ODBC was created to specifically access relational databases as highly structured data stores, OLE DB is designed for relational and non-relational information sources in structured, semi-structured, or unstructured formats, including mainframe ISAM/VSAM and hierarchical databases; e-mail and file system stores; text, graphical, and geographical data; custom business objects; and more.

OLE DB defines a collection of COM interfaces that encapsulate various database management system services. These interfaces enable the creation of software components that implement such services. OLE DB components consist of data providers, which contain and expose data; data consumers, which use data; and service components, which process and transport data (such as query processors and cursor engines). OLE DB interfaces are designed to help components integrate smoothly so that OLE DB component vendors can bring high-quality OLE DB components to market quickly. In addition, OLE DB includes a bridge to ODBC to enable continued support for the broad range of ODBC relational database drivers available today.

Open Database Connectivity (ODBC). The ODBC interface is an industry standard and a component of Microsoft Windows Open Services Architecture (WOSA). The ODBC interface makes it possible for applications to access data from a variety of database management systems (DBMSs). ODBC permits maximum interoperability — an application can access data in diverse DBMSs through a single interface. Furthermore, that application will be independent of any DBMS from which it accesses data. Users of the application can add software components called drivers, which create an interface between an application and a specific DBMS.
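ODBC's driver model, in which application code stays independent of the DBMS and a driver adapts each one, is loosely analogous to Python's DB-API. In the sketch below, `count_rows` is a hypothetical application function and sqlite3 stands in for any driver module.

```python
import sqlite3

def count_rows(connect, dsn):
    """DBMS-independent application code: only `connect` is driver-specific,
    mirroring how an ODBC application depends on a driver, not on a DBMS."""
    conn = connect(dsn)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    finally:
        conn.close()

# Swapping sqlite3.connect for another DB-API driver's connect function (for
# example, a hypothetical ODBC bridge with a DSN) leaves the code unchanged.
print(count_rows(sqlite3.connect, ":memory:"))   # 1
```

The application accesses data through a single interface, and only the driver choice binds it to a particular DBMS, which is the interoperability point the ODBC description above makes.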

Host Integration Server

Host Integration Server extends Microsoft Windows to other systems by providing application, data, and network integration. Host Integration Server lets you rapidly adapt to new business opportunities while preserving existing infrastructure investments. See the product documentation for further information.

Host Integration Server 2000 provides the following categories of components:

Host Integration Server management components. Provide a wide assortment of tools to manage the components of Host Integration Server. This includes tools for performing both interactive and scripted local and remote Web-based and traditional client/server management of Host Integration Server components.

The data integration features included with Microsoft Host Integration Server enable you to interact with host data sources, including AS/400 and VSAM files, AS/400 data queues, and IBM DB2 relational database systems. This access is based upon the Universal Data Access strategy and is supported by the MDAC components.

The following data integration features are included with Host Integration Server:

Microsoft OLE DB Provider for AS/400 and VSAM

Microsoft OLE DB Provider for DB2

Microsoft ODBC Driver for DB2

Microsoft Data Queue ActiveX Control for accessing AS/400 data queues

These data integration services are surfaced through the data access models within MDAC.

The following data tools and file-sharing features are also included with Host Integration Server:

Microsoft Host File Transfer ActiveX Control enables transferring files between a local machine and an OS/390, AS/400, or VSE/ESA host system.

Microsoft APPC File Transfer Protocol enables transferring files between a local machine and an OS/390, AS/400, or VSE/ESA host system using the AFTP protocol.

SQL Server

As with any other heterogeneous data source, access to data held within SQL Server is provided through Universal Data Access with supporting components provided by Microsoft Data Access Components (MDAC). In addition, SQL Server offers additional connectivity functionality through distributed queries and its XML and HTTP support.

Distributed Queries

The Distributed Query Processor (DQP) allows users to access data that resides on multiple distributed databases across multiple servers. Using DQP, SQL Server administrators and developers can create linked server queries that run against multiple back-end data sources with little or no modification. DQP enables application developers to create heterogeneous queries that join tables in SQL Server with tables held in other database systems such as Oracle or DB2. Also, DQP can be used to create SQL Server views over database tables held in other database systems so that developers can write directly to SQL Server and integrate both Windows-based and non-Windows-based data in their applications.
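The effect of a linked-server query can be sketched with sqlite's ATTACH statement, which is only an analogy for DQP; the databases and tables below are hypothetical.

```python
import os
import sqlite3
import tempfile

# Two physically separate databases stand in for heterogeneous back ends;
# ATTACH plays the role of the linked-server definition.
path = os.path.join(tempfile.mkdtemp(), "remote.db")
remote = sqlite3.connect(path)
remote.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
remote.execute("INSERT INTO customers VALUES (1, 'ann')")
remote.commit()
remote.close()

local = sqlite3.connect(":memory:")
local.execute("CREATE TABLE orders (customer_id INTEGER, item TEXT)")
local.execute("INSERT INTO orders VALUES (1, 'book')")
local.execute("ATTACH DATABASE ? AS linked", (path,))

# One query joins a local table with a 'remote' one, as a DQP linked-server
# query joins SQL Server tables with tables held in Oracle or DB2.
rows = local.execute("""
    SELECT c.name, o.item
    FROM linked.customers AS c
    JOIN orders AS o ON o.customer_id = c.id
""").fetchall()
print(rows)   # [('ann', 'book')]
```

The application issues a single query and the query processor dispatches the relevant parts to each data store and composes the results.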

XML and HTTP Support

Microsoft SQL Server 2000 introduces new features to support the access of relational information by using XML functionality. The combination of these features makes SQL Server an XML-enabled database server. These new features include:

The ability to execute queries against SQL Server by using HTTP.

Support for XDR (XML-Data Reduced) schemas and the ability to specify XPath queries against these schemas.

The ability to retrieve and write XML data as follows:

Retrieve XML data by using the SELECT statement and the FOR XML clause.

Write XML data by using the OPENXML rowset provider.

Retrieve XML data by using the XPath query language.

Enhancements to the Microsoft SQL Server 2000 OLE DB provider (SQLOLEDB) that allow XML documents to be set as command text and to return result sets as a stream.
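The shape of FOR XML RAW output can be illustrated without SQL Server by serializing query rows as XML attributes; the table and element names below are hypothetical, and this is an emulation of the output shape, not the SQL Server feature itself.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Emulate the shape of SELECT ... FOR XML RAW: each result row becomes a
# <row> element whose columns appear as attributes.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE products (id INTEGER, name TEXT)")
db.executemany("INSERT INTO products VALUES (?, ?)", [(1, "gear"), (2, "cog")])

root = ET.Element("results")
cur = db.execute("SELECT id, name FROM products ORDER BY id")
columns = [d[0] for d in cur.description]
for record in cur:
    ET.SubElement(root, "row", {c: str(v) for c, v in zip(columns, record)})

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)   # e.g. <results><row id="1" name="gear" />...</results>
```

In SQL Server 2000 itself, appending FOR XML to a SELECT statement produces this kind of document directly, and it can be returned over HTTP.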

Transaction Management

Business transactions, such as ordering a book, increasingly involve multiple servers. Credit must be verified, books must be shipped, inventory must be managed, and customers must be billed. Updates must occur in multiple databases on multiple servers. Developers of distributed applications must anticipate that some parts of the application may continue to run even after other parts have failed.

Business applications are frequently required to coordinate multiple units of work as part of a single business transaction. Developing such an application is extremely complex and can be hard to scale. To streamline application development, software vendors began producing transaction-based software specifically designed to manage system-level services:

TP monitors. A software environment that sits between a transaction-processing application and a collection of system services, such as user interface, operating system, communications, and database services. Instead of writing an application that manages each independent service, you write an application to run as part of a TP monitor environment.

The main difference between transaction management methods and data replication methods is the point of intervention in the transaction. A business transaction has temporal and spatial qualities. Each one has a beginning and an end in time. In a computing environment the effects of the transaction are spread to at least one data store. In the EAI scenario the transaction is usually spread across multiple data stores. If we compare these two modes of integration we see that their architecture is different, as shown in the following illustration.

In this illustration, there are three applications that are coordinated by a transaction processing manager (TPMS). Application events that are coordinated into a transaction have to be explicitly specified. The effects of these events on data stores are managed by the TPMS from the moment the transaction is specified. The TPMS enlists the resources of the data stores by using resource managers in the transactions. Through the resource managers the TPMS can apply complete control over the updates to the data stores. The TPMS is hosted on the same platform as one of the applications, and the other applications need to have a direct network connection to the TPMS or to be managed through a TPMS gateway. Transaction management is best suited to supporting synchronous real-time integration.

Good transaction management solutions support transactions across heterogeneous platforms.

On the other hand, transaction management presents the following challenges:

Many transaction management solutions are platform-specific, requiring transaction gateways to support transactions across multiple platforms.

Transaction design is complex.

Transaction management tends to require tightly coupled integration, which increases the dependency between platforms and applications.

Transaction management services can be resource intensive, taking up significant processing power and requiring good management tools.

Synchronous processing in transaction management can block business processes.

Transaction Boundaries

One of the most important characteristics of transactions that should be supported by TP monitors is that transactions have boundaries.

A transaction has a beginning, an end, and occurs exactly once. During its execution, a transaction may call on resources to accomplish one or more tasks. Each resource, such as a database or queue, falls within the boundary of the transaction. All resources within a transactional boundary share a single transaction.

You can design transactions to span processes and computers. Thus, a transactional boundary is an abstraction for managing consistency across process and computer boundaries.

In a traditional transaction-processing application, you control transaction boundaries with explicit instructions to begin and end the transaction. From within one transaction boundary, you can begin a second transaction, called a nested transaction. The parent transaction does not commit until all its subordinate transactions commit.
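SQL savepoints provide one way to sketch a nested unit of work that can be undone without aborting its parent; the example uses sqlite3, and savepoint semantics are a simplification of the nested-transaction model described above.

```python
import sqlite3

# Nested-transaction sketch using SQL savepoints: the inner unit rolls back
# without undoing the outer work, and the parent commit makes the rest durable.
db = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
db.execute("CREATE TABLE log (entry TEXT)")

db.execute("BEGIN")                        # parent transaction boundary
db.execute("INSERT INTO log VALUES ('outer work')")
db.execute("SAVEPOINT child")              # nested unit of work begins
db.execute("INSERT INTO log VALUES ('inner work')")
db.execute("ROLLBACK TO child")            # only the nested work is undone
db.execute("COMMIT")                       # parent commits the surviving work

print(db.execute("SELECT entry FROM log").fetchall())   # [('outer work',)]
```

The explicit BEGIN and COMMIT statements mark the transaction boundary, and everything between them shares a single transaction.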

Transaction boundaries can be extremely complex. The complexity of the solution required is multiplied as transactions are distributed across multiple systems. The following illustration shows a typical implementation of transactions within an application, where the transactions have not been distributed across the multiple systems.

The transactions in this illustration are "stove-piped." System A is a client where a process has been initiated that calls Operation 1 in System B. System A could be a Web browser that has sent an order request. Operation 1 is bound within a transaction to ensure that the integrity of all its operations is maintained. Operation 2 invokes Operation 3 on System C, which completes normally. Operation 4 then invokes Operation 5 that fails. Unfortunately, the update made by Operation 3 has been committed and cannot be rolled back because the transaction boundary does not extend beyond System B. Even though Operation 5 is bound within a transaction, this is not coordinated with Operation 3. Solving this problem requires the distribution of transactions and transaction processing monitors that can manage the complexity of distributing transactions not only across heterogeneous application platforms but also across heterogeneous transaction processing monitors.
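The stove-piped failure in the illustration can be reproduced in miniature with two independent sqlite3 databases standing in for Systems B and C; the table and operation names are hypothetical.

```python
import sqlite3

# Each system commits locally; no transaction boundary spans the two systems.
system_b = sqlite3.connect(":memory:")
system_c = sqlite3.connect(":memory:")
for db in (system_b, system_c):
    db.execute("CREATE TABLE updates (op TEXT)")

def run_order():
    with system_c:  # Operation 3 commits on System C immediately
        system_c.execute("INSERT INTO updates VALUES ('operation 3')")
    with system_b:  # Operation 5 fails; only System B's work rolls back
        system_b.execute("INSERT INTO updates VALUES ('operation 5')")
        raise RuntimeError("operation 5 failed")

try:
    run_order()
except RuntimeError:
    pass

# System C's committed update cannot be rolled back, leaving the two
# systems inconsistent:
print(system_c.execute("SELECT COUNT(*) FROM updates").fetchone()[0])  # 1
print(system_b.execute("SELECT COUNT(*) FROM updates").fetchone()[0])  # 0
```

A distributed transaction spanning both connections would have prevented the orphaned update on System C.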

Logical Service Mapping

Logical service: Orchestration + Transaction Integrity

Physical realization:

SQL Server

SQL Server 2000 provides transactional support for data operations carried out on the SQL Server 2000 data store. The transactions can also be extended to enroll other systems, or to enroll in transactions initiated by other systems.

COM+/Windows

Transactions are exposed and utilized on the Windows platform through the COM+ transaction services.

Product Mapping

SQL Server

SQL Server includes transactional support for the coordination of transactional updates to a single database. Distributed transactional support across heterogeneous data sources is provided by Microsoft Distributed Transaction Coordinator (MS DTC). However the services of MS DTC are normally enlisted through the exploitation of COM+, which provides both a TP monitor and additional services.

COM+ Services

COM+ is the next step in the evolution of the Microsoft Component Object Model and Microsoft Transaction Server (MTS). COM+ handles many of the resource management tasks that a programmer previously had to program manually, such as thread allocation and security. It automatically makes applications more scalable by providing thread pooling, object pooling, and just-in-time object activation.

COM+ also protects the integrity of data by providing transaction support, even if a transaction spans multiple databases over a network, including support for OLE DB transactions and X/Open XA transactions, as supported by many database vendors including Oracle, INFORMIX, DB2, Sybase, and Ingres. COM+ transaction coordination also extends to non-relational resources, including MSMQ and IBM CICS mainframe applications (through Host Integration Server), thereby enabling a single coordinated update to a database, a message queue, and a mainframe application.

COM+ provides a simple, declarative transactional programming model, significantly simplifying the development of applications. For further information about COM+, go to the COM+ home page.

Data Replication

Data replication is a common means of applying changes in one data source to other data sources. This provides an alternative approach to the use of synchronous transaction management to ensure that the changes are applied across the data sources. Most database management systems support replication services, and many support replication services between different proprietary systems. The replication services depend on the data connectivity services to provide access to the foreign data stores. The following illustration shows a typical architecture of a data replication service.

In this illustration, the database servers will allow each of the applications to execute their changes as transactions, but each transaction is isolated until the change has been committed to the database. Therefore, for example, if Application A makes a change to its data store, Database A, then the replication manager will attempt to replicate this change to the other data stores, Database B and Database C. Data replication is therefore best suited to asynchronous integration where data changes do not have to be made in real time.

There is much debate about the value of database replication. There are many critics who focus on issues such as:

Impact on database management system (DBMS) performance

Data consistency and integrity

Recovery of errors

Security

Some argue that data replication of changes across DBMSs is a "poor cousin" of transaction management solutions. The argument revolves around the synchronicity of applying changes to multiple databases. Theoretically, a transaction processing management system (TPMS) binds all changes into a single unit of work, applying all changes simultaneously to each database enlisted in the transaction. In reality, the TPMS applies the changes sequentially, but has the capability of rolling back changes if any one of the changes within the transaction fails.

In the following illustration a data change that happens in an application is replicated over three databases. Updates 1, 2, and 3 are bound within a unit of work managed by the transaction processing monitor. If any resource manager reports a failure to apply the update, then the transaction processing monitor asks the other resource managers to roll back the updates they have made. Only when all the data changes have been made or rolled back is the transaction complete.
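The coordinator behavior just described can be sketched in a few lines. This is a simplified model of the two-phase idea, not a real TP monitor; the class and function names are hypothetical:

```python
# Sketch of a transaction processing monitor coordinating resource managers:
# every enlisted resource either applies its update or all roll back together.

class ResourceManager:
    def __init__(self, name, will_fail=False):
        self.name = name
        self.will_fail = will_fail      # simulate a failing update
        self.state = "idle"

    def prepare(self):
        """Phase 1: attempt the update and vote commit or abort."""
        self.state = "aborted" if self.will_fail else "prepared"
        return self.state == "prepared"

    def commit(self):
        self.state = "committed"        # phase 2: make the update durable

    def rollback(self):
        self.state = "rolled_back"      # phase 2: undo the update

def run_transaction(resources):
    votes = [rm.prepare() for rm in resources]   # phase 1: collect votes
    if all(votes):
        for rm in resources:
            rm.commit()                          # unanimous: commit all
        return "committed"
    for rm in resources:
        rm.rollback()                            # any failure: roll back all
    return "rolled back"

# Updates 1..3 bound in one unit of work; the mainframe update fails.
db = ResourceManager("database")
queue = ResourceManager("queue")
host = ResourceManager("mainframe", will_fail=True)
outcome = run_transaction([db, queue, host])
```

Because the mainframe resource votes abort, the monitor asks the other resource managers to roll back, so the database and queue never expose the partial update.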

Some data replication tools apply a change to one data source, then to another, and so on, often independent of the success or failure of any one of the changes. However, the more sophisticated tools are capable of participating in transactions and taking actions upon a failure to apply a change — although this would not meet the guidelines of the two-phase commit protocol. The following illustration shows an application making an update to Database A. The replication manager on Database A detects the update and replicates it across to Database B and then to Database C. Compensating for any failure to update Database B or Database C requires intelligent replication managers that are able to send alerts back to the primary source and roll back the changes.
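The compensating behavior of such an "intelligent" replication manager can be sketched as follows. The classes are hypothetical; the point is that failure handling here is sequential compensation, not two-phase commit:

```python
# Sketch of sequential replication with compensation: the replication manager
# applies a change to each target in turn and, on failure, undoes the targets
# it has already updated and alerts the primary source.

class TargetDatabase:
    def __init__(self, name, unavailable=False):
        self.name = name
        self.unavailable = unavailable
        self.rows = []

    def apply(self, change):
        if self.unavailable:
            raise RuntimeError(self.name + " unavailable")
        self.rows.append(change)

    def undo(self, change):
        self.rows.remove(change)        # compensating action

def replicate(change, targets):
    applied = []
    for db in targets:                  # Database B, then Database C, ...
        try:
            db.apply(change)
            applied.append(db)
        except RuntimeError:
            for prev in reversed(applied):
                prev.undo(change)       # roll back earlier targets
            return False                # alert the primary source
    return True

db_b = TargetDatabase("Database B")
db_c = TargetDatabase("Database C", unavailable=True)
ok = replicate("price update", [db_b, db_c])
```

Because Database C is unavailable, the change already applied to Database B is compensated, leaving the targets consistent with each other (though the primary source must still decide how to handle the alert).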

These services are provided through a combination of SQL Server 2000 native replication services and the data connectivity provided through MDAC, which is used during the replication process.

Logical service: Orchestration + Transaction Integrity

Physical realization: SQL Server 2000 provides transactional support for data operations carried out on the SQL Server 2000 data store. The transactions can also be extended to enroll other systems, or to enroll in transactions initiated on other systems. These transactions are only utilized during transactional replication.

Product Mapping

SQL Server

SQL Server supports heterogeneous data source replication, provided that organizations have an ODBC driver or an OLE DB provider for that data source. There are three types of replication available today for heterogeneous databases: snapshot, transactional, and merge.

As its name implies, snapshot replication takes a picture of the published data in the database at a moment in time. Transactional replication, on the other hand, uses snapshot replication as a starting point, and then maintains the consistency between the databases by sending database modifications to the other database on a regular basis. Transactional replication uses the transaction log to capture changes that were made to the data in an article. SQL Server monitors INSERT, UPDATE, and DELETE statements, or other modifications made to the data on one server, and stores those changes in the distribution database, which acts as a reliable queue. Changes are then sent to the other database and applied in the same order.

Snapshot Replication

Snapshot replication is the process of copying and distributing data and database objects exactly as they appear at a moment in time. Snapshot replication does not require continuous monitoring of changes because changes made to published data are not propagated to the subscriber incrementally. Subscribers are updated with a complete refresh of the data set and not with individual transactions. Because snapshot replication replicates an entire data set at one time, it may take longer to propagate data modifications to subscribers. Snapshot publications are typically replicated less frequently than other types of publications.

Options available with snapshot replication allow you to filter published data, allow subscribers to make modifications to replicated data and propagate those changes to the publisher and to other subscribers, and allow you to transform data as it is published.

Snapshot replication can be helpful in situations when:

Data is mostly static and does not change often.

It is acceptable to have copies of data that are out of date for a period of time.

Small volumes of data are being replicated.

Sites are often disconnected and high latency (the amount of time between when data is updated at one site and when it is updated at another) is acceptable.

Transactional Replication

With transactional replication, an initial snapshot of data is propagated to subscribers, and then when data modifications are made at the publisher, the individual transactions are captured and propagated to subscribers.

SQL Server 2000 monitors INSERT, UPDATE, and DELETE statements, and changes to stored procedure executions and indexed views. SQL Server 2000 stores the transactions affecting replicated objects and then propagates those changes to subscribers continuously or at scheduled intervals. Transaction boundaries are preserved. If, for example, 100 rows are updated in a transaction, either the entire transaction with all 100 data modifications is accepted and propagated to subscribers or none of the modifications are accepted. When all changes are propagated, all subscribers will have the same values as the publisher.
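The all-or-nothing behavior described above can be pictured with a small sketch. This is a simplified model, not the real SQL Server log reader and distribution agents; it only shows that a transaction's modifications reach the subscriber as one unit, in order:

```python
# Sketch of transaction-boundary preservation in transactional replication:
# a transaction's modifications are applied at the subscriber entirely or
# not at all, and in the original order.

def apply_transaction(subscriber, statements):
    """Apply every modification in the transaction, or none of them."""
    staged = []
    for stmt in statements:
        if stmt is None:                 # a modification that fails
            return False                 # the whole transaction is rejected
        staged.append(stmt)
    subscriber.extend(staged)            # all modifications applied, in order
    return True

subscriber_rows = []
t1_ok = apply_transaction(subscriber_rows, ["UPDATE row 1", "UPDATE row 2"])
t2_ok = apply_transaction(subscriber_rows, ["UPDATE row 3", None])  # fails
```

After T2 fails, none of its modifications are visible at the subscriber, so the subscriber still matches a consistent state of the publisher.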

Options available with transactional replication allow you to filter published data, allow users at the subscriber to make modifications to replicated data and propagate those changes to the publisher and to other subscribers, and allow you to transform data as it is published.

Transactional replication is typically used when:

You want data modifications to be propagated to subscribers, often within seconds of when they occur.

You need transactions to be atomic (either all or none applied at the subscriber).

Subscribers are mostly connected to the publisher.

Your application will not tolerate high latency for subscribers receiving changes.

SQL Server also supports transactional replication from SQL Server to heterogeneous databases, while third-party solutions are available for automating transactional replication from heterogeneous databases to SQL Server. For example:

Replication with Oracle

SQL Server enables bi-directional snapshot replication with Oracle, as well as transactional replication from SQL Server to Oracle.

Merge Replication

Merge replication allows various sites to work autonomously (online or offline) and merge data modifications made at multiple sites into a single, uniform result at a later time. The initial snapshot is applied to subscribers and then SQL Server 2000 tracks changes to published data at the publisher and at the subscribers. The data is synchronized between servers either at a scheduled time or on demand. Updates are made independently (no commit protocol) at more than one server, so the same data may have been updated by the publisher or by more than one subscriber. Therefore, conflicts can occur when data modifications are merged.

Merge replication includes default and custom choices for conflict resolution that you can define when you configure a merge publication. When a conflict occurs, a resolver is invoked by the Merge Agent to determine which data will be accepted and propagated to other sites.
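The role of a resolver can be sketched as follows. This is a toy illustration, not SQL Server's Merge Agent (whose default resolver is priority-based and configurable); the function and priority scheme here are hypothetical:

```python
# Toy merge-replication conflict resolver: when the publisher and a
# subscriber both changed the same row while disconnected, a resolver
# decides which version is accepted and propagated to all sites.

def resolve(publisher_row, subscriber_row, priority="publisher"):
    """Return the winning version of a conflicting row."""
    if priority == "publisher":
        return publisher_row
    return subscriber_row

# The same row was updated independently at two sites.
row_at_publisher = {"id": 7, "qty": 10}
row_at_subscriber = {"id": 7, "qty": 12}

winner = resolve(row_at_publisher, row_at_subscriber, priority="publisher")
# "winner" is then propagated to the publisher and all subscribers.
```

A custom resolver could instead merge at column level or apply business rules; the essential point is that conflict resolution is a pluggable policy invoked only when a conflict is detected.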

Options available with merge replication include filtering published data horizontally and vertically, including join filters and dynamic filters, using alternate synchronization partners, optimizing synchronization to improve merge performance, validating replicated data to ensure synchronization, and using attachable subscription databases.

Merge replication is useful when:

Multiple subscribers need to update data at various times and propagate those changes to the publisher and to other subscribers.

Subscribers need to receive data, make changes offline, and synchronize changes later with the publisher and other subscribers.

Data needs to be propagated to heterogeneous subscribers, such as other databases that comply with SQL Server ODBC or OLE DB subscriber requirements.

Data Extract, Transform, and Load (ETL)

Products providing data extract, transform, and load (ETL) capabilities will also provide the related logical services to support these capabilities. The products in this space typically do not implement these logical services as generic mechanisms, but rather implement them as specific to their respective data sources. For example, the ETL facility provided by a database will be aimed primarily at the extraction and transformation of relational data rather than being a purely generic service suitable for any data type.

ETL solutions were developed to add value to basic data-replication services. Most database vendors provided data replication, but they did not account for the need to change the data that was being replicated, for example, because of differences between database schemas, data organization, or data syntax. Vendors therefore introduced a transformation service that allowed businesses to map data from one source to the data definition and rules of another. Typically database solutions would rely on stored procedures to resolve data heterogeneity. The maintenance of stored procedures can be costly and prone to change risk, so ETL solutions offered potential improvements.

With the advent of data warehouses and data marts the requirement for ETL solutions grew. However there are many different vendor solutions and there are many technical issues that an ETL solution must address. Some of the principal considerations that a designer must address when choosing an ETL solution are:

What is the organization of the source data source?

What data needs to be extracted?

What is the organization of the target data source?

What data needs to be loaded?

How does the data from the source need to be mapped to the target?

What transformations of the source data need to be applied?

How often should the extract run?

Should the extract run in a scheduled batch mode or should it be triggered by a data change in the source?

If large amounts of data are being extracted, what is the impact of the ETL solution on the performance of the source data store?

If large amounts of data are being loaded, what is the impact of the ETL solution on the performance of the target data store?

There are also some basic implementation considerations. The following illustration shows two different topologies for the implementation of ETL. Each one has different merits.

The hub-and-spoke solution is typical of pure ETL providers, vendors who are providing an ETL solution for a number of databases. The extract and load services can be performed by stored procedures or by components that are created with the ETL tool and installed on the data store. Many ETL vendors provide technology that adapts the extract and load components to most of the major DBMS vendors. All extracts are sent to the transformation hub where they are mapped and transformed to the target data store. The transformation hub then forwards the transformed extract to the target where the Load service loads the data to the data store. This solution is more suitable for heterogeneous database environments, where the extracts and loads are variable and the transformation is potentially complex. It supports central management of the ETL process.
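The hub-and-spoke flow can be sketched as three stages, with the mapping applied at the central hub. All names and the schema mapping below are hypothetical illustrations:

```python
# Hub-and-spoke ETL sketch: extracts are sent to a central transformation
# hub, which maps them to the target schema before forwarding to the load
# service on the target data store.

def extract(source_rows):
    """Extract service on the source: pull the rows to be replicated."""
    return list(source_rows)

def transform(rows, column_mapping):
    """Transformation hub: map source column names to the target schema."""
    return [{column_mapping[col]: val for col, val in row.items()}
            for row in rows]

def load(target_rows, rows):
    """Load service on the target: apply the transformed extract."""
    target_rows.extend(rows)

source = [{"cust_no": 1, "cust_nm": "Contoso"}]
mapping = {"cust_no": "CustomerId", "cust_nm": "CustomerName"}

target = []
load(target, transform(extract(source), mapping))
```

In the distributed topology the same `transform` step would run on the source or target node instead of at a hub, which removes the central hop but also removes the central point of management.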

The distributed solution is more typical of the database vendor approach to ETL. The main advantage of this solution is that point-to-point connections between data stores can be established because no transformation hub is needed since transformation can be performed on the source or target. This is desirable where the number of integration nodes is limited, where there are network issues, and where the extract and load is a repetitive task with little variation.

Another solution common to certain ETL specialists is a combination of the two topologies, as shown in the following illustration.

The advantage of centralized management of the hub-and-spoke solution is maintained while supporting a distributed ETL network. The ETL hub stores the metadata about all the extracts, transformations, and loads. The ETL Development service supports the centralized development of ETL components while the ETL Distribution service distributes these components to the right sources and targets and manages them. In operational terms, the solution is the same as the simple distributed ETL service.

Logical Service Mapping

Logical service: Integration + Parse, Map, Filter, Validate, Transform, Format, (De)Compose, Enrich

Physical realization in SQL Server 2000: These services are provided through a combination of SQL Server 2000 Data Transformation Services and the data connectivity provided through MDAC, which is used during the ETL process.

Logical service: Orchestration + Transaction Integrity

Physical realization in SQL Server 2000: SQL Server 2000 provides transactional support for data operations carried out on the SQL Server 2000 data store. The transactions can also be extended to enroll other systems, or to enroll in transactions initiated on other systems. DTS can create and orchestrate transactions as part of the ETL process.

Logical service: Transport Services + Interfacing + Dispatch

Physical realization in SQL Server 2000: DTS is capable of dispatching queries to data sources during the ETL process.

Product Mapping

SQL Server

The data extract, transform, and load (ETL) facility provided with SQL Server 2000 is known as Data Transformation Services (DTS). For further information about DTS, refer to the SQL Server Books Online or to "Appendix G: ETL with SQL Server 2000 and DTS."

DTS is a set of tools that can be used to import, export, and transform heterogeneous data between one or more data sources. Connectivity is provided through OLE DB, an open standard for data access. ODBC (Open Database Connectivity) data sources are supported through the OLE DB Provider for ODBC.

You create a DTS solution as one or more "packages." Each package may contain an organized set of tasks that define work to be performed, transformations on data and objects, workflow constraints that define task execution, and connections to data sources and destinations. DTS packages also provide services, such as logging package execution details, controlling transactions, and handling global variables.
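The idea of a package as tasks plus workflow constraints can be sketched with a simplified model. This is not the DTS object model, just an illustration of precedence-constrained execution under hypothetical names:

```python
# Simplified model of a package: a set of named tasks plus workflow
# constraints, where a task runs only after its prerequisites succeed.

def run_package(tasks, constraints):
    """tasks: name -> callable; constraints: name -> list of prerequisites."""
    done, order = set(), []
    while len(done) < len(tasks):
        progressed = False
        for name, task in tasks.items():
            ready = all(p in done for p in constraints.get(name, []))
            if name in done or not ready:
                continue
            task()                      # execute the task
            done.add(name)
            order.append(name)
            progressed = True
        if not progressed:
            raise RuntimeError("cyclic or unsatisfiable workflow constraints")
    return order

log = []
tasks = {
    "connect": lambda: log.append("opened connections"),
    "copy":    lambda: log.append("copied data"),
    "notify":  lambda: log.append("sent notification"),
}
order = run_package(tasks, {"copy": ["connect"], "notify": ["copy"]})
```

A real package adds the services the text lists (execution logging, transaction control, global variables) around this same task/constraint core.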

These tools are available for creating and executing DTS packages:

The Import/Export Wizard is for building relatively simple DTS packages, and supports data migration and simple transformations.

The DTS Designer graphically implements the DTS object model, allowing you to create DTS packages with a wide range of functionality.

DTSRun is a command-prompt utility used to execute existing DTS packages.

DTSRunUI is a graphical interface to DTSRun, which also allows the passing of global variables and the generation of command lines.

SQLAgent is not a DTS application; however, it is used by DTS to schedule package execution.

Using the DTS object model, you also can create and run packages programmatically, build custom tasks, and build custom transformations.

DTS Connectivity

DTS is based on an OLE DB architecture that allows you to copy and transform data from a variety of data sources.

Appendix D: Integration of Heterogeneous Applications

The basic objective of application integration is to invoke a "business transaction" provided by an existing application in response to some event. The options available for integrating with an application are essentially dictated by the application being integrated. For example, the application may provide a programmable API; file, message, or HTTP-based interfaces; or only a user interface.

While the interfaces to applications vary, one common integration requirement is that the application itself cannot be modified: it may be a packaged application provided by a third party, the skills to modify it may no longer exist, or the risk and cost of modification may be too great.

The foundation of any application integration is basic connectivity to the application. The connectivity will either be supporting the invocation of a business transaction provided by the application or catching an event in response to some user action within the application. Basic connectivity is typically technology based, for example, the application supports C-API, COM, CORBA, File, HTTP, MSMQ, or MQSeries interfaces. If you have the ability to connect to the application it is then a matter of making the correct method calls or formatting the data appropriate to a particular application.

From this, we can subdivide the application integration area into two smaller areas:

Application connectivity

EAI services

These subdivisions are helpful in providing a context for discussing the logical service mappings and the product mappings in the following sections.

Application Connectivity

Logical Service Mapping

The following illustration shows the logical services that are typically required in basic application connectivity. As expected, they focus around the middleware interfacing services; in addition, transaction services may be provided for interfacing transactionally with TP monitor-based applications such as COM+, CICS, or Tuxedo applications.

Logical service: Orchestration + Transaction Integrity

Physical realization: Within the application connectivity layer, the transaction services are accessed through COM+ and the application integration facilities of Host Integration Server (COMTI).

Many application interfaces are synchronous by their very nature, such as COM, C-type APIs, and CORBA. When linking one or more applications by using synchronous interfaces, the unavailability of one system (or interface) can lead to the unavailability of the entire system. In such cases additional logical services are required to provide an asynchronous interface through message queuing.

Product Mapping

The following table highlights some of the most common application connectivity technologies and the mapping to the Microsoft products and technologies that support them. The table is focused primarily on technology from Microsoft, and hundreds of additional third-party application connectivity solutions are available.

Classification: Synchronous direct

COM: Windows platform (COM+)

C-API: Windows platform

CICS, IMS: Host Integration Server

X/Open distributed transactions (e.g., BEA Tuxedo): Open Transaction Integrator from Unisys

CORBA, Enterprise JavaBeans: Third-party COM bridging technologies (e.g., Orbix, Actional, others)

Classification: Message queue

MSMQ: Windows platform (MSMQ)

MQSeries: BizTalk Adapter for MQSeries, MSMQ-MQSeries Bridge

Classification: HTTP based

HTTP: Microsoft WinInet, MSXML, Internet Information Services

XML Web services: SOAP Toolkit, .NET Framework

Classification: File

File: Windows platform, Microsoft Services for UNIX

Classification: Screen (UI) based

IBM 3270, IBM 5250: Host Integration Server plus third-party screen scraping

VT100, VT220: Third-party screen scraping

Windows 2000 Server

The Windows 2000 platform itself provides many built-in services that support application connectivity. These include basic services for file-based interfaces, such as support for file event notifications.

Windows 2000 provides distributed transactional support provided by Microsoft Distributed Transaction Coordinator (MS DTC), which can be exploited through the COM+ programming model.

The Windows platform also provides a native asynchronous messaging system, Microsoft Message Queuing (MSMQ), within the COM+ services. MSMQ provides guaranteed once-only delivery of messages between applications and systems, irrespective of the continuous availability of a supporting network.

Internet-based protocols are supported on the receiving side by Internet Information Services (IIS). Sending data through HTTP is supported by the WinInet component, and transporting XML through HTTP is supported directly by Microsoft XML Core Services (MSXML). Other standard transports such as SMTP and FTP are also supported through platform services and technologies.

Use the MSMQ-MQSeries Bridge to develop applications that send messages between IBM MQSeries and Microsoft Message Queuing (MSMQ) environments. (For native support of IBM MQSeries, the BizTalk Adapter for MQSeries is also available.)

COM Transaction Integrator (COMTI)

COMTI enables Windows-based client applications to invoke mainframe-based transaction programs (TPs). COMTI provides a COM object interface to existing mainframe transactions, handling all the mapping of data types to convert from the Intel-based architecture to the OS/390-based architecture, and interacts with host TPs on mainframes.

The specific TPs supported in COMTI are the IBM Customer Information Control System (CICS) and the IBM Information Management System (IMS) TPs. All COMTI processing is done on the Windows 2000 or Windows NT® Server platform. No COMTI-related executable code is required on the mainframe; in other words, no mainframe footprint is necessary. COMTI supports SNA (APPC/LU 6.2) and TCP/IP standard communication protocols for all communications between Windows and the mainframe.

Through the import of a mainframe transaction COBOL copybook, the COMTI Component Builder reads the COBOL transaction code and generates a COMTI component library object that contains the proper specialized interfaces for the mainframe. As a generic proxy for the mainframe, the COMTI run-time environment intercepts object method calls and redirects those calls to the appropriate mainframe program in a representation understandable by mainframe TPs.

COMTI also supports full two-phase commit (2PC) transaction coordination with the mainframe and other transaction resources through the transaction services provided by COM+ (MS DTC). For example, this would support a coordinated update between the mainframe and SQL Server. 2PC is supported over SNA LU 6.2 sync level 2 connections only; IBM has not implemented 2PC in the TCP/IP protocol, but for those cases where 2PC is not needed, TCP/IP can provide direct connectivity.

MSMQ-MQSeries Bridge

The MSMQ-MQSeries Bridge provides a two-way mechanism for transparently:

Accessing IBM MQSeries queues from an MSMQ environment

Accessing MSMQ queues from an IBM MQSeries environment

The philosophy behind the bridge is to expose queues in each messaging system to those in the other, in terms that are native to each messaging system. The bridge translates and maps the fields and values of the sending environment to the fields and values of the receiving environment. After mapping and conversion, the MSMQ-MQSeries Bridge then routes the message between the two messaging systems.
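The translate-and-map step can be pictured as a translation table applied to each message before routing. The specific field correspondences below are purely illustrative, not the bridge's documented mapping; the point is that properties of the sending system are converted to the receiving system's native terms:

```python
# Illustrative sketch of field mapping between two messaging systems.
# The correspondences in this table are hypothetical examples only.

MSMQ_TO_MQSERIES = {
    "Label": "ApplIdentityData",     # illustrative correspondence
    "CorrelationId": "CorrelId",     # illustrative correspondence
    "Body": "MessageData",           # illustrative correspondence
}

def translate(msmq_message):
    """Map an MSMQ-style message's properties to MQSeries-style fields."""
    return {MSMQ_TO_MQSERIES[field]: value
            for field, value in msmq_message.items()}

outgoing = {"Label": "order-42", "CorrelationId": b"\x01", "Body": "<order/>"}
mqseries_message = translate(outgoing)
# mqseries_message can now be routed using the receiving system's native API.
```

Because the mapping is applied transparently by the bridge, neither application needs to know its message crossed between the two environments.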

MQSeries queue managers and queues are exposed to the MSMQ environment by defining MQSeries queue managers and queues within MSMQ Explorer as foreign computers and foreign queues, respectively.

MSMQ computers and queues are exposed to the MQSeries environment by creating and importing MQSeries definition files into the configuration of an MQSeries queue manager. The definitions identify the transmission queue into which MQSeries should place messages that are destined for an MSMQ queue.

The MSMQ-MQSeries Bridge operates transparently, allowing MSMQ and MQSeries applications to use their native APIs to deliver messages between the two environments. Neither application is aware that it has crossed between the two environments.

The Microsoft MSMQ-MQSeries Bridge provides access to message queues located on the following IBM MQSeries systems through SNA LU6.2 or TCP/IP:

IBM MQSeries for OS/390 Version 2 Release 1 (V2.1)

IBM MQSeries for AS/400 Version 4 Release 3 (V4R3MO)

IBM MQSeries for Windows NT Versions 5.1, 5.0, and 2.0

The MSMQ-MQSeries Bridge supports both MSMQ and MQSeries transactions.

XML Web Services

There are probably as many definitions of XML Web services as there are companies building them, but almost all definitions have these things in common:

XML Web services expose useful functionality to the Web through standard Web protocols. In most cases, the protocol used is SOAP.

XML Web services provide a standard way to describe their interfaces in enough detail to allow a user to build a client application to access them. This description is most commonly provided in an XML document called a Web Services Description Language (WSDL) document.

XML Web services are often registered so that potential users can find them easily. This is most frequently done with Universal Discovery Description and Integration (UDDI).

One of the primary advantages of the XML Web services architecture is that it allows programs written in different languages on different platforms to communicate with each other in a standards-based way. One difference between this and prior attempts at delivering the same result is that SOAP is significantly less complex than earlier approaches, so it is easier to create a standards-compliant SOAP implementation.
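The relative simplicity of SOAP is visible in the envelope itself. The following sketch builds a minimal SOAP 1.1 request envelope with the standard library; the method name, namespace, and parameter are hypothetical:

```python
# Minimal SOAP 1.1 request envelope, built with Python's standard library.
# The service namespace, operation, and parameter below are invented for
# illustration.

from xml.etree import ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope
SERVICE_NS = "urn:example-orders"                       # hypothetical service

envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
call = ET.SubElement(body, f"{{{SERVICE_NS}}}GetOrderStatus")
ET.SubElement(call, "orderId").text = "12345"

request_xml = ET.tostring(envelope, encoding="unicode")
# request_xml is the payload an HTTP POST would carry to the service; the
# corresponding WSDL document would describe this operation and its types.
```

Any platform that can produce and parse this XML over HTTP can participate, which is the interoperability point made above.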

In terms of application integration connectivity, using XML Web services to expose the business transaction offers significant benefits including:

Allowing the business transaction to be exploited from any platform or from any development language.

The ability to offer the business transaction directly to internal and external organizations over the Internet.

With the widespread support of XML Web services across the IT industry, it is expected that many applications will expose their functionality as XML Web services in the future. Until that time, XML Web services wrappers can be relatively easily developed by using development tools such as Microsoft Visual Studio® .NET to expose existing functionality and realize the benefits identified above.

The development of XML Web services wrappers for existing functionality requires integration with the existing application interfaces. The provision of an XML Web service to an existing application is therefore a typical integration solution, and the tools and technologies for integration are therefore also directly applicable to the provision of XML Web services.

Microsoft provides tools for delivering XML Web services through the .NET Framework and Visual Studio .NET. Support is also provided through the Microsoft SOAP Toolkit for systems that do not possess the more advanced XML Web services run-time infrastructure.

Current Limitations of XML Web Services

XML Web services are built on XML, SOAP, WSDL, and UDDI specifications. These constitute a set of baseline specifications that provide the foundation for application integration and aggregation. Organizations are building and deploying solutions based on these specifications today.

The baseline specifications do, however, have gaps that require the implementation of higher-level logical services such as security, routing, reliable messaging, and transactions. For example, XML Web services are synchronous by nature and require the consumer of the XML Web service to be available. This is acceptable for data retrieval and user interface interaction, but may not be applicable for business document exchange. In the future, these extended services will be addressed through additional standards. For example, Microsoft and IBM have co-presented to the W3C a layered stack of these services known as the Global XML Web Services Architecture (GXA). Until such standards are accepted and implementations are available, these services must be provided through product features and custom development. For example, Microsoft BizTalk Server provides an implementation for security, routing, and reliable messaging for XML-based documents built on the SOAP standard.

Third-Party Solutions

There are many hundreds of application connectivity solutions available for the Windows platform. Application connectivity solutions range from general technology-based connectivity to industry-specific ISV applications.

One example of an enterprise-scale connectivity solution is OpenTI from Unisys. This application connector provides services similar to COMTI but for X/Open Distributed Transaction-based systems including Unisys Open Distributed Transaction Processing, BEA Tuxedo, and ICL's TPMS transaction monitor. This includes full two-phase commit support with COM+ managed transactions.

EAI Services

The deployment of EAI services (or an integration broker) around basic application connectivity can deliver solution cost savings, particularly as the number of integrated applications increases. Solutions delivered with such a framework also give an organization greater agility in responding to future requirements, such as those imposed by competitive or legislative changes.

The additional logical services provided by EAI services are centered on the integration and metadata services. For example, the EAI services typically provide parsing, validation, transformation, formatting, and enriching services independent of the application being integrated. In addition, the framework commonly provides metadata services for managing interfaces, providing indexing, and searching for interchanged information.

These common EAI services are included in Microsoft BizTalk Server. BizTalk Server includes a collection of tools and services that provide EAI logical services. In addition, BizTalk Server provides services that support integration with external organizations.

BizTalk Server consists of three major components:

A core messaging engine providing core integration services

A set of productivity tools supporting integration

An orchestration engine and design-time environment for orchestrating business processes. This is described in further detail in "Solution: Integration of Business Processes."

In addition, BizTalk Server is extended through a set of "application adapters" provided by Microsoft and third-party vendors. The BizTalk Server architectural framework enables application adapters to be easily developed on a project-by-project basis by using the application connectivity solutions described earlier. However, time and cost savings can be achieved by using pre-developed adapters, such as an adapter to SAP/R3. These adapters typically include both the application connectivity element and all the associated metadata services describing the interfaces for the application.

BizTalk Server was originally released in November 2000 as BizTalk Server 2000, and is now in its second version with the release of BizTalk Server 2002 in January 2002.

Logical Service Mapping

As integration is focused on connecting two or more applications based on a predictable process definition, the need exists for many additional logical services, such as the ability to map data output from one application to the format required by another. As discussed in "Pattern Context," traditional integration methods have focused on simply linking applications together through basic connectivity, with custom development providing the additional logical services such as data mapping. However, cost savings and business flexibility can be achieved through the introduction of EAI technology (often referred to as an "integration broker") to provide the additional integration logical services through a standardized framework.

The following illustration shows the logical services that are commonly supported by core EAI technology to support the integration of heterogeneous applications. These services build upon the interfacing services and provide the additional integration services (including parsing, transformation, formatting, and routing) through a common framework. In addition, the metadata services are also typically supported, providing a more manageable and flexible integration solution.

See Appendix B for details of these logical services and their associated technology (physical realization).

Message-Oriented Integration Scenarios

Discussing BizTalk Server in the context of the EAI logical services means that we will be concentrating on the messaging capabilities provided by BizTalk Server. (The orchestration capabilities are covered in the next section, "Solution: Integration of Business Processes.")

In discussing the physical pattern for EAI services based around messaging, it is useful to consider two general messaging challenges:

Message routing

Message transformation

Message Routing

Simple message routing between two or more applications, each written to the API of a standard messaging transport, is one of the most common methods of integration. It requires services to support the receipt of messages from source applications and the forwarding of messages to target applications. The following illustration shows a typical logical configuration of a message-routing network. Three applications are connected to the message-routing network by using components that are adapted to the network and operating system characteristics of each system.

To send messages between these applications, it is necessary to create an enterprise messaging standard that all applications and systems can understand and use to integrate through the message router. The common messaging standard means that all applications can freely exchange information. Bridging products such as the MSMQ-MQSeries Bridge can extend the messaging network to another messaging network, and hence to other applications.

When a system operates as a message source (a system sending a message), data from the application — in this case Application B — is formatted, typically by the connector, into a message. The message is essentially an envelope that contains the data wrapped in information needed by the messaging network. The connector then forwards the message to the message router. Most message routers then place the message in a queue. The queue adds resilience to the messaging network by persisting messages and ensuring that they are processed in a controlled fashion.

The routing function in the message router inspects each message in sequence, determines where the message needs to be sent, and forwards it via the input/output service to the target system.

In this case, the message from System B is targeted for System C. When the message arrives at System C, the connector removes the envelope and delivers the data to the application.
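The envelope-and-routing flow just described can be sketched as follows. The `MessageRouter` class, its routing table, and the connector callables are illustrative assumptions, not any product's API; the queue stands in for the persistent message store a real router would use.

```python
import json
import queue

class MessageRouter:
    """Minimal sketch of a message router: envelope, queue, route, deliver."""

    def __init__(self):
        self.queue = queue.Queue()   # a real router persists and orders messages here
        self.routes = {}             # message type -> target system
        self.connectors = {}         # system name -> delivery callable

    def register(self, system, deliver):
        self.connectors[system] = deliver

    def add_route(self, message_type, target_system):
        self.routes[message_type] = target_system

    def submit(self, source, message_type, data):
        # The source connector wraps application data in an envelope of routing metadata.
        envelope = {"source": source, "type": message_type, "body": json.dumps(data)}
        self.queue.put(envelope)

    def process_one(self):
        # The routing function inspects each message in sequence and forwards it.
        envelope = self.queue.get()
        target = self.routes[envelope["type"]]
        # At the target, the connector removes the envelope and delivers the data.
        self.connectors[target](json.loads(envelope["body"]))

received = []
router = MessageRouter()
router.register("SystemC", received.append)
router.add_route("Order", "SystemC")
router.submit("SystemB", "Order", {"order_id": 42})
router.process_one()
# received == [{"order_id": 42}]
```

Note that neither system knows about the other: Application B addresses a message type, and the routing table decides the target, which is what lets a target system be replaced without touching the sender.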

Message Routing Considerations

Message routing provides the following advantages:

Message routing does not require the use of the transformation service within the router because all systems will share information in the same format.

Delivery of messages can be guaranteed by implementing store and forward mechanisms with message receipt acknowledgements.

On the other hand, message routing presents the following challenges:

The absence of the transformation service in a pure message-routing solution means that a common standard message format must be agreed on by all systems, normally requiring substantial work on the distributed applications to conform to that format.

Each system requires a connector or application adapter in order to exchange messages with the messaging network. For widely distributed and diverse applications this can present significant management overhead.

Operational EAI messaging networks are difficult to manage without management tools that operate on the specific network and its components.

Cross-Platform Message Queuing

Microsoft Message Queuing (MSMQ) is the message-oriented middleware product that supports messaging between Windows platforms. Mainframes and other platforms, on the other hand, normally use a product developed by IBM called MQSeries. IBM has extended MQSeries to other IBM and non-IBM platforms in addition to mainframes and AS/400. Although a version of MQSeries is available for Windows NT and Windows 2000, MSMQ is native to those platforms.

To support cross-platform messaging between Windows and mainframe messaging systems, Host Integration Server 2000 includes the MSMQ-MQSeries Bridge. (As previously mentioned, for native support of IBM MQSeries, Microsoft provides the BizTalk Adapter for MQSeries.) The MSMQ-MQSeries Bridge integrates the two messaging platforms and enables messages to be transferred in either direction across platforms. It provides asynchronous, messaging-based communication between heterogeneous applications. The following illustration shows this operation.

Message Transformation

By adding a transformation service to the message router, we can gain greater flexibility over the messaging network and the systems that are integrated. Such a message hub is often known as a message broker because it brokers heterogeneous application semantics. In the following illustration, a transform component is added to the message router.

Now each system can send messages in its own native message format. The systems do not have to abide by a common message standard. The connectors still forward the messages from sources to the message router, but while the messages are queued they are transformed into the message format of the target system, in this case System C.

Message Transformation Considerations

Message transformation provides the following advantages:

Message transformation solutions provide greater reach for the EAI service. Now systems can exchange information in a native format (provided that the EAI transformation service can actually parse and serialize that format).

The resolution of the different application semantics is centralized in the message router. This helps to reduce the cost of integration and can increase the quality of change management.

On the other hand, message transformation presents the following challenges:

Message transformation solutions require access to the syntax and semantics of each native message format.

Managing the responsibility for message formatting is more complex.

Simple Routing and Transformation of XML Messages

The problem of application-to-application integration is often approached initially by understanding the external interfaces that each system exposes, and the message formats and specifications used to transfer messages. Typically, quality of service issues, such as security, message encoding, and reliable delivery, must also be addressed. BizTalk Messaging Services addresses these requirements.

BizTalk Messaging Services is ideally suited for sending and receiving messages between applications within an organization. It supports the following features:

Parsing and validation of inbound messages

Tracking of inbound and outbound messages

Generation and correlation of receipts

Use of transformation maps to change the structure and format of data

Data integrity and security

BizTalk Messaging Services can be configured either by using BizTalk Messaging Manager, a graphical user interface (UI), or programmatically by using the BizTalk Messaging Configuration object model.

Product Mapping

Messaging Engine

The following topics describe the mapping of BizTalk Messaging features to the logical services they support. The following illustration shows a conceptual architectural breakdown of the BizTalk Messaging engine. The base services supported by the engine itself are supplemented by a set of user productivity tools, described later in this section.

Receive Services

BizTalk Messaging includes a built-in collection of receive services that provide a number of interfacing logical services including dispatch, deserialization, and decoding.

In architectural terms, BizTalk Server provides a native COM interface for submitting documents (or messages) for processing. In addition to the COM interface, pre-built services are provided to support the receipt of messages on the supported protocols. These services monitor their respective protocols, and on the receipt of a message they call the Submit method to send the message to BizTalk Server. The following built-in receive services are supported:

HTTP. HTTP/HTTPS support is provided natively within BizTalk Server by an ISAPI filter. HTTP support also facilitates the use of BizTalk Server to implement and orchestrate XML Web services.

SMTP. SMTP (Simple Mail Transfer Protocol) support is provided through Microsoft Exchange Server used in conjunction with a provided script file for monitoring a shared folder and submitting the received document to BizTalk Server.

File. BizTalk Server provides a file receive service that can be configured to monitor a specified directory and, when a file is deposited in the directory, to automatically submit the file to BizTalk Server.

For efficiency, BizTalk Server does not poll the directory for deposited files; it instead uses the file change event notification feature of the NTFS file system as the trigger. This has implications when integrating with alternative file systems into which applications deposit files. The simplest solution is to host the receive directory under NTFS on Windows 2000 and make it available to other systems, for example by using NFS between Windows 2000 and a UNIX system. Where the file system is located on another system and made available to NTFS, it must support file change notification events; for example, the latest version of Samba for UNIX is known to support this feature. Alternatively, mechanisms such as FTP can be used to transfer documents between the systems.
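A file receive service of this kind can be sketched as follows. Note that the sketch scans the directory on demand, which is precisely what BizTalk avoids by subscribing to NTFS change notifications; the scan is shown here only because it is portable and self-contained, and the `submit` callable is an invented stand-in for handing the document to the engine.

```python
import os
import tempfile

def scan_once(path, seen, submit):
    """One pass of a file receive service: submit files that have appeared
    since the previous pass, then remember them so they are not resubmitted."""
    for name in sorted(set(os.listdir(path)) - seen):
        with open(os.path.join(path, name), "rb") as f:
            submit(name, f.read())   # hand the document to the engine
        seen.add(name)

# Exercise the sketch against a temporary receive directory.
receive_dir = tempfile.mkdtemp()
seen = set(os.listdir(receive_dir))   # files already present are not resubmitted
received = []

with open(os.path.join(receive_dir, "invoice.xml"), "wb") as f:
    f.write(b"<Invoice/>")            # an application deposits a file

scan_once(receive_dir, seen, lambda name, data: received.append((name, data)))
scan_once(receive_dir, seen, lambda name, data: received.append((name, data)))
# received == [("invoice.xml", b"<Invoice/>")]  -- submitted exactly once
```

A change-notification-driven service would call `scan_once` only when the file system raises an event, rather than on a timer.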

MSMQ. BizTalk Server provides a Message Queuing (MSMQ) receive service that can be configured to monitor a specified queue and, when a message is deposited in the queue, automatically submit the message to BizTalk Server.

A consideration in using MSMQ as a transport is that it has a storage limit of 4 megabytes (MB) per message that is stored in a message queue, and a total limit of 2 gigabytes (GB) for all messages that can be stored in all message queues on a single server.

MQSeries. Microsoft provides native support for IBM MQSeries through the BizTalk Adapter for MQSeries. This allows the definition and management of a receive service that can be configured to monitor a specified queue and, when a message is deposited in the queue, to automatically submit the message to BizTalk Server.

BizTalk Server is extensible, and custom components can be developed to support message receipt on alternative transport protocols. Application adapters can exploit any of the technology and products providing application and data connectivity services that were described earlier in this section. For example, an application adapter can be built to connect to DB2 running on an AS/400 through the OLE DB provider shipped with Host Integration Server.

The built-in receive services are also extensible, and preprocessing of received data can be incorporated. For example, messages received via MSMQ may be compressed by the sending application using a particular proprietary compression algorithm, and complementary decompression will be required before the message can be processed.
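Such a preprocessing step might look like the following sketch, which substitutes zlib for whatever proprietary algorithm the sending application actually uses; the pass-through fallback for uncompressed payloads is also an assumption about the desired behavior.

```python
import zlib

def preprocess(raw: bytes) -> bytes:
    """Receive-side preprocessor sketch: decompress the payload before it is
    submitted for processing. zlib stands in for a proprietary algorithm."""
    try:
        return zlib.decompress(raw)
    except zlib.error:
        return raw   # payload was not compressed; pass it through unchanged

compressed = zlib.compress(b"<Order><Id>42</Id></Order>")
document = preprocess(compressed)
# document == b"<Order><Id>42</Id></Order>"
```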

Delivery Services

Complementing the receive services, BizTalk messaging also includes built-in delivery services, supporting the following transport protocols:

COM

MSMQ

HTTP/HTTPS

SMTP

File

Like the receive services, the delivery services are also fully extensible through the exploitation of what are referred to as application integration components (AICs) or application adapters. An AIC is a simple COM component that is recognized by the BizTalk framework and provides the bridge between BizTalk Server and the application. Such adapters can exploit any application or data connectivity software described in the previous sections. For example, an AIC could exploit the COMTI component of Host Integration Server to integrate a CICS application with the BizTalk Server EAI services.

Message Queuing Services

The BizTalk Server messaging engine is highly scalable through the provision of a "Message Queuing" logical service native within the engine. This provides an asynchronous interface for application integration that supports both increased scalability and resilience. Synchronous interfaces are also supported for those scenarios where they are required.

Architecturally, the receive and processing services are separated, allowing the receive services to be purely responsible for receiving messages into the engine and the processing services responsible for routing and processing messages. This separation enables processing servers to be arranged into a "processing group," allowing the processing workload to be spread across servers in the group. This allows an organization to simply grow the deployment of servers into the group as throughput requirements increase, without further re-engineering. The processing group also provides a naturally resilient architecture. This architecture has been proven to support the processing of hundreds of millions of transacted (therefore reliable through persistence) documents per day, with 100% observed availability, in a single managed image.
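The receive/processing separation can be sketched with a shared work queue; the threads below stand in for servers in a processing group, and the uppercase step is an invented stand-in for routing and transformation work.

```python
import queue
import threading

work = queue.Queue()      # shared store between receive and processing services
results = []
lock = threading.Lock()

def receive(message):
    """Receive service: only accept and enqueue; no processing here."""
    work.put(message)

def processor():
    """Processing-group member: drain the shared queue until told to stop."""
    while True:
        message = work.get()
        if message is None:          # shutdown sentinel
            break
        with lock:
            results.append(message.upper())   # stand-in for route/transform

# Three "servers" in the processing group; add more to scale throughput.
workers = [threading.Thread(target=processor) for _ in range(3)]
for w in workers:
    w.start()
for msg in ["a", "b", "c", "d"]:
    receive(msg)
for _ in workers:
    work.put(None)
for w in workers:
    w.join()
# sorted(results) == ["A", "B", "C", "D"]
```

Because receivers never process and processors never receive, capacity grows by adding workers to the group, with no change to the receive side.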

Transactional Services

The BizTalk Server messaging engine supports full two-phase commit transactional services through the exploitation of COM+ transactions. This enables the interface between an application and the messaging engine to be transactional for both the receipt and delivery of messages. For example, this can be used to ensure that an update to a database in response to the processing of a message occurs only if the message is also removed from the messaging engine, ensuring once-only processing.

Transactional support is provided only if the resource or application itself is transactional; for example, interfacing to a CICS application within a transaction would require the exploitation of the Host Integration Server COMTI technology within the BizTalk Server application adapter.
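The once-only guarantee described above can be illustrated with a single database transaction that both applies the business update and removes the message, so a crash between the two steps cannot leave the message half-processed. The sketch uses SQLite and invented table names in place of COM+ transactions and a real messaging engine.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, body TEXT)")
db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
db.execute("INSERT INTO accounts VALUES ('acme', 100)")
db.execute("INSERT INTO queue (body) VALUES ('credit acme 50')")
db.commit()

def process_next():
    """Apply the business update and dequeue the message atomically."""
    row = db.execute("SELECT id, body FROM queue ORDER BY id LIMIT 1").fetchone()
    if row is None:
        return False
    msg_id, body = row
    _, name, amount = body.split()
    with db:   # both statements commit or roll back together
        db.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                   (int(amount), name))
        db.execute("DELETE FROM queue WHERE id = ?", (msg_id,))
    return True

process_next()
# accounts: acme balance is now 150; the queue is empty, so the message
# cannot be processed a second time.
```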

Reliable Message Delivery Services

The BizTalk Server messaging engine provides a reliable messaging service that supports a protocol for the reliable delivery of messages over non-reliable transport protocols, such as HTTP or SMTP. The reliable delivery is provided through the support of the BizTalk Framework 2.0 envelope (built on the SOAP standard).

Reliable delivery is based on the sender delivering the document and the receiver returning an acknowledgment within a given time frame. The protocol allows the sender to retry sending the document until the acknowledgment is received or until the defined retry time limit is exceeded. The receiver is required to return an acknowledgment, and not to process the message again if it has already processed it. This protocol is natively supported by BizTalk Server.
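The retry-until-acknowledged protocol can be sketched as follows. The in-memory transport, the `Receiver` class, and the ack tuple are invented for illustration; BizTalk carries the equivalent identity and acknowledgment information in the BizTalk Framework 2.0 envelope, which is here reduced to a bare document ID.

```python
import uuid

class Receiver:
    """Receiving side: always acknowledge, never process the same document twice."""
    def __init__(self):
        self.processed_ids = set()
        self.documents = []

    def receive(self, doc_id, body):
        if doc_id not in self.processed_ids:   # duplicate retries are absorbed
            self.processed_ids.add(doc_id)
            self.documents.append(body)
        return ("ack", doc_id)                 # acknowledgment is always returned

def send_reliably(transport, body, max_retries=3):
    """Sending side: retry the same document until acknowledged or retries run out."""
    doc_id = str(uuid.uuid4())
    for _ in range(max_retries):
        try:
            status, acked_id = transport(doc_id, body)
        except ConnectionError:
            continue                           # transient failure: retry same doc_id
        if status == "ack" and acked_id == doc_id:
            return True
    return False

receiver = Receiver()

# Simulate an unreliable transport that loses the first acknowledgment.
attempts = {"n": 0}
def flaky(doc_id, body):
    attempts["n"] += 1
    result = receiver.receive(doc_id, body)    # the document arrives...
    if attempts["n"] == 1:
        raise ConnectionError("ack lost")      # ...but the ack does not come back
    return result

ok = send_reliably(flaky, "<PurchaseOrder/>")
# ok is True and receiver.documents == ["<PurchaseOrder/>"],
# despite two delivery attempts: the duplicate was acknowledged, not reprocessed.
```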

Routing Services

The linking of applications through BizTalk Server messages provides an organization with great flexibility through the logical routing service supported by the framework. For example, sending applications do not need to be aware of recipients of their information, which allows a receiving application to be replaced in the future without impacting the sending application.

Transformation and Mapping Services

BizTalk Server messaging provides both mapping and transformation logical services, supporting transformation between any of the following source and destination formats:

XML

Flat-file delimited

Flat-file positional

Hybrid delimited and positional flat files

EDI (X12 and UN/EDIFACT)

Custom (extensions performed through the use of an included SDK)

In supporting this flexible mapping between various document formats, BizTalk Server converts all non-native XML documents to an intermediate XML format before applying the data transformation. The use of this intermediate XML format allows all the transformations to be performed by using the W3C transformation standard XSLT.

The following illustration shows the process of mapping a source specification to a destination specification. The source file is an EDI-based document, and the destination file is a flat-file document. In this example, the EDI document structure is converted to an intermediate XML format, the structure of which is represented by an XML schema specification. A data-driven parser (that uses the XML schema specification) creates an XML version of the source EDI specification. The XSL engine then transforms this source XML representation to an XML representation of the destination file format. This destination XML representation is then serialized to the native format of the destination file, which is a flat file in this example.
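The parse, transform, serialize pipeline can be sketched as follows. To keep the sketch self-contained it performs the middle step in plain code rather than with an XSLT engine as BizTalk does, and the record layouts and field names (`OrderLine`, `PickLine`, and so on) are invented.

```python
import xml.etree.ElementTree as ET

def parse_delimited(line):
    """Parser: turn a delimited source record into intermediate XML."""
    name, qty = line.split("|")
    rec = ET.Element("OrderLine")
    ET.SubElement(rec, "Item").text = name
    ET.SubElement(rec, "Quantity").text = qty
    return rec

def transform(source):
    """Transform: map the source XML onto the destination's XML structure.
    BizTalk performs this step with generated XSLT."""
    dest = ET.Element("PickLine")
    ET.SubElement(dest, "Sku").text = source.findtext("Item").upper()
    ET.SubElement(dest, "Count").text = source.findtext("Quantity")
    return dest

def serialize_positional(dest):
    """Serializer: emit the destination XML as a fixed-width flat record,
    SKU padded to 10 characters and the count right-aligned in 4."""
    return dest.findtext("Sku").ljust(10) + dest.findtext("Count").rjust(4)

record = serialize_positional(transform(parse_delimited("widget|12")))
# record is a 14-character positional flat-file line
```

Because every format is lifted into XML before the transform, a single transformation technology (XSLT in BizTalk's case) covers every source/destination pairing.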

The XSLT required for the transformation is generated by using the BizTalk Mapper tool, which provides the graphical design tool for specifying the document mapping (see the "BizTalk Mapper" topic later in this document). The Mapper supports a variety of mapping scenarios that range from simple, parent-child tree relationships to detailed, complex looping of records and hierarchies.

Sometimes there is no simple mapping, and transformation is also required. For example, the contents of several source fields may need to be combined to form the contents of a destination field, or some data processing may need to be performed on the contents of the source specification field to produce the required contents of the destination field. Within the Mapper, there are two ways to introduce intermediate data processing during the mapping process:

Functoids. Functional objects that perform simple predefined operations (for example, string manipulation or mathematical operations). They can be used singularly or combined to perform arbitrarily complex and unique transformations.

Scripts. Short user-written scripts, executed by the script functoid, which allow for more complex data processing. These scripts can then become reusable objects within the mapping environment.

Functoids provide extensions to simple links that enable a data item to be transformed as it is mapped from the source to the destination, for example converting the text to uppercase. Pre-built functoids are provided in the following categories: String, Mathematical, Logical, Date and Time, Conversion, Scientific, Cumulative, and Database, together with a collection of Advanced functoids.
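Functoid-style composition can be sketched as small predefined operations chained so that the output of one feeds the next; the functoid names below are illustrative, not BizTalk's own.

```python
def string_concat(*parts):
    """String functoid sketch: combine several source fields into one value."""
    return " ".join(parts)

def to_uppercase(value):
    """String functoid sketch: transform a value as it is mapped."""
    return value.upper()

def compose(*functoids):
    """Chain functoids so each one's output becomes the next one's input,
    mirroring how functoids are wired together on a map."""
    def chained(*args):
        result = functoids[0](*args)
        for f in functoids[1:]:
            result = f(result)
        return result
    return chained

# Two source fields combined, then transformed, to produce one destination field.
full_name = compose(string_concat, to_uppercase)
full_name("Ada", "Lovelace")   # -> "ADA LOVELACE"
```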

Metadata Services

BizTalk Server supports a number of metadata logical services including schemas, configuration, and names. BizTalk Server metadata is held centrally within a configuration database that is shared across instances deployed in the processing group. The information may also be accessed and modified programmatically through a COM-based configuration model.

Data formats are maintained as XML-based specifications, exploiting XML schema technology implemented in the XML parser that is supplied with Internet Explorer 5.0 and later versions. XML schemas are defined by using the BizTalk Editor; see the "BizTalk Editor" topic later in this document for further details. While XML schema technology natively supports XML data formats, embedded XML notations support the definition of validation and processing rules for positional and delimited flat files as well as EDI-based file formats. Predefined schemas are also provided for common EDI messages based on X12 and UN/EDIFACT EDI standards.

All interface schemas and maps are maintained within a Web Distributed Authoring and Versioning (WebDAV) repository. WebDAV is a standard of the Internet Engineering Task Force (IETF) for collaborative document editing over the Web.

Indexing and Searching Services

A document tracking service is also provided, supporting the optional tracking of interchanges made through the messaging engine, together with an associated search facility. See the "BizTalk Document Tracking" topic later in this document for further information.

Productivity Tools

BizTalk Server includes the following productivity tools:

BizTalk Editor. Supports the definition of structured business documents through a graphical tool.

BizTalk Mapper. Allows the mapping and transformation of business data to be expressed in a graphical tool.

BizTalk Messaging Manager. Provides an environment for the definition and management of application and trading partner integrations.

BizTalk Document Tracking. Provides the ability to track and analyze business data.

BizTalk Orchestration Designer. Supports the automation of distributed business processes through their definition using a Microsoft Visio®-based graphical interface. The orchestration designer is described in further detail in "Solution: Integration of Business Processes."

More details about these tools can be found in the BizTalk Server documentation or in Appendix H: BizTalk Server Tools.

Application Adapters

Unlike other EAI vendors, who have built adapters themselves, Microsoft has encouraged application and technology vendors to develop their own adapters. The Microsoft adapter strategy is to:

Encourage application ISVs to build native BizTalk Server connectivity.

Partner with adapter ISVs to deliver application and technology adapters.

Provide means for customers to easily build custom adapters themselves through the use of an Adapter Development Kit SDK included with the product.

Through the execution of this strategy, application and technology adapters are available for the following products and technologies. Note that this list is continually growing.

Application adapters (as of January 2002):

Ariba

Clarus

Commerce One

Cove Systems

Eqos

FrontStep

Great Plains

J.D. Edwards

Kewill

Manhattan Associates

Mapics

McHugh

Mega

Microsoft Office XP

Navision

Onyx

Oracle

Partner Community

Peachtree

Peoplesoft

Peregrine

Pivotal

QAD

Quickbooks

Remedy

SAP

Scala Business Systems

ServiceSwitch

Siebel

Slam Dunk Networks

Staffware

Trade Power

Ultimus

VerticalNet

Visibility

Worldtrak

Technology adapters (as of January 2002):

ActiveX Data Objects

ADDS

ADM11/H

ADM11/P

ADM11/R

ADM11/W

AFP

Ampex 230

ANSI

ASCII

BizTalk Framework

C API

CICS / IMS

Cifer T205

COBOL

COM

CORBA

Dec VT

DG 216

Document Archival

EBCDIC

EDI

Flat File

FTP

GALILEO

HIPAA

HL7

HP 700/92/2392A

HTML

HTTP

HTTPS

IBM 3270

IBM 5250

IBM DB2/400

IBM DB2 UDB 6.1

IBM DB2 UDB 7.1

ICL 7561

ICL DRS-M10/M15

IMS

IN2 SM9400j

INS SM9400g

Ingres

J2EE

Java

Java Beans

JDBC

JMS

Loopback

MDIS PRISM 8 / 9

Microsoft SQL Server 7.0

Microsoft SQL Server 2000

Microfusion MF-30

Monitoring

MQSeries

MSMQ

ODBC

Oracle 8i

Oracle 7.x, 8.x

Oracle 9i Advanced Queuing

P9 ANSI

PC Monitor

PDF

Progress Database 8.3b, 9.0

PT200

RosettaNet

SCO ANSI

SMTP

Sybase

TELEVIDEO 955

TVI 920

UML

VIDEOTEXT

WYSE 50/60

WYSE 50+

XML

Appendix E: Integration of Business Processes

Business process orchestration is an approach to business process automation that coordinates the activities representing the steps in a defined business process through a series of messages exchanged among those activities. To perform business process orchestration, it is important to understand each piece within the system, and to recognize that it is the combination of well-designed pieces acting in concert that brings the desired result.

Orchestration

Fundamental to the concept of orchestration is being able to quickly and simply change and redeploy a business process as the situation or organizational requirements change. Business processes need to be changed rapidly to adapt to changing business needs, and this is even truer as you expand outside of one business and cross over into another. People move in and out of organizations, laws change, requirements change, and feedback must be allowed to help optimize the system. The need can be stated as follows: The system needs to be flexible yet powerful enough to adapt to all of these changes, and where appropriate adapt automatically to the changes without intervention. This is a system that provides business process orchestration.

The Data Integration services focus predominantly on resolving the heterogeneity of applications at a data level. Business process orchestration services, on the other hand, are concerned with integrating applications at the process level. They are concerned with factors such as time, order, correlation, integrity, and events.

The following list maps each business process logical service to its realization in BizTalk Server:

Format. Through orchestration, application-specific request and response formatting can be abstracted from the client, allowing BizTalk Orchestration to communicate using an open format such as a SOAP envelope. This approach means that little or no modification should be necessary to existing enterprise applications and that third-party Web services can be accommodated. Furthermore, message metadata can be added and handled through orchestration. This could include timestamps, process audit information, and standardized error reporting.

Orchestration. One of the key benefits of an orchestrated business process is the ability to branch and synchronize subprocesses. This enables data source requests to execute in parallel, reducing the complexity of implementing the required logic for each branch. Also, an orchestrated process allows compiled logic to be leveraged in an accessible and flexible fashion that is open to change in the short or medium term.

Transaction Integrity. COM-based integration with the BizTalk Orchestration Engine allows for the integration of transactions in processing orchestrations. The entire orchestration may be treated as a single transaction to be committed or rolled back as a single unit, or individual transactions may be implemented within the orchestration.

Schedule. The BizTalk Orchestration Service provides scheduling services for the defined process flows.

Process Flow. BizTalk Orchestration services provide for the definition, execution, and management of process flow.

Non-Delivery. The BizTalk Messaging Service provides mechanisms for handling the non-delivery of messages, as does MSMQ.

Integration Events. BizTalk Messaging Services.

Product Mapping

BizTalk Server Orchestration

BizTalk Messaging Services are designed to support the receipt of messages that then flow into a business process, or to send messages that flow out of a business process. BizTalk Orchestration is designed to manage business processes. Therefore, the two services are designed to work together, with BizTalk Messaging Services providing a receipt and delivery support layer for BizTalk Orchestration Services.

BizTalk Orchestration Services can also use BizTalk Messaging Services to integrate one business process with another by sending or receiving messages between the two business processes.

To send or receive messages between two distinct business processes, you must:

Use BizTalk Orchestration Services to create an XLANG schedule that sends a message and an XLANG schedule that receives it.

Use BizTalk Messaging Services to create a messaging port. This messaging port must be configured to instantiate a new instance of the receiving XLANG schedule and deliver a message to a specified port in that schedule.

Use BizTalk Messaging Services to create a channel for the messaging port that you created. This channel must be configured to receive a message from the sending XLANG schedule.

A common scenario for integrating the two services is the correlation of messages within a single running XLANG schedule instance. That is, to have an XLANG schedule instance send a message to an internal application or a trading partner, and to expect a message in return. An example is sending a purchase order and expecting a purchase order acknowledgement in return.
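The correlation scenario can be sketched as follows. The dictionary of pending correlations stands in for the engine's internal instance state, and the message shapes and `procurement-1` instance are invented; this is not the XLANG API.

```python
import uuid

pending = {}   # correlation ID -> the process instance awaiting a response

def send_purchase_order(instance, order):
    """A running process instance sends a PO stamped with a correlation ID."""
    corr_id = str(uuid.uuid4())
    pending[corr_id] = instance
    return {"type": "PO", "corr_id": corr_id, "body": order}

def on_acknowledgement(message):
    """Route an inbound ack back to the exact instance that sent the PO."""
    instance = pending.pop(message["corr_id"])
    instance["acks"].append(message["body"])
    return instance

instance = {"name": "procurement-1", "acks": []}
po = send_purchase_order(instance, "100 widgets")

# The trading partner echoes the correlation ID in its acknowledgment.
ack = {"type": "POAck", "corr_id": po["corr_id"], "body": "accepted"}
on_acknowledgement(ack)
# instance["acks"] == ["accepted"] and no correlations remain pending
```

The essential point is that the correlation ID travels out with the request and back with the response, so many concurrent instances can await replies over the same channel without confusion.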

New Business Processes with BizTalk Server Orchestration

BizTalk Orchestration Services are ideally suited for developing business processes. Business-process design and implementation have traditionally been performed in two distinct phases: the visual design phase and the coding phase. The visual design phase typically consisted of the analysis of an existing business process (such as corporate procurement) and the creation of a workflow diagram or an interaction diagram to describe the process. The coding phase was usually performed separately. In this paradigm, you would build an abstract visual model of a business process and then map the model to an implementation framework.

One of the important features of BizTalk Orchestration Services is the integration of these previously distinct phases within a unified design environment. This design environment provides a versatile drawing surface and a comprehensive set of implementation tools. BizTalk Orchestration Services enables you to:

Create XLANG schedule drawings that describe business processes.

Implement business processes by connecting specific actions within a drawing to ports that represent locations to which messages are sent or from which messages are received. Ports are named locations, and messages represent the data sent or received between actions and ports.

BizTalk Orchestration Services are also designed to manage business processes that might need to be altered quickly or often. In the past, developers have created COM+ components that controlled the business processes, and more traditional COM+ components that did the work. BizTalk Orchestration Services enable you to replace the business process control components with XLANG schedules. However, it is not recommended that you use BizTalk Orchestration Services to define processes at the work level. Instead, use your existing traditional COM+ components. The value of BizTalk Orchestration Services diminishes if it is used to control small portions of a larger business process.

Long-Running Transactions with BizTalk Server Orchestration

In addition to the integration of design and implementation functionality, BizTalk Orchestration Services provides another important feature: the ability to create and manage robust, long-running, loosely coupled business processes that span organizations, platforms, and applications. During an asynchronous, loosely coupled, long-running business process, a product that is ordered over the Internet might have to be built from parts that are in inventory. Some of these parts might even be temporarily out of stock. The entire business process might take weeks or months to complete. In contrast, a tightly coupled business process involves the synchronous exchange of messages. For example, when a customer withdraws money from a bank account, the debiting of the account is immediately followed by the delivery of the money.

By providing an integrated, graphical modeling environment, BizTalk Orchestration Services provides the following important benefits:

When business processes change, the implementation can be quickly and easily redefined.

Concurrent processes can be easily designed, implemented, and maintained.

Transactions (long-running, short-lived, and nested) can be easily structured and maintained.

One of the key strengths of BizTalk Orchestration Services is its ability to manage and maintain the state of long-running transactions.

BizTalk Orchestration Services is a business process automation tool. It is not intended to be a complete workflow system replacement. In particular, it is not intended to define role-based, hierarchical escalation in person-to-person processes. Business processes with role-based aspects that escalate when no response is received are more appropriately implemented as Microsoft Exchange workflows, which can be integrated with BizTalk Orchestration Services.

Secure Interchanges with BizTalk Server Messaging
In addition to these base system services, BizTalk Server Messaging supports Public Key Infrastructure (PKI) X.509 Digital Certificates for secure interchanges between applications or trading partners. Interchanges can be secured at two layers:

Transport layer. Here the exchange is secured at the transport level, and the document interchange can be authenticated and encrypted while in transit. However, after the document has been received by the receiving partner:

The sender of the document can no longer be authenticated, so the document's origin cannot be verified after receipt.

The document is no longer encrypted.

Document layer. Here the document itself is secured and can be encrypted and/or digitally signed. BizTalk Server implements the S/MIME standard for document integrity, authentication, and confidentiality. By using S/MIME, BizTalk Server can store, send, and receive documents that are:

Digitally signed.

Encrypted.

Encrypted and digitally signed.

Because BizTalk Server follows the S/MIME version 3 specification, you can securely exchange documents with other applications that implement the S/MIME standard.

Management

You can use Windows Management Instrumentation (WMI) and Microsoft Operations Manager (MOM) to monitor BizTalk Server. These two management tools enable you to:

Create performance statistics for running business processes (XLANG Scheduler) without the use of scripts.

Monitor the basic BizTalk Server components, such as the BizTalk Server databases and database tables, per-instance queues, and services.

Further information about WMI and Microsoft Operations Manager can be found in the Windows 2000 and MOM product documentation respectively.

To further strengthen the monitoring capabilities of Microsoft Operations Manager, the BizTalk Server 2002 Enterprise Edition Management Pack module contains computer grouping and processing rules, as well as other information that enables you to monitor events specific to BizTalk Server 2002. You can import the BizTalk Server 2002 Management Pack to monitor computers in the current configuration group.

The BizTalk Server 2002 Management Pack provides an initial set of rules and counters that cover all of the Messaging and Orchestration error messages, the BizTalk Server databases, Windows NT events, and queues. You can customize the Management Pack to reflect your monitoring needs and obtain greater detail in the areas your organization is interested in monitoring. The Management Pack includes:

A processing event rule for every Windows Application event that can be generated from BizTalk Messaging and BizTalk Orchestration.

Performance measurement rules for performance counters and database file sizes in BizTalk Server 2002. These support the 20 performance views in the Management Pack specific to BizTalk Server.

Performance threshold rules for performance counters and database file sizes in BizTalk Server 2002 (one for error severity and one for warning severity).

Custom Counter, Suspended Queue, and three other sample rules that show how to build rules customized to your specific BizTalk Server 2002 implementation.

Using these rules, system administrators can be notified:

When the BizTalk Messaging Service is down or how long it has been running.

When any Windows event is generated from BizTalk Messaging or BizTalk Orchestration.

Whether documents to an important customer are being suspended, and how many documents to that customer are successfully sent per period of time.

When the BizTalk Server databases are getting to a specific predetermined size.

When the number of failed schedules per unit time becomes too large.

Appendix F: ETL with Host Integration Server 2000 Data Integration

The Data Integration layer of Microsoft® Host Integration Server 2000 provides access to both structured and unstructured data stored on IBM mainframe or AS/400 computers. This data can be stored in a database or a file system. In addition to data access, the Data Integration layer is responsible for providing data transfer services between Microsoft Windows® 2000 computers and host systems. The Data Integration layer consists of components that make use of existing mainframe and AS/400 software.

The Data Integration layer can be broken down further into the following categories:

Relational database access

Record file access

File transfer

AS/400 data queue access

All of these services make use of IBM host-based products that implement the IBM Distributed Data Management Architecture (DDM). DDM is a framework, or methodology, for sharing and accessing data between systems. DDM defines how systems communicate and leaves it to individual platform vendors to implement the architecture. IBM currently supports DDM on most IBM platforms, including OS/390 (MVS), AS/400, RS/6000 (AIX), and AS/36. By supporting DDM, application developers are freed from having to write complex communications interfaces for each platform they need to support. Instead, DDM handles this complexity on behalf of the application.

Relational Database Access

Much of the operational data stored on OS/390, AS/400, and RS/6000 computers is accessed via a relational database management system. The most popular database on these host systems is IBM DB2. In the case of the AS/400, DB2 is integrated with the operating system. For OS/390 and RS/6000 computers, it is common for organizations to deploy the IBM DB2 relational database management system (RDBMS).

What all of these host systems have in common is that data stored in these databases is accessible as relational tables using Structured Query Language (SQL). This allows for efficient and standardized access to the data on the local DB2 system. However, for many years, there was no common means of accessing data across systems on remote DB2 computers. To resolve this problem, IBM devised the Distributed Relational Database Architecture (DRDA) and has passed the architecture to The Open Group for publication and future extension.

DRDA offers both Remote Unit of Work (RUW) and Distributed Unit of Work (DUW) access to host data. RUW is used for read-only and simple updating of database tables using SQL statements and stored procedures. DUW is used when updates span multiple DB2 instances or computer systems and supports the two-phase commit (2PC) protocol. The 2PC protocol ensures that changes to multiple databases will either succeed or fail in their entirety.
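The all-or-nothing behavior that 2PC guarantees can be sketched as follows: every participant must vote "ready" in the prepare phase before any participant is told to commit, and a single "no" vote rolls back the entire unit of work. The classes below are illustrative only, not an actual DRDA implementation.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote. A real resource manager would durably record
        # its pending changes here before voting "ready".
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"

def two_phase_commit(participants):
    if all(p.prepare() for p in participants):
        for p in participants:      # Phase 2: everyone commits...
            p.commit()
        return True
    for p in participants:          # ...or everyone rolls back.
        p.rollback()
    return False

dbs = [Participant("DB2 on OS/390"), Participant("DB2 on AS/400")]
print(two_phase_commit(dbs))        # True; both now committed

dbs = [Participant("DB2 A"), Participant("DB2 B", can_commit=False)]
print(two_phase_commit(dbs))        # False; one "no" vote aborts the unit
```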

Through its Universal Data Access (UDA) architecture, Microsoft supports two popular methods of accessing remote relational databases: the industry-standard Open Database Connectivity (ODBC), and the broader Object Linking and Embedding Database (OLE DB).

ODBC is designed specifically for interoperating with SQL-accessible RDBMSs. ODBC is implemented by independent software vendors (ISVs) in the form of either a back-end database driver or a front-end application (for example, a reporting or query tool). Microsoft and other vendors offer ODBC drivers for most of the popular RDBMSs.

Microsoft defined OLE DB as a multi-tier distributed architecture for accessing both SQL RDBMSs and non-SQL data sources (for example, mail folders, Internet server stores, and flat file systems). In the OLE DB architecture, ISVs develop software that participates in one of three roles: (1) OLE DB provider, or back-end data source driver; (2) OLE DB service component (for example, a query processor or cursor engine); and (3) OLE DB consumer (for example, a Web service or application, or a GUI query or reporting tool). OLE DB is based on the Component Object Model (COM), and OLE DB providers are designed to expose a well-known set of interfaces. When a provider cannot expose specific, useful, or often-required functionality, an OLE DB service component is employed to extend and standardize the abilities of the provider. In this way, OLE DB consumers can be written to access multiple data sources without knowing the vagaries or limitations of a given back-end provider.

The first of these methods is the ODBC Driver for DB2, which relies on an underlying DRDA application requester (AR) developed by Microsoft. The DRDA AR connects the ODBC driver to DB2 on popular platforms, including OS/390, OS/400, RS/6000 (AIX), Windows NT®, Windows 2000, and Windows XP.

It provides a flexible way for developers using the ODBC API to create applications that can access DB2 records quickly and efficiently. The driver supports the DRDA Level 3 standard and ODBC 3.x interfaces, and allows application programmers to write C and C++ applications that issue dynamic SQL queries and call DB2 stored procedures.

The second method of accessing DB2 is through the OLE DB Provider for DB2. This component also sits on top of the DRDA AR, and therefore supports the same target DB2 systems and substantially the same DB2 access features (for example, dynamic SQL and stored procedures, 2PC, and SNA LU 6.2 and TCP/IP network connectivity). Developers can use C or C++ to integrate DB2 data with Web-based and Windows-based applications. Microsoft Visual Basic® and Web developers (using scripting languages such as VBScript) can use the higher-level ActiveX® Data Objects (ADO) to develop e-commerce solutions. Additionally, DB2 is directly accessible from productivity applications such as Microsoft Office 2000, using Visual Basic for Applications (VBA) and ADO from within Excel.

Many organizations want to improve corporate decision making by centralizing data that is stored in a variety of formats in a number of different places. Database administrators can use Data Transformation Services (DTS), a feature of Microsoft SQL Server™ 2000 and Microsoft SQL Server 7, to import and export data between multiple heterogeneous sources using the OLE DB Provider for DB2. Using this tool, administrators can create a data warehouse using DB2 data, plus integrate most other data sources accessible via an OLE DB provider.

The Distributed Query Processor (DQP), another feature of Microsoft SQL Server, allows users to access data that resides on multiple, distributed databases across multiple servers. Using DQP, SQL Server administrators and developers can create linked server queries that run against multiple back-end data sources with little or no modification. DQP enables application developers to create heterogeneous queries that join tables in SQL Server with tables in DB2. Also, DQP can be used to create SQL Server views over DB2 tables so that developers can write directly to SQL Server and integrate both Windows-based and host-based data in their applications with ease.
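The kind of heterogeneous join that DQP performs can be pictured as joining row sets fetched from two different back ends. The data, table contents, and column names below are invented for illustration; DQP itself does this transparently through linked server queries.

```python
sql_server_orders = [
    {"order_id": 1, "customer_id": "C100", "amount": 250},
    {"order_id": 2, "customer_id": "C200", "amount": 75},
]
# Imagine these rows arrived through the OLE DB Provider for DB2.
db2_customers = [
    {"customer_id": "C100", "name": "Contoso"},
    {"customer_id": "C200", "name": "Fabrikam"},
]

# Join the SQL Server rows with the DB2 rows on customer_id.
by_id = {c["customer_id"]: c for c in db2_customers}
joined = [
    {"order_id": o["order_id"],
     "name": by_id[o["customer_id"]]["name"],
     "amount": o["amount"]}
    for o in sql_server_orders if o["customer_id"] in by_id
]
print(len(joined))  # 2 joined rows spanning both back ends
```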

Record File Access

Another rich source of legacy information is the large amount of data still stored in mainframe VSAM files, Partitioned Datasets, and AS/400 files. Host Integration Server 2000 supports the following services for access to non-relational host data:

The OLE DB provider for AS/400

The OLE DB provider for VSAM

The OLE DB Provider for AS/400 supports record-level access to keyed and non-keyed physical files with external record descriptions, as well as logical files with external record descriptions. Also, the provider can use an optional Host Column Description (HCD) file to describe the format of the target file, mapping the AS/400 data types to OLE DB data types, allowing the developer to access AS/400 flat data files and source files.

The OLE DB Provider for VSAM, which relies on the HCD files to define the metadata of the target data set or member, provides access to most types of mainframe based VSAM files.
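The role an HCD file plays, describing the layout of a host record so that its fields can be mapped to typed columns, can be sketched by applying a hand-written column description to a fixed-length record. The layout, field names, and sample record below are invented for illustration, and the sketch ignores concerns a real HCD file addresses, such as host data types and character-set conversion.

```python
# Column description: (name, offset, length, type) for one record layout.
COLUMNS = [
    ("cust_id", 0, 6, str),
    ("balance", 6, 8, int),
    ("region", 14, 2, str),
]

def parse_record(raw, columns):
    # Slice each field out of the fixed-length record and convert it
    # to the described column type.
    row = {}
    for name, offset, length, typ in columns:
        field = raw[offset:offset + length].decode("ascii").strip()
        row[name] = typ(field)
    return row

record = b"C1004200012500NE"   # one 16-byte fixed-length record
row = parse_record(record, COLUMNS)
print(row["balance"])  # 12500
```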

File Transfer

Most 3270 emulators support the ability to transfer files between a mainframe computer and a workstation using the IND$FILE utility program. This program works in conjunction with host system software such as TSO, or teleprocessing monitor software such as CICS, running on the mainframe. This process is often manual and somewhat inefficient, because it requires 3270 terminal emulation on the client and the host operating system must act as an intermediary in the data transfer. Host Integration Server 2000 provides several more efficient methods to perform file transfer. These methods are:

Host File Transfer

APPC File Transfer Protocol (AFTP)

AS/400 Shared Folders

The Host File Transfer utility lets developers move files between a host system and a local Windows computer. Host Integration Server 2000 provides this service through a single ActiveX control, which makes file transfer operations available from a large number of client development environments. Using HCD files, Host File Transfer can access the same mainframe data set types as the OLE DB Provider for VSAM, but it is optimized to download or upload the entire contents of a data set or member. Other supported environments include the AS/400 and AS/36.

The TCP/IP based File Transfer Protocol (FTP) is often used to move files between computer systems running under UNIX, VMS, and other operating systems. This capability is typically provided as a utility program that implements a set of commands that can be used to connect to a remote computer, log on, navigate to specific locations in the local and remote computer file systems, and then transfer a file (or multiple files) to or from that computer. Unfortunately, to use this protocol to transfer files to a host computer would require TCP/IP on the host. (Most data center managers are reluctant to support TCP/IP on a host computer due to security and performance issues.) Because of the popularity of this protocol, however, IBM has implemented a similar SNA function, the APPC File Transfer Protocol (AFTP). This allows files to be transferred between SNA systems using commands that are so similar to FTP commands that anyone familiar with FTP can easily use AFTP to perform file transfer functions. Internally, AFTP transfers files using the LU 6.2 program-to-program protocol, which is quite efficient for transferring files. AFTP software can be installed either on the Host Integration Server 2000 server or client and used to transfer files to an SNA host.

The AS/400 Shared Folders feature of Host Integration Server 2000 allows a Windows NT or Windows 2000 administrator to re-share a folder on an AS/400 host as if it were a local file system directory. Because the AS/400 Shared Folders feature uses standard operating system file sharing, it requires no software on the client. The client simply sees the folder as a standard Windows NT or Windows 2000 shared directory. This feature is implemented in Host Integration Server 2000 using the same AS/400 PC Support software that allows workstations to access AS/400 files in a pure SNA network configuration.

AS/400 Data Queue Access

AS/400 Data Queues are used on an AS/400 to send data records between separately executing programs. Multiple AS/400 client programs can send data records to a single server program running on an AS/400. Alternatively, a single client program can send records to an AS/400 Data Queue and multiple server programs can extract the records and process the data in parallel. This feature proved so useful in developing AS/400 applications that IBM extended the use of AS/400 Data Queues to PC workstations. Host Integration Server 2000 enables Windows 32-bit applications to access data queues via the AS/400 Data Queue COM Automation Control. Host Integration Server 2000 lets developers access AS/400 data queues from a PC running Windows, so they can move part or all of their AS/400 applications from an AS/400 computer to a PC platform and still use the PC-based program to access a remote data queue on the AS/400.
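The data queue pattern described here, one client program enqueuing records while several server programs dequeue and process them in parallel, can be sketched with a thread-safe queue. This only illustrates the pattern; the real feature is accessed through the AS/400 Data Queue COM Automation Control, and the record values and worker count below are invented.

```python
import queue
import threading

data_queue = queue.Queue()   # the "AS/400 data queue"
results = queue.Queue()

def server_program(worker_id):
    # Each server program pulls records until it sees a sentinel.
    while True:
        record = data_queue.get()
        if record is None:
            break
        results.put((worker_id, record.upper()))

workers = [threading.Thread(target=server_program, args=(i,)) for i in range(3)]
for w in workers:
    w.start()

# The client program sends records to the single shared queue.
for rec in ["order-1", "order-2", "order-3", "order-4"]:
    data_queue.put(rec)
for _ in workers:            # one sentinel per server program
    data_queue.put(None)
for w in workers:
    w.join()

processed = sorted(results.get() for _ in range(results.qsize()))
print(len(processed))  # 4 records processed across the worker pool
```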

Appendix G: ETL with SQL Server 2000 and DTS

Many organizations need to centralize data to improve corporate decision-making. However, their data may be stored in a variety of formats and in different locations. Data Transformation Services (DTS) addresses this vital business need by providing a set of tools that lets you extract, transform, and consolidate data from disparate sources into single or multiple destinations supported by DTS connectivity. By using DTS tools to graphically build DTS packages or by programming a package with the DTS object model, you can create custom data movement solutions tailored to the specialized business needs of your organization.

DTS Packages

A DTS package is an organized collection of connections, DTS tasks, DTS transformations, and workflow constraints assembled either with a DTS tool or programmatically and saved to Microsoft® SQL Server™, SQL Server 2000 Meta Data Services, a structured storage file, or a Microsoft Visual Basic® file.

Each package contains one or more steps that are executed sequentially or in parallel when the package is run. When executed, the package connects to the correct data sources, copies data and database objects, transforms data, and notifies other users or processes of events. Packages can be edited, password protected, scheduled for execution, and retrieved by version.

DTS Tasks

A DTS task is a discrete set of functionality, executed as a single step in a package. Each task defines a work item to be performed as part of the data movement and data transformation process, or as a job to be executed.

DTS supplies a number of tasks that are part of the DTS object model and can be accessed graphically, through DTS Designer, or programmatically. These tasks, which can be configured individually, cover a wide variety of data copying, data transformation, and notification situations. For example:

Transforming data.

DTS Designer includes a Transform Data task that allows you to select data from a data source connection, map the columns of data to a set of transformations, and send the transformed data to a destination connection. DTS Designer also includes a Data Driven Query task that allows you to map data to parameterized queries.

Copying database objects.

With DTS, you can transfer indexes, views, logins, stored procedures, triggers, rules, defaults, constraints, and user-defined data types in addition to the data. In addition, you can generate the scripts to copy the database objects. However, there are restrictions on this capability.

Sending and receiving messages to and from other users and packages.

DTS includes a Send Mail task that allows you to send an e-mail message if a package step succeeds or fails. DTS also includes an Execute Package task that allows one package to run another as a package step, and a Message Queue task that allows you to use Message Queuing to send and receive messages between packages.

Executing a set of Transact-SQL statements or Microsoft ActiveX® scripts against a data source.

The Execute SQL and ActiveX® Script tasks allow you to write your own SQL statements and scripting code and execute them as a step in a package workflow.

Because DTS is based on an extensible COM model, you can create your own custom tasks. You can integrate custom tasks into the user interface of DTS Designer and save them as part of the DTS object model.

DTS Transformations

A DTS transformation is one or more functions or operations applied against a piece of data before the data arrives at the destination. The source data is not changed. For example, you can extract a substring from a column of source data and copy it to a destination table. The particular substring function is the transformation mapped onto the source column. You also can search for rows with certain characteristics (for example, specific data values in columns) and apply functions only against the data in those rows. Transformations make it easy to implement complex data validation, data scrubbing, and conversions during the import and export process. Against column data, you can:

Manipulate column data.

For example, you can change the type, size, scale, precision, or nullability of a column.

Apply functions written as ActiveX scripts.

These functions can apply specialized transformations or include conditional logic. For example, you can write a function in a scripting language that examines the data in a column for values over 1000. Whenever such a value is found, a value of -1 is substituted in the destination table. For rows with column values under 1000, the value is copied to the destination table.

Choose from among a number of transformations supplied with DTS.

Examples include functions that reformat input data using string and date formatting, various string conversion functions, and a function that copies the contents of a file specified by a source column to a destination column.

Write your own transformations as COM objects and apply those transformations against column data.
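The transformations described above, a substring extraction and the conditional substitution of -1 for values over 1000, can be sketched as follows. In DTS these would be built-in or ActiveX script transformations; the rows and column names are invented, and note that the source rows are left unchanged.

```python
source_rows = [
    {"part_code": "US-10442-A", "unit_price": 850},
    {"part_code": "DE-99310-B", "unit_price": 1200},
]

def transform(row):
    return {
        # Substring transformation: copy a prefix of the source column.
        "region": row["part_code"][:2],
        # Conditional logic: values over 1000 become -1 at the destination.
        "unit_price": -1 if row["unit_price"] > 1000 else row["unit_price"],
    }

destination_rows = [transform(r) for r in source_rows]
print(destination_rows[1])  # {'region': 'DE', 'unit_price': -1}
```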

DTS Package Workflow

You can define the sequence of step execution in a package with:

Precedence constraints that allow you to link two tasks together based on whether the first task executes, executes successfully, or executes unsuccessfully. You can use precedence constraints to build conditional branches in a workflow. Steps without constraints are executed immediately, and several steps can execute in parallel.

ActiveX scripts that modify workflow.
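As a rough sketch of how precedence constraints gate step execution on a predecessor's outcome (completion, success, or failure), consider the following. The step names, sequential execution, and constraint encoding are simplifications invented for illustration; DTS itself can run unconstrained steps in parallel.

```python
def run_package(steps, constraints):
    """steps: name -> callable returning True (success) or False (failure).
    constraints: name -> (predecessor, required_outcome), where
    required_outcome is 'completion', 'success', or 'failure'."""
    outcomes = {}
    for name, step in steps.items():
        constraint = constraints.get(name)
        if constraint:
            pred_name, required = constraint
            pred_ok = outcomes[pred_name]
            if required == "success" and not pred_ok:
                outcomes[name] = None   # step skipped
                continue
            if required == "failure" and pred_ok:
                outcomes[name] = None
                continue
        outcomes[name] = step()
    return outcomes

steps = {
    "load_data": lambda: False,   # this step fails
    "build_cube": lambda: True,   # runs only if load_data succeeds
    "send_alert": lambda: True,   # runs only if load_data fails
}
constraints = {
    "build_cube": ("load_data", "success"),
    "send_alert": ("load_data", "failure"),
}

outcomes = run_package(steps, constraints)
print(outcomes)  # build_cube is skipped; send_alert runs
```

This is the conditional-branch behavior the text describes: the failure branch (send_alert) executes precisely because the success branch (build_cube) was skipped.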

Connectivity

DTS is based on an OLE DB architecture that allows you to copy and transform data from any data source for which an OLE DB provider is available, such as SQL Server, or DB2 through the OLE DB Provider for DB2. DTS also supplies graphical tools and command prompt utilities for building, managing, and running packages. For example:

The Data Transformation Services node in the SQL Server Enterprise Manager console tree, which is used to view, create, load, and execute DTS packages, to control DTS Designer settings, and to manage execution logs.

Package execution utilities:

The dtswiz utility starts the DTS Import/Export Wizard by using command prompt options.

The dtsrun utility runs a package from a command prompt.

The DTS Run utility (dtsrunui) allows you to run a package by using dialog boxes.

Metadata

DTS includes features for saving package metadata and data lineage information to Meta Data Services and linking those types of information. You can store catalog metadata for databases referenced in a package, as well as accounting information about the history of a particular row of data in your data mart or data warehouse.

Appendix H: BizTalk Server Tools

BizTalk Editor

The exchange of structured messages is fundamental to application integration using Microsoft® BizTalk® Server. Whether those messages use XML or some other parsed text, programmers must be able to specify the structure of a message (its schema or specification), providing support for the Schema logical service.

BizTalk Editor is a graphical tool with which you can create, edit, and manage specifications. BizTalk Editor uses XML-Data Reduced (XDR) syntax, which provides a common vocabulary to handle overlaps between syntactic, database, and conceptual schemas. Using a common tree-structure metaphor, the editor allows the same tool to be used for specifying the message structure, whether that structure is XML, delimited flat-file, positional flat-file, UN/EDIFACT or X12. The following illustration shows the BizTalk Editor user interface.

BizTalk Editor also provides several templates that can be used as starting points for creating specifications for common documents, such as purchase orders, invoices, and advance shipping notices, including common X12 and UN/EDIFACT EDI documents.

To further increase productivity, the editor supports the ability to create a schema from a well-formed XML instance or XML DTD by using an import facility.

BizTalk Mapper

BizTalk Mapper supports the Mapping and Transformation services offered by the Messaging Engine. The mapper increases user productivity by enabling the mapping and transformations required between application and document formats to be expressed using a graphical tool, as shown in the following illustration.

The mapping between two application data formats is expressed in the mapper by selecting the source and destination schemas (as previously created using the editor) and then graphically representing the mapping by dragging the source to the destination field. Additional transformations, for example concatenating two fields, can be expressed as functoids.
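The mapping style described above can be sketched as a table from destination field to a function over the source record, where a concatenation "functoid" combines two source fields into one destination field. All field names are invented for illustration; this is not the BizTalk Mapper API.

```python
source = {"first_name": "Ada", "last_name": "Lovelace", "po_number": "PO-7781"}

field_map = {
    # Direct link: source field copied to destination field.
    "order_ref": lambda s: s["po_number"],
    # Concatenation "functoid": two source fields combined into one.
    "contact": lambda s: s["first_name"] + " " + s["last_name"],
}

destination = {field: fn(source) for field, fn in field_map.items()}
print(destination["contact"])  # Ada Lovelace
```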

BizTalk Messaging Manager

BizTalk Server provides two methods for configuring BizTalk Messaging Services to manage the exchange of messages/documents between applications or trading partners. You can either use BizTalk Messaging Manager, which is a graphical user interface (UI), or directly access the BizTalk Messaging Configuration object model.

The messaging manager provides the environment for managing all interchanges, including the configuration of applications, organizations, document formats, document processing steps, and application/organization connectivity. The definition of processing steps applied on an exchanged document is referred to as a channel definition. The channel definition includes transformation, encryption, digital signing, and tracking requirements.

The following illustration shows the BizTalk Messaging Manager user interface.

Whether you use the user interface or the object model depends on the amount and type of configuration information available to you in your database. The BizTalk Messaging Configuration object model application programming interface (API) enables you to automate all or part of the configuration process, rather than entering the data for each individual entity into BizTalk Messaging Manager. In general, the more interfaces you have to configure, the more it benefits you to use the API to configure your messaging service.

The BizTalk SEED Wizard, introduced with BizTalk Server 2002, enables companies to package their BizTalk Server configurations into a SEED package and make it available to trading partners through the Internet. Companies that create a SEED package still have to configure BizTalk Server manually to receive documents. However, with a SEED package, trading partners can configure BizTalk Server, test their configuration, and begin exchanging documents with the initiating company. A SEED package helps companies rapidly start conducting business with partners.

BizTalk Document Tracking

BizTalk Server provides built-in tracking capabilities to record document exchanges. You can use BizTalk Document Tracking to do the following:

Fulfill legal and/or standards requirements to keep copies of all electronic business transactions.

Answer questions from partner organizations quickly and easily. For example, if a partner asks "When did we send this set of clinical records?", you can find the date and time the records were sent and whether the partner returned a receipt.

The Messaging Manager provides the ability to configure the requirements for tracking message data, which can be controlled at a number of levels based on the server group, document definition, or processing channel.

Displaying Tracking Data

A pre-built query tool is provided for viewing the contents of the BizTalk Tracking database.

The three standard query parameters included in BizTalk Document Tracking are date range and time zone, source and destination organization/application, and document type. You can find interchange and document records by defining one or more of these criteria in a query. For example, you can search for all document types in a specified date range. Or you can find interchanges and documents that are a certain document type and that match selected source and destination organizations.
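As an illustration of how these three query parameters narrow a search, the following sketch filters invented tracking records by optional date range, destination organization, and document type. The record fields and values are made up; this is not the actual Tracking database schema.

```python
from datetime import date

records = [
    {"doc_type": "Invoice", "dest_org": "Contoso", "sent": date(2002, 3, 1)},
    {"doc_type": "PO",      "dest_org": "Contoso", "sent": date(2002, 3, 5)},
    {"doc_type": "Invoice", "dest_org": "Fabrikam", "sent": date(2002, 4, 2)},
]

def query(records, start=None, end=None, dest_org=None, doc_type=None):
    # Each parameter is optional; only the criteria you define constrain
    # the result, mirroring how the tracking query parameters combine.
    def match(r):
        return ((start is None or r["sent"] >= start) and
                (end is None or r["sent"] <= end) and
                (dest_org is None or r["dest_org"] == dest_org) and
                (doc_type is None or r["doc_type"] == doc_type))
    return [r for r in records if match(r)]

march_invoices = query(records, start=date(2002, 3, 1),
                       end=date(2002, 3, 31), doc_type="Invoice")
print(len(march_invoices))  # 1
```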

The following illustration shows the Document Tracking user interface.

The following illustration shows the format of the output from the query tool. From the user interface you can select a particular interchange and view the metadata, any user selected fields to be extracted, and the document itself.

The content of the Tracking database can also be accessed programmatically through a COM interface. Because the Tracking database is implemented as a SQL Server database, it can also be queried directly with custom SQL, and SQL data analysis tools can be applied to its content.

BizTalk Administration Console

BizTalk Server Administration — also called the administration console — is a Microsoft Management Console (MMC) snap-in that provides a visual representation of the Microsoft BizTalk Server components that a system administrator can manage.

The Administration console supports the administration of the following system components:

The BizTalk Server Group and all servers participating within the group

A Shared Queue database that persists all documents until they are successfully processed

The Tracking database that is used to log document and interchange activity and to run reports

Receive functions

Appendix I: Microsoft Architecture Patterns

The Microsoft Architecture Reference Model provides a standard approach to describing the business scenario and the architectural and design characteristics of a solution. Each layer in the reference model represents a set of information that describes particular attributes of the solution. The following illustration shows the Reference Model.

This model mirrors the normal system development process of the following steps:

Find out the business requirements.

Conceptualize the IT solution.

Identify in detail the logical services required (whether infrastructure or application services).

Identify the consequent physical services required.

Define the implementation topologies.

The Pattern Reference Model will therefore describe:

Business problem. What is the business trying to do?

In this we define the scope of the problem by decomposing the business topic into logical business areas from which the requirements for business services can be defined, allowing us to identify the business services needed to solve the business problem.

Conceptual solution. What is the shape of the IT solution?

From these services, the first sets of IT services can be deduced and defined in a conceptual solution, and their business service levels can be described in the requirements.

Logical solution. What IT services do we need to realize the solution?

In this we refine the IT services into the more granular logical components and mechanisms that are required to create the logical solution. In doing so, we evaluate alternative ways of expressing the solution and choose the one we will take forward to implement.

Physical solution. With what infrastructural services will the solution be created?

Now the logical solution is fully converted into a hardware and software topological diagram, with products and connections defined. This does not take much account of non-functional business requirements at this stage, because it is a generic pattern. However, it may identify variations (different options) that would be driven by a loose description of non-functional requirements (such as "very scalable, very resilient" versus "small, inexpensive").

Implementation solution. How should the Microsoft technology be implemented in the solution?

Finally we show more detail about how the Microsoft technology should be implemented in the particular physical configuration.

Experience shows that as more and more solutions are developed to address what is essentially the same problem in different industry sectors, a set of common logical services will emerge in the architecture. These common logical services will have common characteristics with regard to their deployment, scalability, etc., leading to commonality in the lower (physical and implementation) layers of the solution architecture. It is this commonality that the EAI pattern seeks to capture.