The Challenge of Information Integration

Over the past twenty years, enterprises have created many diverse systems to manage their information and data. Individual systems combine a myriad of hardware configurations, operating systems,
databases, and applications. Often, individual enterprises have found themselves with several disparate information systems among their divisions and departments, especially after mergers or
acquisitions have broadened the scope and depth of the enterprise.

As the world, and the enterprise with it, becomes more thoroughly networked, the enterprise needs to integrate its diverse systems to operate and analyze its resources more effectively. Numerous external
sources, from partner information resources to real-time data feeds, have become available. The enterprise needs to marshal and integrate these disparate systems. At the heart of the systems
integration challenge lies an information integration challenge.

Model-driven integration differs from programmed integration. Programmed integration relies upon hard-coding a finite, inextensible solution to a particular challenge. Model-driven
integration focuses on abstracting the information content into a model that describes the enterprise’s information resources. This model captures the nature of the information the enterprise
has within its systems and the way the enterprise uses data in its daily operations.

The data model does not rely on a particular hardware or operating system platform; instead, it contains standard constructs that show the data entities and operations. Once an
enterprise captures its information resources in a model, it can easily integrate this information using a middleware server.

Model-driven integration offers a complete solution because the data model can demonstrate:

Platform independence, as the contents of the model and the systems modeled can represent one or more different types of actual physical systems.

Singularity, as the model can contain as many information systems as needed.

Using model-driven integration middleware, such as the MetaMatrix Information Integration Server, also offers:

Real-time access, because the model-driven solution does not manipulate copies of the data. Data-consuming applications can report from or update the native systems.

True integration. Data-consuming applications can access all data systems the enterprise has modeled.

Model-driven integration represents an evolution of the common and accepted practice of developing applications and representing systems using standard constructions in a model. The most common
example of this practice, and the precedent for model-driven integration, is the use of Unified Modeling Language (UML).

The Historical Precedent: UML

As development languages and environments proliferated in enterprises, developers accustomed to varied, and often proprietary, systems and languages found themselves lacking a common way to express
concepts such as workflow, relationships between entities, interaction, and other abstractions inherent in development.

In 1997, the Object Management Group (OMG) established a standard language for expressing these concepts. UML contains many types of models, each a standard presentation for one type of
information. Hence, UML serves as a metamodel, a language for constructing and relating diverse models.

UML gained rapid acceptance and set off a new trend in application development. Several companies created graphical tools that implement the UML standard, which
enterprises of all types can use to “design” applications. Figure 1 contains a UML class diagram depicting a portion of an application.

Figure 1: UML Class Diagram

This model contains standard constructs; anyone versed in UML can review the model and understand its contents. By modeling applications in UML before writing actual code, the enterprise garners
many benefits:

Using UML, an enterprise can build a Platform Independent Model (PIM) that captures the design, business logic, and data requirements of each particular application. Independent of any platform,
the model describes the goals of the application without referring to a specific operating system, hardware configuration, or even programming language.

The Platform Independent Model could describe an application written in Java running on the Sun Solaris operating system on an Alpha, or it could describe a Visual Basic application running on
the Windows XP operating system on a quad Pentium 4 server.

Automatic Code Generation through Tools

Certain new UML modeling tools can generate platform-specific code from a UML model. Using these tools and UML saves many enterprises a great deal of programming effort—and expense.
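
The idea of generating platform-specific code from one model can be sketched in a few lines. The example below is a minimal illustration, not the behavior of any particular UML tool: the ClassModel and Attribute types, the two type-mapping tables, and the Customer model are all hypothetical names introduced here. It renders one platform-neutral class description into a Java class skeleton and into a SQL table definition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    type: str  # platform-neutral type name: "string" or "integer"


@dataclass(frozen=True)
class ClassModel:
    name: str
    attributes: tuple


# Hypothetical per-platform type mappings (assumptions for this sketch).
JAVA_TYPES = {"string": "String", "integer": "int"}
SQL_TYPES = {"string": "VARCHAR(255)", "integer": "INTEGER"}


def to_java(model):
    """Render the model as a Java class skeleton."""
    fields = "\n".join(
        f"    private {JAVA_TYPES[a.type]} {a.name};" for a in model.attributes
    )
    return f"public class {model.name} {{\n{fields}\n}}"


def to_sql(model):
    """Render the same model as a SQL CREATE TABLE statement."""
    columns = ",\n".join(
        f"    {a.name} {SQL_TYPES[a.type]}" for a in model.attributes
    )
    return f"CREATE TABLE {model.name} (\n{columns}\n);"


customer = ClassModel(
    "Customer", (Attribute("id", "integer"), Attribute("name", "string"))
)
print(to_java(customer))
print(to_sql(customer))
```

The model itself never mentions Java or SQL; only the two rendering functions do, which is what keeps the model platform independent.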

Simplified Application Migration

When an enterprise needs to migrate an application from one platform to another, a UML model makes the process straightforward. The UML model documents workflow and objects outside of the comments
within the code and the memories of developers. Without the model, the enterprise must rely on those comments and memories, both of which can prove unreliable for a complex task.

PIMs and System Integration

While enterprises have applied UML models successfully to application development, UML has not yet significantly affected the problem of systems integration. Although Platform Independent Models
offer excellent return on investment (ROI) for large enterprises, those enterprises have not yet applied the lessons learned with PIMs and UML to the challenge of systems integration.

The Model in Information Integration

The success of the Platform Independent Model in application development leads to the use of PIMs for enterprise information sources. Enterprises use PIMs to ease their systems design and
application design; in the same way, enterprises can use model-driven integration to solve their information integration challenges, which lie at the heart of their systems integration.

To integrate their information sources, enterprises can produce PIMs of diverse information stores. However, different enterprise information systems have different methods of storing information.
A single metamodel, the abstraction that describes the structure of models, cannot accommodate all possible variations. With that, the meta-metamodel was born.

Differences in Metamodels

When constructing PIMs for software applications, enterprises needed only one metamodel, provided with the UML specification, to describe the models the enterprise created. However, because
information sources, including relational databases, hierarchical databases, object databases, files, streaming information, and many others, can have radically different structures, enterprises
will need more than one way to describe the models they need to create. These enterprises need more than one metamodel.

With that end in mind, the Object Management Group created the Meta Object Facility (MOF) standard, extending UML to apply it to modeling diverse information systems. The Meta Object Facility
standard describes diverse metamodels, essentially abstracting a form and structure to describe metamodels. Figure 2 describes the Meta Object Facility’s structure.

Figure 2: MOF Standard

The Object, Relational, File, and XML information sources have individual structures, described in the model (M1 in the figure). Each information source has a metamodel, which determines not only
the structure, but also the relationships between the entity types in a model (M2 in the figure). The meta-metamodel (M3 in the figure), then, is a metamodel that describes the contents of
metamodels, in this case the types of entities shared by information systems of all types. The MOF standard describes the methods of describing all information system types, and can extend to
include systems beyond the four described above.
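
The three layers can be sketched as plain data structures. This is a deliberately reduced illustration, assuming a meta-metamodel with a single construct; the actual MOF standard defines far more. The names MetaClass, Metamodel, ModelElement, and validate are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass, field


# M3: the meta-metamodel. One construct is enough for this sketch.
@dataclass(frozen=True)
class MetaClass:
    name: str
    contains: tuple = ()  # names of the MetaClasses it may contain


# M2: a metamodel is a named set of MetaClass instances.
@dataclass
class Metamodel:
    name: str
    classes: dict

    def allows(self, parent, child):
        return child in self.classes[parent].contains


def metamodel(name, *meta_classes):
    return Metamodel(name, {mc.name: mc for mc in meta_classes})


RELATIONAL = metamodel(
    "Relational", MetaClass("Table", ("Column",)), MetaClass("Column")
)
XML = metamodel(
    "XML", MetaClass("Element", ("Element", "Attribute")), MetaClass("Attribute")
)


# M1: a model element describes one concrete part of an information source.
@dataclass
class ModelElement:
    name: str
    kind: str  # must name a MetaClass of the governing metamodel
    children: list = field(default_factory=list)


def validate(element, mm):
    """Check that an M1 element and its children conform to an M2 metamodel."""
    if element.kind not in mm.classes:
        return False
    return all(
        mm.allows(element.kind, child.kind) and validate(child, mm)
        for child in element.children
    )


customers = ModelElement("CUSTOMERS", "Table", [ModelElement("cust_id", "Column")])
print(validate(customers, RELATIONAL))  # True
print(validate(customers, XML))         # False
```

Adding a fifth kind of information source in this sketch means defining one more Metamodel value; the M3 layer does not change.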

Modeling Data with Metadata

When modeling an information system, the enterprise captures the essence of the information within its systems, including technical aspects of the data, which describe the structure of the
data, and business aspects of the data, which describe the way the enterprise uses the data. This captured essence is called metadata, data about the data. The Platform Independent Models include
both the technical and business metadata, but remain platform independent because their contents remain descriptive in nature.

The Platform Independent Models contain metadata that describe the data in the physical sources. For example, in the relational metamodel, this includes table and column names, data types, keys,
and foreign keys. This metadata is called physical metadata.

Figure 3: Physical Metadata Model
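
As a rough illustration of physical metadata for a relational source, the sketch below reads column names, data types, and primary-key flags from a live database. It assumes SQLite (through Python's standard sqlite3 module) purely as a convenient stand-in for an enterprise system; the Column and Table types and the customers table are hypothetical, and foreign keys are left out to keep the example short.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    data_type: str
    primary_key: bool


@dataclass(frozen=True)
class Table:
    name: str
    columns: tuple


def read_table_metadata(conn, table_name):
    """Import physical metadata for one table from a SQLite connection."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    rows = conn.execute(f"PRAGMA table_info({table_name})").fetchall()
    return Table(table_name, tuple(Column(r[1], r[2], r[5] > 0) for r in rows))


# A stand-in source system with one table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (cust_id TEXT PRIMARY KEY, name TEXT)")

customers = read_table_metadata(conn, "customers")
for column in customers.columns:
    print(column)
```

The resulting Table value describes the source without holding any of its rows, which is the sense in which metadata is data about the data.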

To simplify application development and to achieve information integration, enterprises need to describe, in a common way, the metadata in the disparate physical sources. For instance, system 1 has
a column named cust_id with the data type of string. System 2 has a column of equivalent information named cust_num with the data type of integer. A metadata model that can represent
a single column named CustomerNumber, with transformations and mappings that relate it to cust_id and cust_num, would go a long way toward solving the information
integration problem. This metadata, which describes the data as the enterprise, or its data-consuming applications, uses it, is called virtual metadata. Figure 4 displays a portion of a Platform
Independent Model containing virtual metadata.

Figure 4: Virtual Metadata Model
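
The cust_id and cust_num example can be written down as a small mapping table. The sketch below is one possible shape for such a mapping, not a description of any product's format. It assumes that the virtual CustomerNumber column is an integer, and the system names, sample rows, and the customer_numbers helper are hypothetical.

```python
# Virtual column CustomerNumber: each entry maps a physical (system, column)
# pair to the function that converts its values to the virtual type (integer).
CUSTOMER_NUMBER = {
    ("system1", "cust_id"): int,             # stored as a string in system 1
    ("system2", "cust_num"): lambda v: v,    # already an integer in system 2
}


def customer_numbers(system, rows):
    """Yield CustomerNumber values from the rows (dicts) of one physical system."""
    for (source_system, column), convert in CUSTOMER_NUMBER.items():
        if source_system == system:
            for row in rows:
                yield convert(row[column])


# Sample rows as each physical system might return them.
system1_rows = [{"cust_id": "1001"}, {"cust_id": "1002"}]
system2_rows = [{"cust_num": 2001}]

unified = list(customer_numbers("system1", system1_rows)) + list(
    customer_numbers("system2", system2_rows)
)
print(unified)  # [1001, 1002, 2001]
```

A consuming application sees only CustomerNumber; the differing physical names and types stay inside the mapping.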

The Tool for Modeling Metadata

A graphical tool for modeling diverse information sources through different metamodels should:

Use the MOF standard to integrate Platform Independent Models.

Support importing metadata from information systems and assembling structure from the information systems into a model.

Recognize multiple metadata models.

Support creation of relationships between models using the abstractions created within the MOF standard. For example, the tool can create relationships between columns in a spreadsheet and
attributes in an object-oriented database.

Have a repository that can recognize multiple metamodels. Within this repository, the enterprise can store its models.

However, modeling data as metadata only presents a partial solution to information integration by offering the conceptual integration. Ultimately, the enterprise needs a common way to access all of
its systems using the information contained in the Platform Independent Models.

True Model-Driven Integration

To achieve integration at the information level, the enterprise must take the abstract and make it real, much as it takes the UML diagram and actualizes it into an application, created and compiled
in a particular programming language. The enterprise needs to use the information within the metadata Platform Independent Models and create a specific, platform-based means of data access.

From PIM to PSM

Remember, the Platform Independent Models describe the information sources. Platform Specific Models (PSMs) must couple the design-time metadata constructions with platform-specific information
that contains actual parameters and connection information for the enterprise information systems described in the PIMs.

Platform Specific Models contain technical information from the PIMs coupled with actual connection details for a particular data source and its system. Deployed within an integration
platform, these PSMs provide the key for information integration through a Model-Driven Architecture.
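
The coupling of design-time metadata with connection details can be sketched as follows. All type names, fields, and connection values here are hypothetical, chosen only to show one PIM yielding several PSMs; a real integration platform would carry considerably more information in both.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformIndependentModel:
    name: str
    tables: tuple  # table names only, to keep the sketch short


@dataclass(frozen=True)
class ConnectionInfo:
    driver: str
    url: str
    user: str


@dataclass(frozen=True)
class PlatformSpecificModel:
    pim: PlatformIndependentModel   # the design-time description
    connection: ConnectionInfo      # the platform-specific addition


def to_psm(pim, connection):
    """Couple a PIM with the connection details of one concrete system."""
    return PlatformSpecificModel(pim, connection)


crm = PlatformIndependentModel("CRM", ("customers", "orders"))

# One PIM, two PSMs: the placeholder connection values are assumptions.
test_psm = to_psm(crm, ConnectionInfo("sqlite", ":memory:", "tester"))
prod_psm = to_psm(crm, ConnectionInfo("postgresql", "db.example.invalid/crm", "svc_crm"))

print(test_psm.pim is prod_psm.pim)  # True
```

Because both PSMs share the same PIM, a change to the model is made once and carried into every deployment.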

The Tool for Integrating Information

The ideal information integration server requires several features:

A security model to limit or grant access, flexibly, to the enterprise information systems.

A standard interface language, such as SQL, to hasten application development. The enterprise does not need to train its developers in a new Application Programming Interface.

Extensible connector technology to enable the enterprise to connect to any and all possible enterprise information systems. The enterprise can develop its own connectors or use third-party
connectors.

Standard client interface options, such as XML and JDBC, to offer easy access for the enterprise’s information-consuming applications.

Using such a server, model-driven integration proceeds through the following steps:

An enterprise models its existing systems into Platform Independent Models that describe the nature and structure of those existing systems. These PIMs contain the physical metadata models.

The enterprise models its information using PIMs that describe how that enterprise uses the data. These PIMs contain the virtual metadata models.

The enterprise creates the transformations and mappings that describe the methods by which it derives the data it uses from the data stored in its systems.

The enterprise transforms the PIM(s) into PSMs for use in runtime information access and integration. This transformation process adds platform-specific information that the MetaMatrix
Information Integration Server requires. An enterprise can transform a PIM into many PSMs, which means the enterprise only has to manage one PIM set but can transform that metadata into PSM metadata
for different uses.

The enterprise distills the technical metadata needed for actual communication with the enterprise information systems and combines it with connection properties to create the Platform Specific
Models. Deployed on an integration platform, these PSMs show a direct route to access data in its physical sources based on the business needs modeled in the metadata.

The enterprise’s information-consuming applications request information from the integration server, which uses these PSMs at runtime to access the data sources and answer each
request.
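
The steps above can be sketched end to end on a very small scale. The example is an illustration under stated assumptions, not the MetaMatrix server's implementation: two in-memory SQLite databases stand in for the enterprise systems, the table names, sample rows, and the query_customers function are hypothetical, and each PSM is reduced to a connection plus the SQL that maps its source onto a virtual Customer(CustomerNumber, Name) view.

```python
import sqlite3


def make_source(ddl, insert, rows):
    """Create an in-memory database standing in for one enterprise system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert, rows)
    return conn


system1 = make_source(
    "CREATE TABLE customers (cust_id TEXT, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [("1001", "Acme")],
)
system2 = make_source(
    "CREATE TABLE clients (cust_num INTEGER, client_name TEXT)",
    "INSERT INTO clients VALUES (?, ?)",
    [(2001, "Globex")],
)

# Each PSM pairs a live connection with the transformation from the physical
# source to the virtual Customer(CustomerNumber, Name) view.
PSMS = [
    {"connection": system1,
     "sql": "SELECT CAST(cust_id AS INTEGER), name FROM customers"},
    {"connection": system2,
     "sql": "SELECT cust_num, client_name FROM clients"},
]


def query_customers(psms):
    """Answer one request by reading every source in place and merging rows."""
    results = []
    for psm in psms:
        results.extend(psm["connection"].execute(psm["sql"]).fetchall())
    return sorted(results)


print(query_customers(PSMS))  # [(1001, 'Acme'), (2001, 'Globex')]
```

No data is copied ahead of time: each request reads the native systems directly, which corresponds to the real-time access described earlier.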

The Benefits

Model-Driven Architecture for information integration offers a number of benefits that improve the enterprise’s bottom line. These benefits include:

Adherence to standards, which assures the enterprise that its preferred solution works with other standards-compliant tools and technology.

Platform Independent Models (PIM) that capture the enterprise’s information assets and can present this information in an organized, graphic fashion.

Design-time models independent of runtime models enable the enterprise to modify and model more information without impeding data access through the integration server.

Extensibility that enables the enterprise to model, integrate, and access legacy, existing, and even future data storage technologies.

A Model-Driven Architecture Solution, by using a modeling and implementation process that many enterprises already use for application design, extends a familiar methodology in a new direction.
Enterprises can integrate their existing information systems easily without costly or confusing conversions and can access the information in those systems in real-time.

Michael A. Lang, Sr. - Michael A. Lang, Sr. is Co-Founder and Chairman of Revelytix, Inc., a semantic technology company that has developed an ontology-based collaboration framework for vocabulary development and community
knowledge management. Prior to founding Revelytix, Michael co-founded MetaMatrix, an enterprise information integration company that was sold to Red Hat in 2007. Earlier on, he was
President of NSSI, a CAD software company purchased by Network Imaging. Prior to NSSI he worked in the financial information industry for Bridge Information Systems and Reuters Information
Systems, where he ran strategy for financial information and analytic products. He is a noted consultant in the areas of data integration and semantic technologies. He has assisted in
various areas of product strategy for such companies as BEA Systems. He is currently Deputy CTO for Vitria, a BPM and exception management software company. He is a graduate of Washington
College, with a BS in Chemistry.