Abstract

Methods and apparatus, including computer program products, for generating a name for a business data component in an electronic business process use a received textual description of the business data component. One or more proposed names are generated in accordance with a predefined naming format. The proposed names are generated using a matching algorithm to select terms from a library of available terms based on the textual description. Each proposed name includes multiple terms, and each term in the library of available terms defines an object class, a property, a representation class, or a qualifier.

Description

BACKGROUND

[0001]

The present invention relates to data processing by digital computer, and more particularly to using a controlled vocabulary library to generate business data component names.

[0002]

Companies have conventionally exchanged electronic business information using Electronic Data Interchange (EDI). While EDI has allowed companies to communicate more efficiently than through the use of traditional paper-based communications, smaller companies face challenges to participate in electronic business (or electronic collaboration). These companies need to invest in complex and expensive computer systems to be installed at local computers, or to register with marketplaces at remote computers accessible through the Internet. In either case, the companies are bound by the particulars of the local or remote computer systems. Changes lead to further costs for software, hardware, user training, registration, and the like.

[0003]

More recently, the development of the Extensible Markup Language (XML) has offered an alternative way to define formats for exchanging business data. XML provides a syntax that can be used to enable more open and flexible applications for conducting electronic business transactions, but does not provide standardized semantics for messages used in business processes. Initiatives to define standardized frameworks for using XML to exchange electronic business data have produced specifications such as the Electronic Business Extensible Markup Language (UN/CEFACT/ebXML) Core Components Technical Specification (CCTS) and ISO 11179. The UN/CEFACT/ebXML CCTS generally provides a methodology for describing reusable building blocks (“core components”) for business transactions, creating new business vocabularies, and storing core component definitions in central registries. ISO 11179, which is incorporated in the UN/CEFACT/ebXML CCTS, provides a naming convention for standardizing the structure and semantics of core components.

SUMMARY OF THE INVENTION

[0004]

The present invention provides methods and apparatus, including computer program products, that implement techniques for generating business data component names.

[0005]

In one general aspect, the techniques feature receiving a textual description of a business data component and generating one or more proposed names for the business data component based on the textual description. Each proposed name is generated in accordance with a predefined naming format using a matching algorithm to select terms from a library of available terms. Each proposed name includes multiple terms, and each term in the library of available terms defines an object class (and possibly at least one additional object class qualifier), a property (and possibly at least one additional property qualifier), and/or a representation class.

[0006]

The invention can be implemented to include one or more of the following advantageous features. Each proposed name includes no more than one term corresponding to each of an object class, an object class qualifier, a property, a property qualifier, and/or a representation class. Context information for defining the business data component is received, and a predefined business process model is identified based on the context information, which is based on a context category and a context value. A request to add the business data component to the business process model is received, and the matching algorithm uses a context defined by the context information and/or the predefined business process model to select terms from the library of available terms. The proposed names include a business data component name included in a business process model for a different context. A topic map defines associations between a set of business process models that include the predefined business process model and the business process model for the different context. The business process model for the different context is identified based on a relationship with the predefined business process model as defined in the topic map. The business process model is modified to include a selected one of the proposed names.

[0007]

The textual description includes a description of an object class (and possibly at least one additional object class qualifier), a property (and possibly at least one additional property qualifier term), and/or a representation class. The library of available terms defines associations between terms, and the proposed names for the business data component are generated based on the defined associations between terms. The proposed names include an object class term, a property term, and a representation class term. The proposed names can further include one or more qualifier terms associated with the object class term, the property term, and/or the representation class term. The library of available terms includes a topic map of terms included in predefined business data component names. The topic map defines associations between terms and predefined business data component names included in a set of business process models. A business process model is modified to include a selected proposed name for a component added to the business process model. The matching algorithm selects terms using the topic map to combine terms to generate each proposed name. In addition, the matching algorithm selects terms based on a constraint, a characteristic, one or more valid values, and/or a specified context for the business data component.

[0008]

The terms included in the name semantically describe the business data component. The terms are selected based on a correspondence between the description and a semantic meaning of the selected terms. A topic map defines the available terms and associations between the available terms. Each term in the topic map corresponds to a topic and each topic is associated with at least one other topic. Each topic corresponding to a term includes elements defining an occurrence of the term, another topic of which the term is an instance, and/or a scope associated with the term.

[0009]

The invention can be implemented to realize one or more of the following advantages. A controlled vocabulary library can be used to propose component names that include preferred terms, which can help maintain consistency in naming components. In other words, the controlled vocabulary library can help ensure that components with the same or highly similar semantic meanings consistently use the same terms. For example, the controlled vocabulary library can help ensure that similar components in different contexts (e.g., address components in the automobile and chemical industries) use consistent naming terminology. Proposed names can be automatically generated based on requirements that are semantically defined by a user using human readable (e.g., English, German, and the like) sentences, phrases, or other descriptions. The controlled vocabulary library can be used to identify synonyms of words used in the human readable description to help find preferred terms. The proposed names can be based on names for existing components and can include names that exist in other contexts or new names not previously defined that may be modeled after an existing name in the same or another context. The proposed names can also be based on relationships between terms that are defined in the controlled vocabulary library (e.g., using a topic map contained in the controlled vocabulary library in which each term is a topic and relationships are defined between topics). Proposed names can include terms that provide an easy to understand semantic meaning for the corresponding component. Proposed names can be generated so as to comply with the naming requirements of UN/CEFACT/ebXML CCTS, Web Ontology Language (OWL), and/or ISO 11179. The user can select from among multiple proposed names and is not necessarily restricted to the proposed name but can modify a selected name, if desired. 
New component names can be created for use in a UN/CEFACT/ebXML CCTS registry and/or in an intermediary structure that is used for mapping components between different electronic business processes. Existing components from which new component names are generated can be used to provide a model for the structure of the new component. Additional advantages include a very close relationship between the documentation of BIEs and the Dictionary Entry Names; reuse of component parts of sentences, which are already stored as associations, for the automatic completion of documentation; categorization of topics, associations, and occurrences by the context driver mechanism to get a more precise result in Dictionary Entry Names; and searching of already defined terms through the usage of topic maps.

[0010]

Implementations of the invention can provide one or more of the above advantages.

[0011]

Details of one or more implementations of the invention are set forth in the accompanying drawings and in the description below. Further features, aspects, and advantages of the invention will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012]

FIG. 1 is a block diagram illustrating a process for adding a business component to a repository.

[0013]

FIG. 2 illustrates a process for defining a business context.

[0014]

FIG. 3 is an inset view of an aggregate business information entity (ABIE) in a Unified Modeling Language (UML) class diagram.

[0015]

FIG. 4 illustrates the use of a component definition user interface for defining a new component.

[0016]

FIG. 5 illustrates a UML class diagram of a topic map that can be used for the controlled vocabulary library.

[0017]

FIG. 6 illustrates a user interface window for selecting a proposed component name and adding the selected component name to an ABIE.

[0018]

FIG. 7 is a flow diagram of a process for generating business data component names.

[0019]

FIG. 8 is a block diagram illustrating an example data processing system in which a system for generating business data component names can be implemented.

[0020]

FIG. 9 is a block diagram illustrating an example of a topic map concept.

[0021]

Like reference numbers and designations in the various drawings indicate like elements.

DETAILED DESCRIPTION

[0022]

In general, electronic business communications can be conducted using electronic documents. An electronic document does not necessarily correspond to a file. A document may be stored in a portion of a file that holds other documents, in a single file dedicated to the document in question, or in multiple coordinated files. Electronic documents can be constructed using business information entities. A business information entity (BIE) is an element of business data or a collection of business data with a unique business semantic definition and can include a Basic Business Information Entity (BBIE), an Association Business Information Entity (ASBIE), or an Aggregate Business Information Entity (ABIE). A BBIE represents a characteristic (e.g., a street address) of a specific object class in a specific business context and corresponds to a data type that describes valid values for the BBIE. An ASBIE represents a complex characteristic of a specific object class in a specific business context and is used to associate BIEs with one another (e.g., to associate a person with an address). The ASBIE is based on an ABIE. An ABIE represents an object class and is a collection of related pieces of business information (e.g., an address that includes a street address, a city, a postal code, and a country) in a specific business context. In general, an ABIE includes one or more BBIEs and one or more ASBIEs. Core components provide more generic building blocks from which BIEs can be created. For example, an aggregate core component provides a structure for creating an ABIE in a specific business context.

[0023]

Each BIE, core component, business context, data type, or other component in an electronic business framework typically includes a unique name, which can include multiple concatenated terms that describe characteristics of the component. For example, ISO 11179 defines a naming convention in which each data element is described by a name that includes three primary terms: an object class term, a property term, and a representation class term. The object class term identifies a basic concept underlying a data element (e.g., address or party). Generally, the object class term describes an ABIE, which includes multiple properties and/or representations. The property term identifies a characteristic (e.g., street or company) of the object class. The representation class term categorizes the format (e.g., text or code) of the data element. In some business contexts, a particular element may have only one representation, in which case the name for the element does not need to include a representation class term. The object class term, property term, and representation class term can each have an associated qualifier that further refines the base term. For example, an object class term “address” can be refined by the qualifier “buyer” and a property term “company” can be refined by the qualifier “parent.”
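For purposes of illustration only, the naming convention described above can be sketched in Python. The separator style (a period and a space between terms, an underscore and a space after a qualifier) follows the convention commonly used for CCTS dictionary entry names and is an assumption here. The names `NameParts` and `dictionary_entry_name` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameParts:
    """The terms that make up one component name (hypothetical structure)."""
    object_class: str
    property_term: str
    # May be omitted when the element has only one representation.
    representation: Optional[str] = None
    object_class_qualifier: Optional[str] = None
    property_qualifier: Optional[str] = None


def _qualified(term: str, qualifier: Optional[str]) -> str:
    # Assumed convention: qualifier, underscore, space, then the base term.
    return f"{qualifier}_ {term}" if qualifier else term


def dictionary_entry_name(parts: NameParts) -> str:
    """Concatenate the terms into a single name, assuming '. ' as separator."""
    segments = [
        _qualified(parts.object_class, parts.object_class_qualifier),
        _qualified(parts.property_term, parts.property_qualifier),
    ]
    if parts.representation:
        segments.append(parts.representation)
    return ". ".join(segments)


example = NameParts(
    object_class="Address",
    property_term="Company",
    representation="Text",
    object_class_qualifier="Buyer",
    property_qualifier="Parent",
)
print(dictionary_entry_name(example))
```

With the qualifiers from the example in the text, the sketch combines "buyer" with "address" and "parent" with "company" into one concatenated name. The representation class term is simply dropped when none is supplied.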

[0024]

FIG. 1 is a block diagram that illustrates a process 100 for adding a business component to a repository. Initially, a business context in which a user wishes to view, modify, or add one or more business information entities is defined (105). A user can select from predefined sets of context categories and context values, displayed on a context definition user interface 110, according to the requirements of a business component to be added. For example, the context can be defined using context drivers defined in UN/CEFACT/ebXML CCTS. The user can specify a particular business process classification, product classification, industry classification, geopolitical context, legal or contractual constraints, business process role, supporting role, and/or system capabilities.

[0025]

The defined business context is used to identify one or more business process models 120 from a components library repository 115. The components library repository 115 stores definitions of components that model business contexts, business messages, business objects, data types, BIEs, core components, associations between business objects, and the like. Thus, some components can represent a singular business characteristic (e.g., a BBIE or a data type) while other components can represent an aggregation of other components (e.g., an ABIE or a business message, which can include multiple ABIEs, ASBIEs, and a structure within which they are used). Each component can be defined by a particular structure and can include various elements, such as context categories, dictionary entry names (i.e., unique names for each component), properties, BIEs, elements, annotations, unique identifiers, data types, and associations between elements. The components library repository can include UN/CEFACT/ebXML CCTS registries, repositories of components for standardized business process frameworks, and/or repositories of components for proprietary business process frameworks.

[0026]

Business process models 120 are generally defined using XML metadata but can be translated using XML Metadata Interchange (XMI) and presented to a user in the form of a Unified Modeling Language (UML) class diagram. If more than one business process model 120 is identified from the components library repository, a user can select a particular business process model 120. In many cases, the defined business context can allow a single business process model 120 to be automatically selected. A user can select an option 125 to add an element or component for satisfying additional requirements using a user interface that displays a UML class diagram for the selected business process model 120. In the illustrated example, the user selects an option to add an element to a party details object class 130. The element or component to be added can be, for example, an ABIE, a BBIE, or an ASBIE. The added element or component will be represented only in a specific context, which is defined by the context categories and their context values.

[0027]

A semantic description of the business requirements of the element to be added is received (135) from the user through a user interface 140. The semantic description of the business requirement can be in the form of a natural language sentence (i.e., a sentence that at least nominally complies with the rules of grammar for a particular language, such as English) or can be in the form of text that, although not using proper grammar, provides a semantic description of the element, such as a proposed name for the element in which the terms included in the name are selected from a natural language, such as English or German. A matching algorithm 142 uses the semantic description to identify terms contained in a controlled vocabulary library 145 and to assemble the terms to generate (150) one or more proposed UN/CEFACT/ebXML CCTS based dictionary entry names 155.
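As a minimal sketch of the assembly step only, and assuming the matching algorithm has already identified candidate terms for each term type, one proposed name can be formed per combination of candidates so that each name contains at most one term of each type. The function name `propose_names`, the dictionary keys, and the period-and-space separator are assumptions made for illustration.

```python
from itertools import product


def propose_names(candidates):
    """Build one proposed name per combination of candidate terms.

    `candidates` maps a term type to the list of terms already identified
    for that type. An object class term and a property term are required;
    the representation class term is optional.
    """
    object_classes = candidates.get("object_class", [])
    properties = candidates.get("property", [])
    # Use a single None placeholder when no representation term was found.
    representations = candidates.get("representation_class", []) or [None]

    names = []
    for object_class, prop, representation in product(
        object_classes, properties, representations
    ):
        parts = [object_class, prop]
        if representation is not None:
            parts.append(representation)
        names.append(". ".join(parts))
    return names


print(propose_names({
    "object_class": ["Party"],
    "property": ["Street", "Street Name"],
    "representation_class": ["Text"],
}))
```

If no object class term or no property term is found, the sketch returns no proposals, because the product over an empty list is empty.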

[0028]

The terms in the controlled vocabulary library 145 are categorized according to type, such as object class terms, property terms, representation class terms, and qualifiers. Some terms in the controlled vocabulary library 145 can have more than one type. For example, the term “party” can in some situations be used as an object class term and in other situations be used as a property term. In addition, the terms in the controlled vocabulary library 145 include associations with other terms. For example, the controlled vocabulary library 145 associates terms that can be used together to form a dictionary entry name. The associations of terms can be based on terms that have been used together to form a name for a previously defined component in another business context (i.e., a component that exists in the components library repository 115). The associations of terms can also be based on predefined links between terms that have some commonality of subject matter, more general object classes, and the like. For example, an object class term for a particular object class might be linked to a property term used in another object class because both object classes are instances of related higher level object classes.

[0029]

The terms in the controlled vocabulary library 145 can be represented as topics in a topic map architecture. Each term corresponds to a topic and the topic map defines relationships between terms. A topic map can be stored in XML format and represented using UML class diagrams. Topic maps make it possible for a machine to navigate among terms and their occurrences in the components library repository 115. The topic map for the controlled vocabulary library 145 can include additional information about terms, such as synonyms, definitions, and how terms relate to various business contexts. Each topic can be an instance of a topic type. Each topic corresponds to a term type in the ISO 11179 standard. Topics within a topic map can also play different roles in different associations and can include references to external sources, such as web pages, that provide additional information about a topic. Incorporating the controlled vocabulary library 145 into a topic map allows matching algorithms to identify terms that are most likely to correspond to a meaning of the semantic description.

[0030]

Topic maps can be implemented according to ISO/IEC 13250:2000, which provides a standardized notation for representing the structure of information resources used to define topics and relationships between topics. Each topic in a topic map that represents the available terms can specify a term type (e.g., object class, property, representation class, or qualifier) of which the term is an instance, identify the subject of the term or topic, specify occurrences of the term or topic (i.e., in the components library repository 115), reference other topics or terms that are combined in an existing dictionary entry name, and define the scope and context of the term or topic. The topic map includes associations between topics or terms. Associations can include elements that specify an association type, member topics or terms in the association, and a role played by each topic or term in the association.
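The elements listed above can be pictured with a small in-memory model. The following Python sketch is an illustration under stated assumptions and is not the notation defined by ISO/IEC 13250. The class names `Topic`, `Association`, and `TopicMap`, and their fields, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str                 # the term itself
    term_type: str            # e.g. "object class", "property"
    occurrences: list = field(default_factory=list)  # names in which the term occurs
    scope: set = field(default_factory=set)          # contexts in which the term is valid


@dataclass
class Association:
    association_type: str
    members: dict             # role played in the association -> topic name


class TopicMap:
    def __init__(self):
        self.topics = {}
        self.associations = []

    def add_topic(self, topic):
        self.topics[topic.name] = topic

    def add_association(self, association):
        # Every member of an association must already be a known topic.
        unknown = [n for n in association.members.values() if n not in self.topics]
        if unknown:
            raise KeyError(f"unknown topics: {unknown}")
        self.associations.append(association)

    def related(self, name):
        """Return {topic name: role} for topics associated with `name`."""
        result = {}
        for association in self.associations:
            if name in association.members.values():
                for role, member in association.members.items():
                    if member != name:
                        result[member] = role
        return result
```

A matching algorithm could use a lookup such as `related("Address")` to find property and representation class terms that have already been combined with a given object class term.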

[0031]

Once a proposed dictionary entry name 155 is generated (150), the user can revise (160) the dictionary entry name as necessary. A tag name can be generated (165), and a business data component 175 corresponding to the dictionary entry name 155 can be constructed (170). In some cases, the structure of the business data component can be constructed in at least a partially automated manner by using the structure of similarly named components in other contexts.

[0032]

FIG. 2 is a more detailed illustration of a process 200 for defining a business context by the context categories and their context values. A user selects from available options for one or more context drivers 205 using drop down menus 210 in a context definition user interface 215. The user selects options based on the specific requirements for the business data component or components to be viewed, modified, or added. Once the user submits the selected options, a repository of business data components 225 is queried (220) to identify one or more models that correspond to the selected context options. The repository of business data components includes, for example, components that can be combined to form aggregate components and aggregate components that can be combined for use in business processes.
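The query step (220) can be sketched as a filter over stored models. In this illustrative Python sketch, the repository is assumed to be a mapping from a model name to the context values the model allows for each context category, and a category that a model does not mention is treated as unrestricted. These assumptions, the function names, and the sample model names are all hypothetical.

```python
def matches_context(model_context, selected):
    """Return True when every selected context value is allowed by the model."""
    for category, value in selected.items():
        allowed = model_context.get(category)
        # A category absent from the model's context is treated as unrestricted.
        if allowed is not None and value not in allowed:
            return False
    return True


def find_models(repository, selected):
    """Return the names of all models that correspond to the selected options."""
    return [
        name
        for name, model_context in repository.items()
        if matches_context(model_context, selected)
    ]


repository = {
    "Invoice (automotive)": {
        "industry_classification": {"Automotive"},
        "geopolitical": {"DE", "US"},
    },
    "Invoice (chemical)": {"industry_classification": {"Chemical"}},
    "Generic party": {},
}
selected = {"industry_classification": "Automotive", "geopolitical": "DE"}
print(find_models(repository, selected))
```

When more than one model is returned, the user can choose among them; when exactly one is returned, it can be selected automatically.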

[0033]

FIG. 3 is an inset view of an ABIE 300 in a UML class diagram 305. The ABIE 300 is identified from a repository of business data components based on submitted context information. The ABIE 300 includes multiple BBIEs 310, some of which may be applicable only in specific contexts. For example, as indicated in chart 315, the BBIE “End Date” is limited by the context categories and their context values to only certain business processes, system constraints, and official constraints. In addition to buttons 320 that allow a user to change or delete the ABIE 300 or one or more BBIEs 310, an add button 325 allows a user to add a new BBIE 310 to the ABIE 300. All these features can be performed in the defined context. When a user opts to add a new component, the user is presented with another user interface for describing the new component.

[0034]

FIG. 4 illustrates the use of a component definition user interface 400 for defining a new component. In the illustrated implementation, a user can select an option 405 to add either a BBIE or an ASBIE. In some implementations, the user may be able to select an option to add other components, such as an ABIE or a data type. As an alternative to the illustrated implementation, the user interface 400 can be an HTML editor, and the user can be presented with a template based on XHTML. The user describes the component to be added in a component description text entry field 410. The component description is typically in the form of one or more human readable sentences that semantically describe the component to be added. The component description should include a description of at least an object class and a property for the component to be added. In some cases, the object class can be assumed based on, for example, the ABIE to which a new BBIE is being added. The component description can also include a description of a representation class and one or more qualifiers for the component to be added. The description need not include the exact terms that will be used in the subsequently generated dictionary entry name. Instead, as further discussed below, a controlled vocabulary library 440 and possibly other available libraries, such as code lists, qualifier lists, electronic word dictionaries, and/or synonym libraries, can be used to identify preferred terms that have the same or a similar semantic meaning as the description.

[0035]

The user can also add a comment in a comment text entry field 415. For example, the user can add a comment that explains how the component will be used or what other elements are relevant to the added component. The user can also define constraints on the component to be added in a constraint entry field 420. The constraints describe the business circumstances or relationships under which the component can be used and/or not used. For example, the value of this component may be valid only if some other components satisfy particular requirements (e.g., a maximum value).

[0036]

The user can define other characteristics of the component to be added in a characteristics definition box 425. The characteristics can include a data type, cardinality, length, included values, excluded values, and/or a pattern for the component. A code/identifier box 430 allows the user to define lists of valid code values or identifier values in cases where the component to be added is associated with a code type or an identifier type (i.e., as defined using a type drop-down menu in the characteristics definition box 425).

[0037]

Once the user defines the component to be added through the component definition user interface 400, the user submits the component definition by selecting a submit button 435. The textual description of the component to be added from the component description text entry field 410, along with values and/or other data from the component definition user interface 400, is compared with data from entries in the controlled vocabulary library 440 to identify possible terms for constructing one or more proposed component names. The various fields can be weighted differently in the comparison. For example, the definition field can have a higher weight, so that terms matching it are more likely to be selected during the matching procedure. The other fields can be given greater or lesser weights and contribute correspondingly more or less to the matching procedure. The entries in the controlled vocabulary library 440 can include words or phrases that can be used to semantically describe a concept. Each entry can be associated with one or more terms in the controlled vocabulary library 440.
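One simple way to realize a weighted comparison is to count the words that a field shares with a library entry and multiply the count by a weight for that field. The Python sketch below is an illustration under that assumption; the specification does not prescribe a scoring formula, and the weights, field names, and function names shown are hypothetical.

```python
import re

# Assumed weights: the description (definition) field counts more than others.
FIELD_WEIGHTS = {"description": 3.0, "comment": 1.0, "constraint": 1.0}


def tokenize(text):
    """Lower-case the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def score_terms(fields, vocabulary, weights=FIELD_WEIGHTS):
    """Rank vocabulary terms by weighted word overlap with the input fields.

    `fields` maps a field name to its text; `vocabulary` maps a term to the
    words or phrases that can describe the same concept as the term.
    """
    scores = {}
    for term, phrases in vocabulary.items():
        entry_words = set()
        for phrase in phrases:
            entry_words |= tokenize(phrase)
        total = 0.0
        for field_name, text in fields.items():
            overlap = len(tokenize(text) & entry_words)
            total += weights.get(field_name, 0.0) * overlap
        if total > 0:
            scores[term] = total
    # Highest score first; ties are broken alphabetically by term.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


vocabulary = {
    "Street": ["street", "road name"],
    "City": ["city", "town"],
    "Code": ["code", "coded value"],
}
fields = {
    "description": "The street where the buyer lives",
    "comment": "Used together with the city and the street",
}
print(score_terms(fields, vocabulary))
```

In this example, "Street" ranks above "City" because it matches the more heavily weighted description field as well as the comment field, and "Code" is not proposed because it matches no field.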

[0038]

The controlled vocabulary library 440 can organize data using different tables for different types of terms. A property term table 445 includes a list of property terms, and each listed property term can include associated data, such as phrases that might be used to semantically describe the same concept as the property term, links to existing dictionary entry names in which the property term appears, one or more data types associated with the property term, contexts in which the property term can be used, and links to terms in other tables with which the property term can be used. An object class term table 450 includes a list of object class terms, and each listed object class term can include associated data, such as phrases that might be used to semantically describe the same concept as the object class term, links to existing dictionary entry names in which the object class term appears, instances of object classes corresponding to the object class term, valid contexts, and links to terms in other tables with which the object class term can be used.

[0039]

A qualifier term table 455 includes a list of qualifier terms (e.g., adjectives), and each listed qualifier term can include associated data, such as words that might be used to semantically describe the same concept as the qualifier term, links to existing dictionary entry names in which the qualifier term appears, one or more other term types with which the qualifier term can be used, and links to terms in other tables with which the qualifier term can be used. A representation class term table 460 includes a list of representation class terms, and each listed representation class term can include associated data, such as phrases that might be used to semantically describe the same concept as the representation class term, links to existing dictionary entry names in which the representation class term appears, a data type associated with the representation class term, possible code values, identifier values, or other constraints that can be used with the representation class term, and links to terms in other tables with which the representation class term can be used.

[0040]

The one or more sentences from the textual description of the component to be added can be separated into sentence fragments manually (e.g., through a user interface) or automatically (e.g., by searching for matching phrases from the controlled vocabulary library 440 and/or using a rule set that defines how to separate sentences into subject, object, and predicate parts). The sentence fragments can be compared with entries in the controlled vocabulary library 440 to identify possible terms for use in proposing component names. In addition, a synonyms library 465 can be used to identify terms in the controlled vocabulary library that are synonymous or have similar meanings as words in the textual description. The synonyms library 465 can also be incorporated into the controlled vocabulary library 440 (e.g., by including synonym data corresponding to each listed term in the tables 445, 450, 455, and 460). The use of synonym data makes it possible to identify preferred terms for use in component names even when the user uses alternative phraseology.
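The automatic variant that searches the description for matching phrases can be sketched as follows. In this illustrative Python sketch, longer phrases are tried first so that a multi-word phrase takes precedence over its parts, and synonyms are mapped to the preferred term they stand for. The function name, both mappings, and the sample entries are assumptions for illustration.

```python
import re


def find_preferred_terms(description, vocabulary_phrases, synonyms):
    """Return preferred terms found in the description, in order of appearance.

    `vocabulary_phrases` maps a descriptive phrase to its preferred term;
    `synonyms` maps an alternative word or phrase to a preferred term.
    """
    text = " ".join(description.lower().split())
    lookup = {phrase.lower(): term for phrase, term in synonyms.items()}
    lookup.update({phrase.lower(): term for phrase, term in vocabulary_phrases.items()})

    matches = []
    # Try longer phrases first so multi-word phrases win over single words.
    for phrase in sorted(lookup, key=len, reverse=True):
        found = re.search(r"\b" + re.escape(phrase) + r"\b", text)
        if found:
            matches.append((found.start(), lookup[phrase]))
            # Blank out the matched fragment so it is not matched again,
            # while keeping the positions of the remaining text unchanged.
            blank = " " * (found.end() - found.start())
            text = text[:found.start()] + blank + text[found.end():]

    terms = []
    for _, term in sorted(matches):
        if term not in terms:
            terms.append(term)
    return terms


vocabulary_phrases = {"postal code": "Postal Code", "address": "Address"}
synonyms = {"zip code": "Postal Code", "location": "Address"}
print(find_preferred_terms("The ZIP code of the buyer's location",
                           vocabulary_phrases, synonyms))
```

Here the user's wording contains neither "postal code" nor "address", yet both preferred terms are identified through the synonym data.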

[0041]

To generate proposed component names, other information can also be used. A code list and identifier scheme library 470 can be used to identify code types and identifier types based on information provided through the user interface 400 (e.g., data provided in the code/identifier box 430). This information can be further used to identify terms that are appropriate for generating proposed component names. Alternatively, the code list and identifier scheme library 470 can be used to identify possible code values or identifier values that correspond to the component to be added. The code list and identifier scheme library 470 can also be incorporated into the controlled vocabulary library 440. Information from one or more repositories of business data components 475 can be used to search for existing component names in the same or other contexts and to determine how terms are used in preexisting components and how those preexisting components relate to other components. This information can be used in generating proposed component names that are identical to existing component names in other contexts and/or that are modeled after existing component names.

[0042]

The controlled vocabulary library 440 can be organized according to a topic map in which each term listed in the controlled vocabulary library 440 represents a topic. Topic Maps (TM) are an ISO standard (ISO/IEC 13250:2000) that provides a standardized notation for representing information about the structure of information resources used to define topics and the relationships between topics. A set of one or more interrelated documents that employs the notation and grammar defined by the ISO/IEC 13250 International Standard is called a “topic map.” In general, the structural information conveyed by topic maps includes groupings of addressable information objects around topics (occurrences) and relationships between topics (associations).

[0043]

Therefore, topic maps describe knowledge structures and associations with information resources. A topic map is a map of the knowledge that can be found in a document base, such as a library of BIEs and core components. It shows the relevant concepts and the relationships between them in a way similar to that of a thesaurus or an index. It also gives the definition of concepts like a glossary. It arranges the concepts in an ontology and a taxonomy. Topic maps make the structures machine processable and possible to navigate. Topic maps also provide advanced techniques for linking and addressing the knowledge structure and the document base.

[0044]

Knowledge about dictionary entry names can be expressed in the form of a topic map. This topic map may consist of as many topics as necessary to describe the terms. The number of topics determines the size and complexity of the topic map.

[0045]

Topics within a topic map can be in a relationship (association) with each other. In addition, topics can play different roles in different associations. Therefore, it is possible to build associations between the relevant terms of a dictionary entry name. Topics can also contain any number of external references, such as web pages, which elaborate on a specific topic to provide further information about the topic.

[0046]

Topics have three kinds of characteristics: topics, occurrences, and associations. The characteristics can be effectively used for defining a model and architecture for navigating, linking, searching, and investigating terms of dictionary entry names. All three characteristics of the topic map can be used in specific contexts as defined by the context values and context categories. This model and architecture can be used for automatic searching of appropriate terms after analyzing definitions of a BIE to be added and automatic generation of complete dictionary entry names after finding the appropriate terms. Thus, topics represent the terms of a dictionary entry name. To identify the relevant terms in an entered definition, the components of the sentences and the corresponding context are considered. The definition contains fields that form a set of potential candidates for topic types. Moreover, by looking at the context, basic associations between topic types can be identified. For example, in the context of the industry classification: “Aviation”, the associations “destination city of a flight connection” or “arrival of a flight connection” can be identified.

[0047]

An occurrence is a link to one or more real information objects for the terms, like a report, a comment, a video, or a picture. Generally, an occurrence is not part of a topic map.

[0048]

Topic associations describe the relationships between terms. FIG. 9 is a block diagram illustrating an example of a topic map concept. Knowledge about the terms 905 and the relationships 910 between the terms 905 is expressed in a knowledge layer 915. Each term 905 is linked to one or more occurrences 920 in an information layer 925. Generally, topic associations are not one-way relationships. They are symmetric as well as transitive and thus, they have no direction. Association types can be used to group term associations and the involved terms.

[0049]

The terms, component parts of a sentence, and context values can be organized in columns of the tables 445, 450, 455, and 460. For example, the property term table 445 can include a property term that represents a topic within a topic map and a component part that represents an association that can be used to arrange the terms of a dictionary entry name in the right order; a context category and context values can be represented by a scope element of the topic map. Associations between the definitions and dictionary entry names can be realized by the topic map mechanism. The associations, terms, and scope, which can be defined in the correct order by the topic map mechanism, help generate a dictionary entry name in the correct manner. Each term in the tables is an instance of a topic type that defines a term type (e.g., object class term, property term, representation class term, or qualifier). Terms that can have different term types in different component names (e.g., the term “party” can be used as an object class term or as a property term) can be represented by different topics corresponding to each term type. In addition, different instances of a term with the same term type can be represented by different topics corresponding to each instance. The topic map also includes data identifying occurrences of each term, associations of the term with other terms, and scope information for each term instance.

[0050]

The topic map of the controlled vocabulary library 440 can be described using XML and can be represented using UML class diagrams. FIG. 5 is a UML class diagram 500 of a topic map that can be used for the controlled vocabulary library 440. Each topic is represented by a topic identifier 510 (e.g., a numerical identifier) that includes (or refers to) a number of elements. The elements can include a “base name” element 515 (i.e., the term that corresponds to the topic), zero or more “occurrence” elements 520 (i.e., information resources that are relevant to the topic), zero or more “instance of” elements 525 that specify a category (e.g., object, property, representation, etc.) of which the term is an instance, zero or more “subject identity” elements 530 that refer to subject indicators 535 and/or resources 540 (e.g., for use in identifying synonyms), and zero or more “scope” elements 545 (e.g., for defining context categories and context values in which the term can be used). Each “occurrence” element 520 can also have a scope as defined by one or more scope elements 545. A topic reference element 550 provides a URI reference to another topic, which will be another term value of the dictionary entry name. The target of a topic reference link must resolve to a topic element child of a topic map document. The target topic need not be in the document entity of origin. A topic reference element 550 will be used for the completion of dictionary entry names or for referencing other topics that are necessary for the complete understanding of a term value. The topic reference element 550 could also reference other information in other XTM-based documents.
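
As a non-authoritative sketch, the elements of the class diagram 500 could be mirrored in a simple in-memory structure such as the following. The class name Topic, its field names, the helper resolve_refs, and the sample topics are assumptions chosen for illustration; the sketch keeps all topics in a single map, whereas the description above allows a target topic to reside in another document.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """One term of the controlled vocabulary, modeled loosely on FIG. 5."""
    topic_id: str                                                  # topic identifier 510
    base_name: str                                                 # "base name" element 515
    instance_of: list[str] = field(default_factory=list)           # element 525
    occurrences: list[str] = field(default_factory=list)           # element 520
    subject_identities: list[str] = field(default_factory=list)    # element 530
    scope: list[tuple[str, str]] = field(default_factory=list)     # element 545
    topic_refs: list[str] = field(default_factory=list)            # element 550


def resolve_refs(topic: Topic, topic_map: dict[str, Topic]) -> list[Topic]:
    """Follow topic references; a KeyError signals an unresolvable target."""
    return [topic_map[ref] for ref in topic.topic_refs]


# Hypothetical sample topics.
flight = Topic("t1", "Flight Connection", instance_of=["ObjectClassTerm"],
               scope=[("Industry Classification", "Aviation")])
city = Topic("t2", "Departure City", instance_of=["PropertyTerm"],
             topic_refs=["t1"])
TOPIC_MAP = {t.topic_id: t for t in (flight, city)}
```

Here the topic for "Departure City" refers to the topic for "Flight Connection", so that following the reference supplies another term value for completing a dictionary entry name.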

[0051]

Terms can be classified according to their term-types within the dictionary entry name. In a topic map, any given term is an instance of zero or more term-types. Term-types are themselves defined as topics. Term-types can include “ObjectClassQualifier”, “ObjectClassTerm”, “PropertyQualifier”, “PropertyTerm”, “RepresentationTerm”, “AssociationTerm”, “DataTypeQualifier”, and “DataType”.

[0052]

Each topic can also include one or more “association” elements 555, which define an association with one or more other topics. The topic map uses associations to describe relationships between the terms of a dictionary entry name. A topic association asserts a relationship between two or more topics. Examples might be as follows:

“This name is the departure city of a flight connection”

“This code specifies the departure country of a flight connection”

“This is the local date and time of the arrival of a flight connection”

“This is the duration of a flight of a flight connection”

“This is the duration of a duration in date of a flight connection”

The association types for the relationships mentioned above are “this”, “this_is”, “is_the”, “of_a”, etc. In topic maps, association types are themselves defined in terms of topics.

[0058]

The ability to do typing of topic associations makes it possible to group together the set of terms of a dictionary entry name that have the same relationship to any given topic. This feature is useful for navigating large pools of information in generating dictionary entry names.

[0059]

It should be noted that topic types are regarded as a special (i.e., syntactically privileged) kind of association type; the semantics of a topic having a type (for example, the Airport of a Flight Connection) could equally well be expressed through an association (of type “type-instance”) between the topic of the object class term “Flight Connection” and the topic of the property term “Airport”. The reason for having a special construct for this kind of association is the same as the reason for having special constructs for certain kinds of names (indeed, for having a special construct for names at all): The semantics are so general and universal that it is useful to standardize them to maximize interoperability between systems that use the dictionary entry names.

[0060]

While both topic associations and normal cross references are hyperlinks, they are different: In a cross reference, the anchors (or end points) of the hyperlink occur within the information resources (although the link itself might be outside them); with topic associations, links (between topics) are completely independent of whatever information resources may or may not exist or be considered as occurrences of those topics.

[0061]

Associations between terms (topics) are created as instances of the association element. The element has only the sub-element “member” 560, which specifies instances of the members. The member element 560 is used to define each member role of the association and the terms (topics) which play that role. Each topic that participates in an association plays a role in that association, which can be expressed by the term types of a dictionary entry name. In the case of the relationship “Departure City of a Flight Connection”, expressed by the association between “Departure City” and “Flight Connection”, those roles might be “PropertyTerm” and “ObjectClassTerm”. Associations are multidirectional.
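
The association and member elements described above might be represented, purely as an illustrative sketch, as follows. The class names Member and Association, the method partners_of, and the sample association are hypothetical. Because associations are multidirectional, the lookup works from either participating topic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    role: str    # role played, e.g. a term type such as "PropertyTerm"
    topic: str   # base name of the topic playing that role


@dataclass(frozen=True)
class Association:
    association_type: str
    members: tuple[Member, ...]

    def partners_of(self, topic: str) -> list[Member]:
        """Return the other members if `topic` takes part, else an empty list."""
        if all(m.topic != topic for m in self.members):
            return []
        return [m for m in self.members if m.topic != topic]


# "Departure City of a Flight Connection" expressed as one association.
departure = Association(
    "of_a",
    (Member("PropertyTerm", "Departure City"),
     Member("ObjectClassTerm", "Flight Connection")),
)
```

Calling departure.partners_of("Flight Connection") returns the member playing the "PropertyTerm" role, and calling it with "Departure City" returns the member playing the "ObjectClassTerm" role.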

[0062]

Different types of associations are possible. For example, a term having a property type can be associated with one or more terms having an object class type. The association can be based on object class terms with which the property term is used or can be used in a component name. Similarly, a term having a qualifier type can be associated with one or more other terms having one or more term types.

[0063]

The topic map model allows three things to be said about any particular topic: what names (terms) it has, what associations it participates in, and what its occurrences of information are. These three kinds of assertions are known collectively as topic characteristics. Assignments of topic characteristics are generally made within a specific context based on the context values and their context categories, which may or may not be explicit. For example, the term “Flight Connection” is expected in the context value “Aviation” within the context category “Industry Classification”.

[0064]

The scope element 545 specifies the extent of validity for a topic characteristic. A topic characteristic is the context value from a context category, in which each term value (base name), occurrence, or association will be used. The scope element 545 includes one or more of a topic reference element 550, a subject indicator 535, and/or a resource 540. Each topic reference element 550 references a topic element 510 (“scoping topic”) whose subject contributes to the scope. Two topic reference elements 550 can be used for the representation of the context category and context value. Each resource element 540 references a resource that contributes to the scope. It is possible to define the context values and context categories by a URI. Each subject indicator element 535 references a resource that indicates the identity of the subject that contributes to the scope. A declaration of a topic characteristic is generally valid only within a scope, if specified. When a topic characteristic declaration does not specify a scope, however, the topic characteristic is valid in an unconstrained scope.
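
The validity rule stated above can be illustrated with a minimal sketch. The function name valid_in_context and the representation of a scope as a mapping from context category to context value are assumptions; the sketch further assumes that a characteristic is valid only when every scoped category matches the current context.

```python
def valid_in_context(scope: dict[str, str], context: dict[str, str]) -> bool:
    """Check whether a topic characteristic applies in the given context.

    `scope` maps context categories to required context values.
    An empty scope is treated as the unconstrained scope.
    """
    if not scope:
        return True  # no scope specified: valid everywhere
    return all(context.get(category) == value
               for category, value in scope.items())


# Hypothetical scope for the term "Flight Connection".
aviation_scope = {"Industry Classification": "Aviation"}
```

Under these assumptions, the term scoped to "Aviation" is valid in a context whose industry classification is "Aviation" and is not valid in a context whose industry classification is, for example, "Banking", while an unscoped term is valid in both.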

[0065]

As an alternative or in addition to implementing separate libraries 440, 465, 470, and 475, the information from the various libraries 440, 465, 470, and 475 can be incorporated into the topic map. For example, the topic map can link each term in the controlled vocabulary library 440 to phrases that might be used to semantically describe the same concept as the term, to existing dictionary entry names and components in which the term appears, to one or more data types for the term, to other terms with which the term can be used, to synonyms for the term, and to code values or identifier values with which the term may be used. The topic map can also include information defining associations between business process models in a repository of business data components 475. The associations between business process models can be explicitly defined or can be derived from associations between topics and/or names.

[0066]

When the user submits the component definition through the user interface 400, a matching algorithm conducts a search for terms that can be used to generate one or more proposed component names. The matching algorithm searches (480) the various libraries 440, 465, 470, and 475 for terms that can be combined into a component name having the same or a closely related semantic meaning as the component description and having any constraints, characteristics, and other limitations provided in the component definition. For example, the matching algorithm can search a topic map (e.g., a topic map based on the class diagram 500 shown in FIG. 5) that incorporates the information from the various libraries 440, 465, 470, and 475. The matching algorithm can include one or more of a tetragram analysis, an alpha-beta pruning strategy, a Levenshtein edit distance measure, fuzzy matching, matching tools within the W3C Semantic Web, and Text Retrieval and Information Extraction (TREX, Linguistic Matcher, Type Matcher, Structural Matcher, and Match Learning Machines). Other algorithms capable of searching topic maps can also be used. TREX is included in a number of software products available from SAP AG of Walldorf (Baden), Germany, such as SAP NetWeaver Knowledge Management. TREX provides a wide spectrum of intelligent search, retrieval, and classification functions. Among other things, TREX incorporates a Levenshtein edit distance measure, fuzzy matching, and a topic maps search algorithm.
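
One of the measures named above, an edit distance, can be sketched with the standard Levenshtein dynamic-programming recurrence. This is a generic textbook formulation and is not the TREX implementation; the function closest_terms, its max_distance threshold, and the sample terms are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def closest_terms(fragment: str, terms: list[str], max_distance: int = 2) -> list[str]:
    """Return library terms within max_distance of the fragment, nearest first."""
    scored = [(levenshtein(fragment.lower(), t.lower()), t) for t in terms]
    return [t for d, t in sorted(scored) if d <= max_distance]
```

With such a measure, a misspelled fragment such as "Flight Conection" can still be matched to the library term "Flight Connection", since the two differ by a single inserted character.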

[0067]

The matching algorithm can perform the search to identify at least an object class term and a property term and, in some cases, a representation class term and/or one or more qualifier terms for each proposed component name to be generated. In addition to using a textual description of the component to be added, the matching algorithm can also use context information, characteristics, constraints, valid values, and/or other limitations defined by the user to identify appropriate terms. The search may be conducted for similar or identical components in other related contexts (e.g., using information defining associations between business process models). For example, the matching algorithm may use the defined context for the component to be added to identify similar or identical components in similar contexts (e.g., using the scope and occurrences of terms as defined in scope elements 545 and occurrence elements 520 shown in FIG. 5).

[0068]

In addition, the search may be conducted for terms that are defined in the controlled vocabulary library 440 as corresponding to a fragment of the textual description and/or one or more of the defined limitations. For example, the topic map may define particular terms as referring to a particular semantic meaning and also as implying particular limitations. Typically, terms are defined in the topic maps based, at least in part, on semantic meanings and limitations associated with existing component names. In other words, definitions of terms and combinations of terms are derived from instances of the terms. A particular implementation of a matching algorithm therefore can be designed to identify terms and combinations of terms that most nearly correspond to the component definition provided by the user. The matching algorithm can also use information about associations between terms to identify appropriate combinations of terms to form the proposed component names. For example, the topic map may include associations between a particular property term and multiple object class terms. These associations define object class terms with which the particular property term can be used. The matching algorithm processes the results of the search to generate one or more proposed component names.
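
The use of associations to restrict term combinations, as described above, might be sketched as follows. The mapping ALLOWED_OBJECT_CLASSES, its entries, and the function candidate_pairs are hypothetical and stand in for associations that would be read from the topic map.

```python
from itertools import product

# Hypothetical associations: property term -> object class terms it can be used with.
ALLOWED_OBJECT_CLASSES = {
    "From Date": {"Account", "Contract"},
    "Departure City": {"Flight Connection"},
}


def candidate_pairs(object_terms: list[str],
                    property_terms: list[str],
                    allowed: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Keep only (object class term, property term) pairs backed by an association."""
    return [(obj, prop)
            for obj, prop in product(object_terms, property_terms)
            if obj in allowed.get(prop, set())]
```

In this sketch, a search that returns the object class terms "Account" and "Flight Connection" together with the property terms "From Date" and "Departure City" yields only the two associated pairs, so that a combination such as "Account" with "Departure City" is not proposed.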

[0069]

FIG. 6 illustrates a user interface window 600 for selecting a proposed component name and adding the selected component name to an ABIE 605. The proposed component names 610 can include existing component names 610(1) (e.g., “Account.Valid_From Date.Date”, where “Account” is the object class term, “Valid” is a property qualifier, “From Date” is the property term, and “Date” is the representation class term) from a different context and/or new component names 610(2) and 610(3) (e.g., “Account.Valid_Start Date.Date” or “Account.Validity_From Date.Date”). The new component names 610(2) and 610(3) can be constructed from terms in the controlled vocabulary library 440 that have not previously been combined to form a component name or, in some situations, can include a term or terms not previously included in the controlled vocabulary library 440. The new component names are constructed in accordance with the ISO 11179 framework, Web Ontology Language (OWL), RDF (Resource Description Framework), and/or UN/CEFACT/ebXML CCTS requirements.
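
The naming format used in the examples above (object class term, optionally qualified property term, and representation class term, separated by periods, with a qualifier joined by an underscore) can be illustrated with a short sketch. The function build_name and its parameter names are assumptions, and the separators follow the examples in this description rather than any normative rendering in ISO 11179 or the CCTS.

```python
from typing import Optional


def build_name(object_class: str,
               property_term: str,
               representation: str,
               object_qualifier: Optional[str] = None,
               property_qualifier: Optional[str] = None) -> str:
    """Assemble a component name in the style 'Account.Valid_From Date.Date'."""
    def qualified(qualifier: Optional[str], term: str) -> str:
        # A qualifier, when present, precedes its term and is joined by "_".
        return f"{qualifier}_{term}" if qualifier else term

    return ".".join([
        qualified(object_qualifier, object_class),
        qualified(property_qualifier, property_term),
        representation,
    ])
```

For example, build_name("Account", "From Date", "Date", property_qualifier="Valid") produces "Account.Valid_From Date.Date", and substituting the property term "Start Date" produces "Account.Valid_Start Date.Date".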

[0070]

For existing component names, a button 615 can be selected to display a semantic description of the component and/or other attributes, characteristics, context definitions (e.g., context categories and context drivers), or other definitions of the existing component. The user can also modify a proposed component name (e.g., to add a qualifier or to change a term) and can select a proposed name 610(1) to be added to the ABIE 605 using a user interface selection element 620. The user can then select an accept button 625 to accept the selected component name 610(1). As a result, a new dictionary entry name 630 for the new component is generated and added (635) to the ABIE 605.

[0071]

The structure of the new component can be modeled after an existing component from which the new component name is copied or can be modeled after existing components that include terms from which the new component name is constructed. The existing components can be used in generating XML schema, Java classes, ABAP Objects, database tables, XML schema structure, and/or a user interface structure for the new component. The new component can also be added to the repository of business data components (see FIGS. 2 and 4) for use in business processes and generating additional new component names. The structure of the new component and the addition to the repository of business data components can be performed automatically or semi-automatically (e.g., by providing the user with access to relevant parts of existing components). The new component generally has a limited scope in that it can be used only in the defined context (e.g., as defined in context definition user interface 215 of FIG. 2). In this case, for example, the new component is limited to use in a particular combination of context categories: business process, process role, industry classification, system constraints, geopolitical, official constraints, and owner limitations (as indicated in the context chart 640). The new component can subsequently be added to other contexts as well, which results in a removal of some of the context limitations.

[0072]

FIG. 7 is a flow diagram of a process 700 for generating business data component names. Context information for defining a business data component is received (705). The context information can be received at a processor from a user interface. A predefined business process model is identified based on the context information (710). The processor can use a search algorithm to identify a business process model that matches the context information. A request to add the business data component to the business process model is received (715) by the processor through a user interaction with a user interface. The user also provides a textual description of the business data component, which is received (720) by the processor (See fields 410-420 in FIG. 4). One or more proposed names for the business data component are generated (725). The proposed names are generated in accordance with a predefined naming format, in which a name generally includes an object class term, a property term, and a representation class term. In some cases, the name can include a qualifier term for one or more of the other terms. The processor uses a matching algorithm to select terms from a library of available terms based on the textual description. In addition, the matching algorithm can use the context information, information from the business process model, and/or information from one or more other business process models to generate the proposed names.

[0073]

The invention and all of the functional operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structural means disclosed in this specification and structural equivalents thereof, or in combinations of them. The invention can be implemented as one or more computer program products, i.e., one or more computer programs tangibly embodied in an information carrier, e.g., in a machine readable storage device or in a propagated signal, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple computers. A computer program (also known as a program, software, software application, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file. A program can be stored in a portion of a file that holds other programs or data, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.

[0074]

The processes and logic flows described in this specification, including the method steps of the invention, can be performed by one or more programmable processors executing one or more computer programs to perform functions of the invention by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus of the invention can be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).

[0075]

Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, the processor will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a processor for executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. Information carriers suitable for embodying computer program instructions and data include all forms of non volatile memory, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.

[0076]

To provide for interaction with a user, the invention can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input.

[0077]

The invention can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface or a Web browser through which a user can interact with an implementation of the invention, or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (“LAN”) and a wide area network (“WAN”), e.g., the Internet.

[0078]

The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

[0079]

FIG. 8 is a block diagram illustrating an example data processing system 800 in which a system for generating business data component names can be implemented. The data processing system 800 includes a central processor 810, which executes programs, performs data manipulations, and controls tasks in the system 800. The central processor 810 is coupled with a bus 815 that can include multiple busses, which may be parallel and/or serial busses.

[0080]

The data processing system 800 includes a memory 820, which can be volatile and/or non-volatile memory, and is coupled with the communications bus 815. The system 800 can also include one or more cache memories. The data processing system 800 can include a storage device 830 for accessing a storage medium 835, which may be removable, read-only, or read/write media and may be magnetic-based, optical-based, semiconductor-based media, or a combination of these. The data processing system 800 can also include one or more peripheral devices 840(1)-840(n) (collectively, devices 840), and one or more controllers and/or adapters for providing interface functions.

[0081]

The system 800 can further include a communication interface 850, which allows software and data to be transferred, in the form of signals 854 over a channel 852, between the system 800 and external devices, networks, or information sources. The signals 854 can embody instructions for causing the system 800 to perform operations. The system 800 represents a programmable machine, and can include various devices such as embedded controllers, Programmable Logic Devices (PLDs), Application Specific Integrated Circuits (ASICs), and the like. Machine instructions (also known as programs, software, software applications or code) can be stored in the machine 800 and/or delivered to the machine 800 over a communication interface. These instructions, when executed, enable the machine 800 to perform the features and functions described above. These instructions represent controllers of the machine 800 and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. Such languages can be compiled and/or interpreted languages.

[0082]

The invention has been described in terms of particular embodiments, but other embodiments can be implemented and are within the scope of the following claims. For example, the invention can also be used for semi-automatic mapping between different business communication schemas. If a business entity of a schema cannot be mapped to already stored BIEs, the semi-automatic mapping system can use the techniques of this invention for generating a new BIE by using the definition of the business entity. Other embodiments are within the scope of the following claims.

Claims (25)

1. A computer program product, tangibly embodied in an information carrier, the computer program product being operable to cause data processing apparatus to:

receive a textual description of a business data component; and

generate, in accordance with a predefined naming format, at least one proposed name for the business data component using a matching algorithm to select terms from a library of available terms based on the textual description, each proposed name including a plurality of terms and each term in the library of available terms defining at least one of an object class, a property, a representation class, or a qualifier.

2. The computer program product of claim 1 wherein the computer program product is operable to further cause data processing apparatus to:

receive context information for defining the business data component;

identify a predefined business data model based on the context information; and

receive a request to add the business data component to the business data model, wherein the matching algorithm uses a context defined by at least one of the context information or the predefined business data model to select terms from the library of available terms.

3. The computer program product of claim 2 wherein the at least one proposed name includes a business data component name included in a business data model for a different context.

4. The computer program product of claim 3 wherein a topic map defines associations between a plurality of business data models including the predefined business data model and the business data model for the different context, the computer program product being operable to cause data processing apparatus to identify the business data model for the different context based on a relationship with the predefined business data model defined in the topic map.

5. The computer program product of claim 2 wherein the computer program product is operable to further cause data processing apparatus to modify the business data model to include a selected one of the at least one proposed name.

6. The computer program product of claim 1 wherein the textual description includes a description of at least two elements selected from the group consisting of an object class, a property, a representation class, and a qualifier.

7. The computer program product of claim 6 wherein the library of available terms defines associations between the available terms and the at least one proposed name for the business data component is generated based on the defined associations between terms.

8. The computer program product of claim 1 wherein at least one proposed name for the business data component includes an object class term, a property term, and a representation class term.

9. The computer program product of claim 8 wherein at least one proposed name for the business data component includes a qualifier term associated with at least one of the object class term, the property term, or the representation class term.

10. The computer program product of claim 1 wherein the library of available terms comprises a topic map of terms included in predefined business data component names, the topic map defining associations between each term and one or more predefined business data component names included in a set of business data models.

11. The computer program product of claim 10 wherein the topic map defines associations based on component parts of sentences.

12. The computer program product of claim 10 wherein the computer program product is operable to further cause data processing apparatus to modify at least one business data model in the set of business data models to include a selected one of the at least one proposed name in a specific context.

13. The computer program product of claim 10 wherein the matching algorithm selects terms using the topic map to combine terms to generate each proposed name.

14. The computer program product of claim 13 wherein the matching algorithm selects terms based on at least one limitation for the business data component selected from the group consisting of a constraint, a characteristic, one or more valid values, and a specified context.

15. A system for generating business component names, the system comprising:

means for receiving a description of a business data component;

means for defining available terms and associations between the available terms; and

means for generating, based on the description and using terms from the available terms, at least one proposed name for the business data component in accordance with a predefined naming format, the predefined naming format defining a name as including a plurality of terms for semantically describing a business data component, wherein the plurality of terms include at least two terms from the group consisting of an object class term, a property term, a representation class term, a qualifier term, a context category, and a context value.

16. The system of claim 15 wherein the means for generating at least one proposed name is operable to select, for each proposed name, a plurality of terms from the available terms based on a correspondence between the description and a semantic meaning of the selected plurality of terms and a relationship between a context of the at least one proposed name and a context of each of the selected plurality of terms.

17. The system of claim 15 wherein the means for defining available terms and associations between the terms comprises a topic map with each term corresponding to a topic and each topic associated with at least one other topic and with a component part of a sentence.

18. The system of claim 17 wherein each topic corresponding to a term includes a plurality of elements defining at least one of an occurrence of the term, a topic of which the term is an instance, or a scope associated with the term.

19. A method for defining a business data component name, the method comprising:

receiving a description of a business data component;

generating a name for the business data component, the name including a plurality of terms semantically describing at least two of an object class, a property, and a representation class for the business data component, wherein the plurality of terms are selected from a library of available terms, the library of available terms defining associations between the available terms and predefined business data components and the name being generated based on the associations and on a correspondence between the description and at least one predefined business data component.

20. The method of claim 19 further comprising receiving a context definition for the business data component, wherein generating the name for the business data component is further based on the context definition and a context associated with each of the at least one predefined business data component.

21. The method of claim 19 wherein the library of available terms comprises a topic map defining associations between the available terms and generating the name for the business data component is further based on the associations between the available terms.

22. The method of claim 19 wherein generating a name for the business data component comprises using at least one of a synonym library, a code list, or a qualifier list.

23. The method of claim 19 wherein the associations between the available terms define dictionary entry names in different contexts, the name including a dictionary entry name in a different context.

24. The method of claim 19 wherein the description includes a description of at least two elements selected from the group consisting of an object class, a property, a representation class, and a qualifier.

25. The method of claim 19 wherein generating a name comprises selecting terms based on at least one limitation for the business data component selected from the group consisting of a constraint, a characteristic, one or more valid values, and a specified context.
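By way of illustration only, the name generation recited in claim 19 might be sketched as follows. The sketch is not part of the claims and does not limit them. The library contents, the keyword-overlap scoring, the function name propose_name, and the ". " separator between terms (modeled on ISO 11179 style dictionary entry names) are all assumptions made for the example; the matching algorithm itself is not specified at this level of detail.

```python
import re

# Hypothetical library of available terms. Each term is filed under the role
# it plays in a name and is associated with keywords that may appear in a
# textual description. All entries are invented for this example.
LIBRARY = {
    "object class": {
        "Invoice": {"invoice", "bill"},
        "Address": {"address"},
    },
    "property": {
        "Issue": {"issue", "issued"},
        "Street": {"street"},
    },
    "representation class": {
        "Date": {"date", "day"},
        "Text": {"text", "name"},
    },
}

ROLE_ORDER = ("object class", "property", "representation class")


def propose_name(description):
    """Return a proposed name for the described component, or None.

    For each role, the term whose keywords overlap most with the words of
    the description is selected. A name is proposed only if terms for at
    least two roles were found.
    """
    words = set(re.findall(r"[a-z]+", description.lower()))
    parts = []
    for role in ROLE_ORDER:
        best_term, best_score = None, 0
        for term, keywords in LIBRARY[role].items():
            score = len(words & keywords)  # simple keyword overlap
            if score > best_score:
                best_term, best_score = term, score
        if best_term is not None:
            parts.append(best_term)
    if len(parts) < 2:
        return None
    return ". ".join(parts)


if __name__ == "__main__":
    print(propose_name("The date on which the invoice was issued"))
```

A fuller implementation would presumably also weigh context, synonyms, and associations between terms, as recited in claims 20 through 23; the overlap score above stands in for that matching step.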
