Abstract

IBM® Content Analytics with Enterprise Search (ICAwES) provides an enterprise search capability that enables enterprise-wide search across multiple content repositories and different repository types. With ICAwES enterprise search solutions, you can integrate fields from multiple content repositories to create a single, integrated user search experience. In addition, the enterprise search solutions can use fields and facets in various ways to create diverse views of your search result set, thus helping you identify the hidden meaning of your unstructured content. This IBM Redbooks® Solution Guide explains, from a high level, how to build enterprise search solutions using ICAwES.

Contents

IBM® Content Analytics with Enterprise Search (ICAwES) addresses two categories of use cases: content analytics and enterprise search. Content analytics focuses on the analysis of a set of content to find patterns, trends, and anomalies in that content. Enterprise search focuses on the discovery and retrieval of documents by using various query and visual navigation techniques. The ICAwES enterprise search solutions can integrate fields from multiple content repositories to create a single, integrated user search experience. In addition, the enterprise search solutions can use fields and facets in various ways to create diverse views of your search result set, thus helping you identify the hidden meaning of your unstructured content. This IBM Redbooks® Solution Guide explains, from a high level, how to build enterprise search solutions with ICAwES.

The following figure shows how enterprise search solutions help you identify the hidden meaning of your content.

Note: For ease of reference, we use "enterprise search solutions" to refer to the enterprise search solutions that are built on top of ICAwES.

Did you know?

Enterprise search solutions can automatically identify a wide range of date formats and provide a single interface for viewing, sorting, and retrieving documents by a document's creation or modification date, or any date that appears in your documents.

You can set up your own synonym dictionary to expand search queries to include any number of word variations.

You can write custom annotators to extract concepts and add them to your search.

You can integrate different types of repositories in a single, integrated search.
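As a rough illustration of the date handling mentioned above, the following Python sketch normalizes dates written in several formats so that documents can be sorted on one date field. It is not ICAwES code: the format list, the function name normalize_date, and the sample documents are assumptions made for this example, and the formats that the product actually recognizes are not listed in this guide.

```python
from datetime import date, datetime
from typing import Optional

# Hypothetical set of accepted formats, tried in order (illustrative only).
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%B %d, %Y", "%d %b %Y"]


def normalize_date(text: str) -> Optional[date]:
    """Return the date from the first format that parses, or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # try the next format
    return None


# Sample documents whose modification dates use different notations.
documents = [
    {"id": "doc1", "modified": "2013-03-05"},
    {"id": "doc2", "modified": "March 1, 2013"},
    {"id": "doc3", "modified": "04/02/2013"},  # read as 4 February 2013
]

# Sort on the normalized value rather than on the raw string.
by_date = sorted(documents, key=lambda d: normalize_date(d["modified"]))
print([d["id"] for d in by_date])
```

Trying formats in a fixed order means that an ambiguous value such as 04/02/2013 is resolved by whichever format appears first in the list, so the order is itself a configuration decision.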

Business value

Enterprise search solutions that are created with ICAwES add value by performing the following tasks:

Creating a single interface when there is a need to create logical connections over multiple repositories.

Extracting discrete entities (such as personal names, telephone numbers, or addresses) from unstructured content. This extraction is also important for linking unstructured content with structured data.

Overcoming repository inconsistencies and lack of organization, such as disk drives with unplanned folder structures, and increasing accessibility to these resources.

Enterprise search solutions provide a unified interface to diverse structured and unstructured sources by creating a single index. Users perform searches in a fully integrated search environment. Such an approach reduces the investment in restructuring, cleaning, and maintaining multiple repositories, which might result in systems that no longer function optimally. Enterprise search can link fields and concepts from different sources to a single search field and filter inappropriate content and outdated records, all without any changes to the original repositories.

The ICAwES enterprise search capability can extract elements according to the type of data source (as shown below) and map the results to a single index:

HTML metadata

XML elements, according to the entity name, attribute name, and value

Relational database fields (IBM DB2® and any database with a JDBC connection)

IBM Content Manager attributes

IBM FileNet® Content Manager (FileNet P8) properties

Enterprise search can also crawl Microsoft Exchange Server, IBM Case Manager, IBM Connections, IBM Quickr® for Domino®, and IBM WebSphere® Portal. Each content source has its own properties or metadata that can be added to the index.

Choosing which extracted elements to index, and how the index is used in search, gives you control over the solution functions.
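The idea of mapping elements from different source types to a single index can be sketched in a few lines of Python. This is a conceptual model only, not the product's configuration format: the source names, native field names, FIELD_MAP, and both functions are hypothetical.

```python
# Hypothetical mapping from each source's native field names to shared
# index fields. Native fields that are not listed are not indexed.
FIELD_MAP = {
    "db2_loans": {"CUST_NAME": "customer", "OPEN_DT": "opened"},
    "filenet_p8": {"CustomerName": "customer", "DateCreated": "opened"},
}


def to_index_record(source: str, doc_id: str, native: dict) -> dict:
    """Translate one crawled item into a record that uses index field names."""
    record = {"id": doc_id, "source": source}
    for native_name, value in native.items():
        index_field = FIELD_MAP[source].get(native_name)
        if index_field is not None:
            record[index_field] = value
    return record


def search(index: list, field: str, value: str) -> list:
    """Return the IDs of records whose index field equals the value."""
    return [r["id"] for r in index if r.get(field) == value]


index = [
    to_index_record("db2_loans", "r1",
                    {"CUST_NAME": "A. Jones", "OPEN_DT": "2012-01-15", "BRANCH": "07"}),
    to_index_record("filenet_p8", "r2",
                    {"CustomerName": "A. Jones", "DateCreated": "2013-06-02"}),
]

# One query on the shared "customer" field reaches both repositories.
print(search(index, "customer", "A. Jones"))
```

In this sketch the unmapped BRANCH column never reaches the index, which mirrors the point that choosing what to index controls what the solution can do.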

An enterprise search solution is configured by using the Content Analytics Administration Console. There are three stages to setting up an enterprise search solution:

Crawler: Chooses which repositories to crawl.

Parse and Index: Chooses which crawled entities to map to index fields. Configures different annotation stages to extract facets and elements from the free-text document.

Search: Customizes the enterprise search experience by using dictionaries and rules to expand users' queries with synonyms that exist in the content.
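Query expansion in the Search stage can be pictured with the following Python sketch. It assumes a small in-memory synonym dictionary and whitespace tokenization; SYNONYMS, expand_query, and matches are names invented for this example and do not correspond to the product's dictionary format or API.

```python
from typing import Dict, List

# Hypothetical synonym dictionary: term -> additional accepted variations.
SYNONYMS: Dict[str, List[str]] = {
    "car": ["automobile", "vehicle"],
    "loan": ["credit"],
}


def expand_query(query: str) -> List[List[str]]:
    """Turn each query term into a group of alternatives (term OR synonyms)."""
    return [[term] + SYNONYMS.get(term, []) for term in query.lower().split()]


def matches(text: str, groups: List[List[str]]) -> bool:
    """A document matches when every group has at least one word in the text."""
    words = set(text.lower().split())
    return all(any(alt in words for alt in group) for group in groups)


groups = expand_query("car loan")
print(groups)
print(matches("Vehicle credit approved", groups))
```

The user types two terms, but a document that uses neither of them literally can still be retrieved because each term is treated as a group of alternatives.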

The enterprise search solution interface provides components for displaying different aspects of the search results. These components give a meaningful overview of the content and are also interactive: you can use them to drill down into the result set and refine the results according to the properties that are chosen by the user. Each component lists the result document count per element and allows you to add the elements to the drill-down search.

The search results are displayed with the following components:

Facet Dialog: Provides a list of elements that are discovered by content analytics annotators.

Category Tree: Displays categories that are defined by configurable rules.

Dynamic Facet Chart: Displays date ranges.
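The per-element document counts and the drill-down behavior that these components share can be modeled as follows. This is a minimal Python sketch under assumed data: the facet names, the sample result set, and the functions facet_counts and drill_down are hypothetical and are not part of the product interface.

```python
from collections import Counter

# Hypothetical result set; each document carries the facet values found for it.
results = [
    {"id": "d1", "facets": {"department": "claims", "year": "2012"}},
    {"id": "d2", "facets": {"department": "claims", "year": "2013"}},
    {"id": "d3", "facets": {"department": "sales", "year": "2013"}},
]


def facet_counts(docs: list, facet: str) -> Counter:
    """Count result documents per value of one facet."""
    return Counter(d["facets"][facet] for d in docs if facet in d["facets"])


def drill_down(docs: list, facet: str, value: str) -> list:
    """Refine the result set to documents that have the selected facet value."""
    return [d for d in docs if d["facets"].get(facet) == value]


print(facet_counts(results, "department"))

# Selecting year=2013 narrows the results; the counts are then recomputed.
refined = drill_down(results, "year", "2013")
print(facet_counts(refined, "department"))
```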

Solution architecture

An enterprise search solution consists of two servers: a controller server and a search server. The controller server collects information from the crawled sources and builds an index for the search server. The search server carries out the actual search based on user queries.

The following data sources can be crawled and collected by the controller server:

Content management sources

Database systems

IBM Case Manager sources

Email, file system, and web sources

WebSphere Portal sources

IBM Connections sources

IBM Lotus® Domino sources

The following figure shows the enterprise search solution architecture.

Figure 2. The enterprise search solution architecture

The administration console is responsible for configuring both servers. The search application can also define many search parameters, especially the query expansion and search results document ranking rules.
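The division of work between the two servers can be illustrated with a toy inverted index in Python. This sketch only models the roles described above (one side builds the index, the other answers queries); it says nothing about how ICAwES implements its index, and the function names and sample text are assumptions.

```python
from collections import defaultdict
from typing import Dict, Set


def build_index(crawled: Dict[str, str]) -> Dict[str, Set[str]]:
    """Controller role: build a token -> document IDs index from crawled text."""
    index = defaultdict(set)
    for doc_id, text in crawled.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return dict(index)


def run_query(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Search role: return the documents that contain every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


# Hypothetical crawled content from two different sources.
crawled = {
    "mail-1": "quarterly loan report",
    "web-7": "customer feedback on loan rates",
}
index = build_index(crawled)
print(sorted(run_query(index, "loan")))
print(sorted(run_query(index, "loan report")))
```

Because the search role reads only the finished index, the original repositories are not touched at query time.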

In the first scenario, a financial institution needs an enterprise search solution that enables its workers to search efficiently and easily across the following items:

Multiple repositories

Different repository types

Geographically dispersed locations

The ICAwES enterprise search capability enables the institution to create such a solution.

The financial institution used to have a uniform content management system with only one type of content repository. Over the years, it has acquired many other financial institutions, each with its own set of content, business rules, and operations. Consolidating and unifying the business rules, operations, and content repositories of all these acquired institutions will take many years of planning, design, and implementation. For now, the financial institution wants to keep the existing operations of many of the acquired companies, yet be able to search content across these various content repositories. In addition, the financial institution is interested in searching external websites that contain a vast number of consumer comments and feedback that might help the company to gain business insights.

In this scenario, ICAwES enterprise search is used to integrate multiple repositories that, despite being physically separate and of different repository types, are linked according to the business need. ICAwES enterprise search allows the company to access and link the existing systems and newly acquired repository systems without needing to convert the data among the repositories. Some systems present special challenges, such as inconsistent data format, inconsistent ways of naming fields, and inconsistent ways of using words in unstructured content.

ICAwES enterprise search can help solve problems of data inconsistency by placing constraints both at the indexing stage and at run time (search), and by providing additional access that was not available in existing systems. For example, when moving from one relational database design to another, a single index field can access both sources, each with its own table and field design.

Scenario 2: Domain-specific searching

In another scenario, a medical insurance company wants an efficient search of medical records for patient care and business analysis. The insurance company wants to capture relevant elements in the medical records during searches, including diseases, medications, medical procedures that are performed, their outcomes, and so on.

Using ICAwES enterprise search, the insurance company can build custom dictionaries and parsing rules, and create custom annotators that are specific to the medical domain, to identify relevant metadata and text and extract their values. Such information can then be retrieved by search, along with the structured database query, for a particular patient.
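To make the dictionary-driven extraction in this scenario concrete, the following Python sketch tags dictionary terms in free text and records them as facet values. It is a simplified stand-in written for this guide, not the product's annotator interface: MEDICAL_DICTIONARY, the annotate function, and the sample record are all hypothetical.

```python
import re

# Hypothetical custom dictionary: facet name -> terms that belong to it.
MEDICAL_DICTIONARY = {
    "disease": ["diabetes", "hypertension"],
    "medication": ["metformin", "lisinopril"],
}


def annotate(text: str) -> list:
    """Find dictionary terms as whole words (case-insensitive) in the text."""
    annotations = []
    for facet, terms in MEDICAL_DICTIONARY.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            for match in re.finditer(pattern, text, flags=re.IGNORECASE):
                annotations.append({
                    "facet": facet,
                    "value": term,  # normalized dictionary form
                    "start": match.start(),
                    "end": match.end(),
                })
    # Report annotations in the order in which they occur in the text.
    return sorted(annotations, key=lambda a: a["start"])


record = "Patient with Diabetes was prescribed metformin."
for annotation in annotate(record):
    print(annotation["facet"], annotation["value"])
```

Storing the normalized dictionary form rather than the matched text means that "Diabetes" and "diabetes" land in the same facet value, which is what allows the extracted values to be counted and combined with a structured query.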

Special Notices

This material has not been submitted to any formal IBM test and is published AS IS. It has not been the subject of rigorous review. IBM assumes no responsibility for its accuracy or completeness. The use of this information or the implementation of any of these techniques is a client responsibility and depends upon the client's ability to evaluate and integrate them into the client's operational environment.