IntraFind: empowering KM

At the Information Today Enterprise Search Summit in London in the spring, I met Franz Kögl, managing director of sales and marketing for IntraFind, a German open source search vendor. In what is proving a banner year for open source search, Kögl surprised me with the statement that his company started in 2000. In terms of commercial open source companies, IntraFind ranks with Compass (Elasticsearch) as one of the more seasoned competitors.

Along with co-founder Bernhard Messer, Kögl wanted to give users a simple and quick way to access information in the enterprise. In the last 12 years, IntraFind has refined its approach. Kögl says, "After more than 11 years doing enterprise search business, it is still an exciting challenge for us to present our customers with a wide range of use cases when working with our search and retrieval products. Two years ago, we started with solutions for metadata generation, which we found to be a new and exciting approach in expanding the possibilities of search."

With organizations crumpling under the pressure of increased flows of digital information, IntraFind has enjoyed robust growth. One differentiator is that the company is rooted in the German language. Kögl explains, "Our language, German, is very complex and full of irregularities and multiple-word terms. Stemmers do not deliver the quality we need for German. Therefore, we came up with the idea to develop a morphologically enhanced and semantically based system, which removes the complexity of the language from the user when querying our system."
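Kögl's point about multiple-word terms can be illustrated with a toy example. The sketch below is not IntraFind's technology. It is a minimal dictionary-based compound splitter, assuming a tiny hypothetical lexicon (`LEXICON`) and a short list of linking elements (`LINKERS`), and it shows why a lexicon-aware approach can recover the parts of a German compound that a suffix-stripping stemmer would treat as one opaque token.

```python
from typing import List, Optional, Set

# Hypothetical mini-lexicon of lowercase German word forms (illustration only).
LEXICON: Set[str] = {"krankenhaus", "verwaltung", "daten", "bank", "haus"}

# Linking elements that may join the parts of a compound (assumed, incomplete).
LINKERS = ("", "s", "es")


def split_compound(word: str, lexicon: Set[str] = LEXICON) -> Optional[List[str]]:
    """Split a word into lexicon entries, or return None if no split exists."""
    word = word.lower()
    if not word:
        return []
    # Prefer the longest known prefix, then recurse on what is left.
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head not in lexicon:
            continue
        rest = word[end:]
        for linker in LINKERS:
            if not rest.startswith(linker):
                continue
            remainder = rest[len(linker):]
            if linker and not remainder:
                continue  # a linking element must be followed by another part
            tail = split_compound(remainder, lexicon)
            if tail is not None:
                return [head] + tail
    return None


if __name__ == "__main__":
    for term in ("Krankenhausverwaltung", "Verwaltungsdaten", "Datenbank"):
        print(term, "->", split_compound(term))
```

With the parts indexed alongside the full compound, a query for "Verwaltung" could also match documents that mention only "Krankenhausverwaltung". A production system would need a full morphological lexicon and a way to rank competing splits.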

Personalization

Organizations worldwide have learned that brute force search does not meet most users' needs when they look for information in an organization's digital files. IntraFind recognized that now-obvious point more than a decade ago. Kögl says, "We do research for our complete text analytics stack, including semantic technologies, ontologies, machine learning, natural language processing, identifying entities and relations and sentiment analysis/opinion mining."

The IntraFind system allows a user to personalize his or her information experience. In addition to a traditional results list option, IntraFind provides a "dashboard" interface. The dashboard, says Kögl, "allows users to see new documents matching ‘query agents’ or topics the user defined. Self-profiling makes use of our text analytics functions. Personalization is reflected in individual favorite repositories. In addition, individualized widgets on the user interface provide the user with personalized hot-linked facets. The user has control over what the interface displays in response to a user query or a mouse click."

The dashboard approach is becoming a standard feature of solutions from such competitors as PolySpot and OpenSearchServer, both French open source search vendors.

Kögl says, "We believe that the future is to improve the quality of unstructured data with structured information. So we give the user an ‘artificial’ structure to support him. We are putting a lot of effort into a research of metadata extraction and classification of unstructured data to enable a more structured search experience from the user point of view. Our TopicFinder and Named Entity Recognition technologies are mostly making the difference in many procurements. We are delighted these technologies are now in demand. We provide both traditional search outputs and the mash-up style that is in demand."

Accessing content

Some open source systems have limited capabilities for accessing or connecting to file systems or file types. The IntraFind solution supports three types of content access. Kögl lists the three technological approaches for processing data from various sources:

The first approach is a classical crawling or poll approach where the data are crawled from the source system and indexed. The poll approach also supports incremental updates of changed data on the source system.

The second approach is "push." The source system automatically triggers indexing whenever a source item is changed.

The third approach is event-based indexing, in which the source system must be able to raise events when content changes. Special connectors subscribe to those events and process each change on the fly.

Kögl says that with the push approach or the event-based approach, IntraFind can say that it provides instant updates to the indexes.
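The difference between the three approaches can be sketched in a few lines of code. The example below is a toy illustration under stated assumptions, not IntraFind's connector API: `SourceSystem`, `Index`, `PollingCrawler`, `push_save` and `attach_event_connector` are all hypothetical names, and a simple counter stands in for real modification times.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Document:
    doc_id: str
    text: str
    modified: int  # logical timestamp of the last change


class Index:
    """Toy in-memory index that maps a document id to its latest text."""

    def __init__(self) -> None:
        self.docs: Dict[str, str] = {}

    def add(self, doc: Document) -> None:
        self.docs[doc.doc_id] = doc.text


class SourceSystem:
    """Hypothetical content source that stores documents and raises change events."""

    def __init__(self) -> None:
        self.store: Dict[str, Document] = {}
        self.clock = 0
        self.listeners: List[Callable[[Document], None]] = []

    def save(self, doc_id: str, text: str) -> Document:
        self.clock += 1
        doc = Document(doc_id, text, self.clock)
        self.store[doc_id] = doc
        for listener in self.listeners:  # notify any subscribed connectors
            listener(doc)
        return doc

    def changed_since(self, timestamp: int) -> List[Document]:
        return [d for d in self.store.values() if d.modified > timestamp]


class PollingCrawler:
    """Poll approach: each run fetches only items changed since the previous run."""

    def __init__(self, source: SourceSystem, index: Index) -> None:
        self.source, self.index, self.last_seen = source, index, 0

    def run_once(self) -> int:
        changed = self.source.changed_since(self.last_seen)
        for doc in changed:
            self.index.add(doc)
        self.last_seen = self.source.clock
        return len(changed)


def push_save(source: SourceSystem, index: Index, doc_id: str, text: str) -> None:
    """Push approach: the code that changes an item also triggers indexing."""
    index.add(source.save(doc_id, text))


def attach_event_connector(source: SourceSystem, index: Index) -> None:
    """Event approach: a connector subscribes to change events from the source."""
    source.listeners.append(index.add)
```

In this sketch, a polled index lags behind the source until the next `run_once()` call, while the push and event variants update the index as part of the change itself, which is the behavior behind the "instant updates" claim.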

Long track record

IntraFind's product lineup includes a traditional on-premises license. However, the company has experienced success with its software development kit (SDK) tailored to the needs of a company requiring search and content processing. The SDK makes it possible for the IntraFind technology to be embedded in third-party products. That is a relatively new development for open source search vendors. For many years, the use of a proprietary system such as Autonomy (an HP company) as an embedded search system in products like Oracle WebLogic or OpenText RedDot was the standard approach. Now open source vendors are shouldering their way into that type of OEM deal.

IntraFind is one of a relatively small number of enterprise search vendors with 12 years of experience in search. Autonomy, Endeca (now Oracle) and FAST Search (now Microsoft) date from the late 1990s, but open source search is a more recent development. The company's technical expertise is an important factor. Kögl suggests that its 15-minute installation process is one factor that appeals to some customers. He says, "Customer focus, flexibility, deep know-how and our best-of-breed technology are the strongest arguments to work with IntraFind. We have 27 full-time professionals. We have well-known references. These provide high investment security. Our reputation is an important part of our company's track record."

Melding functions

What is interesting about IntraFind is that the company's approach blends several functions. Those functions—search, personalization and text analytics—were once separate. A decade ago, an enterprise would license a search system such as Convera (sold to FAST Search) or Endeca. The licensee could work with the vendor or use in-house programming to tailor the system, facets and alerts to meet the organization's requirements. If text analytics were required, the licensee would turn to specialists such as Xerox PARC spinoff Inxight Software (bought by Business Objects) to get access to robust entity extraction and other advanced text processing functions. Today, open source vendors bundle those features in one system that can be, according to IntraFind, installed in 15 minutes.