Documentum, SharePoint, OpenSearch and cross-repository searching

July 22, 2011

Because clients often use SharePoint as their company intranet, many consider configuring it as a search portal for external repositories. This approach gives users one central interface for searching and retrieving content. While Documentum offers the MyDocumentum for SharePoint product to give SharePoint users search access to Documentum, we are seeing more clients who want a combined search of both Documentum and SharePoint content. This post discusses the alternatives and presents a hybrid open source solution based on our experience.

Issues with “one search to rule them all”

Clients often get caught in the trap of looking for a single search vendor to solve the problem. Solutions include:

SharePoint – have SharePoint crawl the Documentum repository to expose Documentum content and SharePoint content from a single search.

Third Parties – whether commercial products such as FAST or Autonomy or open source tools such as Lucene, clients look for third-party tools to index all content as part of an enterprise search effort. This solution can be accessed either from within SharePoint or outside of it.

Issues with the “one search” approach include:

Just a “Google” Search? – ECM users might think they want a “Google” search, but they often want more than just text searching. As we mentioned in a previous post, the search “show me all documents that contain this word” is very different than “show me all SOPs for this product and this location that contain this word”. Combining metadata and content criteria is a typical ECM search requirement.

Security – Documentum users get concerned about crawlers indexing their content. Documentum users often want to restrict general (“world”) users to approved content. In addition, if security recently changed in Documentum and the index hasn’t been updated, restricted documents may still be displayed in the results.

Crawler – Documentum support resources can rightly be concerned about how often the search crawler will access Documentum to refresh the content for the search tool. Since the Documentum team already runs its own indexing (via FAST or xPlore), the overhead of a second search tool can be a concern.
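To make the first two issues concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `IndexedDoc` class, the attribute names (`doc_type`, `product`, `location`) and the sample data are hypothetical and do not come from Documentum, SharePoint or any search product. It contrasts a text-only search with a combined metadata and content search, and shows how an index that stores a security snapshot from crawl time can return a document the user can no longer read.

```python
from dataclasses import dataclass, field


@dataclass
class IndexedDoc:
    """Hypothetical entry in a crawled index (not a real product schema)."""
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)
    readers_at_crawl: set = field(default_factory=set)  # ACL snapshot taken by the crawler


def keyword_search(index, word):
    """'Google'-style search: match on content only."""
    needle = word.lower()
    return [d for d in index if needle in d.text.lower()]


def combined_search(index, word, **criteria):
    """ECM-style search: content match plus exact metadata matches."""
    return [d for d in keyword_search(index, word)
            if all(d.metadata.get(k) == v for k, v in criteria.items())]


def trim_by_live_acl(hits, user, live_acl):
    """Keep only hits the user may read according to the repository right now."""
    return [d for d in hits if user in live_acl.get(d.doc_id, set())]


everyone = {"alice", "bob"}
index = [
    IndexedDoc("SOP-001", "Calibration steps for the mixer",
               {"doc_type": "SOP", "product": "Widget", "location": "Chicago"}, set(everyone)),
    IndexedDoc("MEMO-017", "Calibration schedule reminder",
               {"doc_type": "Memo", "product": "Widget", "location": "Chicago"}, set(everyone)),
    IndexedDoc("SOP-002", "Calibration steps for the press",
               {"doc_type": "SOP", "product": "Gadget", "location": "Dallas"}, set(everyone)),
]

# Security changed after the crawl: bob lost access to SOP-001.
live_acl = {"SOP-001": {"alice"}, "MEMO-017": everyone, "SOP-002": everyone}

hits = keyword_search(index, "calibration")
print([d.doc_id for d in hits])
# ['SOP-001', 'MEMO-017', 'SOP-002']

print([d.doc_id for d in combined_search(
    index, "calibration", doc_type="SOP", product="Widget", location="Chicago")])
# ['SOP-001']

print([d.doc_id for d in hits if "bob" in d.readers_at_crawl])
# ['SOP-001', 'MEMO-017', 'SOP-002']  (stale snapshot still shows SOP-001)

print([d.doc_id for d in trim_by_live_acl(hits, "bob", live_acl)])
# ['MEMO-017', 'SOP-002']
```

The last two lines are the security concern in miniature: until the crawler refreshes the index, the stale snapshot and the live repository disagree about what the user is allowed to see.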

Searching External Data from SharePoint leveraging OpenSearch

In SharePoint 2010, there are two “out of the box” methods for searching external data:

– Index in SharePoint – As mentioned earlier, SharePoint can crawl the external system and add its data to the SharePoint search engine’s index; this can be extended with the Business Data Connectivity Service. This allows for metadata searching, but at the cost of the security and crawling issues mentioned above.

– OpenSearch – SharePoint 2010 can also query directly against the external system, leveraging a Federated search location and OpenSearch. OpenSearch is a standard set of formats for sending “Google-like” keyword queries to external systems and returning the results as RSS or Atom for display. This solves the crawling and security issues, but not the “Google” search limitations.
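For readers who have not worked with OpenSearch, the core idea is a URL template in which the calling system substitutes the user's query. The sketch below is a simplified illustration in Python, not SharePoint's implementation: the host name and the `expand_template` helper are hypothetical, while `{searchTerms}` and `{startIndex?}` follow the OpenSearch 1.1 template syntax, where a trailing `?` marks a parameter as optional.

```python
import re
from urllib.parse import quote

# Matches OpenSearch-style template parameters such as {searchTerms} or {startIndex?}
_PARAM = re.compile(r"\{([A-Za-z:]+)(\?)?\}")


def expand_template(template, **values):
    """Fill an OpenSearch URL template with URL-encoded values."""
    def replace(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # optional parameters with no value become an empty string
        raise KeyError(f"missing required template parameter: {name}")
    return _PARAM.sub(replace, template)


# Hypothetical endpoint; a real federated location would point at your own search service.
template = "http://search.example.com/opensearch?q={searchTerms}&start={startIndex?}&format=rss"
url = expand_template(template, searchTerms="change control SOP")
print(url)
# http://search.example.com/opensearch?q=change%20control%20SOP&start=&format=rss
```

Note that the template carries only the keywords the user typed, which is why a federated location on its own gives a text-only search.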

Example of creating a new Federated Search location in Central Administration

One significant issue with OpenSearch, as mentioned earlier, is that as exposed by SharePoint it is just a “Google”-like search that doesn’t allow querying metadata. Additionally, Documentum does not support OpenSearch queries.

A Hybrid OpenSearch/SharePoint Solution

To address these issues and limitations, we developed a hybrid search solution for SharePoint. It is composed of the following:

A customizable form to build a keyword and metadata query in SharePoint

A custom servlet that translates the search criteria into an OpenContent query and returns the results

A customized XSL template to style the search results and expose metadata
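As a rough illustration of the translation step performed by the servlet in the list above, the Python sketch below turns a keyword plus metadata criteria into a Documentum DQL string. This is an assumption-heavy sketch rather than how OpenContent works internally: the function names, the `q` parameter and the `product` and `site` attributes are all hypothetical, and the query string is only built, never executed.

```python
import re
from urllib.parse import parse_qs

_ATTR_NAME = re.compile(r"^[a-z_][a-z0-9_]*$")


def criteria_from_query_string(query_string):
    """Split request parameters into a keyword ('q') and metadata criteria."""
    params = {name: values[0] for name, values in parse_qs(query_string).items()}
    keyword = params.pop("q", "")
    return keyword, params


def dql_literal(value):
    """Quote a value as a DQL string literal, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_dql(keyword, metadata, object_type="dm_document"):
    """Build a DQL query combining a full-text clause with attribute filters."""
    parts = [f"SELECT r_object_id, object_name FROM {object_type}"]
    if keyword:
        parts.append(f"SEARCH DOCUMENT CONTAINS {dql_literal(keyword)}")
    conditions = []
    for attr, value in sorted(metadata.items()):
        if not _ATTR_NAME.match(attr):
            # Reject anything that does not look like a plain attribute name.
            raise ValueError(f"unexpected attribute name: {attr!r}")
        conditions.append(f"{attr} = {dql_literal(value)}")
    if conditions:
        parts.append("WHERE " + " AND ".join(conditions))
    return " ".join(parts)


# 'product' and 'site' are hypothetical custom attributes used for illustration.
keyword, metadata = criteria_from_query_string("q=calibration&product=Widget&site=Chicago")
print(build_dql(keyword, metadata))
# SELECT r_object_id, object_name FROM dm_document SEARCH DOCUMENT CONTAINS 'calibration' WHERE product = 'Widget' AND site = 'Chicago'
```

Because the query runs against the repository at search time, results reflect the repository's current security, which avoids the stale index problem described earlier.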

Leveraging TSG’s OpenContent API, we are able to run OpenSearch searches against any external repository, such as Documentum, and return the results. The OpenSearch standard allows us to pass additional metadata values in our search criteria that are not exposed in the SharePoint implementation. OpenContent translates each of these metadata requests into a proper query for Documentum and returns the results as extended RSS XML (per the OpenSearch specification). The final piece of the user experience is customizing the XSL stylesheet for the search results so that the additional metadata from the target system is displayed in the tabular format that users expect. The federated results can be placed alongside SharePoint results within the page.
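To give a feel for what an extended RSS response might look like, here is a minimal Python sketch. The `opensearch` namespace and its `totalResults`, `startIndex` and `itemsPerPage` elements come from the OpenSearch 1.1 specification; the `meta` namespace, the example URLs and the `build_rss` function are hypothetical and are not the format OpenContent actually emits.

```python
import xml.etree.ElementTree as ET

OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/"
META_NS = "http://example.com/ns/repository-metadata"  # hypothetical extension namespace

ET.register_namespace("opensearch", OPENSEARCH_NS)
ET.register_namespace("meta", META_NS)


def build_rss(query, results):
    """Render search results as RSS 2.0 with OpenSearch and metadata extensions."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"Repository results for: {query}"
    ET.SubElement(channel, "link").text = "http://search.example.com/"  # placeholder
    ET.SubElement(channel, "description").text = "Federated search results"
    ET.SubElement(channel, f"{{{OPENSEARCH_NS}}}totalResults").text = str(len(results))
    ET.SubElement(channel, f"{{{OPENSEARCH_NS}}}startIndex").text = "1"
    ET.SubElement(channel, f"{{{OPENSEARCH_NS}}}itemsPerPage").text = str(len(results))
    for result in results:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = result["title"]
        ET.SubElement(item, "link").text = result["link"]
        # Extra repository attributes ride along as namespaced child elements,
        # which a stylesheet can then pick out and lay out as table columns.
        for name, value in result.get("metadata", {}).items():
            ET.SubElement(item, f"{{{META_NS}}}{name}").text = str(value)
    return ET.tostring(rss, encoding="unicode")


xml_text = build_rss("calibration", [
    {"title": "SOP-001", "link": "http://repo.example.com/doc/1",
     "metadata": {"product": "Widget", "site": "Chicago"}},
])
print(xml_text)
```

An XSL template on the SharePoint side would then match the `item` elements and render the namespaced metadata children as columns next to the title and link.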
