We’re working with a large pharmaceutical company to install Documentum xPlore as a replacement for FAST. We’ve just finished the QA environment deployment, and the Production deployment is planned for mid-April. In this post, we discuss the cutover strategy as well as some lessons learned from the project.

Cutover and Reindexing Strategy

The client’s current environment contains one Content Server serving five repositories, and one full text server running a FAST indexer for each repository. The upgrade plan is to stop FAST, install xPlore on the existing full text server, and re-index all of the repositories. We plan to start the installation on a Friday evening after business close. After allowing time to upgrade the Content Server to the latest patch and to install xPlore, the re-indexing operation must still finish before Monday morning. To increase the likelihood of that, we decided to install an extra Content Processing Service (CPS) instance on the full text server. The full text server has enough memory and CPU to handle the extra instance, so that is the best way to increase our indexing throughput. We recently executed the upgrade in the client’s QA environment, and our average re-indexing throughput was 70 documents per second. Assuming we see a similar average in production, this should be more than enough to complete the re-indexing operation over a weekend.
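The weekend estimate is simple arithmetic, and it is worth rerunning with your own numbers. The sketch below uses the 70 documents per second we measured in QA; the 48-hour window and the function names are assumptions for illustration, not figures from the client’s plan.

```python
def max_docs(window_hours, docs_per_second):
    """Documents that can be indexed in the window at a steady average rate."""
    return int(window_hours * 3600 * docs_per_second)


def reindex_hours(doc_count, docs_per_second):
    """Hours needed to index doc_count documents at a steady average rate."""
    return doc_count / docs_per_second / 3600


# 70 docs/s is the QA average from this post; 48 hours is an assumed window.
print(max_docs(48, 70))  # 12096000
```

The same functions can be turned around: pass the total document count across all repositories to reindex_hours to see how much of the weekend is left as a buffer for the patching and installation steps.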

Wildcards and Fragment Searching

Users judge search results by how search “should” work based on their prior experience with the system, and they can become frustrated when it behaves differently. It is therefore important to analyze the differences between FAST and xPlore when it comes to wildcards and fragment matching. A good place to start is the xPlore 1.2 Administration Guide, starting on page 175. Note that in the text below, search terms appear in single quotes. The quotes only mark where the search term starts and ends; the user would not actually type them. Here’s how xPlore works out of the box:

Full text searches do not support wildcards or fragment matching. This means that a search for ‘car’ does not return a document containing ‘careful’. However, a document containing ‘blue car’ is returned. xPlore treats wildcards as literal values. Searching for ‘car*’ still does not return ‘careful’.

Metadata searches in Webtop’s advanced search work in a similar way, even when using the begins with, ends with, or contains modifiers. For example, searching for titles that contain ‘car’ will return titles that contain ‘blue car’, but will not return titles containing ‘careful’. However, xPlore does support wildcard characters in metadata searches. Therefore, if the user executes the search as ‘*car*’, then documents with titles of both ‘careful’ and ‘blue car’ will be returned. When executing a contains search on metadata, think of it as a ‘contains word’ search rather than a true contains search.

This last point is very important, since for most users it will not make sense. They may say, “I searched for documents containing ‘123’ in the name. xPlore is broken because I’m not getting document ‘1234’ in search results.” However, because xPlore matches whole words rather than fragments, this user would have to search for ‘123*’ to return document ‘1234’.
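To make the out-of-the-box rules concrete, here is a toy model in Python. It is not how xPlore is implemented; it only mimics the behavior described above, using single-word search terms. The function names are made up for this illustration, and Python’s fnmatch stands in for xPlore’s wildcard handling.

```python
from fnmatch import fnmatchcase


def words(value):
    """Split a value into lowercase, whitespace-separated words."""
    return value.lower().split()


def full_text_match(term, text):
    """Out of the box: the term must equal a whole word; '*' is a literal."""
    return term.lower() in words(text)


def metadata_match(term, value):
    """Out of the box: whole-word match, but '*' in the term is a wildcard."""
    return any(fnmatchcase(word, term.lower()) for word in words(value))


print(full_text_match("car", "careful"))    # False
print(full_text_match("car", "blue car"))   # True
print(full_text_match("car*", "careful"))   # False
print(metadata_match("car", "careful"))     # False
print(metadata_match("*car*", "careful"))   # True
print(metadata_match("123*", "1234"))       # True
```

The last line is the ‘1234’ case from the user complaint above: the bare term ‘123’ fails, and only the explicit wildcard finds the document.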

If this is a problem for your users, you can turn on FAST compatibility mode. Although FAST compatibility mode will slow down search times, it may be worth it depending on how users are used to searching. This mode does a number of things:

Wildcards are supported in simple and full-text searches. Searching for ‘car*’ does return ‘careful’. Note that fragments are still not matched: searching for ‘car’ will not return ‘careful’.

Wildcards are implicitly placed in metadata searches. For example, if you search for documents containing ‘123’ in the object_name, the actual search would be for ‘*123*’, and in our previous example, would return document ‘1234’ as expected by the user.

Metadata Searches, Implicit Wildcards and Our Version of Webtop

For our client, we decided to turn on FAST compatibility mode. Users of the system are too accustomed to a metadata contains search being a true contains search rather than a ‘contains word’ search as described above. However, we ran into one problem. The client currently has Webtop 6.5 SP2 installed. In our development environment, implicit wildcards in metadata searches were not added as documented in the xPlore Administration Guide. This means that in the above example, searching for ‘123’ in an object_name contains search does not implicitly search as ‘*123*’. As a workaround, the users are being instructed to add wildcards manually in metadata searches when the results are not as expected.

For example, say the user is searching for a document named ‘ABC-XYZ-1234.pdf’. If the user executes a contains search in the advanced search for ‘1234’, the results will not contain the document. The user would need to search for ‘1234*’. This occurs because xPlore indexes the “word” 1234.pdf, and Webtop is not adding the implicit wildcards to metadata searches as it should.
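The gap between the documented behavior and what we saw can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, the token list is what we assume xPlore indexed for this file name, and fnmatch is a stand-in for the real wildcard matching.

```python
from fnmatch import fnmatchcase


def with_implicit_wildcards(term):
    """Wrap a bare term as '*term*', as compatibility mode is documented to do."""
    return term if "*" in term else "*" + term + "*"


def matches_any_word(pattern, indexed_words):
    """True if the pattern matches at least one indexed word (case-insensitive)."""
    return any(fnmatchcase(w.lower(), pattern.lower()) for w in indexed_words)


# Assumed tokens for 'ABC-XYZ-1234.pdf' (the dash splits, the dot does not).
indexed = ["ABC", "XYZ", "1234.pdf"]

print(matches_any_word("1234", indexed))                           # False
print(matches_any_word("1234*", indexed))                          # True
print(matches_any_word(with_implicit_wildcards("1234"), indexed))  # True
```

The first line is what our users see today, the second is the manual workaround, and the third is what the search would do if the implicit wildcards were added as documented.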

Special Characters

In xPlore, certain characters are indexed as white space in the Lucene index. The default character list, as defined in the indexserverconfig.xml, contains the following characters: @#$%^_~‘*&:()-+=<>/[]{}. For example, if a document’s text or attributes contain ‘PX-SOP-1234’, the CPS will index three tokens: ‘PX’, ‘SOP’ and ‘1234’. According to the xPlore administration guide, a search containing a special character should be treated as a phrase search. This means that if a user were to search on PX-SOP-1234, the document would be returned. However, in our development environment, we were not seeing that behavior. In the previous example, a search for PX-SOP-1234 was returning 0 results. We didn’t get to the bottom of why this wasn’t working as described in the documentation. Perhaps it was due to our version of Webtop (6.5 SP2). In any case, the business users are used to including special characters in their search queries, so we decided to remove the dash and underscore from the list of special characters. This ensures that a search for PX-SOP-1234 returns the correct results.
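The effect of the special character list on tokens can be approximated with a short Python sketch. This is not the CPS tokenizer; it simply treats every listed character as white space, which is the behavior described above. The character list is copied as printed in this post (the curly quote may have been a different character in the original file), and the function name is made up.

```python
import re

# Copied from the list above; the quote character may differ in a real config.
DEFAULT_SPECIAL = "@#$%^_~‘*&:()-+=<>/[]{}"


def tokenize(text, special_chars):
    """Split text on white space and on every character in special_chars."""
    pattern = "[" + re.escape(special_chars) + r"\s]+"
    return [token for token in re.split(pattern, text) if token]


# Our change: drop the dash and underscore from the list.
custom_special = DEFAULT_SPECIAL.replace("-", "").replace("_", "")

print(tokenize("PX-SOP-1234", DEFAULT_SPECIAL))  # ['PX', 'SOP', '1234']
print(tokenize("PX-SOP-1234", custom_special))   # ['PX-SOP-1234']
```

With the dash removed from the list, the identifier stays a single token, so a search for PX-SOP-1234 has one exact word to match instead of depending on phrase search.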

xPlore Admin Interface – IE Settings for Reporting

xPlore’s admin interface is definitely a big upgrade over FAST. While testing out the admin interface in the client’s development environment, we noticed some odd behavior around reporting in Internet Explorer: every report, no matter what settings we used, would only return a # character. No results were coming back for any report. To fix the issue, you need to add the xPlore Admin website to the list of IE’s trusted sites, and then set the security level to medium-low:

Navigate to the xPlore admin website.

In IE, choose Tools -> Internet Options -> Security tab.

Click on Trusted sites and click the ‘Sites’ button to add the xPlore admin website to the list.

Below, set the security level to Medium-low.

After restarting IE, all reports should work as you would expect.

Overall, we feel that xPlore is a great upgrade and much better than the FAST search engine. If you haven’t upgraded, the process is fairly straightforward. Hopefully the lessons learned above will help you in your upgrade. If you have executed the upgrade in your environment, please comment below regarding your lessons learned!

Comments

Agreed, the behavior of ‘contains’ on metadata is not optimal and might confuse users, especially when searching for a reference, a part number or an identifier. As you noticed, the behavior of ‘starts with’ and ‘ends with’ in the compatible mode improved in the latest DFC. We plan to make things simpler to understand in the next version of xPlore.
Note that soon we will release on the EDN a custom indexing annotator that would help normalize and simplify searches for ID/references like PX-SOP-1234. Stay tuned.

Final comment. The admin interface does work and is certified with Firefox – I’m using it.

We installed xPlore a few months ago. We were having an issue with FAST just shutting off once a week. xPlore has had no issues with staying up and is faster than FAST, but it is also on a new, faster server. We have a lot of hyphens and underscores in batch numbers and the like. From the documentation I thought the default would be these are treated as white space but they were in the indexserverconfig.xml out of the box; no complaint, it just seemed like a discrepancy between the documentation and the install. One problem we did have was that some of our documents, though English, were getting indexed as “it” Italian, “pt” ? EMC advised us to set index-default-locale to en in the indexserverconfig.xml. Most of our properties are alphanumeric; when I put an English word in, xPlore would index the document as English. Overall it was an easy install and is working well.

Did you restart the main xPlore server and any secondary instances? The index agents don’t really do anything with search, so restarting those won’t make a difference. If restarting xPlore doesn’t do the trick, try restarting Webtop – it may cache the setting somehow, but I’m not sure.

The solution I reached for this scenario, where documents are not found when the value of one of their attributes looks like ‘PX-SOP-1234’, is to enforce English as the default language. This can be done by editing indexserverconfig.xml. The document is then indexed as English only when CPS cannot recognize the language.
