
On a project I’m currently working on, we needed to be able to quickly remove potentially many documents from the FAST Search index. Unfortunately, the FAST web administration only allows you to delete one document at a time, which was not suitable for our scenario. We had a couple of ideas for tackling the problem, one of which was to use the FAST Content API. Although we didn’t end up using this technique on the project, I still believe that the Content API combined with PowerShell is a very useful and powerful combination. So today, I spent a little time working on a PowerShell cmdlet that can remove many items from the FAST index.

Visual Studio 2010 Project Setup

The first thing to do is to create a Class Library project in Visual Studio and add a reference to the Esp-Contentapi.dll from the ESP SDK. You’ll also want to add a reference to both System.Management.Automation.dll (found in C:\Windows\assembly\GAC_MSIL\System.Management.Automation\1.0.0.0__31bf3856ad364e35) and System.Configuration.Install.dll (in C:\Windows\Microsoft.NET\Framework\v2.0.50727).

After adding the three DLLs, add two class files to the project: a PowerShell snap-in class and a class for the cmdlet. In my project, the snap-in class is PointBridge.FAST.Cmdlets.PointBridgeFASTSnapIn and the cmdlet class is PointBridge.FAST.Cmdlets.Content.RemoveContentItem. The code and an explanation of these classes follow.

PointBridge.FAST.Cmdlets.PointBridgeFASTSnapIn

This class derives from PSSnapIn and is used to register all the cmdlets in the assembly. When deriving from PSSnapIn, you need to override the following three properties: Name, Description, Vendor.

The class is also decorated with the RunInstaller attribute so that the assembly can be installed using installutil.exe.

    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Text;
    using System.ComponentModel;
    using System.Configuration.Install;
    using System.Management.Automation;

    namespace PointBridge.FAST.Cmdlets
    {
        [RunInstaller(true)]
        public class PointBridgeFASTSnapIn : PSSnapIn
        {
            public override string Name
            {
                get { return "PointBridgeFASTSnapIn"; }
            }

            public override string Description
            {
                get { return "Various cmdlets to help with FAST management."; }
            }

            public override string Vendor
            {
                get { return "PointBridge"; }
            }
        }
    }

PointBridge.FAST.Cmdlets.Content.RemoveContentItem

This class, which derives from Cmdlet, is the main class that handles the processing. When building cmdlets, you decorate the class with a Cmdlet attribute, which indicates the verb-noun pair used to invoke your cmdlet. In this instance, because of this attribute, my cmdlet is invoked as ‘Remove-ContentItem’ from the shell.

The RemoveContentItem class has three PowerShell parameters:

ContentID – the ID of the content to delete from the FAST index.

Collection – the name of the FAST collection that contains the item.

ContentDistributor – the server/port of the FAST ContentDistributor.

In the BeginProcessing() method (overridden from the Cmdlet base class), I set up an instance of an IDocumentFeeder object to be used later, when processing each record. The IDocumentFeeder is an interface that allows you to work with a FAST ESP collection for adding/removing/updating documents within that collection. You can get an instance of an IDocumentFeeder by calling the static CreateDocumentFeeder method of the Com.FastSearch.Esp.Content.Factory class.

In the ProcessRecord() method, I call the RemoveDocument() method of the IDocumentFeeder object to queue up the removal of the content item. The ProcessRecord() method is called for each ContentID passed into the cmdlet from the pipeline.

Finally, in the EndProcessing() method, I take care of reporting and cleanup. The call to IDocumentFeeder.WaitForCompletion() makes sure that all submitted deletes are complete (successfully or not) before we continue. After the deletes have been processed, I use the IDocumentFeederStatus object returned from IDocumentFeeder.GetStatusReport() to build up a report of the deletes that failed or executed with warnings.
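The begin/process/end lifecycle described above can be sketched independently of PowerShell and the FAST Content API. The Python below is only an illustration of the pattern (queue each delete as records arrive, then wait and report once at the end). FakeFeeder, RemoveContentItemSketch and run_pipeline are hypothetical names invented for this sketch; they do not reproduce the IDocumentFeeder interface or the actual cmdlet code.

```python
from dataclasses import dataclass, field


@dataclass
class FakeFeeder:
    """Hypothetical stand-in for a document feeder; NOT the FAST Content API."""
    known_ids: set
    queued: list = field(default_factory=list)

    def remove_document(self, content_id):
        # Only queue the request; nothing is processed until wait_for_completion().
        self.queued.append(content_id)

    def wait_for_completion(self):
        # Process every queued delete and return the IDs that could not be found.
        failed = [cid for cid in self.queued if cid not in self.known_ids]
        self.known_ids.difference_update(self.queued)
        self.queued.clear()
        return failed


class RemoveContentItemSketch:
    """Mirrors the three cmdlet phases: begin, per-record processing, end."""

    def __init__(self, feeder):
        self.feeder = feeder
        self.submitted = 0

    def begin_processing(self):
        # Runs once, before any pipeline input arrives.
        self.submitted = 0

    def process_record(self, content_id):
        # Runs once per item received from the pipeline.
        self.feeder.remove_document(content_id)
        self.submitted += 1

    def end_processing(self):
        # Runs once, after the last item: wait for the batch, then report.
        failed = self.feeder.wait_for_completion()
        return {"submitted": self.submitted, "failed": failed}


def run_pipeline(cmdlet, content_ids):
    cmdlet.begin_processing()
    for content_id in content_ids:
        cmdlet.process_record(content_id)
    return cmdlet.end_processing()


feeder = FakeFeeder(known_ids={"doc-1", "doc-2"})
report = run_pipeline(RemoveContentItemSketch(feeder), ["doc-1", "doc-2", "doc-404"])
print(report)
```

Creating the feeder once and waiting once at the end is what makes the batch removal cheap: each record only adds to the queue.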

Line 1 just adds the snap-in for use in my current session. Line 2 sets up an array of the content IDs I want to delete from my collection. This array (or set of records to process) can be read from a file, a database, or anywhere else; here, I just set it up directly as an example. The third ID in the example is a fake ID that doesn’t actually exist in my collection. Lastly, I pipe my content ID array to my Remove-ContentItem cmdlet and send the results to an output file (pushing to an output file isn’t necessary, but I prefer it to having everything dumped on the screen).

The nice thing about wrapping this up in a cmdlet is that I can reuse it in my PowerShell scripts to easily remove any unwanted content from my collections.

So here is the shameless plug - If you want a copy of the Visual Studio solution, use the Tweet link below to tweet this post. Then send me an email (btubalinal@pointbridge.com) and I’ll send you a copy of the solution.

For a custom application we’re building, we decided to use SSL to communicate between our .NET application and the FAST query server. Setting up SSL to enable secure communication between a .NET application and a FAST query server is fairly simple, but the FAST documentation doesn’t explain it in a straightforward manner. These are the steps I took to enable and use SSL.

The first step is to request a new certificate. You can use the openssl executable found in c:\esp\bin to do this. Open a command prompt and change the working directory to c:\esp\bin. Enter the following command:

The above command creates a new certificate and should be fairly easy to understand. The request file will be saved as client.pem and the certificate generated (later) will be valid for 365 days. The configuration file specified by the -config parameter defines the identification fields that you will be prompted for and that will be included in the certificate request. When you run the command, you will be asked to enter and verify a passphrase. You will then be asked for the fields specified in the config file (note that you can leave the ‘extra’ attributes blank). The command also outputs the private key to a file called privkey.pem. The exact output of the command is below:

This exports the PKCS12 data to a file called client.p12. It takes the client.pem file (which is the certificate) and the privkey.pem file (which is the private key) and merges them into the client.p12 file. The client.p12 file needs to be imported into your personal certificate store on whatever machine you’ll use to access the FAST query server. So copy the .p12 file to your machine and double-click it to start the certificate import wizard. The wizard will ask you for the passphrase you gave when creating the original client.pem certificate. The rest of the wizard is straightforward.

The last file we need to generate is a .cer file. This file is what is required to be used by your .NET application to issue the queries to the query server. Here is the command to create the .cer file:

openssl x509 -in client.pem -out client.cer -outform DER

This command converts the client.pem certificate (which is in PEM format) to DER format and saves it to the client.cer file. Copy this .cer file to a location on your machine that your app can access.
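To make the PEM-to-DER step less opaque: a PEM certificate is just the DER bytes, base64-encoded and wrapped in BEGIN/END CERTIFICATE lines. The Python sketch below illustrates that relationship with the standard library's ssl helpers. It is not part of the FAST tooling, the function name pem_file_to_der_file is made up for this example, and the payload is placeholder bytes rather than a real certificate. It also assumes the PEM file begins with the BEGIN CERTIFICATE line.

```python
import ssl
import tempfile
from pathlib import Path


def pem_file_to_der_file(pem_path, der_path):
    """Convert a PEM certificate file to DER; returns the number of bytes written."""
    pem_text = Path(pem_path).read_text(encoding="ascii")
    # The stdlib helper expects the text to begin with the BEGIN CERTIFICATE line.
    der_bytes = ssl.PEM_cert_to_DER_cert(pem_text)
    return Path(der_path).write_bytes(der_bytes)


# Placeholder payload (NOT a real certificate), used only to show that PEM is
# base64 text wrapped around the same bytes that a DER file stores directly.
placeholder_der = b"placeholder bytes standing in for a DER certificate"
pem_text = ssl.DER_cert_to_PEM_cert(placeholder_der)

with tempfile.TemporaryDirectory() as workdir:
    pem_path = Path(workdir) / "client.pem"
    der_path = Path(workdir) / "client.cer"
    pem_path.write_text(pem_text, encoding="ascii")
    bytes_written = pem_file_to_der_file(pem_path, der_path)
    converted = der_path.read_bytes()

print(pem_text.splitlines()[0])
print(converted == placeholder_der)
```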

Most of the code should be easy enough to understand. The cert file is used in the call that gets the view from the HttpSearchEngine instance. The FAST ESP API takes care of reading that certificate file and using it to communicate securely with the FAST query server.

With the vast amounts of information available, it can be a challenge for information workers to find, in a timely manner, the information they need to perform their daily tasks and make decisions. Information can reside in their company’s intranet portal, in various databases, or completely outside of their organization. The time it takes to search each content source individually is time wasted. With search platforms like FAST ESP and SharePoint 2010, content that resides in these different data sources can be indexed and made available for searching. You can even create a federated search experience. Federated search is essentially the ability to issue a search query across multiple search engines. For example, I can create a federated search in SharePoint so that whenever a user issues a search, it is executed not only with SharePoint’s search engine but with Bing Search as well, and the search results from each engine are displayed to the user.

Even though you can query different search engines, the results from each engine are not combined into a single result set. On a search results page, each result set would typically have its own area where the results are displayed. For instance, the search results from SharePoint might be displayed in the center area of the page and the search results from Bing might be displayed on the right-hand side of the page.

But what if you wanted to build a search experience in which all the results are displayed in a single result set (i.e., intermingled with each other)? Along with other capabilities, FAST Search for Internet Sites (FSIS) can be used to build this type of search experience. As an example, I will show how to build an FSIS federated search that executes the user query against FAST ESP, Bing, and YouTube.

FSIS System Overview

Before building the search experience, it is probably best to get a brief overview of the different components that comprise the FSIS system. FSIS is a platform built on top of FAST ESP (Enterprise Search Platform) for creating search-driven experiences and applications and consists of the following parts:

Content Transformation Services (CTS) – CTS extends ESP’s content processing capabilities and sits between ESP and the content sources. In the typical processing workflow, content is processed by CTS first before being processed by ESP. It is not necessary for content to be processed by CTS; content can be fed directly into the ESP content processing pipeline.

Interaction Management Services (IMS) – Whereas CTS is an extension of ESP’s content processing, IMS extends ESP’s query and results processing capabilities. IMS sits between ESP and your application. In the typical query flow, the query is sent from the application to IMS, which processes it and transmits it to the query server. After the query server retrieves the results, IMS can process them further before delivering them to the application.

FAST Search Designer for Visual Studio – The search designer is an extension for Visual Studio that is used to graphically model the processing jobs (called flows) that CTS and IMS run.

IMS UI Toolkit – The toolkit is a set of UI components that can be used to build the search front-ends. The UI components are wired to work with IMS flows and can be used as-is or extended to meet custom needs. The UI components are built using Microsoft AJAX libraries, although some of the components have been packaged as ASP.NET controls.

Search Business Manager – The Search Business Manager is a web-based application that allows you both to manage the IMS and CTS system (e.g., configuring hosts and nodes and managing IMS/CTS flows) and to manage the wiring between the application front-end and the IMS flows in IMS-driven search applications.

Figure 1 is an example architecture diagram of a typical FSIS-based search application and shows the different paths content can flow from the source to the application. For example, content that needs to be pre-processed before being sent to FAST ESP (where it would be processed further) would go through the CTS module. Content that doesn’t need any pre-processing can be directly sent to FAST ESP. Likewise, IMS can send queries to FAST ESP or bypass ESP completely and send the query directly to the content source (another search engine).

In the example federated search experience we will build, we will use IMS to go to both FAST ESP and to Bing and YouTube directly.

Building the IMS Flow

To build the search experience, the first thing we need to do is build the IMS flow. CTS and IMS flows are essentially workflows that define how content or queries and results will be processed. The content or query/results pass through a set of operators, each of which performs some task before handing over to the next operator for further processing. Eventually, processing completes and the content or query/results are handed off to the consumer.

Flows are built using the FAST Search Designer for Visual Studio extension. To get started, create a new Visual Studio Class Library project (I called my project MyFlows). In Solution Explorer, right-click on the project and select Add –> Flow. In the Add New Item dialog, choose the Interaction Management Subflow item template (under the FSIS category) and name the flow FederatedFlow.flow. When the new flow is added to the project, the designer canvas should be nothing more than a blank page. We need to drop operators from the Interaction Management section in the Visual Studio Toolbox to our canvas to define our flow.

All IMS flows must start with a Flow Input operator and end with a Flow Output operator. IMS flows all work on what is known as the context object. This is a special object that encapsulates the query, the results, and other information about the current context (e.g., user information). The context object passes from the UI layer to IMS, where it is processed by an IMS flow, and is then returned to the UI layer, where the various components in the UI toolkit use the information inside it. The Flow Input operator specifies where the context object enters the flow and the Flow Output operator specifies where the context object exits the flow, after it’s been processed by the other operators.
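As a mental model only, a flow can be pictured as a chain of functions that each receive the context object and pass it on. The Python sketch below is not FSIS code. The Context class, the operator functions and run_flow are all hypothetical names chosen for illustration, and real flows are modeled graphically in the designer rather than written like this.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Context:
    """Illustrative stand-in for the IMS context object (query, results, user info)."""
    query: str
    results: List[dict] = field(default_factory=list)
    user: str = "anonymous"


Operator = Callable[[Context], Context]


def flow_input(ctx: Context) -> Context:
    # Entry point: the context enters the flow unchanged.
    return ctx


def lowercase_query(ctx: Context) -> Context:
    # A query-side operator: normalizes the query before lookup.
    ctx.query = ctx.query.lower()
    return ctx


def fake_lookup(ctx: Context) -> Context:
    # A lookup operator: fills in results (fabricated here for the demo).
    ctx.results = [{"title": f"Result for {ctx.query}"}]
    return ctx


def flow_output(ctx: Context) -> Context:
    # Exit point: the processed context leaves the flow.
    return ctx


def run_flow(operators: List[Operator], ctx: Context) -> Context:
    # Hand the same context from one operator to the next, in order.
    for operator in operators:
        ctx = operator(ctx)
    return ctx


flow = [flow_input, lowercase_query, fake_lookup, flow_output]
final = run_flow(flow, Context(query="FAST Search"))
print(final.results)
```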

Drag a Flow Input operator from the toolbox to the canvas. Double-click the operator to bring up the Properties dialog box. The most important part of this operator is defining the context object. Under the Schema section, you should see an empty table. Add a line in this table with a name of ContextObject and select the ContextObject type from the dropdown under the Type column. When finished, the Schema should look like this:

Next, drag a Flow Output operator to the canvas. There is nothing really to change with the Flow Output operator so there’s no need to bring up its Properties dialog.

The next thing you’ll want to add is a Pre Search Result Mixer operator. This operator defines the type of mixing you want the results to have in a federated search. There are five mixing options:

Round Robin – results from each system are mixed in turn.

Random – randomizes the results from each system.

Relevancy – sorted by the relevancy scores returned from each system.

Sort By – sorts by the values in a specified field.

Stacker – displays N number of results from each system in a stack.
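To make the Round Robin option concrete, here is a small Python sketch of taking one result from each source in turn, in a fixed source order. It only illustrates the idea and makes no claim about how the IMS mixer is implemented. The function name round_robin_mix and the sample result strings are invented for this example, and skipping a source once it runs out of results is an assumption.

```python
from itertools import zip_longest

_MISSING = object()  # marks "this source has run out of results"


def round_robin_mix(results_by_source, source_order):
    """Interleave results: first hit of each source, then second hit of each, etc."""
    columns = [results_by_source.get(name, []) for name in source_order]
    mixed = []
    for row in zip_longest(*columns, fillvalue=_MISSING):
        # Assumption: sources that are exhausted are simply skipped.
        mixed.extend(item for item in row if item is not _MISSING)
    return mixed


results = {
    "esp1": ["esp-a", "esp-b", "esp-c"],
    "bing": ["bing-a", "bing-b"],
    "youtube": ["yt-a"],
}
mixed = round_robin_mix(results, ["esp1", "bing", "youtube"])
print(mixed)
```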

After you’ve got the Pre Search Result Mixer operator on the canvas, select the Connection item from the toolbox and drag a connection from the Flow Input operator to the mixer operator. Double-click the mixer operator to bring up the operator’s properties. Notice under the Schema section how the ContextObject that was defined in the Flow Input operator is already added. As I mentioned earlier, the context object is what is passed between the IMS flow’s operators.

For this example, we will use Round Robin mixing, so select RoundRobin for the Mixer Type. When using this type of mixing, we need to specify the order in which we want the results from the various systems to be displayed. For this example, I want the ordering to be a FAST ESP result, a Bing result, and finally a YouTube result. I specify this in the Source Order property. Note that the names you specify here are arbitrary but need to match the names you give each source with the Result Source Setter operator (which we will cover later). All other properties for this operator can be left at their defaults. When completed, the properties should look like this:

Next drag an ESP Lookup operator to the canvas and create a connection from the Pre Search Result Mixer to this operator. The ESP lookup operator sends the query to FAST ESP and retrieves the results from ESP. To configure this operator, you need to specify the ESP QR server and the search view that IMS will send the query to. Everything else can remain the same (in fact, don’t change anything else but the QR server and the Search view properties).

Now add a Result Source Setter operator to the flow and connect it to the ESP lookup operator. This operator is used to name the source where the result is coming from. We will need to add one of these operators for each source system we have in our flow (in this example, we have three: ESP, Bing, and YouTube). For this operator instance, specify the source name in the Properties dialog to whatever value you specified in the Source order for the Pre Search Result Mixer. For this example, I named the ESP source as ‘esp1’.

Now we want to add operators to handle Bing searching. Add the following operators to the flow and connect them in this order: IMS to Bing Search Adapter, Bing Lookup, Bing to IMS Results Adapter, and another Result Source Setter. You also want to connect the Pre Search Result Mixer to the IMS to Bing Search Adapter. When all of these operators are added to the flow, this part of your flow should resemble this:

The IMS to Bing Search Adapter operator transforms the search request from the internal search query format to a query format that the Bing search engine understands. There is nothing you need to change with this operator.

The Bing Lookup operator actually executes the query and returns the results back to IMS. In order to use the Bing service for your custom applications, you will need to create an App ID using the ‘Create an App ID’ link found here: http://www.bing.com/developers. Once you’ve got a Bing App ID, set the App ID property of the Bing Lookup operator to this ID.

The Bing to IMS Results Adapter operator transforms the results from Bing to the common IMS search results format. Here is where you need to map fields from a Bing search result to the fields in the IMS search results format. By default, the following fields are already mapped for you when you drop this operator on the canvas (and we’re not adding any more mappings in this scenario):

Bing Search Results Field     IMS Search Results Field
Title                         title
Description                   teaser
Url                           url
DisplayUrl                    displayUrl
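Conceptually, a results adapter of this kind just renames fields from the source's vocabulary to the common one. The Python sketch below illustrates that using the default Bing mappings listed above. The function names, the sample result, and the idea that the source name ends up in a hitSource field on each result are assumptions made for the illustration, not a description of the operators' internals.

```python
# Default mappings from the table above: Bing field -> IMS field.
BING_TO_IMS = {
    "Title": "title",
    "Description": "teaser",
    "Url": "url",
    "DisplayUrl": "displayUrl",
}


def adapt_result(raw, mapping):
    """Keep only the mapped fields, renamed to the common result format."""
    return {ims_name: raw[src_name] for src_name, ims_name in mapping.items() if src_name in raw}


def tag_source(result, source_name):
    """Assumed behaviour of a source setter: label the result with its source name."""
    return {**result, "hitSource": source_name}


raw_bing_hit = {
    "Title": "FAST ESP overview",
    "Description": "An enterprise search platform.",
    "Url": "http://example.com/fast",
    "DisplayUrl": "example.com/fast",
    "DateTime": "2010-01-01",  # unmapped field, dropped by the adapter
}
ims_hit = tag_source(adapt_result(raw_bing_hit, BING_TO_IMS), "bing")
print(ims_hit)
```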

Finally, set the Result Source Setter for Bing to the same name you gave in the Pre Search Result Mixer (‘bing’, in this example).

To add YouTube federated searching, follow the same pattern that was used for Bing, but instead of the Bing-related operators, use the OpenSearch-related operators. After you drop these operators on the canvas and connect them, this part of the flow should look like this:

For the Open Search Lookup operator, configure the URL template as: http://gdata.youtube.com/feeds/api/videos?&q={searchTerms?}&max-results={count?}&start-index={startIndex?}&v=2&alt=rss&fields=item(title,author,atom:link). The URL template specifies the search engine-specific URL that is used to execute the search, and the parameters enclosed in curly braces are dynamically replaced by the IMS engine at query time. In the template specified, the parameters will be replaced at runtime with the following:

URL Parameter      Description
{searchTerms?}     The keywords entered by the user.
{count?}           The number of results to retrieve from the search engine.
{startIndex?}      The starting offset within the entire result set to begin the retrieval from.

For instance, if I have 100 results and I specify the count to be 20 and the start index to be 21, that means I want records 21-40 from the search engine.
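The substitution and the paging arithmetic can be sketched in a few lines of Python. This is an illustration of OpenSearch-style template expansion, not the IMS implementation. The function names are made up for the example, and treating a parameter with a trailing ? as optional (replaced by an empty string when no value is supplied) follows the OpenSearch convention rather than anything stated about IMS.

```python
import re
from urllib.parse import quote_plus

TEMPLATE = (
    "http://gdata.youtube.com/feeds/api/videos?&q={searchTerms?}"
    "&max-results={count?}&start-index={startIndex?}"
    "&v=2&alt=rss&fields=item(title,author,atom:link)"
)


def expand_template(template, values):
    """Replace each {name} or {name?} placeholder with its URL-encoded value."""
    def substitute(match):
        name, optional = match.group(1), match.group(2) == "?"
        if name in values:
            return quote_plus(str(values[name]))
        if optional:
            return ""  # optional parameter without a value
        raise KeyError(name)
    return re.sub(r"\{(\w+)(\??)\}", substitute, template)


def start_index_for_page(page, count):
    """1-based start index of a 1-based page number."""
    return (page - 1) * count + 1


count = 20
start = start_index_for_page(2, count)   # second page of 20
last = start + count - 1                 # last record on that page
url = expand_template(TEMPLATE, {"searchTerms": "fast search", "count": count, "startIndex": start})
print(start, last)
print(url)
```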

To explain a little more about the URL template for YouTube specifically: I found that I needed to request RSS as the data format (using the alt parameter) and to name the exact fields to return (&fields=item(title,author,atom:link)). If I allowed the YouTube data service to return all of the fields, my flow would always error out because the OpenSearch to IMS Results Adapter couldn’t handle the structure of the XML being returned from the service. See the following resource for more information about the YouTube data service: http://code.google.com/apis/youtube/2.0/developers_guide_protocol.html.

For the OpenSearch to IMS Results Adapter, I specified the following mappings:

YouTube Results Field     IMS Results Field
title                     title
author                    teaser
link                      url

Lastly, you’ll want to set the Result Source Setter to the same name you specified in the Pre Search Result Mixer for YouTube results (set to ‘youtube’ in this example).

Now that we’ve specified the query and results processing for each of the federated search systems, we need to add operators to our flow that will merge the results and mix them together. The two operators we want to add are the Search Result Merger and the Search Result Mixer. Drop one of each to the canvas and connect each of the Result Source Setter operators to the Search Result Merger. Then connect the merger operator to the Search Result Mixer operator and finally, connect the mixer to the Flow Output operator. The good news is that there’s nothing that you need to change with any of these operators’ configurations. So the complete flow should look like:

MyFlows.FederatedFlow

After you’ve designed the flow, go ahead and build the project. Building the project will deploy the flow to the FSIS server where it can then be used by your web applications.

Building the Web Application

To build the web application, create a new web application project in Visual Studio. A good starting point to use for any IMS-based application is the sample application that’s part of the IMS UI Toolkit (located at C:\Program Files\FAST IMS UI Toolkit). From this sample application, you’ll want to copy over to your app the web.config, and the Components, images and styles folders. You also want to copy and add references to all of the Microsoft.Ceres.*.dll assemblies from the sample application’s bin folder to your own application’s bin folder.

First, open up the web.config and under the AdminNodes element, make sure you have a reference to your FSIS admin server and port. Everything else can be removed.

Under the flowAliases element, you’ll want to add an alias for the flow we just created. The key is the name I want to use to refer to my flow and the value is the full name of my flow. In the appSettings section, there is a setting for the default flow to use. This specifies the default flow that will be used by the IMS search toolkit if a specific flow isn’t specified via code. You can choose to specify the flow we created as the default but in this example, I will specify the flow to use in code.

The relevant sections of my web.config now look like this:

Web.config

    <appSettings>
      <add key="defaultFlow" value="esp"/>
      <add key="requireFlowAlias" value="false"/>
      <add key="preprocessingFlow" value="Microsoft.DefaultPreprocessing"/>
      <add key="debugFlow" value="false"/>
      <!-- add key="defaultNodeSet" value="NodeSet1"/-->
      <!-- add key="prefetchNodeSets" value="NodeSet2;NodeSet3"/-->
    </appSettings>
    <AdminNodes>
      <Node Host="pbfast01" Port="17004"/>
    </AdminNodes>
    <flowAliases>
      <add key="esp" value="MyFlows.ESPLookupFlow"/>
      <add key="federated" value="MyFlows.FederatedFlow"/>
    </flowAliases>
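The relationship between the default flow setting and the flow aliases can be pictured as a simple lookup. The Python sketch below uses the values from the configuration above. The resolve_flow function is hypothetical, and falling back to treating an unknown name as a full flow name is an assumption; the toolkit's actual resolution rules (including what requireFlowAlias does) are not covered here.

```python
# Values copied from the web.config sections shown above.
FLOW_ALIASES = {
    "esp": "MyFlows.ESPLookupFlow",
    "federated": "MyFlows.FederatedFlow",
}
DEFAULT_FLOW = "esp"


def resolve_flow(requested=None, aliases=FLOW_ALIASES, default_alias=DEFAULT_FLOW):
    """Return the full flow name for an alias, using the default when none is given."""
    key = requested if requested else default_alias
    # Assumption: a name that is not a known alias is passed through unchanged.
    return aliases.get(key, key)


print(resolve_flow())             # no flow specified in code: the default applies
print(resolve_flow("federated"))  # flow chosen explicitly in code
```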

After you’ve modified the web.config, using the Visual Studio Add New Item dialog, add a new Generic Handler item to the website root folder and call it Search.ashx. Delete everything that’s pre-created in this file and instead add the following:

This handler is basically a facade that is used by the different IMS UI components to send the query to the IMS system and work with the results that are returned from IMS.

Next, you’ll want to add a web page to the application. On this web page, we are going to add different IMS components that will comprise our search application. The Components folder that we copied over from the sample application to our new application contains all of the UI components that are part of the IMS UI Toolkit. All of these components are implemented using Microsoft AJAX and ASP.NET client templates. There are many components in the toolkit but for this example, we’ll only be working with the SearchForm, NavigationBar and the HitList components. Note also that for some of these components, an equivalent ASP.NET server control exists but not for all of the components.

Below is what the ASPX page should look like (I will go over the more important pieces of the ASPX page):

As previously mentioned, the UI components use ASP.NET client templating capabilities to render the output. Client templates are defined using a combination of HTML elements and CSS styles along with placeholders where the data will go when the records are bound to the template. Each component you use should have a client template. It is easiest to copy the out-of-the-box client templates that come with the toolkit and modify them to suit your needs. The templates for this application are defined in the external files ‘/Components/Hitlist/MyHitList.html’ and ‘/Components/NavigationBar/MyNumberedPagedNavigationBar.html’. To use these templates, we use the #include directive on lines 85 and 86.

When the page is first rendered, however, you need to make sure the HTML and CSS that are part of the templates are not displayed, since they’ve not been filled with data yet. That is what the Common.html file in line #7 does: it sets the visibility of all elements marked with the class ‘sys-template’ (you must mark your templates with this class) to hidden.

Lines #9-22 add the various scripts and stylesheets needed by the IMS UI components on the page. Lines 9-14 must always be included. Each component in the UI toolkit has its own script file. You only need to add the scripts for the components you’re using on the page. In this case, I’ve added the component scripts for the Search Form, the Navigation Bar, and the Hit List.

Lines #26-41 are where I define the layout of the page. There are different divs on the page that act as placeholders for my various components. Lines #29-30 define my search box and search button, line #34 is for my navigation bar, and line #36 is for my hit list, where the search results will be placed.

Lines #44-83 contain the JavaScript needed to make the page functional. In the pageLoad() function, I create an instance of an ISearchExecutor object through the SearchExecutorFactory (passing in the name of the handler that we created earlier). I then set the executor to use the federated flow when processing the query (line 51).

In lines #54-61, we configure the search textbox and search button as part of the search form. First, at #54, we define the text box as a search input (you can have multiple inputs for a search form). Then, in lines #57-61, we create a SearchBuilder object and point its searchInput property to the textbox and its submitButton to the search button. When a search term is entered into the textbox and the search button is pressed, this SearchBuilder takes care of building the query based on the inputs and firing off the query using the executor.

Line #64 is where we create an Ims.HitList object and tie it to the hitlist div element on the ASPX page.

Lines #67-77 create an Ims.NavigationBar object and tie it to the navbar div element on the page. For the NavigationBar, we specify which HitList object it works with, how many pages to show at a time, and the element IDs of the elements in the MyNumberedPagedNavigationBar.html template.

There’s really not too much to discuss with the template files. In the hitlist template, I check the hitSource of each item and create the right elements and styles for each type of result (esp, bing, or youtube). Notice that the fields of a result item are available via the fields object (e.g., fields.teaser, fields.url, etc.). I also created a JavaScript function that creates the object element necessary to display the videos from YouTube. The object element is built dynamically because, if you added the object tag directly like the other elements in the template, Internet Explorer would try to load it even when no video has been specified for it yet (in other words, it would try to load something for the object element when the page is first loaded, before any search takes place).

The Final Product

Once the IMS flow and the application are built, this is the final result (you can also view a video demo of the application here):