SharePoint 2010 and FAST Search for SharePoint (FS4SP). The stuff you don't find on TechNet

Thursday, January 12, 2012

FAST Search for SharePoint Relevancy - Part 1

Topic: FAST Search for SharePoint 2010 Relevancy.

Subject: Understanding Relevancy and FS4SP.

Problem: When I perform a search, the items are not returned in the order I expect.

Response: Relevancy is fairly complex, but FS4SP has some very powerful capabilities once you understand how relevancy can be tailored to meet a specific organization's requirements. The rank of any item in the index is made up of 2 distinct parts:

1.Static Rank

2.Dynamic Rank

Total Rank = Static Rank + Dynamic Rank

Relevancy is the order in which items are displayed, based on their individual rank, when queried from the index.
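As a minimal sketch of the formula above, the following Python snippet adds a static and a dynamic part for each item and sorts by the total. The item names and point values are made up for illustration; they are not FS4SP output.

```python
# Hypothetical items: (name, static rank, dynamic rank). Values are illustrative only.
items = [
    ("Item A", 1200, 250),
    ("Item B", 1200, 400),
    ("Item C", 1350, 50),
]

def total_rank(static_rank, dynamic_rank):
    """Total Rank = Static Rank + Dynamic Rank."""
    return static_rank + dynamic_rank

# Relevancy order: highest total rank first.
ranked = sorted(items, key=lambda item: total_rank(item[1], item[2]), reverse=True)

for name, static_rank, dynamic_rank in ranked:
    print(name, total_rank(static_rank, dynamic_rank))
```

Item A and Item B share the same static rank here, so only the dynamic part separates them.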

Static Rank:

The static rank is determined at CRAWL time and will not change unless an item is re-crawled and environmental factors have changed since the last crawl. The static rank is calculated from 4 components.

1.Urldepthrank: Rank points given to boost shorter URLs.

2.Docrank: Rank points given based on the number of and relative importance of links pointing to an item.

3.Siterank: Rank points given based on the number of and relative importance of links pointing to the items on a site.

Dynamic Rank:

The dynamic rank is determined at QUERY time and can change for any item retrieved from the index depending on how the item is retrieved. The majority of dynamic rank is calculated from several components.

Many items within an index will have the same rank. When rank is calculated, many items meet the same criteria and therefore end up with the same rank number.

2.Rank is Dynamic

A single item will not have the same rank for every search performed. Rank is dynamic, meaning an item’s rank will change depending on how the item is retrieved from the index. Example: With OOB settings, if an item is retrieved from the index based on a hit on the title, it will have a higher rank than if it is retrieved by a hit on the body or other managed properties.

The good news is these are all adjustable to meet the needs of different organizations.

In the solution\example section I will focus on #1, Managed Property Context, for a couple of reasons: 1) Brevity. A blog post showing a complete hands-on example of how relevancy works would be extremely long (I will try to follow up on other relevancy topics). 2) There happens to be an OOB issue with one of the Managed Property Context settings.

Solution\Example:

1.Let's take a look at the Managed Property Context and the relationship to rank and relevancy.

2.Open SharePoint Central Administration

a.General Application Settings

b.Search -> Farm Search Administration

c.Select the FAST Query SSA

d.In the left-hand navigation, select FAST Search Administration

e.Select Managed properties

f.Select the first managed property.

i.In my case I select “Account”

g.Scroll to the bottom and select “View Mappings”

3.This is the OOB Full-text Index Map used by the Managed Property Context.

a.As can be seen the map is broken into 7 levels.

b.Any managed property can be added to this map.

c.Dynamic rank points are added based on the search performed.

Example: A search is performed and 2 items are returned from the index. Item A was returned because the search term was found in the title managed property, and Item B was returned because the search term was found in the body managed property. When calculating the total rank, Item A would receive more rank points than Item B based on the Full-text Index Map. Remember that several components make up the total rank, but assuming the two items had identical scores from all other rank components, Item A would outrank Item B in terms of relevancy and therefore be presented first in the search center.

4.Let’s look at how the Managed Property Context calculates Rank and contributes to the total rank.

5.Set up Content Source

a.I have set up a content source which contains 3 documents:

RELEVANCE SAMPLE – ONE.pdf

RELEVANCE SAMPLE – TWO.doc

RELEVANCE SAMPLE – THREE.docx

*** Side Note: For the purpose of this example, make sure the content of each file does not interfere with the search criteria: the terms “LEVELTEST” and “SAMPLE” should not appear anywhere except in the title and metadata.

b.I have made sure that everything about the 3 files is the same (modified dates, etc.) and that they are crawled identically (by putting them in the same content source), so that all relevancy factors are the same except for the Full-Text Index Mapping.

c.I created 3 crawled properties to associate with the Documents when crawling.

i.LEVEL1

ii.LEVEL2

iii.LEVEL3

***Side Note: I am using a custom crawler to crawl my content source but I could have just as easily created a document library with 3 custom fields. If you use a document library you will end up with more data in the index but you should still be able to achieve the same end results.

d.I created 3 Managed properties using the same name as my crawled properties. I have mapped them to the crawled properties and added each of them to the Full-Text Index Map.

***Side Note: If you have not created the crawled properties via PowerShell or another tool, you may need to populate all the custom fields and crawl first before being able to set up the managed property.

New Properties:

New Full-Text index Map:

e.For this pass I have only populated 1 crawled property for each item in my content source.

c.With this search all 3 of my items were returned based on the “Title” property hit. With the freshness date and static rank properties being equal, all 3 items have the same rank of 1991 and therefore the same relevancy. Do not be surprised if your results display in a different order. This is the order in which my items made it into the index: with the same rank, items are ordered by the position in which they went into the index (FIFO).
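The tie behavior described above can be pictured with a stable sort. This is only an analogy in Python, not a description of how FS4SP is implemented: the rank of 1991 comes from the results above, while the order in which the items entered the index is an assumed example.

```python
# Items in the order they entered the index (assumed order, for illustration only).
indexed = [
    ("RELEVANCE SAMPLE - ONE", 1991),
    ("RELEVANCE SAMPLE - TWO", 1991),
    ("RELEVANCE SAMPLE - THREE", 1991),
]

# Python's sort is stable: items with equal rank keep their original (index) order.
results = sorted(indexed, key=lambda item: item[1], reverse=True)

print([title for title, _ in results])
```

Because every rank is equal, the sorted list comes back in the same first-in, first-out order as the input.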

8.Change the Search Term to “LEVELTEST”

a.From what we have talked about in the Full-Text Index Map, the order shouldn’t change. Each item will be retrieved, but from its associated managed property: LEVEL1, LEVEL2, and LEVEL3. We expect LEVEL3 to have a higher priority than LEVEL2, and LEVEL2 higher than LEVEL1.

b.The search results reveal a different story. We should have seen exactly what we expected. We can see that the rank for each individual item has changed based on how it matched the Full-Text Index Map. Seeing the different ranks between the 2 search terms on the same items shows how rank and relevancy are truly dynamic.

9.Next we will take a look at how rank is calculated and why we didn’t get our expected results.

22.It is apparent where the problem is. The Weight on level 1 is set equal to the weight on level 4.

23.Create a new Rank Profile

a.It is always recommended that you create and use a new Rank Profile when making changes. This gives you the ability to compare differences between profiles before implementing any changes in production.

b.From the FAST Command Shell Execute:

New-FASTSearchMetadataRankProfile -name default1

c.Any new Rank Profile will inherit from the default profile unless otherwise specified

i.You can execute the commands from #20 and #21 replacing “default” with “default1”.

ii.The two rank profiles should be identical

24.Update the “default1” profile to set Importance Level1 to an appropriate value

a.I chose to set the Level 1 weight to 5. I could have just as easily changed the levels to 10 through 70, but for this example I want to change as little as possible.
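To see why a level 1 weight equal to the level 4 weight upsets the order, here is a small Python sketch. The weight values are hypothetical placeholders (only the relationships between them matter) and are not the actual FS4SP defaults. The level each document matched on is taken from the rank log for this example.

```python
# Hypothetical importance-level weights; only their relative order matters here.
oob_weights = {1: 40, 2: 20, 3: 30, 4: 40}   # level 1 equal to level 4 (the problem)
fixed_weights = {**oob_weights, 1: 5}        # level 1 lowered to 5, as chosen above

# Importance level each sample document matched on for "LEVELTEST".
hit_level = {
    "RELEVANCE SAMPLE - ONE": 1,
    "RELEVANCE SAMPLE - TWO": 2,
    "RELEVANCE SAMPLE - THREE": 3,
}

def order_by_weight(weights):
    """Return titles sorted by the weight of the level they matched, highest first."""
    return sorted(hit_level, key=lambda title: weights[hit_level[title]], reverse=True)

print(order_by_weight(oob_weights))    # ONE is first: its level 1 weight equals level 4
print(order_by_weight(fixed_weights))  # THREE, TWO, ONE: the expected order
```

With the level 1 weight lowered below level 2, the level 3 hit ranks first and the level 1 hit ranks last.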

a.On the Search Center edit the “Search Action Links” web part and enable the new default rank profile.

26.Change the Sort by to the new “default1” profile and execute the Search “LEVELTEST”

a.Notice that the results are now as expected.

27.Rank Log Results

Hit: 1

Title: RELEVANCE SAMPLE - THREE

Query term: 'leveltest'

Context score.................: 164

Number of hits/score................: 1/20

Importance level/score.............: 3/144

Total Rank score............: 1479

############################

Hit: 2

Title: RELEVANCE SAMPLE - TWO

Query term: 'leveltest'

Context score.................: 92

Number of hits/score................: 1/20

Importance level/score.............: 2/72

Total Rank score............: 1407

############################

Hit: 3

Title: RELEVANCE SAMPLE - ONE

Query term: 'leveltest'

Context score.................: 56

Number of hits/score................: 1/20

Importance level/score.............: 1/36

Total Rank score............: 1327

############################
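Reading the three log entries above, the context score appears to be the sum of the hits score and the importance level score. The following Python check only verifies that arithmetic against the logged numbers. It is an observation from this log, not a statement of the full FS4SP formula.

```python
# (title, hits score, importance level score, logged context score) from the rank log.
log = [
    ("RELEVANCE SAMPLE - THREE", 20, 144, 164),
    ("RELEVANCE SAMPLE - TWO", 20, 72, 92),
    ("RELEVANCE SAMPLE - ONE", 20, 36, 56),
]

for title, hits_score, importance_score, logged_context in log:
    computed = hits_score + importance_score
    # The last column shows whether the computed value matches the logged context score.
    print(title, computed, computed == logged_context)
```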

28.I used a pretty simple example to show how the Managed Property Context map works so I will use another quick example.

29.I populated all three crawled properties with the value of “LEVELTEST” for all three documents and re-crawled.

30.The following are the search results and rank.

31.Note the difference in the Rank Calculation.

a.All Levels where hits matched the Full-Text Index map contribute to the Rank.

b.This will not always be the case. There are some advanced situations where not all levels will be applied, depending on the search term and how many items are in the index. I will leave the advanced settings and the how, when, and why for a follow-up post.

Hit: 1

Title: RELEVANCE SAMPLE - THREE

Query term: 'leveltest'

Context score.................: 278

Number of hits/score................: 3/26

Importance level/score.................: 3, 2, 1/252

Total Rank score............: 1593

############################

Hit: 2

Title: RELEVANCE SAMPLE - TWO

Query term: 'leveltest'

Context score.................: 278

Number of hits/score................: 3/26

Importance level/score.................: 3, 2, 1/252

Total Rank score............: 1593

############################

Hit: 3

Title: RELEVANCE SAMPLE - ONE

Query term: 'leveltest'

Context score.................: 278

Number of hits/score................: 3/26

Importance level/score.................: 3, 2, 1/252

Total Rank score............: 1593

############################
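In this second log, the importance score of 252 matches the sum of the three single-level scores from the earlier log (144, 72 and 36), and the context score again matches hits score plus importance score. A short Python check of that arithmetic, using only numbers from the two logs:

```python
# Importance scores logged earlier for a single hit on each level (levels 3, 2, 1).
single_level_scores = {3: 144, 2: 72, 1: 36}

importance_score = sum(single_level_scores.values())  # all three levels matched
hits_score = 26                                       # from "Number of hits/score: 3/26"
context_score = hits_score + importance_score

print(importance_score, context_score)
```

Both values agree with the "3, 2, 1/252" and "278" entries logged for each of the three hits.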

32.Let’s re-visit some of the weight properties, whether from the content level or the individual importance levels.

FullTextIndexReference : content

ProximityWeight: 140

ContextWeight: 50

a.As you probably noticed, weights do not relate directly to points. The weights within individual dynamic rank calculations express how important they are relative to the other calculations in the overall rank calculation. If we changed the “ContextWeight” from 50 to 100, we would see the rank points produced from the Managed Property Context double, meaning it would become more important in the overall rank calculation.
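As a rough illustration of that last point, the sketch below scales a context score linearly with ContextWeight. The linear relationship is an assumption based on the doubling described above, and the helper name is made up for this example. The value 164 is the context score from the first rank log.

```python
def scaled_context_points(points_at_baseline, new_weight, baseline_weight=50):
    """Scale context points linearly with ContextWeight (assumed relationship)."""
    return points_at_baseline * new_weight / baseline_weight

print(scaled_context_points(164, 50))   # weight unchanged
print(scaled_context_points(164, 100))  # weight doubled, so points double
```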

33.Final Important Note: The Managed Property Context is considered part of the dynamic portion of relevancy, but it does have a static portion to it. The Full-Text Index Mappings are static. If you want to add to or re-arrange the map, you must re-crawl the content for the change to take effect.

Conclusion: The Managed Property Context is one portion of how rank is calculated and relevancy is determined. I never tell people what they should do, but it is pretty obvious that Importance Level 1 is not set correctly OOB. The Managed Property Context can be tailored to help an organization improve relevancy, both in terms of which managed properties are added to the Full-Text Index Map and in terms of how much the Managed Property Context itself and its individual levels should weigh against other relevancy factors. It is extremely difficult to adjust relevancy with 20 million items in the index, but it is possible. If I had tried to look at the examples I provided with 20 million items, I probably would not have noticed the erroneous setting. Fortunately, the Managed Property Context is part of the dynamic ranking calculation and multiple Rank Profiles are available, so trying adjustments and comparing results most of the time does not require re-crawling content.