Friday, May 20, 2011

Improve SharePoint Search Relevance

Introduction:

I recently developed a SharePoint 2010 solution that includes an advanced search web part. It allows users to perform enterprise searches and view the returned results in a graphically rich representation.

As an architect I want to ensure that the search results that are returned to the user match what the user wanted to find and that the results that are returned on the first page are the most relevant, so the user does not have to look through several pages of results to find the best matches for their search. This is called Search Relevancy.

It is very important to realize the difference between Sorting and Ranking. In my own words, I would describe Sorting as the process of arranging objects according to one specific attribute of the item. An example would be the books in a library.

Ranking is where an item takes precedence over other items based on a combination of attributes. Examples of this would be how tools in a workshop are arranged, how equipment is arranged in an emergency room, how individuals are ranked based on the role they play in the military, or even how food items are arranged in the supermarket. One realizes from these examples that there is no single property that determines the ranking of such items, and that Ranking is based on the importance an item has in a given situation.
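
To make the distinction concrete, here is a minimal Python sketch. The `Book` fields, the sample data and the 0.7/0.3 weights are all made up for illustration; they have nothing to do with SharePoint's own formula. Sorting orders the items by one attribute, while ranking orders them by a score that combines several attributes.

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    loans: int          # hypothetical: how often the book was borrowed
    title_match: float  # hypothetical: how well the title matches a query (0..1)


books = [
    Book("Demo Planning", 12, 1.0),
    Book("Annual Report", 40, 0.0),
    Book("Product Demo Notes", 3, 0.6),
]

# Sorting: arrange the items by a single attribute (here, the title).
by_title = sorted(books, key=lambda b: b.title)


def score(book):
    """Ranking score: a weighted combination of several attributes (weights are arbitrary)."""
    return 0.7 * book.title_match + 0.3 * (book.loans / 40)


# Ranking: arrange the items by the combined score, highest first.
by_rank = sorted(books, key=score, reverse=True)

print([b.title for b in by_title])
print([b.title for b in by_rank])
```

Changing the weights changes the ranked order but never the sorted order, which is exactly the lever a custom ranking model gives you.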

The Role of the SharePoint Architect:

Architects and developers place a lot of focus during development on ensuring that the search engine works well. Performance and accuracy of search results are normally the main focus points.

Providing a comprehensive search solution requires that the architect considers the search experience from the end-user point of view.

The end-user is not really concerned about what happens on the server and will most likely not appreciate the effort which goes into building robust search engines.

Management spend up to 27% of their time looking for information, and when they use the Search functionality in your SharePoint solution only one question will be asked: “Did I find what I was searching for on the first page of the results?” If the item the user searched for is not among the first 10 search results, then the user will most likely believe the item does not exist in the system.

The SharePoint Search Ranking Model:

Relevance is about how closely the search results that are returned to the user match what the user wanted to find.

SharePoint uses a formula to determine which items are most relevant, but end-users have their own view on what should be considered the most relevant items. Even different users of the same solution might have different opinions about this. (To be honest, I can relate to this: at home I sometimes arrange our DVD collection in a particular order, and another family member would prefer a completely different order.)

As an example when the user performs a search using the keyword ‘Demo’ the SP Search Engine returns the items in the following order:

[Image: search results for the keyword ‘Demo’ ranked by the default ranking model]

SharePoint evaluates all properties, and even the contents, of the documents returned in the search results and uses a complex algorithm to determine which items are most relevant and which are least relevant.

Users are not interested in algorithms and might not even consider the fact that the search engine also takes into account less obvious factors like “Click Distance”, “Document Location (URL Depth)”, “Document Popularity”, “Language Detection”, “File Type”, etc. The algorithm is very complex, and it is unfortunate that very few end users appreciate how clever the search engine is.

Some users might feel that the search results should be ranked in a way so that the Title of the item is considered the most important determining factor.

In this example we executed the same search we tested earlier, but because a different ranking preference was specified, two of the result items were allocated a lower rank, and items whose Title field more closely matches the search criteria moved up in the ranking.

[Image: the same search results ranked with the Title field as the most important factor]

There might even be a requirement to rank the search results based on a custom field like ‘Client Name’. In the results of the same search below, you can see that the item where the search criteria were not found in the ‘Client Name’ field moved down in the ranking order, and items that contain the search criteria in the ‘Client Name’ field moved up.

[Image: the same search results ranked by the custom ‘Client Name’ field]

SharePoint Enterprise Search includes a ranking engine developed in collaboration with Microsoft Research. It is specifically tuned for the unique requirements of searching enterprise content.

The great news is that it is possible for IT Professionals to customize the way SharePoint ranks search results. This can be done by creating a custom model (ranking model) and instructing SharePoint to use the custom model in a particular area of the solution, or even by setting the new ranking model as the default for SharePoint Search.

This article will provide you with the necessary background on SharePoint Ranking Models and will guide you through the process of creating and implementing your own custom ranking models.


Let’s Get Started!!

Structure

As a developer I used to hate this part of an article… the mumbo-jumbo theory stuff which can be very boring, but believe me, if you really want to become one of the best SharePoint professionals you have to consider the ground rules, even if they are not always applicable to the immediate problem you are trying to solve. So here we go….

Ranking models are based on an XML schema, which contains a unique identifier, a name, a description and the specifics of the components used in the formula that calculates the numeric scores indicating which search results are more relevant than others. (For those who are interested in the formula, please research the BM25 ranking function.)

The formula to calculate relevance uses two areas of ranking called Static and Dynamic ranking.

Dynamic Ranking (query-dependent ranking) is where the property values or content of an item affect the ranking score. As an example, the ‘Title’ field can be evaluated against the search criteria: the more important we consider the field to be, the higher the ranking score will be for items where the value in the ‘Title’ field more closely matches the search criteria.

Static Ranking (query-independent ranking) is where the content or property values of an item do not determine the ranking of the item.

Dynamic Ranking contains the following components:

  • Property Weighting assigns a weight to a managed property so that matches in that property count more heavily in the relevance calculation. Configure this setting to a value between 1 and 75.
  • Property Length Normalization - because properties of an item vary in length, their values cannot all be treated equally, so the rank of a content item is adjusted based on the length of the property and the length normalization setting. This only applies to properties that contain text. The range of possible values for this setting is 0 to 1. For long-text managed properties, you usually want to set this to a value near 0.7, which is the approximate setting for the body property. For properties that contain a small amount of text, use a value of approximately 0.5.
  • Title Extraction - only performed on Microsoft Office files. In scenarios where the ‘Title’ field of an Office file does not accurately reflect the contents of the item (for example, when the title of a file is ‘Presentation 1’ or ‘Document 1’), Enterprise Search detects another candidate for the title within the body of the content item, and includes this value with the actual title when calculating relevance. (I think this is really cool!)
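
To show how property weighting and length normalization interact, here is a rough Python sketch of the textbook BM25F-style calculation. This is an illustration only: I am assuming the general BM25F form, and the function names, the `k1` value and the sample numbers are mine, not SharePoint's actual implementation.

```python
def weighted_tf(fields):
    """Combine the term frequency of one query term across several properties.

    fields: list of (term_frequency, length, avg_length, weight, b) tuples,
    where b is the length normalization setting (0 = ignore length, 1 = full).
    """
    total = 0.0
    for tf, length, avg_length, weight, b in fields:
        # Properties longer than average are penalised, shorter ones boosted.
        norm = (1.0 - b) + b * (length / avg_length)
        total += weight * tf / norm
    return total


def term_score(fields, idf=1.0, k1=1.2):
    """Saturating score for one query term (textbook BM25 shape)."""
    tf = weighted_tf(fields)
    return idf * tf / (k1 + tf)


# Document A: the term appears once in a short Title (weight 25, b = 0.5).
doc_a = [(1, 3, 5, 25.0, 0.5)]
# Document B: the term appears once in a long body (weight 1, b = 0.7).
doc_b = [(1, 500, 400, 1.0, 0.7)]

print(term_score(doc_a), term_score(doc_b))
```

With these made-up numbers the Title match scores well above the body match, which is the effect you are after when you give a property a high weight.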

Static Ranking contains the following components:

  • Click Distance is the number of links between a content item and an "expert" page linking to the content item. The more links that the crawler must travel from an authoritative page to the content item, the lower the relevance score. If there are multiple paths to a content item, relevance is calculated based on the shortest path.
  • URL Depth refers to how many levels deep within a site the content item is found. The level is determined by reviewing the number of slash ("/") characters in the URL; the greater the number of slash characters in the URL path, the deeper the URL is for that content item. A large URL depth number can lower the relevance of that content.
  • Automatic Language Detection determines the user's language from the "Accept-Language" headers sent by the browser they are using. Content that is retrieved in the user's language is considered more relevant than content in other languages, with the exception of English language content.
  • File Type Biasing - in most search scenarios, certain file types are more relevant than others. For example, HTML pages and Word documents are usually more relevant to a user's search than an Excel spreadsheet or a plain text file.
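
The first two static components are easy to picture in code. The Python sketch below is my own illustration of the two ideas (counting slashes in the URL path, and finding the shortest link path from an authoritative page with a breadth-first search). The function names and the sample link graph are hypothetical, and exactly how SharePoint counts slashes internally is an assumption on my part.

```python
from collections import deque
from urllib.parse import urlparse


def url_depth(url):
    """Count the slash characters in the path part of the URL."""
    return urlparse(url).path.count("/")


def click_distance(links, start, target):
    """Smallest number of links to follow from start to reach target.

    links maps each page to the pages it links to. Returns None when the
    target cannot be reached from the start page.
    """
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        page, distance = queue.popleft()
        if page == target:
            return distance
        for linked in links.get(page, []):
            if linked not in seen:
                seen.add(linked)
                queue.append((linked, distance + 1))
    return None


# Hypothetical link graph with "home" as the authoritative page.
links = {
    "home": ["hr", "sales"],
    "hr": ["policy"],
    "sales": ["policy", "demo"],
    "policy": ["demo"],
}

print(url_depth("http://intranet/sites/demo/docs/file.docx"))
print(click_distance(links, "home", "demo"))
```

There are two paths from "home" to "demo" in this graph; the breadth-first search picks the shorter one, matching the "shortest path" rule described above.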

Tutorial (…the exciting part)

What you need:

  • SharePoint Central Administration
  • SharePoint Site with a document library
  • 5 Sample documents (no other content in site)
  • A Search Scope to the particular SharePoint site which you want to search in.
  • An understanding of managed metadata properties (only if you want to search on custom site columns).
  • PowerShell
  • Utility or source code to test our ranking model

Test Harness:

Before we build a new custom ranking model let us first develop a small utility to help us test our ranking model. (You can also export the standard SharePoint Core Search Results Web Part and modify the xml to specify the ranking model to use, but in most cases I prefer to use a test harness instead of making changes to SharePoint until I have finished quality assurance of my custom code).

Create a new Visual Studio console application and add the following code to your Program.cs file:

using System;
using Microsoft.SharePoint;
using Microsoft.Office.Server.Search.Query;

namespace SP2010SearchSample
{
    class Program
    {
        static void Main(string[] args)
        {
            // Replace with your own site URL. The using block disposes the SPSite.
            using (SPSite objSite = new SPSite("http://ltp-21:14111"))
            {
                ResultType resultType = ResultType.RelevantResults;

                // VERY IMPORTANT TO USE DEFAULTPROPERTIES instead of *
                // Replace 'SPSearchDemoSearchScope' with your Search Scope name.
                // Note: no comma after the last column, and the string is
                // concatenated because a C# string literal cannot span lines.
                string strQuery =
                    "SELECT RANK, Filename, Title FROM SCOPE() " +
                    "WHERE FREETEXT(DefaultProperties, '*demo*') " +
                    "AND \"SCOPE\" = 'SPSearchDemoSearchScope' " +
                    "ORDER BY \"Rank\" DESC";

                FullTextSqlQuery fullTextSqlQuery = new FullTextSqlQuery(objSite);
                fullTextSqlQuery.QueryText = strQuery;
                fullTextSqlQuery.ResultTypes = resultType;

                // This is where we will later specify the custom ranking model to use.
                //fullTextSqlQuery.RankingModelId = "";

                ResultTableCollection resultTableCollection = fullTextSqlQuery.Execute();
                ResultTable resultTable = resultTableCollection[resultType];
                while (resultTable.Read())
                {
                    // Concatenate the values directly so a null Title does not throw.
                    Console.WriteLine("Rank:" + resultTable["RANK"] + " Title:" + resultTable["TITLE"]);
                }
            }
        }
    }
}

Please note the following:

  • Your namespace will be different.
  • Replace http://ltp-21:14111 with your site URL
  • Replace 'SPSearchDemoSearchScope' with your Search Scope name.
  • I included a placeholder (//fullTextSqlQuery.RankingModelId = "";) to later specify our custom ranking model to use.

 

Compile the application and run it. If there are no problems you should see the search results returned and ranked using the default SharePoint Ranking model.

You can also refer to: Best Practices: Writing SQL Syntax Queries for Relevant Results in Enterprise Search

[Image: test harness output ranked by the default ranking model]

Now that we have a test harness for testing different ranking models, we can proceed to build and implement a new custom ranking model.

We need the following values in order to construct our XML:

| Item | Description | How to find value |
|---|---|---|
| Name | The name of your custom ranking model | Any value you want to use. |
| Id | The unique identifier for the ranking model | Use the GUIDGen tool to create a new GUID. |
| Description | A description of your ranking model | Any text you want to use. |
| One or more queryDependentFeatures | | |
| pid | The property ID of a managed property in the search schema. | Export all the PIDs to a text file with the PowerShell command below. |
| name | The name of a managed property | Export all the property names to a text file with the same PowerShell command. |
| weight | The weight setting for a managed property. | Any value between 0 and infinity. If the value is 0 the property is ignored. Normally the value is between 1 and 75. |

Run the following PowerShell command to export the managed properties (names and PIDs) to a text file:

Get-SPEnterpriseSearchServiceApplication | Get-SPEnterpriseSearchMetadataManagedProperty >> C:\PID.txt

The xml schema for a ranking model is structured in the following way:

<rankingModel name="string" id="GUID" description="string" xmlns="http://schemas.microsoft.com/office/2009/rankingModel">
  <queryDependentFeatures>
    <queryDependentFeature pid="PID" name="string" weight="weightValue" lengthNormalization="lengthNormalizationSetting" />
  </queryDependentFeatures>
  <queryIndependentFeatures>
    <categoryFeature pid="PID" default="defaultValue" name="string">
      <category value="categoryValue" name="string" weight="weightValue" />
    </categoryFeature>
    <languageFeature pid="PID" name="string" default="defaultValue" weight="weightValue" />
    <queryIndependentFeature pid="PID" name="string" default="defaultValue" weight="weightValue">
      <transformRational k="value" />
      <transformInvRational k="value" />
      <transformLinear max="maxValue" />
    </queryIndependentFeature>
  </queryIndependentFeatures>
</rankingModel>

The following link describes the different elements in the XML: http://msdn.microsoft.com/en-us/library/ee558793.aspx

So, in order to create a new ranking model, open an XML editor and use the following example:

<?xml version="1.0" encoding="utf-8"?>
<rankingModel
    name="DemoRankingModel"
    id="302A9E0E-F8B9-4b21-8180-C327ECCBBA94"
    description="Demo Custom Ranking Model"
    xmlns="http://schemas.microsoft.com/office/2009/rankingModel">
  <queryDependentFeatures>
    <queryDependentFeature pid="56" name="Filename" weight="75" lengthNormalization="10" />
    <queryDependentFeature pid="2" name="Title" weight="25" lengthNormalization="10" />
  </queryDependentFeatures>
</rankingModel>
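
If you would rather generate this XML than type it by hand, a small script can help avoid typos in element names and quotes. The following Python sketch is just one way to do it, using only the standard library; the `build_ranking_model` helper is my own invention and not part of any SharePoint tooling. It reproduces the example model above from a list of (pid, name, weight, lengthNormalization) values.

```python
import uuid
import xml.etree.ElementTree as ET

NS = "http://schemas.microsoft.com/office/2009/rankingModel"


def build_ranking_model(name, description, features, model_id=None):
    """Return ranking model XML as a string.

    features: list of (pid, managed_property_name, weight, length_normalization).
    A new GUID is generated when model_id is not supplied.
    """
    # Emit the namespace as a default xmlns instead of an ns0: prefix.
    ET.register_namespace("", NS)
    root = ET.Element("{%s}rankingModel" % NS, {
        "name": name,
        "id": model_id or str(uuid.uuid4()).upper(),
        "description": description,
    })
    group = ET.SubElement(root, "{%s}queryDependentFeatures" % NS)
    for pid, prop, weight, length_norm in features:
        ET.SubElement(group, "{%s}queryDependentFeature" % NS, {
            "pid": str(pid),
            "name": prop,
            "weight": str(weight),
            "lengthNormalization": str(length_norm),
        })
    return ET.tostring(root, encoding="unicode")


xml_text = build_ranking_model(
    "DemoRankingModel",
    "Demo Custom Ranking Model",
    [(56, "Filename", 75, 10), (2, "Title", 25, 10)],
    model_id="302A9E0E-F8B9-4b21-8180-C327ECCBBA94",
)
print(xml_text)
```

The printed string uses only double quotes, so it can be pasted between single quotes in PowerShell.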

Next, run the following PowerShell command to upload your new custom ranking model:

Get-SPEnterpriseSearchServiceApplication | New-SPEnterpriseSearchRankingModel -RankingModelXML '{your xml pasted as a string}'

Using the new Ranking Model:
There are two ways of using the custom ranking model
1. Search logic implemented through the object model (shown in this tutorial)
2. Extending the Core Results Web Part
http://msdn.microsoft.com/en-us/sp2010devtrainingcourse_extendingsharepointsearchlab_topic4#_Toc280092619

You can use the test harness described in the beginning of this tutorial to test your new ranking model. To do that, uncomment the following line and supply the GUID of your new ranking model:

//This is where we will later specify the custom ranking model to use.
fullTextSqlQuery.RankingModelId = "302A9E0E-F8B9-4b21-8180-C327ECCBBA94";
//remember to replace the ID value with the GUID of your ranking model

You will notice a change in the way your test harness returns the results.

The following diagram shows how your code and the SharePoint Foundation will use your custom Ranking Model to provide different results:

[Image: diagram of how your code and SharePoint use the custom ranking model]

Additional PowerShell Commands:

List all of the Managed Properties in SharePoint Search:
Get-SPEnterpriseSearchServiceApplication | Get-SPEnterpriseSearchMetadataManagedProperty >> C:\PID.txt

Add a new ranking model to SharePoint:
Get-SPEnterpriseSearchServiceApplication | New-SPEnterpriseSearchRankingModel -RankingModelXML '{Ranking model XML PASTED AS A STRING}'

List the ranking models:
Get-SPEnterpriseSearchServiceApplication | Get-SPEnterpriseSearchRankingModel

Delete a ranking model:
Remove-SPEnterpriseSearchRankingModel -Identity 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' -SearchApplication (Get-SPEnterpriseSearchServiceApplication)

 

I hope that you will make full use of SharePoint capabilities like custom ranking models to unlock your full potential in providing world-class SharePoint solutions.

Enjoy !

11 comments:

Veronique Palmer said...

Admittedly I didn't understand half of it, but LOVE the post! I am sending it to all my clients right now. It explains perfectly in techie terms what I have been trying to explain for a long time.

Paul Culmsee said...

This is a well rounded, fantastic post and was great learning for me.

SharePoint Engine said...

I like this concept. I visited your blog for the first time and just been your fan. Keep posting as I am gonna come to read it everyday!!
Sharepoint Staffing

Dave said...

Great article! Can this only be accomplished in Enterprise Search?

Rafferty Uy said...

Hi!

I followed your steps and I noticed that I still get hits from properties which used to be in the original DEFAULTPROPERTIES, but is no longer defined in my custom ranking model. Is this expected?

Seluc said...

Amazing post!!!! Thank you soo much

Birthday Gift service said...

I suggest this site to my friends so it could be useful & informative for them also Great effort.

Perry White said...

Use the list of scopes beside the Search box to adjust the range of your search. Narrowing the scope of a search lets you focus on likely sources for the information that you need.

Gerald Kandulu said...

Nice post, can this approach work as well on SharePoint foundation and could you be interested to assist. We may talk further via Email: geraldkandulu@gmail.com

Tal said...

A colleague and I are going to try this out soon. This post is really great and I understand SharePoint search a little bit better now!

Thanks!

Anonymous said...

Great Post, but whenever I try New-SPEnterpriseSearchRankingModel I get this error:

New-SPEnterpriseSearchRankingModel : The 'http://schemas.microsoft.com/office/2
009/rankingModel:RankingModel' element is not declared.
At line:1 char:78
+ Get-SPEnterpriseSearchServiceApplication | New-SPEnterpriseSearchRankingModel
<<<<
+ CategoryInfo : InvalidData: (Microsoft.Offic...rchRankingModel:
NewSearchRankingModel) [New-SPEnterpriseSearchRankingModel], XmlSchemaVali
dationException
+ FullyQualifiedErrorId : Microsoft.Office.Server.Search.Cmdlet.NewSearchR
ankingModel

Any ideas ?
