Question

I am using full text searching on a SQL Server database to return results from multiple tables. The simplest situation would be searching a persons name fields and a description field. The code I use to do this looks like:

select t.ProjectID as ProjectID, sum(t.rnk) as weightRank
from
(
 select KEY_TBL.RANK * 1.0 as rnk, FT_TBL.ProjectID as ProjectID
 FROM Projects as FT_TBL 
 INNER JOIN FREETEXTTABLE(Projects, Description,  @SearchText) AS KEY_TBL 
   ON FT_TBL.ProjectID=KEY_TBL.[KEY]

     union all

 select KEY_TBL.RANK * 50 as rnk, FT_TBL.ProjectID as ProjectID
 FROM Projects as FT_TBL 
 ... <-- complex unimportant join
     INNER JOIN People as p on pp.PersonID = p.PersonID
 INNER JOIN FREETEXTTABLE(People, (FirstName, LastName), @SearchText) AS KEY_TBL 
    ON p.PersonID=KEY_TBL.[KEY]
 )
group by ProjectID

As is (hopefully) clear above, I am trying to weight heavily on matches of a person's name over matches in a project description field. If I do a search for something like 'john' all projects with a person named john on it will be heavily weighted (as expected). The issue I am having is on searches where someone provides a full name like 'john smith'. In this case the match is much less strong on name as (I presume) only half the search terms are matching in each of the firstname / lastname columns. In many cases this means someone with an exact match of the name entered will not necessarily be returned near the top of the search results.

I have been able to correct this by searching each of the firstname / lastname fields separately and adding their scores together so my new query looks like:

select t.ProjectID as ProjectID, sum(t.rnk) as weightRank
from
(
 select KEY_TBL.RANK * 1.0 as rnk, FT_TBL.ProjectID as ProjectID
 FROM Projects as FT_TBL 
 INNER JOIN FREETEXTTABLE(Projects, Description,  @SearchText) AS KEY_TBL 
   ON FT_TBL.ProjectID=KEY_TBL.[KEY]

     union all

 select KEY_TBL.RANK * 50 as rnk, FT_TBL.ProjectID as ProjectID
 FROM Projects as FT_TBL 
 ... <-- complex unimportant join
     INNER JOIN People as p on pp.PersonID = p.PersonID
 INNER JOIN FREETEXTTABLE(People, (FirstName), @SearchText) AS KEY_TBL 
    ON p.PersonID=KEY_TBL.[KEY]

     union all

 select KEY_TBL.RANK * 50 as rnk, FT_TBL.ProjectID as ProjectID
 FROM Projects as FT_TBL 
 ... <-- complex unimportant join
     INNER JOIN People as p on pp.PersonID = p.PersonID
 INNER JOIN FREETEXTTABLE(People, (LastName), @SearchText) AS KEY_TBL 
    ON p.PersonID=KEY_TBL.[KEY]
 )
group by ProjectID

My question:

Is this the approach I should be taking, or is there some way to have the full text searching operate on a list of columns as though it were a blob of text: i.e. treat the firstname and lastname columns as a single name column, resulting in a higher scoring match for strings including both the persons first and last name?

Was it helpful?

Solution

I have recently run into this and have used a computed column to concatenate the required columns together into one string and then have the full text index on that column.

I have achieved the weighting by duplicating the weighted fields in the computed column.

i.e. last name appears 3 times and first name once.

ALTER TABLE dbo.person ADD
PrimarySearchColumn AS 
COALESCE(NULLIF(forename,'') + ' ' + forename + ' ', '') +
COALESCE(NULLIF(surname,'')  + ' ' + surname  + ' ' + surname  + ' ', '') PERSISTED

You must make sure you use the persisted keyword so that the column isnt computed on each read.

OTHER TIPS

I know this is an old question but I've come across the same issue and solved it a different way.

Rather than add computed columns to the original tables, which may not always be an option, I have created indexed views which contain the combined fields. To use the original example:

CREATE VIEW [dbo].[v_PeopleFullName]
WITH SCHEMABINDING 
AS SELECT dbo.People.PersonID, ISNULL(dbo.People.FirstName + ' ', '') + dbo.People.LastName AS FullName
FROM dbo.People
GO

CREATE UNIQUE CLUSTERED INDEX UQ_v_PeopleFullName
ON dbo.[v_PeopleFullName] ([PersonID])
GO

Then I join that view in my query, along with the existing full-text predicate on the individual columns in the base table, so that I can find exact matches and partial matches in the individual columns, like so:

DECLARE @SearchText NVARCHAR(100) = ' "' + @OriginalSearchText + '" ' --For matching exact phrase
DECLARE @SearchTextWords NVARCHAR(100) = ' "' + REPLACE(@OriginalSearchText, ' ', '" OR "') + '" ' --For matching on words in phrase

SELECT FT_TBL.ProjectID as ProjectID, 
    ISNULL(KEY_TBL.[Rank], 0) + ISNULL(KEY_VIEW.[Rank], 0) AS [Rank]
FROM Projects as FT_TBL 
INNER JOIN People as p on FT_TBL.PersonID = p.PersonID
LEFT OUTER JOIN CONTAINSTABLE(People, (FirstName, LastName), @SearchTextWords) AS KEY_TBL ON p.PersonID = KEY_TBL.[KEY] INNER JOIN
LEFT OUTER JOIN CONTAINSTABLE(v_PeopleFullName, FullName, @SearchText) AS KEY_VIEW ON p.PersonID = KEY_VIEW.[Key]
WHERE ISNULL(KEY_TBL.[Rank], 0) + ISNULL(KEY_VIEW.[Rank], 0) > 0
ORDER BY [Rank] DESC

Some notes on this:

  • I'm using CONTAINSTABLE rather than FREETEXTTABLE as it seems more appropriate to me for searching names. I'm not interested in finding words with similar meaning or inflections of words when it's names that I'm searching on.
  • Because I'm using CONTAINSTABLE I'm having to do some pre-processing on the @SearchText variable to make it compatible and to break it down into individual words with the OR operator for searching on the base table's full-text index.
  • Rather than using a UNION query to join separate queries each using a single, joined CONTAINSTABLE I'm joining on both CONTAINSTABLE predicates in the same query. This means using outer joins rather than inner joins, so I'm then using a WHERE clause to exclude any records from the base table which don't match on either full-text index. I confess that I haven't made any examination of how this performs compared to separate queries each with a single full-text index predicate UNIONised to produce a single result set.
  • Although there's no guarantee that the Rank of matches on the full search text in the indexed view will be higher than that of matches on individual words in the full-text index on the base table's columns because the Rank value is arbitrary, my testing so far has shown that in practice it always is (so far!).
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top