Question

We have a table with a Body (NVARCHAR(MAX)) column that contains text from emails and files. The column is full-text indexed.

Some of the documents contain reference numbers such as 00123. However, the full-text engine seems to strip leading zeros, so when we search using CONTAINS(Body, '00123') the results also include false positives that contain just 123.

Is there any way to fix this? Ideally we could address this in the query, but we would also consider other options, such as alternative word breakers.

We are using SQL Server 2008 R2 and later.


Solution

According to SQL Server 2012's "Behavior Changes to Full-Text Search" page, the previous version of the word breakers, when given the term 022, produced 022 and nn022, but the new version produces 022 and nn22. So SQL Server 2008 R2 will produce the desired result when searching for numbers with leading zeros, but SQL Server 2012 will not, because under the new word breakers 00123 and 123 share the normalized token nn123. (This assumes that the full-text indexed columns use English as their word-breaking language.)
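To make the difference concrete, here is a toy model in Python. It is not the real word breaker: it only mimics the two-token output quoted above (022 giving 022 plus nn022 in the old version, 022 plus nn22 in the new one), and it assumes, as a simplification, that a query hits a row whenever the query and the indexed term share any token. The function names are hypothetical.

```python
# Toy model, NOT the real SQL Server word breaker. It reproduces only the
# token pairs described on the "Behavior Changes to Full-Text Search" page.

def old_breaker(term: str) -> set[str]:
    """Pre-2012 behavior: the normalized 'nn' token keeps leading zeros."""
    return {term, "nn" + term}


def new_breaker(term: str) -> set[str]:
    """2012+ behavior: leading zeros are dropped from the 'nn' token."""
    stripped = term.lstrip("0") or "0"  # keep a single zero for e.g. "000"
    return {term, "nn" + stripped}


def matches(breaker, query: str, document_term: str) -> bool:
    """Simplifying assumption: a hit means at least one shared token."""
    return bool(breaker(query) & breaker(document_term))


for name, breaker in [("old", old_breaker), ("new", new_breaker)]:
    print(name, sorted(breaker("00123")), matches(breaker, "00123", "123"))
# old ['00123', 'nn00123'] False
# new ['00123', 'nn123'] True
```

In this model the false positive comes entirely from the shared nn123 token, which matches the symptom described in the question.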

There are a couple of ways to achieve the desired outcome on SQL Server 2012: you can either revert to the previous word breakers or, if you have a limited number of terms that you are looking for, use a custom dictionary.
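The custom-dictionary option can be sketched with the same kind of toy model. Again this is an illustration under assumptions, not SQL Server's implementation: terms listed in a hypothetical CUSTOM_TERMS set are emitted unchanged ("as-is"), while every other term still goes through the 2012-style normalization.

```python
# Toy model, NOT the real word breaker. CUSTOM_TERMS stands in for the
# contents of a custom dictionary file (hypothetical example data).
CUSTOM_TERMS = frozenset({"00123"})


def break_term(term: str, custom_terms: frozenset[str] = CUSTOM_TERMS) -> set[str]:
    if term in custom_terms:
        return {term}  # indexed as-is: no normalized "nn" token
    stripped = term.lstrip("0") or "0"  # 2012-style: leading zeros dropped
    return {term, "nn" + stripped}


def matches(query: str, document_term: str) -> bool:
    # Simplification: a hit means the query and document share a token.
    return bool(break_term(query) & break_term(document_term))


print(matches("00123", "123"))    # False: no shared token any more
print(matches("00123", "00123"))  # True
print(matches("0042", "42"))      # True: 0042 is not in the dictionary
```

The last line shows why this option suits a limited set of terms: only the numbers you list are protected, and every other zero-padded number still collapses onto its unpadded form.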

Custom dictionaries are described in "Creating Custom Dictionaries for special terms to be indexed 'as-is' in SQL Server 2008 Full-Text Indexes" and "Customize the Behavior of Word Breakers with a Custom Dictionary". Note: the first article says that the language hex code for English is 1033, but 1033 is the (decimal) LCID for English (United States). The language hex code for English is 0009, so for an English dictionary the file name should be Custom0009.lex.
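The arithmetic behind that note is easy to check. 1033 in hex is 0x0409, and the low 10 bits of that value (the primary language ID) are 0x009. The helper below that turns this into a file name is hypothetical: it reproduces the English case from the answer, but other languages may be named differently, so check the table in the documentation before relying on it.

```python
def primary_language_id(lcid: int) -> int:
    """Low 10 bits of the language ID contained in an LCID."""
    return lcid & 0x3FF


def custom_dictionary_name(lcid: int) -> str:
    """Hypothetical helper: Custom<4 hex digits>.lex built from the primary
    language ID. Verified here only for English; not a general rule."""
    return f"Custom{primary_language_id(lcid):04x}.lex"


print(f"{1033:#06x}")                # 0x0409
print(custom_dictionary_name(1033))  # Custom0009.lex (English, United States)
print(custom_dictionary_name(2057))  # Custom0009.lex (English, United Kingdom)
```

Because both English LCIDs share primary language ID 0x009, they end up with the same file name in this sketch.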

Licensed under: CC-BY-SA with attribution