Question

Is it possible to get the list of Full Text Search noise/stop words from SQL Server 2005 by querying the database?

I am aware that the noise words are in a text file ~/FTData/noiseEng.txt but this file is not accessible to our application.

I've looked at the sys.fulltext_* tables but these don't seem to have the words.

Solution

It appears that this is not possible in SQL Server 2005, but it is possible in SQL Server 2008.

Advanced Queries for Using SQL Server 2008 Full Text Search StopWords / StopLists

This next query gets a list of all of the stopwords that ship with SQL Server 2008. This is a nice improvement; you cannot do this in SQL Server 2005.
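
The query from the quoted article is not reproduced here. As a minimal sketch, assuming the sys.fulltext_system_stopwords catalog view that SQL Server 2008 exposes and assuming LCID 1033 for English, it might look something like this:

```sql
-- Sketch only: list the system stopwords for one language (SQL Server 2008 or later).
-- 1033 is assumed here to be the LCID for English; check sys.fulltext_languages
-- to confirm the value for the language you need.
SELECT ssw.stopword,
       ssw.language_id
FROM sys.fulltext_system_stopwords AS ssw
WHERE ssw.language_id = 1033
ORDER BY ssw.stopword;

-- Stopwords in user-defined stoplists are exposed separately.
SELECT sw.stoplist_id,
       sw.stopword,
       sw.language_id
FROM sys.fulltext_stopwords AS sw;
```

Neither view exists in SQL Server 2005, so this only helps after an upgrade.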

Stopwords and Stoplists - SQL Server 2008

SQL Server 2005 noise words have been replaced by stopwords. When a database is upgraded to SQL Server 2008 from a previous release, the noise-word files are no longer used in SQL Server 2008. However, the noise-word files are stored in the FTDATA\FTNoiseThesaurusBak folder, and you can use them later when updating or building the corresponding SQL Server 2008 stoplists. For information about upgrading noise-word files to stoplists, see Full-Text Search Upgrade.

OTHER TIPS

I just copy the noise words file from \Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData into my app, and use it to strip noise words.

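    ' Requires: Imports System.Linq and Imports System.Text.RegularExpressions
    Public Function StripNoiseWords(ByVal s As String) As String
        ' ReadFile is the application's own helper; it returns the file contents as a string.
        Dim NoiseWords As String = ReadFile("/Standard/Core/Config/noiseENU.txt").Trim
        ' Escape each entry so characters such as "$" are matched literally.
        Dim Escaped As String() = Regex.Split(NoiseWords, "\s+").Select(Function(w) Regex.Escape(w)).ToArray()
        ' Build one alternation: about|after|all|also etc.
        Dim NoiseWordsRegex As String = String.Format("\s?\b(?:{0})\b\s?", String.Join("|", Escaped))
        Dim Result As String = Regex.Replace(s, NoiseWordsRegex, " ", RegexOptions.IgnoreCase) ' replace each noise word with a space
        Result = Regex.Replace(Result, "\s+", " ") ' eliminate any multiple spaces
        Return Result
    End Function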
Licensed under: CC-BY-SA with attribution