Query SQL Server 2005 Full Text Search noise/stop words
13-09-2019
Question
Is it possible to get the list of Full Text Search noise/stop words from SQL Server 2005 by querying the database?
I am aware that the noise words are in a text file ~/FTData/noiseEng.txt but this file is not accessible to our application.
I've looked at the sys.fulltext_* catalog views, but they don't seem to contain the words.
Solution
It appears that this is not possible in SQL Server 2005, but it is in SQL Server 2008.
Advanced Queries for Using SQL Server 2008 Full Text Search StopWords / StopLists
This next query gets a list of all of the stopwords that ship with SQL Server 2008. This is a nice improvement; you cannot do this in SQL Server 2005.
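The excerpt above does not reproduce the query itself. A minimal sketch of what such a query might look like, assuming SQL Server 2008 or later and the sys.fulltext_system_stopwords catalog view. The language filter (1033, the LCID for US English) is an assumption chosen for illustration:

```sql
-- List the system stopwords for one language (SQL Server 2008 and later).
-- 1033 is the LCID for US English; change or drop the filter for other languages.
SELECT stopword, language_id
FROM sys.fulltext_system_stopwords
WHERE language_id = 1033
ORDER BY stopword;
```

This view does not exist in SQL Server 2005, so the query only helps once the database runs on 2008 or later.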
Stopwords and Stoplists - SQL Server 2008
SQL Server 2005 noise words have been replaced by stopwords. When a database is upgraded to SQL Server 2008 from a previous release, the noise-word files are no longer used in SQL Server 2008. However, the noise-word files are stored in the FTDATA\FTNoiseThesaurusBak folder, and you can use them later when updating or building the corresponding SQL Server 2008 stoplists. For information about upgrading noise-word files to stoplists, see Full-Text Search Upgrade.
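As a rough illustration of building a stoplist in SQL Server 2008, the sketch below creates one from the system stoplist, adds a word, and lists its contents. The stoplist name AppStoplist and the added word are hypothetical, and the statements assume a database compatibility level of 100 or higher:

```sql
-- Hypothetical stoplist name; copies the system stoplist as a starting point.
CREATE FULLTEXT STOPLIST AppStoplist FROM SYSTEM STOPLIST;

-- Add a custom stopword for US English (LCID 1033); the word is only an example.
ALTER FULLTEXT STOPLIST AppStoplist ADD 'widget' LANGUAGE 1033;

-- List the US English stopwords held in the user-defined stoplist.
SELECT sl.name AS stoplist_name, sw.stopword, sw.language
FROM sys.fulltext_stoplists AS sl
JOIN sys.fulltext_stopwords AS sw ON sw.stoplist_id = sl.stoplist_id
WHERE sl.name = 'AppStoplist' AND sw.language_id = 1033
ORDER BY sw.stopword;
```

Words taken from an old noise-word file could be added the same way, one ALTER FULLTEXT STOPLIST statement per word.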
OTHER TIPS
I just copy the noise-word file from \Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\FTData into my app and use it to strip noise words.
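' Requires "Imports System.Text.RegularExpressions" at the top of the file.
Public Function StripNoiseWords(ByVal s As String) As String
    ' ReadFile is the application's own helper; it is expected to return the file contents as a string.
    Dim NoiseWords As String = ReadFile("/Standard/Core/Config/noiseENU.txt").Trim
    ' Escape each entry so any regex metacharacters in the file are matched literally.
    Dim Escaped As String() = Array.ConvertAll(Of String, String)(Regex.Split(NoiseWords, "\s+"), AddressOf Regex.Escape)
    ' Build the alternation: about|after|all|also etc.
    Dim NoiseWordsRegex As String = String.Format("\s?\b(?:{0})\b\s?", String.Join("|", Escaped))
    Dim Result As String = Regex.Replace(s, NoiseWordsRegex, " ", RegexOptions.IgnoreCase) ' replace each noise word with a space
    Result = Regex.Replace(Result, "\s+", " ").Trim ' collapse repeated spaces and trim the ends
    Return Result
End Function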