Question

I've created a stoplist based on the system's list and I set up my fulltext indexes to use it.

If I run select unique_index_id, stoplist_id from sys.fulltext_indexes, I can see that all my indexes use the stoplist with ID 5, which is the one I created.
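A slightly fuller version of that check could also list the stopwords actually stored in the custom stoplist. This is only a sketch: it assumes stoplist ID 5 and LCID 1046 (Brazilian Portuguese) as used elsewhere in this question, and relies on the catalog views sys.fulltext_stoplists and sys.fulltext_stopwords.

```sql
-- Which full-text indexes use which stoplist
-- (ID 5 is assumed to be the custom list from the question).
SELECT OBJECT_NAME(i.object_id) AS table_name,
       i.unique_index_id,
       i.stoplist_id,
       s.name AS stoplist_name
FROM sys.fulltext_indexes AS i
LEFT JOIN sys.fulltext_stoplists AS s
       ON s.stoplist_id = i.stoplist_id;

-- Stopwords stored in that stoplist for LCID 1046.
SELECT stopword, language_id
FROM sys.fulltext_stopwords
WHERE stoplist_id = 5
  AND language_id = 1046
ORDER BY stopword;
```

If 'rua' is missing from the second result set, or appears under a different language_id, that would be worth looking at before anything else.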

When I run the text through the full-text parser (sys.dm_fts_parser), the result is correct. Example:

SELECT special_term, display_term
FROM sys.dm_fts_parser
(' "Rua José do Patrocinio nº125, Vila América, Santo André - SP" ', 1046, 5, 0)

The words that I added to the stoplist are shown as noise words. But for some reason, when I run my query, it also returns the rows containing the stopwords.

For example:

SELECT *
FROM tbEndereco
WHERE CONTAINS (*, '"rua*" or "jose*"')

This returns the row above, as I would expect, since the word 'rua' should be ignored but 'Jose' is a match.

But if I searched:

SELECT *
FROM tbEndereco
WHERE CONTAINS (*, '"rua*"')

I would expect no rows to be found, since 'rua' is set as a stopword.

I'm using Brazilian (Portuguese) as the stoplist language, so the word "Rua" (which means "Street") should be ignored, since I added it to the stoplist. The parser recognizes it as noise, but when I run my query it returns rows that contain "Rua".
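Because stopwords are stored per language, one thing that could be checked here is whether the language of each indexed column matches the language the stopword was added under. A minimal sketch, assuming the table name tbEndereco and the stopword 'rua' from this question:

```sql
-- For each full-text indexed column of tbEndereco, show its language
-- and whether 'rua' exists as a stopword for that same language
-- in the stoplist the index is using.
SELECT COL_NAME(c.object_id, c.column_id) AS column_name,
       c.language_id                      AS column_language_id,
       i.stoplist_id,
       sw.stopword                        AS matching_stopword
FROM sys.fulltext_index_columns AS c
JOIN sys.fulltext_indexes AS i
  ON i.object_id = c.object_id
LEFT JOIN sys.fulltext_stopwords AS sw
  ON sw.stoplist_id = i.stoplist_id
 AND sw.language_id = c.language_id
 AND sw.stopword    = N'rua'
WHERE c.object_id = OBJECT_ID('tbEndereco');
```

A NULL in matching_stopword would mean that no 'rua' entry exists for that column's language in the assigned stoplist. A column indexed under a language other than 1046 would be one possible explanation for the behavior, though not necessarily the one at work here.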

My search is an address search, so it should ignore words such as "Street", "Avenue", etc. (in Portuguese, of course; I added all of them to the stoplist as well).

This is the query that I'm using to look up the tables.

SELECT DISTINCT PES.idPessoa
     , PES.Nome
     , EN.idEndereco
     , EN.idUF
     , CID.Nome AS Cidade
     , EN.Bairro
     , EN.Logradouro
     , EN.Numero
     , EN.Complemento
     , EN.CEP
     , EN.Lat
     , EN.Lng
FROM tbPessoa PES
INNER JOIN tbAdvogado ADV ON PES.idPessoa = ADV.idPessoa
INNER JOIN tbEndereco EN ON PES.idEmpresa = EN.idEmpresa
LEFT JOIN tbCidade CID ON CID.idCidade = EN.idCidade
WHERE ADV.Ativo = 1
AND CONTAINS (EN.*, '"rua*"')
OR EN.idCidade IN (SELECT idCidade
                   FROM tbCidade
                   WHERE CONTAINS (*, '"rua*"'))
OR PES.idPessoa IN (SELECT DISTINCT ADVC.idPessoa
                    FROM tbComarca C
                    INNER JOIN tbAdvogadoComarca ADVC
                            ON ADVC.idComarca = C.idComarca
                    WHERE CONTAINS (Nome, '"rua*"'))
OR PES.idPessoa IN (SELECT OAB.idPessoa
                    FROM tbAdvogadoOAB OAB
                    WHERE CONTAINS (NROAB, '"rua*"'))

I tried both FREETEXT and CONTAINS, including something simpler like WHERE CONTAINS (NROAB, 'rua'), but it also returned the rows containing "Rua".

I thought my query might be the problem, so I tried a simpler one, and it also matched the stopword "Rua".

SELECT *
FROM tbEndereco
WHERE CONTAINS (*, 'rua')
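One way to see whether 'rua' is physically present in the full-text index, rather than only being matched at query time, is to inspect the indexed keywords. The sketch below uses the sys.dm_fts_index_keywords function and assumes it is run in the database that holds tbEndereco:

```sql
-- List indexed terms starting with 'rua' for tbEndereco,
-- with the column they came from and how many rows contain them.
SELECT display_term, column_id, document_count
FROM sys.dm_fts_index_keywords(DB_ID(), OBJECT_ID('tbEndereco'))
WHERE display_term LIKE 'rua%'
ORDER BY display_term;
```

If 'rua' itself appears in this list, the term was indexed despite the stoplist, which may point to the index having been populated before the stopword was added or under a different language.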

One thing I noticed is that the words that come from the system stoplist work just fine. For example, if I search for the word "do" (which means "of"), no rows are returned.

Example:

SELECT *
FROM tbEndereco
WHERE CONTAINS (*, '"do*"')

I also ran the "Start full population" command through SSMS on all tables to check whether that was the problem, but it made no difference.
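For reference, the T-SQL equivalent of that SSMS command, together with a status check, might look like the sketch below. The table name comes from this question; the assumption is that the same statements would be repeated for each full-text indexed table.

```sql
-- Rebuild the full-text index contents for one table.
ALTER FULLTEXT INDEX ON tbEndereco START FULL POPULATION;

-- Check whether the population has finished.
-- 0 means idle; a non-zero value means a population is still
-- running or has been paused.
SELECT OBJECTPROPERTYEX(OBJECT_ID('tbEndereco'),
                        'TableFulltextPopulateStatus') AS populate_status;
```

Querying before populate_status returns to 0 can show results from a partially rebuilt index, so it may be worth waiting for the idle state before testing again.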

What am I missing here? This is the first time I have worked with full-text indexes, and I may be missing a step in setting them up.

Thank you in advance for your support.

Regards,

Cesar.


Answer

You have changed your question so I will change my answer and try to explain it a little better.

According to Stopwords and Stoplists:

A stopword can be a word with meaning in a specific language, or it can be a token that does not have linguistic meaning. For example, in the English language, words such as "a," "and," "is," and "the" are left out of the full-text index since they are known to be useless to a search.

Although it ignores the inclusion of stopwords, the full-text index does take into account their position. For example, consider the phrase, "Instructions are applicable to these Adventure Works Cycles models". The following table depicts the position of the words in the phrase:

Position  Word
1         Instructions
2         are
3         applicable
4         to
5         these
6         Adventure
7         Works
8         Cycles
9         models

I am not sure why, but I think this only applies to a phrase search. For example, if you have a row like this:

Teste anything casa

And you query the fulltext as:

SELECT *
FROM Address
WHERE CONTAINS (*, '"teste rua casa"')

The row

Teste anything casa

will be returned. In that case, full-text search interprets your query as something like this:

"Search for 'teste' near any word near 'casa'"
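The position-keeping behavior described above can be observed with the same parser function used in the question. This sketch assumes the custom stoplist (ID 5) and LCID 1046 from the question, and that 'rua' is a stopword in that list:

```sql
-- Show each token of the phrase with its position (occurrence)
-- and whether the parser classifies it as a noise word.
SELECT occurrence, special_term, display_term
FROM sys.dm_fts_parser('"teste rua casa"', 1046, 5, 0);
```

If the stoplist is applied as expected, 'rua' should be flagged as a noise word while still occupying its own position between 'teste' and 'casa', which is consistent with a different word in that slot still matching the phrase.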

When you combine terms with the "or" operator, or search for a single word, this rule does not apply. I tested this repeatedly over about 3 months and never understood why.

EDIT

If you have the row

"Rua José do Patrocinio nº125"

and you query the full-text index with

WHERE CONTAINS (*, '"RUA*" or "Jose*" or "do*"')

it will return the row because it DOES contain at least one of the words you are searching for, not because the words "rua" and "do" are being ignored.
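To see which term is actually responsible for a match, each one can be counted on its own. A rough sketch against the tbEndereco table from the question:

```sql
-- Count matching rows per search term, one CONTAINS per term,
-- so the contribution of each term is visible separately.
SELECT 'rua*' AS term, COUNT(*) AS matches
FROM tbEndereco
WHERE CONTAINS (*, '"rua*"')
UNION ALL
SELECT 'jose*', COUNT(*)
FROM tbEndereco
WHERE CONTAINS (*, '"jose*"')
UNION ALL
SELECT 'do*', COUNT(*)
FROM tbEndereco
WHERE CONTAINS (*, '"do*"');
```

Keep in mind that a prefix term such as "rua*" matches any indexed word beginning with those letters, so a non-zero count for it does not by itself prove that the exact word 'rua' was indexed.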

License: CC-BY-SA with attribution
Not affiliated with StackOverflow