Question

I stumbled across an issue with full-text search on columns that may contain domain names in Microsoft SQL Server 2012.

A table cell containing example.com is indexed in the full-text catalog as two separate terms, example and com. The latter makes it impossible to search for a specific domain name, because any entry containing that TLD will be found.
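One way to check how the word breaker actually tokenizes a value is the sys.dm_fts_parser dynamic management function, which requires membership in the sysadmin fixed server role. This is a minimal sketch assuming the English word breaker (LCID 1033). The terms returned depend on the language and word-breaker version, so no output is claimed here.

```sql
-- Show the terms the full-text word breaker produces for the input string.
-- Arguments: query string, LCID (1033 = English),
-- stoplist id (0 = system stoplist), accent sensitivity (0 = insensitive)
SELECT display_term, occurrence, special_term
FROM sys.dm_fts_parser(N'example.com', 1033, 0, 0);
```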

What do I need to do to prevent URLs from being split up?

EDIT: The example query would be:

SELECT * FROM Test WHERE FREETEXT(test, 'example.com')

The Test table contains only two rows, example.com and differenturl.com, and both are returned as results. The Test database was created for this example.
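A setup along these lines should reproduce the behavior. It is a sketch under assumptions. The id column, the PK_Test key index and the TestCatalog catalog name are hypothetical, because a full-text index needs a unique, single-column, non-nullable key index. The Full-Text Search feature must also be installed on the instance.

```sql
-- Table from the question; the id column and PK_Test name are hypothetical
CREATE TABLE dbo.Test (
    id   INT IDENTITY(1, 1) NOT NULL CONSTRAINT PK_Test PRIMARY KEY,
    test NVARCHAR(200) NOT NULL
);

INSERT INTO dbo.Test (test) VALUES (N'example.com'), (N'differenturl.com');
GO

-- Hypothetical catalog name
CREATE FULLTEXT CATALOG TestCatalog;
GO

-- English word breaker (LCID 1033); the index tracks row changes automatically
CREATE FULLTEXT INDEX ON dbo.Test (test LANGUAGE 1033)
    KEY INDEX PK_Test
    ON TestCatalog
    WITH (CHANGE_TRACKING = AUTO);
GO

-- Population is asynchronous: poll until the catalog reports idle (0).
-- If population has not started yet this can exit immediately,
-- so rerun the query below if it comes back empty.
WHILE FULLTEXTCATALOGPROPERTY('TestCatalog', 'PopulateStatus') <> 0
    WAITFOR DELAY '00:00:01';

SELECT * FROM dbo.Test WHERE FREETEXT(test, 'example.com');
```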


Solution

Use CONTAINS instead of FREETEXT.

SELECT * FROM Test WHERE CONTAINS(test, 'example.com')

FREETEXT treats example.com as if it were example OR com, which explains why your FREETEXT query for example.com also matches differenturl.com. It also matches inflectional forms (such as examples) and thesaurus synonyms, which would likely cause other problems for you.
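To make the phrase intent explicit instead of relying on how the word breaker handles an unquoted term, the search term can be wrapped in double quotes inside the CONTAINS string. A minimal sketch using the Test table and test column from the question. As far as I understand it, a phrase match on example followed by com would still match a value such as www.example.com, since that value contains the same two terms in sequence.

```sql
-- Explicit phrase search: the inner double quotes mark "example.com" as a phrase term
SELECT * FROM Test WHERE CONTAINS(test, '"example.com"');

-- For comparison: FREETEXT splits the same input into independent terms
-- (example, com), so any row containing com can match
SELECT * FROM Test WHERE FREETEXT(test, 'example.com');
```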

If you still need the flexibility of FREETEXT for your other search terms, you can combine both predicates:

SELECT * FROM Test WHERE CONTAINS(test, 'example.com') and FREETEXT(test, 'some other text')
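When the search terms come from user input, both predicates also accept variables. This is a sketch in which the variable names and values are hypothetical. It wraps the domain in double quotes so CONTAINS treats it as a phrase, and it strips embedded quotes so the input cannot break the phrase syntax.

```sql
-- Hypothetical search inputs; in practice these would come from the application
DECLARE @domain NVARCHAR(200) = N'example.com';
DECLARE @otherText NVARCHAR(200) = N'some other text';

-- Quote the domain as a phrase term, removing any double quotes already in it
DECLARE @domainPhrase NVARCHAR(210) = N'"' + REPLACE(@domain, N'"', N'') + N'"';

SELECT *
FROM Test
WHERE CONTAINS(test, @domainPhrase)
  AND FREETEXT(test, @otherText);
```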

Other answers

Alternatively, skip full-text search and filter with LIKE:

SELECT TOP 1000 
[Domain1]
FROM [TESTIT].[dbo].[DomainTest] where Domain1 like '%example%com';
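The pattern '%example%com' is broader than it looks. It also matches values such as notexample.com or example-other.com, because each % matches any run of characters. If the goal is the exact domain plus its subdomains, a tighter predicate may fit better. This is a minimal sketch that keeps the TESTIT.dbo.DomainTest table and Domain1 column from the query above. A LIKE pattern with a leading wildcard cannot use an index seek, so expect a scan on large tables.

```sql
-- Exact domain, or any subdomain of it (e.g. www.example.com).
-- In LIKE, '.' is a literal character; only % and _ are wildcards.
SELECT Domain1
FROM [TESTIT].[dbo].[DomainTest]
WHERE Domain1 = 'example.com'
   OR Domain1 LIKE '%.example.com';
```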
Licensed under: CC-BY-SA with attribution