Question

I stumbled across an issue with full-text search on columns that may contain domain names, on Microsoft SQL Server 2012.

A table cell containing example.com is added to the full-text catalog as the two terms example and com. The latter makes it impossible to search for a domain name, because any entry containing that TLD will be found.

What do you need to do to prevent URLs from being broken up like this?

EDIT: The example query would be:

SELECT * FROM Test WHERE FREETEXT(test, 'example.com')

The Test table contains only two rows, one with example.com and one with differenturl.com; both are returned as a result. The Test database was created just for this example.
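
For anyone who wants to reproduce this, a setup along the following lines should be enough. It is only a sketch: the table and column names Test and test come from the query above, while the id column, the PK_Test key, the TestCatalog catalog name and the choice of English (LCID 1033) are assumptions made for illustration.

```sql
-- Hypothetical repro setup; only Test/test come from the question.
CREATE TABLE Test (
    id   INT IDENTITY(1,1) NOT NULL,
    test NVARCHAR(255)     NOT NULL,
    CONSTRAINT PK_Test PRIMARY KEY (id)   -- full-text needs a unique key index
);

INSERT INTO Test (test)
VALUES (N'example.com'), (N'differenturl.com');

-- Assumed catalog name.
CREATE FULLTEXT CATALOG TestCatalog;

-- Index the test column using the English word breaker (assumed language).
CREATE FULLTEXT INDEX ON Test (test LANGUAGE 1033)
    KEY INDEX PK_Test
    ON TestCatalog
    WITH CHANGE_TRACKING AUTO;
```

The full-text index is populated asynchronously, so the query may return nothing for a short while after the index is created.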


Solution

Use CONTAINS instead of FREETEXT.

SELECT * FROM Test WHERE CONTAINS(test, 'example.com')

FREETEXT treats example.com as if it were example OR com, which explains why your FREETEXT query for example.com also matches differenturl.com. It also matches inflectional forms (examples, exampling, ...) and synonyms, which would likely cause other problems for you.
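
To see how the word breaker actually splits a domain name, you can ask SQL Server directly with the sys.dm_fts_parser dynamic management function. The sketch below assumes the column is indexed with the English word breaker (LCID 1033) and uses the system stoplist (0); adjust both if your index is configured differently. As far as I know, the function requires sysadmin membership.

```sql
-- Show the tokens the full-text engine produces for the search string.
-- Arguments: query string, LCID (1033 = English, assumed),
--            stoplist id (0 = system stoplist), accent sensitivity (0 = off).
SELECT keyword, display_term, special_term, occurrence
FROM sys.dm_fts_parser(N'"example.com"', 1033, 0, 0);
```

Each row in the result is one term produced by the word breaker, so this shows which terms a full-text search for example.com is really matching against.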

If you still need the flexibility of FREETEXT for your other search terms, you could use both predicates:

SELECT * FROM Test WHERE CONTAINS(test, 'example.com') AND FREETEXT(test, 'some other text')

Other tips

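Another option is a plain LIKE pattern, which bypasses the full-text index entirely:

SELECT TOP 1000 [Domain1]
FROM [TESTIT].[dbo].[DomainTest]
WHERE Domain1 LIKE '%example%com';

Note that this pattern matches any value that contains example and ends in com.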
Licensed under: CC-BY-SA with attribution