Question

I stumbled across an issue with full-text search on columns that may contain domain names, on Microsoft SQL Server 2012.

A table cell containing example.com is added to the full-text catalog as the two terms example and com. The latter makes it impossible to search for a domain name, as any entry containing that TLD will be found.

What do you need to do to prevent URLs from being broken up?

EDIT: The example query would be:

SELECT * FROM Test WHERE FREETEXT(test, 'example.com')

The Test table contains only two rows, example.com and differenturl.com, and both are returned as results. The Test database was created for this example.
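For anyone who wants to reproduce this, a minimal setup might look like the sketch below. Only the table name Test, the column name test, and the two row values come from the question; the id column, the PK_Test constraint, the TestCatalog catalog name, and the choice of English (LCID 1033) are assumptions made for illustration.

```sql
-- Hypothetical repro setup; names other than Test/test are assumed.
CREATE TABLE dbo.Test (
    id   INT IDENTITY(1,1) NOT NULL CONSTRAINT PK_Test PRIMARY KEY,
    test NVARCHAR(255)     NOT NULL
);

INSERT INTO dbo.Test (test)
VALUES (N'example.com'), (N'differenturl.com');

-- A full-text index needs a catalog and a unique, single-column, non-nullable key index.
CREATE FULLTEXT CATALOG TestCatalog;

CREATE FULLTEXT INDEX ON dbo.Test (test LANGUAGE 1033)
    KEY INDEX PK_Test
    ON TestCatalog;
```

The index is populated in the background, so a query issued immediately after creating it may return no rows until population has finished.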


Solution

Use CONTAINS instead of FREETEXT.

SELECT * FROM Test WHERE CONTAINS(test, 'example.com')

FREETEXT treats example.com as if it were example OR com, which explains why your FREETEXT query for example.com also matches differenturl.com. It also matches inflectional forms (examples, exampling...) and synonyms, which would likely cause other problems for you.
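To see how the word breaker splits the search string on your own server, one option is the sys.dm_fts_parser dynamic management function. This is a sketch that assumes the column is indexed with the English word breaker (LCID 1033) and the system stoplist; adjust both to match your index. As far as I know it requires sysadmin membership.

```sql
-- Show the tokens the full-text engine produces for the search string.
-- Arguments: query string, language LCID, stoplist ID (0 = system stoplist),
-- accent sensitivity (0 = insensitive).
SELECT keyword, display_term, special_term, occurrence, phrase_id
FROM sys.dm_fts_parser(N'"example.com"', 1033, 0, 0);
```

The display_term column lists each token and special_term shows whether it is an exact match or a noise word, which makes it easier to tell why a given row matches.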

If you still need the flexibility of FREETEXT for your other search terms, you could use both predicates:

SELECT * FROM Test WHERE CONTAINS(test, 'example.com') AND FREETEXT(test, 'some other text')

OTHER TIPS

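Alternatively, skip full-text search and use LIKE:

-- Note: '%example%com' matches any value that contains "example" and ends in "com";
-- use '%example.com%' to match the literal domain anywhere in the value.
SELECT TOP 1000 [Domain1]
FROM [TESTIT].[dbo].[DomainTest]
WHERE [Domain1] LIKE '%example%com';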
Licensed under: CC-BY-SA with attribution