Question

I am using PostgreSQL full-text search with the English dictionary. When I search for records containing certain English words, I get weird results.

For example:

SELECT id FROM table1 WHERE ts_vector1 @@ to_tsquery('it')

returns 0 results.

SELECT id FROM table1 WHERE ts_vector1 @@ to_tsquery('specialist & it')

returns some results (the word 'it' exists in the table and in the index). ts_vector1 is created as follows:

ts_vector1 = to_tsvector('english', some_text_column)

Is 'it' a reserved word? If so, what is the best way to 'escape' reserved words?


Solution

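'It' is a stop word in the english configuration, so it is dropped both when the tsvector is built and when the query is parsed. to_tsquery('it') is therefore reduced to an empty query, which matches no rows (PostgreSQL emits a NOTICE saying the query contains only stop words). to_tsquery('specialist & it') is reduced to 'specialist' alone, which is why the second query does return rows. From the relevant docs: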

http://www.postgresql.org/docs/current/static/textsearch-controls.html

In the example above we see that the resulting tsvector does not contain the words a, on, or it, the word rats became rat, and the punctuation sign - was ignored.
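To see why the two queries behave differently, here is a toy Python model of a stop-word list being applied on both sides. It is not PostgreSQL's real parser or the Snowball stemmer: the small STOP_WORDS set is an assumption standing in for the english configuration's much longer list, stemming is omitted, and only AND queries are handled.

```python
import re

# Hypothetical, heavily trimmed stand-in for the english stop-word list.
STOP_WORDS = {"a", "an", "and", "is", "it", "on", "the"}


def toy_tsvector(text: str) -> set[str]:
    """Lower-case, split into words, and drop stop words (no stemming)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOP_WORDS}


def toy_tsquery(query: str) -> list[str]:
    """Split an AND-only query on '&' and drop stop-word terms."""
    terms = [t.strip().lower() for t in query.split("&")]
    return [t for t in terms if t and t not in STOP_WORDS]


def matches(doc: set[str], terms: list[str]) -> bool:
    # An empty query matches nothing, mirroring PostgreSQL's behaviour
    # when every query term turned out to be a stop word.
    return bool(terms) and all(t in doc for t in terms)


doc = toy_tsvector("An IT specialist: it is on the team")
print(sorted(doc))                                   # ['specialist', 'team']
print(toy_tsquery("it"), matches(doc, toy_tsquery("it")))  # [] False
print(toy_tsquery("specialist & it"),
      matches(doc, toy_tsquery("specialist & it")))  # ['specialist'] True
```

The document never stores 'it', yet the second query still matches, because the stop word silently disappears from the query too.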

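A stop word cannot be escaped in the query. Instead, you can change the list of stop words by configuring the relevant dictionaries (and then rebuild the stored tsvectors, since they were created without 'it'):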

http://www.postgresql.org/docs/current/static/textsearch-dictionaries.html
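One possible approach is to copy the english stop-word file, remove 'it' from the copy, and point a new Snowball dictionary at it. The Python sketch below only prepares the pieces and is a minimal illustration under assumptions: the names english_keep_it (stop-word file and configuration) and english_stem_keep_it (dictionary) are hypothetical, and the filtered text must be saved as english_keep_it.stop in PostgreSQL's $SHAREDIR/tsearch_data directory (pg_config --sharedir shows where that is) before the generated SQL is run.

```python
# Hypothetical names; choose your own.
STOP_FILE_NAME = "english_keep_it"   # saved as english_keep_it.stop
DICT_NAME = "english_stem_keep_it"
CONFIG_NAME = "english_keep_it"


def filter_stop_words(stop_file_text: str, keep: set[str]) -> str:
    """Return the stop-word file contents without the words in `keep`."""
    kept_lines = [
        line for line in stop_file_text.splitlines()
        if line.strip().lower() not in keep
    ]
    return "\n".join(kept_lines) + "\n"


def config_ddl() -> str:
    """SQL for a copy of the 'english' configuration whose stemming
    dictionary uses the filtered stop-word file."""
    return f"""
CREATE TEXT SEARCH DICTIONARY {DICT_NAME} (
    TEMPLATE = snowball,
    Language = english,
    StopWords = {STOP_FILE_NAME}
);
CREATE TEXT SEARCH CONFIGURATION {CONFIG_NAME} (COPY = english);
ALTER TEXT SEARCH CONFIGURATION {CONFIG_NAME}
    ALTER MAPPING REPLACE english_stem WITH {DICT_NAME};
""".strip()


if __name__ == "__main__":
    sample = "i\nme\nit\nits\nitself\n"   # a few lines of a stop-word file
    print(filter_stop_words(sample, keep={"it"}), end="")
    print(config_ddl())
```

After running the SQL, both sides would need the new configuration, e.g. to_tsvector('english_keep_it', some_text_column) when rebuilding ts_vector1 and to_tsquery('english_keep_it', 'it') when searching. If stemming is not needed at all, the built-in simple configuration keeps every word, including 'it'.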

OTHER TIPS

OK, 2013 was a while ago, but the problem is still valid. You want to drop 'it' because it is noise, but keep 'IT'. When it means information technology, 'it' is usually written in capitals as 'IT'.

Before feeding text to to_tsvector:

  1. Tokenize your text

  2. Replace the word "IT" with "information technology"

Before doing a search using to_tsquery:

  1. Tokenize search query text

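  2. Replace the word "IT" with "information technology" (with to_tsquery syntax, join the two words with an operator such as &, otherwise the query is rejected as a syntax error)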

This removes the conflict between the English "it" and "IT", and should work in most cases. You could also try to detect the context from other keywords before doing the replacement.

Doing this entirely in the database is probably possible, but in most applications it is simpler to do it in the general-purpose language of your main server or program.
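A minimal Python sketch of the two preprocessing steps, assuming a case-sensitive regular expression with word boundaries is an acceptable stand-in for real tokenization. The function names are hypothetical. As a heuristic it will also rewrite "IT" inside all-caps headings, and the query form (information & technology) matches documents containing both words anywhere; on PostgreSQL 9.6 or later the phrase operator <-> could require them to be adjacent.

```python
import re

# Case-sensitive: only the all-caps word "IT" is rewritten; "it" and "It"
# are left alone (and later dropped as stop words). The \b boundaries keep
# words such as "ITEM" or "SPLIT" intact.
IT_WORD = re.compile(r"\bIT\b")


def normalize_document(text: str) -> str:
    """Apply before to_tsvector: expand the acronym into plain words."""
    return IT_WORD.sub("information technology", text)


def normalize_query(query: str) -> str:
    """Apply before to_tsquery: the two words need an operator between
    them, otherwise to_tsquery reports a syntax error."""
    return IT_WORD.sub("(information & technology)", query)


print(normalize_document("Senior IT specialist; it is remote."))
print(normalize_query("specialist & IT"))
```

Both sides go through the english stemmer afterwards, so "information" and "technology" are reduced to the same lexemes in the stored tsvector and in the query.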

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow