Question

I'm developing a simple articles website in brazilian portuguese language. The search feature is based on a full-text search, but it isn't returning expected results.

I made this on postgresql. Here is the simplified table:

Artigos
-id
-title -- article title
-intro -- article introduction
-content -- article body
-publishdate -- date of launch
-artigosts -- this will work as our fts index.

After creating the table, I ran:

UPDATE artigos SET artigosts = 
setweight(to_tsvector('pg_catalog.portuguese', coalesce(title,'')), 'A') || 
setweight(to_tsvector('pg_catalog.portuguese', coalesce(intro,'')), 'B') ||
setweight(to_tsvector('pg_catalog.portuguese', coalesce(content,'')), 'C');

CREATE INDEX artigosts_idx ON artigos USING gist (artigosts);

CREATE TRIGGER artigosts_tg 
BEFORE INSERT OR UPDATE ON artigos 
FOR EACH ROW EXECUTE PROCEDURE 
  tsvector_update_trigger('artigosts', 'pg_catalog.portuguese', 'title', 'intro', 'content');

Yes, I intend to use simple weightning on the searches. Made a index to speed up, a trigger, so I can insert and update without worry about remaking index and so on.

Well, per my understanding, it is everything fine. But results aren't. A simple example.

Let's say I have "... banco de dados ... no banco ... " as one article content. When I do:

SELECT title, intro, content FROM artigos WHERE plainto_tsquery('banco de dados') @@ artigosts;

It returns an empty set. I checked the ts_vector column and saw the predicates 'banc' and 'dad'. But I still can't understand why it does not return the row containing the mentioned article.

Can someone bring a light into this question?

Was it helpful?

Solution

The reason for this is probably because your default dictionary is set to english. Try the following queries to determine if that is actually the case.

SELECT * FROM ts_debug('banco de dados');

This query will show how the dictionary parses up your search phrase. It should provide the lexemes "banco", "de", and "dado". So what you're actually searching for will not exist in the index and you will receive 0 results.

Now try this:

SELECT * FROM ts_debug('portuguese', 'banco de dados');

It should return the lexemes that exist in the index, "banc" and "dad". If that is the case then you can simply change your search query to get the appropriate result.

SELECT title, intro, content FROM artigos WHERE plainto_tsquery('portuguese', 'banco de dados') @@ artigosts;

OTHER TIPS

It's probably to late for you. And I don't have enough reputations to add a simple comment...

With regard to the special characters I unaccent them before transforming to tsvector.

SELECT to_tsvector('simple',unaccent('banco de dados áéóíúçãõ'))

So I get:

"'aeoiucao':4 'banco':1 'dados':3 'de':2"

You need to:

CREATE EXTENSION unaccent;

... as user postgres. And sure, you must unaccent your tsquery also

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top