Question

For the sake of this question I have two tables:

CREATE TABLE room (
    id serial primary key,
    lang varchar(12) NOT NULL default 'english'
);

CREATE TABLE message (
    id bigserial primary key,
    room integer references room(id),
    content text NOT NULL
);

and I want full text search with language-dependent tokenization: the tokenization of messages, and of the search queries run against them, must depend on the language of the room.

The totally unoptimized and unindexed search would look like this:

select message.id, content, lang from message, room
where message.room=room.id
and to_tsvector(lang::regconfig, content)
   @@ plainto_tsquery(lang::regconfig,'what I search')
and room=33;

A search query always targets a single room (so the language is homogeneous).

Now my question is how to do this efficiently. I can't directly build an expression index, because expressions used in indexes must be "immutable" (rely only on the indexed row).

Is creating a new column containing to_tsvector(lang::regconfig, content) (and maintained with a trigger) the only reasonable solution if I want to have an index?

Is that the most efficient one?

Solution

If you know that the association between language and room does not change, you can feed this information to Postgres by way of an IMMUTABLE function.

CREATE OR REPLACE FUNCTION room_lang(int)
RETURNS varchar(12) AS
$$
   SELECT lang FROM room WHERE id = $1
$$ LANGUAGE sql IMMUTABLE;

And use this for partial indexes:

CREATE INDEX idx_en ON message ...
WHERE room_lang(room) = 'english';

CREATE INDEX idx_es ON message ...
WHERE room_lang(room) = 'spanish';
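
The index definitions above are elided in the original answer. As a minimal sketch, assuming the intent is a GIN index over to_tsvector with one constant text search configuration per language, one of them might look like the following. The index name message_fts_en_idx is hypothetical, and room_lang() is the function defined above.

```sql
-- Hypothetical sketch: the original answer does not show the index body.
-- With a constant configuration ('english'), to_tsvector(regconfig, text)
-- is immutable and can therefore be used in an index expression.
CREATE INDEX message_fts_en_idx ON message
USING gin (to_tsvector('english', content))
WHERE room_lang(room) = 'english';
```

Each language would get its own index with the matching configuration and predicate, so only the messages of rooms in that language are indexed with that tokenization.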

Of course, you have to recreate any such index if you change anything in room that breaks the promise of "immutability", because such a change silently invalidates the index.
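
As a sketch of that maintenance step, assuming the index names idx_en and idx_es from the example above, the indexes could be rebuilt with REINDEX after the language of a room has been changed:

```sql
-- Rebuild the partial indexes so that their predicates are re-evaluated
-- against the current contents of room.
REINDEX INDEX idx_en;
REINDEX INDEX idx_es;
```

Both the index for the old language and the one for the new language are affected, since the messages of the changed room have to leave one and enter the other.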

Use a compatible WHERE clause for your queries to let Postgres know it can use the index:

SELECT ...
WHERE room_lang(room) = 'english';
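
Spelled out for the search from the question, such a query might look like the sketch below. It assumes the partial index was built on the expression to_tsvector('english', content), which the original answer does not show, and that room 33 is an English room.

```sql
-- Sketch only: the index expression to_tsvector('english', content)
-- is an assumption, not taken from the original answer.
SELECT m.id, m.content
FROM   message m
WHERE  m.room = 33
AND    room_lang(m.room) = 'english'   -- repeats the partial index predicate
AND    to_tsvector('english', m.content)
       @@ plainto_tsquery('english', 'what I search');
```

The configuration has to be written as the same constant in the query as in the index. An expression such as to_tsvector(lang::regconfig, content) would not match the indexed expression.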

Here is a related example for indexes with an IMMUTABLE function with a lot more details:
Does PostgreSQL support "accent insensitive" collations?

Aside: I'd rather use just text instead of varchar(12).

OTHER TIPS

MS SQL Server has Full Text Search, but I don't know whether Postgres offers anything similar. In case your RDBMS doesn't offer a solution, I created one years ago. At the time we could not activate FTS on the shared server my client rented, so I built a fully custom solution.

I wrote an article describing the solution at: SQL Server Central

(Note: you will need to create a free account in order to read the article.)

The solution was written for MS SQL Server, but I bet it's easily portable to Postgres.

I also posted an example at: SQL Fiddle

I hope you don't need to write a full solution like I did, but if you do, I hope that article can ease your pain.

Note: the final solution worked like a charm in production, but ended up a bit more sophisticated.

Licensed under: CC-BY-SA with attribution