Question

For the sake of this question, I have two tables:

CREATE TABLE room (
    id serial primary key,
    lang varchar(12) NOT NULL default 'english'
);

CREATE TABLE message (
    id bigserial primary key,
    room integer references room(id),
    content text NOT NULL
);

and I want full text search with language-dependent tokenization: both the tokenization of messages and the search over them must depend on the language of the room.
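To see why the configuration has to follow the room's language, compare how two built-in text search configurations treat the same text. This is a small sketch: the sample text is made up, and `simple` is used as the contrast only because it does no stemming (it is a built-in configuration, not a language).

```sql
-- Same text, two configurations: the resulting lexemes differ.
SELECT to_tsvector('english'::regconfig, 'running dogs') AS english_tokens,  -- 'dog':2 'run':1
       to_tsvector('simple'::regconfig,  'running dogs') AS simple_tokens;   -- 'dogs':2 'running':1

-- A query built with a different configuration can miss matches.
SELECT to_tsvector('english'::regconfig, 'running dogs')
         @@ plainto_tsquery('english'::regconfig, 'running') AS same_config,   -- true
       to_tsvector('english'::regconfig, 'running dogs')
         @@ plainto_tsquery('simple'::regconfig, 'running')  AS mixed_config;  -- false
```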

A totally unoptimized, unindexed search would look like this:

select message.id, content, lang from message, room
where message.room=room.id
and to_tsvector(lang::regconfig, content)
   @@ plainto_tsquery(lang::regconfig,'what I search')
and room=33;

A search query always targets a single room (so the language is homogeneous).

Now my question is: how do I do this efficiently? I can't directly build an expression index, because expressions used in indexes must be "immutable" (rely only on the indexed row), and the language lives in the room table, not in message.

Is creating a new column containing to_tsvector(lang::regconfig, content) (maintained with a trigger) the only reasonable solution if I want an index?

Is it also the most efficient one?
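For reference, the trigger-maintained column the question describes could look roughly like this. It is a minimal sketch, not taken from the question: the names content_tsv, message_tsv_update, message_tsv_trg and message_content_tsv_idx are hypothetical, and falling back to the built-in simple configuration for messages without a room is also an assumption.

```sql
-- Tables from the question, repeated so this sketch runs on its own.
CREATE TABLE IF NOT EXISTS room (id serial PRIMARY KEY, lang varchar(12) NOT NULL DEFAULT 'english');
CREATE TABLE IF NOT EXISTS message (id bigserial PRIMARY KEY, room integer REFERENCES room(id), content text NOT NULL);

-- Hypothetical column holding the pre-computed, language-aware tsvector.
ALTER TABLE message ADD COLUMN IF NOT EXISTS content_tsv tsvector;

-- Trigger function: look up the room's language and tokenize with it
-- ('simple' is an assumed fallback when the message has no room).
CREATE OR REPLACE FUNCTION message_tsv_update() RETURNS trigger AS $$
BEGIN
  NEW.content_tsv := to_tsvector(
      COALESCE((SELECT lang FROM room WHERE id = NEW.room), 'simple')::regconfig,
      NEW.content);
  RETURN NEW;
END
$$ LANGUAGE plpgsql;

DROP TRIGGER IF EXISTS message_tsv_trg ON message;
CREATE TRIGGER message_tsv_trg
  BEFORE INSERT OR UPDATE OF room, content ON message
  FOR EACH ROW EXECUTE PROCEDURE message_tsv_update();

-- A plain GIN index on the stored column; no IMMUTABLE trickery needed.
CREATE INDEX IF NOT EXISTS message_content_tsv_idx ON message USING gin (content_tsv);

-- Search one room; its language is fetched once by a scalar subquery,
-- so the tsquery is the same for every row.
SELECT id, content
FROM message
WHERE room = 33
  AND content_tsv @@ plainto_tsquery((SELECT lang FROM room WHERE id = 33)::regconfig, 'what I search');
```

Because the trigger only fires on message, changing a room's lang leaves existing rows tokenized with the old language; an UPDATE that names the room column, such as `UPDATE message SET room = room WHERE room = 33`, would re-run it for that room.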


Solution

If you know that the association between language and room does not change, you can feed this information to Postgres by way of an IMMUTABLE function.

CREATE OR REPLACE FUNCTION room_lang(int)
RETURNS varchar(12) AS
$$
   SELECT lang FROM room WHERE id = $1
$$ LANGUAGE sql IMMUTABLE;

And use this for partial indexes:

CREATE INDEX idx_en ON message ...
WHERE room_lang(room) = 'english';

CREATE INDEX idx_es ON message ...
WHERE room_lang(room) = 'spanish';

Of course, if you change anything in room that breaks the promise of "immutability" (for example, a room's lang), the index no longer reflects the data, and you have to recreate every such index.

Use a compatible WHERE clause for your queries to let Postgres know it can use the index:

SELECT ...
WHERE room_lang(room) = 'english';
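Since a search always targets one room, the application can look up that room's language first and then spell it out as a constant in the query. A minimal sketch: the setup lines repeat the tables, room_lang (schema-qualified, assuming public) and an English partial index so it runs standalone; the GIN expression is an assumed completion of the elided index definition, and room 33 and 'english' are placeholders.

```sql
-- Setup repeated so this sketch runs on its own (assumes schema public).
CREATE TABLE IF NOT EXISTS room (id serial PRIMARY KEY, lang varchar(12) NOT NULL DEFAULT 'english');
CREATE TABLE IF NOT EXISTS message (id bigserial PRIMARY KEY, room integer REFERENCES room(id), content text NOT NULL);
CREATE OR REPLACE FUNCTION room_lang(int) RETURNS varchar(12) AS
$$ SELECT lang FROM public.room WHERE id = $1 $$ LANGUAGE sql IMMUTABLE;
CREATE INDEX IF NOT EXISTS idx_en ON message
  USING gin (to_tsvector('english'::regconfig, content))
  WHERE room_lang(room) = 'english';

-- Step 1: look up the language of the room being searched.
SELECT lang FROM room WHERE id = 33;

-- Step 2: repeat that language as a literal. The WHERE clause must imply the
-- partial index's predicate (repeating it verbatim is simplest), and the
-- to_tsvector call must match the indexed expression for idx_en to be usable.
SELECT id, content
FROM message
WHERE room = 33
  AND room_lang(room) = 'english'
  AND to_tsvector('english'::regconfig, content)
      @@ plainto_tsquery('english'::regconfig, 'what I search');
```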

Here is a related example of an index built on an IMMUTABLE function, with a lot more detail:
Does PostgreSQL support "accent insensitive" collations?

Aside: I'd rather use just text instead of varchar(12).

Other tips

In MS SQL we have Full Text Search, but I don't know whether Postgres has anything similar. In case your RDBMS doesn't offer a solution: I created one years ago. At the time, we could not activate FTS on the shared server my client rented, so I built a fully customized solution.

I wrote an article describing the solution at SQL Server Central.

(Note: you will need to create a free account in order to see the article.)

The solution was written for MS SQL, but I bet it's easily portable to Postgres.

I also posted an example at SQL Fiddle.

I hope you don't need to write a full solution like I did, but if you do, I hope the article can ease your pain.

Note: the final solution worked like a charm in production, but ended up a bit more sophisticated.

License: CC-BY-SA with attribution