How to set up a full text search query in PostgreSQL

https://stackoverflow.com/questions/22732507

23-06-2023

Question

I am really new to PostgreSQL and have some problems implementing full text search. I am currently using the following setup:

CREATE DATABASE test;

CREATE TABLE data_table (
   id BIGSERIAL PRIMARY KEY,
   name VARCHAR(160) NOT NULL,
   description VARCHAR NOT NULL
);

CREATE INDEX data_table_idx ON data_table 
USING gin(to_tsvector('English', name || ' ' || description)); 

INSERT INTO data_table (name, description) VALUES 
    ('Penguin', 'This is the Linux penguin.'), 
    ('Gnu', 'This is the GNU gnu.'), 
    ('Elephant', 'This is the PHP elephant.'), 
    ('Elephant', 'This is the postgres elephant.'), 
    ('Duck', 'This is the duckduckgo duck.'), 
    ('Cat', 'This is the GitHub cat.'), 
    ('Bird', 'This is the Twitter bird.'), 
    ('Lion', 'This is the Leo lion.');

Now I am trying to search the table for a given user input and return the whole data row and the highlighted matches. My attempt looks something like the following:

WITH 
    q AS ( SELECT plainto_tsquery('English', 'elephants php') AS query ),
    d AS ( SELECT (name || ' ' || description) AS document FROM data_table ),
    t AS ( SELECT to_tsvector('English', d.document) AS textsearch FROM d ),
    r AS ( SELECT ts_rank_cd(t.textsearch, q.query) AS rank FROM t, q )
SELECT data_table.*, ts_headline('german', d.document, q.query) AS matches
FROM data_table, q, d, t , r
WHERE q.query @@ t.textsearch 
ORDER BY r.rank DESC 
LIMIT 10;

This leaves me with the following output:

 id |   name   |          description           |              matches               
----+----------+--------------------------------+------------------------------------
  5 | duck     | This is the duckduckgo duck.   | Penguin This is the Linux penguin.
  2 | Gnu      | This is the GNU gnu.           | Gnu This is the GNU gnu.
  3 | Elephant | This is the PHP elephant.      | Penguin This is the Linux penguin.
  4 | elephant | This is the postgres elephant. | Penguin This is the Linux penguin.
  6 | Cat      | This is the GitHub cat.        | Penguin This is the Linux penguin.
  1 | Penguin  | This is the Linux penguin.     | Gnu This is the GNU gnu.
  1 | Penguin  | This is the Linux penguin.     | Penguin This is the Linux penguin.
  2 | Gnu      | This is the GNU gnu.           | Penguin This is the Linux penguin.
  4 | elephant | This is the postgres elephant. | Gnu This is the GNU gnu.
  3 | Elephant | This is the PHP elephant.      | Gnu This is the GNU gnu.
(10 rows)

So the query does return something, but it is not sorted by rank, and each document is combined with each combination of name/description. The only thing that works is the correct highlighting of the search results in the document. What am I doing wrong, and how can I fix it?
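The duplication described above follows from the FROM list: data_table, q, d, t and r are listed with no join condition between them, so every row of each is paired with every row of the others, and the WHERE clause filters only on t. The effect can be reproduced in isolation. The two inline tables a and b below are hypothetical and exist only for this sketch:

```sql
-- Two unrelated row sources in one FROM list produce every pairing.
-- a and b are made-up inline tables, not part of the original schema.
SELECT a.x, b.y
FROM (VALUES (1), (2)) AS a(x),
     (VALUES ('p'), ('q')) AS b(y);
-- 2 rows x 2 rows = 4 rows: (1,p), (1,q), (2,p), (2,q), in no guaranteed order
```

The same reasoning explains the ordering: r.rank comes from a row of r that has no link to the data_table row it is displayed next to. Carrying a key such as id through each step and joining on it avoids both problems.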

Solution

I finally got this working; my solution is below. I hope it helps someone. If anyone knows a better solution, with better or faster indexing, I would be happy to hear it.

Query:

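WITH 
    -- q: the parsed search query (a single row)
    q AS ( SELECT to_tsquery('german', 'elephant | php') AS query ),
    -- d: one document per table row, keyed by id
    d AS ( SELECT id, (name || ' ' || description) AS doc FROM data_table ),
    -- t: the search vector for each document
    t AS ( SELECT id, doc, to_tsvector('german', doc) AS vector FROM d ),
    -- r: only the matching documents, each with its rank
    r AS ( 
        SELECT id, doc, ts_rank_cd(vector, query) AS rank 
        FROM t, q
        WHERE q.query @@ vector
    )
SELECT id, ts_headline('german', doc, q.query) AS matches, rank
FROM r, q
-- sort on the rank column; ORDER BY r would compare whole rows, starting with id
ORDER BY rank DESC;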

Result:

 id |                         matches                         | rank 
----+---------------------------------------------------------+------
  3 | <b>Elephant</b> This is the <b>PHP</b> <b>elephant</b>. |  0.3
  4 | <b>elephant</b> This is the postgres <b>elephant</b>.   |  0.2

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow