Question

I have a table with a tsvector column (col).

I filled some rows via a script using the cast data::tsvector, but (!) other rows were filled with to_tsvector(data), so I have rows with different values like this:

  1. 'в' 'г' 'зея' 'иеговы' 'местная' 'организация' 'религиозная' 'свидетелей'

  2. 'г':7 'зе':8 'иегов':5 'местн':1 'организац':3 'религиозн':2 'свидетел':4

It was a surprise for me :) So what is the overhead for values not prepared with to_tsvector(data)? Will PostgreSQL run to_tsvector at runtime, or what?


Solution

The difference between the string->tsvector cast and the function-constructor to_tsvector(text):

  1. The cast (::) assumes the input is already a tsvector that has been cast to text. It does not apply the language-specific stemming or generate positional information; it simply assumes it is given space-delimited text-search tokens (lexemes, optionally with positions) that have already been through that process.
  2. to_tsvector() takes string input and does all the requisite work to tokenize it, including normalizing and stemming.

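You should probably avoid the cast for everything but text representations of tsvector. PostgreSQL stores whatever the cast produced and does not run to_tsvector on it later, so the cost of the cast rows is not runtime overhead but missed matches: those rows hold unstemmed lexemes with no positions.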

Examples

Here we read the string 'test testing tested' as three tsvector lexemes without positions. Notice that the output is not stemmed.

test=# select 'test testing tested'::tsvector;
         tsvector          
---------------------------
 'test' 'tested' 'testing'
(1 row)

Here we "convert" a string of words to lexemes with positions by sending the string through the process specified by the default TEXT SEARCH CONFIGURATION (the default_text_search_config setting), in my case english.

test=# SELECT to_tsvector('test testing tested');
 to_tsvector  
--------------
 'test':1,2,3
(1 row)

Here we explicitly tell PostgreSQL to use the simple text search configuration.

test=# SELECT to_tsvector('simple', 'test testing tested');
           to_tsvector           
---------------------------------
 'test':1 'tested':3 'testing':2
(1 row)
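
To see why the difference matters when searching, here is a minimal sketch comparing the two kinds of value against the same query. It assumes the english and simple configurations that ship with PostgreSQL; the column aliases are only illustrative, and the phrase operator <-> needs PostgreSQL 9.6 or later.

```sql
-- The english configuration stems the query term 'testing' to 'test'.
-- The cast vector holds only the raw lexemes 'tested' and 'testing',
-- so it has nothing for 'test' to match.
SELECT 'testing tested'::tsvector
           @@ to_tsquery('english', 'testing') AS cast_match,        -- f
       to_tsvector('english', 'testing tested')
           @@ to_tsquery('english', 'testing') AS to_tsvector_match; -- t

-- Phrase search needs positions, which a bare list of lexemes lacks.
SELECT 'test testing'::tsvector
           @@ to_tsquery('simple', 'test <-> testing') AS cast_phrase,        -- f
       to_tsvector('simple', 'test testing')
           @@ to_tsquery('simple', 'test <-> testing') AS to_tsvector_phrase; -- t
```

In both queries the cast value is a perfectly valid tsvector; it just does not contain what the query is looking for.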

So what is the cast used for? You could use it for something like this, where the string is already in tsvector text form:

SELECT $$'test':1 'tested':3 'testing':2$$::tsvector @@ 'testing';

Here is another case demonstrating what it does: a tsvector survives a round trip through text (returns true).

SELECT to_tsvector(test) = to_tsvector(test)::text::tsvector
FROM ( VALUES
  ( 'This is my test. Hello from testing test tested' )
) AS t(test);

That can be useful if you're getting the tsvector over the wire as text, or serializing it as text, as pg_dump does.
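
For the mixed table in the question, one possible way to find and rebuild the rows that went through the cast is sketched below. Everything named here is an assumption: a hypothetical table docs whose original text is still available in a column data, and the russian configuration (the stemmed sample row in the question looks like its output). The test col = strip(col) relies on strip() removing positions and weights, so it should hold only for vectors that carry no positions.

```sql
-- Hypothetical schema: docs(data text, col tsvector). Adjust names to your table.

-- Count the rows whose vector carries no positions (likely filled via the cast).
SELECT count(*)
FROM   docs
WHERE  col = strip(col);

-- Rebuild those rows from the original text.
-- 'russian' is an assumption; use the configuration your queries use.
UPDATE docs
SET    col = to_tsvector('russian', data)
WHERE  col = strip(col);
```

Running the count first and the update inside a transaction makes it easy to check the affected rows before committing.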

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange