How to retrieve the difference of two tsvectors in postgres?

https://stackoverflow.com/questions/23240166

08-07-2023

Question

I have two varchar fields, and I would like to get an array of the words that are present in one of them but not in the other, e.g.:

old_text := to_tsvector('The quick brown fox jumps over the lazy dog')
new_text := to_tsvector('The slow brown fox jumps over the quick dog at Friday')
-> new words: ARRAY['slow', 'at', 'Friday'] (the order of the words doesn't matter)

I tried fiddling around with tsvectors, but had no luck. Is there any other functionality in Postgres that supports something like this?

Solution

If you really want to involve text search, have a look at ts_parse().

SELECT token
FROM ts_parse('default', 'The slow brown fox jumps over the quick dog at Friday')
WHERE tokid != 12 -- skip blank (whitespace) tokens
EXCEPT
SELECT token
FROM ts_parse('default', 'The quick brown fox jumps over the lazy dog')
WHERE tokid != 12; -- skip blank (whitespace) tokens

-- will give you

"token"
--------
'slow'
'at'
'Friday'
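
If the comparison should happen on the tsvectors themselves, as the question title suggests, one possible sketch is to expand each tsvector into its lexemes and take the set difference. This assumes PostgreSQL 9.6 or later (for tsvector_to_array()) and uses the 'english' configuration purely as an illustration. Note that to_tsvector() normalizes its input, so stop words are likely to be dropped and the remaining words stemmed and lowercased; the result is a set of lexemes, not the original words.

```sql
-- Sketch: lexemes present in the new text's tsvector but not in the old one.
-- Assumes PostgreSQL 9.6+; the 'english' configuration is an illustrative choice.
SELECT array_agg(lexeme) AS new_lexemes
FROM (
    SELECT unnest(tsvector_to_array(
        to_tsvector('english', 'The slow brown fox jumps over the quick dog at Friday')
    )) AS lexeme
    EXCEPT
    SELECT unnest(tsvector_to_array(
        to_tsvector('english', 'The quick brown fox jumps over the lazy dog')
    ))
) AS diff;
```

If the original word forms matter, the ts_parse() query above is probably the better fit, since it returns tokens as they appear in the text.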

Or, you can use regular expressions for that:

SELECT *
FROM regexp_split_to_table('The slow brown fox jumps over the quick dog at Friday', '\s+')
EXCEPT
SELECT *
FROM regexp_split_to_table('The quick brown fox jumps over the lazy dog', '\s+');

At the end, use array_agg() to accumulate the results into an array, if necessary.
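
As a minimal sketch of that last step, the ts_parse() variant can be wrapped in a subquery and aggregated. The alias names diff and new_words are arbitrary choices for this example, not part of the original answer.

```sql
-- Collect the set difference into a single text[] value.
-- EXCEPT does not guarantee any ordering, so the element order is unspecified.
SELECT array_agg(token) AS new_words
FROM (
    SELECT token
    FROM ts_parse('default', 'The slow brown fox jumps over the quick dog at Friday')
    WHERE tokid != 12 -- skip blank (whitespace) tokens
    EXCEPT
    SELECT token
    FROM ts_parse('default', 'The quick brown fox jumps over the lazy dog')
    WHERE tokid != 12
) AS diff;
```

For these two sentences the array should contain slow, at and Friday in some order. The same wrapper works for the regexp_split_to_table() variant once its output column is given an alias.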

Licensed under: CC-BY-SA with attribution
