Question

I have the following PostgreSQL function which checks if p_text1 contains the word/phrase p_text2 within it:

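CREATE OR REPLACE FUNCTION public."Contains"(
    p_text1 character varying,
    p_text2 character varying)
RETURNS boolean
LANGUAGE 'plpgsql'
COST 100
IMMUTABLE
AS $BODY$
BEGIN
    -- PERFORM discards the result but sets FOUND when the WHERE clause yields a row.
    perform 1
    where
        -- 1) match as-is: optional prefix ending in space, dash or '(',
        --    optional suffix starting with space, dash or ')'
        ( p_text1 similar to '((% )|(%-)|(%\())?'||p_text2||'(( %)|(-%)|(\)%))?' ) or
        -- 2) match with dashes replaced by spaces in both arguments
        ( replace(p_text1,'-',' ') similar to '((% )|(%\())?'||replace(p_text2,'-',' ')||'(( %)|(\)%))?' ) or
        -- 3) match with dashes removed from both arguments
        ( replace(p_text1,'-','') similar to '((% )|(%\())?'||replace(p_text2,'-','')||'(( %)|(\)%))?' );

    return found;
END;
$BODY$;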

p_text2 is considered a word/phrase if it is preceded/followed by a dash, space, parenthesis or nothing.

Examples:

select public."Contains"('data mining' , 'mining') --> true

select public."Contains"('information retrieval (ir) system' , 'ir') --> true

select public."Contains"('semantic (information retrieval)' , 'semantic information') --> false

select public."Contains"('ontology-based queries' , 'ontology') --> true

select public."Contains"('ontology-based queries' , 'ontology based') --> true

The function will be called like this:

select * from my_table
where public."Contains"( text_column , some_text_variable) = true;

my_table contains about 15,000 rows.

I have read a lot of advice to avoid SIMILAR TO and replace it with a simple LIKE for performance reasons. I just don't know how to rewrite such a query with LIKE, and I don't know whether that would yield better performance. Any help is appreciated. Thanks in advance.


Solution

LIKE is much less flexible than SIMILAR TO, and in general there is no formulaic way to downgrade a SIMILAR TO pattern to LIKE. For your example, it would take about 34 (4*4 + 3*3 + 3*3) LIKE conditions ORed together. For instance, the first branch of your OR pairs 3 alternatives plus the empty possibility (4 in total) at the beginning with another 4 at the end, so that branch alone expands to 16.
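To make the 4 x 4 count concrete, here is a minimal sketch of the first branch only, with the 16 LIKE patterns generated by cross joining the four prefixes with the four suffixes instead of writing them out by hand. The literals stand in for p_text1 and p_text2 and are taken from the examples in the question; the aliases pre, suf and contains_word are made up for illustration. It assumes the search phrase contains no LIKE wildcards (% or _).

```sql
-- Sketch: first SIMILAR TO branch expanded into 4 x 4 = 16 LIKE patterns.
-- Prefixes: nothing, or anything ending in a space, a dash, or '('.
-- Suffixes: nothing, or anything starting with a space, a dash, or ')'.
SELECT EXISTS (
    SELECT 1
    FROM (VALUES (''), ('% '), ('%-'), ('%(')) AS pre(p)
    CROSS JOIN (VALUES (''), (' %'), ('-%'), (')%')) AS suf(s)
    -- 'ontology-based queries' stands in for p_text1, 'ontology' for p_text2
    WHERE 'ontology-based queries' LIKE pre.p || 'ontology' || suf.s
) AS contains_word;
```

The other two branches would each need the same treatment with 3 x 3 = 9 patterns, applied to the replace(...) versions of both arguments. Whether any of this is faster than the original SIMILAR TO is something only a measurement can tell.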

"I read many advises to avoid using SIMILAR TO and to replace it with simple LIKE for performance issues"

This advice only applies if there is a simple LIKE that does the same job. In general, though, tuning should be based on evidence, not on rumors. So, to start with, do you actually have a performance problem?
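One way to gather that evidence is to time the real query with EXPLAIN ANALYZE. This is only a sketch: my_table and text_column come from the question, and the literal 'ontology' is a placeholder for some_text_variable.

```sql
-- Runs the query for real and reports the plan, actual row counts and timings.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM my_table
WHERE public."Contains"(text_column, 'ontology');
```

If the reported execution time over the roughly 15,000 rows is already acceptable, there may be nothing to tune. If it is not, the same command can be used to compare any rewritten version against this baseline.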

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange