How to replace (SIMILAR TO + regular expression) with LIKE or ~ in PostgreSQL?
Question
I have the following PostgreSQL function, which checks whether p_text1 contains the word/phrase p_text2:
CREATE OR REPLACE FUNCTION public."Contains"(
    p_text1 character varying,
    p_text2 character varying)
RETURNS boolean
LANGUAGE 'plpgsql'
COST 100
IMMUTABLE
AS $BODY$
BEGIN
    perform 1
    where ( p_text1 similar to '((% )|(%-)|(%\())?'||p_text2||'(( %)|(-%)|(\)%))?' ) or
          ( replace(p_text1,'-',' ') similar to '((% )|(%\())?'||replace(p_text2,'-',' ')||'(( %)|(\)%))?' ) or
          ( replace(p_text1,'-','') similar to '((% )|(%\())?'||replace(p_text2,'-','')||'(( %)|(\)%))?' );
    return found;
END;
$BODY$;
p_text2 is considered a word/phrase if it is preceded/followed by a dash, a space, a parenthesis, or nothing.
Examples:
select public."Contains"('data mining' , 'mining') --> true
select public."Contains"('information retrieval (ir) system' , 'ir') --> true
select public."Contains"('semantic (information retrieval)' , 'semantic information') --> false
select public."Contains"('ontology-based queries' , 'ontology') --> true
select public."Contains"('ontology-based queries' , 'ontology based') --> true
The function will be called like this:
select * from my_table
where public."Contains"( text_column , some_text_variable) = true;
The table my_table contains about 15,000 rows.
I have read a lot of advice to avoid using SIMILAR TO and to replace it with a simple LIKE for performance reasons. I just don't know how to rewrite such a query with LIKE, and I don't know whether that would yield better performance. Any help is appreciated. Thanks in advance.
Solution
LIKE is much less flexible than SIMILAR TO, and in general there is no formulaic way to downgrade from SIMILAR TO to LIKE. For your example, it would take about 34 (4*4 + 3*3 + 3*3) LIKE conditions ORed together. The first branch of your OR, for instance, pairs 3 alternatives plus the empty possibility (4 in total) at the beginning with another 4 at the end, which expands to 16 patterns.
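To make the count concrete, the 16 LIKE patterns for the first branch can be generated by pairing each of the four prefixes with each of the four suffixes. This is only an illustrative sketch: the literal 'mining' stands in for p_text2, and the names p, s, pre and suf are made up for the example. The parentheses need no backslash here because they are not special characters in LIKE.

```sql
-- Enumerate the LIKE patterns that correspond to the first SIMILAR TO branch.
-- 'mining' is a placeholder for p_text2; p, s, pre, suf are illustrative names.
SELECT pre || 'mining' || suf AS like_pattern
FROM (VALUES (''), ('% '), ('%-'), ('%(')) AS p(pre)
CROSS JOIN (VALUES (''), (' %'), ('-%'), (')%')) AS s(suf);
-- 4 prefixes x 4 suffixes = 16 patterns for this branch alone
```

The other two branches have three prefixes and three suffixes each, which accounts for the remaining 9 + 9 patterns.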
"I have read a lot of advice to avoid using SIMILAR TO and to replace it with a simple LIKE for performance reasons"
This advice only applies if there is a simple LIKE that does the same job. More generally, tuning should be based on evidence, not on rumors. So, to start with: do you actually have a performance problem?
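One way to gather that evidence is to time the query as it is actually run. The sketch below assumes the table and column names from the question, and 'ontology' is just a placeholder for some_text_variable. The second statement is a hypothetical candidate using the POSIX regex operator ~; it covers only the first branch of the function (no dash replacement) and assumes the search text contains no regex metacharacters, so treat it as something to measure, not as a drop-in replacement.

```sql
-- 1) Time the existing function (ANALYZE really executes the query).
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM my_table
WHERE public."Contains"(text_column, 'ontology');

-- 2) Candidate for comparison: first branch only, written with ~.
--    (^|[ (-]) = start of string, or a space, '(' or '-' before the phrase
--    ([ )-]|$) = a space, ')' or '-' after the phrase, or end of string
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM my_table
WHERE text_column ~ ('(^|[ (-])' || 'ontology' || '([ )-]|$)');
```

If the reported execution time for the first statement is already acceptable on 15,000 rows, there may be nothing to fix.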