Question

I'm writing some very simple search functionality for a list of FAQ's. I'm splitting the search string on a variety of characters, including spaces. Then performing a select along the lines of

SELECT *
FROM "faq"
WHERE
    ((LOWER("Question") LIKE '%what%'
   OR LOWER("Question") LIKE '%is%'
   OR LOWER("Question") LIKE '%a%'
   OR LOWER("Question") LIKE '%duck%'))

I've had to edit this slightly as its generated by our data access layer but it should give you an idea of whats going on.

The problem is demonstrated well with the above query in that most questions are likely to have the words a or is in them, however I can not filter these out as acronyms may well be important for the searcher. What has been suggested is we order by the number of matching keywords. However I have been unable to find a way of doing this in SQL (we do not have time to create a simple search engine with an index of keywords etc). Does anyone know if there's a way of counting the number of LIKE matches in an SQL statement and ordering by that so that the questions with the most keywords appear at the top of the results?

Was it helpful?

Solution

I assume the list of matching keywords is being entered by the user and inserted into the query dynamically by the application, immediately prior to executing the query. If so, I suggest amending the query like so:

SELECT *
FROM "faq"
WHERE
    ((LOWER("Question") LIKE '%what%'
   OR LOWER("Question") LIKE '%is%'
   OR LOWER("Question") LIKE '%a%'
   OR LOWER("Question") LIKE '%duck%'))
order by
    case when LOWER("Question") LIKE '%what%' then 1 else 0 end +
    case when LOWER("Question") LIKE '%is%' then 1 else 0 end +
    case when LOWER("Question") LIKE '%a%' then 1 else 0 end +
    case when LOWER("Question") LIKE '%duck%' then 1 else 0 end
descending;

This would even enable you to "weight" the importance of each selection term, assuming the user (or an algorithm) could assign a weighting to each term.

One caveat: if your query is being constructed dynamically, are you aware of the risk of SQL Insertion attacks?

OTHER TIPS

You can write a function which counts the occurrences of one string in another like this:

CREATE OR REPLACE FUNCTION CountInString(text,text)
RETURNS integer AS $$
 SELECT(Length($1) - Length(REPLACE($1, $2, ''))) / Length($2) ;
$$ LANGUAGE SQL IMMUTABLE;

And use it in the select: select CountInString("Question",' what ') from "faq".

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top