Question

I implemented a trigram search using the pg_search gem on Rails: https://github.com/Casecommons/pg_search

The problem is that the order of the returned results sometimes doesn't seem right according to the definition of trigram search given in the gem's documentation:

Trigram search works by counting how many three-letter substrings (or “trigrams”) match between the query and the text.
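
To make the quoted definition concrete, here is a minimal Ruby sketch of trigram matching. It approximates the rules described in PostgreSQL's pg_trgm documentation (lowercasing, splitting into alphanumeric words, padding each word with two leading spaces and one trailing space). The method names trigrams and similarity are illustrative helpers, not part of pg_search, and the ASCII-only word pattern is a simplifying assumption.

```ruby
require "set"

# Illustrative helper (not a pg_search API): collect the set of
# three-character windows for each padded, lowercased word.
def trigrams(text)
  text.downcase.scan(/[a-z0-9]+/).flat_map do |word|
    "  #{word} ".chars.each_cons(3).map(&:join)
  end.to_set
end

# Shared trigrams divided by the total number of distinct trigrams.
# Returning 0.0 for two empty inputs is an assumption of this sketch.
def similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = (ta | tb).size
  return 0.0 if union.zero?

  (ta & tb).size.to_f / union
end

puts similarity("1493 cambridge", "1493 Cambridge St").round(3)  # => 0.833
puts similarity("1493 cambridge", "5 Cambridgepark Dr").round(2) # => 0.36
```

Under these assumptions, "1493 Cambridge St" shares 15 of 18 distinct trigrams with the query, while "5 Cambridgepark Dr" shares only 9 of 25, so a ranking based purely on trigram similarity would put the exact street address first.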

My application receives string input from the user (e.g. "111 Streetname") and returns a list of addresses whose Address.full_string value approximately matches the input, using trigram search.

List of search examples

Trigram search: "1493 cambrid"

  • Results:
    • 100 Cambridgeside Pl
    • 100 Cambridgeside Pl
    • 150 Cambridgepark Dr
    • 1575 Cambridge St
    • 1573 Cambridge St
    • 1493 Cambridge St

Trigram search: "1493 cambr"

  • Result:
    • 1493 Cambridge St

Trigram search: "1493 cambri"

  • Results:
    • 1575 Cambridge St
    • 1573 Cambridge St
    • 1493 Cambridge St

Trigram search: "1493 cambridge"

  • Results:
    • 1493 Cambridge St
    • 5 Cambridgepark Dr
    • 7 Cambridgepark Dr
    • 100 Cambridgeside Pl
    • and many more

Question

Why isn't "1493 Cambridge St" always at the top of the results? Do I need to change the query of the trigram search, or is it just the way the algorithm works?

Query example

SELECT "addresses".*,
       (ts_rank((to_tsvector('simple', coalesce("addresses"."full_string"::text, ''))),
                (to_tsquery('simple', ''' ' || '1493' || ' ''') && to_tsquery('simple', ''' ' || 'cambridge' || ' ''')),
                0)) AS pg_search_rank
FROM "addresses"
WHERE (((coalesce("addresses"."full_string"::text, '')) % '1493 cambridge'))
ORDER BY pg_search_rank DESC, "addresses"."id" ASC

Solution

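While you quote the manual on trigram search, your query actually ranks the results with the ts_rank() function from full-text search. The trigram % operator only filters rows in the WHERE clause; the ORDER BY uses pg_search_rank, which is computed by ts_rank().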

If you order the results by

(addresses.full_string <-> '1493 cambridge')

... you get what you asked for.
<-> is the trigram "distance" operator (one minus the similarity() value), so the closest matches sort first.

You may also want to keep the % ("similarity") operator in the WHERE clause, as your query already does. Ideally, you would have a GiST index with gist_trgm_ops on the column to support this.
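
The effect of ordering by trigram distance can be simulated in plain Ruby, without a database. The sketch below is only an approximation of what <-> computes in pg_trgm (one minus the trigram similarity); trigram_set, trigram_distance, and order_by_distance are hypothetical helper names, and the addresses are taken from the "1493 cambridge" example in the question.

```ruby
require "set"

# Hypothetical helper: approximate pg_trgm trigram extraction
# (lowercase, alphanumeric words, two leading spaces, one trailing space).
def trigram_set(text)
  text.downcase.scan(/[a-z0-9]+/)
      .flat_map { |word| "  #{word} ".chars.each_cons(3).map(&:join) }
      .to_set
end

# Approximates the <-> operator: one minus the trigram similarity.
# Treating two empty inputs as distance 1.0 is an assumption of this sketch.
def trigram_distance(a, b)
  ta = trigram_set(a)
  tb = trigram_set(b)
  union = (ta | tb).size
  return 1.0 if union.zero?

  1.0 - (ta & tb).size.to_f / union
end

# Addresses from the "1493 cambridge" search in the question.
ADDRESSES = [
  "5 Cambridgepark Dr",
  "7 Cambridgepark Dr",
  "100 Cambridgeside Pl",
  "1493 Cambridge St"
].freeze

# Smallest distance first; ties are broken on the address text so the
# result is deterministic.
def order_by_distance(addresses, query)
  addresses.sort_by { |address| [trigram_distance(address, query), address] }
end

ranked = order_by_distance(ADDRESSES, "1493 cambridge")
puts ranked.first # => 1493 Cambridge St
```

In this simulation the exact address has the smallest distance and therefore sorts first, which is the behavior the question expected from a trigram-based ranking.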

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow