Question

I'm using Postgres 9.3 with full-text search and I'm running a query like:

select * from jobs where fts @@ plainto_tsquery('pg_catalog.english','search term');

I'm getting the proper results; however, I'd like to be able to show the portion of each result that matches the search terms. The fts column is just a to_tsvector() of the description column. What I'd like to do is show a short excerpt of the description, with the terms highlighted. Any ideas on how I'd achieve this?


Solution

This is what the ts_headline() function is intended for.

It is designed to return excerpts of the "original" text (the text you normalized into the tsvector) with the matching terms highlighted. The most basic usage would be this:

SELECT ts_headline(description, keywords) as result
    FROM jobs, plainto_tsquery('pg_catalog.english','search term') as keywords
    WHERE fts @@ keywords;

Note that "description" in this query is my guess at the name of your column that holds the original text, and "fts" is my guess at the name of the column that holds the normalized text.

This query will return a result set containing an excerpt of your original text with the matching tokens highlighted by HTML <b> tags.

The function also accepts an optional third argument: a comma-separated string of option=value pairs that alter its behavior. You could, for example, change the markers placed around each match by setting the StartSel and StopSel values:

SELECT ts_headline(description, keywords, 'StartSel=<em>,StopSel=</em>') as result
    FROM jobs, plainto_tsquery('pg_catalog.english','search term') as keywords
    WHERE fts @@ keywords;

Now the <b> tags will become <em> tags. The markers do not have to be HTML tags, either; you can pass in (almost) any string.
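The "(almost)" matters mostly for values containing spaces or commas: inside the options string such a value has to be wrapped in double quotes. A small self-contained check, using a made-up sample sentence instead of the jobs table and an arbitrary <mark class=hit> marker (both purely illustrative), might look like this:

```sql
-- Made-up sample text; the marker strings are arbitrary choices.
-- StartSel contains a space, so it is double-quoted inside the options string.
SELECT ts_headline('pg_catalog.english',
                   'Our search team needs a search term expert.',
                   plainto_tsquery('pg_catalog.english', 'search term'),
                   'StartSel="<mark class=hit>", StopSel=</mark>') AS result;
```

Without the double quotes, the space inside the StartSel value would break the parsing of the options string.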

Another commonly used setting is MaxFragments, which caps the number of excerpts returned. Combine it with MaxWords and MinWords to control how many words each excerpt contains:

SELECT ts_headline(description, keywords, 'MaxFragments=4,MaxWords=5,MinWords=2') as result
    FROM jobs, plainto_tsquery('pg_catalog.english','search term') as keywords
    WHERE fts @@ keywords; 

The query above returns at most four excerpts, each limited to between two and five words.

If you simply want to show the whole document with the matches highlighted, use the HighlightAll option, which overrides any fragment settings:

SELECT ts_headline(description, keywords, 'HighlightAll=true') as result
    FROM jobs, plainto_tsquery('pg_catalog.english','search term') as keywords
    WHERE fts @@ keywords; 

Note: beware that ts_headline() can become a performance bottleneck. It works on the original text, not on the tsvector, so for each record you wish to highlight the database has to fetch the whole text, parse it, and insert the desired start and end markers.

Use the function with care and apply it only to a small portion of your complete result set (for example, the top five or ten records).
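One way to follow that advice is to rank and limit the matches in a subquery and call ts_headline() only in the outer query, so it runs on at most ten rows. The sketch below does this against a hypothetical, minimal stand-in for the jobs table (the sample rows, the id column, and the limit of 10 are illustrative assumptions); the PostgreSQL manual's section on highlighting results uses the same subquery shape. It also passes the text search configuration to ts_headline() explicitly, so the description is parsed with the same configuration as the query.

```sql
-- Hypothetical stand-in for the asker's table (illustration only).
CREATE TEMP TABLE jobs (
    id          serial PRIMARY KEY,
    description text NOT NULL,
    fts         tsvector
);

INSERT INTO jobs (description) VALUES
    ('Senior engineer to build our search platform and tune every search term.'),
    ('Library assistant to help students search the catalogue during term time.'),
    ('Warehouse operative, day shifts.');

-- Populate the normalized column from the original text.
UPDATE jobs SET fts = to_tsvector('pg_catalog.english', description);

-- Rank and limit inside the subquery; ts_headline() only sees the surviving rows.
SELECT id,
       ts_headline('pg_catalog.english', description, keywords) AS result
FROM (
    SELECT id, description, keywords, ts_rank(fts, keywords) AS rank
    FROM jobs, plainto_tsquery('pg_catalog.english', 'search term') AS keywords
    WHERE fts @@ keywords
    ORDER BY rank DESC
    LIMIT 10
) AS top_matches
ORDER BY rank DESC;
```

Because ORDER BY and LIMIT are applied inside the subquery, rows outside the top ten are never handed to ts_headline(); the outer ORDER BY keeps the final output in rank order.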

Licensed under: CC-BY-SA with attribution