This is exactly what the ts_headline() function is for.
It returns excerpts of the "original" (un-normalized) text with the matching terms highlighted. The most basic usage looks like this:
SELECT ts_headline(description, keywords) as result
FROM jobs, plainto_tsquery('pg_catalog.english','search term') as keywords
WHERE fts @@ keywords;
Note that "description" in this query is my guess at the name of the column that holds your original text, and "fts" is my guess at the column that holds the normalized text.
This query returns a result set containing an excerpt of your original text, with the matching tokens wrapped in HTML <b> tags.
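If the query is issued from application code, the search term is best passed as a bound parameter rather than pasted into the SQL string. A minimal sketch in Python, assuming a DB-API driver that uses %s placeholders (psycopg, for example); the helper name basic_headline_query is hypothetical, and the table and column names (jobs, description, fts) are the guesses from above:

```python
# Hypothetical helper: the basic headline query with the search term passed as
# a bound parameter. Assumes a DB-API driver with %s placeholders (e.g. psycopg)
# and the guessed names jobs / description / fts.

BASIC_HEADLINE_SQL = (
    "SELECT ts_headline(description, keywords) AS result "
    "FROM jobs, plainto_tsquery('pg_catalog.english', %s) AS keywords "
    "WHERE fts @@ keywords"
)


def basic_headline_query(search_term: str) -> tuple[str, tuple[str]]:
    """Return (sql, params) ready for cursor.execute(sql, params)."""
    if not search_term.strip():
        raise ValueError("search term must not be empty")
    return BASIC_HEADLINE_SQL, (search_term,)


if __name__ == "__main__":
    sql, params = basic_headline_query("search term")
    print(sql)
    print(params)
    # With a live connection (not run here):
    # with conn.cursor() as cur:
    #     cur.execute(sql, params)
    #     for (result,) in cur:
    #         print(result)
```

Keeping the SQL text constant and varying only the parameters also avoids quoting problems when the search term itself contains apostrophes.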
You can pass an optional, comma-separated string of options to this function to change its behavior. For example, you can change the tags wrapped around each match by setting the StartSel and StopSel values:
SELECT ts_headline(description, keywords, 'StartSel=<em>,StopSel=</em>') as result
FROM jobs, plainto_tsquery('pg_catalog.english','search term') as keywords
WHERE fts @@ keywords;
Now the <b> tags become <em> tags. They do not actually have to be HTML tags; you can pass in (almost) any string, although a value that contains spaces or commas must be wrapped in double quotes.
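When the options string is assembled in application code, the quoting rule is easy to forget. A small sketch in Python of a hypothetical helper, build_headline_options, that joins Key=Value pairs with commas, double-quotes values containing spaces or commas, and writes booleans as true/false. As a simplification (an assumption, not PostgreSQL behavior), it rejects values that contain a double quote instead of trying to escape them:

```python
# Hypothetical helper for building the ts_headline() options string:
# comma-separated Key=Value pairs, with values that contain spaces or commas
# wrapped in double quotes.

def build_headline_options(**options: object) -> str:
    parts = []
    for key, value in options.items():  # keyword order is preserved
        if isinstance(value, bool):
            text = "true" if value else "false"  # e.g. HighlightAll=true
        else:
            text = str(value)
        if '"' in text:
            # Simplification: embedded double quotes are rejected, not escaped.
            raise ValueError(f"value for {key} contains a double quote")
        if " " in text or "," in text:
            text = f'"{text}"'
        parts.append(f"{key}={text}")
    return ",".join(parts)


if __name__ == "__main__":
    print(build_headline_options(StartSel="<em>", StopSel="</em>"))
    # StartSel=<em>,StopSel=</em>
    print(build_headline_options(StartSel="<mark class=hit>", StopSel="</mark>"))
    # StartSel="<mark class=hit>",StopSel=</mark>
```

The resulting string is then passed as the third argument of ts_headline(), ideally as a bound parameter like the search term.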
Another popular option is MaxFragments, which sets the maximum number of excerpts to return. It is usually combined with the MaxWords and MinWords values, which control how long each excerpt can be:
SELECT ts_headline(description, keywords, 'MaxFragments=4,MaxWords=5,MinWords=2') as result
FROM jobs, plainto_tsquery('pg_catalog.english','search term') as keywords
WHERE fts @@ keywords;
The query above now returns at most four excerpts, each kept between two and five words long.
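Because bad combinations of these numbers make the query fail, it can help to check them before sending the query. A sketch in Python of a hypothetical fragment_options helper; the rules it checks (MaxFragments not negative, MinWords positive and smaller than MaxWords) are meant to mirror what PostgreSQL enforces, but treat them as an assumption and let the server have the final say. A MaxFragments value of 0, the default, selects the non-fragment headline method instead:

```python
# Hypothetical client-side check for the fragment settings of ts_headline().
# The rules are assumed to mirror PostgreSQL's own validation; the server
# remains the authority.

def fragment_options(max_fragments: int, max_words: int, min_words: int) -> str:
    if max_fragments < 0:
        raise ValueError("MaxFragments must be >= 0")
    if min_words <= 0:
        raise ValueError("MinWords must be positive")
    if min_words >= max_words:
        raise ValueError("MinWords must be less than MaxWords")
    return f"MaxFragments={max_fragments},MaxWords={max_words},MinWords={min_words}"


if __name__ == "__main__":
    print(fragment_options(4, 5, 2))  # MaxFragments=4,MaxWords=5,MinWords=2
```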
If you simply want to show the whole document with the matches highlighted, use the HighlightAll option, which overrides the fragment settings:
SELECT ts_headline(description, keywords, 'HighlightAll=true') as result
FROM jobs, plainto_tsquery('pg_catalog.english','search term') as keywords
WHERE fts @@ keywords;
Note: be careful with ts_headline(), because it can easily become a performance bottleneck. The @@ match can work on the normalized fts column, but ts_headline() works on the original text: for every record you highlight, the database has to fetch the whole text, parse it and insert the start and stop markers.
Use the function sparingly and only on a small part of your complete result set, such as the top five or ten records.
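One way to follow this advice is to rank and limit the matches in a subquery and call ts_headline() only in the outer query, so it processes just those few rows; the PostgreSQL documentation shows the same subquery pattern. A sketch in Python that builds that query, assuming a DB-API driver with %s placeholders (psycopg, for example), ranking with ts_rank(), and the guessed names jobs, description and fts; the helper name top_n_headline_query is hypothetical:

```python
# Hypothetical query builder for "rank first, highlight last": the subquery
# finds and ranks matches and keeps only the top N rows; ts_headline() runs
# in the outer query on those rows only.
# Assumes %s placeholders (e.g. psycopg) and the guessed jobs/description/fts.

TOP_N_HEADLINE_SQL = """
    SELECT ts_headline(description, keywords) AS result, rank
    FROM (
        SELECT description, keywords, ts_rank(fts, keywords) AS rank
        FROM jobs, plainto_tsquery('pg_catalog.english', %s) AS keywords
        WHERE fts @@ keywords
        ORDER BY rank DESC
        LIMIT %s
    ) AS top_matches
    ORDER BY rank DESC
"""


def top_n_headline_query(search_term: str, limit: int = 10) -> tuple[str, tuple]:
    """Return (sql, params) for highlighting only the best `limit` matches."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return TOP_N_HEADLINE_SQL, (search_term, limit)


if __name__ == "__main__":
    sql, params = top_n_headline_query("search term", limit=5)
    print(params)  # ('search term', 5)
```

The outer ORDER BY is repeated because the order of a subquery's rows is not guaranteed to survive into the outer result.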