lucene.net/examine weight html tags

https://stackoverflow.com/questions/17567136

02-06-2022

Question

I've got this project where we are implementing Examine / Lucene.net, and I'm looking for some guidance from you guys.

As far as I have been able to find out from Google, if I want to boost the weight, I need to boost the weight on the field, right?

But could I get something like this: is it possible to give a boost to a term if the term is inside an h1 tag, or the title for that matter, when given the complete HTML of a site and doing a frequent-term search?

What I would like to do is make a service which gets an HTML document and, from that, is able to find which words the document is optimised for, depending on which terms are used in the text and whether they are in the important places, like a title tag or h2 tag and so forth.

Is this possible to achieve? It's so the editors can know, live, which search words their text is best found with.

Big thanks in advance.

Solution

I don't think it quite works that way. Yes, you can boost a field, but you cannot boost a term depending on its location in some markup, because you don't know that location at the time of the search.

I think what you could do is create an Umbraco event handler that fires when a page is published. This event could:

Utilise the GatheringNodeData event of an Index
Take the contents of the rich text editor-based field and, using regex or something like HtmlUtility, extract specific text based upon its markup location, e.g. H1, H2 and H3 text.
For each piece of text in a heading found, add it into a string variable
Add the whole string into the Lucene index as a new field, e.g. "Headings"
You can now boost the "Headings" field separately from the field containing the HTML.
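
The extraction and boosting steps above can be sketched outside of Umbraco. The following Python is only an illustration of the idea, not Examine or Lucene.net code. The field names "Headings" and "bodyText", the set of heading tags, and the helpers split_fields and weighted_term_counts are all assumptions made for this example. The scoring is a plain boosted term count, which is much simpler than Lucene's real scoring.

```python
from collections import Counter
from html.parser import HTMLParser
import re

# Assumption: which tags count as "important places" in the markup.
HEADING_TAGS = {"title", "h1", "h2", "h3"}


class HeadingExtractor(HTMLParser):
    """Collects text found inside heading tags separately from all text."""

    def __init__(self):
        super().__init__()
        self._open_headings = 0  # number of heading tags currently open
        self.heading_parts = []
        self.body_parts = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._open_headings += 1

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._open_headings > 0:
            self._open_headings -= 1

    def handle_data(self, data):
        # Every piece of text goes into the body field ...
        self.body_parts.append(data)
        # ... and heading text is additionally copied to the heading field.
        if self._open_headings > 0:
            self.heading_parts.append(data)


def split_fields(html):
    """Return a dict of hypothetical index fields built from one HTML string."""
    parser = HeadingExtractor()
    parser.feed(html)
    parser.close()
    return {
        "Headings": " ".join(" ".join(parser.heading_parts).split()),
        "bodyText": " ".join(" ".join(parser.body_parts).split()),
    }


def weighted_term_counts(fields, boosts):
    """Count terms per field, multiplying each occurrence by the field boost."""
    scores = Counter()
    for name, text in fields.items():
        boost = boosts.get(name, 1.0)  # fields without a boost count as 1.0
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            scores[term] += boost
    return scores
```

For the input `<h1>Lucene boost</h1><p>Boost a field in Lucene, not a term.</p>` and a boost of 3.0 on "Headings", the term "lucene" scores 5.0: 3.0 from the heading plus two unboosted occurrences in the body text. In a real index the same effect would come from boosting the "Headings" field at query time rather than from counting terms by hand.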

Licensed under: CC-BY-SA with attribution
