Document tagging

https://stackoverflow.com/questions/13448500

30-11-2021

Question

I have a very large Solr index. I want to tag every document with the terms that best represent it, like this. Do clustering results of this type also count as document tagging?

Which approach is better: index-time document tagging, or query-time document tagging as done by Carrot2?

Solution

Tagging at query time has the obvious drawback of making every query more expensive.

However, clustering results computed at query time are supposedly better, because by then more information has been seen and user feedback can be incorporated.

Note that, technically, this is probably closer to frequent pattern mining than to cluster analysis.

Maybe you should just try this variant of frequent pattern mining on your whole data set. You might not even need to store which documents received which tags: Solr should already be optimized to retrieve the matching documents again when needed.
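To make the frequent pattern mining idea concrete, here is a minimal sketch in Python. It is not what Carrot2 or Solr does internally. It simply counts, by brute force, the term sets that occur in at least `min_support` documents and treats each one as a candidate tag. The function names, the whitespace tokenizer, and the thresholds are all assumptions made for illustration; on a very large index you would want a real frequent itemset algorithm (Apriori, FP-growth) and proper text analysis instead.

```python
from collections import Counter
from itertools import combinations


def frequent_term_sets(docs, min_support=2, max_size=2):
    """Count term sets (up to max_size terms) by the number of documents
    containing them, and keep those found in at least min_support documents.

    Brute-force enumeration: fine for a demo, too slow for a large corpus.
    """
    counts = Counter()
    for text in docs:
        terms = sorted(set(text.lower().split()))  # naive tokenizer (assumption)
        for size in range(1, max_size + 1):
            for combo in combinations(terms, size):
                counts[frozenset(combo)] += 1
    return {terms: n for terms, n in counts.items() if n >= min_support}


def tag_documents(docs, frequent):
    """Tag each document with every frequent term set it fully contains."""
    tagged = []
    for text in docs:
        terms = set(text.lower().split())
        tagged.append(sorted(" ".join(sorted(s)) for s in frequent if s <= terms))
    return tagged


if __name__ == "__main__":
    docs = [
        "solr index tuning",
        "solr index clustering",
        "carrot2 clustering results",
    ]
    frequent = frequent_term_sets(docs, min_support=2, max_size=2)
    for text, tags in zip(docs, tag_documents(docs, frequent)):
        print(text, "->", tags)
```

Because each tag is just a set of terms, the documents carrying a given tag can be found again with an ordinary query for those terms, which is why storing the assignment may be unnecessary.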

OTHER TIPS

I understood from your question that you want to know how to implement something similar to Carrot2 faceting using Solr.

IMO you can add a multivalued field named tag to your documents (see this Stack Overflow Question for an example), holding the cluster names for each doc, and then build facets on that field as explained in the Solr wiki here and here.
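As a rough sketch of that approach, the Python below builds the two requests involved: a JSON body for Solr's update handler that sets the tag field on each document, and the query string for a facet request on that field. It only constructs the payloads and does not talk to a server. The helper names and document ids are hypothetical, and it assumes the schema already declares something like `<field name="tag" type="string" indexed="true" stored="true" multiValued="true"/>` and that your Solr setup supports atomic updates. If it does not, re-index the full documents with the tag values included.

```python
import json
from urllib.parse import urlencode


def tag_update_payload(doc_tags, field="tag"):
    """Build a JSON body of atomic updates that set a multivalued field.

    doc_tags maps a document id to its list of cluster names.
    The body would be POSTed to the core's /update handler.
    """
    updates = [
        {"id": doc_id, field: {"set": list(tags)}}
        for doc_id, tags in doc_tags.items()
    ]
    return json.dumps(updates)


def facet_query(field="tag", limit=20):
    """Build the query string for a facet-only request on the tag field.

    rows=0 skips the document list and returns only the facet counts.
    """
    params = {
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
    }
    return urlencode(params)


if __name__ == "__main__":
    # Hypothetical ids and cluster names, for illustration only.
    print(tag_update_payload({"doc1": ["search", "indexing"], "doc2": ["clustering"]}))
    print("/select?" + facet_query())
```

The facet response then lists each tag value with the number of documents carrying it, which gives a browsing experience similar to cluster labels without any clustering cost at query time.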

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow