Domanda

Trying to use carrot2 for doing to resultset clustering. I have couple of questions with respect to this.

a) Can we cluster the documents in Solr/Lucene based on the specific fields in solr? like cluster them based name, person name and geo-distance location (lat, long) with specific field weights?

b) My use case for clustering is not really online, it is more of a batch use case, given that, do we still have this restriction of 1K max no. of results?

È stato utile?

Soluzione

Carrot2 performs clustering based only on the natural text of your documents. Person names would probably be too short for meaningful clustering; Carrot2 is not suitable for geo-distance and other numerical data.

The 1k restriction / recommendation is based on the design goal of Carrot2: to cluster small collections of texts (such as search results) fast enough so that the process can be done on-line. Carrot2 does well for collections around 1k documents, but will not scale very well beyond several thousands of documents.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top