Question


I need to build that matrix, but I can't find a way to compute the normalized tf-idf for each cell. The normalization I want to perform is cosine normalization: divide each tf-idf value (computed using DefaultSimilarity) by sqrt(sumOfSquaredTfIdf in the column), i.e. multiply it by 1/sqrt(sumOfSquaredTfIdf).

Does anyone know a way to perform that?
Thanks in advance
Antonio


Solution

One way, not using Lucene, is described in Sujit Pal's blog. Alternatively, you can build a Lucene index that stores term vectors for each field, iterate over the terms to get each term's idf, then iterate over each term's documents to get its tf in every document.
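Once the raw term frequencies have been read out of the index, the normalization step itself could look roughly like the sketch below. It does not call Lucene at all: the class name `CosineTfIdf`, the method names, and the `counts[term][doc]` layout are assumptions made for illustration. The tf and idf formulas are the ones DefaultSimilarity uses (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))), and the cell weight is taken to be their plain product, which may differ from how you want to combine them.

```java
// Hypothetical helper, not part of Lucene: builds a cosine-normalized
// tf-idf matrix from raw counts, where counts[t][d] is the frequency
// of term t in document d (rows = terms, columns = documents).
final class CosineTfIdf {

    // Same formula as Lucene's DefaultSimilarity.tf(): sqrt(freq).
    static double tf(int freq) {
        return Math.sqrt(freq);
    }

    // Same formula as Lucene's DefaultSimilarity.idf():
    // 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    static double[][] normalizedTfIdf(int[][] counts) {
        int numTerms = counts.length;
        int numDocs = numTerms == 0 ? 0 : counts[0].length;
        double[][] w = new double[numTerms][numDocs];

        // Step 1: unnormalized tf-idf for every cell.
        for (int t = 0; t < numTerms; t++) {
            int docFreq = 0;
            for (int d = 0; d < numDocs; d++) {
                if (counts[t][d] > 0) {
                    docFreq++;
                }
            }
            double termIdf = idf(docFreq, numDocs);
            for (int d = 0; d < numDocs; d++) {
                w[t][d] = tf(counts[t][d]) * termIdf;
            }
        }

        // Step 2: cosine normalization, one document (column) at a time:
        // divide each cell by sqrt(sum of squared tf-idf in that column).
        for (int d = 0; d < numDocs; d++) {
            double sumSq = 0.0;
            for (int t = 0; t < numTerms; t++) {
                sumSq += w[t][d] * w[t][d];
            }
            double norm = Math.sqrt(sumSq);
            if (norm > 0.0) { // leave empty documents as all zeros
                for (int t = 0; t < numTerms; t++) {
                    w[t][d] /= norm;
                }
            }
        }
        return w;
    }
}
```

For example, `CosineTfIdf.normalizedTfIdf(new int[][] {{2, 0, 1}, {0, 3, 1}, {1, 1, 0}})` returns a 3x3 matrix in which every column has unit Euclidean length, so the dot product of two columns is the cosine similarity of the corresponding documents.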

Licensed under: CC-BY-SA with attribution