Assume I have 100 text documents and I want to cluster them.

The first step is to construct a 100×100 pairwise similarity matrix for the documents.
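
To make the matrix step concrete, here is a rough sketch of what I have in mind. The similarity function is only a placeholder: I am assuming Jaccard overlap of word sets for illustration, and the names `jaccard` and `similarity_matrix` are made up for this example. Which measure should go in its place is exactly what I am asking about.

```python
def jaccard(a: str, b: str) -> float:
    """Placeholder similarity (an assumption): Jaccard overlap of lowercase word sets."""
    words_a = set(a.lower().split())
    words_b = set(b.lower().split())
    if not words_a and not words_b:
        return 1.0  # treat two empty documents as identical
    return len(words_a & words_b) / len(words_a | words_b)


def similarity_matrix(docs, sim=jaccard):
    """Build a symmetric n x n matrix of pairwise similarities."""
    n = len(docs)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Compute each pair once and mirror it, since sim is symmetric.
            matrix[i][j] = matrix[j][i] = sim(docs[i], docs[j])
    return matrix
```

With 100 documents this would give a 100×100 list of lists, and any other symmetric similarity function could be passed in as `sim`.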

My question is:

What are common ways to measure the similarity between two documents?

Thanks,


Licensed under: CC-BY-SA with attribution