How to measure the similarity between two text documents?

Question

Assume I have 100 text documents and I want to cluster them.

The first step is to construct a 100×100 pairwise similarity matrix for the documents.
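
To make the setup concrete, here is a minimal sketch of the matrix I have in mind. It assumes cosine similarity over raw word counts, which is only one possible measure and not necessarily the best one. The function names (`term_counts`, `cosine_similarity`, `similarity_matrix`) and the sample documents are hypothetical, and the tokenizer is deliberately crude.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count word tokens (a crude tokenizer, assumed for illustration)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_matrix(documents):
    """Build the symmetric n x n matrix of pairwise similarities."""
    vectors = [term_counts(d) for d in documents]
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # compute the upper triangle, mirror the rest
            s = cosine_similarity(vectors[i], vectors[j])
            matrix[i][j] = s
            matrix[j][i] = s
    return matrix


# Hypothetical sample documents; the real input would be the 100 texts.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply",
]
matrix = similarity_matrix(docs)
```

With 100 documents, `similarity_matrix` would return the 100×100 matrix, and a distance such as `1 - similarity` could then be passed to a clustering algorithm.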

My question is:

What are common ways to measure the similarity between two documents?

Thanks,

No correct solution

Licensed under: CC-BY-SA with attribution