I would like to compare two methods for measuring the similarity between two documents: locality-sensitive hashing (LSH) and document embeddings. Both methods encode a document's information in a vector, which I would like to use to find similar documents in a very large corpus (potentially more than 100,000 documents). Has anybody compared these two methods, and what are the advantages of each?
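To make the comparison concrete, here is a rough sketch of what I have in mind, using only the Python standard library. The LSH side uses MinHash over word shingles with banding, which is one common LSH scheme and only an assumption on my part. The embedding side is a hashed bag-of-words vector compared by cosine similarity, which is just a stand-in for a learned document embedding and not a real embedding model. All function names and parameters (`num_hashes`, `bands`, `dim`, `k`) are made up for illustration.

```python
import hashlib
import math
import re


def tokens(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(text, k=2):
    """Return the set of k-word shingles (whole text if it is shorter than k)."""
    toks = tokens(text)
    if len(toks) < k:
        return {" ".join(toks)}
    return {" ".join(toks[i:i + k]) for i in range(len(toks) - k + 1)}


def seeded_hash(value, seed):
    """Deterministic 64-bit hash of a string, varied by an integer seed."""
    digest = hashlib.blake2b(
        value.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text, num_hashes=64, k=2):
    """MinHash signature: the minimum hash of the shingle set per seed."""
    sh = shingles(text, k)
    return [min(seeded_hash(s, seed) for s in sh) for seed in range(num_hashes)]


def minhash_similarity(sig_a, sig_b):
    """Fraction of matching positions, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


def lsh_buckets(signature, bands=16):
    """Split the signature into bands; documents sharing any bucket key
    become candidate pairs, so not every pair has to be compared."""
    rows = len(signature) // bands
    return {(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)}


def hashed_bow_vector(text, dim=256):
    """Stand-in for an embedding: L2-normalised hashed bag-of-words counts."""
    vec = [0.0] * dim
    for tok in tokens(text):
        vec[seeded_hash(tok, 0) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(u, v):
    """Cosine similarity of two vectors that are already L2-normalised."""
    return sum(x * y for x, y in zip(u, v))


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox leaps over the lazy dog"
doc_c = "stock markets fell sharply after central bank announcement"

sig_a, sig_b, sig_c = (minhash_signature(d) for d in (doc_a, doc_b, doc_c))
vec_a, vec_b, vec_c = (hashed_bow_vector(d) for d in (doc_a, doc_b, doc_c))

print("MinHash a~b:", minhash_similarity(sig_a, sig_b))
print("MinHash a~c:", minhash_similarity(sig_a, sig_c))
print("Cosine  a~b:", cosine(vec_a, vec_b))
print("Cosine  a~c:", cosine(vec_a, vec_c))
print("a and b share an LSH bucket:", bool(lsh_buckets(sig_a) & lsh_buckets(sig_b)))
```

As far as I understand, the MinHash score approximates the Jaccard overlap of shingles, so it mostly captures near-duplicate wording, whereas a learned embedding would be expected to capture topical or semantic similarity. I am not sure how the two compare in practice at this corpus size.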

Cheers in advance

No accepted answer

Licensed under: CC-BY-SA with attribution