Document embedding vs locality sensitive hashing for document clustering

https://datascience.stackexchange.com/questions/60817

dimensionality-reduction
embeddings
similar-documents
natural-language-process

02-11-2019

Question

I would like to compare two methods for measuring the similarity between two documents: locality-sensitive hashing (LSH) and document embedding. Both methods encode the information in a document as a vector, which I would like to use to find similar documents in a very large corpus (potentially more than 100,000 documents). Has anybody compared these two methods, and what are the advantages of each?
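
To make the comparison concrete, here is a rough sketch of the two kinds of vectors in question, using only the Python standard library. All function names (`minhash_signature`, `lsh_buckets`, `toy_embedding`, and so on) are made up for illustration. MinHash with banding is assumed as the LSH scheme, and the hashed bag-of-words vector is only a stand-in for a real learned document embedding.

```python
import hashlib
import math
from collections import Counter


def shingles(text, k=3):
    """Return the set of k-word shingles of a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def seeded_hash(seed, item):
    # Seeded 64-bit hash built on BLAKE2b; each seed acts as a different hash function.
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8,
                             salt=seed.to_bytes(8, "little")).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(text, num_hashes=64):
    """For each seeded hash function, keep the minimum value over all shingles."""
    items = shingles(text)
    return [min(seeded_hash(seed, s) for s in items) for seed in range(num_hashes)]


def minhash_similarity(sig_a, sig_b):
    """Fraction of matching positions, which estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_buckets(signature, bands=16):
    """Split the signature into bands; documents sharing any band key are candidates."""
    rows = len(signature) // bands
    return {(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)}


def toy_embedding(text, dim=256):
    """Hashed bag-of-words vector (assumption: placeholder for a learned embedding)."""
    vec = [0.0] * dim
    for word, count in Counter(text.lower().split()).items():
        vec[seeded_hash(0, word) % dim] += count
    return vec


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

With LSH, only documents that land in at least one common bucket would be compared, whereas with embeddings every pair (or an approximate nearest-neighbour index) has to be scored by cosine similarity.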

Cheers in advance

No correct solution

Licensed under: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange