Document embedding vs locality sensitive hashing for document clustering

https://datascience.stackexchange.com/questions/60817

dimensionality-reduction
embeddings
similar-documents
natural-language-process

02-11-2019

Question

I would like to compare two methods for measuring the similarity between two documents: locality-sensitive hashing (LSH) and document embedding. Both methods encode a document's information in a vector, which I would like to use to find similar documents in a very large corpus (potentially more than 100,000 documents). Has anybody compared these two methods, and what are the advantages of each?
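To make the comparison concrete, here is a minimal sketch of the two kinds of document vectors, using only the Python standard library. It rests on several assumptions that are not part of the question: MinHash over word shingles stands in for LSH, a plain bag-of-words count vector with cosine similarity stands in for a learned document embedding, and all function names and parameters (`num_hashes`, `k`) are hypothetical choices for illustration.

```python
import hashlib
import math
from collections import Counter


def shingles(text, k=3):
    """Return the set of word k-grams (shingles) of a document."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _hash(token, seed):
    """Deterministic 64-bit hash of a token, varied by an integer seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(text, num_hashes=128, k=3):
    """MinHash signature: for each seeded hash, keep the minimum over all shingles."""
    doc_shingles = shingles(text, k)
    return [min(_hash(s, seed) for s in doc_shingles) for seed in range(num_hashes)]


def minhash_similarity(sig_a, sig_b):
    """Fraction of matching signature slots; this estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def bow_vector(text):
    """Stand-in for an embedding (an assumption): sparse word-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc_a = "the quick brown fox jumps over the lazy dog"
    doc_b = "the quick brown fox jumps over the sleepy cat"
    print("MinHash estimate:", minhash_similarity(minhash_signature(doc_a),
                                                  minhash_signature(doc_b)))
    print("Cosine similarity:", cosine_similarity(bow_vector(doc_a),
                                                  bow_vector(doc_b)))
```

In this sketch the MinHash signature approximates the overlap of shingle sets (surface-level, near-duplicate similarity), while the vector-plus-cosine route is where a learned embedding would plug in to capture similarity of meaning. For a large corpus, either kind of vector would usually be paired with an index (for example, banding the signatures into buckets, or an approximate nearest-neighbour search) rather than compared pairwise.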

Cheers in advance

No correct solution

License: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange