Document embedding vs locality sensitive hashing for document clustering

https://datascience.stackexchange.com/questions/60817

dimensionality-reduction
embeddings
similar-documents
natural-language-process

02-11-2019

Question

I would like to compare two methods for measuring the similarity between two documents: locality-sensitive hashing and document embedding. Both methods encode a document's information in a vector, which I would like to use to find similar documents in a very large corpus (potentially more than 100,000 documents). Has anybody compared these two methods, and what are the advantages of each?

Cheers in advance
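
For reference, a minimal Python sketch of the two approaches being compared, using only the standard library. Everything here is an illustrative assumption rather than a recommended implementation: the function names are made up, the LSH side uses MinHash over word shingles with banding, and the "embedding" side is a plain bag-of-words count vector standing in for a learned document embedding (such as Doc2Vec), which would need a trained model.

```python
import hashlib
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def shingles(text, k=2):
    """Set of k-word shingles (the set that MinHash approximates)."""
    toks = tokens(text)
    return {" ".join(toks[i:i + k]) for i in range(max(1, len(toks) - k + 1))}


def _hash(seed, item):
    """Deterministic 64-bit hash of (seed, item); one seed per hash function."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(text, num_perm=64, k=2):
    """MinHash signature: the minimum hash value per seeded hash function."""
    sh = shingles(text, k)
    return [min(_hash(seed, s) for s in sh) for seed in range(num_perm)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def candidate_pairs(signatures, bands=16):
    """LSH banding: documents sharing any identical band become candidates."""
    rows = len(next(iter(signatures.values()))) // bands
    buckets = {}
    for doc_id, sig in signatures.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets.setdefault(key, set()).add(doc_id)
    pairs = set()
    for ids in buckets.values():
        ordered = sorted(ids)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs


def bow_vector(text):
    """Stand-in for an embedding: sparse word-count vector (NOT a learned model)."""
    return Counter(tokens(text))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    docs = {
        "a": "the quick brown fox jumps over the lazy dog",
        "b": "the quick brown fox jumps over the lazy cat",
        "c": "stock markets fell sharply after the announcement",
    }
    sigs = {d: minhash_signature(t) for d, t in docs.items()}
    print("LSH candidate pairs:", candidate_pairs(sigs))
    print("MinHash a~b:", estimated_jaccard(sigs["a"], sigs["b"]))
    print("Cosine  a~b:", cosine(bow_vector(docs["a"]), bow_vector(docs["b"])))
```

The practical difference the sketch hints at: the LSH side only compares documents that land in a shared bucket, so it avoids scoring every pair in the corpus, but it measures surface overlap of shingles. A vector-based similarity such as cosine needs either all-pairs comparison or a separate nearest-neighbour index, and how much meaning it captures depends entirely on the embedding model used in place of the count vector here.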

No accepted solution

License: CC-BY-SA with attribution

Not affiliated with datascience.stackexchange