Question

I want to get the semantic similarity of two words with the cosine similarity method, using TF-IDF. To do that, I first want to take the definitions of those words from Wikipedia or WordNet. After that, I want to preprocess the text and compute the TF-IDF. When I googled the problem, I found that computing TF-IDF requires a training set and a test set. In my case, which one is the training set and which one is the test set? And how can I calculate the cosine similarity from the computed result?


Solution

The "training" phase is simply computing the TF-IDF weights: a word's weight in a document depends on how often it occurs in that document relative to how many documents in the whole collection contain it. Once you have all the weights, each document has become a vector with one weight per vocabulary term (N terms in total).
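A minimal pure-Python sketch of that "training" step, assuming the two word definitions have already been fetched as plain strings. The glosses below are made up for illustration, and `tokenize`, `fit_idf` and `tfidf_vector` are hypothetical helper names. The IDF part uses the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1, which is what scikit-learn's `TfidfVectorizer` uses by default. With only two documents, the unsmoothed ln(n / df) would give every word shared by both definitions a weight of zero, so their cosine similarity would always come out as 0.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Minimal preprocessing (an assumption): lowercase and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())

def fit_idf(docs):
    """'Training' step: learn one IDF weight per vocabulary term from the corpus."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term at most once per document
    # Smoothed IDF, so terms shared by every document still get a non-zero weight.
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf_vector(doc, idf):
    """Turn one document into a sparse {term: weight} vector using the learned IDF."""
    counts = Counter(tokenize(doc))
    total = sum(counts.values())
    # Term frequency = count / document length; unseen terms have no IDF and are dropped.
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}

# Hypothetical glosses standing in for the definitions of two words.
definitions = [
    "a domesticated carnivorous mammal kept as a pet",
    "a small domesticated carnivorous mammal with soft fur",
]
idf = fit_idf(definitions)
vectors = [tfidf_vector(d, idf) for d in definitions]
```

Stopword removal or stemming could be added inside `tokenize`; the vocabulary learned by `fit_idf` defines the dimensions of the vectors.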

Now, given two documents i and j, you compute their similarity with the cosine function: the cosine similarity of two vectors is their dot product divided by the product of their magnitudes. Look here for more info.
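A short sketch of that formula, cos(u, v) = (u · v) / (‖u‖ ‖v‖), for sparse vectors stored as `{term: weight}` dicts. The function name `cosine_similarity` and the two example vectors are hypothetical, and returning 0.0 for an all-zero vector is an assumed convention, since the formula is undefined there.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as {term: weight} dicts."""
    # Dot product: only terms present in both vectors contribute.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for empty/zero vectors
    return dot / (norm_u * norm_v)

# Hypothetical TF-IDF vectors for documents i and j.
doc_i = {"mammal": 1.0, "pet": 2.0}
doc_j = {"mammal": 1.0, "fur": 2.0}
similarity = cosine_similarity(doc_i, doc_j)  # about 0.2: 1 / (sqrt(5) * sqrt(5))
```

Because TF-IDF weights are non-negative, the result lies between 0 (no shared terms) and 1 (same direction).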

Licensed under: CC-BY-SA with attribution