Are there any data-mining/text-mining/machine-learning techniques to find the most appropriate tags for a given document? [closed]

StackOverflow https://stackoverflow.com/questions/19615318

Question

Say I have a huge set of documents stored in a relational table with the following columns:

    ID (unique identifier)
    Title (255 characters)
    Description (5000 characters)
    Category (predefined metadata)
    Additional Notes (1000 characters)

I would like to add one or more tags to each row in the document table. Here, a tag is a word or a group of words that tells readers what a document is about.

Are there any data-mining/text-mining/machine-learning techniques or approaches that would help me find the most appropriate tags for a given document without human intervention?

Solution

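One simple approach: for a given document, compute the TF-IDF score of every word and choose the top-N words as tags (or keep only the candidates whose score exceeds some threshold). TF-IDF is high for words that occur often in the document but rarely in the rest of the collection, which is roughly what a good tag looks like. In your case it is also reasonable to apply empirically chosen boost coefficients to words that appear in the Title and Category fields, since those fields are more likely to name the document's topic.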
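
A minimal sketch of that idea in plain Python, just to make the steps concrete. The field names (`title`, `category`, `description`, `notes`), the boost values, the tiny stopword list, and the smoothed IDF formula are all assumptions for illustration, not something taken from the question; the boosts in particular would have to be tuned on real data.

```python
import math
import re
from collections import Counter

# Hypothetical boost coefficients; in practice they would be tuned empirically.
FIELD_BOOSTS = {"title": 3.0, "category": 2.0, "description": 1.0, "notes": 1.0}

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "with", "how", "for", "in", "on", "is"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop stopwords and very short tokens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]


def boosted_counts(doc):
    """Count terms in one document, weighting each occurrence by its field's boost."""
    counts = Counter()
    for field, boost in FIELD_BOOSTS.items():
        for term in tokenize(doc.get(field, "")):
            counts[term] += boost
    return counts


def suggest_tags(docs, top_n=3):
    """Return the top_n highest-scoring terms for every document."""
    per_doc = [boosted_counts(doc) for doc in docs]

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for counts in per_doc:
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    all_tags = []
    for counts in per_doc:
        total = sum(counts.values()) or 1.0
        # Boosted term frequency times a smoothed inverse document frequency.
        scores = {
            term: (weight / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, weight in counts.items()
        }
        # Sort by score (descending), then alphabetically so ties are stable.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        all_tags.append(ranked[:top_n])
    return all_tags


if __name__ == "__main__":
    docs = [
        {"title": "Python pandas tutorial", "category": "programming",
         "description": "Learn how to load data with pandas and clean data"},
        {"title": "Chocolate cake recipe", "category": "cooking",
         "description": "Bake a cake with chocolate and butter"},
    ]
    for doc, tags in zip(docs, suggest_tags(docs)):
        print(doc["title"], "->", tags)
```

This sketch only produces single-word tags. Tags made of several words would need the tokenizer to emit n-grams as well, and a threshold on the score could replace the fixed `top_n` cut.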

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow