Are there any data-mining/text-mining/machine-learning techniques to find the most appropriate tags for a given document? [closed]

StackOverflow https://stackoverflow.com/questions/19615318

Question

Say I have a huge set of documents stored in a relational table with the following columns:

    ID (unique identifier)
    Title (255 characters)
    Description (5000 characters)
    Category (predefined metadata)
    Additional Notes (1000 characters)

I would like to add one or more tags to each row in the document table. Here, a tag is a word or group of words that tells readers what a document is about.

Are there any data-mining/text-mining/machine-learning techniques or approaches that would help me find the most appropriate tags for a given document without human intervention?

Solution

One simple approach: for each document, compute the TF-IDF score of every word and take the top N words as tags (or keep only the candidates whose score exceeds some threshold). In your case it also makes sense to apply empirically chosen boosting coefficients to words that appear in the Title and Category fields.
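
As a rough illustration of that idea, here is a minimal sketch in plain Python. It assumes each row has already been loaded as a dict with the keys `title`, `category`, `description` and `notes`; those key names, the `FIELD_BOOSTS` values, and the function names are all made up for the example, and the boosts would need tuning on real data. It uses the basic `tf * log(N / df)` form of TF-IDF, so a word that occurs in every document scores zero and is never suggested.

```python
import math
import re
from collections import Counter

# Hypothetical boost coefficients (assumption): tune these empirically.
FIELD_BOOSTS = {"title": 3.0, "category": 2.0, "description": 1.0, "notes": 1.0}

TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def weighted_term_counts(doc):
    """Count terms in one document, weighting each occurrence by its field boost."""
    counts = Counter()
    for field, boost in FIELD_BOOSTS.items():
        for token in tokenize(doc.get(field, "")):
            counts[token] += boost
    return counts


def suggest_tags(docs, top_n=3, min_score=0.0):
    """Return, for each document, up to top_n words with TF-IDF above min_score."""
    per_doc = [weighted_term_counts(doc) for doc in docs]

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for counts in per_doc:
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    all_tags = []
    for counts in per_doc:
        total = sum(counts.values())
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        candidates = [t for t in scores if scores[t] > min_score]
        # Highest score first; ties are broken alphabetically for stable output.
        candidates.sort(key=lambda t: (-scores[t], t))
        all_tags.append(candidates[:top_n])
    return all_tags


if __name__ == "__main__":
    sample = [
        {"title": "Python sorting", "category": "programming",
         "description": "How to sort a list in Python", "notes": ""},
        {"title": "Baking bread", "category": "cooking",
         "description": "How to bake bread in an oven", "notes": ""},
    ]
    for tags in suggest_tags(sample):
        print(tags)
```

On a real table you would probably also want stop-word removal, stemming, and multi-word candidates (n-grams), since single tokens rarely cover "a group of words"; a library implementation of TF-IDF would be a reasonable replacement for the hand-rolled scoring here.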

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow