One of the simplest possible approaches: for a given document, compute the TF-IDF score of every word and take the top N words as tags (or keep only the candidates whose score exceeds some threshold). In your case it is also reasonable to apply empirical boosting coefficients to words that appear in the Title and Category fields.
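A minimal sketch of that idea in plain Python, in case it helps. The field names, the boost values in `FIELD_BOOSTS`, the tiny stopword list, and the smoothed IDF formula are all assumptions made for illustration; you would tune them on your own data. Each row is treated as a dict, and a word occurring in a boosted field simply counts more than once toward its term frequency.

```python
import math
import re
from collections import Counter

# Hypothetical boost coefficients: a word in the title counts three times
# as much as the same word in the description. Tune these empirically.
FIELD_BOOSTS = {"title": 3.0, "category": 2.0, "description": 1.0, "notes": 1.0}

# Deliberately tiny stopword list, for illustration only.
STOPWORDS = {"the", "and", "for", "with", "how", "using", "are", "this", "that"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop short words and stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]


def weighted_counts(doc):
    """Boosted term counts for one document (a dict of field name -> text)."""
    counts = Counter()
    for field, boost in FIELD_BOOSTS.items():
        for word in tokenize(doc.get(field, "")):
            counts[word] += boost
    return counts


def suggest_tags(docs, top_n=3):
    """Return, for each document, its top_n words ranked by boosted TF-IDF."""
    per_doc = [weighted_counts(doc) for doc in docs]

    # Document frequency: in how many documents does each word occur?
    doc_freq = Counter()
    for counts in per_doc:
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    all_tags = []
    for counts in per_doc:
        total = sum(counts.values()) or 1.0
        scores = {
            # Smoothed IDF (an assumption; other variants work as well).
            word: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[word])) + 1.0)
            for word, count in counts.items()
        }
        # Sort by score descending, then alphabetically so ties are stable.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        all_tags.append(ranked[:top_n])
    return all_tags


if __name__ == "__main__":
    rows = [
        {"title": "Python sorting tutorial", "category": "Programming",
         "description": "How to sort lists in Python using sorted and key functions."},
        {"title": "Sourdough bread recipe", "category": "Cooking",
         "description": "Bake sourdough bread at home with flour, water and salt."},
    ]
    for row, tags in zip(rows, suggest_tags(rows)):
        print(row["title"], "->", tags)
```

To use a threshold instead of a fixed N, keep every word whose score exceeds a cutoff rather than slicing `ranked[:top_n]`. This only produces single-word tags; multi-word tags would need n-gram candidates on top of the same scoring.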
Are there any data-mining/text-mining/machine learning techniques to find the most appropriate Tags for a given document? [closed]
01-07-2022
Question
Say I have a huge set of documents stored in a relational table with the following columns:
ID (unique identifier)
Title (255 characters)
Description (5000 characters)
Category (predefined metadata)
Additional Notes (1000 characters)
I would like to add one or more Tags to each row in the document table. Here, a Tag is a word or a group of words that tells readers what a document is about.
Are there any data-mining/text-mining/machine learning techniques or approaches that would help me find the most appropriate Tags for a given document without human intervention?
Solution
Licensed under: CC-BY-SA with attribution