Question

Please point me to the right section if this is posted in the wrong place.

My training data is a set of text files stored as unformatted text in Word documents. They all contain ASCII characters only.

I would like to train a model on the text files using data mining methods.

Each file contains about 300 words on average.

Is there any software you would recommend for getting started?

My initial idea is to use all the words in one of the files as training data and the remaining files as test data, as a way of performing cross-validation.

I have tools such as Weka, but it does not seem to meet my needs: converting to CSV does not seem feasible in my case because the text is spread across separate files.

I am trying to perform cross-validation in such a way that all the words in the training data are treated as features.


Solution

Convert your text files to ARFF files and apply Weka's StringToWordVector filter, which turns the words in each document into features. After that you can use Weka's classification algorithms. Watch the following video to learn the basics.
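
For the conversion step, here is a minimal sketch in Python of one way to pack separate text files into a single ARFF file. It rests on assumptions that are not in the question: the files are plain .txt, and they sit in one subdirectory per class label (for example corpus/spam/a.txt). The names arff_quote and build_arff are made up for this illustration and are not part of Weka.

```python
from pathlib import Path


def arff_quote(value: str) -> str:
    """Quote a value for ARFF: collapse whitespace runs, then escape
    backslashes and single quotes, and wrap the result in single quotes."""
    collapsed = " ".join(value.split())
    escaped = collapsed.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_arff(root: Path, relation: str = "documents") -> str:
    """Return ARFF text with one data row per .txt file.

    Assumed layout (hypothetical): root/<label>/<name>.txt, where the
    subdirectory name is used as the class label of the document.
    """
    labels = sorted(p.name for p in root.iterdir() if p.is_dir())
    lines = [
        f"@relation {arff_quote(relation)}",
        "",
        "@attribute text string",
        "@attribute class {" + ",".join(arff_quote(lb) for lb in labels) + "}",
        "",
        "@data",
    ]
    for label in labels:
        for path in sorted((root / label).glob("*.txt")):
            # The question states the files are ASCII only; any stray
            # byte is replaced rather than raising an error.
            text = path.read_text(encoding="ascii", errors="replace")
            lines.append(f"{arff_quote(text)},{arff_quote(label)}")
    return "\n".join(lines) + "\n"
```

Something like Path("corpus.arff").write_text(build_arff(Path("corpus"))) would then write a file that can be opened in Weka, where StringToWordVector can be applied to the text attribute. Each document becomes one row, so the files can stay separate on disk.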

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow