The easiest way to accomplish these tasks is to use a FilteredClassifier. This kind of classifier integrates a Filter and a Classifier, so you can connect a StringToWordVector filter with the classifier you prefer (J48, NaiveBayes, whatever). That way you always keep the original training set (unprocessed text) and apply the classifier to new (unprocessed) tweets, using the vocabulary derived by the StringToWordVector filter.
You can see how to do this from the command line in "Command Line Functions for Text Mining in WEKA" and from a program in "A Simple Text Classifier in Java with WEKA".
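
To make the idea concrete, here is a rough plain-Java sketch of what a filter-plus-classifier combination does conceptually. It is not WEKA code and does not use WEKA's API: the class FilteredTextSketch, its methods, the simple tokenizer, and the small Naive Bayes scorer are all made up for illustration. The point it shows is that the vocabulary is derived once from the raw training texts and then reused as-is on new raw text, so words never seen in training are simply ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Conceptual sketch only: NOT WEKA's FilteredClassifier, just the same idea in plain Java.
class FilteredTextSketch {
    // "Filter" state: the vocabulary learned from the training texts.
    private final Set<String> vocabulary = new TreeSet<>();
    // "Classifier" state: per-label word counts for a small multinomial Naive Bayes.
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    private final Map<String, Integer> totalWords = new HashMap<>();
    private final Map<String, Integer> docCounts = new HashMap<>();
    private int totalDocs = 0;

    // Hypothetical tokenizer: lowercase, split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Training takes raw text: the vocabulary and the model are built in one step.
    void train(List<String> texts, List<String> labels) {
        for (int i = 0; i < texts.size(); i++) {
            String label = labels.get(i);
            docCounts.merge(label, 1, Integer::sum);
            totalDocs++;
            Map<String, Integer> counts =
                    wordCounts.computeIfAbsent(label, k -> new HashMap<>());
            for (String token : tokenize(texts.get(i))) {
                vocabulary.add(token);
                counts.merge(token, 1, Integer::sum);
                totalWords.merge(label, 1, Integer::sum);
            }
        }
    }

    // Classification also takes raw text and reuses the training vocabulary.
    String classify(String rawText) {
        List<String> tokens = tokenize(rawText);
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // Log prior of the label.
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.getOrDefault(label, 0);
            for (String token : tokens) {
                if (!vocabulary.contains(token)) {
                    continue; // words outside the training vocabulary are ignored
                }
                int count = counts.getOrDefault(token, 0);
                // Laplace-smoothed log likelihood of the word given the label.
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }
}
```

You would call train once with the raw texts and their labels, then call classify with each new raw tweet. In WEKA itself, FilteredClassifier plays this role for you, with StringToWordVector handling the vocabulary and your chosen classifier doing the scoring.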