WEKA: Classify instances with a deserialized model

Question 1

Your problem is that your model doesn't know anything about what the filter did to the data. The StringToWordVector filter changes the data, and how it changes it depends on the input (training) data: the dictionary of words that become attributes is built from that data. A model trained on this transformed data set will only work on data that has undergone exactly the same transformation. To guarantee this, the filter needs to be part of your model.
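To make the dependency concrete, here is a toy sketch in plain Java. It is not Weka code, and `ToyWordVector`, `fit` and `transform` are hypothetical names. It is a word-count vectorizer that, like StringToWordVector, learns its dictionary from the data it is first fitted on. Vectors built from a dictionary fitted on the test set do not line up with the columns a model was trained on. Reusing the training dictionary keeps them aligned and simply drops unknown words.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy word-count vectorizer: the dictionary is fixed by fit() and reused by transform(). */
public class ToyWordVector {
    private final Map<String, Integer> dictionary = new LinkedHashMap<>();

    /** Builds the dictionary (word -> column index) from the given documents. */
    public static ToyWordVector fit(List<String> docs) {
        ToyWordVector v = new ToyWordVector();
        for (String doc : docs) {
            for (String word : doc.toLowerCase().split("\\s+")) {
                v.dictionary.putIfAbsent(word, v.dictionary.size());
            }
        }
        return v;
    }

    /** Number of columns this vectorizer produces. */
    public int size() {
        return dictionary.size();
    }

    /** Counts words known to the dictionary; words not seen during fit() are ignored. */
    public int[] transform(String doc) {
        int[] counts = new int[dictionary.size()];
        for (String word : doc.toLowerCase().split("\\s+")) {
            Integer col = dictionary.get(word);
            if (col != null) {
                counts[col]++;
            }
        }
        return counts;
    }
}
```

Fitting on `"good movie"` and `"bad movie"` gives 3 columns. Fitting a fresh vectorizer on the test text `"great plot"` gives a different 2-column space. This is the same mismatch a model sees when the test data is run through a freshly initialized filter.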

Using a FilteredClassifier is the correct idea, but you have to use it from the beginning:

Load the ARFF file
Select FilteredClassifier as classifier
Select StringToWordVector as filter for it
Select IBk as classifier for the FilteredClassifier
Generate/Save the model to my_model.binary
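The same steps through the Java API might look roughly like the sketch below. It assumes Weka 3.7 or later is on the classpath and that the class is the last attribute of the ARFF file. `TrainFilteredModel`, `train` and `trainAndSave` are made-up names for this example.

```java
import weka.classifiers.lazy.IBk;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class TrainFilteredModel {
    /** Trains StringToWordVector + IBk as a single model; data must have its class index set. */
    public static FilteredClassifier train(Instances data) throws Exception {
        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(new StringToWordVector()); // the dictionary is learned inside buildClassifier
        fc.setClassifier(new IBk());
        fc.buildClassifier(data);
        return fc;
    }

    /** Loads the raw ARFF file, trains, and serializes filter and classifier together. */
    public static void trainAndSave(String arffPath, String modelPath) throws Exception {
        Instances data = DataSource.read(arffPath);
        data.setClassIndex(data.numAttributes() - 1); // assumption: class is the last attribute
        SerializationHelper.write(modelPath, train(data));
    }
}
```

Usage would be something like `TrainFilteredModel.trainAndSave("training.arff", "my_model.binary");`. The training data is passed in raw, with its string attribute, because the filtering happens inside the FilteredClassifier.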

The trained and serialized model will then also contain the initialized filter, including the information on how to transform the data.
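At prediction time, the raw test data can then be passed straight to the deserialized model. The test data keeps its string attribute and uses `?` as the class value if the class is unknown. This works because the FilteredClassifier runs each instance through its stored filter first. The sketch below assumes the class is the last attribute and that the test ARFF has the same header as the training ARFF. The class and method names are hypothetical.

```java
import weka.classifiers.Classifier;
import weka.core.Instances;
import weka.core.SerializationHelper;
import weka.core.converters.ConverterUtils.DataSource;

public class ClassifyWithFilteredModel {
    /** Labels every instance of raw (unfiltered) data with a trained FilteredClassifier. */
    public static String[] classifyAll(Classifier model, Instances raw) throws Exception {
        String[] labels = new String[raw.numInstances()];
        for (int i = 0; i < raw.numInstances(); i++) {
            // No manual StringToWordVector here: the model applies its own trained filter.
            double index = model.classifyInstance(raw.instance(i));
            labels[i] = raw.classAttribute().value((int) index);
        }
        return labels;
    }

    /** Deserializes the model and classifies the instances of a raw ARFF file. */
    public static String[] classifyFile(String modelPath, String arffPath) throws Exception {
        Classifier model = (Classifier) SerializationHelper.read(modelPath);
        Instances raw = DataSource.read(arffPath);
        raw.setClassIndex(raw.numAttributes() - 1); // assumption: class is the last attribute
        return classifyAll(model, raw);
    }
}
```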

Question 2

Another way to do this is to apply the same filter to your testing data as the one used on the training data. I describe the procedure step by step below. In your case, you only need to follow the steps after loading your serialized classifier.

Create your training file (e.g. training.arff)
Create Instances from the training file: Instances trainingData = ..
Use StringToWordVector to transform your string attributes into a numeric representation:

sample code:

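    import java.io.File;
    import weka.core.SelectedTag;
    import weka.core.stemmers.LovinsStemmer;
    import weka.core.stemmers.Stemmer;
    import weka.core.stopwords.WordsFromFile;
    import weka.core.tokenizers.NGramTokenizer;
    import weka.filters.unsupervised.attribute.StringToWordVector;

    // useIdf, useStemmer, minTermFreq, minGrams and maxGrams are your own settings
    StringToWordVector filter = new StringToWordVector();
    filter.setWordsToKeep(1000000);           // keep (practically) every word
    if (useIdf) {
        filter.setIDFTransform(true);
    }
    filter.setTFTransform(true);
    filter.setLowerCaseTokens(true);
    filter.setOutputWordCounts(true);         // word counts instead of 0/1 presence
    filter.setMinTermFreq(minTermFreq);
    filter.setNormalizeDocLength(new SelectedTag(StringToWordVector.FILTER_NORMALIZE_ALL,
            StringToWordVector.TAGS_FILTER));
    // word n-grams of length minGrams..maxGrams
    NGramTokenizer t = new NGramTokenizer();
    t.setNGramMaxSize(maxGrams);
    t.setNGramMinSize(minGrams);
    filter.setTokenizer(t);
    WordsFromFile stopwords = new WordsFromFile();
    stopwords.setStopwords(new File("data/stopwords/stopwords.txt"));
    filter.setStopwordsHandler(stopwords);
    if (useStemmer) {
        Stemmer s = new /*Iterated*/LovinsStemmer();
        filter.setStemmer(s);
    }
    filter.setInputFormat(trainingData);      // call once, with the training data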

Apply the filter to trainingData: trainingData = Filter.useFilter(trainingData, filter);
Select a classifier to create your model

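sample code for the LibLINEAR classifier

    // LibLINEAR is a separate Weka package (weka.classifiers.functions.LibLINEAR)
    import weka.classifiers.Classifier;
    import weka.classifiers.functions.LibLINEAR;
    import weka.core.SelectedTag;

    LibLINEAR liblinear = new LibLINEAR();
    // SVM type 0 = L2-regularized logistic regression
    liblinear.setSVMType(new SelectedTag(0, LibLINEAR.TAGS_SVMTYPE));
    liblinear.setProbabilityEstimates(true);
    // liblinear.setBias(1); // default value
    Classifier cls = liblinear;
    cls.buildClassifier(trainingData);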

Save model

sample code

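    import java.io.FileOutputStream;
    import java.io.ObjectOutputStream;

    System.out.println("Saving the model...");
    // try-with-resources closes (and thereby flushes) the stream even if writing fails;
    // weka.core.SerializationHelper.write(path + "mymodel.model", cls) does the same in one call
    try (ObjectOutputStream oos = new ObjectOutputStream(
            new FileOutputStream(path + "mymodel.model"))) {
        oos.writeObject(cls);
    }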
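If classification runs in a separate program, the filter that processed trainingData is not in memory there. One option is to serialize the trained filter next to the classifier with `weka.core.SerializationHelper.writeAll` and read both back with `readAll`. This is a sketch, not the only way, and `ModelWithFilter` is a hypothetical name.

```java
import weka.classifiers.Classifier;
import weka.core.SerializationHelper;
import weka.filters.Filter;

public class ModelWithFilter {
    /** Writes the classifier and the filter (already run over trainingData) into one file. */
    public static void save(String file, Classifier cls, Filter filter) throws Exception {
        SerializationHelper.writeAll(file, new Object[] {cls, filter});
    }

    /** Reads both objects back, in the order they were written: {classifier, filter}. */
    public static Object[] load(String file) throws Exception {
        return SerializationHelper.readAll(file);
    }
}
```

Reading them back is then `Object[] objs = ModelWithFilter.load(path + "mymodel.model");`, followed by the casts `(Classifier) objs[0]` and `(Filter) objs[1]`. The deserialized filter still holds the dictionary built from the training data, so it can be passed to `Filter.useFilter(testingData, filter)` without another `setInputFormat()` call.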

Create a testing file (e.g. testing.arff)
Create Instances from the testing file: Instances testingData = ...
Load classifier

sample code

    Classifier myCls = (Classifier) weka.core.SerializationHelper.read(path + "mymodel.model");

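Use the same StringToWordVector filter object that was applied to trainingData. In a separate program, this means serializing the filter along with the model. Do not call setInputFormat() on it again. That call resets the filter, so the next batch (testingData) would build a new dictionary. If you create a new filter instead, configure it exactly like the training one. Then call filter.setInputFormat(trainingData) and run trainingData through it with Filter.useFilter(trainingData, filter) before filtering testingData. Either way, the output keeps the attribute format of the training set and does not add words that are not in the training set.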
Apply the filter to testingData: testingData = Filter.useFilter(testingData, filter);
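The order of calls can be sketched as follows. This assumes `train` and `test` share the same header with a string attribute, and the class and method names are hypothetical. `setInputFormat` is called once, with the training data. The training batch builds the dictionary, and every later batch through the same object is mapped onto it.

```java
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;

public class BatchFiltering {
    /** Returns {filteredTrain, filteredTest}, both expressed in the training dictionary. */
    public static Instances[] filterTrainThenTest(Instances train, Instances test) throws Exception {
        StringToWordVector filter = new StringToWordVector();
        filter.setInputFormat(train);                          // once, with the training data
        Instances newTrain = Filter.useFilter(train, filter);  // first batch: builds the dictionary
        // Same object, no second setInputFormat(): the first batch's dictionary is reused,
        // and words that never occurred in the training data are dropped.
        Instances newTest = Filter.useFilter(test, filter);
        return new Instances[] {newTrain, newTest};
    }
}
```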
Classify!

sample code

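    // testingData must have its class index set, just like trainingData
    for (int j = 0; j < testingData.numInstances(); j++) {
        // res is the index of the predicted class value
        double res = myCls.classifyInstance(testingData.get(j));
    }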
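`classifyInstance` returns the index of the predicted class value as a double, so it usually needs to be mapped back to a label. Per-class estimates are available through `distributionForInstance`. The sketch below assumes the class index of the data is set. `Predictions`, `label` and `distribution` are hypothetical helper names.

```java
import weka.classifiers.Classifier;
import weka.core.Instance;
import weka.core.Instances;

public class Predictions {
    /** Predicted class label of instance j. */
    public static String label(Classifier cls, Instances data, int j) throws Exception {
        double res = cls.classifyInstance(data.instance(j));
        return data.classAttribute().value((int) res);
    }

    /** Per-class estimates for instance j, ordered like the class attribute's values. */
    public static double[] distribution(Classifier cls, Instances data, int j) throws Exception {
        Instance inst = data.instance(j);
        return cls.distributionForInstance(inst);
    }
}
```

Inside the loop above, this would be `String label = Predictions.label(myCls, testingData, j);`.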