Train and test set are not compatible error in weka?

Question 1

This repeats the comment I left under the problem statement:

All three attributes are nominal, and each declaration is followed by all of its possible values enclosed in '{}'. My guess is that the sets of possible values are not the same. For example, for the RESOURCE attribute there is no 199 in the test file, while there is in the training file.
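
To check that guess, the declared values can be compared programmatically. The sketch below is only an illustration: the function names `nominal_values` and `header_differences` and the sample headers are made up for this example, and the parser assumes simple unquoted attribute names and values (no commas or spaces inside quotes). It reports membership differences only, not differences in the order of the values.

```python
import re

def nominal_values(arff_text):
    """Map each nominal attribute name to its list of declared values."""
    declared = {}
    for line in arff_text.splitlines():
        line = line.strip()
        if line.lower().startswith("@data"):
            break  # the header ends where the data section starts
        match = re.match(r"@attribute\s+(\S+)\s+\{(.*)\}\s*$", line, re.IGNORECASE)
        if match:
            name, values = match.groups()
            declared[name] = [v.strip() for v in values.split(",")]
    return declared

def header_differences(train_text, test_text):
    """Return {attribute: (values only in train, values only in test)}."""
    train = nominal_values(train_text)
    test = nominal_values(test_text)
    diffs = {}
    for name in train:
        test_values = test.get(name, [])
        only_train = [v for v in train[name] if v not in test_values]
        only_test = [v for v in test_values if v not in train[name]]
        if only_train or only_test:
            diffs[name] = (only_train, only_test)
    return diffs

# Hypothetical headers mirroring the RESOURCE example above.
train_header = "@relation r\n@attribute RESOURCE {198,199,200}\n@attribute ACTION {0,1}\n@data\n"
test_header = "@relation r\n@attribute RESOURCE {198,200}\n@attribute ACTION {0,1}\n@data\n"

print(header_differences(train_header, test_header))
# {'RESOURCE': (['199'], [])}
```

An empty result means the two headers declare the same values for every nominal attribute found in the training header.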

Question 2

After struggling with the same problem for a day, I figured out two ways to make the trained model work on a supplied test set.

Method 1. Use the Knowledge Flow. Build two chains, for example:

CSVLoader (for train set) -> ClassAssigner -> TrainingSetMaker -> (classifier of your choice) -> ClassifierPerformanceEvaluator -> TextViewer

CSVLoader (for test set) -> ClassAssigner -> TestSetMaker -> (the same classifier instance as above) -> PredictionAppender -> CSVSaver

First load the data from the CSVLoader or ArffLoader for the training set; the model will be trained. After that, load the data from the loader for the test set. The flow evaluates the model (the classifier, for example) on the supplied test set. You can see the result in the TextViewer (connected to the ClassifierPerformanceEvaluator) and get the saved result from the CSVSaver or ArffSaver connected to the PredictionAppender. An additional column, "classified as", will be added to the output file. In my case, I used "?" in the class column of the supplied test set, since the class labels were not available.

Method 2. Combine the training and test sets into one file, so that exactly the same filter can be applied to both. Afterwards, separate the two sets again by applying an instance filter. Since I use "?" as the class label in the test set, that value is not visible among the instance filter indices. So when applying the instance filter, just select for removal all the indices that you can see in the attribute values. Only the test data will be left. Save it and load it as the supplied test set on the Classify page. This time it will work. I guess it is the class attribute that causes the incompatible train and test set issue, since many classifiers require a nominal class attribute, whose values are converted to indices into the list of available values of the class attribute, according to http://weka.wikispaces.com/Why+do+I+get+the+error+message+%27training+and+test+set+are+not+compatible%27%3F
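
The combine-then-split idea of Method 2 can also be sketched outside the GUI. This is a rough illustration only, not what WEKA does internally: the helper names `combine` and `split` are hypothetical, rows are plain lists of strings, the class is assumed to be the last column, and no training row is assumed to have a missing class.

```python
MISSING = "?"  # the marker used above for an unknown class label

def combine(train_rows, test_rows):
    """Append the test rows to the training rows, marking each test class as '?'."""
    marked = [row[:-1] + [MISSING] for row in test_rows]
    return train_rows + marked

def split(rows):
    """Separate combined rows back into (train, test) using the '?' class marker."""
    train = [row for row in rows if row[-1] != MISSING]
    test = [row for row in rows if row[-1] == MISSING]
    return train, test

# Made-up rows: two attributes plus the class in the last column.
train_rows = [["a", "1", "yes"], ["b", "2", "no"]]
test_rows = [["a", "3", ""], ["c", "4", ""]]

combined = combine(train_rows, test_rows)
train_again, test_again = split(combined)
print(test_again)
# [['a', '3', '?'], ['c', '4', '?']]
```

Any filtering would be applied to `combined` between the two calls, so that both parts end up with the same attribute definitions.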

Question 3

See the following answer: your train.arff and test.arff should have the same header. According to your comparison, they are similar but not the same.

Question 4

I just encountered the same problem and found a bare-bones solution. My files are in .csv format, so I simply opened each of them (for training and testing, respectively) and used the Save button on the Preprocess panel of WEKA to save it in .arff format. That solved the problem.

Question 5

Look, there is a difference between similar and same: your train.arff and test.arff should have the same header. If they do not, copy the header of train.arff and paste it into your test.arff as its new header.
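
The copy-and-paste step can be scripted. The following is a minimal sketch, assuming both files contain a line consisting of `@data` that separates header from data; the function names `split_arff` and `with_train_header` and the sample contents are hypothetical. It only makes sense when the test data has the same attributes in the same order and uses only values declared in the training header.

```python
def split_arff(text):
    """Split ARFF text into (header lines including @data, data lines)."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().lower() == "@data":
            return lines[: i + 1], lines[i + 1 :]
    raise ValueError("no @data line found")

def with_train_header(train_text, test_text):
    """Return the test data preceded by the training header."""
    train_header, _ = split_arff(train_text)
    _, test_data = split_arff(test_text)
    return "\n".join(train_header + test_data) + "\n"

# Made-up file contents for illustration.
train_text = (
    "@relation train\n"
    "@attribute RESOURCE {198,199,200}\n"
    "@attribute ACTION {0,1}\n"
    "@data\n"
    "199,1\n"
    "198,0\n"
)
test_text = (
    "@relation test\n"
    "@attribute RESOURCE {198,200}\n"
    "@attribute ACTION {0,1}\n"
    "@data\n"
    "200,?\n"
)

fixed_test = with_train_header(train_text, test_text)
print(fixed_test)
```

Writing `fixed_test` to a new file, rather than overwriting test.arff, keeps the original available for comparison.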

Question 6

    import csv

    # Fill in these paths before running.
    trainPath = ""        # training CSV; its header defines the column order
    otherPadelPath = ""   # CSV whose columns need to be rearranged
    testPath = ""         # output CSV to use as the test set

    # Read the attribute names from the header row of the training file.
    with open(trainPath, newline="") as trainFile:
        trainAttributes = next(csv.reader(trainFile))

    with open(otherPadelPath, newline="") as otherPadelFile:
        otherPadelRows = [row for row in csv.reader(otherPadelFile) if row]

    # For every training attribute that also exists in the other file,
    # record its column index there, in training order.
    otherHeader = otherPadelRows[0]
    otherPadelColumns = [otherHeader.index(attribute)
                         for attribute in trainAttributes
                         if attribute in otherHeader]

    # Write every row (header included) with the columns in training order.
    with open(testPath, "w", newline="") as testFile:
        writer = csv.writer(testFile, lineterminator="\n")
        for row in otherPadelRows:
            writer.writerow([row[inDex] for inDex in otherPadelColumns])

This script rearranges your test dataset so that it contains the same attribute columns, in the same order, as your training set, provided that each attribute has the same type and title in both files. Also (in keeping with the WEKA default), the class attribute should be in the last column of both datasets.