Question

We have a data set of 15k classified tweets with which we need to perform sentiment analysis. I would like to test against a test set of 5k classified tweets. Due to Weka needing the same attributes within the header of the test set as exist in the header of training set, I will have to use batch filtering if I want to be able to run my classifier against this 5k test set.

However, there are several filters that I need to run my training set through, so I figured the running a multifilter against the training set would be a good idea. The multifilter works fine when not running the batch argument, but when I try to batch filter I get an error from the CLI as it tried to execute the first filter within the multi-filter:

CLI multiFilter command w/batch argument:

java weka.filters.MultiFilter -F "weka.filters.supervised.instance.Resample -B 1.0 -S 1 -Z 15.0 -no-replacement" \
-F "weka.filters.unsupervised.attribute.StringToWordVector -R first-last -W 100000 -prune-rate -1.0 -N 0 -S -stemmer weka.core.stemmers.NullStemmer -M 2 -tokenizer weka.core.tokenizers.AlphabeticTokenizer" \
-F "weka.filters.unsupervised.attribute.Reorder -R 2-last,first"\
-F "weka.filters.supervised.attribute.AttributeSelection -E \"weka.attributeSelection.InfoGainAttributeEval \" -S \"weka.attributeSelection.Ranker -T 0.0 -N -1\"" \
-F weka.filters.AllFilter \
-b -i input\Train.arff -o output\Train_b_out.arff -r input\Test.arff -s output\Test_b_out.arff

Here is the resultant error from the CLI:

weka.core.UnassignedClassException: weka.filters.supervised.instance.Resample: Class attribute not set!
at weka.core.Capabilities.test(Capabilities.java:1091)
at weka.core.Capabilities.test(Capabilities.java:1023)
at weka.core.Capabilities.testWithFail(Capabilities.java:1302)
at weka.filters.Filter.testInputFormat(Filter.java:434)
at weka.filters.Filter.setInputFormat(Filter.java:452)
at weka.filters.SimpleFilter.setInputFormat(SimpleFilter.java:195)
at weka.filters.Filter.batchFilterFile(Filter.java:1243)
at weka.filters.Filter.runFilter(Filter.java:1319)
at weka.filters.MultiFilter.main(MultiFilter.java:425)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at weka.gui.SimpleCLIPanel$ClassRunner.run(SimpleCLIPanel.java:265)

And here are the headers with a portion of data for both the training and test input arffs:

Training:

@RELATION classifiedTweets
@ATTRIBUTE @@sentence@@ string
@ATTRIBUTE @@class@@ {1,-1,0}
@DATA
"Conditioning be very important for curly dry hair",0
"Combine with Sunday paper coupon and",0
"Price may vary by store",0
"Oil be not really moisturizers",-1

Testing:

@RELATION classifiedTweets
@ATTRIBUTE @@sentence@@ string
@ATTRIBUTE @@class@@ {1,-1,0}
@DATA
"5",0
"I give the curl a good form and discipline",1
"I have be cowashing every day",0
"LOL",0
"TITLETITLE Walgreens Weekly and Midweek Deal",0
"And then they walk away",0

Am I doing something wrong here? I know that supervised resampling requires the class attribute to be on the bottom of the attribute list within the header, and it is... within both the test and training input files.

EDIT:

Further testing reveals that this error does not occur with relationship to the batch filtering, it occurs whenever I run the supervised resample filter from the CLI... The data that I use works on every other filter I've tried within the CLI, so I don't understand why this filter is any different... resampling the data in the GUI works fine as well...

Update:

This also happens with the SMOTE filter instead of the resample filter

Was it helpful?

Solution

Could not get the batch filter to work with any resampling filter. However, our workaround was to simply resample (and then randomize) the training data as step 1. From this reduced set we ran batch filters for everything else we wanted on the test set. This seemed to work fine.

OTHER TIPS

You could have used the multifilter along with the ClassAssigner method to make it work:

java -classpath $jcp weka.filters.MultiFilter
     -F "weka.filters.unsupervised.attribute.ClassAssigner -C last"
     -F "weka.filters.supervised.instance.Resample -B 1.0 -S 1 -Z 66.0"
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top