Weka GUI - TF-IDF is not calculated - Please Help For My Academic Work

https://stackoverflow.com/questions/16940793

31-05-2022

Question

I want to use the KNN algorithm with TF-IDF in the Weka GUI. First, I ran the algorithm with the default settings. Then I set "IDFTransform" and "TFTransform" to "true" in the StringToWordVector filter and ran it again.
There is no difference between the two results.

Result1:

Correctly Classified Instances        1346               91.3781 %

Result2:

Correctly Classified Instances        1346               91.3781 %

My ".arff" file is as follows:

@relation et9

@attribute 'alis' real
@attribute 'banka' real
...
@attribute 'urun' real
@attribute 'class' {yes, no}

@data
70,0,0,0,3,0,40,0,3,1,0,0,20,0,717,2,4,0,0,0,2,5,0,0,0,717,0,1,0,30,yes
22,0,0,63,158,0,1,0,7,0,10,0,4,0,57,0,0,0,0,204,0,0,2,2,0,530,0,0,6,0,yes
0,0,1,0,0,0,0,0,2,1,3,0,0,0,0,0,5,0,0,0,0,0,2,1,0,0,0,0,0,0,no
...

I know that StringToWordVector is meant for string attributes, but I want to calculate TF-IDF for this ".arff" file. How can I use my current ".arff" file and get a KNN result with TF-IDF?

(This is my academic work. Please help...)

Solution

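According to Weka's documentation, the StringToWordVector filter "Converts String attributes into a set of attributes representing word occurrences [...]". Therefore, applying this filter to an ARFF file that does not contain any String attributes has no effect on the dataset. The attributes shown in your file are real-valued plus a nominal class, so the filter leaves them unchanged, which is why both runs produce identical results.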
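Since the file in the question already appears to hold raw term counts per document, one possible alternative (not part of the original answer) is to apply the weighting to the counts directly, outside of Weka, and then save the result as a new ARFF file. Below is a minimal Python sketch. It assumes the two transforms are log(1 + f) for "TFTransform" and f * log(N / df) for "IDFTransform", applied in that order, which is how the StringToWordVector option descriptions read; check this against your Weka version. The function name tf_idf and the toy counts are hypothetical.

```python
import math

def tf_idf(rows):
    """Weight a matrix of raw term counts (one row per document).

    Assumed formulas: tf = log(1 + f), idf = log(N / df), weight = tf * idf.
    """
    n_docs = len(rows)
    n_terms = len(rows[0])
    # Document frequency: in how many rows each term has a nonzero count.
    df = [sum(1 for row in rows if row[j] > 0) for j in range(n_terms)]
    weighted = []
    for row in rows:
        new_row = []
        for j, f in enumerate(row):
            if f > 0:
                new_row.append(math.log(1 + f) * math.log(n_docs / df[j]))
            else:
                new_row.append(0.0)
        weighted.append(new_row)
    return weighted

# Hypothetical toy counts: 3 documents, 3 terms (not the question's data).
counts = [
    [2, 0, 1],
    [0, 3, 1],
    [1, 0, 1],
]
weights = tf_idf(counts)
```

Note that under these formulas a term that occurs in every document gets weight 0, because log(N / N) = 0. The class column must be left out of the computation and re-attached afterwards.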

To make use of this filter, you need to prepare an ARFF file that contains a String attribute whose value is the text of the given instance. For example, if each instance represents one tweet, then the text of the tweet would be the value of this String attribute. More information on working with text in Weka is available in the Weka documentation.
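As an illustration of the file layout described above, here is a minimal Python sketch that builds ARFF text with one String attribute and a nominal class. The helper name to_arff, the attribute name text, and the sample documents are hypothetical; only the words themselves are borrowed from the attribute names in the question.

```python
def to_arff(relation, docs, labels):
    """Build ARFF text with one string attribute and a nominal class.

    docs is a list of (text, label) pairs; labels lists the class values.
    """
    lines = [
        f"@relation {relation}",
        "",
        "@attribute text string",
        "@attribute 'class' {" + ",".join(labels) + "}",
        "",
        "@data",
    ]
    for text, label in docs:
        # String values are quoted; escape backslashes and single quotes.
        escaped = text.replace("\\", "\\\\").replace("'", "\\'")
        lines.append(f"'{escaped}',{label}")
    return "\n".join(lines) + "\n"

# Hypothetical raw documents; in practice these are the original texts
# from which the word counts in the question were derived.
docs = [
    ("banka alis banka", "yes"),
    ("urun banka", "no"),
]
arff_text = to_arff("et9_text", docs, ["yes", "no"])
```

Writing arff_text to a file and loading it in the Weka Explorer should give StringToWordVector a String attribute to work on, so that the "TFTransform" and "IDFTransform" options can actually change the data.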

Licensed under: CC-BY-SA with attribution