weka StringToWordVector filter reversion (java)

https://stackoverflow.com/questions/21260583

30-09-2022

Question

I am having trouble with clustering using the Weka library. My data has string attributes, so I use the StringToWordVector filter. After clustering, how can I get back from the word vectors to the original string representation so that I can show human-readable results? I want to revert this operation:

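import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;
StringToWordVector filter = new StringToWordVector();
filter.setInputFormat(instancesToFilter);
Instances dataFiltered = Filter.useFilter(instancesToFilter, filter);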

Is it possible?

Solution

The StringToWordVector filter cannot be reversed. However, you have at least two possibilities:

1. If you just want to see or show the original strings that are in each cluster, you can add an ID attribute, ensure it is not used during clustering (to avoid unexpected behavior), then use the ID to look up the original text in the original data (the ARFF file).
2. If you want to show a meaningful summary of the contents of each cluster, you can output the most frequent or most heavily weighted words in each cluster. This is a rather common approach when clustering texts.
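Both possibilities can be sketched in plain Java without touching Weka at all. The sketch below assumes you kept the original texts in a list and obtained one cluster number per text from your clusterer, in the same order as the texts (the list position then plays the role of the ID). The class name ClusterSummary, its methods, and the sample data are hypothetical and only for illustration; the word splitting is a simple regular expression, not Weka's tokenizer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ClusterSummary {

    // Groups the original texts by cluster; assignments[i] is the cluster of texts.get(i).
    static Map<Integer, List<String>> groupByCluster(List<String> texts, int[] assignments) {
        if (texts.size() != assignments.length) {
            throw new IllegalArgumentException("one assignment per text is required");
        }
        Map<Integer, List<String>> clusters = new TreeMap<>();
        for (int i = 0; i < assignments.length; i++) {
            clusters.computeIfAbsent(assignments[i], c -> new ArrayList<>()).add(texts.get(i));
        }
        return clusters;
    }

    // Returns up to k most frequent lower-cased words; ties are broken alphabetically.
    static List<String> topWords(List<String> docs, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            for (String word : doc.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        List<String> words = new ArrayList<>(counts.keySet());
        words.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return words.subList(0, Math.min(k, words.size()));
    }

    public static void main(String[] args) {
        // Hypothetical sample data; in practice the assignments come from the clusterer.
        List<String> texts = List.of("red apple pie", "green apple tart", "fast red car", "fast blue car");
        int[] assignments = {0, 0, 1, 1};
        groupByCluster(texts, assignments).forEach((cluster, docs) ->
                System.out.println("cluster " + cluster + ": " + topWords(docs, 2) + " " + docs));
    }
}
```

With Weka, the assignments would typically be collected by calling the clusterer's clusterInstance method on each filtered instance. That the filtered instances keep the order of the original ones is an assumption worth checking for your filter configuration.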

Other tips

The filter is lossy.

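As such, an exact reverse transformation cannot exist. You can only approximate it, for example by listing the words that have nonzero values in a vector; the original word order, and any words the filter dropped, cannot be recovered.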

Consider looking at the source code of the filter.
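To see why the transformation is lossy, here is a minimal sketch using a simplified stand-in for a word-vector filter. It is not Weka's implementation: the class LossyBagOfWords, its methods, and the three-word vocabulary are hypothetical. Two different texts produce the same vector, so the best an approximate reverse can do is return the set of words present.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LossyBagOfWords {

    // Simplified stand-in for a word-vector filter: 1.0 if the vocabulary word occurs, else 0.0.
    static double[] toVector(String text, List<String> vocabulary) {
        List<String> tokens = Arrays.asList(text.toLowerCase().split("\\W+"));
        double[] vector = new double[vocabulary.size()];
        for (int i = 0; i < vector.length; i++) {
            vector[i] = tokens.contains(vocabulary.get(i)) ? 1.0 : 0.0;
        }
        return vector;
    }

    // Approximate "reverse": the vocabulary words with a nonzero value, in vocabulary order.
    static List<String> wordsOf(double[] vector, List<String> vocabulary) {
        List<String> words = new ArrayList<>();
        for (int i = 0; i < vector.length; i++) {
            if (vector[i] != 0.0) {
                words.add(vocabulary.get(i));
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> vocabulary = List.of("bites", "dog", "man");
        double[] a = toVector("Dog bites man", vocabulary);
        double[] b = toVector("Man bites dog, dog!", vocabulary);
        System.out.println(Arrays.equals(a, b));     // the two different texts share one vector
        System.out.println(wordsOf(a, vocabulary));  // only the word set survives, not the order
    }
}
```

In a Weka dataset produced by StringToWordVector, the analogous lookup would read the names of the attributes that have nonzero values in an instance, since the generated attributes are named after the words.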

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow