Question

I'm having trouble clustering with the Weka library. My data has string attributes, so I apply the StringToWordVector filter. After clustering, how can I map the word vectors back to their string representation to show human-readable results? I want to reverse this operation:

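import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.StringToWordVector;
StringToWordVector filter = new StringToWordVector();
filter.setInputFormat(instancesToFilter);
Instances dataFiltered = Filter.useFilter(instancesToFilter, filter);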

Is it possible?


Solution

The StringToWordVector filter cannot be reversed. However, you have at least two options:

  • If you just want to see or show the original strings in each cluster, add an ID attribute and make sure it is ignored during clustering (otherwise it will influence the clusters). Then use each instance's ID to look up its original text in the source data (e.g., the ARFF file).
  • If you want to show a meaningful summary of each cluster's contents, output the most frequent (or highest-weighted) words in each cluster. This is a common approach when clustering text.
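Both options can be sketched in plain Java once the clustering is done. This sketch assumes you have already extracted, for each instance, its original string (looked up via the ID attribute), its word-vector row, the names of the word columns, and the cluster index the clusterer assigned to it. The class and method names (ClusterSummary, groupByCluster, topWords) are hypothetical and not part of Weka. Words are ranked by their summed weight within a cluster, which is only one simple choice among several.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical post-processing helpers (not Weka API). Assumes:
// originals.get(i) is the raw string of instance i (recovered via its ID),
// vectors[i] is its word-vector row, vocab[j] names column j, and
// assignments[i] is the cluster index assigned to instance i.
class ClusterSummary {

    // Option 1: group the original, human-readable strings by cluster.
    static Map<Integer, List<String>> groupByCluster(List<String> originals, int[] assignments) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (int i = 0; i < assignments.length; i++) {
            groups.computeIfAbsent(assignments[i], c -> new ArrayList<>()).add(originals.get(i));
        }
        return groups;
    }

    // Option 2: up to k words with the largest summed weight in one cluster.
    static List<String> topWords(double[][] vectors, String[] vocab, int[] assignments, int cluster, int k) {
        double[] totals = new double[vocab.length];
        for (int i = 0; i < vectors.length; i++) {
            if (assignments[i] != cluster) {
                continue; // only members of the requested cluster contribute
            }
            for (int j = 0; j < vocab.length; j++) {
                totals[j] += vectors[i][j];
            }
        }
        Integer[] order = new Integer[vocab.length];
        for (int j = 0; j < order.length; j++) {
            order[j] = j;
        }
        // Sort word indices by descending total weight (stable, so ties keep column order).
        Arrays.sort(order, Comparator.comparingDouble((Integer j) -> totals[j]).reversed());
        List<String> top = new ArrayList<>();
        for (int r = 0; r < Math.min(k, order.length); r++) {
            if (totals[order[r]] > 0) {
                top.add(vocab[order[r]]); // skip words absent from the cluster
            }
        }
        return top;
    }

    public static void main(String[] args) {
        List<String> originals = List.of("cheap flights to rome", "rome hotel deals",
                "java weka clustering", "weka text clustering");
        String[] vocab = {"cheap", "clustering", "deals", "flights", "hotel", "java", "rome", "text", "weka"};
        // 0/1 word-presence rows, one per original string (made-up toy data).
        double[][] vectors = {
            {1, 0, 0, 1, 0, 0, 1, 0, 0},
            {0, 0, 1, 0, 1, 0, 1, 0, 0},
            {0, 1, 0, 0, 0, 1, 0, 0, 1},
            {0, 1, 0, 0, 0, 0, 0, 1, 1},
        };
        int[] assignments = {0, 0, 1, 1};

        System.out.println(groupByCluster(originals, assignments));
        for (int c = 0; c <= 1; c++) {
            System.out.println("cluster " + c + ": " + topWords(vectors, vocab, assignments, c, 2));
        }
        // prints "cluster 0: [rome, cheap]" and "cluster 1: [clustering, weka]"
    }
}
```

In a real Weka program, the cluster index for each instance would come from the trained clusterer (for example, `clusterInstance`), and the word names from the attributes of the filtered dataset. That wiring is omitted here.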

Other tips

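The filter is lossy: word order is always discarded. Depending on the filter's options, word counts (the default output is 0/1 word presence) and any words outside the kept vocabulary are also discarded.

Therefore no exact reverse transformation can exist, although you may be able to approximate one.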
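Here is a toy illustration of that lossiness. The sketch uses a simplified word-count transform as a stand-in. It lowercases the text, splits on non-letters, and counts words. It is not Weka's actual tokenizer, and the class name BagOfWordsDemo is hypothetical. Two sentences with different meanings produce the same vector, so no function could map that vector back to a unique original string.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Simplified stand-in for a word-vector transform (NOT Weka's StringToWordVector):
// lowercase, split on non-letters, and count each word.
class BagOfWordsDemo {

    // Maps a string to a word -> count vector, sorted by word.
    static Map<String, Integer> toWordCounts(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> a = toWordCounts("The dog bit the man.");
        Map<String, Integer> b = toWordCounts("The man bit the dog!");
        System.out.println(a);           // {bit=1, dog=1, man=1, the=2}
        // Different sentences, identical vectors: the original order is gone.
        System.out.println(a.equals(b)); // true
    }
}
```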

Consider looking at the source code of the filter.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow