Question

I am working on text classification in RapidMiner, and I have separate training and test splits. I applied Information Gain to a dataset using n-fold cross-validation, but I am confused about how to apply it to a separate test set. The attached image shows my process.

In the figure, I have connected the word list output of the first "Process Documents From Files" operator (used for training) to the second "Process Documents From Files" operator (used for testing). What I actually want is to apply the reduced feature set to the second "Process Documents From Files", which should presumably be the one returned by the "Select By Weight" (reduced dimensions) operator. However, that operator returns weights, which I cannot feed into the second "Process Documents From Files". I have searched a lot but did not manage to find anything that meets my need.

Is it actually possible in RapidMiner to have separate test/train splits and apply feature selection?

Is there any way to convert these weights into a word list? Please don't suggest writing to the repository (I can't do this).

In a scenario where I have separate test/train splits and need to apply feature selection, how do I make sure that the test and training vectors have the same dimensions?

I am really stuck on this; any help would be appreciated.


Solution

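Insert a new Select By Weight operator immediately after the lower Process Documents operator, before the Apply Model operator. Use a Multiply operator to copy the weights from the Weight By Information Gain operator, and connect one copy to the weights input of the new Select By Weight operator. This way the test data is reduced to the same attributes that were selected on the training data, so both sets end up with vectors of the same dimension.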
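The same idea can be illustrated outside RapidMiner with a minimal Python sketch. This is not RapidMiner code: the function names (`weight_by_information_gain`, `select_by_weight`, `vectorize`) are hypothetical helpers that only mirror the operator names, the toy documents are made up, and binary term presence is assumed as the feature representation. The point it shows is that the weights are computed on the training data only, and the resulting selection is then reused unchanged for the test data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def weight_by_information_gain(train_docs, train_labels):
    """Compute one weight per term, using the TRAINING data only."""
    vocabulary = sorted({t for d in train_docs for t in d})
    return {t: information_gain(train_docs, train_labels, t) for t in vocabulary}


def select_by_weight(weights, k):
    """Keep the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(weights, key=lambda t: (-weights[t], t))
    return sorted(ranked[:k])


def vectorize(doc, selected_terms):
    """Binary vector over the selected terms; unseen terms are ignored."""
    return [1 if t in doc else 0 for t in selected_terms]


# Toy data (made up for illustration); each document is a set of tokens.
train_docs = [
    {"cheap", "pills", "buy"},
    {"buy", "now", "cheap"},
    {"meeting", "agenda", "now"},
    {"agenda", "notes", "meeting"},
]
train_labels = ["spam", "spam", "ham", "ham"]
test_docs = [{"cheap", "offer", "meeting"}, {"agenda", "unknown"}]

# Weights come from the training split; the same selection is applied to both.
weights = weight_by_information_gain(train_docs, train_labels)
selected = select_by_weight(weights, k=3)
train_vectors = [vectorize(d, selected) for d in train_docs]
test_vectors = [vectorize(d, selected) for d in test_docs]
```

Because `selected` is derived once and shared, every training and test vector has exactly `len(selected)` entries, which is what copying the weights with Multiply achieves in the process described above.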

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow