Question

I'm trying to build a model from a training dataset and then use it to label the records in a test dataset.

All the tutorials and help I find online only cover cross-validation on a single dataset, i.e., the training dataset. I couldn't find how to use separate test data. I tried applying the resulting model to the test set, but after preprocessing the test set seems to have a different number of attributes than the training set. This is a text classification problem.

At the end I get output like this:

18.03.2013 01:47:00 Results of ResultWriter 'Write as Text (2)' [1]: 
18.03.2013 01:47:00 SimpleExampleSet:
5275 examples,
366 regular attributes,
special attributes = {
confidence_1 = #367: confidence(1) (real/single_value)
confidence_5 = #368: confidence(5) (real/single_value)
confidence_2 = #369: confidence(2) (real/single_value)
confidence_4 = #370: confidence(4) (real/single_value)
prediction = #366: prediction(label) (nominal/single_value)/values=[1, 5, 2, 4]
}

But what I want is for all my examples to be labelled.

It seems that my test data and training data have different numbers of attributes. I see many warnings like the following in the logs:

Mar 18, 2013 1:46:41 AM WARNING: Kernel Model: The given example set does not contain a regular attribute with name 'wireless'. This might cause problems for some models depending on this particular attribute.

But how do we solve this problem in text classification, where we cannot know the number and names of the attributes beforehand?

Can someone please give me some pointers?


Solution

You are probably using a Process Documents operator to preprocess both the training set and the test set. It is important that both operators are set up identically. To "synchronize" the wordlist, i.e. to make both operators consider the same set of words, connect the wordlist (wor) output port of the Process Documents operator used for training to the corresponding input port of the Process Documents operator that preprocesses the test set. The test set then gets exactly the attributes the model was trained on, and warnings such as the missing 'wireless' attribute should no longer appear.
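The idea behind sharing the wordlist can be illustrated outside RapidMiner with a minimal Python sketch. This is only an analogy, not RapidMiner's API: the helper names `build_wordlist` and `vectorize` and the sample documents are hypothetical, and the tokenization is a plain whitespace split. The point it shows is that the wordlist is built once from the training documents and then reused unchanged for the test documents, so both end up with the same attributes in the same order.

```python
from collections import Counter


def build_wordlist(documents):
    """Collect the sorted set of tokens seen in the training documents."""
    vocabulary = set()
    for text in documents:
        vocabulary.update(text.lower().split())
    return sorted(vocabulary)


def vectorize(documents, wordlist):
    """Count each wordlist entry per document; words outside the list are ignored."""
    rows = []
    for text in documents:
        counts = Counter(text.lower().split())
        rows.append([counts[word] for word in wordlist])
    return rows


# Hypothetical sample data, for illustration only.
train_docs = ["wireless router setup", "router firmware update"]
test_docs = ["new firmware for the router"]

# Build the wordlist from the training data only ...
wordlist = build_wordlist(train_docs)

# ... and reuse the same wordlist for both sets.
train_matrix = vectorize(train_docs, wordlist)
test_matrix = vectorize(test_docs, wordlist)
```

Here the test document never mentions "wireless", yet its row still has a "wireless" column with a count of zero, and the unseen words "new", "for" and "the" are simply dropped. Building a separate wordlist from the test documents would instead produce a different set of columns, which is the mismatch the warning in the question points to.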

Licensed under: CC-BY-SA with attribution