Question

I'm using RapidMiner to do an analysis. I used cross-validation on several models to find the best-performing one. Now I want to estimate this model's performance on a separate test set that I created with Split Data.

How do I use the test set? As far as I can tell, all the validation modules use the training set that the model was made on. Which performance measure can I use that takes in a model and my test set?

Solution

Use the "Apply Model" operator with your model as the first input and your test set as the second input. The operator returns a labelled data set: your input data plus some additional special attributes, e.g. the prediction and the confidence. The "Performance" operator needs these attributes to measure the performance of the model on your test set.

Here is a small example that uses a training set and a test set from the "Samples" repository.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.007">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="5.3.007" expanded="true" height="60" name="Golf" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Golf"/>
      </operator>
      <operator activated="true" class="decision_tree" compatibility="5.3.007" expanded="true" height="76" name="Decision Tree" width="90" x="179" y="30"/>
      <operator activated="true" class="retrieve" compatibility="5.3.007" expanded="true" height="60" name="Golf-Testset" width="90" x="179" y="120">
        <parameter key="repository_entry" value="//Samples/data/Golf-Testset"/>
      </operator>
      <operator activated="true" breakpoints="before,after" class="apply_model" compatibility="5.3.007" expanded="true" height="76" name="Apply Model" width="90" x="313" y="30">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="performance" compatibility="5.3.007" expanded="true" height="76" name="Performance" width="90" x="447" y="30"/>
      <connect from_op="Golf" from_port="output" to_op="Decision Tree" to_port="training set"/>
      <connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Golf-Testset" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
      <connect from_op="Performance" from_port="performance" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
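
Since the question mentions a test set made with Split Data, the same wiring can be sketched with a single data source that is partitioned inside the process. This is an untested sketch rather than a verified process: the `split_data` operator class, the `partitions` enumeration, the `example set` / `partition 1` / `partition 2` port names, and the 0.7/0.3 ratios are assumptions based on RapidMiner 5.3 conventions and should be checked against your installation. The Decision Tree only stands in for whichever model you selected via cross-validation.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.007">
  <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="5.3.007" expanded="true" name="Golf">
        <parameter key="repository_entry" value="//Samples/data/Golf"/>
      </operator>
      <!-- Assumed ratios: partition 1 (70%) is used for training, partition 2 (30%) is held out for testing -->
      <operator activated="true" class="split_data" compatibility="5.3.007" expanded="true" name="Split Data">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.7"/>
          <parameter key="ratio" value="0.3"/>
        </enumeration>
      </operator>
      <!-- Placeholder learner; replace with the model chosen by cross-validation -->
      <operator activated="true" class="decision_tree" compatibility="5.3.007" expanded="true" name="Decision Tree"/>
      <operator activated="true" class="apply_model" compatibility="5.3.007" expanded="true" name="Apply Model">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="performance" compatibility="5.3.007" expanded="true" name="Performance"/>
      <connect from_op="Golf" from_port="output" to_op="Split Data" to_port="example set"/>
      <!-- The model is trained on the first partition only -->
      <connect from_op="Split Data" from_port="partition 1" to_op="Decision Tree" to_port="training set"/>
      <connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
      <!-- The held-out partition goes into Apply Model, never into the learner -->
      <connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
      <connect from_op="Performance" from_port="performance" to_port="result 1"/>
    </process>
  </operator>
</process>
```

The point of this layout is that the second partition reaches only "Apply Model", so the performance reported by "Performance" is measured on rows the learner never saw.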
Licensed under: CC-BY-SA with attribution