Question

The program below shows that strings are written incorrectly when Weka's ARFF saver runs in incremental mode. The program runs in incremental mode if any command-line argument is passed, and in batch mode otherwise.

In batch mode, the ARFF file contains the strings, which is the normal behavior. In incremental mode, the ARFF file contains integers in place of the strings, which is strange!

Any ideas on how to get the ARFF formatter to output strings in incremental mode?

import java.io.File;
import java.io.IOException;

import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.core.converters.Saver;

public class ArffTest {
    static Instances instances; 
    static ArffSaver saver;
    static boolean flag=false;

    public static void addData(String ticker, double price) throws IOException{
        int numAttr = instances.numAttributes(); // number of attributes in the header
        double[] vals = new double[numAttr]; 
        int i=0;
        vals[i++] = instances.attribute(0).addStringValue(ticker);
        vals[i++] = price;
        Instance instance = new Instance(1.0, vals);
        if (flag)
            saver.writeIncremental(instance);
        else
            instances.add(instance);
    }

    public static void main(String[] args) {
        if(args.length>0){
            flag=true;
        }
        FastVector atts = new FastVector();         // attributes
        atts.addElement(new Attribute("Ticker", (FastVector)null));// symbol
        atts.addElement(new Attribute("Price"));    // price that order exited at.

        instances = new Instances("Samples", atts, 0);  // create header
        saver = new ArffSaver();
        saver.setInstances(instances);
        if(flag)
            saver.setRetrieval(Saver.INCREMENTAL);

        try{
            saver.setFile(new File("test.arff"));
            addData("YY", 23.0);
            addData("XY", 24.0);
            addData("XX", 29.0);
            if(flag)
                saver.writeIncremental(null);
            else
                saver.writeBatch();
        }catch(Exception e){
            e.printStackTrace(); // report the actual cause instead of a bare "Exception"
        }
    }
}

Solution

You forgot to associate the newly created Instance with the dataset.

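Instance instance = new DenseInstance(1.0, vals); // Weka 3.7+; in Weka 3.6 use new Instance(1.0, vals)
instance.setDataset(instances); // give the instance a reference to the header (this does not add it)
if (flag)
    saver.writeIncremental(instance);
else
    instances.add(instance);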

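The Instance needs a reference to the dataset to look up the value of the string attribute. Without it, the saver just writes out the index stored in the instance. In batch mode this happens implicitly, because Instances.add() stores a copy of the instance and sets the dataset on that copy.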
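To make the mechanism concrete, here is a minimal plain-Java sketch that does not use Weka at all. The classes StringHeader, Row and ArffSketchDemo are hypothetical stand-ins, assuming (as the answer describes) that a string attribute keeps its values in the header and that each row stores only the index of its value as a double. A row printed without a header reference can only show that index; once the header is attached, the index is resolved back to the string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified header with one string attribute.
// This is NOT Weka's API; it only mimics the lookup described above.
class StringHeader {
    private final List<String> values = new ArrayList<>();

    // Loosely modeled on addStringValue: store the string once, return its index.
    int addStringValue(String s) {
        int idx = values.indexOf(s);
        if (idx >= 0) return idx;
        values.add(s);
        return values.size() - 1;
    }

    String value(int index) {
        return values.get(index);
    }
}

// Hypothetical row: column 0 (string) holds only an index encoded as a double,
// column 1 is a plain numeric value.
class Row {
    private final double[] vals;
    private StringHeader dataset; // null until setDataset is called

    Row(double[] vals) { this.vals = vals; }

    void setDataset(StringHeader header) { this.dataset = header; }

    @Override
    public String toString() {
        String ticker = (dataset == null)
                ? String.valueOf((int) vals[0])   // no header: only the index is known
                : dataset.value((int) vals[0]);   // header present: resolve index to string
        return ticker + "," + vals[1];
    }
}

class ArffSketchDemo {
    // Builds three rows like the question's program and renders each one.
    static List<String> render(boolean attachHeader) {
        StringHeader header = new StringHeader();
        String[] tickers = {"YY", "XY", "XX"};
        double[] prices = {23.0, 24.0, 29.0};
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < tickers.length; i++) {
            Row row = new Row(new double[] {header.addStringValue(tickers[i]), prices[i]});
            if (attachHeader) row.setDataset(header); // the step the answer adds
            lines.add(row.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(render(false)); // [0,23.0, 1,24.0, 2,29.0]
        System.out.println(render(true));  // [YY,23.0, XY,24.0, XX,29.0]
    }
}
```

Without the header, render(false) produces the lines 0,23.0 / 1,24.0 / 2,29.0, mirroring the integers the question reports in incremental mode; render(true) produces YY,23.0 / XY,24.0 / XX,29.0. The real Weka classes format numbers differently, so only the index-versus-string behavior carries over.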

Apart from that, I recommend using Weka 3.7.6, in which Instance is an interface with two implementations (DenseInstance and SparseInstance).

cheers, Muki

Licensed under: CC-BY-SA with attribution