It seems that your dataset is in a sparse format in which InformationID identifies the instance, FeatureID identifies the feature, and Value is the value for each instance/feature pair.
Let us assume that label_training.txt lists one label per line, so that an instance is identified by its line number (for example, line #1 identifies instance #1, which corresponds to InformationID 1).
In this case, you need to generate ARFF files like the following one:
@relation my-relation
@attribute my-class {-1,1}
@attribute 1 numeric
@attribute 2 numeric
../..
@data
{0 1, 6 1.00, 160 31.00, 438 1.00, 479 1.00}
{0 1, 6457 2.00, 6664 0.65, 6761 0.46, 6762 1.00}
../..
This is the WEKA sparse ARFF format, in which each pair of numbers corresponds to an attribute index and its value.
I suggest writing a script to perform this transformation.
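As a rough starting point, such a script could look like the Python sketch below. It rests on several assumptions that are not confirmed by your question: the sparse data file holds whitespace-separated "InformationID FeatureID Value" triples, FeatureID starts at 1 (so that it can be used directly as the attribute index, with index 0 reserved for the class), and the labels are -1 or 1. The function names parse_triples and to_sparse_arff are made up for this example.

```python
def parse_triples(lines):
    """Yield (InformationID, FeatureID, Value) from whitespace-separated lines.

    Assumed input layout; lines that do not parse (blank lines, a header row)
    are skipped.
    """
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        try:
            yield int(parts[0]), int(parts[1]), float(parts[2])
        except ValueError:
            continue  # e.g. a header such as "InformationID FeatureID Value"


def to_sparse_arff(triples, labels, relation="my-relation"):
    """Build a sparse ARFF document as a string.

    labels[i] is assumed to be the label of InformationID i + 1.
    Attribute 0 is the class; FeatureID f is written as attribute index f.
    """
    rows = {}
    max_feature = 0
    for info_id, feat_id, value in triples:
        rows.setdefault(info_id, {})[feat_id] = value
        max_feature = max(max_feature, feat_id)

    out = [f"@relation {relation}", "@attribute my-class {-1,1}"]
    out += [f"@attribute {f} numeric" for f in range(1, max_feature + 1)]
    out.append("@data")

    for info_id, label in enumerate(labels, start=1):
        cells = [f"0 {label}"]
        # Write the indices in ascending order and leave out zero values,
        # since a sparse row only lists the non-zero entries.
        for feat_id, value in sorted(rows.get(info_id, {}).items()):
            if value != 0:
                cells.append(f"{feat_id} {value:g}")
        out.append("{" + ", ".join(cells) + "}")
    return "\n".join(out) + "\n"
```

To use it, open your two files, pass the data file to parse_triples, read the stripped lines of label_training.txt into a list, and write the string returned by to_sparse_arff to a .arff file. If your FeatureID values start at 0, shift them by one so that they do not collide with the class attribute.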