Question

I have a problem importing a CSV file with RapidMiner. The floating-point values in the file use a comma instead of a dot as the decimal separator.

Does anyone know how to import values formatted this way correctly?

sample data:

BMI;1;0;1;1;1;blue;-0,138812155;0,520378909;5;0;50;107;0;9;0;other;good;2011
BMI;1;0;1;1;1;pink;-0,624654696;;8;0;73;120;1;3;0,882638889;other;good;2011

RapidMiner actually interprets these columns as "polynomial". Forcing the type to "real" only gives a correct result for the "0" values.

thanks


Solution

Use the semicolon as the delimiter. You can read the file line by line with java.util.Scanner and call String.split() on each line to split it at the semicolons. When a token contains a comma, use String.replace() to change the comma to a decimal point, then pass the result to Float.parseFloat().

Hope this answers your question.
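
The steps described in the answer above could look roughly like this. It is only a sketch: the class and method names (CommaDecimalScanner, parseToken, readColumn) are made up for illustration, the two sample rows from the question are read from a string instead of a file, and returning null for an empty field is an assumption about how you want missing values handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class CommaDecimalScanner {
    // Converts one token such as "0,520378909" to a Float.
    // Returns null for an empty field (assumption: empty means "missing").
    static Float parseToken(String token) {
        if (token.isEmpty()) {
            return null;
        }
        return Float.parseFloat(token.replace(',', '.'));
    }

    // Reads every line from the scanner and collects the parsed value
    // found in the given column (0-based index).
    static List<Float> readColumn(Scanner scanner, int column) {
        List<Float> values = new ArrayList<>();
        while (scanner.hasNextLine()) {
            String[] tokens = scanner.nextLine().split(";");
            values.add(parseToken(tokens[column]));
        }
        return values;
    }

    public static void main(String[] args) {
        String sample =
            "BMI;1;0;1;1;1;blue;-0,138812155;0,520378909;5;0;50;107;0;9;0;other;good;2011\n"
          + "BMI;1;0;1;1;1;pink;-0,624654696;;8;0;73;120;1;3;0,882638889;other;good;2011\n";
        try (Scanner scanner = new Scanner(sample)) {
            // Column index 8 is the ninth field; it is empty in the second row.
            System.out.println(readColumn(scanner, 8));
        }
    }
}
```

To read a real file, pass a java.io.File to the Scanner constructor instead of the sample string. If you need more precision than float gives you, Double.parseDouble() works the same way.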

OTHER TIPS

This seems to be a very old request. Not sure if this will help you, but this may help others with a similar situation.

Step 1: in the "Read CSV" operator, under "import configuration wizard", make sure you select "Semicolon" as the separator

Step 2: use the "Guess Types" operator. Attribute Filter Type -> Subset, Select Attributes -> select the attributes 8, 9 and 16 (based on your example above), change "decimal point character" to a "," and you should be all set.

Hope this helps (someone!)
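
If you want to double-check which attributes to select in step 2 above, a few lines of Java outside RapidMiner can list the field positions that contain comma decimals. This is just a sketch run against the two sample rows from the question; the class name, the method name and the regular expression are my own assumptions about what counts as a comma-decimal number.

```java
import java.util.Set;
import java.util.TreeSet;

class CommaColumnFinder {
    // Returns the 1-based positions of all fields that look like
    // comma-decimal numbers (e.g. "-0,138812155") in at least one row.
    static Set<Integer> commaDecimalColumns(String[] rows) {
        Set<Integer> columns = new TreeSet<>();
        for (String row : rows) {
            // Limit -1 keeps trailing empty fields so positions stay stable.
            String[] fields = row.split(";", -1);
            for (int i = 0; i < fields.length; i++) {
                if (fields[i].matches("-?\\d+,\\d+")) {
                    columns.add(i + 1);
                }
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        String[] rows = {
            "BMI;1;0;1;1;1;blue;-0,138812155;0,520378909;5;0;50;107;0;9;0;other;good;2011",
            "BMI;1;0;1;1;1;pink;-0,624654696;;8;0;73;120;1;3;0,882638889;other;good;2011"
        };
        System.out.println(commaDecimalColumns(rows)); // prints [8, 9, 16]
    }
}
```

Note that attribute 16 only shows a comma in the second row, so a check based on the first row alone would miss it.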

If you preprocess the file in Java yourself, something like this reads it line by line:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class CommaDecimalCsv {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the reader when done: frees resources (and unlocks the file on Windows).
        try (BufferedReader br = new BufferedReader(new FileReader("c:\\path\\semicolons and numbers and commas.csv"))) {
            for (String line; (line = br.readLine()) != null; ) {
                // Variable line now holds a single line from the file. This code runs once per line.
                // Split on the semicolon. split() takes a regex, so be careful if you change the
                // delimiter: '.' for example matches any character, not just a dot.
                String[] fields = line.split(";");
                // Take index 7 (the eighth field) and turn it into a double (high-precision floating point).
                // Replace ',' with '.' first, otherwise parseDouble throws a NumberFormatException.
                double firstDouble = Double.parseDouble(fields[7].replace(',', '.'));
                System.err.println("Have a number " + firstDouble);
                System.err.println("Can play with it " + (firstDouble * 2.0));
            }
        }
    }
}
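
Instead of replacing the comma by hand as in the code above, you could also let java.text.NumberFormat do the parsing with a locale that uses the comma as its decimal separator. This is a different technique from the one above, shown only as a sketch: Locale.GERMANY is my assumption for a suitable locale, and the class and method names are made up.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class LocaleDecimalParser {
    // Parses a comma-decimal token such as "-0,138812155" using a locale
    // whose decimal separator is the comma (assumption: Locale.GERMANY).
    static double parse(String token) {
        try {
            NumberFormat format = NumberFormat.getInstance(Locale.GERMANY);
            return format.parse(token).doubleValue();
        } catch (ParseException e) {
            // Re-throw as unchecked so callers handle it like Double.parseDouble.
            throw new NumberFormatException("Not a comma-decimal number: " + token);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("-0,138812155"));
        System.out.println(parse("0,882638889"));
    }
}
```

One thing to watch out for: in this locale the dot is the grouping separator, so a stray "." in the data would not be read as a decimal point. Empty fields still need to be checked before calling parse().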
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow