parsing CSV files backwards

https://stackoverflow.com/questions/1272315

13-09-2019
|

Question

I have csv files with the following format:

CSV FILE
"a"             , "b"     , "c" , "d"
hello, world    , 1       , 2   , 3
1,2,3,4,5,6,7   , 2       , 456 , 87
h,1231232,3     , 3       , 45  , 44

The problem is that the first field has commas "," in it. I have no control over file generation, as that's the format I receive them in. Is there a way to read a CSV file backwards, from the end of line to the beginning?

I don't mind writing a little python script to do so, if I’m guided in the right direction.

Solution

The rsplit string method splits a string starting from the right instead of the left, and so it's probably what you're looking for (it takes an argument specifying the max number of times to split):

line = "hello, world    , 1       , 2   , 3"
parts = line.rsplit(",", 3)
print parts  # prints ['hello, world    ', ' 1       ', ' 2   ', ' 3']

If you want to strip the whitespace from the beginning and end of each item in your splitted list, then you can just use the strip method with a list comprehension

parts = [s.strip() for s in parts]
print parts  # prints ['hello, world', '1', '2', '3']

OTHER TIPS

I don't fully understand why you want to read each line in reverse, but you could do this:

import csv
file = open("mycsvfile.csv")
reversedLines = [line[::-1] for line in file]
file.close()
reader = csv.reader(reversedLines)
for backwardRow in reader:
    lastField = backwardRow[0][::-1]
    secondField = backwardRow[1][::-1]

Reverse the string first and then process it.

tmp = tmp[::-1]

From the sample You have provided, it looks like "columns" are fixed size. First (the one with commas) is 16 characters long, so why don't You try reading the file line by line, then for each line reading the first 16 characters (as a value of first column), and the rest accordingly? After You have each value, You can go and parse it further (trim whitespaces, and so on...).

That's not then a CSV file, comma separated means just that.

How can you be certain that is not:

CSV FILE
"a"             , "b"     , "c" , "d"
hello           , world   , 1   , 2   , 3
1               , 2       , 3   , 4   , 5,6,7,2,456,87
h               , 1231232 , 3   , 3   , 45,44

If the file is as you indicate then the first group should be surrounded by quotes, looks as though the field names are so odd that fields containing commas are not.

I'm not a fan of fixing errors away from their source, I'd push back to the data generator to deliver proper CSV if that's what they are claiming it is.

You could always do something with regex's, like (perl regex)

#!/usr/bin/perl

use IO::File;

if (my $file = new IO::File("test.csv"))
{
    foreach my $line (<$file>) {
    $line =~ m/^(.*),(.*?),(.*?),(.*?)$/;
    print "[$1][$2][$3][$4]\n";
    }
} else {
    print "Unable to open test.csv\n";
}

(The first is a greedy search, the last 3 are not) Edit: posted full code instead of just the regex

If you always expect the same number of columns, and only the first column can contain commas, just read anything and concatenate excess columns at the beginning.

The problem is that the interface is ambiguous, and you can try to circumvent this, but the better solution is to try to get the interface fixed (which is often harder than creating several patches...).

I agree with mr beer. That is a badly formed csv file. Your best bet is to find other delimiters or stop overloading the commas or quote/escape the non field separating commas

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow