Question

I have a flat file generated by IBM's Change Data Delivery on z/OS. There are newlines (\n) embedded in the data. I found a way to replace most of them using an awk command, but it seems to have a slight bug: if a line with an odd number of quotes ends in a quote, the \n is not replaced with a space. For whatever reason I had to loop through twice to get most of them, and I am still left with one record that has the \n. Here is a sample.

"2013-11-19 10:09:09","0","I","NOT SET   ","
simple string                            "

Needs to be essentially:

"2013-11-19 10:09:09","0","I","NOT SET   ","simple string                            "

Here is the code I am using:

#For loop#
for a in 1 2 
do
  awk -F'"' '$NF""{printf("%s ", $0);next}1' $1 > $1.filter
  rm -f $1
  mv $1.filter $1
  echo $a
done

This file has about 100k records in it. It gets picked up by DataStage, which sees the \n and throws that record out, because it thinks the next line is the start of a new record.

Thanks, Josh

EDIT:

I discovered this Perl command, which does everything but still ends up with the same bug as above.

$ perl -p -le 's/\n+/ /g'


Solution 2

I researched what glenn jackman suggested and worked out a solution in Python. Here is my code:

#!/usr/bin/python
# Python 2 script; files are opened in binary mode as the csv module expects.

import sys, csv, os

inputfile = sys.argv[1]
outputfile = sys.argv[1] + '.filter'
newtext = ' '  # replacement for embedded newlines

print inputfile
print outputfile

with open(inputfile, "rb") as input:
  with open(outputfile, "wb") as output:
    w = csv.writer(output, delimiter=',', quotechar='"', quoting=csv.QUOTE_NONNUMERIC, lineterminator='\n')
    # csv.reader follows a quoted field across line breaks, so an embedded
    # newline arrives inside the field value, where it is safe to replace.
    for record in csv.reader(input):
      w.writerow(tuple(s.replace("\n", newtext) for s in record))

os.rename(outputfile, inputfile)

Thanks everyone for all the help. Hopefully someone having the same issue will find this. My only problem with this solution is that it adds quotes around all fields, including null fields.

Thanks, Josh

EDIT:

I was able to use Perl to quickly remove all double quotes that appear next to each other.

perl -pi -le 's/""//g' data
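Another option is to avoid the extra quotes in the first place: the csv module's QUOTE_MINIMAL mode only quotes fields that actually need it (those containing the delimiter, quote character, or a newline), so null fields stay unquoted and no follow-up pass is needed. A minimal Python 3 sketch of the same newline replacement under that mode; the sample row here is invented for illustration:

```python
import csv
import io

# Sample row with trailing-space fields, an embedded newline, and a null
# field -- invented for illustration.
rows = [["2013-11-19 10:09:09", "0", "I", "NOT SET   ", "simple\nstring", ""]]

out = io.StringIO()
# QUOTE_MINIMAL leaves plain fields (including empty ones) unquoted.
w = csv.writer(out, delimiter=",", quotechar='"',
               quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
for row in rows:
    # Replace embedded newlines with a space before writing each record.
    w.writerow([field.replace("\n", " ") for field in row])

result = out.getvalue()
print(result)
```

The trade-off is that quoting then varies from field to field, which matters only if the downstream consumer insists on uniformly quoted input.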

OTHER TIPS

Here's one way of doing it:

sed -n -e ':b; /^[^"]*"[^"]*\("[^"]*"[^"]*\)*$/ { N; s/\
//; bb; }; p; '

In pseudocode it goes

label foo:
  if we have an odd number of quotes:
    read and append the next line
    remove the line feed
    goto foo

print line

Example output:

$ cat file
"2013-11-19 10:09:09","0","I","NOT SET   ","
simple string                    "
"normal data",42
"some other
string"
$ sed -n -e ':b; /^[^"]*"[^"]*\("[^"]*"[^"]*\)*$/ { N; s/\
//; bb; }; p; ' < file
"2013-11-19 10:09:09","0","I","NOT SET   ","simple string                  "
"normal data",42
"some otherstring"
$ 

Note that any quotes escaped with backslash will break it ("foo\"bar"), while quotes escaped by doubling ("foo""bar") will work. Make sure you know which CSV dialect you're using.
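The same odd-quote test translates directly into a quote-counting loop in any scripting language. Here is a rough Python sketch of the sed logic above (the sample lines are taken from the example, and the same caveat about backslash-escaped quotes applies):

```python
def join_unbalanced_lines(lines):
    """Join physical lines until the buffered text contains an even number
    of double quotes, mirroring the sed loop (the newline is dropped)."""
    buf = ""
    for line in lines:
        buf += line.rstrip("\n")
        if buf.count('"') % 2 == 0:
            yield buf   # balanced: this is a complete logical line
            buf = ""
    if buf:
        yield buf       # unbalanced leftover at end of input, if any

sample = [
    '"2013-11-19 10:09:09","0","I","NOT SET   ","\n',
    'simple string"\n',
    '"normal data",42\n',
]
joined = list(join_unbalanced_lines(sample))
```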

I would use a language with a CSV parser. Try to parse the current line; if there's an error, join the next line and try again. For example, with Ruby:

ruby -rcsv -ne '
  chomp
  loop do
    begin
      row=CSV.parse_line($_)
      # if no error thrown, we have a parseable line
      puts row.inspect
      break
    rescue
      # grab the next line and try again
      $_ += gets
    end
  end
' << END
a,b,c,d,e
1,2,3,4,5
"2013-11-19 10:09:09","0","I","NOT SET   ","
simple string                            "
"a 1","b 2","c 3","d 4","e 5"
END
["a", "b", "c", "d", "e"]
["1", "2", "3", "4", "5"]
["2013-11-19 10:09:09", "0", "I", "NOT SET   ", "simple string                            "]
["a 1", "b 2", "c 3", "d 4", "e 5"]
sed -n -e '/"/ {
   s/:/:d/g;s/\\"/:e/g
:b 
      /^\(\("[^"]*"\)*[^"]*\)*"\([^"]*\)$/ { 
      N
      s/\
//
      b b
      }
   s/:e/\\"/g;s/:d/:/g
  }
p' YourFile

use a "translation of \" before. This use a bit more cpu but pass throug escaped "

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow