Question

I did my research, but none of the solutions seems to work in my case. I have a CSV file in Gmail format, exported from Gmail itself. My parsing code is as simple as this:

CSV.open(file.path) do |csv|

Error is:

Unquoted fields do not allow \r or \n

I tried combinations of row_sep and encoding, but none of them helped. Any thoughts?

Reading the file with Ruby returns:

ruby -e 'p File.read("tmp/google.csv")'       

"\xFF\xFEN\u0000a\u0000m\u0000e\u0000,\u0000G\u0000i\u0000v\u0000e\u0000n\u0000 \u0000N\u0000a\u0000m\u0000e\u0000,\u0000A\u0000d\u0000d\u0000i\u0000t\u0000i\u0000o\u0000n\u0000a\u0000l\u0000 \u0000N\u0000a\u0000m\u0000e\u0000,\u0000F\u0000a\u0000m\u0000i\u0000l\u0000y\u0000 \u0000N\u0000a\u0000m\u0000e\u0000,\u0000Y\u0000o\u0000m\u0000i\u0000 \u0000N\u0000a\u0000m\u0000e\u0000,\u0000G\u0000i\u0000v\u0000e\u0000n\u0000 \u0000N\u0000a\u0000m\u0000e\u0000 \u0000Y\u0000o\u0000m\u0000i\u0000,\u0000A\u0000d\u0000d\u0000i\u0000t\u0000i\u0000o\u0000n\u0000a\u0000l\u0000 \u0000N\u0000a\u0000m\u0000e\u0000 \u0000Y\u0000o\u0000m\u0000i\u0000,\u0000F\u0000a\u0000m\u0000i\u0000l\u0000y\u0000 \u0000N\u0000a\u0000m\u0000e\u0000 \u0000Y\u0000o\u0000m\u0000i\u0000,\u0000N\u0000a\u0000m\u0000e\u0000 \u0000P\u0000r\u0000e\u0000f\u0000i\u0000x\u0000,\u0000N\u0000a\u0000m\u0000e\u0000 \u0000S\u0000u\u0000f\u0000f\u0000i\u0000x\u0000,\u0000I\u0000n\u0000i\u0000t\u0000i\u0000a\u0000l\u0000s\u0000,\u0000N\u0000i\u0000c\u0000k\u0000n\u0000a\u0000m\u0000e\u0000,\u0000S\u0000h\u0000o\u0000r\u0000t\u0000 \u0000N\u0000a\u0000m\u0000e\u0000,\u0000M\u0000a\u0000i\u0000d\u0000e\u0000n\u0000 \u0000N\u0000a\u0000m\u0000e\u0000,\u0000B\u0000i\u0000r\u0000t\u0000h\u0000d\u0000a\u0000y\u0000,\u0000G\u0000e\u0000n\u0000d\u0000e\u0000r\u0000,\u0000L\u0000o\u0000c\u0000a\u0000t\u0000i\u0000o\u0000n\u0000,\u0000B\u0000i\u0000l\u0000l\u0000i\u0000n\u0000g\u0000 \u0000I\u0000n\u0000f\u0000o\u0000r\u0000m\u0000a\u0000t\u0000i\u0000o\u0000n\u0000,\u0000D\u0000i\u0000r\u0000e\u0000c\u0000t\u0000o\u0000r\u0000y\u0000 
\u0000S\u0000e\u0000r\u0000v\u0000e\u0000r\u0000,\u0000M\u0000i\u0000l\u0000e\u0000a\u0000g\u0000e\u0000,\u0000O\u0000c\u0000c\u0000u\u0000p\u0000a\u0000t\u0000i\u0000o\u0000n\u0000,\u0000H\u0000o\u0000b\u0000b\u0000y\u0000,\u0000S\u0000e\u0000n\u0000s\u0000i\u0000t\u0000i\u0000v\u0000i\u0000t\u0000y\u0000,\u0000P\u0000r\u0000i\u0000o\u0000r\u0000i\u0000t\u0000y\u0000,\u0000S\u0000u\u0000b\u0000j\u0000e\u0000c\u0000t\u0000,\u0000N\u0000o\u0000t\u0000e\u0000s\u0000,\u0000G\u0000r\u0000o\u0000u\u0000p\u0000 \u0000M\u0000e\u0000m\u0000b\u0000e\u0000r\u0000s\u0000h\u0000i\u0000p\u0000,\u0000E\u0000-\u0000m\u0000a\u0000i\u0000l\u0000 \u00001\u0000 \u0000-\u0000 \u0000T\u0000y\u0000p\u0000e\u0000,\u0000E\u0000-\u0000m\u0000a\u0000i\u0000l\u0000 \u00001\u0000 \u0000-\u0000 \u0000V\u0000a\u0000l\u0000u\u0000e\u0000,\u0000E\u0000-\u0000m\u0000a\u0000i\u0000l\u0000 \u00002\u0000 \u0000-\u0000 \u0000T\u0000y\u0000p\u0000e\u0000,\u0000E\u0000-\u0000m\u0000a\u0000i\u0000l\u0000 \u00002\u0000 \u0000-\u0000 \u0000V\u0000a\u0000l\u0000u\u0000e\u0000,\u0000P\u0000h\u0000o\u0000n\u0000e\u0000 \u00001\u0000 \u0000-\u0000 \u0000T\u0000y\u0000p\u0000e\u0000,\u0000P\u0000h\u0000o\u0000n\u0000e\u0000 \u00001\u0000 \u0000-\u0000 \u0000V\u0000a\u0000l\u0000u\u0000e\u0000,\u0000W\u0000e\u0000b\u0000s\u0000i\u0000t\u0000e\u0000 \u00001\u0000 \u0000-\u0000 \u0000T\u0000y\u0000p\u0000e\u0000,\u0000W\u0000e\u0000b\u0000s\u0000i\u0000t\u0000e\u0000 \u00001\u0000 \u0000-\u0000 \u0000V\u0000a\u0000l\u0000u\u0000e\u0000\r\u0000\n\u0000\u0010\u0004;\u00045\u0004:\u0004A\u00040\u0004=\u00044\u0004@\u0004,\u0000\u0010\u0004;\u00045\u0004:\u0004A\u00040\u0004=\u00044\u0004@\u0004,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000*\u0000 \u0000M\u0000y\u0000 \u0000C\u0000o\u0000n\u0000t\u0000a\u0000c\u0000t\u0000s\u0000 \u0000:\u0000:\u0000:\u0000 \u0000*\u0000 
\u0000F\u0000r\u0000i\u0000e\u0000n\u0000d\u0000s\u0000,\u0000*\u0000 \u0000O\u0000t\u0000h\u0000e\u0000r\u0000,\u0000s\u0000t\u0000a\u0000r\u0000s\u0000h\u0000y\u0000n\u0000i\u0000n\u0000@\u0000g\u0000m\u0000a\u0000i\u0000l\u0000.\u0000c\u0000o\u0000m\u0000,\u0000,\u0000,\u0000,\u0000,\u0000,\u0000\r\u0000\n\u0000"

It seems that Google's export uses an unusual encoding:

enca tmp/google.csv
Universal character set 2 bytes; UCS-2; BMP
  CRLF line terminators
  Byte order reversed in pairs (1,2 -> 2,1)

File content:

Name,Given Name,Additional Name,Family Name,Yomi Name,Given Name Yomi,Additional Name Yomi,Family Name Yomi,Name Prefix,Name Suffix,Initials,Nickname,Short Name,Maiden Name,Birthday,Gender,Location,Billing Information,Directory Server,Mileage,Occupation,Hobby,Sensitivity,Priority,Subject,Notes,Group Membership,E-mail 1 - Type,E-mail 1 - Value,E-mail 2 - Type,E-mail 2 - Value,Phone 1 - Type,Phone 1 - Value,Website 1 - Type,Website 1 - Value
Александр,Александр,,,,,,,,,,,,,,,,,,,,,,,,,* My Contacts ::: * Friends,* Other,starshynin@gmail.com,,,,,,

Solution

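You need to specify the encoding when opening the file. Your dump starts with the bytes \xFF\xFE, which is the little-endian UTF-16 byte order mark (BOM), so decode it as UTF-16LE:

File.open(file.path, "rb:UTF-16LE").read.encode("utf-8")

If a file starts with \xFE\xFF instead, use UTF-16BE. Note that the decoded string still begins with the BOM character (U+FEFF), so strip it before relying on the first header name.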
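As a rough illustration of how the byte order can be told apart, the first two bytes can be inspected directly. The helper name `utf16_encoding_from_bom` is made up for this sketch, and the sample bytes are copied from the start of the dump in the question:

```ruby
# Hypothetical helper (not part of Ruby or the CSV library):
# guess the UTF-16 flavour from the first two bytes of the data.
def utf16_encoding_from_bom(bytes)
  case bytes.b.byteslice(0, 2)            # compare as raw bytes
  when "\xFF\xFE".b then Encoding::UTF_16LE
  when "\xFE\xFF".b then Encoding::UTF_16BE
  end                                     # nil when no UTF-16 BOM is present
end

# First bytes of the dump shown in the question: BOM followed by "Na".
head = "\xFF\xFEN\x00a\x00".b
p utf16_encoding_from_bom(head)
```

A nil result only means no UTF-16 BOM was found; the file may still be UTF-16 without a BOM, in which case the encoding has to be known some other way.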

After that you can feed the decoded data into a CSV reader. CSV.open expects a path rather than an open File, so wrap the IO with CSV.new instead:

CSV.new(File.open(file.path, "rb:UTF-16LE:UTF-8")).each do |row|

and process the rows. The trailing :UTF-8 in the mode string makes Ruby transcode to UTF-8 while reading, so the fields you get back are already UTF-8.
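The decode-then-parse approach can be sketched end to end as below. This is a minimal sketch that assumes the data is UTF-16LE with an optional BOM, as in the question. The helper name `parse_utf16le_csv` and the sample row are illustrative only, and an in-memory string stands in for the real file:

```ruby
require "csv"

# Hypothetical helper: decode UTF-16LE bytes (with or without a BOM)
# to UTF-8 and parse the result as CSV with a header row.
def parse_utf16le_csv(bytes)
  text = bytes.dup.force_encoding(Encoding::UTF_16LE).encode(Encoding::UTF_8)
  text = text.delete_prefix("\uFEFF")     # drop the BOM so the first header is clean
  CSV.parse(text, headers: true)
end

# Sample data standing in for File.binread("tmp/google.csv");
# the e-mail address is made up for the example.
sample = "\uFEFFName,E-mail 1 - Value\r\nАлександр,user@example.com\r\n"
         .encode("UTF-16LE").b

table = parse_utf16le_csv(sample)
table.each { |row| puts "#{row["Name"]}: #{row["E-mail 1 - Value"]}" }
```

For a real file, the bytes would come from something like File.binread(file.path). Decoding the whole file up front is simple but loads it into memory, which is usually fine for a contacts export.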

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow