Question

I was trying to export data from MongoDB to Oracle using the following approach:

Step 1: Export the data to a CSV file using the mongoexport command.
Step 2: Read the data with Java code and do the necessary data transformation.
Step 3: Insert the data into Oracle.

The issue is that when a comment field contains a newline character ('\n'), the rest of the record continues on the next line, and the Java reader fails to process the document.
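A minimal sketch of why this happens, assuming the Step 2 code reads the file one line at a time with BufferedReader.readLine() and splits each line on commas (the original Java code is not shown, so this is an assumption). The class name and the sample CSV are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class NaiveLineReadDemo {

    // Naive approach: one physical line == one record, fields split on ','.
    // A quoted field containing '\n' is therefore cut into two "records".
    static List<String[]> naiveParse(String csv) {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(csv))) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(line.split(",", -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical export: a header plus two documents; the first
        // document's comment contains an embedded newline.
        String csv = "_id,comment\n"
                   + "1,\"first line\nsecond line\"\n"
                   + "2,\"single line\"\n";
        for (String[] row : naiveParse(csv)) {
            System.out.println(row.length + " field(s): " + String.join(" | ", row));
        }
    }
}
```

The sample has three logical records (header plus two documents), but the naive reader produces four rows, one of which holds only the tail of the comment.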

There is an open bug with 10gen for this (JIRA). Has anyone faced this issue? Is there a workaround?


Solution

As with many formatting nuances in CSV, there is no agreed "standard" for how to handle embedded newline characters in a CSV field.

A commonly followed reference is RFC 4180: Common Format and MIME Type for Comma-Separated Values (CSV) Files, which suggests:

6) Fields containing line breaks (CRLF), double quotes, and commas
should be enclosed in double-quotes.

For example:

"aaa","b CRLF
bb","ccc" CRLF
zzz,yyy,xxx
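A minimal hand-written sketch of RFC 4180-style parsing in Java, for illustration only (in practice a tested CSV library is the safer choice). The key idea is to track whether the parser is inside a quoted field: a line break inside quotes is kept as data, and only a line break outside quotes ends the record. The class name Rfc4180Parser is hypothetical, and the parser is lenient about malformed input rather than rejecting it.

```java
import java.util.ArrayList;
import java.util.List;

public class Rfc4180Parser {

    // Parses CSV text into records. Quoted fields may contain commas, CR/LF,
    // and "" (a literal double quote). CRLF or LF outside quotes ends a record.
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                        field.append('"'); // "" inside quotes -> literal quote
                        i += 2;
                        continue;
                    }
                    inQuotes = false; // closing quote
                } else {
                    field.append(c); // keeps embedded commas and line breaks
                }
                i++;
                continue;
            }
            if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == ',') {
                record.add(field.toString());
                field.setLength(0);
            } else if (c == '\r' || c == '\n') {
                // Line break outside quotes: end of record (CRLF counts once).
                if (c == '\r' && i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else {
                field.append(c);
            }
            i++;
        }
        // Flush the last record when the text does not end with a line break.
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // The RFC 4180 example from above, with real CRLF line breaks.
        String csv = "\"aaa\",\"b\r\nbb\",\"ccc\"\r\nzzz,yyy,xxx";
        for (List<String> record : parse(csv)) {
            String shown = String.join(" | ", record).replace("\r", "\\r").replace("\n", "\\n");
            System.out.println(record.size() + " field(s): " + shown);
        }
    }
}
```

On the RFC example this yields two records, with the second field of the first record being "b", CRLF, "bb".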

This is the format mongoexport currently uses. If you use a CSV parser that complies with RFC 4180 (e.g. SuperCSV, as suggested by @evanchooly), it should handle the quoted newlines as expected.

If you need an alternative to the format used by mongoexport or need more flexibility in your output, you can always write your own export script.
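If you do write your own export script, the part that matters for this problem is how each field is serialized. Below is a minimal sketch of RFC 4180-style quoting on the writing side, assuming the script has already pulled the field values out of each Mongo document as strings (the query code is omitted, and the class and method names are hypothetical). Fields containing a comma, double quote, CR or LF are wrapped in double quotes, and embedded quotes are doubled.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class Rfc4180Writer {

    // Quotes a single field only when RFC 4180 requires it: if it contains
    // a comma, a double quote, CR or LF. Embedded quotes are doubled.
    static String quote(String field) {
        if (field == null) {
            return ""; // treat a missing value as an empty field
        }
        boolean needsQuotes = field.indexOf(',') >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\r') >= 0 || field.indexOf('\n') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins field values into one CSV record terminated by CRLF.
    static String toRecord(List<String> fields) {
        return fields.stream()
                .map(Rfc4180Writer::quote)
                .collect(Collectors.joining(",")) + "\r\n";
    }

    public static void main(String[] args) {
        // Hypothetical values extracted from one Mongo document.
        List<String> values = Arrays.asList("1", "first line\nsecond line", "said \"hi\"");
        System.out.print(toRecord(values));
    }
}
```

If the downstream reader can only cope with one record per physical line, a custom script could instead replace '\n' in comment fields with a placeholder before writing; that trades fidelity for simpler parsing and is a design choice, not something mongoexport does.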

OTHER TIPS

Are you trying to parse the CSV manually? If so, take a look at http://opencsv.sourceforge.net/ or http://supercsv.sourceforge.net/ and see whether they help.

Licensed under: CC-BY-SA with attribution