Question

I'm trying to import a large CSV file with 27797 rows into MySQL. Here is my code:

load data local infile 'foo.csv' into table bar fields terminated by ',' enclosed by '"' lines terminated by '\n' ignore 1 lines;

It works fine. However, some rows of this file contain backslashes (\), for example:

"40395383771234304","40393156566585344","84996340","","","2011-02-23 12:59:44 +0000","引力波宇宙广播系统零号控制站","@woiu 太好了"
"40395151830421504","40392270645563392","23063222","","","2011-02-23 12:58:49 +0000","引力波宇宙广播系统零号控制站","@wx0 确切地讲安全电压是\""不高于36V\""而不是\""36V\"", 呵呵. 话说要如何才能测它的电压呢?"
"40391869477158912","40390512645124096","23063222","","","2011-02-23 12:45:46 +0000","引力波宇宙广播系统零号控制站","@wx0 这是别人的测量结果, 我没验证过. 不过麻麻的感觉的确是存在的, 而且用适配器充电时麻感比用电脑的前置USB接口充电高"

"15637769883","15637418359","35192559","","","2010-06-07 15:44:15 +0000","强互作用力宇宙探测器","@Hc95 那就不是DOS程序啦,只是个命令行程序,就像Android里的adb.exe。$ adb push d:\hc95.tar.gz /tmp/ $ adb pull /system/hc95/eyes d:\re\"

After importing, the rows that contain backslashes are broken.

How can I fix this? Should I use sed or awk to replace every \ with \\ (across all 27797 rows)? Or can this be fixed just by modifying the SQL query?


Solution

This is a bit more of a discussion than a direct answer. Do you need the double quotes in the middle of the values in the final data (in the DB)? Having a large amount of data to munge is not a problem in itself.

The "" thing is what Oracle does for quotes inside strings. I think whatever built that file attempted to escape the quote sequence. The string literals section of the MySQL manual covers this. Either of these is valid:

select "hel""lo", "\"hello";

I would tend to do the editing separately from the import, so it is easier and faster to see whether things worked. If your text file is less than 10MB, it shouldn't take more than a minute to update it via sed.

sed -e 's/\\//g' foo.csv
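Deleting backslashes also changes values such as Windows paths. If the backslashes need to survive an import that uses the default escape character, one alternative is to double each one, as the question proposes. This is a minimal sketch; sample.csv and escaped.csv are hypothetical file names used only for the demonstration, and the sample row is a shortened stand-in for the real data:

```shell
# Build a one-line sample that mimics the problematic row (hypothetical file name).
printf '%s\n' '"1","adb pull /system/eyes d:\re\"' > sample.csv

# Double every backslash so that the default escape character reads "\\" as one literal "\".
# In the sed pattern, \\ matches a single backslash; in the replacement, \\\\ emits two.
sed -e 's/\\/\\\\/g' sample.csv > escaped.csv

cat escaped.csv
```

Writing to a separate output file keeps the original intact, so the result can be inspected before the import is run.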

From your comments, you can set the escape character to something other than '\'.

ESCAPED BY 'char'

This means the loader should add the values verbatim. If it gets too complicated, you could base64-encode the data before you insert it, which will stop any tools from breaking the UTF-8 sequences.
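One possible choice is an empty escape character, which tells MySQL not to interpret escape sequences at all, so backslashes are loaded as-is while doubled "" inside an enclosed field is still read as a single quote. The sketch below writes the question's statement with that clause added; load.sql and the database name mydb are assumptions made for illustration, and the mysql invocation is left commented out because it depends on your setup:

```shell
# Write the import statement to a file (load.sql is a hypothetical name).
# The quoted 'SQL' delimiter stops the shell from touching backslashes or quotes in the body.
cat > load.sql <<'SQL'
LOAD DATA LOCAL INFILE 'foo.csv'
INTO TABLE bar
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY ''
LINES TERMINATED BY '\n'
IGNORE 1 LINES;
SQL

# Assumed invocation; "mydb" is a placeholder database name:
# mysql --local-infile=1 mydb < load.sql
```

With escaping disabled, no preprocessing of foo.csv should be needed, but it is worth checking a few of the affected rows after the import.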

OTHER TIPS

What I did in a similar situation was to build the statement as a Java string in a test application first, then compile the test class and fix any errors I found.

For example:

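// Build the statement as one string; each "\n" only makes the printed SQL readable.
String me = "LOAD DATA LOCAL INFILE 'X:/access.log/' REPLACE INTO TABLE `logrecords`\n" +
    "FIELDS TERMINATED BY '|'\n" +
    "ENCLOSED BY '\"'\n" +
    // Four backslashes in Java source become '\\' in the SQL text, i.e. one backslash to MySQL.
    "ESCAPED BY '\\\\'\n" +
    "LINES TERMINATED BY '\\r\\n' (\n" +
    "`startDate`,\n" +
    "`IP`,\n" +
    "`request`,\n" +
    "`threshold`,\n" +
    "`useragent`\n" +
    ")";
// Print the statement to check the escaping before running it.
System.out.println(me);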


Licensed under: CC-BY-SA with attribution