Question

I have a CSV file that follows this pattern:

cust_id,cust_name,cust_add_1,cust_add_2,cust_city,cust_state,cust_zip,cust_email
100024,BALE #DIANA & ROY,2944 SOME RD, ,AKRON,OH,44556,an@other.net
100139,SMITH #JOHN & LINDA,1569 ANOTHER WAY, ,SARASOTA,FL,65478,
100263,DOLE #BOB,5947 LONG RD, ,GRANITE FALLS,NC,12345,
... continued

This file represents thousands and thousands of rows of data, and I am trying to insert all of it into a MySQL table. I created the corresponding table with the following statement:

CREATE TABLE customer (
    cust_id INT(7) NOT NULL,
    cust_name VARCHAR(40) NOT NULL,
    cust_add_1 VARCHAR(50) NOT NULL,
    cust_add_2 VARCHAR(50),
    cust_city VARCHAR(20) NOT NULL,
    cust_state CHAR(2) NOT NULL,
    cust_zip INT(5) NOT NULL,
    cust_email VARCHAR(60),
    PRIMARY KEY (cust_id)
);

Then I tried the following on the command line:

mysqlimport -u root --local csv_import ~/path/to/customer.csv

When I tried SELECT * FROM customer after running mysqlimport, I was given the following result-set:

+---------+-----------+------------+------------+-----------+------------+----------+------------+
| cust_id | cust_name | cust_add_1 | cust_add_2 | cust_city | cust_state | cust_zip | cust_email |
+---------+-----------+------------+------------+-----------+------------+----------+------------+
|       0 |           |            | NULL       |           |            |          | NULL       |
+---------+-----------+------------+------------+-----------+------------+----------+------------+
1 row in set (0.00 sec)

I'm not very experienced in the area of importing CSV data into MySQL, and any help would be greatly appreciated.

UPDATE

As per ThisSuitIsBlackNot's answer, I tried the following command (note that the --local option is needed in my case):

mysqlimport -u root --local --fields-terminated-by=',' --fields-optionally-enclosed-by='"' --lines-terminated-by='\n' csv_import ~/path/to/customer.csv

This produced slightly better results but wasn't what I needed, so I tried it like this:

mysql> LOAD DATA LOCAL INFILE '~/path/to/customer.csv'
    -> INTO TABLE customer
    -> FIELDS TERMINATED BY ','
    -> OPTIONALLY ENCLOSED BY '"'
    -> LINES TERMINATED BY '\n';

After both of these, SELECT * FROM customer produced this result-set:

+---------+-----------+------------+------------+-----------+------------+----------+-------------------+
| cust_id | cust_name | cust_add_1 | cust_add_2 | cust_city | cust_state | cust_zip | cust_email        |
+---------+-----------+------------+------------+-----------+------------+----------+-------------------+
|       0 | cust_name | cust_add_1 | cust_add_2 | cust_city | cu         | cust_    | cust_email        |
+---------+-----------+------------+------------+-----------+------------+----------+-------------------+
1 row in set (0.00 sec)

This is obviously not what I need, but it's getting closer.


Solution

Field separators

The default field separator for mysqlimport is a tab character. You need to tell mysqlimport to split fields on commas instead:

mysqlimport --fields-terminated-by=',' ...

This isn't quite enough, though, because if your CSV follows the spec, fields containing commas will be surrounded by double quotes. You need to tell mysqlimport that it shouldn't treat commas inside quotes as field separators:

mysqlimport --fields-terminated-by=',' --fields-optionally-enclosed-by='"' ...
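To see why the enclosure option matters, here's a small sketch (with a hypothetical record) showing that a naive split on every comma miscounts the fields when a quoted field contains a comma:

```shell
# Hypothetical one-line CSV: the street-address field contains a
# comma, so a spec-compliant writer double quotes it.
printf '100301,"DOE #JANE","123 MAIN ST, APT 4",AKRON,OH,44556\n' > /tmp/demo.csv

# Splitting blindly on commas reports 7 fields instead of 6,
# because the comma inside the quotes is treated as a separator.
awk -F',' '{print NF}' /tmp/demo.csv
# prints: 7
```

--fields-optionally-enclosed-by='"' tells mysqlimport to treat that quoted comma as data, keeping the record at 6 fields.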

In some CSVs, all fields are double quoted, whether they contain commas or not. This is not the case with your data, but for future reference, mysqlimport also provides the option

--fields-enclosed-by=string

I'm not sure, but I assume performance would be better on a CSV where all fields are quoted if you use --fields-enclosed-by='"' instead of --fields-optionally-enclosed-by='"'.

Line separators

The default line separator for mysqlimport is a newline \n. This should work for files generated on UNIX/Linux/Mac OS X, but if the EOL sequence in your file is something different (for example, \r\n in most files created on Windows), you need to specify that as well:

mysqlimport --lines-terminated-by='\r\n'

Files from classic Mac OS (prior to Mac OS X) use '\r' alone as the line ending.
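If you aren't sure which line ending your file uses, you can inspect it before choosing a value for --lines-terminated-by. One way (a sketch using hypothetical sample files) is od -c, which prints control characters literally:

```shell
# Create two sample files with different line endings.
printf 'a,b\r\nc,d\r\n' > /tmp/windows.csv   # CRLF (Windows-style)
printf 'a,b\nc,d\n' > /tmp/unix.csv          # LF (UNIX-style)

# od -c renders each byte visibly: a CRLF file shows "\r \n"
# at the end of every line, while an LF file shows only "\n".
od -c /tmp/windows.csv
od -c /tmp/unix.csv
```

If the output for your CSV shows \r \n pairs, use --lines-terminated-by='\r\n'; if it shows only \n, the default is fine.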

Skipping column headers

Generally you don't want to import the names of your columns into your table, since the table already has its own column names. You can skip the first X lines of your file like this:

mysqlimport --ignore-lines=X
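As an alternative, you can strip the header row in the shell before importing at all. This sketch (with a hypothetical two-line file) uses tail -n +2, which emits everything from line 2 onward:

```shell
# Hypothetical CSV with a header row followed by one data row.
printf 'cust_id,cust_name\n100024,BALE #DIANA & ROY\n' > /tmp/sample.csv

# tail -n +2 starts output at line 2, dropping the header.
tail -n +2 /tmp/sample.csv > /tmp/sample_noheader.csv

cat /tmp/sample_noheader.csv
# prints: 100024,BALE #DIANA & ROY
```

--ignore-lines is the cleaner option when it's available, since it avoids creating a second copy of a large file.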

Putting it all together

To import a CSV file where

  • fields that contain commas are double quoted
  • fields that don't contain commas are not quoted
  • lines end in \r\n
  • the first line contains column names, which you want to ignore

you would run

mysqlimport --fields-terminated-by=',' \
            --fields-optionally-enclosed-by='"' \
            --lines-terminated-by='\r\n' \
            --ignore-lines=1 \
            db_name table_name.csv

Note that mysqlimport determines the table to import into by simply stripping off the file extension, so the above example will try to import the data from table_name.csv into the table_name table. Also, if you're running this on the server, you shouldn't need the --local option, although the wording in the documentation is a little obtuse.
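Because of that filename-to-table mapping, a file whose name doesn't match the target table can simply be linked (or copied) under the table's name before importing. A sketch, with hypothetical paths:

```shell
# Hypothetical export whose name doesn't match the "customer" table.
mkdir -p /tmp/import
touch /tmp/export_2014_06.csv

# Link it under the table's name; mysqlimport strips the directory
# and extension to pick the table.
ln -sf /tmp/export_2014_06.csv /tmp/import/customer.csv

# This is the name mysqlimport would resolve to a table.
basename /tmp/import/customer.csv .csv
# prints: customer
```

You would then run mysqlimport against /tmp/import/customer.csv and the data would land in the customer table.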

You can also use the equivalent SQL statement

LOAD DATA INFILE 'table_name.csv' INTO TABLE table_name
  FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
  LINES TERMINATED BY '\r\n'
  IGNORE 1 LINES;

(mysqlimport is just a command-line interface to LOAD DATA INFILE)

See the docs for LOAD DATA INFILE for more details.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow