Question

I'm using LOAD DATA INFILE to upload a .csv into a table.

This is the table I have created in my db:

CREATE TABLE expenses (entry_id INT NOT NULL AUTO_INCREMENT, PRIMARY KEY(entry_id), 
ss_id INT, user_id INT, cost FLOAT, context VARCHAR(100), date_created DATE);

This is some of the sample data I'm trying to upload (some of the rows have data for every column, some are missing the date column):

1,1,20,Sandwiches after hike,
1,1,45,Dinner at Yama,
1,2,40,Dinner at Murphys,
1,1,40.81,Dinner at Yama,
1,2,1294.76,Flight to Taiwan,1/17/2011
1,2,118.78,Grand Hyatt @ Seoul,1/22/2011
1,1,268.12,Seoul cash withdrawal,1/8/2011

Here is the LOAD DATA command which I can't get to work:

LOAD DATA INFILE '/tmp/expense_upload.csv'
INTO TABLE expenses (ss_id, user_id, cost, context, date)
;

This command completes and loads the correct number of rows into the table, but every field is NULL. Any time I try to add FIELDS ENCLOSED BY ',' or LINES TERMINATED BY '\r\n' I get a syntax error.

Other things to note: the csv was created in MS Excel.

If anyone has tips or can point me in the right direction it would be much appreciated!

Solution

First of all, I'd change FLOAT to DECIMAL for cost, since FLOAT stores approximate values and monetary amounts should be exact:

CREATE TABLE expenses 
(
  entry_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, 
  ss_id INT, 
  user_id INT, 
  cost DECIMAL(19,2), -- use DECIMAL instead of FLOAT
  context VARCHAR(100), 
  date_created DATE
);
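To illustrate why DECIMAL is the safer choice for money, here is a minimal sketch that stores one of the sample amounts in both types and compares it with the literal. The temporary table name float_vs_decimal is hypothetical and used only for this demonstration.

```sql
-- Hypothetical scratch table: the same amount stored as FLOAT and as DECIMAL.
CREATE TEMPORARY TABLE float_vs_decimal (f FLOAT, d DECIMAL(19,2));

INSERT INTO float_vs_decimal VALUES (1294.76, 1294.76);

-- The FLOAT column holds only an approximation of 1294.76, so the equality
-- test fails; the DECIMAL column holds the exact value, so it succeeds.
SELECT f = 1294.76 AS float_matches,    -- 0
       d = 1294.76 AS decimal_matches   -- 1
FROM float_vs_decimal;

DROP TEMPORARY TABLE float_vs_decimal;
```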

Now try this:

LOAD DATA INFILE '/tmp/expense_upload.csv' 
INTO TABLE expenses  
    FIELDS TERMINATED BY ',' 
           OPTIONALLY ENCLOSED BY '"'
    LINES  TERMINATED BY '\n' -- or '\r\n' if the file has Windows line endings
(ss_id, user_id, cost, context, @date_created)
SET date_created = IF(CHAR_LENGTH(TRIM(@date_created)) > 0, 
                      STR_TO_DATE(TRIM(@date_created), '%m/%d/%Y'), 
                      NULL);

What it does:

  1. It uses the correct syntax for specifying field and line terminators: the FIELDS and LINES clauses go after INTO TABLE and before the column list.
  2. Since the date values in your file are not in the format MySQL expects (YYYY-MM-DD), it first reads each value into a user (session) variable. If the value is not empty, it converts it to a date with STR_TO_DATE; otherwise it assigns NULL. The latter prevents you from getting zero dates (0000-00-00).
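The date handling in point 2 can be checked in isolation before running the load. The following is a small sketch that uses literal values taken from the sample data in place of @date_created; it is meant as a sanity check, not as part of the load itself.

```sql
-- Non-empty field: parsed as month/day/year.
SELECT STR_TO_DATE(TRIM('1/17/2011'), '%m/%d/%Y') AS parsed;   -- 2011-01-17

-- Empty field: the IF() falls through to NULL instead of a zero date.
SELECT IF(CHAR_LENGTH(TRIM('')) > 0,
          STR_TO_DATE(TRIM(''), '%m/%d/%Y'),
          NULL) AS parsed;                                      -- NULL

-- Run this right after LOAD DATA to list any conversion or truncation warnings.
SHOW WARNINGS;
```

If SHOW WARNINGS reports incorrect date values for rows that look fine, one possible cause is a trailing '\r' left on each line, which suggests the file needs LINES TERMINATED BY '\r\n'.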

OTHER TIPS

Here is my advice. Load the data into a staging table where all the columns are strings and then insert into the final table. This allows you to better check the results along the way:

CREATE TABLE expenses_staging (entry_id INT NOT NULL AUTO_INCREMENT,
                               PRIMARY KEY(entry_id), 
                               ss_id varchar(255),
                               user_id varchar(255),
                               cost varchar(255),
                               context VARCHAR(100),
                               date_created varchar(255)
                              );

LOAD DATA INFILE '/tmp/expense_upload.csv'
    INTO TABLE expenses_staging
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
    (ss_id, user_id, cost, context, date_created);

This will let you see what is really being loaded. You can then load this data into the final table, doing whatever data transformations are necessary.
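As a sketch of that second step, assuming the expenses and expenses_staging tables defined above and dates in month/day/year form as in the sample data, the copy into the final table might look like this. The first query is an optional check that lists rows whose date text is present but does not parse.

```sql
-- Optional check: non-empty date strings that STR_TO_DATE cannot parse.
SELECT entry_id, date_created
FROM expenses_staging
WHERE CHAR_LENGTH(TRIM(date_created)) > 0
  AND STR_TO_DATE(TRIM(date_created), '%m/%d/%Y') IS NULL;

-- Copy into the final table, converting each string column to its target type.
INSERT INTO expenses (ss_id, user_id, cost, context, date_created)
SELECT CAST(ss_id AS SIGNED),
       CAST(user_id AS SIGNED),
       CAST(cost AS DECIMAL(19,2)),
       context,
       IF(CHAR_LENGTH(TRIM(date_created)) > 0,
          STR_TO_DATE(TRIM(date_created), '%m/%d/%Y'),
          NULL)                      -- empty dates become NULL
FROM expenses_staging;
```

entry_id is left out of the INSERT so that the final table assigns its own AUTO_INCREMENT values.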

Licensed under: CC-BY-SA with attribution