Question

So I'm trying import a .csv file directly into mysql using load data local infile command but I'm running into fields that are remnants an excel formula field and I don't know how to get rid of them. The content for the first field in the example below has an = in front of it.

The table structure is setup to allow the first field to be a VARCHAR(100) but I'd like to make it an INT if possible. Here is a sample of the csv content that is being uploaded.

"MID","DBA Name","Partner ID","Partner Name","Sub Partner ID","Sub Partner Name","Active Months","Bonus Amount","Bonus Applied Date","Partner Percentage","Partner Share","Total Payment"
="0008788014065741","company2","7968","me,"11839","Joe Blow","0","$50.00","","","","$350.64"
="0008788014065756","company2","7968","you","11839","Joe Blow","0","$50.00","","","","$294.60"

And here is the mysql load command that I'm using to import the data:

sql = """
    LOAD DATA LOCAL INFILE '%(upload)s' IGNORE INTO TABLE `%(table)s`
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' 
    LINES TERMINATED BY '\\r\\n' 
    IGNORE 1 LINES ;
      """ % {"upload": file, "table": report}
self.db.query( sql )

Is there something that can be done on the import in terms of a regex or something? I don't know, I'm just grasping at straws here...

Thanks for the input!

Was it helpful?

Solution

You can do it two ways with LOAD DATA INFILE.

First read first field values as is and then strip equal sign = and double quotes from it in SET clause. Also you most likely want to do other transformations while you're loading data like:

  • set actual NULLs when your fields are empty
  • strip dollar signs from monetary values
  • you probably have to transform date values (but your sample data doesn't have them so there is no information to deduce from)
LOAD DATA LOCAL INFILE '/path/to/your/file.csv' 
IGNORE INTO TABLE table_name
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' 
    LINES TERMINATED BY '\r\n'
    IGNORE 1 LINES 
(@MID, DBAName, PartnerID, PartnerName, SubPartnerID, SubPartnerName, ActiveMonths,
 @BonusAmount, @BonusAppliedDate, @PartnerPercentage, @PartnerShare, @TotalPayment)
SET MID = TRIM(BOTH '"' FROM SUBSTR(@MID, 2)), -- here we get rid of equal sign and double quotes
    BonusAmount  = TRIM(LEADING '$' FROM NULLIF(@BonusAmount, '')),
    BonusAppliedDate = NULLIF(@BonusAppliedDate, ''),
    PartnerPercentage = NULLIF(@PartnerPercentage, ''),
    PartnerShare = TRIM(LEADING '$' FROM NULLIF(@PartnerShare, '')),
    TotalPayment = TRIM(LEADING '$' FROM NULLIF(@TotalPayment, ''))

Second approach is to leverage LINES STARTING BY clause

LOAD DATA LOCAL INFILE '/path/to/your/file.csv' 
IGNORE INTO TABLE table_name
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' 
    LINES TERMINATED BY '\r\n' STARTING BY '='
    IGNORE 1 LINES 
(MID, DBAName, PartnerID, PartnerName, SubPartnerID, SubPartnerName, ActiveMonths, @BonusAmount, BonusAppliedDate, PartnerPercentage, PartnerShare, @TotalPayment)
SET BonusAmount  = TRIM(LEADING '$' FROM NULLIF(@BonusAmount, '')),
    BonusAppliedDate = NULLIF(@BonusAppliedDate, ''),
    PartnerPercentage = NULLIF(@PartnerPercentage, ''),
    PartnerShare = TRIM(LEADING '$' FROM NULLIF(@PartnerShare, '')),
    TotalPayment = TRIM(LEADING '$' FROM NULLIF(@TotalPayment, ''))

Now if your target table schema looks something like

CREATE TABLE table_name 
(
    MID BIGINT, 
    DBAName           VARCHAR(100),
    PartnerID         INT,
    PartnerName       VARCHAR(100),
    SubPartnerID      INT,
    SubPartnerName    VARCHAR(100),
    ActiveMonths      INT,
    BonusAmount       DECIMAL(19, 2),
    BonusAppliedDate  DATE,
    PartnerPercentage DECIMAL(3, 2),
    PartnerShare      DECIMAL(19, 2),
    TotalPayment      DECIMAL(19, 2)
);

Then after loading with either method here is what we get in the table

mysql> select * from table_name;
+---------------+----------+-----------+-------------+--------------+----------------+--------------+-------------+------------------+-------------------+--------------+--------------+
| MID           | DBAName  | PartnerID | PartnerName | SubPartnerID | SubPartnerName | ActiveMonths | BonusAmount | BonusAppliedDate | PartnerPercentage | PartnerShare | TotalPayment |
+---------------+----------+-----------+-------------+--------------+----------------+--------------+-------------+------------------+-------------------+--------------+--------------+
| 8788014065741 | company2 |      7968 | me          |        11839 | Joe Blow       |            0 |       50.00 | NULL             |              NULL |         NULL |       350.64 |
| 8788014065756 | company2 |      7968 | you         |        11839 | Joe Blow       |            0 |       50.00 | NULL             |              NULL |         NULL |       294.60 |
+---------------+----------+-----------+-------------+--------------+----------------+--------------+-------------+------------------+-------------------+--------------+--------------+
2 rows in set (0.00 sec)
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top