Question

I am having trouble loading data from a CSV file. Without preprocessing the file, is it possible to add an exception to the following statement

load data local infile 'Program.csv' into table program_table 
    fields terminated by ',' 
    LINES TERMINATED BY '\r\n' 
    ignore 1 lines
    (@program_code, @field_of_study, @area_of_study, @degree_level)
    SET
        program_code = nullif(@program_code,''), 
        field_of_study = nullif(@field_of_study,''),
        area_of_study = nullif(@area_of_study,''),
        degree_level = nullif(@degree_level,'');

that ignores the "terminated by" character when it is followed by a space? The problem is that the CSV file contains lines like the following:

ZLD6,Administration/Business,BUSINESS MGMT. SMALL BUSINESS, CONVEYANCING,Doctorate.

The text BUSINESS MGMT. SMALL BUSINESS, CONVEYANCING should be a single field, but the field termination character ',' splits it in two. The line is therefore read as five fields (ZLD6 | Administration/Business | BUSINESS MGMT. SMALL BUSINESS | CONVEYANCING | Doctorate.) instead of four.

Any suggestions on possible solutions would be great.


Solution

To answer your question: yes, I'm sure it is possible to employ a scheme like the one jkavalik suggested, where you:

  1. Import the entire file into a single column/cell in a temp table
  2. Parse the malformed CSV string using increasingly obscure trickery as each edge case is discovered.

I'm sure a smarter person than me could write an event-driven, real-time, web-enabled, protocol-buffer-communicating, graphical-user-interface, autonomous aircraft landing system for use in hurricanes using only SQL, but why would they? You should use the best tool for the job. If you can't fix the CSV exporter so that it escapes the data properly, then you should either escape the data manually or use a programming language that makes this task easier.

In my opinion, SQL is only going to make your life more difficult, and as such you should choose a different tool.
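
As a rough illustration of the "different tool" route, here is a minimal Python sketch that rewrites the file with proper quoting before it is loaded. It assumes the rule from the question holds for every row: a comma is a real delimiter only when it is not followed by a space. The function names and the output file name are made up for this example, and a row that legitimately has a field starting with a space would be split incorrectly.

```python
import csv
import io
import re

# Assumption taken from the question: a comma followed by a space
# belongs to the field text; any other comma is a field delimiter.
FIELD_SEP = re.compile(r",(?! )")


def split_fields(line):
    """Split one raw line on commas that are not followed by a space."""
    return FIELD_SEP.split(line.rstrip("\r\n"))


def requote_text(text):
    """Return the CSV text with comma-containing fields wrapped in quotes."""
    out = io.StringIO()
    # QUOTE_MINIMAL (the default) quotes only fields that need it.
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    for line in text.splitlines():
        if line.strip():  # skip blank lines
            writer.writerow(split_fields(line))
    return out.getvalue()


def requote_file(src_path, dst_path):
    """Hypothetical helper: read src_path, write the re-quoted copy to dst_path."""
    # newline="" keeps the \r\n line endings from being translated.
    with open(src_path, newline="") as src:
        cleaned = requote_text(src.read())
    with open(dst_path, "w", newline="") as dst:
        dst.write(cleaned)
```

Calling something like requote_file('Program.csv', 'Program_clean.csv') would produce a file in which the problem field is written as "BUSINESS MGMT. SMALL BUSINESS, CONVEYANCING". The original statement should then load that file if the fields clause is extended to fields terminated by ',' optionally enclosed by '"'.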

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange