Question

I have a legacy PHP web app that imports CSV files into a database: a 'master' table that stores entity data, and an entity-attribute-value (EAV) table that stores dynamic data for each entity.

The import process iterates through the CSV file line by line, issuing an INSERT into the master table and then multiple INSERTs into the EAV table for each line.
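
Roughly, the import loop looks like this (table, column, and attribute names are simplified for illustration):

```php
<?php
// Row-by-row import: one INSERT per master row plus one INSERT per
// mapped attribute -- every statement is a separate round trip.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$insertMaster = $pdo->prepare('INSERT INTO master (name) VALUES (?)');
$insertEav    = $pdo->prepare(
    'INSERT INTO eav (master_id, attribute, value) VALUES (?, ?, ?)'
);

$mappedAttributes = ['color', 'size'];  // hypothetical mapped subset of CSV columns

$fh = fopen('import.csv', 'r');
$header = fgetcsv($fh);
while (($row = fgetcsv($fh)) !== false) {
    $fields = array_combine($header, $row);
    $insertMaster->execute([$fields['name']]);
    $masterId = $pdo->lastInsertId();   // FK for this line's EAV rows
    foreach ($mappedAttributes as $attr) {
        $insertEav->execute([$masterId, $attr, $fields[$attr]]);
    }
}
fclose($fh);
```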

This process is SLOOW, and what little I know about MySQL tuning tells me that a LOAD DATA statement is generally far faster than a series of INSERTs. However, because of the EAV processing, the iteration would still have to occur, just driven by the results of a database query rather than by the CSV file.
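
For reference, the kind of bulk-load statement I have in mind would look something like this for the master data (file path and column list are illustrative, and LOCAL loads need PDO::MYSQL_ATTR_LOCAL_INFILE enabled):

```php
<?php
// One LOAD DATA statement streams an entire file into the table instead
// of one INSERT round trip per row. Names and paths are illustrative.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass', [
    PDO::MYSQL_ATTR_LOCAL_INFILE => true,  // required for LOCAL infile
]);

$pdo->exec("
    LOAD DATA LOCAL INFILE '/tmp/master.csv'
    INTO TABLE master
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'
    IGNORE 1 LINES
    (name)
");
```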

  • Is it worth it to make the modification?

  • Does it make a difference if there are tens of millions of records in each file, with generally fewer than 2/3 of the file's fields actually mapped to attributes?


Solution

Sounds like a worthwhile modification. What I would do is pre-process the CSV into two load files, one for the master table and one for the EAV table. The tricky part is establishing a linkage between these two files so you can insert into the EAV table with the correct foreign key.
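
A sketch of that pre-processing pass, with illustrative file names and a hypothetical list of mapped attributes; note that the EAV rows' first column, the master foreign key, is exactly the open question:

```php
<?php
// One pass over the source CSV writes two load files: one for the
// master table, one for the EAV table. Names are illustrative.
$mappedAttributes = ['color', 'size'];  // hypothetical mapped columns

$src    = fopen('import.csv', 'r');
$master = fopen('/tmp/master.csv', 'w');
$eav    = fopen('/tmp/eav.csv', 'w');

$header = fgetcsv($src);
while (($row = fgetcsv($src)) !== false) {
    $fields = array_combine($header, $row);
    fputcsv($master, [$fields['name']]);
    foreach ($mappedAttributes as $attr) {
        // The first column should be the master foreign key -- but that
        // value is not yet known at this point. See below.
        fputcsv($eav, ['?', $attr, $fields[$attr]]);
    }
}
fclose($src);
fclose($master);
fclose($eav);
```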

The problem is simplified if:

  1. you can lock out any other write access to the system while you execute the load
  2. the master table primary key is an incrementing integer

In that case, you can easily "know" the EAV foreign key values ahead of time and set them appropriately before loading data for either table.
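
A sketch of that, assuming an AUTO_INCREMENT id column and names that are illustrative; MAX(id) + 1 only predicts the next key because nothing else can insert while the lock is held:

```php
<?php
// With other writers locked out, upcoming primary keys are predictable:
// start at MAX(id) + 1 and write explicit ids into BOTH load files.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$pdo->exec('LOCK TABLES master WRITE, eav WRITE');
$nextId = (int) $pdo->query('SELECT COALESCE(MAX(id), 0) FROM master')
                    ->fetchColumn();

$mappedAttributes = ['color', 'size'];  // hypothetical mapped columns
$src    = fopen('import.csv', 'r');
$master = fopen('/tmp/master.csv', 'w');
$eav    = fopen('/tmp/eav.csv', 'w');

$header = fgetcsv($src);
while (($row = fgetcsv($src)) !== false) {
    $fields = array_combine($header, $row);
    $nextId++;                          // the key we "know" ahead of time
    fputcsv($master, [$nextId, $fields['name']]);
    foreach ($mappedAttributes as $attr) {
        fputcsv($eav, [$nextId, $attr, $fields[$attr]]);
    }
}
fclose($src);
fclose($master);
fclose($eav);

// ... run the two LOAD DATA statements here, then release the lock:
$pdo->exec('UNLOCK TABLES');
```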

If not, you'll need to figure out how to recover the primary key values of the master table records after the LOAD DATA and link them up with the EAV records accordingly.
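
One way to do that, assuming the CSV carries a natural key (a hypothetical external_ref column, say) that gets loaded into both tables: stage the EAV rows with the natural key and a NULL master_id, then resolve them in one set-based UPDATE:

```php
<?php
// Resolve staged EAV rows to real master primary keys via a natural key
// (hypothetical external_ref column present in both tables).
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$pdo->exec("
    UPDATE eav
    JOIN master ON master.external_ref = eav.external_ref
    SET eav.master_id = master.id
    WHERE eav.master_id IS NULL
");
```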

Licensed under: CC-BY-SA with attribution