Question

I have written a program in C that parses large XML files and then creates files containing INSERT statements. A separate process will load those files into a MySQL database. The data will serve as an indexing service so that users can find documents easily.

I have chosen InnoDB for its row-level locking. The C program will generate anywhere from 500 to 5 million INSERT statements on a given invocation.

What is the best way to get all this data into the database as quickly as possible? The other thing to note is that the DB is on a separate server. Is it worth moving the files over to that server to speed up inserts?

EDIT: This table won't really be updated, but rows will be deleted.


Solution

  • Use the mysqlimport tool or the LOAD DATA INFILE command.
  • Temporarily disable indices that you don't need for data integrity
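
Since the generator is already written in C, switching its output from INSERT statements to a LOAD DATA file is mostly a matter of escaping. Here is a minimal sketch, assuming LOAD DATA's default text format (tab-separated fields, newline-terminated rows, backslash as the escape character). The function names and the two-column row layout are made up for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Escape one field for LOAD DATA's default text format.
 * Writes a NUL-terminated result into out and returns its length,
 * or -1 if out (of size cap) is too small. */
long escape_field(char *out, size_t cap, const char *in)
{
    size_t n = 0;
    if (cap == 0)
        return -1;
    for (; *in != '\0'; ++in) {
        char esc = 0;
        switch (*in) {
        case '\t': esc = 't';  break;   /* field separator */
        case '\n': esc = 'n';  break;   /* row terminator */
        case '\r': esc = 'r';  break;
        case '\\': esc = '\\'; break;   /* the escape character itself */
        default: break;
        }
        size_t need = esc ? 2 : 1;
        if (n + need + 1 > cap)         /* keep room for the final NUL */
            return -1;
        if (esc) {
            out[n++] = '\\';
            out[n++] = esc;
        } else {
            out[n++] = *in;
        }
    }
    out[n] = '\0';
    return (long)n;
}

/* Write one row with two hypothetical columns (doc_id, term). */
int write_row(FILE *fp, const char *doc_id, const char *term)
{
    char a[1024], b[1024];
    if (escape_field(a, sizeof a, doc_id) < 0 ||
        escape_field(b, sizeof b, term) < 0)
        return -1;
    return fprintf(fp, "%s\t%s\n", a, b) < 0 ? -1 : 0;
}
```

If your table or LOAD DATA statement uses different FIELDS or LINES options, the escaping has to change to match.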

OTHER TIPS

I'd do at least these things according to this link:

  1. Move the files to the database server and connect over the Unix socket
  2. Generate a LOAD DATA INFILE file instead of INSERT statements
  3. Disable indexes during the load
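
The three steps above could be tied together by having the C program also emit a small load script next to the data file. This is only a sketch: the function name is invented, the path and table name are not quoted or escaped, and `ALTER TABLE ... DISABLE KEYS` only affects non-unique indexes on MyISAM tables (InnoDB ignores it), so treat that part as an assumption about your engine.

```c
#include <stdio.h>

/* Build a SQL script that bulk-loads one data file into one table.
 * Returns the script length, or -1 if buf (of size cap) is too small. */
int build_load_script(char *buf, size_t cap,
                      const char *data_path, const char *table)
{
    int n = snprintf(buf, cap,
        "ALTER TABLE %s DISABLE KEYS;\n"
        "LOAD DATA INFILE '%s' INTO TABLE %s;\n"
        "ALTER TABLE %s ENABLE KEYS;\n",
        table, data_path, table, table);
    /* snprintf reports the length it wanted; treat truncation as failure */
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}
```

On the database host the script could then be fed to the `mysql` client, which connects over the local socket by default. LOAD DATA INFILE without LOCAL reads the file on the server, so the account needs the FILE privilege and the file has to be somewhere the server is allowed to read.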

MySQL with the standard table formats is wonderfully fast as long as the table is append-only, so the first question is whether you are going to be updating or deleting rows. If not, don't go with InnoDB: there's no need for locking if you are just appending. You can truncate or rename the output file periodically to deal with table size.

1. Make sure you use a transaction.

Transactions eliminate the repeated INSERT, SYNC-TO-DISK cycle; instead, all the disk I/O is performed when you COMMIT the transaction.
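
If you stay with INSERT files, the generator can add the transaction boundaries itself. A rough sketch, assuming the statements are already available as strings; the function name and the batch size are placeholders to tune, not recommendations.

```c
#include <stdio.h>

/* Write stmts[0..n-1] to fp, wrapping every `batch` statements in
 * START TRANSACTION ... COMMIT. Returns the number of transactions
 * written, or -1 on a write error or if batch is 0. */
long write_in_transactions(FILE *fp, const char *const *stmts,
                           size_t n, size_t batch)
{
    long txns = 0;
    if (batch == 0)
        return -1;
    for (size_t i = 0; i < n; ++i) {
        /* open a transaction at the start of each batch */
        if (i % batch == 0 && fputs("START TRANSACTION;\n", fp) == EOF)
            return -1;
        if (fprintf(fp, "%s\n", stmts[i]) < 0)
            return -1;
        /* close it when the batch is full or the input ends */
        if ((i + 1) % batch == 0 || i + 1 == n) {
            if (fputs("COMMIT;\n", fp) == EOF)
                return -1;
            ++txns;
        }
    }
    return txns;
}
```

With 5 million rows, one COMMIT per few thousand statements is a reasonable place to start measuring from.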

2. Make sure to use connection compression

Sending raw text over a GZip-compressed stream can save as much as 90% of the bandwidth in some cases.

3. Use the multi-row insert notation where possible

INSERT INTO TableName(Col1,Col2) VALUES (1,1),(1,2),(1,3)

(Less text to send, and fewer statements for the server to parse and execute.)
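
Generating that notation from C is straightforward. Here is a sketch that builds the statement above from an array of integer pairs; `struct pair` and the function name are invented for the example, and real string columns would need proper SQL escaping, which this does not attempt.

```c
#include <stdio.h>

struct pair { int col1, col2; };   /* one row of the example table */

/* Build one multi-row INSERT for n rows.
 * Returns the statement length, or -1 if n is 0 or buf is too small. */
int build_multi_insert(char *buf, size_t cap,
                       const struct pair *rows, size_t n)
{
    if (n == 0)
        return -1;
    int w = snprintf(buf, cap, "INSERT INTO TableName(Col1,Col2) VALUES ");
    if (w < 0 || (size_t)w >= cap)
        return -1;
    size_t used = (size_t)w;
    for (size_t i = 0; i < n; ++i) {
        /* every row after the first is preceded by a comma */
        w = snprintf(buf + used, cap - used, "%s(%d,%d)",
                     i ? "," : "", rows[i].col1, rows[i].col2);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }
    if (used + 2 > cap)            /* room for ';' and the NUL */
        return -1;
    buf[used++] = ';';
    buf[used] = '\0';
    return (int)used;
}
```

Keep each statement below the server's `max_allowed_packet` setting, so in practice you would emit one statement per few hundred or few thousand rows, not one giant statement.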

If you can't use LOAD DATA INFILE like others have suggested, use prepared queries for inserts.

It really depends on the engine. If you're using InnoDB, do use transactions (you can't avoid them, but with autocommit each statement is implicitly in its own transaction), and make sure they're neither too big nor too small.

If you're using MyISAM, transactions are meaningless. You may achieve better insert speed by disabling and enabling indexes, but that is only good on an empty table.

If you start with an empty table, that's generally best.

LOAD DATA is a winner either way.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow