Question

I am trying to create a web application whose primary objective is to insert request data into a database.

Here is my problem: a single request contains 10,000 to 100,000 data sets, and each data set needs to be inserted as a separate row in the database.

I may get multiple requests to this application concurrently, so it's necessary for me to make the inserts fast.

I am using a MySQL database. Which approach is better for me, LOAD DATA or batch INSERT, or is there a better way than these two?

How will your application retrieve this information? - There will be another background, thread-based Java application that will select records from this table, process them one by one, and delete them.

Can you queue your requests (batches) so your system will handle them one batch at a time? - For now we are thinking of inserting the data into the database straight away, but yes, if this approach is not feasible we may think of queuing the data.

Do retrievals of information need to be concurrent with insertion of new data? - Yes, we are keeping it concurrent.

Those are my answers to your questions, Ollie Jones.

Thank you!

Solution

Ken White's comment mentioned a couple of useful SO questions and answers for handling bulk insertion. For the record volume you are handling, you will enjoy the best success by using MyISAM tables and LOAD DATA INFILE data loading, from source files in the same file system that's used by your MySQL server.
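To make the LOAD DATA INFILE route concrete, here is a minimal Java sketch, not a definitive implementation. It writes one batch to a tab-separated file and builds the matching statement. The class name BulkLoadSketch, the table name incoming_records, and the tab-separated layout are assumptions made for illustration. It also assumes that field values contain no tabs, newlines, or backslashes, and that the MySQL server can read the target directory (the server's secure_file_priv setting may restrict this).

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

public class BulkLoadSketch {

    /** Writes one tab-separated line per record and returns the file that was written. */
    static Path writeBatchFile(Path dir, List<String[]> records) throws IOException {
        Path file = Files.createTempFile(dir, "batch-", ".tsv");
        try (BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            for (String[] fields : records) {
                // Assumption: fields contain no tabs, newlines, or backslashes.
                out.write(String.join("\t", fields));
                out.write('\n');
            }
        }
        return file;
    }

    /** Builds the LOAD DATA statement; the table name is a trusted, hypothetical identifier. */
    static String loadStatement(Path file, String table) {
        String path = file.toAbsolutePath().toString()
                .replace("\\", "/")   // forward slashes avoid backslash escapes in the SQL literal
                .replace("'", "''");  // double any single quote inside the SQL string literal
        return "LOAD DATA INFILE '" + path + "' INTO TABLE " + table
                + " FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n'";
    }

    /** Runs the load on an existing JDBC connection (requires a MySQL driver at run time). */
    static void load(Connection conn, Path file, String table) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(loadStatement(file, table));
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("batches");
        Path file = writeBatchFile(dir, List.of(
                new String[] {"1", "first record"},
                new String[] {"2", "second record"}));
        // Only prints the statement; call load(...) with a real connection to execute it.
        System.out.println(loadStatement(file, "incoming_records"));
    }
}
```

Writing each request to its own file keeps concurrent requests from interfering with one another, and each file can be deleted once its load has finished.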

What you're doing here is a kind of queuing operation. You receive these batches (you call them "requests") of records (you call them "data sets"). You put them into a big bucket (your MySQL table). Then you take them out of the bucket one at a time.

You haven't described your problem completely, so it's possible my advice is wrong.

Is each record ("data set") independent of all the others?

Does the order in which the records are processed matter? Or would you obtain the same results if you processed them in a random order? In other words, do you have to maintain an order on the individual records?

What happens if you receive two batches ("requests") of a million rows each at approximately the same time? Assuming you can load ten thousand records a second (that's fast!) into your MySQL table, loading both batches completely will take 2,000,000 rows / 10,000 rows per second = 200 seconds. Will you try to load one batch completely before beginning to load the second?

Is it OK to start processing and deleting the rows in these batches before the batches are completely loaded?

Is it OK for a record to sit in your system for 200 or more seconds before it is processed? How long can a record sit? (This is called "latency.")

Given the volume of data you're mentioning here, if you're going into production with live data you may want to consider using a queuing system like ActiveMQ rather than a DBMS.

It may also make sense simply to build a multi-threaded Java app to load your batches of records, deposit them into a Queue object in RAM (a ConcurrentLinkedQueue instance may be suitable) and process them one by one. This approach will give you much more control over the performance of your system than you will have by using a MySQL table as a queue.
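As a rough illustration of the in-RAM approach, here is a minimal Java sketch under stated assumptions. The class name InMemoryQueueSketch, the use of plain strings as records, and the process method (which only counts records) are all hypothetical placeholders. The sketch drains the queue once and stops, while a long-running service would keep its workers alive and wait for new records.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class InMemoryQueueSketch {

    // Thread-safe, non-blocking queue shared by request handlers and workers.
    private final Queue<String> queue = new ConcurrentLinkedQueue<>();
    private final AtomicLong processed = new AtomicLong();

    /** Called by request-handling threads: enqueue every record of one batch. */
    public void submitBatch(List<String> records) {
        queue.addAll(records);
    }

    /** Starts the given number of workers, waits until the queue is empty, returns the total processed. */
    public long drain(int workers) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                String record;
                // poll() returns null when the queue is empty, which ends this worker.
                while ((record = queue.poll()) != null) {
                    process(record);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return processed.get();
    }

    /** Placeholder for the real per-record work; here it only counts the record. */
    private void process(String record) {
        processed.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        InMemoryQueueSketch sketch = new InMemoryQueueSketch();
        sketch.submitBatch(List.of("a", "b", "c"));
        sketch.submitBatch(List.of("d", "e"));
        System.out.println(sketch.drain(2)); // prints 5
    }
}
```

If the workers should sleep until records arrive instead of polling, a BlockingQueue implementation such as LinkedBlockingQueue, with its blocking take() method, is a common alternative to ConcurrentLinkedQueue.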

Licensed under: CC-BY-SA with attribution