Question

I would like to ask for some advice concerning my problem. I have a batch job that does some computation (in a multi-threaded environment) and inserts rows into a table. I would like to do something like a batch insert: once I get a query, wait until I have, say, 1000 queries, and then execute them as a batch insert (instead of one by one).

I was wondering if there is any design pattern on this. I have a solution in mind, but it's a bit complicated:

  • build a method that will receive the queries

  • add them to a list (the string and/or the statements)

  • do not execute until the list has 1000 items

The problem: how do I handle the end? What I mean is, when do I execute the last 999 queries, since I'll never get to 1000? What should I do?

I'm thinking of a thread that wakes up every 5 minutes and checks the number of items in the list. If it wakes up twice and the number is the same, execute the existing queries.

Does anyone have a better idea?


Solution

Your database driver needs to support batch inserting. See this.

Have you established that your system is choking on network traffic because there is too much communication between the service and the database? If not, I wouldn't worry about batching until you are sure you need it.

You mention that in your plan you want to check every 5 minutes. That's an eternity. If it takes 5 minutes to accumulate 1000 items, you shouldn't need batching; that's only ~3 per second.

Assuming you do want to batch, have a process wake up every 2 seconds and commit whatever is queued up. Don't wait five minutes. It might commit 0 rows, it might commit 10; who cares. With this approach, you don't need to worry that your arbitrary threshold hasn't been met.
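The timer-plus-threshold approach above can be sketched roughly as follows. `BatchBuffer`, its threshold, and the 2-second interval are all illustrative choices, and the committer callback stands in for the actual JDBC work:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical buffer: worker threads add rows; a scheduled task flushes
// whatever has accumulated every 2 seconds, plus an early flush when a
// size threshold is reached.
public class BatchBuffer<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private final int threshold;
    private final Consumer<List<T>> committer;

    public BatchBuffer(int threshold, Consumer<List<T>> committer) {
        this.threshold = threshold;
        this.committer = committer;
    }

    public void add(T item) {
        queue.add(item);
        if (queue.size() >= threshold) {
            flush();                      // early flush on threshold
        }
    }

    // Drain whatever is queued and hand it to the committer in one call.
    public synchronized void flush() {
        List<T> batch = new ArrayList<>();
        for (T item; (item = queue.poll()) != null; ) {
            batch.add(item);
        }
        if (!batch.isEmpty()) {
            committer.accept(batch);      // e.g. a JDBC batch insert
        }
    }

    // Periodic flush: commits whatever is queued, even if it's 0 rows' worth.
    public ScheduledExecutorService startTimer() {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        timer.scheduleAtFixedRate(this::flush, 2, 2, TimeUnit.SECONDS);
        return timer;
    }

    public static void main(String[] args) {
        BatchBuffer<String> buffer = new BatchBuffer<>(1000,
                batch -> System.out.println("committing " + batch.size() + " rows"));
        buffer.add("row-1");
        buffer.add("row-2");
        buffer.flush();
    }
}
```

Because the timer flushes whatever is there, the "last 999 rows" problem from the question disappears: they simply go out on the next tick.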

I'm assuming that the inserts come in one at a time. If your incoming data comes in n at once, I would just commit every incoming request, no matter how many inserts happen. If your messages are coming in as some sort of messaging system, it's asynchronous anyway, so you shouldn't need to worry about batching. Under high load, the incoming messages just wait till there is capacity to handle them.

OTHER TIPS

Add a commit kind of method to that API that is called to confirm all items have been added. Also, the optimum batch size is somewhere in the range of 20-50; beyond that, the potential gain is outweighed by the bookkeeping necessary for a growing number of statements. You don't mention it explicitly, but of course you must use the dedicated batch API in JDBC.
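As a rough sketch of the JDBC batch API mentioned above, using `PreparedStatement.addBatch()` and `executeBatch()`; the `city` table and its two-column layout are made up for illustration:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchInsert {

    // Build a parameterized insert for the given table and column count,
    // e.g. insertSql("city", 2) -> "INSERT INTO city VALUES (?, ?)".
    static String insertSql(String table, int columns) {
        StringBuilder sql = new StringBuilder("INSERT INTO ").append(table).append(" VALUES (");
        for (int i = 0; i < columns; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(")").toString();
    }

    // Queue every row with addBatch(), send them all in one round trip
    // with executeBatch(), then commit the transaction.
    static void insertAll(Connection conn, List<Object[]> rows) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(insertSql("city", 2))) {
            for (Object[] row : rows) {
                for (int i = 0; i < row.length; i++) {
                    ps.setObject(i + 1, row[i]);
                }
                ps.addBatch();
            }
            ps.executeBatch();
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```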

If you need to keep track of many writers, each in its own thread, then you'll also need a begin kind of method and you can count how many times it was called, compared to how many times commit was called. Something like reference-counting. When you reach zero, you know you can flush your statement buffer.
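A minimal sketch of that reference-counting idea, assuming a hypothetical `WriterTracker` whose `onIdle` callback flushes the statement buffer once the count returns to zero:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical tracker: each writer calls begin() before queuing statements
// and commit() when done. When the last active writer commits, onIdle runs
// (e.g. to flush the statement buffer).
public class WriterTracker {
    private final AtomicInteger active = new AtomicInteger();
    private final Runnable onIdle;

    public WriterTracker(Runnable onIdle) {
        this.onIdle = onIdle;
    }

    public void begin() {
        active.incrementAndGet();
    }

    public void commit() {
        if (active.decrementAndGet() == 0) {
            onIdle.run();   // all writers finished: safe to flush
        }
    }
}
```

Each writer thread wraps its work in `begin()`/`commit()`, so the flush fires exactly when no writer is mid-batch.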

This is a situation I have faced many times. According to your problem, you are building a batch of 1000 or more insert queries, all inserting repeatedly into the same table.

To avoid this type of situation, you can write the insert query like this:

INSERT INTO table1 VALUES('4','India'),('5','Odisha'),('6','Bhubaneswar')

It executes only once with multiple values. So it's better to collect all the values in a collection (an ArrayList, List, etc.), build a single query like the one above, and insert everything at once.
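One way to build that kind of multi-row statement from a collection is sketched below, using `?` placeholders rather than concatenating literal values (which risks SQL injection); the helper name is made up:

```java
import java.util.Collections;

public class MultiRowSql {

    // Build a single multi-row insert with one "(?, ?, ...)" group per row,
    // e.g. multiRowInsert("table1", 2, 3)
    //   -> "INSERT INTO table1 VALUES (?, ?), (?, ?), (?, ?)".
    static String multiRowInsert(String table, int columns, int rows) {
        String group = "(" + String.join(", ", Collections.nCopies(columns, "?")) + ")";
        return "INSERT INTO " + table + " VALUES "
                + String.join(", ", Collections.nCopies(rows, group));
    }
}
```

You would then bind the flattened values with `PreparedStatement.setObject()` and execute the statement once.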

You can also use the JDBC transaction API: commit(), rollback(), setAutoCommit(), etc.

Hope it helps. All the best.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow