Question

I'll explain my problem with an example. I have a long-running statement like this:

UPDATE <table_A>
INNER JOIN <table_B> ON [...]
LEFT JOIN <table_C> ON [...]
LEFT JOIN <table_D> ON [...]
LEFT JOIN <table_E> ON [...]
SET <table_A>.<col_A>=X
WHERE <table_A>.<col_A>=Y AND COALESCE(<table_C>.<id>,<table_D>.<id>,<table_E>.<id>) IS NULL

This statement runs on big tables (two of them contain more than 7 million rows each), and the update takes 3 to 5 minutes. Meanwhile, other sessions run the following statements with high concurrency:

UPDATE <table_C> SET <col_A>=Z WHERE <id> IN ([...])

or

DELETE FROM <table_C> WHERE <id> IN ([...])

While the big UPDATE is running, these concurrent UPDATEs and DELETEs fail with lock wait timeouts or deadlocks after one or two minutes. All JOIN columns are indexed (standard indexes). I've already tried this:

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
[BIG UPDATE];
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;

but it doesn't help. Data consistency in <table_A> is not that important (it's no problem if it contains rows that no longer exist in <table_C>...<table_E>). What matters most is that the small UPDATEs/DELETEs on <table_C>...<table_E> get processed.


Solution

Since it's generally a bad idea to run an update this big on a live database, I suggest breaking your big update into smaller pieces.

This is not the most optimized way to do it, but I'm sure you'll manage to optimize it yourself.

Run in a loop:

  1. SELECT Id, ColA FROM TableA ORDER BY Id DESC LIMIT 10 OFFSET (iteration)*10
  2. In a second (inner) loop, take each row from the previous result where ColA=Y:
    2.1. SELECT Id FROM TableB WHERE Id=id_from_current_iteration
    2.2. SELECT Id FROM TableC WHERE Id=id_from_current_iteration (and the same for TableD and TableE)
    2.3. If the TableB query returned a row and the TableC/TableD/TableE queries returned nothing, go to the next step; otherwise proceed to the next row.
    2.4. UPDATE TableA SET ColA=X WHERE Id=id_from_current_iteration
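The loop above could be sketched in Python roughly as follows. This is a minimal sketch under assumptions that are not in the original: hypothetical tables `table_a` through `table_e` with an integer `id` (and `col_a` on `table_a`), join conditions reduced to `id` equality as the pseudocode does, and SQLite from the standard library standing in for MySQL so the example runs anywhere. Against MySQL you would pass a connection from a DB-API driver instead, whose placeholder style may be `%s` rather than `?`.

```python
import sqlite3

PAGE_SIZE = 10  # rows of table_a per outer iteration (the answer's LIMIT 10)


def build_demo_db():
    """In-memory stand-in for the real MySQL tables (hypothetical names and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, col_a TEXT)")
    for t in ("table_b", "table_c", "table_d", "table_e"):
        conn.execute(f"CREATE TABLE {t} (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO table_a VALUES (?, ?)",
        [(i, "Z" if i == 5 else "Y") for i in range(1, 26)],
    )
    conn.executemany("INSERT INTO table_b VALUES (?)", [(i,) for i in range(1, 21)])
    conn.execute("INSERT INTO table_c VALUES (2)")
    conn.execute("INSERT INTO table_d VALUES (3)")
    conn.execute("INSERT INTO table_e VALUES (4)")
    conn.commit()
    return conn


def exists(cur, table, row_id):
    """True if `table` has a row with this id: one small, indexed lookup."""
    # Table names can't be bound as parameters; here they come from a fixed list.
    cur.execute(f"SELECT 1 FROM {table} WHERE id = ? LIMIT 1", (row_id,))
    return len(cur.fetchall()) > 0


def update_in_small_steps(conn, old_value, new_value):
    """Row-by-row replacement for the big UPDATE ... JOIN; returns rows updated."""
    cur = conn.cursor()
    updated = 0
    iteration = 0
    while True:
        # Step 1: fetch one page of table_a, highest ids first.
        cur.execute(
            "SELECT id, col_a FROM table_a ORDER BY id DESC LIMIT ? OFFSET ?",
            (PAGE_SIZE, iteration * PAGE_SIZE),
        )
        page = cur.fetchall()
        if not page:
            break
        # Step 2: inner loop over the rows that still have the old value.
        for row_id, col_a in page:
            if col_a != old_value:
                continue
            # Steps 2.1-2.3: must exist in table_b and be absent from C, D and E.
            if not exists(cur, "table_b", row_id):
                continue
            if any(exists(cur, t, row_id) for t in ("table_c", "table_d", "table_e")):
                continue
            # Step 2.4: update one row and commit at once, so its lock is short-lived.
            cur.execute("UPDATE table_a SET col_a = ? WHERE id = ?", (new_value, row_id))
            conn.commit()
            updated += 1
        iteration += 1
    return updated


if __name__ == "__main__":
    print(update_in_small_steps(build_demo_db(), "Y", "X"))  # prints 16
```

In the demo data, ids 1 and 6 to 20 qualify: id 5 has the wrong value, ids 2 to 4 appear in one of the "must be absent" tables, and ids 21 to 25 have no `table_b` row.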

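In other words: avoid joins.
This will take longer than a single update, but it will work, because every statement is short: plain SELECTs are non-locking reads in InnoDB, and with autocommit each small UPDATE commits and releases its row lock immediately, so the concurrent UPDATEs/DELETEs never wait on one long transaction.
A good first step to optimize it is batching the queries.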
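Batching the queries could look like the following sketch, using the same hypothetical tables and the SQLite stand-in. Instead of several lookups per row, each batch of `table_a` ids costs one `IN (...)` query per lookup table plus one `UPDATE`. Two choices here are my own, not the answer's: paging by key (`id < smallest id of the previous batch`) instead of `OFFSET`, which avoids re-reading skipped rows on every page, and re-checking `col_a = old_value` inside the `UPDATE` in case a row changed in between. `BATCH_SIZE` is an arbitrary assumed value.

```python
import sqlite3

BATCH_SIZE = 500  # hypothetical; small enough to keep every statement short


def build_demo_db():
    """In-memory stand-in for the real MySQL tables (hypothetical names and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a (id INTEGER PRIMARY KEY, col_a TEXT)")
    for t in ("table_b", "table_c", "table_d", "table_e"):
        conn.execute(f"CREATE TABLE {t} (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO table_a VALUES (?, ?)",
        [(i, "Z" if i == 5 else "Y") for i in range(1, 26)],
    )
    conn.executemany("INSERT INTO table_b VALUES (?)", [(i,) for i in range(1, 21)])
    conn.execute("INSERT INTO table_c VALUES (2)")
    conn.execute("INSERT INTO table_d VALUES (3)")
    conn.execute("INSERT INTO table_e VALUES (4)")
    conn.commit()
    return conn


def ids_present(cur, table, ids):
    """Return the subset of `ids` found in `table`, using a single IN (...) query."""
    marks = ",".join("?" * len(ids))
    cur.execute(f"SELECT id FROM {table} WHERE id IN ({marks})", ids)
    return {row[0] for row in cur.fetchall()}


def update_in_batches(conn, old_value, new_value, batch_size=BATCH_SIZE):
    """Batched variant: a handful of statements per batch instead of per row."""
    cur = conn.cursor()
    updated = 0
    upper = None  # smallest id of the previous batch (keyset pagination)
    while True:
        sql = "SELECT id FROM table_a WHERE col_a = ?"
        params = [old_value]
        if upper is not None:
            sql += " AND id < ?"
            params.append(upper)
        cur.execute(sql + " ORDER BY id DESC LIMIT ?", params + [batch_size])
        batch = [row[0] for row in cur.fetchall()]
        if not batch:
            break
        upper = batch[-1]  # rows come back in descending id order
        # Same rule as the row-by-row loop: in table_b, absent from C, D and E.
        keep = ids_present(cur, "table_b", batch)
        for table in ("table_c", "table_d", "table_e"):
            keep -= ids_present(cur, table, batch)
        if keep:
            marks = ",".join("?" * len(keep))
            # One short UPDATE per batch; re-checking col_a skips rows changed meanwhile.
            cur.execute(
                f"UPDATE table_a SET col_a = ? WHERE col_a = ? AND id IN ({marks})",
                [new_value, old_value, *sorted(keep)],
            )
            updated += cur.rowcount
            conn.commit()
    return updated


if __name__ == "__main__":
    print(update_in_batches(build_demo_db(), "Y", "X", batch_size=4))  # prints 16
```

Keeping batches small keeps each statement, and the row locks its UPDATE holds, short, which should give the concurrent UPDATEs/DELETEs on <table_C>...<table_E> room to run between batches.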

Licensed under: CC-BY-SA with attribution