Question

I'm having some issues updating and inserting millions of rows in a MySQL database. I need to flag 50 million rows in Table A, insert some data from those flagged 50 million rows into Table B, then update those same 50 million rows in Table A again. There are about 130 million rows in Table A and 80 million in Table B.

This needs to happen on a live server without denying access to other queries from the website. The problem is that while this stored procedure is running, other queries from the website end up locked and the HTTP requests time out.

Here's the gist of the stored procedure, slightly simplified for illustration purposes:

DELIMITER $$

CREATE DEFINER=`user`@`localhost` PROCEDURE `MyProcedure`(
  totalLimit  int
)
BEGIN
  SET @totalLimit = totalLimit;
  /* Flag rows that are about to be issued */
  PREPARE STMT FROM 'UPDATE tableA SET `status` = "Being-Issued" WHERE `status` = "Available" LIMIT ?';
  EXECUTE STMT USING @totalLimit;
  DEALLOCATE PREPARE STMT;
  /* Insert new rows for usage into tableB */
  INSERT INTO tableB (/* my fields */)
    SELECT /* some values from tableA */
    FROM tableA
    WHERE `status` = "Being-Issued";
  /* Mark the flagged rows in tableA as issued */
  UPDATE tableA SET `status` = 'Issued' WHERE `status` = 'Being-Issued';
END$$

DELIMITER ;

Solution 2

What about this? It basically calls the original stored procedure in a loop until the total amount needed is reached, with a sleep period between calls (e.g. 2 seconds) to allow other queries to be processed.

increment is the number of rows to process in one call (10,000 in this case)
totalLimit is the total number of rows to be processed
sleepSec is the number of seconds to rest between calls

DELIMITER $$

CREATE PROCEDURE `BatchedProcedure`(  /* wrapper name is illustrative */
  increment   int,
  totalLimit  int,
  sleepSec    int
)
BEGIN
  SET @x = 0;
  REPEAT
    SELECT SLEEP(sleepSec);
    SET @x = @x + increment;
    CALL OriginalProcedure(increment);
  UNTIL @x >= totalLimit END REPEAT;
END$$

DELIMITER ;
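With the figures mentioned here (10,000 rows per call, 50 million rows total, 2 seconds of sleep), a call would look something like this (again, the wrapper name BatchedProcedure is only illustrative):

CALL BatchedProcedure(10000, 50000000, 2);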

Obviously it could use a little math to make sure the last increment doesn't overshoot the total limit if it's not evenly divisible, but it appears to work (by 'work' I mean other queries from web requests are still processed), and it seems to be faster overall as well.
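One way to do that math, shown as a sketch of just the loop body from the wrapper above (LEAST() caps the final batch; @batch is an illustrative variable name):

SET @x = 0;
REPEAT
  SELECT SLEEP(sleepSec);
  /* never request more rows than remain to be processed */
  SET @batch = LEAST(increment, totalLimit - @x);
  CALL OriginalProcedure(@batch);
  SET @x = @x + @batch;
UNTIL @x >= totalLimit END REPEAT;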

Any insight here? Is this a good idea? Bad idea?

OTHER TIPS

Processing 50M rows three times will be slow irrespective of what you're doing.

Make sure your updates affect smaller, disjoint sets of rows, and execute them one by one rather than all within the same transaction.
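For instance, disjoint batches can be built by walking the primary key in fixed-size ranges, committing after each range. This is only a sketch: the procedure name, the assumption that tableA has an integer primary key `id`, and the idea of passing in a maximum id are not from the question:

DELIMITER $$

CREATE PROCEDURE `FlagInKeyRanges`(batchSize int, maxId int)
BEGIN
  /* procedure name, the `id` key column, and maxId are illustrative assumptions */
  DECLARE lastId INT DEFAULT 0;
  WHILE lastId < maxId DO
    /* each pass touches a disjoint key range and commits on its own */
    UPDATE tableA
       SET `status` = 'Being-Issued'
     WHERE `status` = 'Available'
       AND id > lastId AND id <= lastId + batchSize;
    COMMIT;
    SET lastId = lastId + batchSize;
  END WHILE;
END$$

DELIMITER ;

A call such as CALL FlagInKeyRanges(10000, 130000000); would then sweep the roughly 130 million rows of Table A in 10,000-row ranges.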

If you're doing this already and MySQL is misbehaving, try this slight tweak to your code:

create a temporary table

begin
  insert into tmp_table
    select your stuff
    limit ?
    for update
  do your update on A using tmp_table
commit

begin
  do your insert on B using tmp_table
  do your update on A using tmp_table
commit

This should keep locks held for a minimal time.
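Concretely, a sketch of that tweak, reusing the placeholders from the question, might look like the following. The procedure name IssueBatch, the temporary table tmp_issue, and the assumption that tableA has an integer primary key `id` are all illustrative:

DELIMITER $$

CREATE PROCEDURE `IssueBatch`(batchSize int)
BEGIN
  SET @batchSize = batchSize;

  /* remember which rows this batch touches; the integer primary key `id`
     on tableA is an assumption, not something stated in the question */
  DROP TEMPORARY TABLE IF EXISTS tmp_issue;
  CREATE TEMPORARY TABLE tmp_issue (id INT PRIMARY KEY);

  /* transaction 1: lock a small batch of rows and flag them */
  START TRANSACTION;
  PREPARE STMT FROM
    'INSERT INTO tmp_issue (id)
       SELECT id FROM tableA
        WHERE `status` = "Available"
        LIMIT ? FOR UPDATE';
  EXECUTE STMT USING @batchSize;
  DEALLOCATE PREPARE STMT;
  UPDATE tableA JOIN tmp_issue USING (id)
     SET tableA.`status` = 'Being-Issued';
  COMMIT;

  /* transaction 2: copy the batch into tableB and mark it issued */
  START TRANSACTION;
  INSERT INTO tableB (/* my fields */)
    SELECT /* some values from tableA */
      FROM tableA JOIN tmp_issue USING (id);
  UPDATE tableA JOIN tmp_issue USING (id)
     SET tableA.`status` = 'Issued';
  COMMIT;

  DROP TEMPORARY TABLE tmp_issue;
END$$

DELIMITER ;

Calling this repeatedly in small batches, for example from the looping wrapper shown in Solution 2, processes the 50 million rows in short-lived transactions rather than one long one.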

Licensed under: CC-BY-SA with attribution