Question

I've got a php-cli script which makes 2+ million MySQL queries (half SELECT statements using indexes, half row inserts of a couple of varchar(25) and int(5) columns) on an Amazon server's crontab. The script itself takes hours to run. I'd like to get it down to ~5 minutes.

I'm wondering how I might scale up the row inserts to cut the time. (The script is essentially a foreach loop over some API data, so I'm also looking into ways of forking the script per iteration, as a separate issue.)

I've been looking at MySQL Cluster but know nothing about data warehousing, etc. Does anyone know if this would even be the right place to start looking? Would something like Hive/Hadoop be more appropriate?

Edit: for context, I'm on an EC2 micro instance. I expect that migrating to heavier-duty instances will be required, but I'm wondering if there's any MySQL-like alternative to the plain old MySQL daemon that might help.


Solution

You will see a huge jump in performance just by upgrading from a micro instance. Micro instances use a burstable CPU profile, so you will be running at very low performance for much of that import.

There are a number of things you can do to optimize inserts. Committing as much as you can in a single transaction (InnoDB tables) or disabling index updates during the load (MyISAM tables) can group work together in a more efficient manner.
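For example, the transaction-batching idea looks like this. This is a minimal sketch in Python with an in-memory SQLite database so it is self-contained; the table and row values are made up for illustration. In your PHP script the same pattern would be PDO's `beginTransaction()`/`commit()` wrapped around the batch of inserts, or a single multi-row `INSERT ... VALUES (...), (...), ...` statement.

```python
import sqlite3

# Stand-in for the MySQL table described in the question
# (a couple of varchar(25)/int(5) columns).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (name VARCHAR(25), qty INT)")

rows = [("item-%d" % i, i % 100) for i in range(10000)]

# Instead of committing each INSERT individually (one durable write
# per row), group the whole batch into a single transaction: one
# commit covers all 10,000 rows.
with conn:  # opens a transaction, commits on successful exit
    conn.executemany("INSERT INTO items (name, qty) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count)
```

The win comes from paying the commit/fsync cost once per batch instead of once per row; with InnoDB's default `autocommit=1`, every standalone INSERT is its own transaction.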

At some point when updating large volumes of data, you will hit the limits of I/O. With EC2 you may want to test RAID configurations or provisioned IOPS volumes to see how you can best improve performance.
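As a quick first check of whether raw write throughput is the bottleneck, you can time a sequential write with `dd` (a rough sketch; the scratch path and sizes here are arbitrary, and this measures sequential writes only, not the random I/O a database mostly does):

```shell
# Write 64 MiB to a scratch file; conv=fsync forces the data to disk
# before dd reports, so the timing reflects the volume rather than
# the page cache. dd prints bytes copied, elapsed time, and MB/s.
dd if=/dev/zero of=/tmp/ddtest bs=1M count=64 conv=fsync
rm /tmp/ddtest
```

Comparing that figure across instance types or EBS configurations gives a cheap baseline before running the full import.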

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow