Question

I need to populate a MySQL table with random SHA-1 hash values generated by a PHP function. I'm trying to optimize the insert by splitting it into chunks of 10,000 rows. My question is: is the following approach efficient? Here is the code.

// MySQL server connection routines are above this point
if ($select_db) {
    $time_start = microtime(true);

    $base  = 'INSERT INTO sha1_hash (sha1_hash) VALUES ';
    $query = $base;
    $count = 0;

    for ($i = 1; $i <= 1000000; $i++) {
        $query .= "('" . sha1(genRandomString(8)) . "'),";
        $count++;
        if ($count == 10000) {
            // flush the current batch of 10000 rows
            mysql_query(rtrim($query, ',')) or die('Query error: ' . mysql_error());
            // reset the batch; INSERT returns a boolean, so there is no result set to free
            $query = $base;
            $count = 0;
        }
    }

    $time_end = microtime(true);
    echo '<br/>' . ($time_end - $time_start);
}

// function to generate a random lowercase alphanumeric string
function genRandomString($length)
{
    $charset = 'abcdefghijklmnopqrstuvwxyz0123456789';
    $count   = strlen($charset);
    $str     = '';
    while ($length--) {
        $str .= $charset[mt_rand(0, $count - 1)];
    }
    return $str;
}

EDIT: The $time_start and $time_end variables are for performance testing purposes ONLY. The MySQL table has just two fields: ID INT(11) UNSIGNED NOT NULL AUTO_INCREMENT and sha1_hash VARCHAR(48) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL; the engine is MyISAM.

EDIT 2: Hardware considerations are outside the scope of this question.


Solution

Inserts are generally done in large batches because indexes are updated after each insert. Batching lets you insert many records and update the indexes only once at the end, instead of after every row.
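For illustration, a batched statement bundles many rows into a single INSERT (table and column names are from the question; the hash values here are just placeholders):

INSERT INTO sha1_hash (sha1_hash) VALUES
('b858cb282617fb0956d960215c8e84d1ccf909c6'),
('ac3478d69a3c81fa62e60f5c3696165a4e5e6ac4'),
('c1dfd96eea8cc2b62785275bca38ac261256e278');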

However, in the case of an auto-incrementing primary key, the index has to be extended just to add each new row, so batching saves you nothing there, and your table has no other indexes.

Batching also saves some overhead in query parsing and locking. That said, you might also consider parameterized queries (PDO).

Inserting one record at a time with a PDO parameterized query would also be very fast, since MySQL only has to parse the query once and from then on uses a low-overhead binary transfer of the row data.
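A minimal sketch of that approach, assuming a PDO connection (the DSN and credentials are placeholders, and genRandomString() is the function from the question):

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// prepare once: MySQL parses the statement a single time
$stmt = $pdo->prepare('INSERT INTO sha1_hash (sha1_hash) VALUES (?)');

for ($i = 1; $i <= 1000000; $i++) {
    // each execute() sends only the bound hash value
    $stmt->execute(array(sha1(genRandomString(8))));
}

Note that PDO's MySQL driver emulates prepared statements by default; setting PDO::ATTR_EMULATE_PREPARES to false gives the server-side prepare-once behavior described above.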

You might also lock the table with LOCK TABLES before the insertion begins; this saves a little table-locking overhead.
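With the old mysql_* API used in the question, that could look like this sketch:

mysql_query('LOCK TABLES sha1_hash WRITE') or die(mysql_error());
// ... run the batched INSERT statements here ...
mysql_query('UNLOCK TABLES') or die(mysql_error());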

Also, since a SHA-1 hash is always a 40-character, hex-encoded ASCII value, you should consider using CHAR(40) instead of VARCHAR(48). This will speed things up as well. And if the sha1_hash column is indexed, use a single-byte character set instead of UTF-8 to reduce the size of the index and speed things up further.
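For example, the column from the question could be redefined along these lines (latin1 is used here as one single-byte option):

ALTER TABLE sha1_hash
    MODIFY sha1_hash CHAR(40) CHARACTER SET latin1 COLLATE latin1_general_ci NOT NULL;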
