Updating Large Amount of Data to MySQL Database
15-10-2022
Question
I am using an API service from a provider. The API endpoints look like this:
https://api.thesite.com/getTable1Records?offset=0
https://api.thesite.com/getTable2Records?offset=0
https://api.thesite.com/getTable3Records?offset=0
(not the real addresses). Each API call returns JSON containing 1000 records.
Initially, I retrieved all the records and saved them to my database server. All user searching/processing runs against my database server, which is the approach the API service provider recommended.
The API service provider updates their database whenever the data change, but I have no way of knowing when or what changed. They might add new records, update existing ones, or delete some. I need to update my database periodically (weekly, every Monday, or twice a week is fine).
Here is my PHP code, which updates one of the tables:
// Update Table1
echo "STARTED@" . time() . "<br />\n"; // just for log
$offset   = 0;
$username = "username";
$password = "password";
$url      = "https://api.thesite.com/getTable1Records";

$c = curl_init();
do {
    curl_setopt($c, CURLOPT_URL, "$url?offset=$offset");
    curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($c, CURLOPT_SSL_VERIFYHOST, 0); // note: disables certificate checks
    curl_setopt($c, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($c, CURLOPT_USERPWD, "$username:$password");
    $json   = curl_exec($c);
    $phpobj = json_decode($json);
    if (!is_array($phpobj)) { // guard: a failed request or bad JSON yields null
        break;
    }
    $offset += 1000;
    update($phpobj);
    echo "1000UPDATED@" . time() . "<br />\n"; // just for log
} while (count($phpobj) > 0);
curl_close($c);
echo "ENDED@" . time() . "<br />\n"; // just for log

function update($phpobj) {
    $host = "localhost";
    $user = "root";
    $pass = "";
    $db   = "theapitest";
    $link = mysqli_connect($host, $user, $pass, $db);
    for ($i = 0; $i < count($phpobj); $i++) {
        $row   = $phpobj[$i];
        $id    = mysqli_real_escape_string($link, $row->id);
        $name  = mysqli_real_escape_string($link, $row->name);
        $query = "INSERT INTO `tablename` VALUES('$id', '$name')
                  ON DUPLICATE KEY UPDATE `name`='$name'";
        mysqli_query($link, $query);
    }
    mysqli_close($link);
} // end function
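One common speed-up is to send a single multi-row INSERT per batch instead of one query per record. A minimal sketch of a query builder for that (the helper name and the explicit `id`/`name` column list are assumptions based on the code above; the escaper is passed in so it can wrap mysqli_real_escape_string):

```php
<?php
// Build one multi-row INSERT ... ON DUPLICATE KEY UPDATE for a batch of rows.
// Assumes each $row has ->id and ->name, as in the question's update() function.
function buildBatchUpsert(array $rows, callable $escape): string
{
    $values = [];
    foreach ($rows as $row) {
        $id     = $escape($row->id);
        $name   = $escape($row->name);
        $values[] = "('$id', '$name')";
    }
    return "INSERT INTO `tablename` (`id`, `name`) VALUES "
         . implode(", ", $values)
         . " ON DUPLICATE KEY UPDATE `name` = VALUES(`name`)";
}
```

Inside update(), the per-row loop would then collapse to one call, e.g. `mysqli_query($link, buildBatchUpsert($phpobj, fn($v) => mysqli_real_escape_string($link, $v)));`, turning 1000 round trips into one.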
The problems are:
- It is too slow. Some tables have millions of records. (Is there a better way?)
- Some tables have no primary key, so I cannot use INSERT INTO ... ON DUPLICATE KEY UPDATE.
- I don't know how to handle record deletions; deleting all records and re-inserting everything does not seem like the best idea.
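One approach that sidesteps both the missing-primary-key and the deletion problems is to bulk-load each refresh into a staging table and then swap it in atomically, so the live table is never half-updated. A sketch in MySQL DDL (table names are assumptions; the bulk-load step is whatever loop fills the table from the API):

```sql
-- Build the new copy alongside the live table.
CREATE TABLE `tablename_staging` LIKE `tablename`;

-- ... bulk-load all API pages into `tablename_staging` ...

-- Swap atomically: readers see either the old or the new table, never a mix.
RENAME TABLE `tablename` TO `tablename_old`,
             `tablename_staging` TO `tablename`;
DROP TABLE `tablename_old`;
```

Because the staging table is rebuilt from scratch, records the provider deleted simply never appear in it, and no key is needed for matching.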
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow