Question

I have a PHP script that steps through a folder of tab-delimited files, parsing them line by line and inserting the data into a MySQL database. I cannot use LOAD DATA INFILE because of security restrictions on my server, and I do not have access to the configuration files. The script works fine parsing one or two smaller files, but when working with several large files I get a 500 error. There do not appear to be any error logs with messages pertaining to the failure, at least none that my hosting provider gives me access to. The code is below; I am also open to suggestions for alternate ways of doing what I need to do. Ultimately I want this script to fire off every 30 minutes or so, inserting new data and deleting the files when finished.

EDIT: After making the changes Phil suggested, the script still fails, but I now have the following message in my error log: "mod_fcgid: read data timeout in 120 seconds". It looks like the script is timing out. Any idea where I can change the timeout setting?

$folder = opendir($dir);
    while (($file = readdir($folder)) !== false) {
        $filepath = $dir . "/" . $file;

        // If it is a file whose name ends in "txt", parse it and insert the records into the db
        if (is_file($filepath) && substr($filepath, -3) == "txt") {
            uploadDataToDB($filepath, $connection);
        }
    }
    closedir($folder);

function uploadDataToDB($filepath, $connection) {
    ini_set('display_errors', 'On');
    error_reporting(E_ALL);
    ini_set('max_execution_time', 300);

    $insertString = "INSERT INTO dirty_products VALUES (";

    $count = 0;

    $file = fopen($filepath, "r");
    if ($file === false) {
        die("Could not open " . $filepath);
    }

    while (($line = fgets($file)) !== false) {
        $values = "";
        $valueArray = explode("\t", $line);
        foreach ($valueArray as $value) {
            //Escape single quotes
            $value = str_replace("'", "\'", $value);
            if ($values != "")
                $values = $values . ",'" . $value . "'";
            else
                $values = "'" . $value . "'";
        }

        mysql_query($insertString . $values . ")", $connection);
        $count++;
    }

    fclose($file);

    echo "Count: " . $count . "</p>";
}

Solution

First thing I'd do is use prepared statements (using PDO).

Using the mysql_query() function, you're building and sending a brand-new statement for every insert, which the server has to parse from scratch each time; across several large files that per-row overhead adds up and may be what pushes you past the timeout.

If you use a prepared statement, only one statement is parsed and compiled on the database server; each execute() call then sends only the row values.

Example

function uploadDataToDB($filepath, $connection) {
    ini_set('display_errors', 'On');
    error_reporting(E_ALL);
    ini_set('max_execution_time', 300);

    // The old mysql $connection is no longer needed once PDO is in place
    $db = new PDO(/* DB connection parameters */);
    // Throw exceptions so failed inserts aren't silently swallowed
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $stmt = $db->prepare('INSERT INTO dirty_products VALUES (
                         ?, ?, ?, ?, ?, ?)');
    // match number of placeholders to number of TSV fields

    $count = 0;

    $file = fopen($filepath, "r");
    if ($file === false) {
        die("Could not open " . $filepath);
    }

    while (($line = fgets($file)) !== false) {
        // Trim the trailing newline so it doesn't end up in the last column
        $line = rtrim($line, "\r\n");
        if ($line === "") {
            continue; // skip blank lines
        }
        $stmt->execute(explode("\t", $line));
        $count++;
    }

    fclose($file);
    $db = null;

    echo "<p>Count: " . $count . "</p>";
}
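One note on the design: passing the array straight to execute() binds every value as a string, which is fine for a raw text import like this, as long as each row has exactly as many fields as the statement has placeholders.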

Considering you want to run this script on a schedule, I'd skip the web server entirely and run the script from the CLI using cron or whatever scheduling service your host provides. That sidesteps any timeout configured in the web server (including the mod_fcgid one you're hitting: the 120-second read-data timeout is governed by mod_fcgid's FcgidIOTimeout directive, which lives in the server or virtual-host configuration and is usually out of reach on shared hosting).
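As a sketch, assuming the PHP CLI binary is at /usr/bin/php and your script is saved as /home/youruser/import.php (both paths are placeholders; adjust them for your host), a crontab entry to run it every 30 minutes would look like:

*/30 * * * * /usr/bin/php /home/youruser/import.php >> /home/youruser/import.log 2>&1

The >> redirect appends each run's output to a log file, so you can still see errors even though there's no browser involved.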

Licensed under: CC-BY-SA with attribution