Question

I'm currently having more than 400k records in my mysql table. The structure is the following:

table structure

The function i am using:

function cron_hour_counts()
{
    $subids = get_subids();
    array_push($subids, '');
    $from = '2011-10-20';//last_updated_date('tb_hour_counts');
    $to = '2011-10-20';//last_date();
    $days = days_interval($from, $to);
    $result_array = array();

    foreach ($subids as $subid)
    {
        for ($i = 0; $i < $days; $i++)
        {
            $hour = '00:00';
            for ($t = 0; $t <= 23; $t++)
            {
                if ($t == 0)
                {
                    $chour = date('H:i', strtotime($hour . '+' . $t . 'hour'));
                    $phour = date('H:i', strtotime('23:59'));

                    $date = date('Y-m-d', strtotime($from . '+' . $i . 'day'));
                    $date_prev = date('Y-m-d', strtotime($date . '- 1 day'));
                }
                else
                {
                    $chour = date('H:i', strtotime($hour . '+' . $t . 'hour'));
                    $phour = date('H:i', strtotime($chour . '-1 hour'));

                    $date = date('Y-m-d', strtotime($from . '+' . $i . 'day'));
                    $date_prev = $date;
                }

                $unique_id_query = mysql_query("SELECT (SELECT COUNT(DISTINCT `id`,`subid`) FROM `tb_stats` WHERE (`date` < '" . mysql_real_escape_string($date) . "') OR (`date` = '" . mysql_real_escape_string($date) . "' AND `time` <= '" . mysql_real_escape_string($chour) . "')" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : "") . ") - (SELECT COUNT(DISTINCT `id`,`subid`) FROM `tb_stats` WHERE (`date` < '" . mysql_real_escape_string($date_prev) . "') OR (`date` = '" . mysql_real_escape_string($date_prev) . "' AND `time` <= '" . mysql_real_escape_string($phour) . "')" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : "") . ") AS `unique_ids`");
                $unique_id_result = mysql_fetch_assoc($unique_id_query);

                $total_id_query = mysql_query("SELECT COUNT(DISTINCT `id`,`subid`) AS `total_ids` FROM `tb_stats` WHERE `date` = '" . mysql_real_escape_string($date) . "' AND `time` <= '" . mysql_real_escape_string($chour) . "'" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : ""));
                $total_id_result = mysql_fetch_assoc($total_id_query);

                $unique_ip_query = mysql_query("SELECT (SELECT COUNT(DISTINCT `ip`,`subid`) FROM `tb_stats` WHERE (`date` < '" . mysql_real_escape_string($date) . "') OR (`date` = '" . mysql_real_escape_string($date) . "' AND `time` <= '" . mysql_real_escape_string($chour) . "')" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : "") . ") - (SELECT COUNT(DISTINCT `ip`,`subid`) FROM `tb_stats` WHERE `date` <= '" . mysql_real_escape_string($date_prev) . "' AND `time` <= '" . mysql_real_escape_string($phour) . "'" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : "") . ") AS `unique_ips`");
                $unique_ip_result = mysql_fetch_assoc($unique_ip_query);

                $total_ip_query = mysql_query("SELECT COUNT(DISTINCT `ip`,`subid`) AS `total_ips` FROM `tb_stats` WHERE `date` = '" . mysql_real_escape_string($date) . "' AND `time` <= '" . mysql_real_escape_string($chour) . "'" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : ""));
                $total_ip_result = mysql_fetch_assoc($total_ip_query);

                $global_query = mysql_query("SELECT COUNT(`id`) AS `global` FROM `tb_stats` WHERE `date` = '" . mysql_real_escape_string($date) . "' AND `time` <= '" . mysql_real_escape_string($chour) . "'" . (!empty($subid) && is_numeric($subid) ? " AND `subid` = '" . mysql_real_escape_string($subid) . "'" : ""));
                $global_result = mysql_fetch_assoc($global_query);

                $result = array();
                $result['date'] = $date;
                $result['hour'] = $chour;
                $result['subid'] = $subid;
                $result['unique_ids'] = $unique_id_result['unique_ids'];
                $result['total_ids'] = $total_id_result['total_ids'];
                $result['unique_ips'] = $unique_ip_result['unique_ips'];
                $result['total_ips'] = $total_ip_result['total_ips'];
                $result['global'] = $global_result['global'];

                $result_array[] = $result;
            }
        }
    }
    //db insert
    print_r($result_array);
}

Having 20 subids, and one day period it takes 40 minutes to execute. Any tips on speeding this up?

Was it helpful?

Solution

This is my solution. It works 20 times faster.

function cron_hour_counts()
{
    $subids = get_subids();
    //array_push($subids, '');
    $from = '2011-10-20';//last_updated_date('tb_hour_counts');
    $to = '2011-10-20';//last_date();
    $days = days_interval($from, $to);
    $result_array = array();

    for ($i = 0; $i < $days; $i++)
    {
        $hour = '00:00';
        for ($t = 0; $t <= 23; $t++)
        {
            $date = date('Y-m-d', strtotime($from . '+' . $i . 'day'));
            $currentHour = date('H:i', strtotime($hour . '+' . $t . 'hour'));
            $nextHour = date('H:i', strtotime($currentHour . '+59 minutes'));

            $unique_ids_query = mysql_query("
                SELECT COUNT(id) AS unique_ids,subid
                FROM 
                (
                    SELECT id,subid,date,time
                    FROM tb_stats
                    WHERE date <= '" . mysql_real_escape_string($date) . "'
                    GROUP BY id,subid
                ) AS id_inner
                WHERE date = '" . mysql_real_escape_string($date) . "'
                AND time BETWEEN '" . mysql_real_escape_string($currentHour) . "' AND '" . mysql_real_escape_string($nextHour) . "'
                GROUP BY subid;
            ");
            pull_data('unique_ids', $date, $currentHour, $unique_ids_query, $subids, $result_array);            

            $unique_ips_query = mysql_query("
                SELECT COUNT(ip) AS unique_ips,subid
                FROM 
                (
                    SELECT ip,subid,date,time
                    FROM tb_stats
                    WHERE date <= '" . mysql_real_escape_string($date) . "'
                    GROUP BY ip,subid
                ) AS ip_inner
                WHERE date = '" . mysql_real_escape_string($date) . "'
                AND time BETWEEN '" . mysql_real_escape_string($currentHour) . "' AND '" . mysql_real_escape_string($nextHour) . "'
                GROUP BY subid;
            ");
            pull_data('unique_ips', $date, $currentHour, $unique_ips_query, $subids, $result_array);

            $total_ids_query = mysql_query("
                SELECT COUNT(DISTINCT id,subid) AS total_ids,subid
                FROM tb_stats
                WHERE date = '" . mysql_real_escape_string($date) . "'
                AND `time` <= '" . mysql_real_escape_string($nextHour) . "'
                GROUP BY subid
                ");
            pull_data('total_ids', $date, $currentHour, $total_ids_query, $subids, $result_array);

            $total_ips_query = mysql_query("
                SELECT COUNT(DISTINCT ip,subid) AS total_ips,subid
                FROM tb_stats
                WHERE date = '" . mysql_real_escape_string($date) . "'
                AND `time` <= '" . mysql_real_escape_string($nextHour) . "'
                GROUP BY subid;
            ");
            pull_data('total_ips', $date, $currentHour, $total_ips_query, $subids, $result_array);

            $global_query = mysql_query("
                SELECT COUNT(id) AS global,subid
                FROM tb_stats
                WHERE date = '" . mysql_real_escape_string($date) . "'
                AND time <= '" . mysql_real_escape_string($nextHour) . "'
                GROUP BY subid;
            ");
            pull_data('global', $date, $currentHour, $global_query, $subids, $result_array);
        }
    }

    print_r($result_array);
}

OTHER TIPS

Optimize your queries.

Here's an example of a query that can DEFINITELY be optimized:

You posted the following query as an example of a unique_ids_query in your comment above:

SELECT (SELECT COUNT(DISTINCT id,subid) FROM tb_stats WHERE subid = '1' AND date <= '2011-10-20') - (SELECT COUNT(DISTINCT id,subid) FROM tb_stats WHERE subid = '1' AND date <= '2011-10-19') AS unique_ids;

Essentially the query is getting the change in distinct id, subid combinations between 2011-10-20 and 2011-10-19 where the subid is '1'. You're doing this by first counting ALL of the records below the 2011-10-20 date and then counting ALL of the records below the 2011-10-19 date. You also have THREE SELECT statements in that query.

Unless I'm mistaken thats the same as counting all of the records between 2011-10-19 and 2011-10-20, which you could do with the following:

SELECT COUNT(DISTINCT id,subid) AS unique_ids FROM tb_stats WHERE subid = '1' AND date <= '2011-10-20' AND date >= '2011-10-19';

You should also start using MySQLi or PDO in PHP if possible for store procedures, which could also be a performance booster.

In addition you should run as many queries as possible over a single connection to reduce connection latency (it adds up!)

One final potential boon would be to write MySQL functions. It would be possible to run the query above without using COUNT or DISTINCT with a MySQL function which would be a performance booster beyond the boost given by running it as a function.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top