Question

I've been banging my head all day at this and still can't come up with some working answers.

I want to track all my visitors by cookie on my website and save the current URL (or path) that the user visits with the unique user key included in a mysql row.

After some time and some unique visitors I want to create a tree of paths that the users followed through my site. (something like Google Analytics does with there visitor flow reports).

But I can't somehow figure out how to make the query that walks through these rows and creates a "tree" of urls (with count or percentage)

If someone could help me out, that will be much appreciated.

-- Edit I already have a mysql database and tables in place

CREATE TABLE `journies` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `site_id` int(11) NOT NULL,
  `profile_id` int(11) NOT NULL,
  `url` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `profiles` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `site_id` int(11) NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

-- update 2 Oke, this may explain better what I want to do.

I want to create a tree with urls like this:

                                                  /- domain.com/news (100%)
                     /- domain.com/about (33%) --/
domain.com (100%) --/
                    \- domain.com/contact (33%)
                    \- domain.com/news (33%) --\
                                                \- domain.com/news/id/1 (100%)

Ofcourse this may be done with multiple queries (although it will be great if it could be done with one ;))

Was it helpful?

Solution

Okay, this was a doozy, but I think I've finally got something for you to work with.

The premise is to pull out the URLs and counts from the database, grouping and sorting them by url. Store those results into a master array. Then, find the base domain of the urls, which I called the trunk. Loop through the master array and find any items that have the same base domain as the trunk and find their percentages.

One other thing to note is that these are level-specific. So cnn.com/employees (2 levels) would be on the same level as cnn.com/news.

Here is the SQL data that I used for this:

INSERT INTO `journies` (`id`, `site_id`, `profile_id`, `url`, `created_at`, `updated_at`) VALUES
    (1, 1, 1, 'domain.com', '2014-02-19 15:34:54', '0000-00-00 00:00:00'),                      
    (2, 1, 1, 'domain.com/about', '2014-02-19 15:35:57', '0000-00-00 00:00:00'),                
    (3, 1, 1, 'domain.com/contact', '2014-02-19 15:36:12', '0000-00-00 00:00:00'),              
    (4, 1, 1, 'domain.com/news', '2014-02-19 15:36:29', '0000-00-00 00:00:00'),                 
    (5, 1, 1, 'domain.com/news/id/1', '2014-02-19 15:39:26', '0000-00-00 00:00:00'),            
    (6, 1, 1, 'domain.com/contact', '2014-02-19 15:50:26', '0000-00-00 00:00:00'),              
    (7, 1, 1, 'cnn.com/news/id/1', '2014-02-19 16:00:02', '0000-00-00 00:00:00'),               
    (8, 1, 1, 'cnn.com/news', '2014-02-19 16:00:15', '0000-00-00 00:00:00'),                    
    (9, 1, 1, 'cnn.com', '2014-02-19 16:00:25', '0000-00-00 00:00:00'),                         
    (10, 1, 1, 'cnn.com', '2014-02-19 16:46:16', '0000-00-00 00:00:00'),                        
    (11, 1, 1, 'cnn.com', '2014-02-19 16:46:16', '0000-00-00 00:00:00'),                        
    (12, 1, 1, 'domain.com/news/id/1', '2014-02-20 08:47:23', '0000-00-00 00:00:00'),           
    (13, 1, 1, 'domain.com/news/id/1', '2014-02-20 08:47:23', '0000-00-00 00:00:00'),           
    (14, 1, 1, 'domain.com/news/id/2', '2014-02-20 08:53:29', '0000-00-00 00:00:00'),           
    (15, 1, 1, 'domain.com/prices', '2014-02-20 12:40:44', '0000-00-00 00:00:00'),              
    (16, 1, 1, 'domain.com/prices', '2014-02-20 12:40:44', '0000-00-00 00:00:00'),              
    (17, 1, 1, 'cnn.com/employees/friekot', '2014-02-20 15:23:34', '0000-00-00 00:00:00'),      
    (18, 1, 1, 'cnn.com/employees', '2014-02-20 15:23:34', '0000-00-00 00:00:00');              

And here is the code that I came up with:

<?php



$link = mysqli_connect("localhost", "user", "pass", "database");






// SET THE DEFAULTS
$trunk_array = array();
$master_array = array();






// PULL OUT THE DATA FROM THE DATABASE
$q_get_tracking_info = "SELECT *, COUNT(url) AS url_count FROM journies WHERE site_id = 1 AND profile_id = 1 GROUP BY url ORDER BY url;";
$r_get_tracking_info = mysqli_query($link, $q_get_tracking_info) or trigger_error("Cannot Get Tracking Info: (".mysqli_error().")", E_USER_ERROR);



while ($row_get_tracking_info = mysqli_fetch_array($r_get_tracking_info)) {



    $url = $row_get_tracking_info['url'];
    $url_count = $row_get_tracking_info['url_count'];






    // EXPLODE THE DOMAIN PARTS
    $domain_parts = explode('/', $url);






    // FIND THE TOTAL COUNTS FOR EACH LEVEL OF ARRAY
    // - SO THAT WE CAN DIVIDE BY IT LATER TO GET THE PERCENTAGE
    $count = count($domain_parts);
    if (!isset($level_totals[$domain_parts[0]])) {
        $level_totals[$domain_parts[0]] = array();
    }

    if (isset($level_totals[$domain_parts[0]][$count])) {
        $level_totals[$domain_parts[0]][$count] += $url_count;
    }
    else {
        $level_totals[$domain_parts[0]][$count] = $url_count;
    }






    // BUILD A TRUNK ARRAY SO WE CAN DEFINE SECTIONS
    if ($url == $domain_parts[0]) {
        $trunk_array[] = array($url, $url_count);
    }






    // BUILD A MASTER ARRAY OF THE ITEMS AS WE WILL LAY THEM OUT
    $master_array[$url] = $url_count;


}






// FIND THE TOTAL TRUNK COUNT SO WE CAN DIVIDE BY IT LATER
$total_trunk_count = 0;
foreach ($trunk_array AS $trunk_array_key => $trunk_array_val) {
    foreach($trunk_array_val AS $trunk_count_val) {
        $total_trunk_count += $trunk_count_val[0];
    }
}






// LOOP THROUGH THE TRUNK ITEMS AND PULL OUT ANY MATCHES FOR THAT TRUNK
foreach ($trunk_array AS $trunk_item_key => $trunk_item_val) {



    $trunk_item = $trunk_item_val[0];
    $trunk_count = $trunk_item_val[1];






    // FIND THE PERCENTAGE THIS TRUNK WAS ACCESSED
    $trunk_percent = round(($master_array[$trunk_item] / $total_trunk_count) * 100);






    // PRINT THE TRUNK OUT    
    print "<BR><BR>".$trunk_item.' - ('.$trunk_percent.'%)';






    // LOOP THROUGH THE MASTER ARRAY AND GET THE RESULTS FOR ANY PATHS UNDER THE TRUNK
    foreach ($master_array AS $master_array_key => $master_array_val) {



        // PERFORM A MATCH FOR DOMAINS BELONGING TO THIS PARTICULAR TRUNK
        if (preg_match('/^'.$trunk_item.'/', $master_array_key)) {



            // SET A DEFAULT DELIMITER PAD
            $delimiter_pad = '';






            // EXPLODE EACH PATH INTO PARTS AND COUNT HOW MANY PARTS WE HAVE
            $domain_parts_2 = explode('/', $master_array_key);
            $count = count($domain_parts_2);






            // SET THE DELIMITER FOR HOW FAR DOWN ON THE TREE WE ARE
            // EACH INDENT WILL HAVE 8 SPACES
            for ($i = 2; $i <= $count; $i++) {
                $delimiter_pad .= '&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;';
            }






            // SINCE WE ALREADY PRINTED OUT THE TRUNK, WE WILL ONLY SHOW ITEMS THAT ARE NOT THE TRUNK
            if ($master_array_key != $trunk_item) {



                // FIND THE PERCENTAGE OF THE ITEM, GIVEN THEIR LEVEL IN THE TREE
                $path_percentage = round(($master_array[$master_array_key] / $level_totals[$trunk_item][$count]) * 100);






                // PRINT OUT THE PATH AND PERCENTAGE
                print "<BR>".$delimiter_pad."|- ".$master_array_key.' - ('.$path_percentage.'%)';



            }



        }



    }



}

In the end, all of that outputs this:

cnn.com - (75%)
        |- cnn.com/employees - (50%)
                |- cnn.com/employees/friekot - (100%)
        |- cnn.com/news - (50%)
                        |- cnn.com/news/id/1 - (100%)

domain.com - (25%)
        |- domain.com/about - (17%)
        |- domain.com/contact - (33%)
        |- domain.com/news - (17%)
                        |- domain.com/news/id/1 - (75%)
                        |- domain.com/news/id/2 - (25%)
        |- domain.com/prices - (33%)

There may be an easier way to do this, but this is the method that came to mind for me. I hope this works for you!

OTHER TIPS

Okay, so you can write a query like this which will look for a particular user and site and spit out each url that they visited in the order they visited them.

$sql = "SELECT url FROM journies WHERE site_id = 1 AND profile_id = 1 ORDER BY created_at";

$result = mysqli_query($connection, $sql);

while ($row = mysqli_fetch_array($result)) {
    $page_array[] = $row['url'];
}

$page_list = implode("\n =>", $page_array);

This will output:

domain.com
 => domain.com/about
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top