Okay, this was a doozy, but I think I've finally got something for you to work with.
The premise is to pull the URLs and their counts out of the database, grouped and sorted by URL, and store those results in a master array. Then find the base domain of each URL, which I call the trunk. Finally, loop through the master array, pick out the items that share a trunk's base domain, and work out each item's percentage of the hits at its level under that trunk.
One other thing to note is that the percentages are level-specific. So cnn.com/employees (2 levels) would be on the same level as cnn.com/news.
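To make "level" concrete, here is a minimal sketch of how a URL splits into a trunk and a level. The helper name `url_trunk_and_level` is made up for illustration, and it assumes the URLs are stored without a scheme (no `http://`), as in the sample data.

```php
<?php
// Hypothetical helper (illustration only): split a stored URL into its
// trunk (base domain) and its level (number of slash-separated parts).
// Assumes URLs are stored without a scheme, e.g. 'cnn.com/news'.
function url_trunk_and_level($url) {
    $parts = explode('/', $url);
    return array('trunk' => $parts[0], 'level' => count($parts));
}

$a = url_trunk_and_level('cnn.com/employees');
$b = url_trunk_and_level('cnn.com/news');
$c = url_trunk_and_level('cnn.com/news/id/1');
// $a and $b share the trunk 'cnn.com' and both sit at level 2,
// so they are compared against each other; $c sits at level 4.
```

Only items with the same trunk and the same level are counted against each other when the percentages are worked out.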
Here is the SQL data that I used for this:
INSERT INTO `journies` (`id`, `site_id`, `profile_id`, `url`, `created_at`, `updated_at`) VALUES
(1, 1, 1, 'domain.com', '2014-02-19 15:34:54', '0000-00-00 00:00:00'),
(2, 1, 1, 'domain.com/about', '2014-02-19 15:35:57', '0000-00-00 00:00:00'),
(3, 1, 1, 'domain.com/contact', '2014-02-19 15:36:12', '0000-00-00 00:00:00'),
(4, 1, 1, 'domain.com/news', '2014-02-19 15:36:29', '0000-00-00 00:00:00'),
(5, 1, 1, 'domain.com/news/id/1', '2014-02-19 15:39:26', '0000-00-00 00:00:00'),
(6, 1, 1, 'domain.com/contact', '2014-02-19 15:50:26', '0000-00-00 00:00:00'),
(7, 1, 1, 'cnn.com/news/id/1', '2014-02-19 16:00:02', '0000-00-00 00:00:00'),
(8, 1, 1, 'cnn.com/news', '2014-02-19 16:00:15', '0000-00-00 00:00:00'),
(9, 1, 1, 'cnn.com', '2014-02-19 16:00:25', '0000-00-00 00:00:00'),
(10, 1, 1, 'cnn.com', '2014-02-19 16:46:16', '0000-00-00 00:00:00'),
(11, 1, 1, 'cnn.com', '2014-02-19 16:46:16', '0000-00-00 00:00:00'),
(12, 1, 1, 'domain.com/news/id/1', '2014-02-20 08:47:23', '0000-00-00 00:00:00'),
(13, 1, 1, 'domain.com/news/id/1', '2014-02-20 08:47:23', '0000-00-00 00:00:00'),
(14, 1, 1, 'domain.com/news/id/2', '2014-02-20 08:53:29', '0000-00-00 00:00:00'),
(15, 1, 1, 'domain.com/prices', '2014-02-20 12:40:44', '0000-00-00 00:00:00'),
(16, 1, 1, 'domain.com/prices', '2014-02-20 12:40:44', '0000-00-00 00:00:00'),
(17, 1, 1, 'cnn.com/employees/friekot', '2014-02-20 15:23:34', '0000-00-00 00:00:00'),
(18, 1, 1, 'cnn.com/employees', '2014-02-20 15:23:34', '0000-00-00 00:00:00');
And here is the code that I came up with:
<?php
$link = mysqli_connect("localhost", "user", "pass", "database");

// SET THE DEFAULTS
$trunk_array = array();
$master_array = array();
$level_totals = array();

// PULL OUT THE DATA FROM THE DATABASE
// (SELECT ONLY THE GROUPED COLUMN SO THE QUERY ALSO RUNS UNDER ONLY_FULL_GROUP_BY)
$q_get_tracking_info = "SELECT url, COUNT(*) AS url_count FROM journies WHERE site_id = 1 AND profile_id = 1 GROUP BY url ORDER BY url;";
$r_get_tracking_info = mysqli_query($link, $q_get_tracking_info) or trigger_error("Cannot Get Tracking Info: (".mysqli_error($link).")", E_USER_ERROR);

while ($row_get_tracking_info = mysqli_fetch_array($r_get_tracking_info)) {

    $url = $row_get_tracking_info['url'];
    $url_count = (int) $row_get_tracking_info['url_count'];

    // EXPLODE THE DOMAIN PARTS
    $domain_parts = explode('/', $url);

    // FIND THE TOTAL COUNTS FOR EACH LEVEL OF ARRAY
    // - SO THAT WE CAN DIVIDE BY IT LATER TO GET THE PERCENTAGE
    $count = count($domain_parts);

    if (!isset($level_totals[$domain_parts[0]])) {
        $level_totals[$domain_parts[0]] = array();
    }

    if (isset($level_totals[$domain_parts[0]][$count])) {
        $level_totals[$domain_parts[0]][$count] += $url_count;
    }
    else {
        $level_totals[$domain_parts[0]][$count] = $url_count;
    }

    // BUILD A TRUNK ARRAY SO WE CAN DEFINE SECTIONS
    if ($url == $domain_parts[0]) {
        $trunk_array[] = array($url, $url_count);
    }

    // BUILD A MASTER ARRAY OF THE ITEMS AS WE WILL LAY THEM OUT
    $master_array[$url] = $url_count;
}

// FIND THE TOTAL TRUNK COUNT SO WE CAN DIVIDE BY IT LATER
// EACH TRUNK ENTRY IS array(url, count), SO THE COUNT IS AT INDEX 1
$total_trunk_count = 0;
foreach ($trunk_array AS $trunk_array_val) {
    $total_trunk_count += $trunk_array_val[1];
}

// LOOP THROUGH THE TRUNK ITEMS AND PULL OUT ANY MATCHES FOR THAT TRUNK
foreach ($trunk_array AS $trunk_item_key => $trunk_item_val) {

    $trunk_item = $trunk_item_val[0];
    $trunk_count = $trunk_item_val[1];

    // FIND THE PERCENTAGE THIS TRUNK WAS ACCESSED
    $trunk_percent = round(($trunk_count / $total_trunk_count) * 100);

    // PRINT THE TRUNK OUT
    print "<BR><BR>".$trunk_item.' - ('.$trunk_percent.'%)';

    // LOOP THROUGH THE MASTER ARRAY AND GET THE RESULTS FOR ANY PATHS UNDER THE TRUNK
    foreach ($master_array AS $master_array_key => $master_array_val) {

        // PERFORM A MATCH FOR DOMAINS BELONGING TO THIS PARTICULAR TRUNK
        // (A PLAIN PREFIX CHECK AVOIDS THE UNESCAPED DOT THAT A REGEX BUILT FROM THE DOMAIN WOULD HAVE)
        if ($master_array_key === $trunk_item || strpos($master_array_key, $trunk_item.'/') === 0) {

            // SET A DEFAULT DELIMITER PAD
            $delimiter_pad = '';

            // EXPLODE EACH PATH INTO PARTS AND COUNT HOW MANY PARTS WE HAVE
            $domain_parts_2 = explode('/', $master_array_key);
            $count = count($domain_parts_2);

            // SET THE DELIMITER FOR HOW FAR DOWN ON THE TREE WE ARE
            // EACH INDENT WILL HAVE 8 SPACES (NON-BREAKING, SINCE HTML COLLAPSES PLAIN SPACES)
            for ($i = 2; $i <= $count; $i++) {
                $delimiter_pad .= str_repeat('&nbsp;', 8);
            }

            // SINCE WE ALREADY PRINTED OUT THE TRUNK, WE WILL ONLY SHOW ITEMS THAT ARE NOT THE TRUNK
            if ($master_array_key != $trunk_item) {

                // FIND THE PERCENTAGE OF THE ITEM, GIVEN THEIR LEVEL IN THE TREE
                $path_percentage = round(($master_array_val / $level_totals[$trunk_item][$count]) * 100);

                // PRINT OUT THE PATH AND PERCENTAGE
                print "<BR>".$delimiter_pad."|- ".$master_array_key.' - ('.$path_percentage.'%)';
            }
        }
    }
}
In the end, all of that outputs this:
cnn.com - (75%)
        |- cnn.com/employees - (50%)
                |- cnn.com/employees/friekot - (100%)
        |- cnn.com/news - (50%)
                        |- cnn.com/news/id/1 - (100%)

domain.com - (25%)
        |- domain.com/about - (17%)
        |- domain.com/contact - (33%)
        |- domain.com/news - (17%)
                        |- domain.com/news/id/1 - (75%)
                        |- domain.com/news/id/2 - (25%)
        |- domain.com/prices - (33%)
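If you want to check those numbers without a database, here is a rough sketch of the same arithmetic as a standalone function. The name `journey_percentages` is my own invention, and the input array is just the url => count pairs that the grouped query would return for the sample data above. Unlike the full script, it returns a percentage for every URL, whether or not its trunk has a row of its own.

```php
<?php
// Hypothetical helper (illustration only): given url => count pairs,
// return url => percentage. Trunks are compared against all trunks;
// paths are compared against other paths at the same level of the same trunk.
function journey_percentages(array $url_counts) {
    $level_totals = array();
    $trunk_total = 0;

    // First pass: add up the totals we will divide by.
    foreach ($url_counts as $url => $count) {
        $parts = explode('/', $url);
        $level = count($parts);
        if ($level === 1) {
            $trunk_total += $count;
        } else {
            if (!isset($level_totals[$parts[0]][$level])) {
                $level_totals[$parts[0]][$level] = 0;
            }
            $level_totals[$parts[0]][$level] += $count;
        }
    }

    // Second pass: turn each count into a rounded percentage.
    $percentages = array();
    foreach ($url_counts as $url => $count) {
        $parts = explode('/', $url);
        $level = count($parts);
        $total = ($level === 1) ? $trunk_total : $level_totals[$parts[0]][$level];
        $percentages[$url] = (int) round($count / $total * 100);
    }
    return $percentages;
}

// Counts taken from the sample rows above.
$counts = array(
    'cnn.com' => 3,
    'cnn.com/employees' => 1,
    'cnn.com/employees/friekot' => 1,
    'cnn.com/news' => 1,
    'cnn.com/news/id/1' => 1,
    'domain.com' => 1,
    'domain.com/about' => 1,
    'domain.com/contact' => 2,
    'domain.com/news' => 1,
    'domain.com/news/id/1' => 3,
    'domain.com/news/id/2' => 1,
    'domain.com/prices' => 2,
);
$p = journey_percentages($counts);
```

For example, domain.com/contact has 2 of the 6 level-2 hits under domain.com, which rounds to 33%.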
There may be an easier way to do this, but this is the method that came to mind for me. I hope this works for you!