I'm building a web analytics query. The goal is to find the average number of page views per session, among sessions that viewed a certain page, so we can report figures such as:
- "sessions that loaded our home page averaged 2.1 pages"
- "sessions that loaded a particular article averaged 2.4 pages"
and so on.
I'm using HyperSQL DB. All data is one table that basically looks like this:
session_id | event | page_id
1 | 'page load' | 1
1 | 'user action' | 1
1 | 'page load' | 2
2 | 'page load' | 1
3 | 'page load' | 1
3 | 'page load' | 2
3 | 'user action' | 2
3 | 'page load' | 3
... etc ...
In my attempts so far, I am grouping by page_id. For each page_id, I need to get the session_ids that reference it, and then query again to get all the page_ids referenced by that set of session_ids.
THEN, I want to average the number of 'page load' events per session over that set of session_ids.
Make sense?
I've tried a number of things, but am inexperienced enough with SQL that I can't crack it. I've tried some inner joins, and some subqueries (which gave me cardinality violations).
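To make the logic concrete, here is a minimal JavaScript sketch of those steps for a single page, run against the sample rows from the table above. The helper name `avgPageLoadsForPage` is a placeholder I made up, and the sketch assumes an environment with `Set` support. It only illustrates the result I'm after, not the SQL itself.

```javascript
// Sample rows copied from the table above.
const rows = [
  { session_id: 1, event: 'page load',   page_id: 1 },
  { session_id: 1, event: 'user action', page_id: 1 },
  { session_id: 1, event: 'page load',   page_id: 2 },
  { session_id: 2, event: 'page load',   page_id: 1 },
  { session_id: 3, event: 'page load',   page_id: 1 },
  { session_id: 3, event: 'page load',   page_id: 2 },
  { session_id: 3, event: 'user action', page_id: 2 },
  { session_id: 3, event: 'page load',   page_id: 3 },
];

// Hypothetical helper: average page loads per session, restricted to
// sessions that loaded targetPageId at least once.
function avgPageLoadsForPage(rows, targetPageId) {
  // Only 'page load' events count; 'user action' rows are ignored.
  const pageLoads = rows.filter((r) => r.event === 'page load');

  // Step 1: distinct session IDs that loaded the target page.
  const sessions = new Set(
    pageLoads
      .filter((r) => r.page_id === targetPageId)
      .map((r) => r.session_id)
  );
  if (sessions.size === 0) return null; // no session loaded this page

  // Step 2: count every page load belonging to those sessions.
  const totalLoads = pageLoads.filter((r) => sessions.has(r.session_id)).length;

  // Step 3: average = total page loads / number of distinct sessions.
  return totalLoads / sessions.size;
}

console.log(avgPageLoadsForPage(rows, 2)); // 2.5 (sessions 1 and 3, 2 + 3 page loads)
```

The `Set` matters here: a session that loads the target page twice should still be counted as one session in the denominator.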
Update
The desired output would look something like:
page_id | sessions_including_this_page | avg_pages_per_session
1 | 2 | 2.1
2 | 4 | 1.7
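As a sanity check on that output shape, here is a rough JavaScript sketch that produces one row per page from the eight sample rows in the table at the top. `pageStats` is a hypothetical helper name, and the numbers it yields come from those sample rows, so they differ from the illustrative figures above.

```javascript
// Sample rows copied from the table at the top of the question.
const rows = [
  { session_id: 1, event: 'page load',   page_id: 1 },
  { session_id: 1, event: 'user action', page_id: 1 },
  { session_id: 1, event: 'page load',   page_id: 2 },
  { session_id: 2, event: 'page load',   page_id: 1 },
  { session_id: 3, event: 'page load',   page_id: 1 },
  { session_id: 3, event: 'page load',   page_id: 2 },
  { session_id: 3, event: 'user action', page_id: 2 },
  { session_id: 3, event: 'page load',   page_id: 3 },
];

// Hypothetical helper: one output row per page_id.
function pageStats(rows) {
  const loadsPerSession = new Map(); // session_id -> number of page loads
  const sessionsPerPage = new Map(); // page_id -> Set of distinct session_ids

  for (const r of rows) {
    if (r.event !== 'page load') continue; // ignore 'user action' rows
    loadsPerSession.set(r.session_id, (loadsPerSession.get(r.session_id) || 0) + 1);
    if (!sessionsPerPage.has(r.page_id)) sessionsPerPage.set(r.page_id, new Set());
    sessionsPerPage.get(r.page_id).add(r.session_id);
  }

  const result = [];
  for (const [pageId, sessions] of sessionsPerPage) {
    // Sum the page loads of every session that included this page.
    let totalLoads = 0;
    for (const sessionId of sessions) totalLoads += loadsPerSession.get(sessionId);
    result.push({
      page_id: pageId,
      sessions_including_this_page: sessions.size,
      avg_pages_per_session: totalLoads / sessions.size,
    });
  }
  return result.sort((a, b) => a.page_id - b.page_id);
}

console.log(pageStats(rows));
// page_id 1: 3 sessions, average 2
// page_id 2: 2 sessions, average 2.5
// page_id 3: 1 session,  average 3
```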
Thanks!
Update 2
If I were doing this in our server-side JavaScript, it would look something like this:
var events = [
  { session_id: 1, event: 'page_load', page_id: 1 },
  { session_id: 1, event: 'page_load', page_id: 2 },
  { session_id: 1, event: 'page_load', page_id: 3 },
  { session_id: 2, event: 'page_load', page_id: 1 },
  { session_id: 3, event: 'page_load', page_id: 1 },
  { session_id: 3, event: 'page_load', page_id: 2 }
];
// get session IDs that loaded page_id = 2
var sessions_viewing_page2 = []; // array to store session IDs
for (var i = 0; i < events.length; i++) {
  if (events[i].page_id === 2) sessions_viewing_page2.push(events[i].session_id);
}
// so now: sessions_viewing_page2 = [1,3];
// get total page loads for those sessions that viewed page_id==2
// we'll iterate through events again
// and check if a session ID is in our array
var pageloads_per_session = {}; // obj to store page load counts by session ID
for (var j = 0; j < events.length; j++) {
  var sid = events[j].session_id;
  if (sessions_viewing_page2.indexOf(sid) !== -1) {
    // are we already counting this session ID?
    if (!pageloads_per_session[sid]) pageloads_per_session[sid] = 1;
    else pageloads_per_session[sid]++;
  }
}
// this gives us
// pageloads_per_session[1] = 3;
// pageloads_per_session[3] = 2;
// then, since I know each session_id in pageloads_per_session viewed page_id==2... I can calculate "average page loads per session that viewed page_id == 2".
// in this case... we have 2 distinct sessions (1,3), and 5 total page loads (3+2)... for an average of 2.5 page loads per session that included page_id == 2.
// quite a mouthful. thanks!