Question

I'm building a web analytics query. Its goal is to find the average page views per session for sessions that viewed a given page, so we can report figures such as:

  • "sessions that loaded our home page averaged 2.1 pages"
  • "sessions that loaded a particular article averaged 2.4 pages"

and so on.

I'm using HyperSQL DB. All data is in one table that basically looks like this:

session_id | event         | page_id
1          | 'page load'   | 1
1          | 'user action' | 1
1          | 'page load'   | 2
2          | 'page load'   | 1
3          | 'page load'   | 1
3          | 'page load'   | 2
3          | 'user action' | 2
3          | 'page load'   | 3
... etc ...

In my queries/attempts so far, I am grouping by PageID. I need to get the Session IDs that reference this initial set of Page IDs, and then query again to get all Page IDs that are referenced by my new set of Session IDs.

THEN, I want to AVG the 'page load' events for this set of Session IDs.

Make sense? I've tried a number of things, but am inexperienced enough with SQL that I can't crack it. I've tried some inner joins, and some subqueries (which gave me cardinality violations).

Update: The desired output would look something like:

page_id | sessions_including_this_page | avg_pages_per_session
1       | 2                            | 2.1
2       | 4                            | 1.7

Thanks!

Update 2: If I were doing this in our server-side JavaScript, it would look something like this:

// events must be an array literal; a bare list of objects inside { } is a syntax error
var events = [
{ session_id:  1,  event: 'page_load', page_id:1 },
{ session_id:  1,  event: 'page_load', page_id:2 },
{ session_id:  1,  event: 'page_load', page_id:3 },
{ session_id:  2,  event: 'page_load', page_id:1 },
{ session_id:  3,  event: 'page_load', page_id:1 },
{ session_id:  3,  event: 'page_load', page_id:2 }
];

// get session IDs that loaded page_id = 2
var sessions_viewing_page2 = [];  // array to store session IDs
for ( var i in events ) {
    if ( events[i].page_id === 2 ) sessions_viewing_page2.push( events[i].session_id );
}
// so now:  sessions_viewing_page2 = [1,3];

// get total page loads for those sessions that viewed page_id==2
// we'll iterate through events again
// and check if a session ID is in our array
var pageloads_per_session = {}; // obj to store page load counts by session ID
for (var j in events) {
  if ( sessions_viewing_page2.indexOf( events[j].session_id ) != -1 ) {
    // are we already incrementing this session ID?
    if ( !pageloads_per_session[events[j].session_id] ) pageloads_per_session[events[j].session_id] = 1;
    else pageloads_per_session[events[j].session_id]++;
  }
}
// this gives us
// pageloads_per_session[1] = 3;
// pageloads_per_session[3] = 2;

// then, since I know each session_id in pageloads_per_session viewed page_id==2... I can calculate "average page loads per session that viewed page_id == 2".
// in this case... we have 2 distinct sessions (1,3), and 5 total page loads (3+2)... for an average of 2.5 page loads per session that included page_id == 2.

// quite a mouthful.  thanks!



Solution

I think this is what you want:

select a.page_id, a.num_ses, avg(c.num_pg_ld_sespg) as avg_ses_pg_exist
  -- a: number of distinct sessions that loaded each page
  from (select page_id, count(distinct session_id) as num_ses
          from tbl
         where event = 'page load'
         group by page_id) a
  -- c: page loads per (session, page) pair
  join (select session_id, page_id, count(*) as num_pg_ld_sespg
          from tbl
         where event = 'page load'
         group by session_id, page_id) c
    on a.page_id = c.page_id
  -- b: total page loads per session
  join (select session_id, count(*) as num_pg_ld_ses
          from tbl
         where event = 'page load'
         group by session_id) b
    on b.session_id = c.session_id
 group by a.page_id, a.num_ses
 order by a.page_id

See sqlfiddle test at: http://sqlfiddle.com/#!2/d79a2/1/0

Note that I added one row to your example data: insert into tbl values (2, 'page load', 1);

I added it because, as is, the example data would give an average of 1 in the third column.

The third column is the average number of page loads per session, taken over the sessions that have at least one page load for the page on the given row. The page-load count in that statement covers all of a session's page loads, not just those for the page on the given row.
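For comparison, here is a minimal JavaScript sketch of the calculation described in the paragraph above: count every page load per session, then average those counts over the sessions that loaded each page. The function name `pageStats` and the output field names are hypothetical, chosen to mirror the desired output in the question. The input is the question's sample table, without the extra row added for the fiddle.

```javascript
// Hypothetical helper: per-page session count and average page loads per session.
function pageStats(events) {
  const loadsPerSession = new Map(); // session_id -> total page loads in that session
  const sessionsPerPage = new Map(); // page_id -> Set of session_ids that loaded it

  for (const e of events) {
    if (e.event !== 'page load') continue; // ignore 'user action' rows
    loadsPerSession.set(e.session_id, (loadsPerSession.get(e.session_id) || 0) + 1);
    if (!sessionsPerPage.has(e.page_id)) sessionsPerPage.set(e.page_id, new Set());
    sessionsPerPage.get(e.page_id).add(e.session_id);
  }

  const rows = [];
  for (const [pageId, sessions] of sessionsPerPage) {
    let total = 0;
    for (const s of sessions) total += loadsPerSession.get(s);
    rows.push({
      page_id: pageId,
      sessions_including_this_page: sessions.size,
      avg_pages_per_session: total / sessions.size
    });
  }
  return rows.sort((x, y) => x.page_id - y.page_id);
}

// Sample rows from the question's table
const events = [
  { session_id: 1, event: 'page load',   page_id: 1 },
  { session_id: 1, event: 'user action', page_id: 1 },
  { session_id: 1, event: 'page load',   page_id: 2 },
  { session_id: 2, event: 'page load',   page_id: 1 },
  { session_id: 3, event: 'page load',   page_id: 1 },
  { session_id: 3, event: 'page load',   page_id: 2 },
  { session_id: 3, event: 'user action', page_id: 2 },
  { session_id: 3, event: 'page load',   page_id: 3 }
];

const stats = pageStats(events);
console.log(stats);
// page 1: 3 sessions, average 2
// page 2: 2 sessions, average 2.5
// page 3: 1 session,  average 3
```

This makes two passes in memory rather than joining subqueries, so it is useful mainly as a way to cross-check the numbers a SQL query returns on a small data set.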

Licensed under: CC-BY-SA with attribution