Question

I have a table that stores the page hits on a web application:

unique_row_id  http_session_id  page_name   page_hit_timestamp
----------------------------------------------------------------
0              123456789        index.html  2010-01-20 15:00:00
1              123456789        info.html   2010-01-20 15:00:05
2              123456789        faq.html    2010-01-20 15:00:15
3              987654321        index.html  2010-01-20 16:00:00
4              987654321        faq.html    2010-01-20 16:00:05
5              987654321        info.html   2010-01-20 16:00:15
6              111111111        index.html  2010-01-20 16:01:00
7              111111111        faq.html    2010-01-20 16:01:05
8              111111111        info.html   2010-01-20 16:01:15

I want to run a SQL query that will show me the most common page that users end their browsing on.

So my initial thinking is that in my (Java) app, I can run a query that selects the distinct http_session_id values from the table, and then, for each distinct http_session_id, run another query that gets the page with the latest page_hit_timestamp, and sum a total for each of these pages. (For the sample data above, I'd have a count of 2 for info.html and a count of 1 for faq.html.)

But what I'd like to know is this: is there a way to combine these two queries into a single SQL statement, or would I have to go down the stored procedure route for that?

I've had a look at using a join, but I can't figure out if it's applicable in this scenario.

PS - I know that I could use the likes of Google Analytics in my app to provide this info for me but a) this is a mobile web app so not great for off the shelf analytics tools, and b) I'm just curious to know if this can be done in SQL.

Solution

This should do what you want:

select l.page_name, count(*) as ExitPageCount
from WebLog l
inner join (
    select http_session_id, max(page_hit_timestamp) as page_hit_timestamp
    from WebLog
    group by http_session_id
) lm on l.http_session_id = lm.http_session_id and l.page_hit_timestamp = lm.page_hit_timestamp
group by l.page_name
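
As a quick sanity check, the approach of joining each session to its latest timestamp can be run against the sample data from the question. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The function name exit_page_counts, the column types, and the added ORDER BY are assumptions for illustration only. Timestamps are stored as ISO-8601 text, which lets MAX compare them correctly as strings.

```python
import sqlite3

# Sample rows from the question:
# (unique_row_id, http_session_id, page_name, page_hit_timestamp)
SAMPLE_ROWS = [
    (0, "123456789", "index.html", "2010-01-20 15:00:00"),
    (1, "123456789", "info.html", "2010-01-20 15:00:05"),
    (2, "123456789", "faq.html", "2010-01-20 15:00:15"),
    (3, "987654321", "index.html", "2010-01-20 16:00:00"),
    (4, "987654321", "faq.html", "2010-01-20 16:00:05"),
    (5, "987654321", "info.html", "2010-01-20 16:00:15"),
    (6, "111111111", "index.html", "2010-01-20 16:01:00"),
    (7, "111111111", "faq.html", "2010-01-20 16:01:05"),
    (8, "111111111", "info.html", "2010-01-20 16:01:15"),
]

# Join every hit to the latest hit of its session, then count by page.
# The ORDER BY is an addition so the most common exit page comes first.
EXIT_PAGE_SQL = """
SELECT l.page_name, COUNT(*) AS ExitPageCount
FROM WebLog l
INNER JOIN (
    SELECT http_session_id, MAX(page_hit_timestamp) AS page_hit_timestamp
    FROM WebLog
    GROUP BY http_session_id
) lm ON l.http_session_id = lm.http_session_id
    AND l.page_hit_timestamp = lm.page_hit_timestamp
GROUP BY l.page_name
ORDER BY ExitPageCount DESC, l.page_name
"""


def exit_page_counts(rows):
    """Load rows into an in-memory table and return (page_name, count) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE WebLog ("
            "unique_row_id INTEGER PRIMARY KEY, "
            "http_session_id TEXT, "
            "page_name TEXT, "
            "page_hit_timestamp TEXT)"
        )
        conn.executemany("INSERT INTO WebLog VALUES (?, ?, ?, ?)", rows)
        return conn.execute(EXIT_PAGE_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for page, count in exit_page_counts(SAMPLE_ROWS):
        print(page, count)
    # info.html 2
    # faq.html 1
```

The result matches the counts the question expects: 2 for info.html and 1 for faq.html.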

OTHER TIPS

SELECT http_session_id, page_name, COUNT(page_name), MAX(page_hit_timestamp)
    FROM table
    GROUP BY http_session_id, page_name

This will return a row for each http_session_id and page_name combination, and that row will contain:

  • http_session_id
  • page_name
  • the count of how many times the (http_session_id+page_name) combination occurs in the table
  • the latest (MAX) timestamp for the combination
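
To see what this grouping produces, here is a minimal sketch that runs it on the question's sample data using Python's built-in sqlite3 module. The table name weblog (standing in for the placeholder name above), the function name hits_per_session_and_page, and the added ORDER BY are assumptions for illustration.

```python
import sqlite3

# Sample rows from the question:
# (unique_row_id, http_session_id, page_name, page_hit_timestamp)
SAMPLE_ROWS = [
    (0, "123456789", "index.html", "2010-01-20 15:00:00"),
    (1, "123456789", "info.html", "2010-01-20 15:00:05"),
    (2, "123456789", "faq.html", "2010-01-20 15:00:15"),
    (3, "987654321", "index.html", "2010-01-20 16:00:00"),
    (4, "987654321", "faq.html", "2010-01-20 16:00:05"),
    (5, "987654321", "info.html", "2010-01-20 16:00:15"),
    (6, "111111111", "index.html", "2010-01-20 16:01:00"),
    (7, "111111111", "faq.html", "2010-01-20 16:01:05"),
    (8, "111111111", "info.html", "2010-01-20 16:01:15"),
]


def hits_per_session_and_page(rows):
    """Return one row per (session, page) pair with its hit count and latest hit."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE weblog ("
            "unique_row_id INTEGER PRIMARY KEY, "
            "http_session_id TEXT, "
            "page_name TEXT, "
            "page_hit_timestamp TEXT)"
        )
        conn.executemany("INSERT INTO weblog VALUES (?, ?, ?, ?)", rows)
        return conn.execute(
            """
            SELECT http_session_id, page_name,
                   COUNT(page_name), MAX(page_hit_timestamp)
            FROM weblog
            GROUP BY http_session_id, page_name
            ORDER BY http_session_id, page_name
            """
        ).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in hits_per_session_and_page(SAMPLE_ROWS):
        print(row)
```

On the sample data each session visits each page once, so this returns nine rows, each with a count of 1. Finding the exit page would still mean comparing the MAX timestamps within each session.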

If you can provide your two queries, I can easily turn them into a JOIN for you, or possibly a subquery, depending on your needs.

The query below lists the last page accessed in each session:

select http_session_id, page_name, page_hit_timestamp
from (
    select row_number() over (partition by t.http_session_id order by t.page_hit_timestamp desc) rn, t.*
    from weblog t
) ranked
where rn = 1;


If you want the count, the query below may help:

select page_name, count(*)
from (
    select row_number() over (partition by t.http_session_id order by t.page_hit_timestamp desc) rn, t.*
    from weblog t
) ranked
where rn = 1
group by page_name;
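
The window-function variant can be tried on the question's sample data in the same way. This is a sketch using Python's built-in sqlite3 module, and it assumes the bundled SQLite is version 3.25 or newer, since earlier versions lack window functions. The function name exit_page_counts_windowed, the derived-table alias ranked, and the ORDER BY are illustrative additions.

```python
import sqlite3

# Sample rows from the question:
# (unique_row_id, http_session_id, page_name, page_hit_timestamp)
SAMPLE_ROWS = [
    (0, "123456789", "index.html", "2010-01-20 15:00:00"),
    (1, "123456789", "info.html", "2010-01-20 15:00:05"),
    (2, "123456789", "faq.html", "2010-01-20 15:00:15"),
    (3, "987654321", "index.html", "2010-01-20 16:00:00"),
    (4, "987654321", "faq.html", "2010-01-20 16:00:05"),
    (5, "987654321", "info.html", "2010-01-20 16:00:15"),
    (6, "111111111", "index.html", "2010-01-20 16:01:00"),
    (7, "111111111", "faq.html", "2010-01-20 16:01:05"),
    (8, "111111111", "info.html", "2010-01-20 16:01:15"),
]

# Rank each session's hits from newest to oldest, keep rank 1 (the exit page),
# then count how often each page is an exit page.
# Requires SQLite 3.25+ for ROW_NUMBER() OVER (...).
WINDOWED_SQL = """
SELECT page_name, COUNT(*) AS exit_count
FROM (
    SELECT t.page_name,
           ROW_NUMBER() OVER (
               PARTITION BY t.http_session_id
               ORDER BY t.page_hit_timestamp DESC
           ) AS rn
    FROM weblog t
) ranked
WHERE rn = 1
GROUP BY page_name
ORDER BY exit_count DESC, page_name
"""


def exit_page_counts_windowed(rows):
    """Return (page_name, exit_count) pairs, most common exit page first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE weblog ("
            "unique_row_id INTEGER PRIMARY KEY, "
            "http_session_id TEXT, "
            "page_name TEXT, "
            "page_hit_timestamp TEXT)"
        )
        conn.executemany("INSERT INTO weblog VALUES (?, ?, ?, ?)", rows)
        return conn.execute(WINDOWED_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for page, count in exit_page_counts_windowed(SAMPLE_ROWS):
        print(page, count)
    # info.html 2
    # faq.html 1
```

One difference worth keeping in mind: ROW_NUMBER keeps exactly one row per session even when two hits share the same latest timestamp, whereas a join on MAX(page_hit_timestamp) would match both of them.
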
Licensed under: CC-BY-SA with attribution