Question

I'm trying to put together a SQL query that returns the top 10 most-viewed news items from the past week. It also needs to filter out duplicate IP addresses that have viewed the same news item.

Each time a user opens a page, the browser's query string is recorded.
Here's an example of the db setup:

datetime   | ipaddress     | querystring
-----------------------------------------
9/12/2011  | 65.65.65.651  | newsid=3512
9/12/2011  | 65.65.65.658  | newsid=3512
10/12/2011 | 65.65.65.653  | newsid=3514
11/12/2011 | 65.65.65.656  | newsid=3515
11/12/2011 | 65.65.65.651  | newsid=3515
13/12/2011 | 65.65.65.651  | newsid=3516
14/12/2011 | 65.65.65.650  | newsid=3516
14/12/2011 | 65.65.65.650  | newsid=3516

My failed attempt:

 SELECT DISTINCT TOP 10 ipaddress, querystring, Count(*) AS thecount
      FROM [thedb].[dbo].[tblwebstats] 
      WHERE querystring LIKE '%newsid=%' AND datetime > (1 week ago)
      GROUP BY querystring, ipaddress
      ORDER BY Count(*) DESC

Please help me out :)

Solution

How about something like this?

select top 10 querystring, count(querystring) as popularity
from 
(
    select distinct ipaddress, querystring
    from 
    (
        select [datetime], ipaddress, querystring
        from tblwebstats
        where querystring LIKE '%newsid=%' AND [datetime] > dateadd(day, -7, getdate())
    ) as datefilter
) as distinctfilter
group by querystring
order by popularity desc

This query does the following (innermost to outermost):

  1. Filters the original table by date range and querystring as required
  2. Reduces the results from (1) down to distinct pairs of (IP address, querystring), ignoring date
  3. Counts the unique querystring occurrences from (2) and returns the top 10 of them in descending order by count.
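
The three steps above can be tried out without a SQL Server instance. The following is only a sketch, assuming an in-memory SQLite database driven from Python's sqlite3 module. The function name top_news, the column name visited, the fixed cutoff parameter, and the ISO-formatted dates are all assumptions made for illustration. SQLite uses LIMIT where T-SQL uses TOP, and the date filter is passed in as a string instead of being computed with dateadd/getdate.

```python
import sqlite3

# Sample rows from the question. Assumption: the question's dates are
# dd/mm/yyyy, rewritten here as ISO strings so they compare correctly as text.
ROWS = [
    ("2011-12-09", "65.65.65.651", "newsid=3512"),
    ("2011-12-09", "65.65.65.658", "newsid=3512"),
    ("2011-12-10", "65.65.65.653", "newsid=3514"),
    ("2011-12-11", "65.65.65.656", "newsid=3515"),
    ("2011-12-11", "65.65.65.651", "newsid=3515"),
    ("2011-12-13", "65.65.65.651", "newsid=3516"),
    ("2011-12-14", "65.65.65.650", "newsid=3516"),
    ("2011-12-14", "65.65.65.650", "newsid=3516"),
]


def top_news(rows, cutoff, limit=10):
    """Hypothetical helper: rank news items by number of distinct IPs after cutoff."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tblwebstats (visited TEXT, ipaddress TEXT, querystring TEXT)"
    )
    conn.executemany("INSERT INTO tblwebstats VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT querystring, COUNT(*) AS popularity      -- step 3: count and rank
        FROM (
            SELECT DISTINCT ipaddress, querystring      -- step 2: one row per (IP, item)
            FROM tblwebstats
            WHERE querystring LIKE '%newsid=%'          -- step 1: filter by item and date
              AND visited > ?
        ) AS distinctfilter
        GROUP BY querystring
        ORDER BY popularity DESC, querystring           -- tie-break added for stable output
        LIMIT ?
        """,
        (cutoff, limit),
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for querystring, popularity in top_news(ROWS, "2011-12-08"):
        print(querystring, popularity)
```

With the sample data, the two identical requests for newsid=3516 from 65.65.65.650 collapse into one row in step 2, so that item is counted twice (once per distinct IP) instead of three times.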

OTHER TIPS

I assume that when you say "...filter the duplicate ip addresses..." you want the same news article, requested from the same IP address, to be counted only once (per day?).

If so, you need to filter out the duplicates before counting the articles. Try something like:

WITH Unique_Requests AS ( 
    SELECT DISTINCT datetime, ipaddress, querystring
    FROM [thedb].[dbo].[tblwebstats]
    WHERE datetime >= DATEADD(week, -1, CURRENT_TIMESTAMP) AND
        querystring LIKE '%newsid=%'
)

SELECT TOP 10 querystring, Count(*) AS thecount
FROM Unique_Requests
GROUP BY querystring
ORDER BY Count(*) DESC
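
One thing to watch with this approach: DISTINCT on the raw datetime column only gives "once per day" if that column holds no time-of-day component. If it does, the value would have to be truncated to a date first (in SQL Server 2008 and later, CAST([datetime] AS date) is one way). The sketch below illustrates the difference using SQLite through Python's sqlite3 module. The timestamps are made up for illustration, and the names count_views and visited are hypothetical.

```python
import sqlite3

# Made-up rows with a time-of-day component (not from the original question).
ROWS = [
    ("2011-12-13 08:15:00", "65.65.65.651", "newsid=3516"),
    ("2011-12-14 09:00:00", "65.65.65.650", "newsid=3516"),
    ("2011-12-14 17:30:00", "65.65.65.650", "newsid=3516"),  # same IP, same day
    ("2011-12-15 10:00:00", "65.65.65.650", "newsid=3516"),  # same IP, next day
]


def count_views(rows, per_day):
    """Hypothetical helper: count requests after de-duplicating per timestamp or per day."""
    # SQLite's date() truncates 'YYYY-MM-DD HH:MM:SS' to 'YYYY-MM-DD'.
    key = "date(visited)" if per_day else "visited"
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tblwebstats (visited TEXT, ipaddress TEXT, querystring TEXT)"
    )
    conn.executemany("INSERT INTO tblwebstats VALUES (?, ?, ?)", rows)
    result = conn.execute(
        f"""
        SELECT querystring, COUNT(*) AS thecount
        FROM (
            SELECT DISTINCT {key} AS visit_key, ipaddress, querystring
            FROM tblwebstats
            WHERE querystring LIKE '%newsid=%'
        ) AS unique_requests
        GROUP BY querystring
        ORDER BY thecount DESC
        """
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(count_views(ROWS, per_day=False))  # every distinct timestamp counts
    print(count_views(ROWS, per_day=True))   # one count per IP per calendar day
```

With full timestamps, the two requests on 2011-12-14 from the same IP stay separate; truncating to the date merges them into one.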

I'm not really happy with the statement below: it has too much nesting and doesn't use temp tables. If you first load the data into additional temp tables, the CROSS APPLY will be less expensive.

DECLARE @t as table(Created datetime,IPAddress varchar(15),QueryString VARCHAR(20))

INSERT INTO @t(Created,IPAddress,QueryString) VALUES
('2011-12-09','65.65.65.651','newsid=3512'),
('2011-12-09','65.65.65.658','newsid=3512'),
('2011-12-10','65.65.65.653','newsid=3514'),
('2011-12-11','65.65.65.656','newsid=3515'),
('2011-12-11','65.65.65.651','newsid=3515'),
('2011-12-13','65.65.65.651','newsid=3516'),
('2011-12-14','65.65.65.650','newsid=3516'),
('2011-12-14','65.65.65.650','newsid=3516')

SELECT TOP 10 QueryString, DistinctIp, COUNT(1) AS Counter
FROM (
    SELECT DISTINCT Created, IPAddress, DistinctIp, QueryString
    FROM @t t
    CROSS APPLY (
        SELECT DISTINCT COUNT(1) AS DistinctIp
        FROM @t
        WHERE Created = t.Created AND QueryString = t.QueryString
    ) g
    WHERE Created >= CAST((GETDATE() - 7) AS DATE)
      AND QueryString LIKE '%newsid=%'
) x
GROUP BY QueryString, DistinctIp
ORDER BY Counter DESC

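The result of the statement also contains a DistinctIp column. Note that it is computed with COUNT(1), so it counts all requests for that news item on that date rather than distinct IP addresses: newsid=3516 gets a 2 for 2011-12-14 even though both requests came from the same IP.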

QueryString | DistinctIp | Counter
----------------------------------
newsid=3515 | 2          | 2
newsid=3512 | 2          | 2
newsid=3516 | 2          | 1
newsid=3516 | 1          | 1
newsid=3514 | 1          | 1

Licensed under: CC-BY-SA with attribution