Top 10 occurring values in a table
15-03-2021
Question
I'm trying to put together a SQL query that returns the top 10 most-viewed news items from the past week. It also needs to filter out duplicate IP addresses that have viewed the same news item.
Each time a user opens a page, the query string from the user's browser is logged.
Here's an example of the table data:
datetime | ipaddress | querystring
-----------------------------------------
9/12/2011 | 65.65.65.651 | newsid=3512
9/12/2011 | 65.65.65.658 | newsid=3512
10/12/2011 | 65.65.65.653 | newsid=3514
11/12/2011 | 65.65.65.656 | newsid=3515
11/12/2011 | 65.65.65.651 | newsid=3515
13/12/2011 | 65.65.65.651 | newsid=3516
14/12/2011 | 65.65.65.650 | newsid=3516
14/12/2011 | 65.65.65.650 | newsid=3516
My failed attempt:
SELECT DISTINCT TOP 10 ipaddress, querystring, Count(*) AS thecount
FROM [thedb].[dbo].[tblwebstats]
WHERE querystring LIKE '%newsid=%' AND datetime > DATEADD(week, -1, GETDATE())  -- one week ago
GROUP BY querystring, ipaddress
ORDER BY Count(*) DESC
Please help me out :)
Solution
How about something like this?
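select top 10 querystring, count(querystring) as popularity
from
(
    -- (2) one row per distinct (IP address, querystring) pair
    select distinct ipaddress, querystring
    from
    (
        -- (1) keep only news requests from the last 7 days
        select [datetime], ipaddress, querystring
        from tblwebstats
        where querystring LIKE '%newsid=%' AND [datetime] > dateadd(day, -7, getdate())
    ) as datefilter
) as distinctfilter
-- (3) count the distinct IP addresses per news item, most popular first
group by querystring
order by popularity desc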
This query does the following (innermost to outermost):
1. Filters the original table by date range and querystring, as required.
2. Reduces the results of (1) to distinct (IP address, querystring) pairs, ignoring the date.
3. Counts how many distinct IP addresses requested each querystring in (2) and returns the top 10, in descending order of that count.
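The three steps can be checked on the question's sample data. The sketch below is a translation into SQLite, run through Python's standard-library sqlite3 module, rather than the answer's exact T-SQL: LIMIT replaces TOP, date(?, '-7 day') replaces dateadd(day, -7, getdate()), and the reference date is passed in as a parameter (the top_news function and its as_of argument are hypothetical names) so that the 2011 rows fall inside the window. The dates are rewritten as ISO strings, and a secondary sort on querystring is added so that ties come back in a fixed order.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO text (YYYY-MM-DD)
# so that plain string comparison orders them correctly in SQLite.
ROWS = [
    ("2011-12-09", "65.65.65.651", "newsid=3512"),
    ("2011-12-09", "65.65.65.658", "newsid=3512"),
    ("2011-12-10", "65.65.65.653", "newsid=3514"),
    ("2011-12-11", "65.65.65.656", "newsid=3515"),
    ("2011-12-11", "65.65.65.651", "newsid=3515"),
    ("2011-12-13", "65.65.65.651", "newsid=3516"),
    ("2011-12-14", "65.65.65.650", "newsid=3516"),
    ("2011-12-14", "65.65.65.650", "newsid=3516"),
]

# SQLite version of the nested query: LIMIT replaces TOP, and the
# reference date is a parameter instead of getdate().
TOP_NEWS_SQL = """
select querystring, count(querystring) as popularity
from (
    -- (2) one row per distinct (IP address, querystring) pair
    select distinct ipaddress, querystring
    from (
        -- (1) news requests from the 7 days before the reference date
        select datetime, ipaddress, querystring
        from tblwebstats
        where querystring like '%newsid=%'
          and datetime > date(?, '-7 day')
    ) as datefilter
) as distinctfilter
-- (3) count distinct IP addresses per news item, most popular first
group by querystring
order by popularity desc, querystring
limit 10
"""


def top_news(rows, as_of):
    """Return (querystring, popularity) pairs for the week before as_of."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table tblwebstats (datetime text, ipaddress text, querystring text)"
        )
        conn.executemany("insert into tblwebstats values (?, ?, ?)", rows)
        return conn.execute(TOP_NEWS_SQL, (as_of,)).fetchall()
    finally:
        conn.close()


print(top_news(ROWS, "2011-12-14"))
# [('newsid=3512', 2), ('newsid=3515', 2), ('newsid=3516', 2), ('newsid=3514', 1)]
```

The two 65.65.65.650 requests for newsid=3516 collapse into a single row in step (2), so every news item is counted at most once per IP address for the whole week.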
OTHER TIPS
I assume that when you say "...filter the duplicate ip addresses..." you want the same news article, requested from the same IP address, to be counted only once (per day?).
If so, you need to remove the duplicates before counting the articles. Try something like:
WITH Unique_Requests AS (
    -- one row per day, IP address and news item; the cast drops any time of day
    SELECT DISTINCT CAST([datetime] AS date) AS requestdate, ipaddress, querystring
    FROM [thedb].[dbo].[tblwebstats]
    WHERE [datetime] >= DATEADD(week, -1, CURRENT_TIMESTAMP) AND
          querystring LIKE '%newsid=%'
)
SELECT TOP 10 querystring, Count(*) AS thecount
FROM Unique_Requests
GROUP BY querystring
ORDER BY Count(*) DESC
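The same sample data can be run through this per-day variant. Again this is a SQLite sketch via Python's sqlite3 module, not SQL Server: SQLite's date() truncates each timestamp to its day (standing in for a cast to date), LIMIT replaces TOP, and the reference date is a parameter instead of CURRENT_TIMESTAMP. The top_news_per_day function and its as_of argument are hypothetical names.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO text (YYYY-MM-DD).
ROWS = [
    ("2011-12-09", "65.65.65.651", "newsid=3512"),
    ("2011-12-09", "65.65.65.658", "newsid=3512"),
    ("2011-12-10", "65.65.65.653", "newsid=3514"),
    ("2011-12-11", "65.65.65.656", "newsid=3515"),
    ("2011-12-11", "65.65.65.651", "newsid=3515"),
    ("2011-12-13", "65.65.65.651", "newsid=3516"),
    ("2011-12-14", "65.65.65.650", "newsid=3516"),
    ("2011-12-14", "65.65.65.650", "newsid=3516"),
]

# SQLite version of the CTE: date() truncates to the day, LIMIT replaces TOP,
# and the reference date is a parameter instead of CURRENT_TIMESTAMP.
PER_DAY_SQL = """
with unique_requests as (
    -- one row per (day, IP address, news item)
    select distinct date(datetime) as requestdate, ipaddress, querystring
    from tblwebstats
    where datetime >= date(?, '-7 day')
      and querystring like '%newsid=%'
)
select querystring, count(*) as thecount
from unique_requests
group by querystring
order by thecount desc, querystring
limit 10
"""


def top_news_per_day(rows, as_of):
    """Return (querystring, thecount) pairs, deduplicated per day and IP."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table tblwebstats (datetime text, ipaddress text, querystring text)"
        )
        conn.executemany("insert into tblwebstats values (?, ?, ?)", rows)
        return conn.execute(PER_DAY_SQL, (as_of,)).fetchall()
    finally:
        conn.close()


print(top_news_per_day(ROWS, "2011-12-14"))
# [('newsid=3512', 2), ('newsid=3515', 2), ('newsid=3516', 2), ('newsid=3514', 1)]
```

Because duplicates are removed per (day, IP address, news item), the same IP address viewing the same item on two different days counts twice here, whereas the nested query in the Solution counts it once.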
I'm not really happy with the statement below: there is too much nesting and no temp tables. If you load the data into an additional temp table first, the CROSS APPLY will be less heavy.
DECLARE @t as table(Created datetime, IPAddress varchar(15), QueryString varchar(20))
INSERT INTO @t(Created, IPAddress, QueryString) VALUES
('2011-12-09', '65.65.65.651', 'newsid=3512'),
('2011-12-09', '65.65.65.658', 'newsid=3512'),
('2011-12-10', '65.65.65.653', 'newsid=3514'),
('2011-12-11', '65.65.65.656', 'newsid=3515'),
('2011-12-11', '65.65.65.651', 'newsid=3515'),
('2011-12-13', '65.65.65.651', 'newsid=3516'),
('2011-12-14', '65.65.65.650', 'newsid=3516'),
('2011-12-14', '65.65.65.650', 'newsid=3516')

SELECT TOP 10 QueryString, DistinctIp, COUNT(1) Counter FROM (
    SELECT DISTINCT Created, IPAddress, DistinctIp, QueryString
    FROM @t t
    -- number of distinct IP addresses that requested this news item on this day
    CROSS APPLY (SELECT COUNT(DISTINCT IPAddress) DistinctIp
                 FROM @t
                 WHERE Created = t.Created AND QueryString = t.QueryString) g
    -- with the 2011 sample dates this filter removes every row;
    -- drop it to reproduce the result below
    WHERE Created >= CAST((GETDATE()-7) AS DATE) AND
          QueryString LIKE '%newsid=%'
) x
GROUP BY QueryString, DistinctIp
ORDER BY Counter DESC
The result contains an extra column, DistinctIp: the number of distinct IP addresses that requested the news item on a given day. Counter is the number of distinct (day, IP address) requests in each (QueryString, DistinctIp) group, so an item can appear on more than one row if its daily counts differ.
QueryString|DistinctIp|Counter
newsid=3515|2|2
newsid=3512|2|2
newsid=3516|1|2
newsid=3514|1|1
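What the CROSS APPLY computes can be checked with a SQLite sketch run through Python's sqlite3 module. SQLite has no CROSS APPLY, so a correlated scalar subquery (a swapped-in technique) produces the same per-row value, here taken as count(distinct ipaddress) for that day and news item. The table name requests and the function distinct_ip_counts are hypothetical, the dates are rewritten as ISO strings, the 7-day filter is left out because the sample rows are from 2011, and a secondary sort on querystring fixes the order of ties.

```python
import sqlite3

# Sample rows from the question; dates rewritten as ISO text (YYYY-MM-DD).
ROWS = [
    ("2011-12-09", "65.65.65.651", "newsid=3512"),
    ("2011-12-09", "65.65.65.658", "newsid=3512"),
    ("2011-12-10", "65.65.65.653", "newsid=3514"),
    ("2011-12-11", "65.65.65.656", "newsid=3515"),
    ("2011-12-11", "65.65.65.651", "newsid=3515"),
    ("2011-12-13", "65.65.65.651", "newsid=3516"),
    ("2011-12-14", "65.65.65.650", "newsid=3516"),
    ("2011-12-14", "65.65.65.650", "newsid=3516"),
]

# The correlated subquery stands in for CROSS APPLY: for each request row it
# counts the distinct IP addresses seen for the same day and news item.
DISTINCT_IP_SQL = """
select querystring, distinctip, count(*) as counter
from (
    select distinct t.created, t.ipaddress, t.querystring,
           (select count(distinct g.ipaddress)
              from requests as g
             where g.created = t.created
               and g.querystring = t.querystring) as distinctip
    from requests as t
    where t.querystring like '%newsid=%'
) as x
group by querystring, distinctip
order by counter desc, querystring
limit 10
"""


def distinct_ip_counts(rows):
    """Return (querystring, distinctip, counter) rows for the sample data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table requests (created text, ipaddress text, querystring text)"
        )
        conn.executemany("insert into requests values (?, ?, ?)", rows)
        return conn.execute(DISTINCT_IP_SQL).fetchall()
    finally:
        conn.close()


for querystring, distinct_ip, counter in distinct_ip_counts(ROWS):
    print(f"{querystring}|{distinct_ip}|{counter}")
# newsid=3512|2|2
# newsid=3515|2|2
# newsid=3516|1|2
# newsid=3514|1|1
```

The duplicated 65.65.65.650 row on 2011-12-14 yields a DistinctIp of 1 for that day, because the subquery counts distinct addresses rather than rows.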