How to get the number of uniques week to date, with the unique period rolling with the date
01-07-2021
Question
Very simplified, a table with some sample data:
action_date  account_id
1/1/2010     123
1/1/2010     123
1/1/2010     456
1/2/2010     123
1/3/2010     789
For the data above, I need a query that will give the following:
action_date  num_events  num_unique_accounts  num_unique_accounts_wtd
1/1/2010     3           2                    2
1/2/2010     1           1                    2
1/3/2010     1           1                    3
As you can see, num_unique_accounts_wtd counts the distinct accounts seen from the start of the week up to and including each date, so the end of the unique period rolls forward with the date.
At first, one would think a query of the form
WITH
events AS
(
    SELECT
        action_date
      , COUNT(account_id) num_events
      , COUNT(DISTINCT account_id) num_unique_accounts
    FROM actions
    GROUP BY action_date
)
SELECT
    action_date
  , num_events
  , num_unique_accounts
  , SUM(num_unique_accounts) OVER (PARTITION BY NEXT_DAY(action_date, 'Monday') - 7 ORDER BY action_date ASC) num_unique_accounts_wtd
FROM events
would work, but if you look closely it just adds up each day's num_unique_accounts, so an account that appears on two days of the same week is counted twice. For example, for 1/2/2010 the query would give num_unique_accounts_wtd = 3 (2 + 1) instead of 2.
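To make the overcount concrete, here is a small Python sketch (not part of the original query) that contrasts a running sum of daily distinct counts with a true week-to-date distinct count on the sample rows. It assumes all three sample dates fall in the same Monday-based week, which holds for 1/1/2010 through 1/3/2010; the variable names are illustrative only.

```python
from datetime import date

# Sample rows from the question: (action_date, account_id).
actions = [
    (date(2010, 1, 1), 123),
    (date(2010, 1, 1), 123),
    (date(2010, 1, 1), 456),
    (date(2010, 1, 2), 123),
    (date(2010, 1, 3), 789),
]

days = sorted({d for d, _ in actions})

# Distinct accounts per day, as in the `events` subquery.
daily_unique = {d: len({a for ad, a in actions if ad == d}) for d in days}

# Running sum of the daily counts: what SUM(...) OVER (...) computes.
running_sum = []
total = 0
for d in days:
    total += daily_unique[d]
    running_sum.append(total)

# Distinct accounts seen so far in the week: what is actually wanted.
seen = set()
wtd_unique = []
for d in days:
    seen |= {a for ad, a in actions if ad == d}
    wtd_unique.append(len(seen))

print(running_sum)  # [2, 3, 4]
print(wtd_unique)   # [2, 2, 3]
```

Account 123 appears on both 1/1 and 1/2, so the running sum counts it twice while the set-based count does not.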
Any ideas?
EDIT: Added one more row of data and output for clarity
Solution 2
It seemed the answer might be to modify the analytic function to something of the form
COUNT(DISTINCT ...) OVER (PARTITION BY ... ORDER BY ... RANGE BETWEEN ... AND ...)
because RANGE BETWEEN accepts expressions, so the PARTITION BY window can be narrowed further to get what we're looking for. Unfortunately, Oracle raises an
ORA-30487 DISTINCT functions and RATIO_TO_REPORT cannot have an ORDER BY
error, so we can't use this.
After searching for the error, I found others attempting the same thing (here and here). Those threads contained two answers, one of which I used for my real-world data.
For reference, the answer for this question with the model in the original post would be something of the form:
SELECT action_date, COUNT(account_id) num_attempts, MAX(num_accounts) num_unique_accounts_wtd
FROM
(
    -- running total of first appearances within each week
    SELECT
        action_date
      , account_id
      , SUM(is_unique) OVER (PARTITION BY NEXT_DAY(action_date, 'Monday') - 7 ORDER BY action_date ASC, account_id ASC) num_accounts
    FROM
    (
        -- flag each account's first row of the week with 1, later rows with 0
        SELECT
            action_date
          , account_id
          , CASE
                WHEN LAG(account_id) OVER (PARTITION BY NEXT_DAY(action_date, 'Monday') - 7, account_id ORDER BY action_date ASC) = account_id
                THEN 0
                ELSE 1
            END is_unique
        FROM
            actions
    )
)
GROUP BY action_date
So the query works in three steps:
- the innermost query walks the rows and flags, for each account within each week, whether the row is that account's first appearance (is_unique = 1) or not (0)
- the middle query orders each week's rows by action date, then account_id, and builds a running total of the flags
- the outer query groups by action date and takes the maximum week-to-date number
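The three steps above can also be traced outside the database. The following Python sketch mirrors them on in-memory rows; it is an illustration of the logic, not a replacement for the SQL. The helper names week_start and week_to_date are hypothetical, and week_start assumes Monday-based weeks, matching NEXT_DAY(action_date, 'Monday') - 7 for dates without a time component.

```python
from datetime import date, timedelta

def week_start(d):
    """Monday on or before d (same role as NEXT_DAY(d, 'Monday') - 7)."""
    return d - timedelta(days=d.weekday())

def week_to_date(actions):
    """Return {action_date: (num_events, num_unique_accounts_wtd)}."""
    # Step 1: flag the first row of each (week, account) pair with 1.
    seen = set()
    flagged = []
    for d, acct in sorted(actions):
        key = (week_start(d), acct)
        flagged.append((d, acct, 0 if key in seen else 1))
        seen.add(key)

    # Steps 2 and 3: keep a running total of the flags per week and
    # record the last (largest) value reached on each date.
    running = {}
    result = {}
    for d, acct, is_unique in flagged:
        wk = week_start(d)
        running[wk] = running.get(wk, 0) + is_unique
        events, _ = result.get(d, (0, 0))
        result[d] = (events + 1, running[wk])
    return result

sample = [
    (date(2010, 1, 1), 123),
    (date(2010, 1, 1), 123),
    (date(2010, 1, 1), 456),
    (date(2010, 1, 2), 123),
    (date(2010, 1, 3), 789),
]
for day, (events, wtd) in sorted(week_to_date(sample).items()):
    print(day, events, wtd)
```

Because the running total never decreases within a week, the last value recorded for a date equals the MAX taken in the outer query.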
OTHER TIPS
I would split the events query in 2:
WITH
events1 AS
(
    SELECT
        NEXT_DAY(action_date, 1) - 7 week
      , action_date
      , COUNT(account_id) num_events
      , COUNT(DISTINCT account_id) num_unique_accounts
    FROM actions
    GROUP BY action_date
),
events2 AS
(
    SELECT NEXT_DAY(action_date, 1) - 7 week
      , COUNT(DISTINCT account_id) num_unique_accounts_wtd
    FROM actions
    GROUP BY NEXT_DAY(action_date, 1)
)
SELECT events1.*, events2.num_unique_accounts_wtd
FROM events1, events2
WHERE events1.week = events2.week
Here events1 selects the number of distinct accounts per day, while events2 selects the number of distinct accounts per week.
EDIT: I understand the request now. The only idea I have, though, will be quite heavy if the actions table has a very large number of rows:
WITH
events AS
(
    SELECT
        NEXT_DAY(action_date, 1) - 7 week
      , action_date
      , COUNT(account_id) num_events
      , COUNT(DISTINCT account_id) num_unique_accounts
    FROM actions
    GROUP BY action_date
)
SELECT events.*,
    -- recount distinct accounts from the start of the week through this row's date
    (SELECT COUNT(DISTINCT(account_id))
     FROM actions
     WHERE action_date >= events.week
       AND action_date <= events.action_date) as num_unique_accounts_wtd
FROM events
ORDER BY events.action_date
As you see, the idea is to (re)count all the distinct account_id for each row of the events subquery.
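The recount idea, and why it gets heavy, can be sketched in Python: every output date triggers a fresh scan of all rows. This sketch bounds the scan to rows from the week's Monday through the date itself, which is an assumption about the intended week-to-date result; the function name recount_wtd is hypothetical.

```python
from datetime import date, timedelta

def recount_wtd(actions):
    """For each date, rescan all rows and count the distinct accounts
    seen from that date's Monday through the date itself."""
    out = {}
    for day in sorted({d for d, _ in actions}):
        monday = day - timedelta(days=day.weekday())
        out[day] = len({a for d, a in actions if monday <= d <= day})
    return out

sample = [
    (date(2010, 1, 1), 123),
    (date(2010, 1, 1), 123),
    (date(2010, 1, 1), 456),
    (date(2010, 1, 2), 123),
    (date(2010, 1, 3), 789),
]
print(recount_wtd(sample))
```

With D distinct dates and N rows this does roughly D × N comparisons, which is the cost the answer warns about for large tables.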