how to get num uniques week to date but having the unique period roll with date

https://stackoverflow.com/questions/12380514

01-07-2021

Question

Very simplified, a table with some sample data:

action_date account_id
1/1/2010    123
1/1/2010    123
1/1/2010    456
1/2/2010    123
1/3/2010    789

For the data above, I need a query that will give the following:

action_date num_events  num_unique_accounts  num_unique_accounts_wtd
1/1/2010    3           2                    2
1/2/2010    1           1                    2
1/3/2010    1           1                    3

As you can see, num_unique_accounts_wtd counts the distinct accounts from the start of the week through each action_date, so the end of the counting period rolls forward with the date.

At first, one would think a query of the form

WITH
    events AS
    (
        SELECT
            action_date
            , COUNT(account_id) num_events
            , COUNT(DISTINCT account_id) num_unique_accounts
        FROM     actions
        GROUP BY action_date
    )
SELECT
    action_date
    , num_events
    , num_unique_accounts
    , SUM(num_unique_accounts) OVER (PARTITION BY NEXT_DAY(action_date, 'Monday') - 7 ORDER BY action_date ASC) num_unique_accounts_wtd
FROM events

would work, but if you look closely it just adds up num_unique_accounts for each day. For example, for 1/2/2010 it would give num_unique_accounts_wtd = 3 (2 + 1), even though account 123 was already counted on 1/1/2010.
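
To make the overcount concrete, here is a minimal Python sketch (not Oracle SQL) that compares the running sum of daily uniques with a true week-to-date distinct count on the sample data. The `rows` list and the `week_start` helper are assumptions of mine: `week_start` stands in for NEXT_DAY(action_date, 'Monday') - 7 and assumes action_date carries no time component.

```python
from datetime import date, timedelta
from itertools import groupby

# Sample rows from the question: (action_date, account_id)
rows = [
    (date(2010, 1, 1), 123),
    (date(2010, 1, 1), 123),
    (date(2010, 1, 1), 456),
    (date(2010, 1, 2), 123),
    (date(2010, 1, 3), 789),
]

def week_start(d):
    """Monday of d's week; stands in for NEXT_DAY(d, 'Monday') - 7."""
    return d - timedelta(days=d.weekday())

def running_sum_of_daily_uniques(rows):
    """What the SUM(...) OVER query computes: daily uniques added up per week."""
    result, totals = {}, {}
    for day, grp in groupby(sorted(rows), key=lambda r: r[0]):
        daily_unique = len({acct for _, acct in grp})
        wk = week_start(day)
        totals[wk] = totals.get(wk, 0) + daily_unique
        result[day] = totals[wk]
    return result

def true_wtd_uniques(rows):
    """Distinct accounts seen from the week's Monday through each day."""
    result, seen = {}, {}
    for day, acct in sorted(rows):
        seen.setdefault(week_start(day), set()).add(acct)
        result[day] = len(seen[week_start(day)])
    return result

naive = running_sum_of_daily_uniques(rows)
wanted = true_wtd_uniques(rows)
for day in sorted(wanted):
    print(day, naive[day], wanted[day])
```

On the sample data the running sum yields 2, 3, 4 while the desired week-to-date count is 2, 2, 3, because account 123 is counted again on 1/2/2010.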

Any ideas?

EDIT: Added one more row of data and output for clarity

Solution 2

At first it seemed the answer would be to modify the analytic function to something of the form

COUNT(DISTINCT ...) OVER (PARTITION BY ... ORDER BY ... RANGE BETWEEN ... AND ...)

because RANGE BETWEEN accepts expressions, so the PARTITION BY window could be narrowed further to get what we're looking for. Unfortunately, Oracle gives a

ORA-30487 DISTINCT functions and RATIO_TO_REPORT cannot have an ORDER BY

error so we can't use this.

After googling the error, I found others attempting the same thing (here and here), and those threads contained two answers, one of which I used for my real-world data.

For reference, the answer for this question with the model in the original post would be something of the form:

SELECT    action_date, COUNT(account_id) num_attempts, MAX(num_accounts) num_unique_accounts_wtd
FROM
(
    SELECT
        action_date
        , account_id
        , SUM(is_unique) OVER (PARTITION BY NEXT_DAY(action_date, 'Monday') - 7 ORDER BY action_date ASC, account_id ASC) num_accounts
    FROM
    (
        SELECT
            action_date
            , account_id
            , CASE
                WHEN LAG(account_id) OVER (PARTITION BY NEXT_DAY(action_date, 'Monday') - 7, account_id ORDER BY action_date ASC) = account_id
                THEN 0
                ELSE 1
            END is_unique
            FROM
                actions
    )
)
GROUP BY  action_date

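So the query works in three steps:

1. The innermost query flags each row: within a week, the first row for an account gets is_unique = 1 and any later row for the same account gets 0.
2. The middle query orders each week's rows by action date, then account_id, and keeps a running total of is_unique.
3. The outer query groups by action date and takes the maximum running total, which is the week-to-date number for that day.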
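
For anyone who wants to check the logic outside the database, the same flag / running total / max-per-day idea can be sketched in Python. This is only an illustration under my own assumptions: the `rows` list mirrors the sample table, and the hypothetical `week_start` helper stands in for NEXT_DAY(action_date, 'Monday') - 7.

```python
from datetime import date, timedelta

# Sample rows from the question: (action_date, account_id)
rows = [
    (date(2010, 1, 1), 123),
    (date(2010, 1, 1), 123),
    (date(2010, 1, 1), 456),
    (date(2010, 1, 2), 123),
    (date(2010, 1, 3), 789),
]

def week_start(d):
    """Monday of d's week; stands in for NEXT_DAY(d, 'Monday') - 7."""
    return d - timedelta(days=d.weekday())

def wtd_unique_accounts(rows):
    ordered = sorted(rows)  # ORDER BY action_date, account_id

    # Step 1: is_unique = 1 for the first row of each (week, account), else 0.
    seen = set()
    flagged = []
    for day, acct in ordered:
        key = (week_start(day), acct)
        flagged.append((day, 0 if key in seen else 1))
        seen.add(key)

    # Step 2: running total of is_unique within each week.
    running = {}
    totals = []
    for day, is_unique in flagged:
        wk = week_start(day)
        running[wk] = running.get(wk, 0) + is_unique
        totals.append((day, running[wk]))

    # Step 3: GROUP BY action_date, keeping the row count and the max total.
    out = {}
    for day, num_accounts in totals:
        events, best = out.get(day, (0, 0))
        out[day] = (events + 1, max(best, num_accounts))
    return out

result = wtd_unique_accounts(rows)
for day in sorted(result):
    print(day, *result[day])  # action_date, num_events, num_unique_accounts_wtd
```

On the sample data this gives week-to-date counts of 2, 2 and 3, matching the expected output in the question.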

OTHER TIPS

I would split the events query in 2:

WITH
    events1 AS
    (
        SELECT 
               NEXT_DAY(action_date, 1) - 7 week
             , action_date             
             , COUNT(account_id) num_events
             , COUNT(DISTINCT account_id) num_unique_accounts
        FROM     actions
        GROUP BY action_date
    ),
    events2 AS
    (
        SELECT NEXT_DAY(action_date, 1) - 7 week               
             , COUNT(DISTINCT account_id) num_unique_accounts_wtd
        FROM     actions
        GROUP BY NEXT_DAY(action_date, 1)
    )
SELECT events1.*, events2.num_unique_accounts_wtd
  FROM events1, events2 
 WHERE events1.week = events2.week

Here events1 selects the number of distinct accounts per day, while events2 selects the number of distinct accounts per week.

EDIT: I understand the request now. The only idea I have, though, will be quite heavy if the actions table has a very large number of rows:

WITH
events AS
(
    SELECT 
           NEXT_DAY(action_date, 1) - 7 week
         , action_date             
         , COUNT(account_id) num_events
         , COUNT(DISTINCT account_id) num_unique_accounts
    FROM     actions
    GROUP BY action_date 
)      
SELECT events.*, 
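      (SELECT COUNT(DISTINCT account_id)
         FROM actions
        WHERE action_date >= events.week             -- from the start of this row's week
          AND action_date <= events.action_date      -- up to and including this row's date
      ) as num_unique_accounts_wtd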
 FROM events
ORDER BY events.action_date

As you can see, the idea is to (re)count the distinct account_id values for each row of the events subquery.
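
A rough Python sketch of this recount-per-row idea, to show both the result and why it gets heavy: for every distinct date, the whole row set is scanned again. The sketch counts accounts between the Monday of the row's week and the row's own date. The `rows` list, the extra 1/4/2010 row (added by me to show the count resetting in a new week) and the `week_start` helper are assumptions, not part of the original data.

```python
from datetime import date, timedelta

# Sample rows from the question, plus one hypothetical row in the next week.
rows = [
    (date(2010, 1, 1), 123),
    (date(2010, 1, 1), 123),
    (date(2010, 1, 1), 456),
    (date(2010, 1, 2), 123),
    (date(2010, 1, 3), 789),
    (date(2010, 1, 4), 123),  # assumed extra row: Monday of the following week
]

def week_start(d):
    """Monday of d's week; stands in for the week-start expression in the query."""
    return d - timedelta(days=d.weekday())

def wtd_by_recount(rows):
    """For each date, rescan all rows and count distinct accounts week to date."""
    out = {}
    for day in sorted({d for d, _ in rows}):
        wk = week_start(day)
        # Full scan per date: cost grows with (number of dates) x (number of rows).
        out[day] = len({acct for d, acct in rows if wk <= d <= day})
    return out

recount = wtd_by_recount(rows)
for day in sorted(recount):
    print(day, recount[day])
```

The counts come out as 2, 2, 3 for the first week and drop back to 1 on 1/4/2010, since that date starts a new week.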

Licensed under: CC-BY-SA with attribution
