Question

I'm attempting to count the number of items if they are not within 30 seconds of the first item in a "group". I'm having a hard time figuring this out.

So, I have this table:

WITH ADates AS (
    SELECT
          Id
        , SharedId
        , TheDateTime
    FROM (VALUES
        (CAST(1 AS int), CAST(1 AS int), CAST('2019-01-01 01:01:00.00' AS datetime2(7))),
        (2, 1, '2019-01-01 01:01:33.00'),
        (3, 1, '2019-01-01 01:02:00.00'),
        (4, 1, '2019-01-01 01:02:01.00'),
        (5, 1, '2019-01-01 01:02:04.00'),
        (6, 1, '2019-01-01 01:06:15.00'),
        (7, 2, '2019-01-01 01:06:00.00'),
        (8, 2, '2019-01-01 01:06:45.00'),
        (9, 1, '2019-01-01 01:02:31.00'),
        (10, 2, '2019-01-01 01:06:05.00'),
        (11, 2, '2019-01-01 01:06:46.00'),
    ) X (Id, SharedId, TheDateTime)
)

So, the expected result that I'm looking for is:

+==========+=======+
| SharedId | Count |
+==========+=======+
| 1        | 4     |
+----------+-------+
| 2        | 2     |
+----------+-------+

The numbers are determined by:

  1. Count since first in new group.
  2. Not within 30 seconds of previous group so it is a new group and count.
  3. Don't count since it is within 30 seconds of 2.
  4. Don't count since it is within 30 seconds of 2.
  5. Count since not within 30 seconds of previous group (Item 2).
  6. Count since not within 30 seconds of previous group (item 2).
  7. Count in new group for SharedId.
  8. Count since not within previous grouping.

I'm thinking I should be doing a Window Function for this. Just not sure how to have it rely on just the first of the group.

Was it helpful?

Solution

Problem Categorization

I was looking for a way to use an analytical function that keeps track of inline manipulations. A single run analytical function is only able to perform so much, but not to the extend to solve this problem. The problem with nesting analytical functions is that we loose information about our dynamic pattern.

To allow dynamic inline pattern matching, in Oracle you can use MATCH_RECOGNIZE. I had no clue how to do it in Sql Server though. Then I came across a similar problem, which got resolved using a recursive CTE.


Proposed Solution

Fiddle with solution

- SharedId    GroupStartDateTime
- 1           01/01/2019 01:01:00
- 1           01/01/2019 01:01:33
- 1           01/01/2019 01:02:04
- 1           01/01/2019 01:06:15
- 2           01/01/2019 01:06:00
- 2           01/01/2019 01:06:45 
6 rows

CteBase and CteRecursive are heavely inspired by Bogdan Sahlean's answer on this related question.

WITH CteBase
AS
(
        SELECT  v.SharedId,
                v.TheDateTime,
                ROW_NUMBER() OVER(PARTITION BY v.SharedId ORDER BY v.TheDateTime) 
                       AS RowNum
        FROM    ADates v
),  CteRecursive
AS
(
        SELECT  crt.SharedId,
                crt.TheDateTime,
                crt.TheDateTime AS GroupStartDateTime,
                crt.RowNum,
                1 AS SharedIdRowNum
        FROM    CteBase crt
        WHERE   crt.RowNum = 1
        UNION ALL
        SELECT  crt.SharedId,
                crt.TheDateTime,
                CASE 
                    WHEN DATEDIFF(SECOND, prv.GroupStartDateTime, crt.TheDateTime) <= 30 
                    THEN prv.GroupStartDateTime 
                    ELSE crt.TheDateTime 
                END,
                crt.RowNum,
                CASE 
                    WHEN DATEDIFF(SECOND, prv.GroupStartDateTime, crt.TheDateTime) <= 30 
                    THEN prv.SharedIdRowNum + 1
                    ELSE 1
                END             
        FROM    CteBase crt
        INNER JOIN CteRecursive prv ON crt.SharedId = prv.SharedId 
        AND     crt.RowNum = prv.RowNum + 1
)
SELECT SharedId, Count(*) as [COUNT] FROM (
        SELECT  r.SharedId,
                r.GroupStartDateTime
        FROM    CteRecursive r
        WHERE   r.SharedIdRowNum = 1
) X
GROUP BY SharedId;

OTHER TIPS

It has something to do with item 6 being after item 7 which is in another group. Because of that it is seen as a new count in my query, resulting in 5. Is this time with id=6 correct? can it be after id=7?

The query has gotten so complex that I cannot explain, or it too late already (🍺), which is not good, but …. 😊

The next one seems to produce the correct results (with the 9 records):

WITH ADates AS (
    SELECT
          Id
        , SharedId
        , TheDateTime
    FROM (VALUES
        (CAST(1 AS int), CAST(1 AS int), CAST('2019-01-01 01:01:00.00' AS datetime2(7))),
        (2, 1, '2019-01-01 01:01:33.00'),
        (3, 1, '2019-01-01 01:02:00.00'),
        (4, 1, '2019-01-01 01:02:01.00'),
        (5, 1, '2019-01-01 01:02:04.00'),
        (6, 1, '2019-01-01 01:06:15.00'),
        (7, 2, '2019-01-01 01:06:00.00'),
        (9, 1, '2019-01-01 01:02:31.00'),
        (8, 2, '2019-01-01 01:06:45.00')
    ) X (Id, SharedId, TheDateTime)
),
TMPADates AS (
SELECT 
    Id, 
    SharedId, 
    TheDateTime, 
    --DATEADD(S, 30, TheDateTime ) TheDateTime30, 
    ISNULL((SELECT MIN(TheDateTime) 
            FROM ADates t2 
            WHERE t2.TheDateTime BETWEEN DATEADD(s,-30,ADates.TheDateTime) AND ADates.TheDateTime
              and ADates.SharedId=t2.SharedId),TheDateTime) ingroup
FROM ADates
),
TMPAdates2 AS (
SELECT 
    Id, 
    SharedId, 
    TheDateTime, 
    ingroup,
    -- ISNULL(LAG(ingroup) OVER (ORDER BY SharedId,ingroup DESC),ingroup) as ingroup2
    ISNULL(LAG(ingroup) OVER (PARTITION BY SharedId ORDER BY SharedId,ingroup DESC),ingroup) as ingroup2
FROM TMPADates 
)
SELECT 
    SharedId,  COUNT(DISTINCT ingroup2) As Count
FROM TMPADates2
GROUP BY  SharedId

What about:

    WITH ADates AS (
    SELECT
          Id
        , SharedId
        , TheDateTime
    FROM (VALUES
        (CAST(1 AS int), CAST(1 AS int), CAST('2019-01-01 01:01:00.00' AS datetime2(7))),
        (2, 1, '2019-01-01 01:01:33.00'),
        (3, 1, '2019-01-01 01:02:00.00'),
        (4, 1, '2019-01-01 01:02:01.00'),
        (5, 1, '2019-01-01 01:02:04.00'),
        (6, 1, '2019-01-01 01:06:15.00'),
        (7, 2, '2019-01-01 01:06:00.00'),
        (8, 2, '2019-01-01 01:06:45.00')
    ) X (Id, SharedId, TheDateTime)
),
TMPADates AS (
SELECT 
    Id, 
    SharedId, 
    TheDateTime, 
    DATEADD(S, 30, TheDateTime ) TheDateTime30, 
    ISNULL((SELECT MIN(TheDateTime) FROM ADates t2 WHERE t2.TheDateTime BETWEEN DATEADD(s,-30,ADates.TheDateTime) AND ADates.TheDateTime),TheDateTime) ingroup
FROM ADates
)
SELECT 
    SharedId,  COUNT(DISTINCT ingroup) AS Count
FROM TMPADates
GROUP BY  SharedId
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top