Grouping values into buckets
-
18-02-2021 - |
Question
I'm working a query and I wasn't sure if my current approach is the "best" approach. I got a table filled with values and a table with buckets that I use to group those values.
There are 2 types of buckets:
- A priority bucket, if items apply to the conditions they should fall in this bucket and be excluded for any following bucket.
- A simple range bucket, the values should fall within a range.
I set up a simplified version that can be used to test.
the [Filter]
column in #tempBUcket
here is used for the priority buckets
The table definitions
create table #TempData([Id] int, [value] int);
create table #TempBucket([Id] int, [FromVal] int,[ToVal] int,[Filter] int);
Some test data
INSERT INTO #TempData (Id,value)
VALUES (1,1),(2,2),(3,3),(4,4),(5,5),
(6,6),(7,7),(8,8),(9,9),(10,1),
(11,2),(12,3),(13,4),(14,5),(15,6),
(16,7),(17,8),(18,9),(19,10),(20,11);
INSERT INTO #TempBucket(Id,FromVal,ToVal,Filter)
VALUES (1,0,3,0),(2,3,6,0),
(3,6,9,0),(4,9,12,0),
(5,NULL,NULL,1),(6,NULL,NULL,2);
Right now I have this query which works and gives the results I want but I feel this could be done better and more efficiently.
WITH filteredData as
(SELECT tv.*,COALESCE(tb1.Id,tb2.Id) as BucketId FROM #TempData tv
LEFT JOIN #TempBucket tb1 --prio bucket join
ON tb1.Filter IS NOT NULL AND tb1.Filter = tv.Value
LEFT JOIN #TempBucket tb2 --range bucket join
ON tb2.Fromval < tv.value AND tv.value <= tb2.ToVal
)
SELECT Count(fd.value) as count ,BucketId FROM filteredData fd GROUP BY BucketId;
Solution
My solution does only 1 scan
of #TempBucket
but it adds an additional sort
, I'm not sure it's more efficient, you should test it on volumes of data:
;with cte as
(
select t.*, tb.id as BucketId,
row_number() over(partition by t.id order by tb.FromVal) as rn
from #TempData t
join #TempBucket tb
on (t.value = tb.filter)
or (tb.Fromval < t.value AND t.value <= tb.ToVal)
)
select count(id) as cnt, BucketId
from cte
where rn = 1
group by BucketId;
To give a priority to the row in priority bucket
it sorts by tb.FromVal
which is null in case of priority bucket, alternatively you can sort on tb.Filter desc
, I just don't know the exact criteria of priority bucket
whether the FromVal,ToVal
are null
or Filter = 0