Join two Oracle tables: date to non-overlapping date ranges
12-06-2021
Question
I have two tables:
- trips: id_trip, id_object, trip_date, delta (8980026 rows)
- ranges: id_range, id_object, date_since, date_until (18490 rows)
I need to optimize the following select statement:
select r.id_range, sum(t.delta) sum_deltas
from trips t,
ranges r
where t.id_object = r.id_object
and t.trip_date between r.date_since and r.date_until
group by r.id_range
- given the join condition, there is always exactly one matching row in ranges for each trip
- the trips table is constantly growing, but there are no updates or deletions
- the ranges table may change from time to time in any way (deletions, updates, inserts), so a function-based index is not an option :(
- there are indexes on id_object (in both tables) and date_since (in trips)
Does anyone have an idea how to speed things up? Is it even possible?
Solution
It's always possible to speed things up; it just may not be worth the time / effort / money / disk-space / additional overheads etc.
Firstly, please use the explicit join syntax. It's been the SQL standard for a few decades now and it helps avoid a lot of potential errors. Your query would become:
select r.id_range, sum(t.delta) sum_deltas
from trips t
join ranges r
on t.id_object = r.id_object
and t.trip_date between r.date_since and r.date_until
group by r.id_range
This query implies that you need two indexes, unique if possible. On ranges you should have an index on id_object, date_since, date_until. The index on trips would be id_object, trip_date. If trips were smaller I might consider adding delta on to the end of that index so that you never enter the table at all but only do an index scan. As it stands you're going to have to do a table access by index rowid.
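As a minimal sketch, the two indexes described here could be created roughly as follows. The index names are made up for illustration, and whether either index can be declared unique depends on your data.

```sql
-- Index names are hypothetical; pick names that fit your schema.

-- ranges: object first, then the date bounds used by the BETWEEN predicate.
CREATE INDEX ranges_obj_dates_ix
    ON ranges (id_object, date_since, date_until);

-- trips: object first, then the trip date that is matched against the range.
CREATE INDEX trips_obj_date_ix
    ON trips (id_object, trip_date);

-- Covering variant for trips (an alternative to the index above, not an addition):
-- with delta appended, the query can be answered from the index alone,
-- at the cost of a larger index on a table of roughly 9 million rows.
-- CREATE INDEX trips_obj_date_delta_ix
--     ON trips (id_object, trip_date, delta);
```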
Having written all that, your problem may be slightly different. You're going to be full-scanning both tables with this query. Your problem might be the indexes. If the optimizer is using the indexes then it's possible you're doing an index unique/range scan for each id_object in trips or ranges and then, because of the use of columns not in the indexes, you will be doing a table access by index rowid. This can be massively expensive.
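Before changing anything, it may help to check which access path the optimizer actually picks. A minimal sketch using Oracle's EXPLAIN PLAN and DBMS_XPLAN.DISPLAY; the plan you see will depend on your version, statistics and existing indexes.

```sql
-- Ask the optimizer for its plan without running the query.
EXPLAIN PLAN FOR
select r.id_range, sum(t.delta) sum_deltas
  from trips t
  join ranges r
    on t.id_object = r.id_object
   and t.trip_date between r.date_since and r.date_until
 group by r.id_range;

-- Display the plan that was just explained.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

In the output, TABLE ACCESS BY INDEX ROWID steps on trips would point at the expensive case described above, whereas TABLE ACCESS FULL steps feeding a HASH JOIN would mean the tables are already being full-scanned.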
Try adding a hint to force a full-scan of both tables:
select /*+ full(t) full(r) */ r.id_range, sum(t.delta) sum_deltas
from trips t
join ranges r
on t.id_object = r.id_object
and t.trip_date between r.date_since and r.date_until
group by r.id_range
Other tips
You may want to look at your data segmentation (i.e. partitioning your data by certain dates, so that the query only hits the appropriate partitions) and your indices; these could probably speed up the querying process.
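As a rough illustration of date-based partitioning, trips could be range-partitioned on trip_date using interval partitioning. This is only a sketch: the table name, column types and starting boundary are assumptions, and partition pruning only helps when the query restricts trip_date to a subset of dates, which the statement in the question does not do.

```sql
-- Hypothetical partitioned copy of trips; column types are assumed.
CREATE TABLE trips_part (
    id_trip    NUMBER,
    id_object  NUMBER,
    trip_date  DATE,
    delta      NUMBER
)
PARTITION BY RANGE (trip_date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))   -- one partition per month, created on demand
(
    -- Assumed lower boundary; rows dated on or after it get monthly partitions.
    PARTITION p_initial VALUES LESS THAN (DATE '2015-01-01')
);
```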
Also, you could consider a data warehouse... You say Trips never gets updated or deleted so it is an ideal candidate for denormalization into a data structure more suited to report generation and ad-hoc queries.
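A full data warehouse is a larger design exercise, but one lightweight form of the same idea is to precompute the aggregate in a materialized view and refresh it when ranges changes. The sketch below assumes that slightly stale results between refreshes are acceptable; the view name is hypothetical.

```sql
-- Hypothetical materialized view holding the precomputed sums.
CREATE MATERIALIZED VIEW range_delta_sums
    BUILD IMMEDIATE
    REFRESH COMPLETE ON DEMAND
AS
select r.id_range, sum(t.delta) sum_deltas
  from trips t
  join ranges r
    on t.id_object = r.id_object
   and t.trip_date between r.date_since and r.date_until
 group by r.id_range;

-- Recompute the stored result, e.g. from a scheduled job ('C' = complete refresh).
BEGIN
    DBMS_MVIEW.REFRESH('RANGE_DELTA_SUMS', 'C');
END;
/
```

Reports would then read from range_delta_sums instead of running the join each time; the expensive query still runs on every refresh, so this only pays off if reads are much more frequent than refreshes.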