Optimize SQL with Interbase
Question
I was inspired by the good answers from my previous question about SQL. Now this SQL is run on a DB with Interbase 2009. It is about 21 GB in size.
SELECT DistanceAsMeters, AddrDistance.Bold_Id, AddrDistance.Created, AddressFrom.CityName_CO as FromCity, AddressTo.CityName_CO as ToCity
FROM AddrDistance
LEFT JOIN Address AddressFrom ON AddrDistance.FromAddress = AddressFrom.Bold_Id
LEFT JOIN Address AddressTo ON AddrDistance.ToAddress = AddressTo.Bold_Id
Where DistanceAsMeters = 0 and PseudoDistanceAsCostKm = 0
and not AddrDistance.bold_id in (select bold_id from DistanceQueryTask)
Order By Created Desc
There are 840000 rows with AddrDistance 190000 rows with Address and 4 with DistanceQueryTask.
The question is, can this be done faster? I guess, the same query is run many times select bold_id from DistanceQueryTask. Note that I'm not interested in stored procedures, just plain SQL :)
EDIT1 Here is the current execution plan:
Statement: SELECT DistanceAsMeters, AddrDistance.Bold_Id, AddrDistance.Created, AddressFrom.CityName_CO as FromCity, AddressTo.CityName_CO as ToCity
FROM AddrDistance
LEFT JOIN Address AddressFrom ON AddrDistance.FromAddress = AddressFrom.Bold_Id
LEFT JOIN Address AddressTo ON AddrDistance.ToAddress = AddressTo.Bold_Id
Where DistanceAsMeters = 0 and PseudoDistanceAsCostKm = 0
and not AddrDistance.bold_id in (select bold_id from DistanceQueryTask)
Order By Created Desc
PLAN (DISTANCEQUERYTASK INDEX (RDB$PRIMARY218))
PLAN SORT (JOIN (JOIN (ADDRDISTANCE NATURAL,ADDRESSFROM INDEX (RDB$PRIMARY234)),ADDRESSTO INDEX (RDB$PRIMARY234)))
And yes, DistanceQueryTask is meant to have a low number if rows in the database.
Solution
Using Left Join and subqueries will slow down any query.
You can get some improvements with the correct indexes (on Bold_id, DistanceMeters, PseudoDistanceAsCostKm ) remember that more indexes increase the size of the database
OTHER TIPS
I suppose bold_id is your key, and thus properly indexed.
Then replacing the subselect and the not...in by a join might help the optimizer.
SELECT DistanceAsMeters, Bold_Id, Created, AddressFrom.CityName_CO as FromCity, AddressTo.CityName_CO as ToCity
FROM AddrDistance
LEFT JOIN Address AddressFrom ON AddrDistance.FromAddress = AddressFrom.Bold_Id
LEFT JOIN Address AddressTo ON AddrDistance.ToAddress = AddressTo.Bold_Id
LEFT JOIN DistanceQueryTask ON AddrDistance.bold_id = DistanceQueryTask.bold_id
Where DistanceAsMeters = 0 and PseudoDistanceAsCostKm = 0
and DistanceQueryTask.bold_id is null
Order By Created Desc
Create an index for this part: (DistanceAsMeters = 0 and PseudoDistanceAsCostKm = 0) because it does a (bad) table scan for it: ADDRDISTANCE NATURAL
And try to use the join instead of subselect as stated by Francois.
As Daniel and Andre sugges an index helps a lot.
I would suggest this index (DistanceMeters, PseudoDistanceAsCostKm, Bold_id), because the first 2 parts of the index is constant, then its a smal portion of the index that is needed to read.
If it is a fact that FromAddress and/or ToAddress exist you can change the LEFT JOIN to INNER JOIN, because it is often faster (the query optimizer can make some assumptions).