Question

I was inspired by the good answers from my previous question about SQL. Now this SQL is run on a DB with Interbase 2009. It is about 21 GB in size.

SELECT DistanceAsMeters, AddrDistance.Bold_Id, AddrDistance.Created, AddressFrom.CityName_CO as FromCity, AddressTo.CityName_CO as ToCity
FROM AddrDistance
LEFT JOIN Address AddressFrom ON AddrDistance.FromAddress = AddressFrom.Bold_Id
LEFT JOIN Address AddressTo ON AddrDistance.ToAddress = AddressTo.Bold_Id
Where  DistanceAsMeters = 0 and PseudoDistanceAsCostKm = 0
       and not AddrDistance.bold_id in (select bold_id from DistanceQueryTask)
Order By Created Desc

There are 840000 rows with AddrDistance 190000 rows with Address and 4 with DistanceQueryTask.

The question is, can this be done faster? I guess, the same query is run many times select bold_id from DistanceQueryTask. Note that I'm not interested in stored procedures, just plain SQL :)

EDIT1 Here is the current execution plan:

Statement: SELECT DistanceAsMeters, AddrDistance.Bold_Id, AddrDistance.Created, AddressFrom.CityName_CO as FromCity, AddressTo.CityName_CO as ToCity
FROM AddrDistance
LEFT JOIN Address AddressFrom ON AddrDistance.FromAddress = AddressFrom.Bold_Id
LEFT JOIN Address AddressTo ON AddrDistance.ToAddress = AddressTo.Bold_Id
Where  DistanceAsMeters = 0 and PseudoDistanceAsCostKm = 0
       and not AddrDistance.bold_id in (select bold_id from DistanceQueryTask)
Order By Created Desc

PLAN (DISTANCEQUERYTASK INDEX (RDB$PRIMARY218))
PLAN SORT (JOIN (JOIN (ADDRDISTANCE NATURAL,ADDRESSFROM INDEX (RDB$PRIMARY234)),ADDRESSTO INDEX (RDB$PRIMARY234)))

And yes, DistanceQueryTask is meant to have a low number if rows in the database.

Was it helpful?

Solution

Using Left Join and subqueries will slow down any query.

You can get some improvements with the correct indexes (on Bold_id, DistanceMeters, PseudoDistanceAsCostKm ) remember that more indexes increase the size of the database

OTHER TIPS

I suppose bold_id is your key, and thus properly indexed.
Then replacing the subselect and the not...in by a join might help the optimizer.

SELECT DistanceAsMeters, Bold_Id, Created, AddressFrom.CityName_CO as FromCity, AddressTo.CityName_CO as ToCity
FROM AddrDistance
LEFT JOIN Address AddressFrom ON AddrDistance.FromAddress = AddressFrom.Bold_Id
LEFT JOIN Address AddressTo ON AddrDistance.ToAddress = AddressTo.Bold_Id
LEFT JOIN DistanceQueryTask ON AddrDistance.bold_id = DistanceQueryTask.bold_id
Where  DistanceAsMeters = 0 and PseudoDistanceAsCostKm = 0
  and DistanceQueryTask.bold_id is null
Order By Created Desc

Create an index for this part: (DistanceAsMeters = 0 and PseudoDistanceAsCostKm = 0) because it does a (bad) table scan for it: ADDRDISTANCE NATURAL

And try to use the join instead of subselect as stated by Francois.

As Daniel and Andre sugges an index helps a lot.
I would suggest this index (DistanceMeters, PseudoDistanceAsCostKm, Bold_id), because the first 2 parts of the index is constant, then its a smal portion of the index that is needed to read.

If it is a fact that FromAddress and/or ToAddress exist you can change the LEFT JOIN to INNER JOIN, because it is often faster (the query optimizer can make some assumptions).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top