Enhancing Performance

https://stackoverflow.com/questions/12262513

30-06-2021

Question

I'm not as clued up on shortcuts in SQL so I was hoping to utilize the brainpower on here to help speed up a query I'm using. I'm currently using Oracle 8i.

I have a query:

SELECT 
    NAME_CODE, ACTIVITY_CODE, GPS_CODE 
FROM
    (SELECT 
         a.NAME_CODE, b.ACTIVITY_CODE, a.GPS_CODE, 
         ROW_NUMBER() OVER (PARTITION BY a.GPS_DATE ORDER BY b.ACTIVITY_DATE DESC) AS RN
     FROM GPS_TABLE a, ACTIVITY_TABLE b
     WHERE a.NAME_CODE = b.NAME_CODE
       AND a.GPS_DATE >= b.ACTIVITY_DATE 
       AND TRUNC(a.GPS_DATE) > TRUNC(SYSDATE) - 2)
WHERE 
    RN = 1

and this takes about 7 minutes give or take 10 seconds to run.

The GPS_TABLE currently has 6,586,429 rows in 6 columns and keeps growing as new GPS coordinates are put into the system, by about 8,000 rows a day.

The ACTIVITY_TABLE currently has 1,989,093 rows in 31 columns and keeps growing as new activities are put into the system, by about 2,000 rows a day.

So all in all these are not small tables, and I understand that there will always be a time hit running this or similar queries. As you can see, I'm already limiting it to only the last 2 days' worth of data, but anything to speed it up would be appreciated.

Solution

Your strongest filter seems to be the one on the last 2 days of GPS_TABLE. It should reduce GPS_TABLE to about 15k rows. Therefore one of the best candidates for improvement is an index on the column GPS_DATE.

Your filter TRUNC(a.GPS_DATE) > TRUNC(SYSDATE) - 2 applies a function to the column, which keeps a plain index on GPS_DATE from being used. Because TRUNC(a.GPS_DATE) is always a midnight value, the filter is equivalent to a.GPS_DATE >= TRUNC(SYSDATE) - 1, so a simple index on the column will work if you change the query. If you can't change it, you could add a function-based index on TRUNC(GPS_DATE).
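As a rough sketch, the two index options could look like the statements below. The index names are made up for illustration; only the table and column names come from the question. Function-based indexes first appeared in Oracle 8i and typically also need the cost-based optimizer and query rewrite enabled, so check your edition and settings before relying on the second one.

```sql
-- Option 1: plain index, usable once the predicate compares GPS_DATE directly.
-- (gps_table_date_ix is a hypothetical name.)
CREATE INDEX gps_table_date_ix ON GPS_TABLE (GPS_DATE);

-- Option 2: function-based index, for when the query has to keep TRUNC(a.GPS_DATE).
-- (gps_table_trunc_date_ix is a hypothetical name.)
CREATE INDEX gps_table_trunc_date_ix ON GPS_TABLE (TRUNC(GPS_DATE));
```

You only need one of the two; which one depends on whether the query text can be changed.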

Once you have this index in place, we need to access the rows in ACTIVITY_TABLE. The problem with your join is that we will get all the old activities and therefore a good portion of the table. This means that the join as it is will not be efficient with index scans.

I suggest you define an index on ACTIVITY_TABLE(name_code, activity_date DESC) and a PL/SQL function that will retrieve the last activity in the least amount of work using this index specifically:

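CREATE OR REPLACE FUNCTION get_last_activity (p_name_code VARCHAR2,
                                              p_gps_date DATE)
RETURN ACTIVITY_TABLE.activity_code%type IS
   l_result ACTIVITY_TABLE.activity_code%type;
BEGIN
   -- The inner query is meant to read the (name_code, activity_date DESC)
   -- index starting at the newest qualifying activity; ROWNUM = 1 stops
   -- after the first row.
   SELECT activity_code
     INTO l_result
     FROM (SELECT activity_code
             FROM activity_table
            WHERE name_code = p_name_code
              AND activity_date <= p_gps_date
            ORDER BY activity_date DESC)
    WHERE ROWNUM = 1;
   RETURN l_result;
EXCEPTION
   -- No activity on or before the GPS date: return NULL explicitly.
   WHEN NO_DATA_FOUND THEN
      RETURN NULL;
END;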
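The index this function relies on could be created along these lines. The index name is hypothetical. Oracle treats descending index columns like function-based ones, so the optimizer generally needs statistics before it will consider the index; the ANALYZE statements are one way to gather them on a release this old, and on tables this size you may prefer ESTIMATE STATISTICS.

```sql
-- Hypothetical index name; columns as suggested in the answer.
CREATE INDEX activity_name_date_ix
    ON ACTIVITY_TABLE (NAME_CODE, ACTIVITY_DATE DESC);

-- Gather statistics so the cost-based optimizer can take the index into account.
ANALYZE TABLE ACTIVITY_TABLE COMPUTE STATISTICS;
ANALYZE INDEX activity_name_date_ix COMPUTE STATISTICS;
```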

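Modify your query to use this function. Note that a GPS row with no earlier activity now comes back with a NULL activity code instead of being dropped by the join: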

SELECT a.NAME_CODE,
       a.GPS_CODE,
       get_last_activity(a.name_code, a.gps_date) AS ACTIVITY_CODE
  FROM GPS_TABLE a
 WHERE trunc(a.GPS_DATE) > trunc(sysdate) - 2
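If you are able to drop the TRUNC on the column, a version of the same query that a plain index on GPS_DATE can serve might look like this. It assumes the get_last_activity function defined earlier exists; the ACTIVITY_CODE alias is only there for readability. Since TRUNC(a.GPS_DATE) only ever takes whole-day values, "later than midnight two days ago" works out to "from midnight yesterday onwards", hence the >= with - 1.

```sql
SELECT a.NAME_CODE,
       a.GPS_CODE,
       get_last_activity(a.NAME_CODE, a.GPS_DATE) AS ACTIVITY_CODE
  FROM GPS_TABLE a
 WHERE a.GPS_DATE >= TRUNC(SYSDATE) - 1;  -- same rows as TRUNC(a.GPS_DATE) > TRUNC(SYSDATE) - 2
```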

OTHER TIPS

Optimising an SQL query generally comes down to two things:

- Adding indexes
- Trying a different way to get the same information

So, start by adding an index on ACTIVITY_DATE, and perhaps on other columns that are used in the conditions.
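To see whether a new index is actually picked up, you can compare execution plans before and after creating it. The sketch below assumes a PLAN_TABLE already exists in your schema (Oracle ships the utlxplan.sql script to create one) and uses a made-up statement id; the inner query is a cut-down version of the join from the question.

```sql
-- 'gps_q' is an arbitrary, hypothetical statement id.
EXPLAIN PLAN SET STATEMENT_ID = 'gps_q' FOR
SELECT a.NAME_CODE, b.ACTIVITY_CODE, a.GPS_CODE
  FROM GPS_TABLE a, ACTIVITY_TABLE b
 WHERE a.NAME_CODE = b.NAME_CODE
   AND a.GPS_DATE >= b.ACTIVITY_DATE
   AND TRUNC(a.GPS_DATE) > TRUNC(SYSDATE) - 2;

-- Print the plan as an indented tree; look for INDEX operations
-- versus TABLE ACCESS FULL on the two tables.
SELECT LPAD(' ', 2 * (LEVEL - 1)) || operation || ' ' || options
       || ' ' || object_name AS plan_step
  FROM plan_table
 START WITH id = 0 AND statement_id = 'gps_q'
CONNECT BY PRIOR id = parent_id AND statement_id = 'gps_q';
```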

Licensed under: CC-BY-SA with attribution
