Question

My task is to optimize a query in Oracle SQL where a table is joined with itself using conditions concerning parsed fragments of varchar data in one of its columns. To my understanding, Oracle won't use indexes, since the column names in the ON clause occur only as arguments of functions, and the query takes nearly forever to complete. Creating a table with processed REF data (see below) would solve the problem, but this is out of the question for other reasons.

I've prepared a simplified version of the problem for illustration (I am fairly sure this is the crux, so I extracted the relevant part of a much more complicated query). The 'transactions' table has the following columns:

  • TRAN -- a 10-digit number being the code of transaction,
  • STORE -- the code of the store in which the transaction was made,
  • DATE -- the transaction's date,
  • REF -- reference code to a different transaction (in case of returns etc.). This code has the form: [store code] * [last two digits of the transaction's year] * [last 7 digits of TRAN without leading zeroes], so it can look like this: '142*09*3234'. Basically, REF points to some other row in the table, but has to be processed before it can be used.

    SELECT *
    FROM transactions t1
        JOIN transactions t2
        ON ( t2.store = substr(t1.REF, 1, instr(t1.REF, '*') - 1)
            AND to_char(t2.DATE, 'yy') = substr(t1.REF, instr(t1.REF, '*', 1, 1) + 1, instr(t1.REF, '*', 1, 2) - instr(t1.REF, '*', 1, 1) - 1)
            AND to_number(substr(to_char(t2.TRAN), -7)) = to_number(substr(t1.REF, instr(t1.REF, '*', 1, 2) + 1))
           )
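
As a quick sanity check of the parsing expressions (a sketch, not part of the original query), the three fragments can be pulled out of the sample value '142*09*3234' by selecting from dual. Note that the third argument of substr is a length, so the year fragment needs the distance between the two asterisks rather than the position of the second one. The alias ref_code is only an illustrative name.

```sql
-- Split the sample reference code into its three fragments.
SELECT substr(ref_code, 1, instr(ref_code, '*') - 1) AS store_part,
       substr(ref_code,
              instr(ref_code, '*', 1, 1) + 1,
              instr(ref_code, '*', 1, 2) - instr(ref_code, '*', 1, 1) - 1) AS year_part,
       to_number(substr(ref_code, instr(ref_code, '*', 1, 2) + 1)) AS tran_part
FROM (SELECT '142*09*3234' AS ref_code FROM dual);
```

For the sample value this yields '142', '09' and 3234.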
    

I have no experience dealing with SQL optimization, so I'd appreciate any advice that points me in the right direction.

Solution 2

Basically, you're screwed because of a bad design, welcome to my world =)

Anyway, when I look at it, you only have 2 fields you can use to JOIN the transaction back to its referenced transaction: STORE and DATE (although the latter is rather 'rough'). As they decided to store only the last 7 digits of the transaction, the only way to speed this up would be to add a new field that stores those 7 digits. But if you go down that road, it would make much more sense to simply store the entire REF syntax in a new (calculated) field.

It most certainly would be the least painful solution to this problem as you would be able to add an index on said field and then change your query to

SELECT *
  FROM transactions t1
  JOIN transactions t2
    ON t2.TRAN_AS_REF = t1.REF

However, from what I understand, you're not able/allowed to add an extra (calculated) field to the table?! Adding another table that holds this information, so you can use it for the join, would also be a solution, although it adds some complexity to make sure the data is always up to date! But, anyway, from what I understand this is not an option here. Another work-around might be to create a view that returns TRAN and its REF-equivalent. You could materialize said view and use that as a link in the join. This, just like the calculated field approach, would have the benefit of staying up to date without extra logic or changes to the logic already in place, provided the materialized view is refreshed on commit.
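
A minimal sketch of the materialized view idea, under some loud assumptions: tran_date stands in for the question's DATE column (DATE is a reserved word in Oracle, so the real column presumably has another name), TRAN is unique and indexed, and the names tran_ref_mv and tran_ref_mv_ix are made up for illustration. The view here is refreshed on demand; refreshing on commit is an alternative but usually needs materialized view logs.

```sql
-- Precompute the REF-style code for every transaction.
CREATE MATERIALIZED VIEW tran_ref_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT t.TRAN,
       t.STORE || '*' || to_char(t.tran_date, 'yy') || '*'
               || to_char(mod(t.TRAN, 10000000)) AS tran_as_ref
FROM transactions t;

-- Plain index on the precomputed code.
CREATE INDEX tran_ref_mv_ix ON tran_ref_mv (tran_as_ref);

-- Use the view as a link between the referring and the referenced row.
SELECT t1.*, t2.*
FROM transactions t1
    JOIN tran_ref_mv m ON m.tran_as_ref = t1.REF
    JOIN transactions t2 ON t2.TRAN = m.TRAN;
```

With on-demand refresh the view is only as current as its last refresh, so this trades freshness for leaving the base table untouched.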

Or finally, based on Wernfried's suggestion, maybe it is possible to create an index that indexes the TRAN in REF syntax? I have no experience with this, but it sounds like it would be an option.

The index would then be something along the lines of

CREATE INDEX ind_ref ON transactions (STORE || '*' || to_char(DATE, 'yy') || '*' || to_char(mod(TRAN, 10000000)))

while your query would then become something like this:

SELECT *
FROM transactions t1
    JOIN transactions t2
    ON ( (t2.STORE || '*' || to_char(t2.DATE, 'yy') || '*' || to_char(mod(t2.TRAN, 10000000))) = t1.REF )

and hopefully the server would then pick the index to go after the right t2 record. But like I said, I don't have experience with this, but it's worth a shot IMHO.
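
Whether the optimizer actually picks a function-based index can be checked with an execution plan. The sketch below assumes the join expression is written exactly as in the index definition, and uses tran_date as a stand-in for the question's DATE column (DATE itself is a reserved word in Oracle).

```sql
-- Ask Oracle for the plan without running the query.
EXPLAIN PLAN FOR
SELECT *
FROM transactions t1
    JOIN transactions t2
    ON t2.STORE || '*' || to_char(t2.tran_date, 'yy') || '*'
                || to_char(mod(t2.TRAN, 10000000)) = t1.REF;

-- Show the plan that was just stored.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If the plan shows an index scan on the function-based index for t2 instead of a full table scan, the index is being used; if not, the expression in the query probably does not match the indexed expression.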

Anyway, if ALL of this would be a no-go and you can only optimize the query, then I would suggest making the best of it by using both the STORE and the DATE information as well as possible, like this:

 SELECT *
  FROM transactions t1
  JOIN transactions t2
    ON (    
            t2.store = substr(t1.REF, 1, instr(t1.REF, '*') - 1)
        AND t2.DATE BETWEEN to_date('0101' || substr(t1.REF, instr(t1.REF, '*', 1, 1) + 1, instr(t1.REF, '*', 1, 2) - instr(t1.REF, '*', 1, 1) - 1), 'DDMMRR')
                        AND to_date('3112' || substr(t1.REF, instr(t1.REF, '*', 1, 1) + 1, instr(t1.REF, '*', 1, 2) - instr(t1.REF, '*', 1, 1) - 1), 'DDMMRR')
        AND mod(t2.TRAN, 10000000) = to_number(substr(t1.REF, instr(t1.REF, '*', 1, 2) + 1))
       )
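
The year boundaries used in the date-range condition can be checked on their own. The sketch below assumes the two-digit year should be read with the RR format element, so that '09' becomes 2009; the literal '09' stands in for the year fragment parsed out of REF. It also computes the start of the following year, which might be a safer upper bound if DATE values carry a time of day, since a range ending on 31 December at midnight would miss later transactions on that day.

```sql
-- '09' stands in for the year fragment parsed out of REF.
SELECT to_date('0101' || '09', 'DDMMRR')                 AS year_start,
       to_date('3112' || '09', 'DDMMRR')                 AS year_end_midnight,
       add_months(to_date('0101' || '09', 'DDMMRR'), 12) AS next_year_start
FROM dual;
```

A half-open condition of the form "date column >= year_start AND date column < next_year_start" then covers the whole year.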

Thinking about it, if you'd be able to add this index

CREATE INDEX ind_tst ON transactions (STORE, DATE, mod(TRAN, 10000000))

then it might well be that the query above already is quite an improvement on what you have now. And thinking out loud this means you could just as well try this:

CREATE INDEX ind_tst ON transactions (STORE, to_char(DATE, 'yy'), mod(TRAN, 10000000))

SELECT *
  FROM transactions t1
  JOIN transactions t2
    ON (    
            t2.store = substr(t1.REF, 1, instr(t1.REF, '*') - 1)
        AND to_char(t2.DATE, 'yy') = substr(t1.REF, instr(t1.REF, '*', 1, 1) + 1, instr(t1.REF, '*', 1, 2) - instr(t1.REF, '*', 1, 1) - 1)
        AND mod(t2.TRAN, 10000000) = to_number(substr(t1.REF, instr(t1.REF, '*', 1, 2) + 1))
       )

Hope this helps out a bit. As I have no experience with Oracle you'll probably need to fix the syntax here and there, sorry about that... Anyway, good luck!

OTHER TIPS

You can create "function-based indexes" in Oracle. Try these:

CREATE INDEX ind_1 ON transactions (SUBSTR(REF, 1, INSTR(REF, '*') - 1));
CREATE INDEX ind_2 ON transactions (SUBSTR(REF, INSTR(REF, '*', 1, 1) + 1, INSTR(REF, '*', 1, 2) - INSTR(REF, '*', 1, 1) - 1));
CREATE INDEX ind_3 ON transactions (TO_NUMBER(SUBSTR(TO_CHAR(TRAN), -7)));
CREATE INDEX ind_4 ON transactions (TO_NUMBER(SUBSTR(REF, INSTR(REF, '*', 1, 2) + 1)));
CREATE INDEX ind_5 ON transactions (TO_CHAR(DATE, 'yy'));

However, you should check the explain plan and drop the indexes that are not used. You can also create virtual columns and create the index on those.
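
A minimal sketch of the virtual column variant, assuming Oracle 11g or later. The names tran_as_ref and ind_tran_as_ref are illustrative, and tran_date stands in for the question's DATE column (DATE is a reserved word in Oracle). A virtual column stores no data of its own; only the index on it is materialized.

```sql
-- Expose the REF-style code of each row as a virtual column.
ALTER TABLE transactions ADD (
    tran_as_ref GENERATED ALWAYS AS (
        STORE || '*' || to_char(tran_date, 'yy') || '*'
              || to_char(mod(TRAN, 10000000))
    ) VIRTUAL
);

-- Index the virtual column like any ordinary column.
CREATE INDEX ind_tran_as_ref ON transactions (tran_as_ref);

-- The self-join then becomes a plain equality on two columns.
SELECT *
FROM transactions t1
    JOIN transactions t2
    ON t2.tran_as_ref = t1.REF;
```

This still changes the table definition, so it only applies if that kind of change is permitted.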

Licensed under: CC-BY-SA with attribution