Teradata Optimizer Equal vs Like in SQL

https://stackoverflow.com/questions/3712177

sql
teradata

02-10-2019

Question

I am currently trying to optimize some BOBJ (BusinessObjects) reports whose backend is Teradata. The Teradata optimizer seems very finicky, and I was wondering whether anyone has come up with a solution or workaround to get the optimizer to treat LIKE predicates the same way it treats equality predicates.

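My issue is that we let the user supply the value in one of two ways:
 1. Enter the Number: (an exact match, which becomes an equality predicate)
    or
 2. Enter a Number like: (a pattern match, which becomes a LIKE predicate)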

Option one performs like a dream while option two is dragging our query times from 6 seconds to 2 minutes.

In addition, does anyone know of any good articles, discussions, videos, etc. on optimizing SQL statements for the Teradata optimizer?

Solution

Because the column is defined as a VARCHAR and you are using the LIKE operator against it, you eliminate the possibility of using the primary index (PI) for single-AMP access. Remember, the primary index's first job is distributing the data across the AMPs in the system: each row is placed on an AMP according to the hash of its full PI value. A LIKE pattern supplies only part of the value, so it cannot be hashed to a single AMP, and the optimizer must perform an all-AMP operation to satisfy it.

WHERE MyPIColumn LIKE '123%'

Values starting with 123 hash differently from one another, so the matching rows can and will end up on multiple AMPs.

WHERE MyPIColumn = '123'

Every row whose PI value is exactly '123' hashes to the same AMP, so querying for '123' will always be a single-AMP operation.
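One way to check this on your own system is to ask Teradata which AMP each row hashes to. The following is a minimal sketch, assuming a hypothetical table named MyTable whose primary index is MyPIColumn; HASHROW, HASHBUCKET, and HASHAMP are Teradata's built-in hash functions.

```sql
-- MyTable is a hypothetical table name; MyPIColumn is assumed to be its PI.

-- Rows matching the prefix: expect several distinct AMP numbers.
SELECT HASHAMP(HASHBUCKET(HASHROW(MyPIColumn))) AS amp_no,
       COUNT(*) AS row_count
FROM MyTable
WHERE MyPIColumn LIKE '123%'
GROUP BY 1
ORDER BY 1;

-- A single complete value always maps to exactly one AMP.
SELECT HASHAMP(HASHBUCKET(HASHROW('123'))) AS amp_no;

-- Compare the plans: look for a single-AMP step versus an all-AMPs step.
EXPLAIN SELECT * FROM MyTable WHERE MyPIColumn = '123';
EXPLAIN SELECT * FROM MyTable WHERE MyPIColumn LIKE '123%';
```

If the first query returns more than one amp_no, no single AMP can answer the LIKE predicate on its own.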

Statistics on this column may help with row estimates but will likely not eliminate the all-AMP operation.
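Collecting those statistics might look like the following sketch. MyTable is a hypothetical table name, and the exact COLLECT STATISTICS syntax varies somewhat between Teradata releases, so check it against your version.

```sql
-- MyTable is a hypothetical table name; MyPIColumn is assumed to be its PI.
COLLECT STATISTICS COLUMN (MyPIColumn) ON MyTable;

-- Review which statistics exist for the table.
HELP STATISTICS MyTable;
```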

Is this a Unique PI or Non-Unique PI?
Why was the data type chosen to be character rather than numeric? That said, range comparisons (>, >=, <, <=) on a numeric column would likely result in the same all-AMP operation.
Is this PI shared by other tables in the model to facilitate AMP local join strategies?

OTHER TIPS

I take it that Number is indexed? Teradata uses hashing for indexing, so an equality comparison will use the index, while LIKE will result in a full table scan.

If you have a genuine need for using like, there's not an awful lot you can do. One thing you could try is using Substr(Number, 1, 3) = '123' rather than Number LIKE '123%'. I've gotten small performance improvements from this in the past, but don't expect anything spectacular.
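Side by side, the two forms would look roughly like this. MyTable and AcctNumber are hypothetical names used only for illustration; both statements are intended to return the same rows, and both still scan every AMP.

```sql
-- MyTable and AcctNumber are hypothetical names.

-- Original prefix match.
SELECT * FROM MyTable WHERE AcctNumber LIKE '123%';

-- Same prefix test written as an equality on the first three characters.
SELECT * FROM MyTable WHERE SUBSTR(AcctNumber, 1, 3) = '123';
```

Running EXPLAIN on each, and timing both against your own data, is the only reliable way to tell whether the rewrite helps in your case.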

You will need a full-text index / pre-tokenized index, e.g. Lucene, and also a two-pass search.

For example, when you insert "12345" into your database, create links from "1", "12", "123", "234", etc. to "12345".

Then, when the user searches for something like "123**", find "123" in the lookup table and then seek to the record "12345".
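Inside Teradata itself, the same idea could be approximated with a lookup table keyed on the prefix. This is only a sketch under stated assumptions: NumberPrefix, Prefix, FullNumber, and MyTable are hypothetical names, and only leading prefixes are stored here (inner fragments such as "234" would need additional rows).

```sql
-- All table and column names here are hypothetical.
-- One row per (prefix, full value). The PI is the prefix, so an equality
-- lookup on a prefix can be a single-AMP operation.
CREATE TABLE NumberPrefix (
    Prefix     VARCHAR(20) NOT NULL,
    FullNumber VARCHAR(20) NOT NULL
) PRIMARY INDEX (Prefix);

-- Links for the value '12345' (leading prefixes only).
INSERT INTO NumberPrefix VALUES ('1',     '12345');
INSERT INTO NumberPrefix VALUES ('12',    '12345');
INSERT INTO NumberPrefix VALUES ('123',   '12345');
INSERT INTO NumberPrefix VALUES ('1234',  '12345');
INSERT INTO NumberPrefix VALUES ('12345', '12345');

-- Pass 1: equality on the prefix. Pass 2: equality join to the base table.
SELECT t.*
FROM NumberPrefix p
JOIN MyTable t
  ON t.MyPIColumn = p.FullNumber
WHERE p.Prefix = '123';
```

The trade-off is extra storage and maintenance on every insert. Very short prefixes are shared by many rows and would all land on one AMP, so the lookup table can become skewed.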

If you are doing a direct VARCHAR comparison, i.e.

Column LIKE 'VALUE'

then you could try a NUSI (non-unique secondary index) on that column. Make sure that you collect statistics for the table's primary index and for the NUSI.
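As a rough sketch of that suggestion, assuming a hypothetical table MyTable with primary index MyPIColumn and a searched column MyColumn:

```sql
-- MyTable, MyPIColumn, and MyColumn are hypothetical names.

-- CREATE INDEX without the UNIQUE keyword defines a non-unique secondary index.
CREATE INDEX (MyColumn) ON MyTable;

-- Statistics for the new index and for the primary index column.
COLLECT STATISTICS INDEX (MyColumn) ON MyTable;
COLLECT STATISTICS COLUMN (MyPIColumn) ON MyTable;

-- Check whether the optimizer actually chooses the index.
EXPLAIN SELECT * FROM MyTable WHERE MyColumn LIKE 'VALUE';
```

The optimizer decides from cost estimates whether to use a NUSI, so the EXPLAIN output is worth checking before relying on it.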

Licensed under: CC-BY-SA with attribution
