Question

I have a query in this form that will on average take ~100 in clause elements, and at some rare times > 1000 elements. If greater than 1000 elements, we will chunk the in clause down to 1000 (an Oracle maximum).

The SQL is in the form of

SELECT * FROM tab WHERE PrimaryKeyID IN (1,2,3,4,5,...)

The tables I am selecting from are huge and will contain millions more rows than what is in my in clause. My concern is that the optimizer may elect to do a table scan (our database does not have up to date statistics - yeah - I know ...)

Is there a hint I can pass to force the use of the primary key - WITHOUT knowing the index name of the primary Key, perhaps something like ... /*+ DO_NOT_TABLE_SCAN */?

Are there any creative approaches to pulling back the data such that

  1. We perform the least number of round-trips
  2. We read the fewest blocks (at the logical IO level?)
  3. Will this be faster ..
SELECT * FROM tab WHERE PrimaryKeyID = 1
  UNION
SELECT * FROM tab WHERE PrimaryKeyID = 2
  UNION
SELECT * FROM tab WHERE PrimaryKeyID = 3
  UNION ....

Solution

If the statistics on your table are accurate, it should be very unlikely that the optimizer would choose to do a table scan rather than using the primary key index when you only have 1000 hard-coded elements in the WHERE clause. The best approach would be to gather (or set) accurate statistics on your objects since that should cause good things to happen automatically rather than trying to do a lot of gymnastics in order to work around incorrect statistics.
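As a rough sketch of what gathering or setting statistics could look like, assuming the table from the question is named TAB and lives in the current schema. The row and block counts in the second call are made-up placeholders, not measurements, and would need to be replaced with realistic figures for your table:

```sql
-- Gather fresh statistics on the table and, via cascade, on its indexes
-- (including the primary key index). TAB is the table name from the question.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'TAB',
    cascade => TRUE);
END;
/

-- Alternative if gathering is not feasible: set the table statistics by hand.
-- numrows and numblks below are illustrative placeholders only.
BEGIN
  DBMS_STATS.SET_TABLE_STATS(
    ownname => USER,
    tabname => 'TAB',
    numrows => 50000000,
    numblks => 1000000);
END;
/
```

Setting statistics manually only helps if the numbers are close to reality, and the index statistics may need the same treatment, so gathering is usually the simpler route when it is allowed.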

If we assume that the statistics are inaccurate enough that the optimizer would be led to believe a table scan is more efficient than using the primary key index, you could add a DYNAMIC_SAMPLING hint, which forces the optimizer to gather more accurate statistics before optimizing the statement, or a CARDINALITY hint, which overrides the optimizer's default cardinality estimate. Neither requires knowing anything about the available indexes; you only need the table alias (or the table name if there is no alias). DYNAMIC_SAMPLING would be the safer, more robust approach, but it would add time to the parsing step.
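A minimal sketch of the two hints applied to the query from the question, assuming the alias t. The sampling level (4) and the row count (5) are chosen purely for illustration. Note that CARDINALITY is an undocumented hint, so its behavior may differ between releases, and whether either hint actually changes the plan depends on how the statistics are wrong:

```sql
-- Ask the optimizer to sample the table at parse time.
-- The second argument is the sampling level (0 to 10); 4 is only an example.
SELECT /*+ DYNAMIC_SAMPLING(t 4) */ *
  FROM tab t
 WHERE PrimaryKeyID IN (1, 2, 3, 4, 5);

-- Undocumented hint: tell the optimizer how many rows to expect from t.
-- 5 matches the number of IN-list elements in this example.
SELECT /*+ CARDINALITY(t 5) */ *
  FROM tab t
 WHERE PrimaryKeyID IN (1, 2, 3, 4, 5);
```

Both hints reference only the alias t, so neither needs the name of the primary key index.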

If you are building up a SQL statement with a variable number of hard-coded parameters in an IN clause, you're likely going to be creating performance problems for yourself by flooding your shared pool with non-sharable SQL and forcing the database to spend a lot of time hard parsing each variant separately. It would be much more efficient if you created a single sharable SQL statement that could be parsed once. Depending on where your IN clause values are coming from, that might look something like

SELECT *
  FROM table_name
 WHERE primary_key IN (SELECT primary_key
                         FROM global_temporary_table);

or

SELECT *
  FROM table_name
 WHERE primary_key IN (SELECT primary_key
                         FROM TABLE( nested_table ));

or

SELECT *
  FROM table_name
 WHERE primary_key IN (SELECT primary_key
                         FROM some_other_source);
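To make the first two variants more concrete, here is a hedged sketch. The names id_filter and id_list and the bind variables :id and :ids are hypothetical, and how a collection is bound to :ids depends on your client driver:

```sql
-- One-time setup: a temporary table whose rows are private to each session
-- and are cleared automatically at commit.
CREATE GLOBAL TEMPORARY TABLE id_filter (
  primary_key NUMBER PRIMARY KEY
) ON COMMIT DELETE ROWS;

-- Per request: load the ids with a bind variable (ideally as one
-- array/batch insert to limit round-trips), then run the sharable query.
INSERT INTO id_filter (primary_key) VALUES (:id);

SELECT *
  FROM table_name
 WHERE primary_key IN (SELECT primary_key
                         FROM id_filter);

-- Nested table variant: a schema-level collection type, created once.
CREATE TYPE id_list AS TABLE OF NUMBER;
/

-- The whole id list is passed as a single bind variable.
SELECT *
  FROM table_name
 WHERE primary_key IN (SELECT column_value
                         FROM TABLE(CAST(:ids AS id_list)));
```

In both cases the SQL text stays the same no matter how many ids are supplied, so it is parsed once and shared, and the 1000-element limit on a literal IN list no longer forces chunking.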

If you got yourself down to a single sharable SQL statement, then in addition to avoiding the cost of constantly re-parsing the statement, you'd have a number of options for forcing a particular plan that don't involve modifying the SQL statement. Different versions of Oracle have different options for plan stability: stored outlines, SQL plan management, and SQL profiles, among other technologies, depending on your release. You can use these to force particular plans for particular SQL statements. If you keep generating new SQL statements that have to be re-parsed, however, it becomes very difficult to use these technologies.
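As one possible illustration using SQL plan management, assuming Oracle 11g or later, that the feature is available in your edition, and that the statement has already run once with the plan you want. The sql_id and plan_hash_value below are placeholders to be replaced with the values returned by the first query:

```sql
-- Find the sharable statement and its current plan in the cursor cache.
SELECT sql_id, plan_hash_value, sql_text
  FROM v$sql
 WHERE sql_text LIKE '%global_temporary_table%';

-- Load that plan as an accepted baseline so the optimizer keeps using it.
DECLARE
  plans_loaded PLS_INTEGER;
BEGIN
  plans_loaded := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
                    sql_id          => 'replace_with_sql_id',  -- placeholder
                    plan_hash_value => 0);                     -- placeholder
  DBMS_OUTPUT.PUT_LINE(plans_loaded || ' plan(s) loaded');
END;
/
```

This only works well because there is a single SQL text to attach the baseline to; with a different literal IN list on every call, each variant would need its own baseline.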

Licensed under: CC-BY-SA with attribution