Domanda

Edit:

Database - Oracle 11gR2 - Over Exadata (X2)


Am framing a problem investigation report for past issues, and I'm slightly confused with a situation below.

Say , I have a table MYACCT. There are 138 Columns. It holds 10 Million records. With regular updates of at least 1000 records(Inserts/Updates/Deletes) per hour.

Primary Key is COL1 (VARCHAR2(18)) (Application rarely uses this, except for a join with other tables)

There's Another Unique Index over COL2 VARCHAR2(9)). Which is the one App regularly uses. What ever updates I meant previously happens based on both these columns. Whereas any SELECT only operation over this table, always refer COL2. So COL2 would be our interest.

We do a Query below,

SELECT COUNT(COL2) FROM MYACCT; /* Use the Unique Column (Not PK) */

There's no issue with the result, whereas I was the one recommending to change it as

SELECT COUNT(COL1) FROM MYACCT; /* Use the primary Index

I just calculated the time taken for actual execution

Query using the PRIMARY KEY was faster by `0.8-1.0 seconds always!

Now, I am trying to explain this behavior. Just drafting the Explain plan behind these Queries.

Query 1: (Using Primary Key)

SELECT COUNT(COL1) FROM MYACCT;

Plan :

SQL> select * from TABLE(dbms_xplan.display);
Plan hash value: 2417095184

---------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Rows  | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |     1 | 11337   (1)| 00:02:17 |
|   1 |  SORT AGGREGATE               |         |     1 |            |          |
|   2 |   INDEX STORAGE FAST FULL SCAN| PK_ACCT |    10M| 11337   (1)| 00:02:17 |
---------------------------------------------------------------------------------

9 rows selected.

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
      41332  consistent gets
          0  physical reads
          0  redo size
        210  bytes sent via SQL*Net to client
        346  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

Query 2:(NOT using Primary Key)

SELECT COUNT(COL2) FROM MYACCT;

Plan :

Plan hash value: 1130703739

------------------------------------------------------------------------------------------
| Id  | Operation                     | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |          |     1 |    10 |  7868   (1)| 00:01:35 |
|   1 |  SORT AGGREGATE               |          |     1 |    10 |            |          |
|   2 |   INDEX STORAGE FAST FULL SCAN| MYINDX01 |    10M|    95M|  7868   (1)| 00:01:35 |
------------------------------------------------------------------------------------------

9 rows selected.

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      28151  consistent gets
         23  physical reads
        784  redo size
        233  bytes sent via SQL*Net to client
        346  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

We can find in terms of Cost and Time the Query without primary key wins. Then how come the execution time of primary key is faster???

EDIT:

SQL> select segment_name, bytes from dba_segments where segment_name in ('MYINDX01','PK_ACCT');
PK_ACCT               343932928
MYINDX01              234881024
È stato utile?

Soluzione

You're reading more data from the PK index than you are from the other one. COL1 is VARCHAR2(18) while COL2 is VARCHAR(9), which doesn't necessarily mean anything but implies you probably have values in COL1 that are consistently longer than those in COL2. They will therefore use more storage, both in the table and in the index, and the index scan has to pull more data from the block buffer and/or disk for the PK-based query.

The execution statistics show that; 41332 consistent gets for the PK-based query, and only 28151 for the faster one, so it's doing more work with the PK. And the segment sizes show it too - for the PK you need to read about 328M, for the UK only 224M.

The block buffer is likely to be crucial if you're seeing the PK version run faster sometimes. In the example you've shown both queries are hitting the block buffer - the 23 physical reads are a trivial number, If the index data wasn't cached consistently then you might see 41k consistent gets versus 28k physical reads, which would likely reverse the apparent winner as physical reads from disk will be slower. This often manifests if running two queries back to back shows one faster, but reversing the order they run shows the other as faster.

You can't generalise this to 'PK query is slower then UK query'; this is because of your specific data. You'd probably also get better performance if your PK was actually a number column, rather than a VARCHAR2 column holding numbers, which is never a good idea.

Altri suggerimenti

Given a statement like

select count(x) from some_table

If there is a covering index for column x, the query optimizer is likely to use it, so it doesn't have to fetch the [ginormous] data page.

It sounds like the two columns (col1 and col2) involved in your similar queries are both indexed[1]. What you don't say is whether or not either of these indices are clustered.

That can make a big difference. If the index is clustered, the leaf node in the B-tree that is the index is the table's data page. Given how big your rows are (or seem to be), that means a scan of the clustered index will likely move a lot more data around -- meaning more paging -- than it would if it was scanning a non-clustered index.

Most aggregate functions eliminate nulls in computing the value of the aggregate function. count() is a little different. count(*) includes nulls in the results while count(expression) excludes nulls from the results. Since you're not using distinct, and assuming your col1 and col2 columns are not null, you might get better performance by trying

select count(*) from myacct

or

select count(1) from myacct

so the optimizer doesn't have to consider whether or not the column is null.

Just a thought.

[1]And I assume that they are the only column in their respective index.

Your PK query is doing 0 physical reads, suggesting you have the results in memory. So even though the execution plan looks slower, it's performing faster. The COL2 query is doing 23 physical reads.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top