Impact of COUNT() Based on Column in a Table

Question 1

You're reading more data from the PK index than you are from the other one. COL1 is VARCHAR2(18) while COL2 is VARCHAR(9), which doesn't necessarily mean anything but implies you probably have values in COL1 that are consistently longer than those in COL2. They will therefore use more storage, both in the table and in the index, and the index scan has to pull more data from the block buffer and/or disk for the PK-based query.

The execution statistics show this: 41332 consistent gets for the PK-based query and only 28151 for the faster one, so the PK query is doing more work. The segment sizes show it too: for the PK you need to read about 328M, for the UK only 224M.
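As a rough sanity check, if both queries scan their whole index, the ratio of consistent gets should be close to the ratio of segment sizes. A minimal sketch in Python, using only the figures quoted above (the variable names are mine, and the sizes are the approximate ones given):

```python
# Figures quoted from the execution statistics and segment sizes above.
pk_gets, uk_gets = 41332, 28151   # consistent gets for each query
pk_mb, uk_mb = 328, 224           # approximate index segment sizes, in MB

gets_ratio = pk_gets / uk_gets    # how much more work the PK query does
size_ratio = pk_mb / uk_mb        # how much bigger the PK index is

print(f"consistent gets ratio: {gets_ratio:.2f}")
print(f"segment size ratio:    {size_ratio:.2f}")
```

The two ratios come out within about 0.01 of each other, which is consistent with the extra work being explained by the larger index rather than by anything specific to it being the primary key.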

The block buffer is likely to be crucial if you're seeing the PK version run faster sometimes. In the example you've shown, both queries are hitting the block buffer; the 23 physical reads are a trivial number. If the index data wasn't cached consistently then you might see 41k consistent gets versus 28k physical reads, which would likely reverse the apparent winner, as physical reads from disk will be slower. This often shows up when running two queries back to back makes one look faster, but reversing the order makes the other look faster.

You can't generalise this to 'PK query is slower than UK query'; the difference is down to your specific data. You'd probably also get better performance if your PK was actually a number column, rather than a VARCHAR2 column holding numbers, which is never a good idea.

Question 2

Given a statement like

select count(x) from some_table

If there is a covering index for column x, the query optimizer is likely to use it, so it doesn't have to fetch the [ginormous] data page.

It sounds like the two columns (col1 and col2) involved in your similar queries are both indexed[1]. What you don't say is whether either of these indices is clustered.

That can make a big difference. If the index is clustered, the leaf node in the B-tree that is the index is the table's data page. Given how big your rows are (or seem to be), that means a scan of the clustered index will likely move a lot more data around -- meaning more paging -- than it would if it was scanning a non-clustered index.

Most aggregate functions eliminate nulls in computing the value of the aggregate function. count() is a little different. count(*) includes nulls in the results while count(expression) excludes nulls from the results. Since you're not using distinct, and assuming your col1 and col2 columns are not null, you might get better performance by trying

select count(*) from myacct

or

select count(1) from myacct

so the optimizer doesn't have to consider whether or not the column is null.
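The difference in null handling can be seen in a small sketch. This uses Python's built-in sqlite3 module with an in-memory table rather than Oracle, so it only illustrates the standard counting semantics, not Oracle's optimizer behaviour. The table reuses the name myacct, but the rows are made up for illustration, and col2 is deliberately left nullable here (unlike your columns, which I'm assuming are not null) so the difference is visible:

```python
import sqlite3

# In-memory database; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myacct (col1 TEXT NOT NULL PRIMARY KEY, col2 TEXT)")
conn.executemany(
    "INSERT INTO myacct (col1, col2) VALUES (?, ?)",
    [("a", "x"), ("b", None), ("c", "z")],  # one row has a null col2
)

# count(*) and count(1) count rows; count(col2) skips rows where col2 is null.
count_star, count_one, count_col2 = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(col2) FROM myacct"
).fetchone()

print(count_star, count_one, count_col2)  # prints: 3 3 2
conn.close()
```

When the column is declared not null, all three forms return the same number, which is why swapping count(col) for count(*) is safe in that case.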

Just a thought.

[1]And I assume that they are the only column in their respective index.

Question 3

Your PK query is doing 0 physical reads, suggesting you have the results in memory. So even though the execution plan looks slower, it's performing faster. The COL2 query is doing 23 physical reads.

Query 1: (Using Primary Key)

Query 2: (NOT using Primary Key)