How can SELECT COUNT(*) different from count of all records in the table?

https://stackoverflow.com/questions/4331436

30-09-2019
|

Question

So I have a table:

CREATE TABLE TABLE_NAME (
    COLUMN_1   char(12)    NOT NULL,
    COLUMN_2   char(2)     NOT NULL,
    COLUMN_3   char(1)     NOT NULL,
    COLUMN_4   int         NOT NULL,
    COLUMN_5   char(2)     NOT NULL,
    COLUMN_6   money       NOT NULL,
    COLUMN_7   smallint    NOT NULL,
    COLUMN_8   varchar(10) NOT NULL,
    COLUMN_9   smallint    NOT NULL,
    COLUMN_10  datetime    NOT NULL
    Primary Key (COLUMN_1, COLUMN_2, COLUMN_3)
)

SELECT COUNT(*) returns a different value than SELECT DISTINCT COUNT(*). How can this be possible?

I also tried

SELECT COUNT(*) FROM (
    SELECT
        COLUMN_1,
        COLUMN_2,
        COLUMN_3,
        COLUMN_4,
        COLUMN_5,
        COLUMN_6,
        COLUMN_7,
        COLUMN_8,
        COLUMN_9,
        COLUMN_10
     FROM TABLE_NAME
    ) TMP

which returned the same count as the distinct query.

I'm a little tired, so I hope I'm not missing something obvious, but I can't see how with a primary key and all fields being NOT NULL, there can be a different total count than the count of unique records.

BTW, this is on Sybase ASE 15.

The discrepancy is a hundred or so records out of a half million. I'm also seeing this problem in several other tables, but chose just one for the example.

EDIT:

I should mention for the sake of completeness that I discovered this problem when writing a simple job to completely copy this table to a remote database. My application was recording a certain number of read/write operations, but failed QA because the number of records in the source database differed from the number of records in the target database. Both values were obtained via COUNT(*); the count returned from the target (Oracle 10g) was the same as the number of read/write operations recorded by my app. As all fields on the source table are defined NOT NULL and a primary key is defined, I was at a loss to explain how my application was losing a tiny number of records.

This is when I started using the alternate queries listed above, both of which agreed with my apps read/write count, as well as the COUNT() value returned from the target. In other words, the only value that did not match was the COUNT() on the source database.

Solution

In most databases that support it, count(*) doesn't actually retrieve all records and count them -- instead it fetches some metadata field that just tracks the number of rows (or approximate number of rows) presently stored in the table. On the other hand, when you do something that requires working with actual data, the dbms is going to fetch the rows anyway, and it will count them as you would expect it to.

Of course, it's reasonable to expect that, regardless of how it's implemented, the result of count(*) would be the same as more a complex but equivalent query. That would suggest then, that (maybe) your table's metadata is corrupted somehow. (I'd say this one is a good bet -- I'm not familiar with sybase specifically, but most dbms have a way to force rebuild the table metrics... that might be worth a try here).

Another possible explanation is that the database's internal table row counter is actually not designed to be 100% accurate. (this second possibility is pure educated speculation... I don't actually know whether this is true of Sybase's row counter or not, but it might be worth further investigation).

good luck!

OTHER TIPS

This is a Sybase ASE Answer for a Sybase ASE Question

It applies to any Standard SQL Compliant DBMS; it does not apply to others.

The answer you have chosen is not correct.

COUNT() is an ISO/IEC/ANSI Standard SQL expression. It is required to physically count the rows, which means it will be slow on a large table. It is required to not use the internal memory resident tables or catalogue tables ("metadata"). This is a run-time value, so if the table is active, it will keep changing for every execution
- What you place within the brackets is very important.
- COUNT(*) returns the number of rows including nulls
- COUNT(column) returns the number of rows where column is Not Null
- Yes, the placement of DISTINCTis also important. This forces the use of a work table. If you need to count a column that is not Unique, then you do not have a choice; but for unique columns, there is no need to use DISTINCT
- DISTINCT applies to a column or an expression derived from a column;
  COUNT (DISTINCT *)
  is meaningless ("distinct all columns"), and ASE 15 has a substantially improved parser, which catches such things (previous version woul dhave returned a less accurate error message).
- the rows actually read will depend on your ISOLATION LEVEL (the correct count will be returned for the level specified) and the current activity on the database
- the cleanest method, that avoids the weird results you are getting, is to use
  COUNT(PK_column)
- (Since this is a CW) Never use any form of COUNT() for an existence check, as it physically counts the rows. Always use IF EXISTS with the correct WHERE clause, because it will use the index only.
If you need an accurate count but do not want to read all the rows, there is a function to read catalogue table systabstats, which has a count of rows in each table. This returns instantaneously, regardless of table size. The currency of those values depends on how your server is configured for performance, flushing, checkpointing, etc. systabstats is updated from the memory-resident tables by two commands: UPDATE STATISTICS and "flush stats". Try this:
```
EXEC sp_flushstats
SELECT ROW_COUNT (DB_ID(), OBJECT_ID("table_name") )
```

Response to Comments

This Section is Not Relevant to Sybase

I have provided a clean explanation re your problem, and the subject, without explaining why the other two answers are incorrect, or why your second query returns unpredictable results.
Here is the Sybase ASE 15.0 Reference/Building Blocks Manual, COUNT() is on page 121. Note that icyrock has misquoted it, and both of you have mis-interpreted it, inadvertently of course. Your starting point was confusion, lack of distinction re * and DISTINCT, hence my attempt at a clear answer.
I made this a Community Wiki, therefore my answer is complete re the subject, normalised, so that it can stand alone as a complete answer to any question re COUNT().
In response to comments, for those people who have not heard (no offence, but there are a lot of freeware SQL users seeking answers here at SO). SQL is a Standard language, invented and progressed by IBM in the 1980's and accepted as a Standard by:
- International Standards Organisation,
- International Electrotechnical Commission and (Book Format),
- and copied by American National Standards Institute (free publication thanks to good old Digital Equipment Corp) somewhat later.
- None of the OpenSource or freeware "SQL" comply with the Standard. They provide some components (language structures, facilities, commands) of the Standard. And of those they provide, they seldom provide the Standard Requirement (eg. transaction handling, security).
- Therefore what Sybase and DB2 (rigid re Standards) do, and to a lesser extent what MS ("flexible" about implementations) and Oracle (stretches the definitions) do, because they are Standard-Compliant (without arguing the small variations), is light years away from the MySQLs and the PostGreSQLs of the world.
- Then there is non-Standard or anti-Standard category.
  MySQL/MyISAM provides COUNT() in a manner that is specificaly against the Standard (this is plainly evident from the MySQL Manual link provided by Lee; good for non-transactional apps).
  MySQL/InnoDB & BDB provide COUNT() in the Standard-compliant manner.
- All Standard-SQL vendors provide a large array of Extensions to the Standard
- Where the freeware has value is, they provide quite a number of Extensions (mostly instead of the Standard), to ease coding on web servers. And they are all different.
  .
The NOT NULL in the table definition cannot be immediately trusted, because the table may well have had null columns in it before the current definition was implemented. Nevertheless, Sybase and DB2 will count throws physical, as per requirement for Standard compliance. You can prove with with a series of counts:SELECT COUNT(column_1) from table_name; and then compare the counts.
The second query will further confuse you, yes, because when the inner table is created, and populated, the count will be accurate. Since you created it with an expectation, it fulfilled your expectation. But that does not prove anything about that original table.

If I'm not mistaken, judging from this:

http://manuals.sybase.com/onlinebooks/group-as/asg1250e/sqlug/@Generic__BookTextView/8007;pt=7889

you should use select count(distinct *). I'd expect select distinct count(*) to always return 1, as it says "give me the distinct rows, each of which is a count(*)", so there's always going to be one row, while select count(distinct *) gives you a count of distinct rows.

FWIW, the above seems to be for v12.5 (though I don't see any differences), here are the 15.0 docs:

http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.help.ase_15.0.blocks/html/blocks/blocks128.htm

It explicitly says the following:

count(*) finds the number of rows. count(*) does not take any arguments, and cannot be used with distinct. All rows are counted, regardless of the presence of null values.

You can use select count(distinct column_1) or so, but not select count(distinct *).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow