Question

I am seeing some very strange behavior on PostgreSQL 9.6.3 on RDS Aurora.

I am getting duplicate results from certain queries:

=> select count(id) from foos where id = 'deadbeef';
 count 
-------
 2
(1 row)

=> select id from foos where id = 'deadbeef';
            id            
--------------------------
 deadbeef
 deadbeef
(2 rows)

=> select id, created_at from foos where id = 'deadbeef';
            id            |         created_at         
--------------------------+----------------------------
 deadbeef                 | 2018-01-01 10:00:00.000000
(1 row)

(id values, timestamps, and table names have been obfuscated)

I do not have table inheritance on this table nor any others.

This appears to affect only queries that can be answered entirely from a single index on the foos table.

Because this appears to be isolated to individual indexes, I imagine running REINDEX may resolve this issue.

However, I have no idea how many indexes exhibit this behavior.

For instance, here is similar behavior for the same record via a different index:

=> select bar from foos where bar = 'qux';
                  bar                  
-----------------------------------------
 qux
 qux
(2 rows)

=> select id from foos where bar = 'qux';
            id            
--------------------------
 deadbeef
(1 row)

=> select bar, id from foos where bar = 'qux';
                  bar                    |            id            
-----------------------------------------+--------------------------
 qux                                     | deadbeef
(1 row)

=> select bar, created_at from foos where bar = 'qux';
                  bar                    |         created_at         
-----------------------------------------+----------------------------
 qux                                     | 2018-01-01 10:00:00.000000
(1 row)

Here are the table's relevant indexes:

Indexes:
    "pk_foos" PRIMARY KEY, btree (id)
    "index_foos_on_bar" UNIQUE, btree (bar)

Here are the explain plans for the first couple of examples:

=> explain select id from foos where id = 'deadbeef';
                                 QUERY PLAN                                 
----------------------------------------------------------------------------
 Index Only Scan using pk_foos on foos  (cost=0.42..8.44 rows=1 width=25)
   Index Cond: (id = 'deadbeef'::text)
(2 rows)

=> explain select id, created_at from foos where id = 'deadbeef';
                              QUERY PLAN                               
-----------------------------------------------------------------------
 Index Scan using pk_foos on foos  (cost=0.42..8.44 rows=1 width=33)
   Index Cond: (id = 'deadbeef'::text)
(2 rows)

What is going on here?

Or, how can I figure out what is going on here?

Solution

This certainly looks like a corrupt index. First things first: take a DB snapshot and store it somewhere that no one has write access to.
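One way to check whether the duplicate comes from the index rather than from the table is to rerun the query with index access disabled for the session, which forces a sequential scan. The following is only a sketch. It assumes the planner settings enable_indexonlyscan, enable_indexscan and enable_bitmapscan behave on Aurora the same way they do in community PostgreSQL 9.6. The table, column and value names are the obfuscated ones from the question.

```sql
BEGIN;

-- SET LOCAL keeps these changes confined to this transaction.
SET LOCAL enable_indexonlyscan = off;
SET LOCAL enable_indexscan = off;
SET LOCAL enable_bitmapscan = off;

-- Confirm the plan: it should now show "Seq Scan on foos".
EXPLAIN SELECT id FROM foos WHERE id = 'deadbeef';

-- Row count as read from the table itself, bypassing pk_foos.
SELECT count(id) FROM foos WHERE id = 'deadbeef';

ROLLBACK;
```

If this returns 1 while the Index Only Scan returns 2, the extra row is visible only through the index. An index-only scan skips the table's visibility check for pages that the visibility map marks as all-visible, so either a stale index entry or a wrong visibility map bit could plausibly produce this. That is a guess, not a diagnosis.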

Since this is RDS Aurora and not community PostgreSQL, your first (and likely last) recourse is to talk to AWS support.

You can try to reindex the entire database if you want, but if there is corruption in the indexes, there might be corruption in the tables as well. Some damage will also be invisible: data modifications that failed to update all the rows they should have, or changes made on the basis of queries that returned wrong results, will not show up as corruption and will not be repaired by a reindex. Re-indexing (or any continued use of the database) could also destroy evidence that would be useful in a forensic analysis of the corruption, hence the "first things first" above.
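If you do go ahead after the snapshot is safely stored, a minimal sketch of the check-then-rebuild sequence might look like the following. It uses the index names from the question and assumes the planner can be steered toward pk_foos by disabling sequential scans. The EXPLAIN is there to verify that assumption before trusting the result.

```sql
-- Look for ids that appear more than once when read through the index.
BEGIN;
SET LOCAL enable_seqscan = off;     -- discourage reading the table directly
SET LOCAL enable_bitmapscan = off;

-- Check the plan first: it should mention "Index Only Scan using pk_foos".
EXPLAIN SELECT id, count(*) FROM foos GROUP BY id HAVING count(*) > 1;

SELECT id, count(*) FROM foos GROUP BY id HAVING count(*) > 1;
ROLLBACK;

-- Rebuild the two indexes shown in the question.
REINDEX INDEX pk_foos;
REINDEX INDEX index_foos_on_bar;
```

Two caveats apply. In 9.6, REINDEX blocks writes to the table, and reads that use the index, while it runs. Rebuilding a unique index will also fail if the table really does contain duplicate rows, which would itself be a useful finding.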

If your data retention is set up so that you can do point-in-time recovery far into the past, you could restore to successive points in time and test how far back any specific instance of corruption appears.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange