Question

DB Version: PostgreSQL 8.2.15 (Greenplum Database 4.3.4.1 build 2) on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 4.4.2 compiled on Feb 10 2015 14:15:10

I'm having a problem in one of our environments where the tables in one schema seem to behave incorrectly on simple queries: character columns are not comparing equal to literal strings or to other columns. Joins within the schema on identical columns work, but joins to tables outside the schema do not. Here's a simple example with literal strings.

I do this:

SELECT *
FROM carriers
WHERE carrier_code = 'FR'

Then I copy and paste the value from a character(32) column and do this:

SELECT *
FROM carriers
WHERE carrier_hash = '11aedd0e432747c2bcd97b82808d24a0'

This returns nothing. That's a straight copy-and-paste from that actual column as returned by the first query. If I do this, however:

SELECT *
FROM carriers
WHERE carrier_hash like '11aedd0e432747c2bcd97b82808d24a0'

That returns the same record. This also works:

SELECT *
FROM carriers
WHERE substring(carrier_hash,1,32) = '11aedd0e432747c2bcd97b82808d24a0'

Why might = and like behave differently with a simple string containing no _ or % characters? This has been working for months. The only thing that's changed recently is that we've backed up, cleared out, and restored the database. Could something have become messed up during the restore?

We have other tables in other schemas that have the same column, and those work just fine.

Our DBA recently restored this environment from a backup. I don't know if that's relevant.


The solution

This is occurring because something has changed in the way our data is distributed. Some of our tables are not distributed the way the others are: the records are on different segment nodes than the ones the distribution algorithm says they should be on. So when I run a query that looks for a specific value, the master sends that query only to the node that should hold that key value, and it doesn't find it. The same happens with a join: the plan assumes that since both tables are distributed on the same join key, the join can be performed locally on each segment node. One of the tables is incorrectly distributed, so the join fails.
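For anyone hitting something similar, the symptom can be seen directly. This is a sketch of the kind of check involved, assuming Greenplum 4.x catalogs: gp_distribution_policy records the distribution key, and the hidden gp_segment_id column shows which segment a row actually lives on (carriers and the hash value are just the examples from above).

-- Which column(s) is the table hash-distributed on?
-- (attrnums holds column numbers; join to pg_attribute for names)
SELECT attrnums
FROM gp_distribution_policy
WHERE localoid = 'carriers'::regclass;

-- A predicate that can't be targeted at one segment scans all of them,
-- so it finds the row and reports where it physically lives:
SELECT gp_segment_id, carrier_hash
FROM carriers
WHERE substring(carrier_hash, 1, 32) = '11aedd0e432747c2bcd97b82808d24a0';

-- Plain equality on the distribution key can be dispatched only to the
-- segment the hash says should hold the row; if the row is physically
-- somewhere else, this returns nothing:
SELECT gp_segment_id, carrier_hash
FROM carriers
WHERE carrier_hash = '11aedd0e432747c2bcd97b82808d24a0';

If the equality query comes back empty while the substring version returns the row, the table's rows are not where the distribution hash expects them.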

I don't yet know what caused this; I will post an update if I find out. Any suggestions as to what it might be?

I have removed the postgresql tag, as this problem turns out to be Greenplum-specific.

OK, here's what happened. Our DBA was restoring from a backup, but the restore failed because the files in the backup didn't match the expected names. The restore wanted files called something_0_2_something, _0_3_, and _0_4_, but the backup contained _0_22_, _0_23_, and _0_24_ instead. He raised a call with Pivotal, but because it was taking a while to get an answer, he renamed them all, subtracting 20 from each number, and carried on with the restore. It seems this ended up restoring data to the wrong nodes. We're still waiting for an answer from Pivotal about what went wrong with the file names and how we can fix it.
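For anyone decoding those names: as far as I can tell, the number in each dump file name is the dbid of the segment instance that produced it, and primaries and mirrors have different dbids. Assuming the standard Greenplum 4.x catalog, something like this shows which instance each file belongs to:

-- Map dump-file numbers (segment dbids) to segment instances:
SELECT dbid, content, role, preferred_role, hostname
FROM gp_segment_configuration
ORDER BY dbid;

In a system with 20 primaries (dbids 2-21), the mirrors would be dbids 22-41, which would explain why subtracting 20 from the file names let the restore proceed, just against the wrong set of instances.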

Further information: apparently the backup had backed up the mirror segments, not the primary segments, hence the different file names. We are working around it by taking down the primaries, restoring to the mirrors, bringing the primaries back up, and re-balancing. We don't know why it happened; the primary and mirror status looked fine. Pivotal have been looking at the system over WebEx and say they've never seen this happen before, so there's an achievement.
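For reference, "status looked fine" presumably means a check along these lines; in gp_segment_configuration, status = 'u' is up and mode = 's' is synchronized (again assuming the 4.x catalog):

-- One row per segment instance; a healthy pair shows both members
-- up ('u') and in sync ('s'), each in its preferred role:
SELECT content, dbid, role, preferred_role, mode, status
FROM gp_segment_configuration
WHERE content >= 0
ORDER BY content, role;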
