Why can't I do a “upper()” in my PostgreSQL database?
-
19-08-2019 - |
Question
I created a database in PostgreSQL with "encoding = 'UTF8'", and loaded some UTF8 data in it. Selecting works fine, but when I try to do a "WHERE UPPER(name) = 'FOO'" in a query, I get an error
ERROR: invalid multibyte character for locale
My research seems to indicate that this is because the PostgreSQL installation was "initdb"-ed with LANG=en_US
rather than LANG=en_US.UTF8
. Doing a "SHOW LC_COLLATE
" shows "en_US
". I don't want to have to dump and recreate all my databases, because several of them are PostGIS and it's a royal pain to recreate those. Is there a work-around, like a way to do the equivalent of "UPPER" that works for UTF8?
Update I ended up doing the dump, reinitdb, and restore of the database, and it was less painful than I thought it would be, except for a bit of a problem figuring out where the data was supposed to go because the postgres user doesn't set the PGDATA environment variable, and neither does any config file or shell script that I could find.
Solution
I don't think the workaround you want is feasible, but dump and restore of your PostGIS-enabled databases should work fine. I regularly dump and restore databases with the PostGIS functions and data with geom objects.
What kind of problems do you have?
OTHER TIPS
You diagnostic is right, it is a common problem with Unicode in PostgreSQL. The installation procedure tried to be smart and initdbed with the locale of the shell running it :-(
I suggest that, if you cannot dump and restore your database, you have a problem more serious and more urgent that uppercasing the data. IMHO, you should address this problem first, before you have to actually restore your data after a new version of PostgreSQL was issued (or after a hard disk failure).