Remove characters from PostgreSQL database causing encoding errors
-
10-10-2020 - |
Question
I have a PostgreSQL 10 database that uses WIN1252 encoding.
One of my columns has values that cause conversion errors when running a select from pgAdmin 4:
SELECT myfield FROM mydb.myschema."MYTABLE"
Returns:
ERROR: character with byte sequence 0x9d in encoding "WIN1252" has no
equivalent in encoding "UTF8"
SQL state: 22P05
Setting the client encoding to WIN1252 in pgAdmin 4 causes it to drop the connection to the database (I remember reading somewhere that pgAdmin 4 has issues with setting the client encoding, but track down the link).
Using psql and setting the client encoding to WIN1252, selecting one of the offending features:
SELECT myfield
FROM mydb.myschema."MYTABLE"
WHERE oid = 12345
Returns:
“A sample comment
Looking at the data that was originally imported into the DB, it appears that “curly quotes” were incorrectly imported.
Is there a way to remove these characters from the column - either replacing them with regular quotation marks, or simply deleting them?
Solution
Looking at the WIN1252 character set:
https://en.wikipedia.org/wiki/Windows-1252 , there is no character corresponding to that hexadecimal code 9D
. The same goes for 81
, 8D
, 8F
, 90
.
That's why it "has not equivalent" in UTF8: that character doesn't exist in the first place in the source map.
It doesn't error out when displayed or input in a session configured in WIN1252
encoding. When using the proper code page (chcp 1252
) with psql.exe on top of cmd.exe, these characters are displayed in my environment as a question mark inside a rectangular box.
Anyway, you might want to remove these with statements like (when using the WIN1252 client encoding):
UPDATE tablename SET field=replace(field, chr(x'9D'::int), '')
WHERE strpos(field,chr(x'9D'::int))>0;
Use psql if WIN1252 is unusable with PgAdmin. Once these characters are removed, you'll be able to switch later to UTF8 client encoding.
OTHER TIPS
I think you're reading it backwards. Instead try to SET
the client_encoding to utf8
;.
SET CLIENT_ENCODING TO 'utf8';