Question

I'm trying to create a Postgres database from a backup given to me by my client's previous software vendor (no longer in the picture), but I'm struggling with an encoding issue. I'm rather unfamiliar with Postgres (MySQL is my thing), so please forgive my n00bness.

When I run psql dbname < /path/to/file.backup, it spits out all sorts of lines starting with "invalid command" and ending with strings of question marks and weird characters. The last line of output is:

ERROR:  invalid byte sequence for encoding "UTF8": 0xf1 0x16 0x88 0x02

I opened the backup file in my terminal, and I see many SQL strings interspersed with non-printable characters (represented as "^@".) There are some strings toward the top which I think may be relevant here:

SET client_encoding = 'UTF8';

SET standard_conforming_strings = 'off';

CREATE DATABASE "cleaned_DB" WITH TEMPLATE = template0 ENCODING = 'UTF8' \
    LC_COLLATE = 'English_United States.1252' \
    LC_CTYPE = 'English_United States.1252';

So it looks like the database was using UTF8 encoding, but the software vendor's development machine was using WIN1252. And I guess the strings in the backup file are in WIN1252?

How can I get this database imported? For reference, my dev machine is running Mac OS X.

Solution

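The backup is a "custom format" backup, not an SQL script. That is why psql reports invalid commands and invalid byte sequences: it is trying to read a binary archive as SQL text. You restore it with the pg_restore command instead. See the docs for pg_restore.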
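One quick way to confirm the diagnosis is to look at the first bytes of the file: archives written by pg_dump in custom format begin with the magic string PGDMP. The helper below is a minimal sketch; the function name is_pg_custom_dump is made up for illustration and is not part of PostgreSQL.

```shell
# Hypothetical helper (not a PostgreSQL tool): succeeds if the file starts
# with the "PGDMP" magic bytes found at the start of a custom-format archive.
is_pg_custom_dump() {
    [ "$(head -c 5 "$1" 2>/dev/null)" = "PGDMP" ]
}

# Example use (path taken from the question):
#   is_pg_custom_dump /path/to/file.backup && pg_restore -l /path/to/file.backup
# pg_restore -l only lists the archive's table of contents; it restores nothing.
```

If the check succeeds, pg_restore -l is a safe next step because it reads the archive without touching any database.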

The locale will be the bigger issue. Locale names in PostgreSQL come from the operating system, so a database created on Windows does not move cleanly to Linux or Mac OS X. pg_restore is likely to fail to create the DB if told to create it as part of the restore process (the -C / --create option), because the locale English_United States.1252 does not exist on Mac OS X; it's a Windows-ism.

I think what you will have to do is CREATE DATABASE the database yourself, with an LC_CTYPE and LC_COLLATE that exist on your machine, such as en_US.UTF-8. Then restore into that existing DB, without asking pg_restore to create it.
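The two steps could look roughly like the sketch below. The names run and restore_into_new_db and the DRY_RUN switch are made up for illustration, and the locale en_US.UTF-8 is an assumption about what the Mac provides (locale -a lists what is available). The --no-owner flag is also an assumption: it skips restoring object ownership, which helps if the vendor's roles do not exist on the new server.

```shell
# Hypothetical wrapper: with DRY_RUN=1 it prints the command instead of running it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Hypothetical helper: create an empty UTF8 database with a locale that
# exists on this machine, then restore the custom-format archive into it.
restore_into_new_db() {
    db="$1"
    backup="$2"
    # Step 1: create the database from template0 so encoding and locale can be set.
    # Step 2: restore into it; -C/--create is deliberately omitted so the
    # Windows locale recorded in the archive is never used.
    run createdb -T template0 -E UTF8 \
        --lc-collate=en_US.UTF-8 --lc-ctype=en_US.UTF-8 "$db" &&
    run pg_restore --no-owner -d "$db" "$backup"
}
```

For example, DRY_RUN=1 restore_into_new_db cleaned_db /path/to/file.backup prints the two commands for review; dropping DRY_RUN=1 runs them.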

Licensed under: CC-BY-SA with attribution