Question

I looked at similar questions, but still have not found a suitable solution.

On Ubuntu I created a database with:

createdb PADB -W

And created a table.

create table teacher(
    id_teacher integer PRIMARY KEY,
    name varchar(120),
    experience integer 
);

NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "teacher_pkey" for table "teacher"

I want to insert some data that contains Cyrillic text, but I get this error:

PADB=# insert into teacher (name, experience) values ("Пупкин Василий Иванович", 15);
ERROR:  invalid byte sequence for encoding "UTF8": 0xd0d0

Here are my lc settings:

PADB=# select name, setting from pg_settings where name like 'lc_%';
    name     |   setting   
-------------+-------------
 lc_collate  | ru_RU.UTF-8
 lc_ctype    | ru_RU.UTF-8
 lc_messages | ru_RU.UTF-8
 lc_monetary | ru_RU.UTF-8
 lc_numeric  | ru_RU.UTF-8
 lc_time     | ru_RU.UTF-8
(6 rows)

What is wrong?

PostgreSQL 9.1.11


Solution 2

I solved the problem, but I don't really know which of my actions actually fixed it:
1) I rebuilt and reinstalled PostgreSQL with the readline and zlib libraries (previously I had run configure with the --without-zlib and --without-readline options).
2) I started using single quotes instead of double quotes.
Thank you all anyway.

OTHER TIPS

I suspect your client application is actually sending data in koi8-r or iso-8859-5 encoding, not utf-8, but your client_encoding is telling PostgreSQL to expect UTF-8.

Either convert the input data to utf-8, or change your client_encoding to match the input data.
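As a rough illustration of the first option, here is a minimal Python 3 sketch that re-encodes raw bytes to UTF-8. The helper name to_utf8 is hypothetical, and the source encoding is an assumption you would have to confirm for your own client:

```python
def to_utf8(raw, source_encoding):
    """Re-encode bytes from source_encoding to UTF-8.

    source_encoding is a guess about what the client really sends
    (for example "koi8-r" or "iso-8859-5"); it is not detected here.
    """
    text = raw.decode(source_encoding)  # bytes -> str, using the assumed encoding
    return text.encode("utf-8")         # str -> UTF-8 bytes


# The two bytes from the error message, reinterpreted under each assumption.
print(to_utf8(b"\xd0\xd0", "koi8-r"))
print(to_utf8(b"\xd0\xd0", "iso-8859-5"))
```

If the guess about the source encoding is wrong, the result will still be valid UTF-8 but will contain the wrong characters, so check the converted text by eye before inserting it.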

Decoding your data with different encodings produces:

>>> print "\xd0\xd0".decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 0: invalid continuation byte

>>> print "\xd0\xd0".decode("koi8-r")
пп

>>> print "\xd0\xd0".decode("iso-8859-5")
аа

However, rather strangely, your input doesn't appear to contain either of these. I'm a bit puzzled as to what encoding would turn Пупкин Василий Иванович into the byte sequence \xd0\xd0. In fact, I can't find any encoding of that string that produces it, so this isn't fully explained yet, and I'm wondering if there's some double-encoding or similar mangling going on. I'd need to know more about your environment to say more; see the comments on the original question.
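One way to run that kind of check yourself is sketched below in Python 3. The list of candidate encodings is my own assumption (common Cyrillic encodings), not an exhaustive search, and the function name encodings_producing is hypothetical:

```python
NAME = "Пупкин Василий Иванович"
BAD_BYTES = b"\xd0\xd0"  # the sequence reported in the error message

# Assumed candidates: common single-byte Cyrillic encodings plus UTF-8.
CANDIDATES = ["utf-8", "koi8-r", "iso-8859-5", "cp1251", "cp866"]


def encodings_producing(text, needle, encodings):
    """Return the encodings under which text encodes to bytes containing needle."""
    hits = []
    for enc in encodings:
        try:
            data = text.encode(enc)
        except UnicodeEncodeError:
            continue  # this encoding cannot represent the text at all
        if needle in data:
            hits.append(enc)
    return hits


print(encodings_producing(NAME, BAD_BYTES, CANDIDATES))
```

For these five candidates the list comes back empty, which matches the observation that no straightforward encoding of the name yields \xd0\xd0.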

Workaround: put your data in a UTF-8 encoded CSV file, then import it (\copy).
You could use Notepad++ (Encoding > Convert to UTF-8) to create the file.
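If you would rather generate the file from a script than from an editor, a minimal Python 3 sketch could look like this. The function name write_utf8_csv, the file name teacher.csv, and the id value 1 are all made up for illustration:

```python
import csv


def write_utf8_csv(path, rows):
    """Write rows to path as a UTF-8 encoded CSV file."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        csv.writer(f).writerows(rows)


if __name__ == "__main__":
    # Hypothetical row matching the teacher table: id_teacher, name, experience.
    write_utf8_csv("teacher.csv", [(1, "Пупкин Василий Иванович", 15)])
```

Passing the encoding explicitly means the file comes out as UTF-8 regardless of the locale the script runs under.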

Licensed under: CC-BY-SA with attribution