I solved the problem, but I really don't know which of my actions actually fixed it:
1) I rebuilt and reinstalled PostgreSQL with the readline and zlib libraries (previously I had run configure with the options --without-zlib and --without-readline).
2) I started using single quotes instead of double quotes around the string value (in SQL, single quotes delimit string literals, while double quotes delimit identifiers).
Thank you all anyway.
ERROR: invalid byte sequence for encoding "UTF8"
15-06-2023
Question
I looked at similar questions, but still have not found a suitable solution.
On Ubuntu, I created a database with:
createdb PADB -W
Then I created a table:
create table teacher(
id_teacher integer PRIMARY KEY,
name varchar(120),
experience integer
);
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "teacher_pkey" for table "teacher"
I want to insert some data containing Cyrillic text, but I get this error:
PADB=# insert into teacher (name, experience) values ("Пупкин Василий Иванович", 15);
ERROR: invalid byte sequence for encoding "UTF8": 0xd0d0
Here are my lc settings:
PADB=# select name, setting from pg_settings where name like 'lc_%';
name | setting
-------------+-------------
lc_collate | ru_RU.UTF-8
lc_ctype | ru_RU.UTF-8
lc_messages | ru_RU.UTF-8
lc_monetary | ru_RU.UTF-8
lc_numeric | ru_RU.UTF-8
lc_time | ru_RU.UTF-8
(6 rows)
What is wrong?
Postgresql 9.1.11
Solution 2
OTHER TIPS
I suspect your client application is actually sending data in koi8-r or iso-8859-5 encoding, not utf-8, but your client_encoding is telling PostgreSQL to expect UTF-8.
Either convert the input data to utf-8, or change your client_encoding to match the input data.
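As a rough illustration of the first option, here is a minimal Python 3 sketch that re-encodes bytes into UTF-8 before they are sent to the server. The helper name transcode_to_utf8 and the sample bytes are hypothetical, and the sketch assumes you already know the real source encoding (KOI8-R here, purely as an example).

```python
def transcode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode raw bytes from source_encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# Hypothetical sample: two Cyrillic letters as a KOI8-R terminal would send them.
koi8_bytes = "пп".encode("koi8_r")

# The same text, now as a valid UTF-8 byte sequence.
utf8_bytes = transcode_to_utf8(koi8_bytes, "koi8_r")
print(koi8_bytes, "->", utf8_bytes)
```

For the second option, the setting can be changed per session instead, for example with SET client_encoding TO 'KOI8R'; in psql, again assuming KOI8-R really is what the client sends.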
Decoding your data with different encodings produces:
>>> print "\xd0\xd0".decode("utf-8")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/encodings/utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xd0 in position 0: invalid continuation byte
>>> print "\xd0\xd0".decode("koi8-r")
пп
>>> print "\xd0\xd0".decode("iso-8859-5")
аа
However, rather strangely, your input doesn't appear to contain any of these. I'm a bit puzzled as to what encoding would turn Пупкин Василий Иванович into the byte sequence \xd0\xd0, so this isn't fully explained yet. In fact, I can't find any encoding of Пупкин Василий Иванович that produces that byte sequence, so I'm wondering if there's some double-encoding or similar mangling going on. I'd need to know more about your environment to say more; see comments on the original question.
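One way to repeat that check yourself is the Python 3 sketch below. It encodes the name with a handful of common Cyrillic codecs and reports which ones, if any, contain the bytes \xd0\xd0. The list of candidate encodings and the helper name encodings_producing are my own choices for illustration, not an exhaustive search.

```python
NAME = "Пупкин Василий Иванович"
TARGET = b"\xd0\xd0"

# Assumed shortlist of encodings commonly used for Russian text.
CANDIDATE_ENCODINGS = ["utf-8", "koi8_r", "iso8859_5", "cp1251", "cp866", "mac_cyrillic"]


def encodings_producing(text, target, encodings):
    """Return the encodings under which `text` contains the byte sequence `target`."""
    hits = []
    for encoding in encodings:
        try:
            encoded = text.encode(encoding)
        except UnicodeEncodeError:
            # The text cannot be represented in this encoding at all.
            continue
        if target in encoded:
            hits.append(encoding)
    return hits


print(encodings_producing(NAME, TARGET, CANDIDATE_ENCODINGS))
```

For the codecs listed here the result is an empty list, which is consistent with the suspicion that the bytes were mangled somewhere between the terminal and the server.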
Workaround: Place your data in a UTF-8 encoded CSV file, then import it (\copy).
You could use Notepad++ (Encoding > Convert to UTF-8) to create the file.
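If you would rather script it than use an editor, here is a minimal Python 3 sketch that writes such a file. The file name teacher.csv, the temp-directory location, the helper write_utf8_csv and the id value 1 are assumptions made for the example.

```python
import csv
import os
import tempfile


def write_utf8_csv(path, rows):
    """Write rows to a CSV file that is explicitly encoded as UTF-8."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        csv.writer(handle).writerows(rows)


# Hypothetical output location and row, matching the teacher table's columns.
csv_path = os.path.join(tempfile.gettempdir(), "teacher.csv")
write_utf8_csv(csv_path, [(1, "Пупкин Василий Иванович", 15)])
print(csv_path)
```

The file could then be loaded from psql with something like \copy teacher from '/path/to/teacher.csv' with (format csv), adjusting the path to wherever the file was written.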