Question

I have a set of images exported from MSSQL to a CSV file. The file size is 1 GB. The column datatype in MSSQL is image. When I try to import the file into a Postgres column of type bytea, an error occurs:

ERROR: invalid byte sequence for encoding "UTF8": 0xff
CONTEXT: COPY photo, line 1

When I look into the CSV file, the image data looks like this:

0xFFD8FFE000104A46494600010101006000600000FFE1...

My questions:

  1. What datatype in PostgreSQL can be used to import this type of data?
  2. How can I retrieve the image from this type of data using Postgres and PHP?

Solutions that I tried:

  1. I copied just three lines into a new CSV file and imported it into the photo table, and it succeeded. Strangely, importing the whole CSV file still fails with the error shown earlier.
  2. I tried https://stackoverflow.com/a/22211207/3602791 in my PHP code with a sample image and it worked, but when I tried to retrieve the three images that I had imported, it failed, reporting that the image contains errors.

http://pastebin.com/WrfjFqY6 is a sample line from the CSV. It has 2 columns, id and photo.

Does anyone know how to solve this? Thanks in advance.

Solution

As yenyen notes in the comments, the issue was that the input was UCS-2 (probably really UTF-16) encoded.

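UCS-2 is a two-byte-per-character encoding, so ASCII text stored in it contains a null byte for every character. If you tell PostgreSQL the file is UTF-8, it will see the input as garbage full of invalid UTF-8 sequences. The 0xff in the error message is consistent with the first byte of a UTF-16 little-endian byte order mark (FF FE) at the very start of the file. If you tell PostgreSQL it's a simple 1-byte encoding like latin1, PostgreSQL will see the zero (null) bytes and realise it's not latin1 after all.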
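
To make the failure mode concrete, here is a minimal Python sketch. It assumes the export is UTF-16 with a little-endian BOM, which is a guess based on the 0xff in the error message and not something confirmed in the question. The sample row and the names utf16_to_utf8 and convert_file are hypothetical, chosen only for illustration.

```python
import codecs

def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes (BOM included) as UTF-8 bytes."""
    # The "utf-16" codec reads the BOM to pick the byte order and drops it.
    return data.decode("utf-16").encode("utf-8")

def convert_file(src: str, dst: str) -> None:
    """Stream a UTF-16 file to UTF-8 line by line, so a large file
    does not have to be loaded into memory all at once."""
    with open(src, "r", encoding="utf-16", newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)

# Hypothetical row in the shape described in the question: id, hex photo.
row = "1,0xFFD8FFE0\r\n"
raw = codecs.BOM_UTF16_LE + row.encode("utf-16-le")

first_byte = raw[0]          # first byte of the BOM
has_nulls = b"\x00" in raw   # every ASCII character carries a zero byte

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False          # 0xff can never start a UTF-8 sequence

converted = utf16_to_utf8(raw)
```

After a conversion like this, the file should contain plain ASCII text, and a COPY that treats the input as UTF-8 should no longer hit the encoding error. Whether the 0x-prefixed hex text is then interpreted as the image bytes you expect is a separate question that this sketch does not address.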

The trick here is to examine the input file with an editor that can show the raw bytes, not just use a text editor that automagically reads the BOM and loads it as encoded text. If in doubt use a hex editor.
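
If no hex editor is at hand, a few lines of Python can serve the same purpose. This is only a sketch: the function name sniff_bom and the file name photo.csv are hypothetical, and a file without a BOM will simply be reported as "unknown" even if it is UTF-16.

```python
import codecs

def sniff_bom(head: bytes) -> str:
    """Guess an encoding from a byte order mark at the start of `head`.

    Returns a Python codec name, or "unknown" when no BOM is present.
    """
    # UTF-32 must be checked before UTF-16: the UTF-32 LE BOM (FF FE 00 00)
    # begins with the UTF-16 LE BOM (FF FE).
    boms = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF32_LE, "utf-32-le"),
        (codecs.BOM_UTF32_BE, "utf-32-be"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in boms:
        if head.startswith(bom):
            return name
    return "unknown"

def dump_head(path: str, count: int = 16) -> str:
    """Return the first `count` bytes of a file as space-separated hex."""
    with open(path, "rb") as f:   # binary mode: no decoding, no BOM handling
        return f.read(count).hex(" ")
```

For example, print(dump_head("photo.csv")) shows the raw leading bytes. Output beginning with ff fe, followed by alternating 00 bytes, points to UTF-16 little-endian input.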

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow