Question

I have a Haskell application that, as one of many steps, needs to store and retrieve raw binary blob data in a database. I'm not entirely opposed to storing that data in plain disk files instead, but that leads to an additional round of permissions issues, so for now I want to go with the database.

I've created a table with a column of type bytea.

I have a lazy ByteString in memory.

When I make a call like this

run conn "INSERT INTO documents VALUES (?)" [toSql $ rawData mydoc]

postgres gets a bit angry at the data. The exact error message is

invalid byte sequence for encoding "UTF8": 0xcf72

I also know beyond doubt that I have NUL values in the data stream. So, with all of that in mind, what is the correct way to encode the data safely for insertion?


Updated

Here is the description for my table

db=> \d+ documents
                          Table "public.documents"
     Column      |            Type             | Modifiers | Storage  | Description 
-----------------+-----------------------------+-----------+----------+-------------
 id              | character varying(16)       | not null  | extended | 
 importtime      | timestamp without time zone | not null  | plain    | 
 filename        | character varying(255)      | not null  | extended | 
 data            | bytea                       | not null  | extended | 
 recordcount     | integer                     | not null  | plain    | 
 parsesuccessful | boolean                     | not null  | plain    | 
Indexes:
    "documents_pkey" PRIMARY KEY, btree (id)

This is the full text of a module that demonstrates the current problem I'm having after adding jamsdidh's code. My error message has changed from the encoding problem above to "invalid input syntax for type bytea".

module DBMTest where

import qualified Data.Time.Clock as Clock
import Database.HDBC.PostgreSQL
import Database.HDBC
import Data.ByteString.Internal
import Data.ByteString hiding (map)
import Data.Char
import Data.Word8
import Numeric

exampleData = pack ([0..255] :: [Word8]) :: ByteString  -- every byte value, including NUL; 65536 would overflow Word8

safeEncode :: ByteString -> ByteString
safeEncode x = pack (convert' =<< unpack x)
    where
    convert' :: Word8 -> [Word8]
    convert' 92 = [92, 92]
    convert' x | x >= 32 && x < 128 = [x]
    convert' x = 92:map c2w (showIntAtBase 8 intToDigit x "")

runTest = do
    conn <- connectPostgreSQL "dbname=db"
    t <- Clock.getCurrentTime
    withTransaction conn
        (\conn -> run conn
            "INSERT INTO documents (id, importTime, filename, data, recordCount, parseSuccessful) VALUES (?, ?, ?, ?, ?, ?)"
            [toSql (15 :: Int),
             toSql t,
             toSql ("Demonstration data" :: String),
             toSql $ safeEncode exampleData,
             toSql (15 :: Int),
             toSql (True :: Bool)])

Solution

I believe this is a bug in HDBC-postgresql. I can explain why I think so, and can give you a workaround that I put together and tested.


I would expect HDBC-postgresql to convert a bytestring to the appropriate format for insertion, but you can quickly verify that it instead expects the bytestring to already hold the octal-backslash-escaped form of the data. For instance,

run conn "INSERT INTO documents VALUES (?)" [toSql $ B.pack [92, 0x31, 0x30, 0x31]]

inserts the single character 'A' into the database! This only makes sense once you realize that [92, 0x31, 0x30, 0x31] is the ASCII representation of "\101", and "\101" is the octal escape for 'A'. Because the octal-backslash-escaped format passes values in the range 32-127 straight through (see the link Richard Huxton supplied in the comments for the details), the insert query does in fact work properly for standard English text, and the bug could easily have gone unnoticed.
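You can check the octal arithmetic in GHCi (a quick sanity check, nothing HDBC-specific):

ghci> 0o101            -- Haskell's literal syntax for octal 101
65
ghci> toEnum 65 :: Char
'A'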

run conn "INSERT INTO documents VALUES (?)" [toSql $ B.pack [65]]

also inserts 'A'. Values above 127 aren't guaranteed to round-trip, because they are interpreted according to the character encoding in use. If you look at the HDBC-postgresql code, or at the logs of a query, you can see that it sets the variable 'client_encoding' to utf8. The data coming in from the bytestring is therefore expected to be valid UTF-8, and PostgreSQL complains when it sees a byte sequence that cannot occur in UTF-8.
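That is exactly what the 0xcf72 in your error message is: 0xCF opens a two-byte UTF-8 sequence, so the next byte must be a continuation byte in the range 0x80-0xBF, and 0x72 is not. You can reproduce the failure outside the database with the text package (purely an illustration, not part of the fix):

import qualified Data.ByteString as B
import qualified Data.Text.Encoding as TE

-- decodeUtf8' returns Left (a UnicodeException) on invalid input,
-- for the same reason the server rejects these two bytes
main :: IO ()
main = print (TE.decodeUtf8' (B.pack [0xCF, 0x72]))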

The proper fix would be to wait for the HDBC-postgresql maintainers to fix the bug, but in the meantime you can use this code as a workaround:

import qualified Data.ByteString as B
import Data.ByteString.Internal (c2w)
import Data.Word (Word8)
import Text.Printf (printf)

-- Escape a raw ByteString into PostgreSQL's bytea "escape" input format:
-- backslash is doubled, printable ASCII passes through unchanged, and
-- every other byte becomes a backslash followed by three octal digits.
convert :: B.ByteString -> B.ByteString
convert x = B.pack (convert' =<< B.unpack x)
  where
    convert' :: Word8 -> [Word8]
    convert' 92 = [92, 92]                      -- backslash must be doubled
    convert' x | x >= 32 && x < 128 = [x]       -- printable ASCII as-is
    convert' x = 92 : map c2w (printf "%03o" x) -- e.g. 0 becomes "\000"

Note the "%03o" format: the bytea escape syntax requires exactly three octal digits, which is why the unpadded digits produced by showIntAtBase in your updated module triggered the "invalid input syntax for type bytea" error. With convert in place, you can just use

run conn "INSERT INTO documents VALUES (?)" [toSql $ convert $ rawData mydoc]
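Reading the data back involves the same representation in reverse: if the server returns bytea in the escape output format (servers before PostgreSQL 9.0 always did; 9.0 and later default to bytea_output = 'hex'), the column comes back as escaped text and you need the inverse transformation. Here is a sketch of a matching decoder under that assumption; unconvert and octalValue are names I made up, not part of HDBC:

unconvert :: B.ByteString -> B.ByteString
unconvert = B.pack . go . B.unpack
  where
    go :: [Word8] -> [Word8]
    go (92:92:rest)    = 92 : go rest                -- "\\" -> one backslash
    go (92:a:b:c:rest) = octalValue a b c : go rest  -- "\nnn" -> one byte
    go (x:rest)        = x : go rest                 -- everything else as-is
    go []              = []
    -- combine three ASCII octal digits into a byte, e.g. "101" -> 65
    octalValue :: Word8 -> Word8 -> Word8 -> Word8
    octalValue a b c = (digit a * 8 + digit b) * 8 + digit c
      where digit w = w - 48                         -- '0' is ASCII 48

The two functions are inverses, so unconvert (convert exampleData) gives back exampleData unchanged.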
Licensed under: CC-BY-SA with attribution