Lisp: Need help getting correct behaviour from SBCL when converting octet stream to EUC-JP with malformed bytes

StackOverflow https://stackoverflow.com/questions/420300

  •  05-07-2019

Question

The following does not work in this particular case: SBCL complains that whatever value I pass to USE-VALUE is not a character.

(handler-bind ((sb-int:character-coding-error
                 #'(lambda (c)
                      (invoke-restart 'use-value #\?))))
    (sb-ext:octets-to-string *euc-jp* :external-format :euc-jp))

Here *euc-jp* is a variable containing the raw octets of EUC-JP-encoded text.

Instead of #\?, I have also tried #\KATAKANA_LETTER_NI and the empty string "". Nothing has worked so far.

Any help would be greatly appreciated!

EDIT: To reproduce *EUC-JP*, fetch http://blogs.yahoo.co.jp/akira_w0325/27287392.html using drakma.


Solution

There's an expression in SBCL 1.0.18's mb-util.lisp that looks like this:

(if code
    (code-char code)
    (decoding-error array pos (+ pos bytes) ,format
                    ',malformed pos))

I'm not very familiar with SBCL's internals, but this looks like a bug. The consequent returns a character, while the alternative returns a string: whatever you supply via USE-VALUE is always converted into a string by the STRING function (see the definition of DECODING-ERROR in octets.lisp). If the surrounding code expects a character from both branches, that would explain the complaint that the value is not a character.
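The type mismatch described above can be seen with standard Common Lisp alone. The following is only a sketch of the coercion step, not SBCL's actual decoder code; DESCRIBE-REPLACEMENT is a hypothetical helper introduced here for illustration.

```lisp
;; Standard Common Lisp only; no SBCL internals are called here.
;; DESCRIBE-REPLACEMENT is a hypothetical name, not part of SBCL.
(defun describe-replacement (value)
  "Coerce VALUE with STRING and report whether the result is a character or a string."
  (let ((coerced (string value)))
    (list coerced
          (if (characterp coerced) t nil)
          (if (stringp coerced) t nil))))

;; CODE-CHAR yields a character, like the success branch quoted above.
(characterp (code-char 63))     ; true

;; STRING turns a character into a one-element string ...
(describe-replacement #\?)      ; => ("?" NIL T)

;; ... and leaves a string unchanged, so "" is still not a character.
(describe-replacement "")       ; => ("" NIL T)
```

This matches what the question reports: #\?, a named character, and "" all end up as strings after the STRING call, so none of them can satisfy code that requires a character.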

OTHER TIPS

It works for me:

CL-USER> (handler-bind ((sb-int:character-coding-error
                         #'(lambda (c)
                             (declare (ignore c))
                             (invoke-restart 'use-value #\?))))
           (sb-ext:octets-to-string (make-array '(16)
                                                :element-type '(unsigned-byte 8)
                                                :initial-contents '#(181 65 217 66 164 67 181 217 164 223 164 222 164 185 161 163))
                                    :external-format :euc-jp))
"?A?B?C休みます。"

Might *euc-jp* be something other than a (vector (unsigned-byte 8))?
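One way to check this is to test the type before decoding. The sketch below assumes nothing about how *euc-jp* was produced; ENSURE-OCTETS is a hypothetical helper, not part of SBCL or drakma.

```lisp
;; ENSURE-OCTETS is a hypothetical helper written for this example.
(defun ensure-octets (data)
  "Return DATA as a vector of (unsigned-byte 8), or signal an error."
  (etypecase data
    ;; Already a specialized octet vector: use it unchanged.
    ((vector (unsigned-byte 8)) data)
    ;; A string means the text was decoded somewhere earlier,
    ;; so the original bytes are no longer available.
    (string
     (error "Expected octets but got a string; the data may already be decoded."))
    ;; Lists and general vectors of small integers can be coerced.
    ;; COERCE signals an error if an element is not an octet.
    (sequence
     (coerce data '(simple-array (unsigned-byte 8) (*))))))

;; Quick checks at the REPL:
(typep (ensure-octets '(181 65 217 66)) '(vector (unsigned-byte 8)))  ; true
(typep (ensure-octets (vector 164 185)) '(vector (unsigned-byte 8)))  ; true
```

With such a helper, the call from the question would become (sb-ext:octets-to-string (ensure-octets *euc-jp*) :external-format :euc-jp). If ENSURE-OCTETS signals an error for a string, the fetched data was probably decoded before it reached this point.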

Licensed under: CC-BY-SA with attribution