Bytes decoding in D

https://stackoverflow.com/questions/21090354

27-09-2022

Question

I've got a text fragment that was decoded incorrectly. It was decoded as cp866, but it is actually UTF-8 ("нажал кабан на баклажан" --> "╨╜╨░╨╢╨░╨╗ ╨║╨░╨▒╨░╨╜ ╨╜╨░ ╨▒╨░╨║╨╗╨░╨╢╨░╨╜"). I'd like to fix it, and I've already written Python code that solves the task:

broken = "╨╜╨░╨╢╨░╨╗ ╨║╨░╨▒╨░╨╜ ╨╜╨░ ╨▒╨░╨║╨╗╨░╨╢╨░╨╜"
fixed = bytes(broken, 'cp866').decode('utf-8')
print(fixed) # it will print 'нажал кабан на баклажан'

However, I first tried to solve this in D and couldn't find a way to do it. So, how can this task be solved in D?

Solution

At the moment, D does not have extensive native facilities for converting text between encodings.

Here are some options:

As ratchet freak mentioned, D does have std.encoding, but it does not cover many encodings at the moment.
On Windows, you could use std.windows.charset.fromMBSz and toMBSz, which wrap MultiByteToWideChar and WideCharToMultiByte.
You could simply embed the encodings that interest you in your program.
On POSIX, you could invoke the iconv program, or use the libiconv library (a D1 binding exists).
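
To make the "embed the encoding" option concrete, here is a minimal sketch of the idea. It is written in Python to match the snippet in the question, but it deliberately avoids Python's built-in cp866 codec for the reverse mapping, so the same table-and-loop approach should port to D fairly directly. The names CP866_HIGH, CHAR_TO_BYTE, to_cp866_bytes and fix_mojibake are made up for this illustration and are not part of any library.

```python
# Unicode characters for cp866 bytes 0x80..0xFF, in byte order (128 entries).
# This table is the "embedded encoding"; bytes below 0x80 are plain ASCII.
CP866_HIGH = (
    "АБВГДЕЖЗИЙКЛМНОП"        # 0x80-0x8F
    "РСТУФХЦЧШЩЪЫЬЭЮЯ"        # 0x90-0x9F
    "абвгдежзийклмноп"        # 0xA0-0xAF
    "░▒▓│┤╡╢╖╕╣║╗╝╜╛┐"        # 0xB0-0xBF
    "└┴┬├─┼╞╟╚╔╩╦╠═╬╧"        # 0xC0-0xCF
    "╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀"        # 0xD0-0xDF
    "рстуфхцчшщъыьэюя"        # 0xE0-0xEF
    "ЁёЄєЇїЎў°∙·√№¤■\u00a0"   # 0xF0-0xFF (last entry is a no-break space)
)

# Reverse lookup: character -> the cp866 byte it came from.
CHAR_TO_BYTE = {ch: 0x80 + i for i, ch in enumerate(CP866_HIGH)}


def to_cp866_bytes(text):
    """Map each character back to the cp866 byte it was decoded from."""
    out = bytearray()
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ord(ch))           # ASCII is identical in cp866
        else:
            out.append(CHAR_TO_BYTE[ch])  # KeyError if not a cp866 character
    return bytes(out)


def fix_mojibake(broken):
    """Undo a wrong cp866 decode of data that was really UTF-8."""
    return to_cp866_bytes(broken).decode("utf-8")


print(fix_mojibake("╨╜╨░╨╢╨░╨╗ ╨║╨░╨▒╨░╨╜ ╨╜╨░ ╨▒╨░╨║╨╗╨░╨╢╨░╨╜"))
# нажал кабан на баклажан
```

In D, the recovered bytes would presumably be reinterpreted as a UTF-8 string and validated rather than decoded, since D strings are already UTF-8; that last step is an assumption about how the port would look, not something shown here.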

Licensed under: CC-BY-SA with attribution