Question

I have a text fragment that was decoded with the wrong encoding: it was decoded as cp866, but it is actually UTF-8 ("нажал кабан на баклажан" --> "╨╜╨░╨╢╨░╨╗ ╨║╨░╨▒╨░╨╜ ╨╜╨░ ╨▒╨░╨║╨╗╨░╨╢╨░╨╜"). I'd like to repair it, and I've already written Python code that does this:

broken = "╨╜╨░╨╢╨░╨╗ ╨║╨░╨▒╨░╨╜ ╨╜╨░ ╨▒╨░╨║╨╗╨░╨╢╨░╨╜"
fixed = bytes(broken, 'cp866').decode('utf-8')
print(fixed) # it will print 'нажал кабан на баклажан'

However, I originally tried to solve this in D and could not find a way to do it. How can this be done in D?

Solution

At the moment, D does not have extensive native facilities for converting text between encodings.

Here are some options:

  • As ratchet freak mentioned, D does have std.encoding, but it does not cover many encodings at the moment.
  • On Windows, you could use std.windows.charset.fromMBSz and toMBSz, which wrap MultiByteToWideChar and WideCharToMultiByte.
  • You could simply embed the encodings that interest you in your program (example).
  • On POSIX, you could invoke the iconv program (example), or use the libiconv library (D1 binding).
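
To give an idea of how little the "embed the encoding" option involves, here is a minimal sketch. It is written in Python to match the snippet in the question, and it deliberately avoids Python's built-in cp866 codec so that only a lookup table and a loop remain, which is the part you would have to reproduce in D. The names CP866_HIGH, ENCODE and fix_mojibake are made up for this illustration, and the table is my transcription of the usual CP866 layout, so check it against a reference before relying on it.

```python
# Upper half of CP866 (bytes 0x80-0xFF) as Unicode characters.
# Assumption: transcribed by hand from the usual CP866 layout.
CP866_HIGH = (
    [chr(c) for c in range(0x0410, 0x0440)]    # 0x80-0xAF: А..п
    + list("░▒▓│┤╡╢╖╕╣║╗╝╜╛┐")                 # 0xB0-0xBF
    + list("└┴┬├─┼╞╟╚╔╩╦╠═╬╧")                 # 0xC0-0xCF
    + list("╨╤╥╙╘╒╓╫╪┘┌█▄▌▐▀")                 # 0xD0-0xDF
    + [chr(c) for c in range(0x0440, 0x0450)]  # 0xE0-0xEF: р..я
    + list("\u0401\u0451\u0404\u0454\u0407\u0457\u040e\u045e"
           "\u00b0\u2219\u00b7\u221a\u2116\u00a4\u25a0\u00a0")  # 0xF0-0xFF
)

# Reverse mapping: character -> CP866 byte value.
ENCODE = {ch: 0x80 + i for i, ch in enumerate(CP866_HIGH)}


def fix_mojibake(broken: str) -> str:
    """Re-encode text as CP866 bytes, then decode those bytes as UTF-8."""
    raw = bytearray()
    for ch in broken:
        if ord(ch) < 0x80:
            raw.append(ord(ch))     # ASCII is identical in CP866
        else:
            raw.append(ENCODE[ch])  # KeyError if not representable in CP866
    return raw.decode("utf-8")      # UnicodeDecodeError if not valid UTF-8


if __name__ == "__main__":
    print(fix_mojibake("╨╜╨░╨╢╨░╨╗ ╨║╨░╨▒╨░╨╜ ╨╜╨░ ╨▒╨░╨║╨╗╨░╨╢╨░╨╜"))
```

Since CP866 is a single-byte encoding, the table has only 128 entries. Text that was never mis-decoded will normally either contain characters outside the table or fail the final UTF-8 decode, so errors surface instead of producing new garbage silently.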
Licensed under: CC-BY-SA with attribution