Question

Currently, I have the character ° (a degree sign) and I need to convert it to its Unicode escape, \u00B0. I noticed that there is a library called ICU for C/C++, but do I need to use it? My input is encoded as ISO/IEC 8859-1.

Does the standard C++ library already implement this kind of decode function, or is ICU needed for such operations?

If there is a function I can call on a character such as °, please point me to it or write up a quick example. :)

EDIT: I loop over an entire line, and whenever I see a special character (that is, any character that isn't a letter, a digit, '-', or ' '), I print the character that failed those tests.

I get output like \303, which is the octal form of the special character's byte. Here's the code I use for the tests:

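// Cast to unsigned char first: passing a negative char to isalpha/isdigit is undefined behavior.
unsigned char c = static_cast<unsigned char>(aline[i+1]);
if (isalpha(c) || isdigit(c) || c == '-' || c == ' ')
   regionName.push_back(aline[i+1]);
else
   cout << aline[i+1] << endl;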

So when the else branch runs, I get octal output by default. How would I change that to Unicode format?

Example output:

\303
\203
\302

Solution 2

Welp, here's the answer I needed :) Works great!!

Include the following headers:

#include <sstream>
#include <iomanip>
#include <string>

Then pass any string you like to the function; it will escape every character that is 'special', meaning every byte above 127.

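// Escapes every byte above 0x7F as \uXXXX. This is only correct for
// ISO/IEC 8859-1 input, where each byte value equals its Unicode code point.
static std::string EncodeNonASCIICharacters(const std::string& value)
{
    std::ostringstream stringBuilder;
    // Set formatting once: hex and the fill character persist, setw does not.
    stringBuilder << std::hex << std::setfill('0');

    for (std::string::size_type i = 0; i < value.length(); i++)
    {
        // Go through unsigned char so bytes above 127 do not become negative.
        unsigned int character = static_cast<unsigned char>(value[i]);
        if (character > 127)
        {
            stringBuilder << "\\u" << std::setw(4) << character;
        } else {
            stringBuilder << value[i];
        }
    }

    return stringBuilder.str();
}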
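
The function above emits one escape per byte, which fits ISO/IEC 8859-1. The octal bytes \303 \203 \302 shown in the question look like UTF-8 lead and continuation bytes, so the input may in fact be UTF-8; that is only a guess from the output. If so, a multi-byte sequence has to be decoded into a single code point before escaping. Below is a minimal sketch of that variant. The name EscapeUtf8NonAscii is hypothetical, the function assumes mostly well-formed UTF-8, and it does not reject overlong encodings.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original answer): escapes non-ASCII
// characters of a UTF-8 string as \uXXXX, or \UXXXXXXXX above U+FFFF.
// A stray or truncated byte is escaped as U+FFFD. Overlong forms are not checked.
std::string EscapeUtf8NonAscii(const std::string& utf8)
{
    std::ostringstream out;
    out << std::hex << std::setfill('0');

    std::string::size_type i = 0;
    while (i < utf8.size())
    {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        if (lead < 0x80)
        {
            out << utf8[i]; // plain ASCII passes through unchanged
            ++i;
            continue;
        }

        // The lead byte says how many continuation bytes follow.
        int extra = 0;
        std::uint32_t cp = 0xFFFD; // default for an invalid lead byte
        if ((lead & 0xE0) == 0xC0)      { extra = 1; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { extra = 3; cp = lead & 0x07; }
        ++i;

        for (int k = 0; k < extra; ++k)
        {
            // Each continuation byte must look like 10xxxxxx.
            if (i >= utf8.size() ||
                (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80)
            {
                cp = 0xFFFD;
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
            ++i;
        }

        if (cp <= 0xFFFF)
            out << "\\u" << std::setw(4) << cp;
        else
            out << "\\U" << std::setw(8) << cp;
    }
    return out.str();
}
```

With this sketch, the UTF-8 bytes C2 B0 (the degree sign) come out as \u00b0, the same result the Latin-1 version gives for the single byte B0.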

OTHER TIPS

There are three basic things to deal with when it comes to Unicode:

  1. reading characters
  2. storing characters in memory
  3. writing/displaying characters

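Unicode applications often store strings as wide characters. For points 1 and 3, plain C++ offers very little. For point 2, the standard C++ library offers the class std::wstring, but the size of its wchar_t elements is platform-dependent: 2 bytes on Windows (UTF-16) and typically 4 bytes on Linux and macOS. C++11 added std::u16string and std::u32string, whose code units have a fixed width.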
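
As a rough illustration of points 1 and 2 for the asker's case, the sketch below decodes ISO/IEC 8859-1 bytes into a std::u32string (C++11). It relies on the fact that every Latin-1 byte value equals its Unicode code point, so no lookup table is needed. The name Latin1ToUtf32 is made up for this example; other source encodings would need a real converter such as ICU.

```cpp
#include <string>

// Hypothetical helper: decodes ISO/IEC 8859-1 bytes into UTF-32 code points.
// Valid only for Latin-1 input, where byte value == Unicode code point.
std::u32string Latin1ToUtf32(const std::string& latin1)
{
    std::u32string result;
    result.reserve(latin1.size());
    for (char byte : latin1)
    {
        // Convert via unsigned char so bytes above 127 do not sign-extend.
        result.push_back(static_cast<char32_t>(static_cast<unsigned char>(byte)));
    }
    return result;
}
```

Each element of the result is one code point, so the degree sign (byte 0xB0) is stored as the single value 0x00B0.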

If you say "I have the char", what do you mean by that? Do you have it in a file? Do you read it from the console? In both cases you have to know the encoding of your input source.

When displaying the character, you have to be sure that your GUI library can handle Unicode.

So the basic steps in pseudo-code are:

 const char* myData = "some local-encoding data";
 MyUnicodeCapableStringClass myString = MyUnicodeCapableStringClass::fromSomeLocalEncoding( myData );
 MyUnicodeCapableGuiTextControl.setText( myString );
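
For the specific case in the question, the "from local encoding" step can be sketched with the standard library alone, assuming the input really is ISO/IEC 8859-1 and the display side expects UTF-8 (both are assumptions). The helper name Latin1ToUtf8 is hypothetical; a library such as ICU or Qt would replace it for any other encoding.

```cpp
#include <string>

// Hypothetical helper: re-encodes ISO/IEC 8859-1 bytes as UTF-8.
// Latin-1 covers U+0000..U+00FF, so each character needs at most two bytes.
std::string Latin1ToUtf8(const std::string& latin1)
{
    std::string utf8;
    utf8.reserve(latin1.size());
    for (char byte : latin1)
    {
        const unsigned char c = static_cast<unsigned char>(byte);
        if (c < 0x80)
        {
            utf8.push_back(byte); // ASCII is identical in both encodings
        }
        else
        {
            // Two-byte form: 110000xx 10xxxxxx
            utf8.push_back(static_cast<char>(0xC0 | (c >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return utf8;
}
```

The result can then be handed to whatever output or GUI layer accepts UTF-8; the Latin-1 degree sign (byte B0) becomes the two bytes C2 B0.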

Knowing this, you should find the relevant code example in the ICU documentation faster, I hope. I was not aware of ICU until now. (I'm using Qt, which has had Unicode support built in since 1998.)

Licensed under: CC-BY-SA with attribution