Question

I know there is plenty of information about converting QString to char*, but I still need some clarification on one point.

Qt provides QTextCodec to convert a QString (which internally stores characters as Unicode) to a QByteArray, allowing me to retrieve a char* that represents the string in some non-Unicode encoding. But what should I do when I want to get a Unicode QByteArray?

QTextCodec* codec = QTextCodec::codecForName("UTF-8");
QString qstr = codec->toUnicode("Юникод");
std::string stdstr(reinterpret_cast<const char*>(qstr.constData()), qstr.size() * 2 );  // * 2 because each QChar is two bytes wide
qDebug() << QString(reinterpret_cast<const QChar*>(stdstr.c_str()), stdstr.size() / 2); // same

The above code prints "Юникод" as I expected. But I'd like to know whether this is the right way to get at the Unicode char* of a QString. In particular, the reinterpret_casts and size arithmetic in this technique look pretty ugly.


Solution

The below applies to Qt 5. Qt 4's behavior was different and, in practice, broken.

You need to choose:

  1. Whether you want the 8-bit wide std::string or 16-bit wide std::wstring, or some other type.

  2. What encoding is desired in your target string?

Internally, QString stores UTF-16 encoded data, so any Unicode code point may be represented in one or two QChars.
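
To make the "one or two QChars" point concrete, here is a minimal sketch in plain standard C++ (no Qt). The helper name encodeUtf16 is hypothetical, and each char16_t stands in for one QChar. It assumes the input is a valid code point (not a surrogate, not above U+10FFFF) and does no validation.

```cpp
#include <string>

// Hypothetical helper (not a Qt API): encode one Unicode code point as UTF-16.
// Each char16_t here plays the role of one QChar.
std::u16string encodeUtf16(char32_t codePoint)
{
    std::u16string out;
    if (codePoint < 0x10000) {
        // Basic Multilingual Plane: a single 16-bit code unit.
        out.push_back(static_cast<char16_t>(codePoint));
    } else {
        // Supplementary planes: split into a high and a low surrogate.
        const char32_t v = codePoint - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}
```

With this sketch, U+042E (the letter Ю) yields one code unit, while U+1F600 yields the surrogate pair 0xD83D 0xDE00. This is why the length of a QString in QChars can exceed its length in code points.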

Common cases:

  • Locally encoded 8-bit std::string (as in: system locale):

    std::string(str.toLocal8Bit().constData())
    
  • UTF-8 encoded 8-bit std::string:

    str.toStdString()
    

    This is equivalent to:

    std::string(str.toUtf8().constData())
    
  • UTF-16 or UCS-4 encoded std::wstring, 16- or 32 bits wide, respectively. The selection of 16- vs. 32-bit encoding is done by Qt to match the platform's width of wchar_t.

    str.toStdWString()
    
  • C++11 std::u16string or std::u32string, from Qt 5.5 onwards:

    str.toStdU16String()
    str.toStdU32String()
    
  • UTF-16 encoded 16-bit std::u16string - this hack is only needed up to Qt 5.4:

    std::u16string(reinterpret_cast<const char16_t*>(str.constData()))
    

    This encoding does not include byte order marks (BOMs).

It's easy to prepend a BOM to the QString itself before converting it:

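QString src = ...;
src.prepend(QChar::ByteOrderMark);
#if QT_VERSION < QT_VERSION_CHECK(5,5,0)
// Parentheses, not braces: list-initialization would reject the int -> size_t narrowing.
auto dst = std::u16string(reinterpret_cast<const char16_t*>(src.constData()),
                          static_cast<std::size_t>(src.size()));
#else
auto dst = src.toStdU16String();
#endif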

If you expect the strings to be large, you can skip one copy:

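const QString src = ...;
std::u16string dst;
dst.reserve(src.size() + 1); // BOM + text; std::u16string adds its own terminator
dst.push_back(char16_t(QChar::ByteOrderMark)); // append() has no single-character overload
dst.append(reinterpret_cast<const char16_t*>(src.constData()),
           src.size()); // no +1: copying the trailing NUL would embed it in dst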

In both cases, dst now starts with a BOM, so a reader on a system of either endianness can detect the byte order.
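
How a receiver might use that BOM can be sketched in plain standard C++. The helper decodeUtf16Bytes is hypothetical and not part of Qt. It assumes the raw bytes normally start with a BOM and falls back to big-endian when none is present, which is a choice made for this illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical receiver-side helper: rebuild UTF-16 code units from raw bytes,
// using a leading BOM (U+FEFF) to detect the sender's byte order.
// Assumption: without a BOM, the bytes are treated as big-endian.
std::u16string decodeUtf16Bytes(const std::vector<std::uint8_t>& bytes)
{
    bool littleEndian = false;
    std::size_t pos = 0;
    if (bytes.size() >= 2) {
        if (bytes[0] == 0xFF && bytes[1] == 0xFE) {
            littleEndian = true;  // BOM was written low byte first
            pos = 2;
        } else if (bytes[0] == 0xFE && bytes[1] == 0xFF) {
            pos = 2;              // BOM was written high byte first
        }
    }

    std::u16string out;
    out.reserve((bytes.size() - pos) / 2);
    // Combine byte pairs into code units; a trailing odd byte is ignored.
    for (; pos + 1 < bytes.size(); pos += 2) {
        const unsigned lo = littleEndian ? bytes[pos] : bytes[pos + 1];
        const unsigned hi = littleEndian ? bytes[pos + 1] : bytes[pos];
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}
```

The byte sequences FF FE 2E 04 and FE FF 04 2E both decode to the single code unit U+042E, and the BOM itself is consumed rather than returned.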

Other tips

Use this:

QString Widen(const std::string &stdStr)
{
    return QString::fromUtf8(stdStr.data(), stdStr.size());
}

std::string Narrow(const QString &qtStr)
{
    QByteArray utf8 = qtStr.toUtf8();
    return std::string(utf8.data(), utf8.size());
}

In all cases, the std::string should hold UTF-8.
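
Narrow passes utf8.size() explicitly instead of relying on a terminating NUL. The sketch below, in plain standard C++, shows why that matters when the data contains an embedded '\0'. Both function names are hypothetical, and data/size stand in for QByteArray::data() and QByteArray::size().

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-ins for the two construction styles.

// Builds the string from a pointer alone: copying stops at the first '\0'.
std::string fromPointerOnly(const char* data)
{
    return std::string(data);
}

// Builds the string from pointer and length: copies exactly `size` bytes,
// including any embedded '\0'.
std::string fromPointerAndSize(const char* data, std::size_t size)
{
    return std::string(data, size);
}
```

For the three bytes 'a', '\0', 'b', the first function returns a string of length 1 and the second a string of length 3.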

You can get a UTF-16 encoded QByteArray from a QString using this:

QTextCodec *codec = QTextCodec::codecForName("UTF-16");
QTextEncoder *encoderWithoutBom = codec->makeEncoder( QTextCodec::IgnoreHeader );
QByteArray array  = encoderWithoutBom->fromUnicode( str );
delete encoderWithoutBom; // makeEncoder() returns an object the caller must delete

This way the Unicode byte order mark (BOM) is omitted from the beginning of the output.
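
To show what "UTF-16 bytes without a BOM" looks like at the byte level, here is a minimal sketch in plain standard C++. The helper toUtf16LeBytes is hypothetical, and it fixes the byte order to little-endian as an assumption for the illustration. It does not claim to reproduce the byte order Qt's codec picks.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: write UTF-16 code units as raw bytes in little-endian
// order, with no BOM at the front.
std::vector<std::uint8_t> toUtf16LeBytes(const std::u16string& text)
{
    std::vector<std::uint8_t> bytes;
    bytes.reserve(text.size() * 2);  // two bytes per code unit
    for (char16_t unit : text) {
        bytes.push_back(static_cast<std::uint8_t>(unit & 0xFF));         // low byte first
        bytes.push_back(static_cast<std::uint8_t>((unit >> 8) & 0xFF));  // then high byte
    }
    return bytes;
}
```

For the single code unit U+042E this produces the two bytes 2E 04. Without a BOM, the reader has to know the byte order by some other means.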

You can copy it into a char* buffer like this:

int dataSize=array.size();
char * data= new char[dataSize]; // not null-terminated; release with delete[] data
for(int i=0;i<dataSize;i++)
{
    data[i]=array[i];
}

Or simply, without copying:

char *data = array.data(); // valid only while array is alive and unmodified
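
Both variants leave lifetime management to the caller: the new[] buffer must be freed by hand, and the pointer from array.data() dangles once array goes away. One possible alternative, sketched in plain standard C++ with a hypothetical helper name, is to copy the bytes into a std::vector<char> that owns them.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: take an owning copy of `size` bytes starting at `data`.
// The returned vector frees its storage automatically.
std::vector<char> copyBytes(const char* data, std::size_t size)
{
    return std::vector<char>(data, data + size);
}
```

With a QByteArray, the call would presumably look like copyBytes(array.constData(), array.size()), and the vector's data() then gives a char* that stays valid for as long as the vector does.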
License: CC-BY-SA with attribution
Not affiliated with StackOverflow