The below applies to Qt 5. Qt 4's behavior was different and, in practice, broken.
You need to choose:

- Whether you want the 8-bit wide `std::string`, the wide `std::wstring` (16 or 32 bits per character, depending on the platform), or some other type.
- Which encoding you want in the target string.
Internally, `QString` stores UTF-16 encoded data, so any Unicode code point may be represented in one or two `QChar`s.
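To make the "one or two code units" point concrete, here is a minimal sketch in plain C++ (no Qt involved) of how a single code point maps to UTF-16. The helper name `encodeUtf16` is made up for illustration and is not part of Qt or the standard library.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: encodes one Unicode code point as UTF-16 code units.
std::u16string encodeUtf16(char32_t cp)
{
    // Surrogate values and anything above U+10FFFF are not valid code points to encode.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not a Unicode scalar value");

    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: a single 16-bit unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: a high and a low surrogate.
        const char32_t v = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
    }
    return out;
}
```

For example, U+0041 yields one unit, while U+1F600 yields the pair 0xD83D, 0xDE00. A `QString` holding such a character reports a size of 2 for it, since its size is counted in `QChar`s.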
Common cases:
- Locally encoded 8-bit `std::string` (as in: system locale):

      std::string(str.toLocal8Bit().constData())
- UTF-8 encoded 8-bit `std::string`:

      str.toStdString()

  This is equivalent to:

      std::string(str.toUtf8().constData())
- UTF-16 or UCS-4 encoded `std::wstring`, 16 or 32 bits wide, respectively. Qt selects the 16- or 32-bit encoding to match the platform's width of `wchar_t`:

      str.toStdWString()
- C++11 `std::u16string` or `std::u32string`, from Qt 5.5 onwards:

      str.toStdU16String()
      str.toStdU32String()
- UTF-16 encoded 16-bit `std::u16string`; this hack is only needed up to Qt 5.4:

      std::u16string(reinterpret_cast<const char16_t*>(str.constData()))
This encoding does not include byte order marks (BOMs).
It's easy to prepend a BOM to the `QString` itself before converting it:
    QString src = ...;
    src.prepend(QChar::ByteOrderMark);
    #if QT_VERSION < QT_VERSION_CHECK(5,5,0)
    // Qt 5.4 and earlier: copy the UTF-16 code units directly.
    auto dst = std::u16string(reinterpret_cast<const char16_t*>(src.constData()),
                              static_cast<std::size_t>(src.size()));
    #else
    auto dst = src.toStdU16String();
    #endif
If you expect the strings to be large, you can skip one copy:
    const QString src = ...;
    std::u16string dst;
    dst.reserve(src.size() + 1); // BOM + contents; std::u16string manages its own terminator
    dst.push_back(char16_t(QChar::ByteOrderMark));
    dst.append(reinterpret_cast<const char16_t*>(src.constData()),
               static_cast<std::size_t>(src.size()));
In both cases, `dst` is now portable to systems with either endianness.
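The portability comes from the reader being able to tell the byte order from the BOM. As a rough sketch of the receiving side, assuming the data arrives as raw bytes, something like the following would work in plain C++. The function name `decodeUtf16WithBom` is hypothetical, not a Qt or standard library API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: decodes raw bytes that begin with a UTF-16 BOM
// into native-endian code units, with the BOM itself stripped.
std::u16string decodeUtf16WithBom(const std::vector<unsigned char>& bytes)
{
    if (bytes.size() < 2 || bytes.size() % 2 != 0)
        throw std::invalid_argument("expected an even number of bytes, BOM included");

    // U+FEFF serialized as FE FF means big-endian, FF FE means little-endian.
    bool bigEndian;
    if (bytes[0] == 0xFE && bytes[1] == 0xFF)
        bigEndian = true;
    else if (bytes[0] == 0xFF && bytes[1] == 0xFE)
        bigEndian = false;
    else
        throw std::invalid_argument("missing byte order mark");

    std::u16string out;
    out.reserve(bytes.size() / 2 - 1);
    for (std::size_t i = 2; i < bytes.size(); i += 2) {
        const unsigned hi = bigEndian ? bytes[i] : bytes[i + 1];
        const unsigned lo = bigEndian ? bytes[i + 1] : bytes[i];
        out.push_back(static_cast<char16_t>((hi << 8) | lo));
    }
    return out;
}
```

Because the code units are assembled with shifts rather than by reinterpreting memory, the result does not depend on the endianness of the machine doing the decoding.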