QString to Unicode std::string

Question 1

The below applies to Qt 5. Qt 4's behavior was different and, in practice, broken.

You need to choose:

Which type you want: the 8-bit wide std::string, std::wstring (16 or 32 bits wide, depending on the platform), or some other type.
Which encoding you want in the target string.

Internally, QString stores UTF-16 encoded data, so any Unicode code point may be represented in one or two QChars.
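
To illustrate the "one or two QChars" point, here is a minimal sketch in plain C++ (no Qt) of how a single code point maps to UTF-16 code units. The helper name `encodeUtf16` is hypothetical and only for illustration; it is not part of Qt.

```cpp
#include <cassert>
#include <string>

// Encode one Unicode code point as UTF-16 code units.
// Hypothetical helper for illustration only; not a Qt API.
std::u16string encodeUtf16(char32_t cp)
{
    // Precondition: a valid scalar value (no surrogates, at most U+10FFFF).
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));

    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: a single code unit.
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: split the 20-bit offset into a surrogate pair.
        const char32_t offset = cp - 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));
        out.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF)));
    }
    return out;
}
```

U+0041 ('A') yields one code unit, while U+1F600 yields the pair 0xD83D 0xDE00. A QString holding only U+1F600 therefore reports a size() of 2.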

Common cases:

Locally encoded 8-bit std::string (as in: system locale):
```
std::string(str.toLocal8Bit().constData())
```

UTF-8 encoded 8-bit std::string:

str.toStdString()

This is equivalent to:

std::string(str.toUtf8().constData())

UTF-16 or UCS-4 encoded std::wstring, 16 or 32 bits wide, respectively. Qt selects the encoding to match the platform's wchar_t width.
```
str.toStdWString()
```
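
Because the width of wchar_t varies, code that consumes the resulting std::wstring cannot assume one element per code point. The following is a rough sketch in plain C++ (no Qt); `codePointCount` is a hypothetical helper, and it assumes the input is well-formed.

```cpp
#include <cstddef>
#include <string>

static_assert(sizeof(wchar_t) == 2 || sizeof(wchar_t) == 4,
              "this sketch assumes a 16- or 32-bit wchar_t");

// Count code points in a wide string on either kind of platform.
// Hypothetical helper for illustration; assumes well-formed input.
std::size_t codePointCount(const std::wstring &s)
{
    std::size_t count = 0;
    for (wchar_t c : s) {
        // With a 16-bit wchar_t the string is UTF-16, so a low surrogate
        // is the second half of a pair and does not start a new code point.
        if (sizeof(wchar_t) == 2 && c >= 0xDC00 && c <= 0xDFFF)
            continue;
        ++count;
    }
    return count;
}
```

With this, L"\U0001F600" counts as one code point whether the platform stores it as one 32-bit element or as two 16-bit elements.
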
C++11 std::u16string or std::u32string, from Qt 5.5 onwards:
```
str.toStdU16String()
str.toStdU32String()
```
UTF-16 encoded 16-bit std::u16string - this hack is only needed up to Qt 5.4:
```
std::u16string(reinterpret_cast<const char16_t*>(str.constData()))
```
None of these conversions adds a byte order mark (BOM).

It's easy to prepend a BOM to the QString itself before converting it:

QString src = ...;
src.prepend(QChar::ByteOrderMark);
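#if QT_VERSION < QT_VERSION_CHECK(5,5,0)
// Parentheses, not braces: brace-initialization would reject the int -> size_t narrowing.
auto dst = std::u16string(reinterpret_cast<const char16_t*>(src.constData()),
                          static_cast<std::size_t>(src.size()));
#else
auto dst = src.toStdU16String();
#endif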

If you expect the strings to be large, you can skip one copy:

const QString src = ...;
std::u16string dst;
dst.reserve(src.size() + 1); // room for the BOM; std::u16string manages its own terminator
dst.push_back(char16_t(QChar::ByteOrderMark));
dst.append(reinterpret_cast<const char16_t*>(src.constData()),
           src.size());

In both cases, dst now starts with a BOM, so a reader on a system of either endianness can detect the byte order and decode it correctly.
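
As a rough sketch of why a leading BOM helps, the plain C++ code below (no Qt) writes UTF-16 code units as bytes in a chosen byte order and reads them back using only the BOM to pick the order. The names `ByteOrder`, `toBytes` and `fromBytes` are hypothetical, and the reader assumes the input starts with a BOM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

enum class ByteOrder { Little, Big };

// Serialize UTF-16 code units (including any BOM) in the given byte order.
std::vector<std::uint8_t> toBytes(const std::u16string &s, ByteOrder order)
{
    std::vector<std::uint8_t> bytes;
    bytes.reserve(s.size() * 2);
    for (char16_t unit : s) {
        const std::uint8_t hi = static_cast<std::uint8_t>(unit >> 8);
        const std::uint8_t lo = static_cast<std::uint8_t>(unit & 0xFF);
        if (order == ByteOrder::Big) {
            bytes.push_back(hi);
            bytes.push_back(lo);
        } else {
            bytes.push_back(lo);
            bytes.push_back(hi);
        }
    }
    return bytes;
}

// Decode bytes that start with a BOM; the BOM itself is dropped from the result.
std::u16string fromBytes(const std::vector<std::uint8_t> &bytes)
{
    assert(bytes.size() >= 2 && bytes.size() % 2 == 0);
    // FE FF means big-endian, FF FE means little-endian.
    const bool big = bytes[0] == 0xFE && bytes[1] == 0xFF;
    assert(big || (bytes[0] == 0xFF && bytes[1] == 0xFE));

    std::u16string out;
    for (std::size_t i = 2; i < bytes.size(); i += 2) {
        const char16_t a = bytes[i];
        const char16_t b = bytes[i + 1];
        out.push_back(static_cast<char16_t>(big ? (a << 8) | b : (b << 8) | a));
    }
    return out;
}
```

Whichever order the writer used, fromBytes recovers the same text, which is the property the BOM is meant to provide.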

Question 2

Use this:

QString Widen(const std::string &stdStr)
{
    return QString::fromUtf8(stdStr.data(), stdStr.size());
}

std::string Narrow(const QString &qtStr)
{
    QByteArray utf8 = qtStr.toUtf8();
    return std::string(utf8.data(), utf8.size());
}

This assumes that every std::string you pass around holds UTF-8.
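
Both functions pass the length explicitly instead of relying on a terminating null. A small sketch in plain C++ (no Qt) shows the difference when the data contains an embedded null character; the two helper names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Builds the string by scanning for the first '\0'; anything after it is lost.
std::string fromPointerOnly(const char *data)
{
    return std::string(data);
}

// Builds the string from an explicit length, so embedded '\0' bytes survive.
std::string fromPointerAndSize(const char *data, std::size_t size)
{
    return std::string(data, size);
}
```

For the five bytes "ab\0cd", the first helper produces a 2-character string and the second keeps all 5. The same consideration applies to conversions that construct a std::string from constData() alone.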

Question 3

You can get a UTF-16 encoded QByteArray from a QString using this:

QTextCodec *codec = QTextCodec::codecForName("UTF-16");
QTextEncoder *encoderWithoutBom = codec->makeEncoder(QTextCodec::IgnoreHeader);
QByteArray array = encoderWithoutBom->fromUnicode(str);
delete encoderWithoutBom; // the caller owns the encoder returned by makeEncoder()

Passing QTextCodec::IgnoreHeader makes the encoder omit the Unicode byte order mark (BOM) at the beginning.

You can copy it into a char * buffer like this:

int dataSize = array.size();
char *data = new char[dataSize]; // not null-terminated; release with delete[]
for (int i = 0; i < dataSize; i++)
{
    data[i] = array.at(i);
}

Or simply, without copying:

char *data = array.data(); // valid only while array is alive and unmodified
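
If the bytes need to outlive the QByteArray, an owning container avoids the manual new[] and delete[]. This is a minimal sketch in plain C++ (no Qt); `copyBytes` is a hypothetical helper that would be called with array.constData() and array.size().

```cpp
#include <cstddef>
#include <vector>

// Copy a raw byte range into a container that owns and frees its own storage.
// Hypothetical helper for illustration; the input need not be null-terminated.
std::vector<char> copyBytes(const char *data, std::size_t size)
{
    return std::vector<char>(data, data + size);
}
```

The vector's data() member then provides a char * that stays valid for as long as the vector itself does.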