What to use to store Unicode (UTF-16) strings? (C++11)

Question

The encoding used for storing the data outside the program is the only one that matters.

That data is likely to be used from other software. Someone will want to write those strings and they'll probably use some kind of specialised editor or gasp a general-purpose text editor. UTF-8 has much better support from other software than UTF-16, and that's what I would recommend and why.

Inside the program, what encoding you use doesn't matter, as long as you do it consistently and don't mix them up in stupid ways.

Obviously, if you use the same encoding inside the program as you do outside of it, you don't need to perform any conversions and the risk of mixing them up and producing mojibake is not there.

The thing with pugixml using wchar_t is that the encoding it uses then depends on the size of wchar_t. If the size is 2, it uses UTF-16; if the size is 4 it uses UTF-32. pugixml also has the option to use UTF-8 with char by setting the PUGIXML_WCHAR_MODE macro appropriately, so you can use that instead.

If you use wchar_t API, stick to wstring. Remember: since we're inside the program, it doesn't matter if it's going to be UTF-16 or UTF-32, as long as we're consistent. If you use the char API, stick to string. You could, I guess, perform conversions from wchar_t to char16_t and use u16strings, but that wouldn't give much benefit.

The saving and loading functions in pugixml take an xml_encoding parameter that lets you pick what encoding will be on the data outside the program, and that doesn't have to match what you use internally. Pick whichever you find the most convenient.