Question

How can I determine the length (number of characters) of a std::wstring?

Using myStr.length() gives the byte size (I think), but it's not the number of characters. Do I need to write my own function to find the number of characters, or is there a native C++ way or a native WinAPI way?

Solution

std::wstring::length() returns the number of characters, not the byte size, where a character is defined as the atomic unit of the wstring object, i.e. a wchar_t. This is what the Standard means when it refers to characters (see this post for some more details on the use of the word in the Standard).
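To make the difference between unit count and byte size concrete, here is a minimal sketch. The helper names unit_count and byte_count are made up for illustration and are not part of the standard library.

```cpp
#include <cstddef>
#include <string>

// Number of wchar_t units in the string; this is exactly what
// length() (and size()) report.
// Hypothetical helper name, for illustration only.
std::size_t unit_count(const std::wstring& s) {
    return s.length();
}

// Storage taken by those units in bytes, excluding the terminator.
// sizeof(wchar_t) is commonly 2 on Windows and 4 on Linux.
// Hypothetical helper name, for illustration only.
std::size_t byte_count(const std::wstring& s) {
    return s.length() * sizeof(wchar_t);
}
```

For L"hello", unit_count gives 5 on every platform, while byte_count gives 5 * sizeof(wchar_t), so the two only agree if wchar_t were a single byte.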

However, when it comes to Unicode characters, whether one wchar_t corresponds to one Unicode character depends on the encoding used inside the wstring. If UTF-16 is used, which is often (but not necessarily) the case, one wchar_t corresponds to one Unicode character only within the Basic Multilingual Plane (i.e. all character sets derived from ISO-8859 as well as most of the commonly used CJK characters, but not some of the more exotic ones, e.g. classical Chinese characters)(*). Characters outside that plane are stored as a surrogate pair, i.e. two wchar_t units, so length() counts them twice. If you want to get the character count right for all Unicode characters in that case, you need to use a Unicode-aware library (e.g. ICU), or code it yourself.

(*) There are additional problems if combining characters are used, as @一二三 correctly points out. Counting those correctly is also best done with an appropriate library.
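The effect of surrogate pairs and combining characters can be shown with a small sketch. It uses std::u16string rather than std::wstring so that the encoding is UTF-16 on every platform; that swap, and the function names, are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>

// U+00E9 (e with acute accent) lies in the Basic Multilingual Plane:
// one code point, one UTF-16 unit.
inline std::size_t bmp_units() {
    return std::u16string(u"\u00E9").length();
}

// U+1F600 lies outside the BMP: one code point, stored as a
// surrogate pair, so length() reports two units.
inline std::size_t non_bmp_units() {
    return std::u16string(u"\U0001F600").length();
}

// 'e' followed by U+0301 (combining acute accent): two code points
// and two units, although a reader sees a single character.
inline std::size_t combining_units() {
    return std::u16string(u"e\u0301").length();
}
```

These return 1, 2 and 2 respectively. The last case is the one that even a code point count gets wrong, which is why a library that understands grapheme clusters is the safer choice there.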

Other tips

If you want to know the length in wchar_t units, use myStr.length(). If you want to know the length in Unicode code points, you'll have to find a library that knows how to count those. You could also write the function yourself: the rules for determining whether a code point encoded as UTF-16 uses one or two units are not too hard, see http://en.wikipedia.org/wiki/Utf-16. To find out whether your wchar_t is 16 bits (vs. 32 bits), check sizeof(wchar_t) == 2 (this assumes 8-bit bytes).
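A hand-written counter along those lines might look like the sketch below. It assumes the wstring holds UTF-16 when wchar_t is 2 bytes and UTF-32 otherwise, and that the input is well formed; count_code_points and is_low_surrogate are hypothetical names, not standard or WinAPI functions.

```cpp
#include <cstddef>
#include <string>

// True for a UTF-16 low (trailing) surrogate, range 0xDC00..0xDFFF.
// Hypothetical helper, for illustration only.
inline bool is_low_surrogate(wchar_t c) {
    return c >= 0xDC00 && c <= 0xDFFF;
}

// Counts Unicode code points in a wstring.
// Assumption: UTF-16 if wchar_t is 2 bytes, UTF-32 otherwise.
// No validation is done; a stray low surrogate is simply not counted.
std::size_t count_code_points(const std::wstring& s) {
    if (sizeof(wchar_t) != 2) {
        // Assumed UTF-32: every wchar_t is one code point.
        return s.length();
    }
    std::size_t n = 0;
    for (wchar_t c : s) {
        // Count every unit except the second half of a surrogate pair.
        if (!is_low_surrogate(c)) {
            ++n;
        }
    }
    return n;
}
```

Skipping low surrogates counts each pair once without having to look ahead. This still counts code points, not user-perceived characters, so combining sequences are counted as several.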

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow