char arr1[] = "I am a nice boy"; // Is this stored as UTF-8 (multi-byte) or ASCII?
This is stored in the compiler's execution charset. The compiler gets to choose what this is and should document it. GCC lets you set the execution encoding with the flag -fexec-charset=charset and defaults to UTF-8, MSVC uses the machine's 'encoding for non-Unicode applications' configured in the system language settings (which can never be UTF-8), and Clang uses UTF-8 unconditionally.
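One way to see what your compiler actually produced is to dump the raw bytes of the literal. Here's a minimal sketch; the values printed depend entirely on the execution charset your compiler chose:

#include <cstdio>

int main()
{
    char arr1[] = "I am a nice boy";

    // Print each byte of the literal in hex. With an ASCII-compatible
    // execution charset (UTF-8, Windows-1252, ...) these are the familiar
    // ASCII values; with something like EBCDIC they would be completely different.
    for (unsigned char c : arr1)
        std::printf("%02X ", static_cast<unsigned>(c));
    std::printf("\n");
}

Compiling with, e.g., g++ -fexec-charset=UTF-8 and then again with a different charset makes the difference visible in the output.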
char arr[] = "I'm a nice èboi"; // All characters should be ASCII except the 'è' one, which encoding is used for this?
cout << strlen(arr); // Returns 15 as ASCII, why?
The compiler's execution charset doesn't actually have to be ASCII compatible at all. For example, it could be EBCDIC.
strlen(arr)
returns 15 because the string literal, encoded in the compiler's execution charset, is 15 bytes long. Since the string is 15 characters long, this probably means the execution charset used a single byte for each of those characters, including 'è'. (And since UTF-8 cannot encode that string in only 15 bytes, this conclusively indicates that your compiler is not using UTF-8 as the execution charset.)
char arr2[] = "I'm a niße boy"; // And what encoding is it used here?
cout << strlen(arr2); // Returns 1514, what does this mean?
The encoding does not change based on the content of the string. The compiler will always use the execution charset. I'm assuming '1514' is a typo and strlen(arr2)
in fact returns 14: there are 14 characters in that string, and the earlier string also appeared to use one byte per character.
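To make the byte-vs-character distinction concrete, here's a minimal sketch. The exact numbers depend on the execution charset: 15 and 14 with a single-byte charset such as Windows-1252, but 16 and 15 with UTF-8, because 'è' and 'ß' each take two bytes there.

#include <cstring>
#include <cstdio>

int main()
{
    char arr[]  = "I'm a nice èboi";
    char arr2[] = "I'm a niße boy";

    // strlen() counts bytes up to the terminating '\0', not characters.
    std::printf("%zu %zu\n", std::strlen(arr), std::strlen(arr2));

    // sizeof gives the array size, i.e. strlen() + 1 for the terminator.
    std::printf("%zu %zu\n", sizeof arr, sizeof arr2);
}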
If I choose "multi-byte character set" in my VS project configuration instead of "unicode", what does this mean and what will this affect?
That setting has nothing to do with the encodings used by the compiler. It just makes the macros in Microsoft's headers expand to different things: TCHAR, all the macros that choose between the *W and *A functions, and so on.
In fact it's entirely possible to write a program using multi-byte character strings when 'Unicode' is selected, and to use Unicode when 'Multi-Byte Character Set' is selected.
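As an illustration, here's a hedged sketch of what the setting actually toggles: selecting 'Unicode' defines UNICODE/_UNICODE, selecting 'Multi-Byte Character Set' defines _MBCS, and the Windows headers use those to pick one expansion or the other.

#include <windows.h>
#include <tchar.h>

int main()
{
    // TCHAR is wchar_t when UNICODE is defined, char otherwise;
    // _T() produces a wide or narrow literal to match.
    const TCHAR* msg = _T("hello");

    // MessageBox is a macro that expands to MessageBoxW or MessageBoxA
    // depending on the same setting.
    MessageBox(nullptr, msg, _T("demo"), MB_OK);

    // Nothing stops you from calling the other family explicitly,
    // whatever the project setting says.
    MessageBoxW(nullptr, L"explicitly wide", L"demo", MB_OK);
}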
If UTF-32 usually uses 4 bytes to encode a character (even if they're not needed), how can a Unicode code point like U+FFFF (FFFF hexadecimal is 65535 in decimal) represent any possible Unicode character if the maximum is FFFF? (http://inamidst.com/stuff/unidata/)
This question makes no sense as asked. Perhaps if you rephrase... Note that U+FFFF is not the maximum: Unicode code points run from U+0000 to U+10FFFF, and a UTF-32 code unit is 32 bits wide so that any single code point fits in one unit.
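A minimal sketch of that last point (the specific code points are just examples):

int main()
{
    // Each of these is a single char32_t, i.e. a single UTF-32 code unit.
    char32_t bmp_end   = U'\uFFFF';     // last code point of the Basic Multilingual Plane
    char32_t smiley    = U'\U0001F600'; // a supplementary-plane code point, above U+FFFF
    char32_t max_point = U'\U0010FFFF'; // the largest valid Unicode code point
    return (bmp_end < smiley && smiley < max_point) ? 0 : 1;
}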