First, please note that this distinction has existed since 1998: UCNs were first introduced in C++98, then a new standard (ISO/IEC 14882, 1st edition: 1998), and then made their way into the C99 revision of the C standard; but the C committee (and existing implementers, with their pre-existing implementations) did not feel the C++ way was the only way to achieve the trick, particularly for corner cases and for character sets smaller than Unicode, or simply different from it; for example, the requirement to ship mapping tables from every supported encoding to Unicode was a real concern for C vendors in 1998.
- The C standard (consciously) avoids deciding this, and lets the compiler choose how to proceed. While your reasoning obviously takes place in a context where UTF-8 is used as both the source and execution character set, there is a large (and pre-existing) range of C99/C11 compilers that use different sets, and the committee felt it should not restrict implementers too much on this issue. In my experience, most compilers keep the two distinct in practice (for performance reasons).
- Because of this freedom, some compilers can make them identical after phase 1 (as a C++ compiler must), while others can keep them distinct as late as phase 7 for the first degree character; the second degree character (the one in the string) ought to be the same after phase 5, assuming the degree character is part of the extended execution character set supported by the implementation.
As for the other questions, I won't add anything to Jonathan's answer.
Regarding your additional question about whether the more deterministic C++ process is Standard-C-compliant: it is clearly a goal to be so; and if you find a corner case which shows otherwise (a C++11-compliant preprocessor which does not conform to the C99 and C11 standards), then you should consider raising it with the WG14 committee as a potential defect.
Obviously, the reverse is not true: it is possible to write a preprocessor with UCN handling which complies with C99/C11 but not with the C++ standard; the most obvious difference is with
#define str(t) #t
#define str_is(x, y) const char * x = y " is " str(y)
str_is(hell°, "hell°");
str_is(hell\u00B0, "hell\u00B0");
which a C-compliant preprocessor can render in the same way as your examples (and most do), and which as such will have distinct renderings; but I am under the impression that a C++-compliant preprocessor is required to transform them into the (strictly equivalent)
const char* hell° = "hell°" " is " "\"hell\\u00b0\"";
const char* hell\u00b0 = "hell\u00b0" " is " "\"hell\\u00b0\"";
Last, but not least, I believe not many compilers are fully compliant to this very level of detail!