I guess the codepoints of UCS and Unicode are the same, am I right?

In that case, why do we need two standards (UCS and Unicode)?

Solution

They are not two standards. The Universal Character Set (UCS) is not a standard itself but something defined in a standard, namely ISO 10646. UCS should also not be confused with encodings such as UCS-2.

It is difficult to guess whether you actually mean different encodings or different standards. Regarding the latter: Unicode and ISO 10646 were originally two distinct standardization efforts with different goals and strategies. They were, however, harmonized in the early 1990s to avoid the mess that would result from two competing standards, and they have been coordinated so that the code points are indeed the same.

They were kept distinct, though, partly because Unicode is defined by an industry consortium that can work flexibly and has great interest in standardizing things beyond simple code point assignments. The Unicode Standard defines a large number of principles and processing rules, not just the characters. ISO 10646 is a formal standard that can be referenced in standards and other documents of the ISO and its members.

Other tips

The code points are the same, but there are some differences. From the Wikipedia entry on the differences between Unicode and ISO 10646 (i.e., UCS):

The difference between them is that Unicode adds rules and specifications that are outside the scope of ISO 10646. ISO 10646 is a simple character map, an extension of previous standards like ISO 8859. In contrast, Unicode adds rules for collation, normalization of forms, and the bidirectional algorithm for scripts like Hebrew and Arabic.
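
To make those "rules beyond code point assignment" concrete, here is a minimal Python sketch using the standard unicodedata module to apply one such rule, NFC normalization; the sample characters are chosen purely for illustration:

```python
import unicodedata

# Two different code point sequences render as the same character "é":
precomposed = "\u00e9"    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # U+0065 + U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)  # False: the raw code points differ

# Unicode's normalization rules (NFC here) define when such sequences
# are canonically equivalent; ISO 10646 only assigns the code points.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```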

You might find it useful to read Joel Spolsky's article The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).

I think the differences come from the way the code points are encoded. The UCS-x encodings use a fixed number of bytes per code point; for example, UCS-2 uses two bytes, so it cannot encode any code point that needs more than two. The UTF encodings, on the other hand, use a variable number of bytes: UTF-8 uses a single byte for ASCII characters but several bytes for characters outside the ASCII range.
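
As a rough illustration (a Python sketch; the sample characters are arbitrary), UTF-8 spends one to four bytes per code point depending on its magnitude, while a character beyond U+FFFF, such as U+1F600, needs a four-byte surrogate pair in UTF-16 and therefore cannot be represented in fixed-width UCS-2 at all:

```python
# Bytes UTF-8 spends on code points of increasing magnitude:
for ch in ["A", "é", "€", "😀"]:  # U+0041, U+00E9, U+20AC, U+1F600
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded!r}")

# Fixed-width UCS-2 stops at U+FFFF. U+1F600 lies beyond that range,
# so UTF-16 resorts to a surrogate pair (two 16-bit units, 4 bytes):
print("😀".encode("utf-16-be"))  # b'\xd8=\xde\x00'
```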
