My tests tell me that, as of Unicode 6.2, all characters in full compatibility decompositions have the property NFD_Quick_Check=Yes.

This leads me to believe that isNFKD(x) implies isNFD(x), and isNFKC(x) implies isNFC(x).

Are my conclusions correct? And what about stability? Are these implications guaranteed to hold for future versions of the Unicode standard?


Solution

Your conclusions are correct. The Design Goals section of Unicode Standard Annex #15 states:

toNFKC(x) = toNFC(toNFKC(x))
toNFKD(x) = toNFD(toNFKD(x))

With regard to stability, this will hold true for future versions of Unicode if the normalized string doesn't contain any unassigned code points.
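The identities quoted above can be spot-checked empirically. What follows is only a minimal sketch, assuming Python's standard `unicodedata` module; it uses whatever Unicode version ships with the interpreter (not necessarily 6.2), and the function names `implication_holds` and `failing_code_points` are made up for illustration. It tests single code points only, so it is a sanity check rather than a proof for arbitrary strings.

```python
import sys
import unicodedata


def implication_holds(s):
    """Return True if toNFKC(s) is already NFC and toNFKD(s) is already NFD."""
    nfkc = unicodedata.normalize("NFKC", s)
    nfkd = unicodedata.normalize("NFKD", s)
    return (unicodedata.normalize("NFC", nfkc) == nfkc
            and unicodedata.normalize("NFD", nfkd) == nfkd)


def failing_code_points(code_points=None):
    """Collect code points whose single-character string breaks the identity.

    Defaults to every code point; surrogates are skipped because they are
    not Unicode scalar values.
    """
    if code_points is None:
        code_points = range(sys.maxunicode + 1)
    return [cp for cp in code_points
            if not 0xD800 <= cp <= 0xDFFF and not implication_holds(chr(cp))]


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    print("counterexamples:", failing_code_points())
```

An empty list of counterexamples is what the identities from UAX #15 predict; a non-empty list would point to a bug in the normalizer rather than in the standard.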

Other tips

I found the following statement here:

In other words, the composition phase of NFC and NFKC are the same—only their decomposition phase differs, with NFKC applying compatibility decompositions.

Then there is also this:

There are two forms of normalization that convert to composite characters: Normalization Form C and Normalization Form KC. The difference between these depends on whether the resulting text is to be a canonical equivalent to the original unnormalized text or a compatibility equivalent to the original unnormalized text. (In NFKC and NFKD, a K is used to stand for compatibility to avoid confusion with the C standing for composition.) Both types of normalization can be useful in different circumstances.

In the first three figures, the NFKD form is always the same as the NFD form, and the NFKC form is always the same as the NFC form, so for simplicity those columns are omitted.

This is what I could pick out of the text that may shed some light on at least part of your question. Hope it helps.

There is also this table in the Wikipedia article:

NFD (Normalization Form Canonical Decomposition): Characters are decomposed by canonical equivalence, and multiple combining characters are arranged in a specific order.

NFC (Normalization Form Canonical Composition): Characters are decomposed and then recomposed by canonical equivalence.

NFKD (Normalization Form Compatibility Decomposition): Characters are decomposed by compatibility, and multiple combining characters are arranged in a specific order.

NFKC (Normalization Form Compatibility Composition): Characters are decomposed by compatibility, then recomposed by canonical equivalence.
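To make the four definitions concrete, here is a small sketch that prints each form of a few sample strings as code points. It assumes Python's standard `unicodedata` module; the helper names `code_points` and `all_forms` and the choice of samples are illustrative only.

```python
import unicodedata

FORMS = ("NFD", "NFC", "NFKD", "NFKC")


def code_points(s):
    """Render a string as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


def all_forms(s):
    """Map each normalization form name to the normalized string."""
    return {form: unicodedata.normalize(form, s) for form in FORMS}


if __name__ == "__main__":
    samples = (
        "\u00C5",        # LATIN CAPITAL LETTER A WITH RING ABOVE
        "\uFB01",        # LATIN SMALL LIGATURE FI (compatibility decomposition only)
        "\u1E9B\u0323",  # LONG S WITH DOT ABOVE + COMBINING DOT BELOW
    )
    for sample in samples:
        print(f"source: {code_points(sample)}")
        for form, result in all_forms(sample).items():
            print(f"  {form:<4} {code_points(result)}")
```

The ligature U+FB01 is a useful case: it is unchanged by NFD and NFC but becomes "fi" under NFKD and NFKC, so a string can be NFD without being NFKD.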

Looking at the explanations of what these forms are, I don't think you can conclude that one implies the other. NFD decomposes by canonical equivalence, whereas NFKD decomposes by compatibility.

The same article also states:

the equivalence criteria can be either canonical (NF) or compatibility (NFK).

To me this means that it's either canonical or it's compatibility. NFD and NFKD do different things.


This implementation notes article states:

For all versions, even prior to Unicode 4.1, the following policy is followed:

A normalized string is guaranteed to be stable; that is, once normalized, a string is normalized according to all future versions of Unicode.
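The accepted answer ties this stability guarantee to strings that contain no unassigned code points. A hedged sketch of how one might test for that, assuming Python's standard `unicodedata` module and a hypothetical helper name `contains_unassigned`, follows. The result depends on the Unicode version bundled with the interpreter, and general category `Cn` also covers noncharacters such as U+FFFF.

```python
import unicodedata


def contains_unassigned(s):
    """Return True if any code point in s has general category Cn.

    Cn means "not assigned" in the Unicode version known to this Python
    build; it also includes noncharacters.
    """
    return any(unicodedata.category(ch) == "Cn" for ch in s)


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for text in ("abc", "A\u030A", "\uFFFF"):
        print(repr(text), contains_unassigned(text))
```

A string for which this returns False uses only code points assigned in the running Unicode version, which is the condition under which its normalized form is stated to stay normalized in later versions.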

Licensed under: CC-BY-SA with attribution