Question

I am looking at the definition of XML element content and the definition of CharData that it uses:

[43] content   ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*

[14] CharData  ::= [^<&]* - ([^<&]* ']]>' [^<&]*)

I noticed that this definition of CharData does not forbid a > character inside an XML element. I assumed this was an error, so I looked at the description of CharData (emphasis mine):

The ampersand character (&) and the left angle bracket (<) MUST NOT appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they MUST be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and MUST, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.

So it seems that production [14] and the prose description of CharData are at odds. Is this assumption correct, or do parsers allow > inside an element without escaping it? Or do they automatically escape it?


Solution

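The character > is in fact allowed within XML content without escaping, but the character sequence ]]> is not. Production [14] and the prose description therefore agree: both exclude only <, & and the sequence ]]>.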
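To make production [14] concrete, here is a minimal Python sketch of it. The function name is_char_data is my own, not something from the spec, and the check covers only what [14] itself says (it does not validate the separate Char production):

```python
import re

# [^<&]* : any run of characters that contains neither '<' nor '&'
_NO_MARKUP = re.compile(r"[^<&]*")


def is_char_data(text: str) -> bool:
    """Hypothetical helper: True if `text` matches production [14] CharData.

    [14] is "[^<&]*" minus every string that contains "]]>".
    """
    return _NO_MARKUP.fullmatch(text) is not None and "]]>" not in text


print(is_char_data("1 > 0"))    # a bare '>' is fine
print(is_char_data("a ]]> b"))  # the sequence ']]>' is excluded
print(is_char_data("a < b"))    # '<' is excluded
```

Note that > never appears in the character class, which is why a lone > passes.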

You MAY escape any > character as &gt;, but you MUST do so if it is part of the above sequence. That is, ]]&gt; (or the equivalent with a character reference) is the correct way to represent that character sequence in XML when it is not used as the end marker of a CDATA section.
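As for what parsers do in practice, here is a small sketch using Python's standard library (xml.etree.ElementTree, which is backed by expat). The helper name parses is my own, and other parsers may be more lenient about ]]> than this one, so treat it as an illustration rather than a statement about every implementation:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(document: str) -> bool:
    """Hypothetical helper: True if the parser accepts `document`."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


print(parses("<a>1 > 0</a>"))       # bare '>' in content: allowed by the spec
print(parses("<a>x ]]> y</a>"))     # literal ']]>' in content: not allowed
print(parses("<a>x ]]&gt; y</a>"))  # escaped form: allowed

# The parser hands back the unescaped text.
text = ET.fromstring("<a>x ]]&gt; y</a>").text
print(text)

# On output, the escaping is done by the serializer, not the parser.
escaped = escape("x ]]> y")
serialized = ET.tostring(ET.fromstring("<a>1 > 0</a>"), encoding="unicode")
print(escaped)
print(serialized)
```

So a parser does not escape anything for you on input; it either accepts the text or rejects the document. Escaping happens when you write XML, and these two standard-library routines escape every > rather than only the one in ]]>, which the spec permits.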

Licensed under: CC-BY-SA with attribution