DTD Parsing: Parameter entity reference name including another parameter entity reference - is it well formed?

StackOverflow https://stackoverflow.com/questions/8253210

  •  07-03-2021

Question

I'm writing a DTD parser and I'm unsure how parameter entities should be expanded. For example, is this DTD excerpt well-formed?

<!ENTITY % xx '&#37;zz;'>
<!ENTITY % zz '&#60;!ENTITY tricky "error-prone" >' >
<!ENTITY % abcd '%xx;'>
<!ENTITY % ef 'c'>
<!ENTITY % gh '%ab%ef;d;'>
%gh;

More specifically I'm curious to know if entity gh will expand correctly. In my opinion %ef; should expand first to 'c' and then the newly formed PE reference %abcd; should expand to %xx; and so on.

Most of the parsers I've seen identify %ab as a PE reference and fail because that PE is not defined. But I found nothing in the standard that requires a parser to work this way. The only related passage I found is "Included in Literal" as opposed to "Included as PE": the latter states that the replacement text must be enlarged with one leading and one following space (#x20) character, but this does not apply inside a literal.

Any pointers? Thank you.

Solution

The first lines of the example code in this question are taken from an example in the W3C XML Recommendation (Appendix D, "Expansion of Entity and Character References"). Readers who are not familiar with the rather convoluted logic of DTD escapes should read the explanation given there.

"More specifically I'm curious to know if entity gh will expand correctly."

No, it won't. The reason is that your declaration of parameter entity gh is syntactically malformed. The syntax for parameter entity declarations (productions [72] and [74] of the XML 1.0 Recommendation) is:

PEDecl   ::=    '<!ENTITY' S '%' S Name S PEDef S? '>'
PEDef    ::=    EntityValue | ExternalID

and the syntax for entity values (production [9]) is:

EntityValue   ::=       '"' ([^%&"] | PEReference | Reference)* '"'
                        |  "'" ([^%&'] | PEReference | Reference)* "'"

"PEReference" is a parameter entity reference (%Name;) and "Reference" is either a general entity reference (&Name;) or a character reference (&#123; or &#x7B;).

Here [^%&"] and [^%&'] mean that an entity value cannot contain a % character unless it starts a parameter entity reference (%Name;). In %ab%ef;d; the parser reads %ab as the start of a reference and then needs either another name character or the closing ;. It meets a second % instead, which is not a valid name character, so the character sequence %ab% is an error. I'd say it should work if the first % sign is replaced with a character reference (&#37;ab%ef;d;): the literal then matches the grammar, %ef; is replaced while the entity value is processed, and the resulting text %abcd; is only seen as a parameter entity reference when %gh; is expanded later.
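
The grammar argument above can be checked mechanically. The following Python sketch is an illustration only, not how any particular XML parser is implemented. It assumes an ASCII-only approximation of the Name production (the real one allows many more characters), and the helper names is_entity_value and replacement_text are hypothetical. The first helper tests a quoted literal against the EntityValue production; the second builds the replacement text by replacing character references and parameter entity references in a single pass, which is why a % produced by &#37; is not rescanned at that stage.

```python
import re

# ASSUMPTION: ASCII-only approximation of the XML Name production.
NAME = r"[A-Za-z_:][A-Za-z0-9._:\-]*"

PE_REFERENCE = rf"%{NAME};"                      # %Name;
ENTITY_REF = rf"&{NAME};"                        # &Name;
CHAR_REF = r"&#[0-9]+;|&#x[0-9A-Fa-f]+;"         # &#123; or &#x7B;


def is_entity_value(literal):
    """Return True if `literal` (quotes included) matches EntityValue."""
    if not literal or literal[0] not in "\"'":
        return False
    quote = literal[0]
    # EntityValue ::= quote ([^%&quote] | PEReference | Reference)* quote
    pattern = re.compile(
        rf"{quote}(?:[^%&{quote}]|{PE_REFERENCE}|{ENTITY_REF}|{CHAR_REF})*{quote}"
    )
    return pattern.fullmatch(literal) is not None


# Character references and PE references are replaced when the entity
# value is processed; general entity references are left untouched.
TOKEN = re.compile(
    rf"&#(?P<dec>[0-9]+);|&#x(?P<hex>[0-9A-Fa-f]+);|%(?P<pe>{NAME});"
)


def replacement_text(value, params):
    """Build the replacement text of an entity value (without quotes).

    `params` maps parameter entity names to their replacement text.
    The substitution is a single pass, so inserted text is not rescanned.
    """
    def expand(match):
        if match.group("dec"):
            return chr(int(match.group("dec")))
        if match.group("hex"):
            return chr(int(match.group("hex"), 16))
        return params[match.group("pe")]

    return TOKEN.sub(expand, value)


if __name__ == "__main__":
    print(is_entity_value("'%ab%ef;d;'"))                 # False
    print(is_entity_value("'&#37;ab%ef;d;'"))             # True
    print(replacement_text("&#37;ab%ef;d;", {"ef": "c"}))  # %abcd;
```

With the first % written as &#37;, the replacement text of gh is %abcd;, so the reference to abcd only appears once %gh; itself is expanded.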

Licensed under: CC-BY-SA with attribution