Question

I would like to solve this problem with an ugly hack: declaring a "fake DTD" for arbitrary XML. Here is an example:

INPUT (any XML fragment)

<root id="root">
  <p id="p1"><i>Title</i></p>
  <p id="p2"><b id="b1">AAA<sup>1</sup>, BBB<sup>2</sup></b></p>
</root>

PHP code,

   $DTD = '
   <!DOCTYPE noname [
   <!ATTLIST ANY
      id     ID             #IMPLIED
   >
   ]>';

   $dom = new DomDocument();
   $dom->loadXML( "$DTD\n$input" );
   $e = $dom->getElementById('p1');

   var_dump($e);

This code is not a solution: $e is NULL, and I do not see why. So, the question: is it possible to express a "minimal DTD" that solves this problem?

Solution

If you just want the ID mechanism to work, the simplest option is to use xml:id:

<root xml:id="root">
  <p xml:id="p1"><i>Title</i></p>
  <p xml:id="p2"><b xml:id="b1">AAA<sup>1</sup>, BBB<sup>2</sup></b></p>
</root>

According to https://fosswiki.liip.ch/display/BLOG/GetElementById+Pitfalls, xml:id should work with getElementById in PHP.
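
A minimal sketch of the xml:id approach in PHP, assuming the libxml2 build behind the DOM extension registers xml:id attributes as IDs without any DTD (the variable names are only for illustration):

```php
<?php
// Same fragment as in the question, with id renamed to xml:id.
$xml = <<<'XML'
<root xml:id="root">
  <p xml:id="p1"><i>Title</i></p>
  <p xml:id="p2"><b xml:id="b1">AAA<sup>1</sup>, BBB<sup>2</sup></b></p>
</root>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// No DOCTYPE is needed: the lookup relies on xml:id alone.
$e = $dom->getElementById('p1');
echo $e === null ? "not found\n" : $e->nodeName . ': ' . $e->textContent . "\n";
```

The trade-off is that the input has to be changed (id becomes xml:id), which may not be acceptable for "any XML".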


Problems with your attempt:

  1. The name following <!DOCTYPE must match the name of the root element of the XML document. In your case, noname != root, which makes the document invalid. See http://www.w3.org/TR/xml/#sec-prolog-dtd.

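  2. Attributes must be declared separately for each element type. ANY is a keyword for content models only, so <!ATTLIST ANY ...> declares an id attribute for an element literally named ANY, not for every element. And even if the content model of an element is ANY, you still have to declare all the elements that may occur in it for the document to be valid.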

So there is no way to write a generic DTD just for ID resolution and still have a valid document. The following validates, and it cannot really be smaller than this:

<!DOCTYPE root [
<!ELEMENT root (p+)>
<!ATTLIST root
         id  ID   #IMPLIED>
<!ELEMENT p ANY>
<!ATTLIST p
         id  ID   #IMPLIED>
<!ELEMENT b ANY>
<!ATTLIST b
         id  ID   #IMPLIED>
<!ELEMENT sup (#PCDATA)>
<!ELEMENT i (#PCDATA)>
]>
<root id="root">
  <p id="p1"><i>Title</i></p>
  <p id="p2"><b id="b1">AAA<sup>1</sup>, BBB<sup>2</sup></b></p>
</root>
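
As a rough check of the document above in PHP, the sketch below loads it and calls DOMDocument::validate(), which validates against the DTD declared in the document. The variable names are hypothetical:

```php
<?php
// The full DTD from above, embedded as an internal subset.
$xml = <<<'XML'
<!DOCTYPE root [
<!ELEMENT root (p+)>
<!ATTLIST root id ID #IMPLIED>
<!ELEMENT p ANY>
<!ATTLIST p id ID #IMPLIED>
<!ELEMENT b ANY>
<!ATTLIST b id ID #IMPLIED>
<!ELEMENT sup (#PCDATA)>
<!ELEMENT i (#PCDATA)>
]>
<root id="root">
  <p id="p1"><i>Title</i></p>
  <p id="p2"><b id="b1">AAA<sup>1</sup>, BBB<sup>2</sup></b></p>
</root>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// validate() returns true when the document conforms to its DTD.
$valid = $dom->validate();
// id is declared as type ID on p, so the lookup can succeed.
$p1 = $dom->getElementById('p1');

var_dump($valid, $p1 !== null);
```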

It is possible to provide a smaller DTD as long as the XML parser does not attempt to validate. This document is accepted by xmllint (and by PHP) in non-validating mode:

<!DOCTYPE anyname [ 
<!ATTLIST p id ID #IMPLIED> 
]>
<root id="root">
  <p id="p1"><i>Title</i></p>
  <p id="p2"><b id="b1">AAA<sup>1</sup>, BBB<sup id="b1">2</sup></b></p>
</root>

ID uniqueness violations on p elements (the only elements whose id is declared as type ID here) are still reported.
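
Applied to the PHP code from the question, this could look like the sketch below. It assumes a non-validating load and that you know in advance which element names carry an id; $doctype and $input are illustrative names, and one ATTLIST line is needed per element name:

```php
<?php
// Internal subset that only types the id attribute; it is not a valid DTD.
$doctype = <<<'DTD'
<!DOCTYPE anyname [
<!ATTLIST p id ID #IMPLIED>
<!ATTLIST b id ID #IMPLIED>
]>
DTD;

$input = <<<'XML'
<root id="root">
  <p id="p1"><i>Title</i></p>
  <p id="p2"><b id="b1">AAA<sup>1</sup>, BBB<sup>2</sup></b></p>
</root>
XML;

$dom = new DOMDocument();
$dom->loadXML("$doctype\n$input"); // default options: no validation

$p1   = $dom->getElementById('p1');   // p is declared, so this is found
$b1   = $dom->getElementById('b1');   // b is declared as well
$root = $dom->getElementById('root'); // root has no ATTLIST, so no ID

var_dump($p1 !== null, $b1 !== null, $root === null);
```

Elements without an ATTLIST declaration (root here) keep a plain id attribute that getElementById does not see.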

If xmllint is run with the --postvalid option (or PHP is run with LIBXML_DTDVALID enabled), this is emitted:

test.xml:4: element root: validity error : root and DTD name do not match 'root' and 'anyname'
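
To see such validity errors from PHP, one option is to collect libxml errors while loading with LIBXML_DTDVALID. This is only a sketch: the exact messages and their number depend on the libxml2 version, and the shortened input is an assumption made for brevity:

```php
<?php
$xml = <<<'XML'
<!DOCTYPE anyname [
<!ATTLIST p id ID #IMPLIED>
]>
<root id="root">
  <p id="p1"><i>Title</i></p>
</root>
XML;

// Collect errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadXML($xml, LIBXML_DTDVALID);

$messages = [];
foreach (libxml_get_errors() as $error) {
    $messages[] = trim($error->message);
}
libxml_clear_errors();

// Expect validity errors, since almost nothing is declared in this DTD.
print_r($messages);
```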
Licensed under: CC-BY-SA with attribution