Question

I want to use lxml to get the elements declared in an internal DTD, but I can't get it to work. Here is my XML file (http://validator.w3.org reports it as valid):

<?xml
    version='1.1'
    encoding='utf-8'
?>
<!DOCTYPE root [
    <!ATTLIST test
        attr (A | B | C) 'B'
    >
    <!ELEMENT test (#PCDATA)>
    <!ELEMENT root (test)*>
]>
<root></root>

But lxml.etree.DTD(file='test.xml') raises an exception:

Traceback (most recent call last):
  File "./test.py", line 6, in <module>
    lxml.etree.DTD(file = 'test.xml')
  File "dtd.pxi", line 285, in lxml.etree.DTD.__init__ (src/lxml/lxml.etree.c:152121)
lxml.etree.DTDParseError: Content error in the external subset, line 5, column 1

Maybe lxml.etree.DTD doesn't support internal DTDs, or maybe I'm doing something wrong. I also tried lxml.etree.parse(), but I can't figure out the methods of the object it returns (the reference entry for parse() doesn't link to them). In theory the task is simple, but I can't find the information I need.
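For reference, lxml.etree.DTD() parses a standalone DTD, i.e. the declarations alone as they would appear in a separate .dtd file, not an XML document containing a <!DOCTYPE>. A minimal sketch, assuming the declarations from the internal subset above are copied into their own byte string (the names dtd_source, doc and bad are illustrative only):

```python
import io
from lxml import etree

# The question's declarations on their own, as they would appear in a
# separate .dtd file (copied out of the internal subset for illustration).
dtd_source = b"""
<!ATTLIST test attr (A | B | C) 'B'>
<!ELEMENT test (#PCDATA)>
<!ELEMENT root (test)*>
"""

# etree.DTD() accepts a filename or a file-like object holding only DTD text.
dtd = etree.DTD(io.BytesIO(dtd_source))

# Names of the element declarations found in the DTD.
names = [decl.name for decl in dtd.iterelements()]
print(names)

# A standalone DTD can validate any element tree.
doc = etree.fromstring(b"<root><test attr='C'>hi</test></root>")
bad = etree.fromstring(b"<root><test attr='D'>hi</test></root>")
print(dtd.validate(doc))  # 'C' is one of the allowed values
print(dtd.validate(bad))  # 'D' is not in (A | B | C)
print(dtd.error_log)      # explains why the last validation failed
```

Passing a path such as etree.DTD('test.dtd') works the same way, provided that (hypothetical) file contains only the declarations.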

Solution

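I'm not sure exactly what you are looking for, but you may be able to find it using an interactive Python interpreter with tab completion, such as IPython. lxml.etree.DTD() parses a standalone DTD, not a complete XML document, which is why it stops at the <!DOCTYPE declaration on line 5 of your file. For an internal subset, parse the document itself and read the DTD from the tree's docinfo. That's how I found this: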

import lxml.etree as ET
import io

content = '''<?xml
    version='1.1'
    encoding='utf-8'
?>
<!DOCTYPE root [
    <!ATTLIST test
        attr (A | B | C) 'B'
    >
    <!ELEMENT test (#PCDATA)>
    <!ELEMENT root (test)*>
]>
<root></root>'''

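# io.BytesIO needs bytes on Python 3, so encode the string first.
tree = ET.parse(io.BytesIO(content.encode('utf-8')))
info = tree.docinfo
dtd = info.internalDTD  # the internal subset as an lxml DTD object

for elt in dtd.elements():
    print(elt)
    print(elt.content)
    print()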

# <lxml.etree._DTDElementDecl object name='test' prefix=None type='mixed' at 0xb73e044c>
# <lxml.etree._DTDElementContentDecl object name=None type='pcdata' occur='once' at 0xb73e04ac>

# <lxml.etree._DTDElementDecl object name='root' prefix=None type='element' at 0xb73e046c>
# <lxml.etree._DTDElementContentDecl object name='test' type='element' occur='mult' at 0xb73e04ac>
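The same approach extends to the ATTLIST in the question: each element declaration carries its attribute declarations. A minimal sketch, assuming a version of lxml that has the DTD introspection API used above (iterelements(), iterattributes() and the attribute-declaration properties); the attrs dictionary layout is just an illustrative choice:

```python
import io
from lxml import etree

# The document from the question, kept as bytes so it can go into io.BytesIO.
content = b"""<?xml version='1.1' encoding='utf-8'?>
<!DOCTYPE root [
    <!ATTLIST test
        attr (A | B | C) 'B'
    >
    <!ELEMENT test (#PCDATA)>
    <!ELEMENT root (test)*>
]>
<root></root>"""

tree = etree.parse(io.BytesIO(content))
dtd = tree.docinfo.internalDTD

# Walk each element declaration and the attribute declarations attached to it.
attrs = {}
for elt in dtd.iterelements():
    for attr in elt.iterattributes():
        attrs[(elt.name, attr.name)] = {
            "type": attr.type,                    # e.g. 'enumeration'
            "default": attr.default,              # kind of default declaration
            "default_value": attr.default_value,  # the quoted default, if any
            "values": attr.values(),              # allowed values of an enumeration
        }
        print(elt.name, attr.name, attrs[(elt.name, attr.name)])

# The internal subset is an ordinary DTD object, so it can validate the tree.
valid = dtd.validate(tree)
print(valid)
```

Because attribute declarations hang off their element declaration, the ATTLIST still appears under test even though it is written before <!ELEMENT test>.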
Licensed under: CC-BY-SA with attribution