Question

I wanted to try out lxml to get the elements of an internal DTD, but I can't get it to work. First, here is my XML file (http://validator.w3.org reports it as valid):

<?xml
    version='1.1'
    encoding='utf-8'
?>
<!DOCTYPE root [
    <!ATTLIST test
        attr (A | B | C) 'B'
    >
    <!ELEMENT test (#PCDATA)>
    <!ELEMENT root (test)*>
]>
<root></root>

But using lxml.etree.DTD(file = 'test.xml') throws an exception:

Traceback (most recent call last):
  File "./test.py", line 6, in <module>
    lxml.etree.DTD(file = 'test.xml')
  File "dtd.pxi", line 285, in lxml.etree.DTD.__init__ (src/lxml/lxml.etree.c:152121)
lxml.etree.DTDParseError: Content error in the external subset, line 5, column 1

Maybe lxml.etree.DTD doesn't support internal DTDs, or I'm doing something wrong. I also wanted to try lxml.etree.parse(), but I can't figure out the methods of the object it returns (the reference for parse() doesn't link to them). In theory the task is simple, but I can't find the information I need.
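For comparison, lxml.etree.DTD() does accept the same declarations when they are given on their own, without the XML declaration and the `<!DOCTYPE root [ ... ]>` wrapper. A minimal sketch, assuming lxml is installed; the `dtd_text` string and the two sample documents are made up for illustration:

```python
import io

import lxml.etree as ET

# Hypothetical standalone DTD: only the declarations from the question's
# internal subset, with no XML declaration and no DOCTYPE wrapper.
dtd_text = b"""
<!ELEMENT root (test)*>
<!ELEMENT test (#PCDATA)>
<!ATTLIST test attr (A | B | C) 'B'>
"""

# ET.DTD() parses a DTD (an external subset) from a filename or file-like object.
dtd = ET.DTD(io.BytesIO(dtd_text))

names = sorted(decl.name for decl in dtd.elements())
print(names)  # ['root', 'test']

# Made-up documents, validated against the standalone DTD.
good = ET.fromstring(b"<root><test attr='A'>x</test></root>")
bad = ET.fromstring(b"<root><test attr='D'>x</test></root>")  # 'D' is not in (A | B | C)
good_ok = dtd.validate(good)
bad_ok = dtd.validate(bad)
print(good_ok, bad_ok)  # True False
```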


Solution

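lxml.etree.DTD() expects a file that contains only a DTD (an external subset), not a whole XML document, which is why it fails at line 5, the <!DOCTYPE declaration. For an internal subset, parse the document and read the DTD from its docinfo. I'm not sure exactly what you are looking for, but attributes like these are easy to find with an interactive Python interpreter that has tab-completion, such as IPython. That's how I found this: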

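import io
import lxml.etree as ET

# bytes literal: io.BytesIO only accepts bytes in Python 3
content = b'''<?xml
    version='1.1'
    encoding='utf-8'
?>
<!DOCTYPE root [
    <!ATTLIST test
        attr (A | B | C) 'B'
    >
    <!ELEMENT test (#PCDATA)>
    <!ELEMENT root (test)*>
]>
<root></root>'''

tree = ET.parse(io.BytesIO(content))
info = tree.docinfo        # document-level info (doctype, encoding, ...)
dtd = info.internalDTD     # the internal subset as an lxml.etree.DTD object

for elt in dtd.elements():
    print(elt)             # the element declaration
    print(elt.content)     # its content model
    print()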

# <lxml.etree._DTDElementDecl object name='test' prefix=None type='mixed' at 0xb73e044c>
# <lxml.etree._DTDElementContentDecl object name=None type='pcdata' occur='once' at 0xb73e04ac>

# <lxml.etree._DTDElementDecl object name='root' prefix=None type='element' at 0xb73e046c>
# <lxml.etree._DTDElementContentDecl object name='test' type='element' occur='mult' at 0xb73e04ac>
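The element declarations also carry their attribute declarations, so the `<!ATTLIST test ...>` part of the DTD can be read the same way. A sketch, assuming lxml is installed; to keep it simple, the XML declaration here uses version 1.0 (libxml2 implements XML 1.0 only) and the ATTLIST is placed after the ELEMENT declarations, and `attr_info` is just a hypothetical dict for collecting the results:

```python
import io

import lxml.etree as ET

# The question's document, with version 1.0 and the ATTLIST moved last.
content = b"""<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE root [
    <!ELEMENT test (#PCDATA)>
    <!ELEMENT root (test)*>
    <!ATTLIST test attr (A | B | C) 'B'>
]>
<root></root>"""

tree = ET.parse(io.BytesIO(content))
dtd = tree.docinfo.internalDTD

# Walk every element declaration and each of its attribute declarations.
attr_info = {}
for elt in dtd.elements():
    for attr in elt.attributes():
        # type: e.g. 'enumeration'; values(): the allowed tokens;
        # default_value: the default given in the declaration ('B' here)
        attr_info[(elt.name, attr.name)] = (attr.type, attr.values(), attr.default_value)
        print(elt.name, attr.name, attr.type, attr.values(), attr.default_value)

# The internal subset is an ordinary DTD object, so it can validate the tree too.
is_valid = dtd.validate(tree)
print(is_valid)
```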
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow