Question

I am trying to convert an XML file of roughly 100 MB into another XML file by copying all elements with a certain tag into the new file. Since building the output conventionally caused memory problems, I wanted to use Mako templates instead. The XML contains about 60,000 such elements, and to keep memory usage low I tried to pass a generator to the template. However, this resulted in a segfault. I know very little about memory management, but the problem seems related to putting the content into a template, because no problems arise when I simply print the elements. Am I abusing template rendering for something it isn't meant for? How can I solve this?

My rendering code:

from lxml import etree
from mako.template import Template
from mako.runtime import Context

ns = {'xmlns': 'http://namesp.ace/version/1'}
# get XML elements with the correct tag
featgen = etree.iterparse('somefile.xml', tag='{%s}sometag' % ns['xmlns'], events=('start',))
templatefn = 'template.mako'
# create template
template = Template(filename=templatefn)
with open('outfile', 'w') as fp:
    ctx = Context(fp, tag=featgen)
    template.render_context(ctx)

And the template:

<%! from lxml import etree
def tostr_xml(el, ns):
    strxml = etree.tostring(el)
    el.clear()
    strxml = strxml.replace('xmlns="{0}" '.format(ns['xmlns']), '')
    strxml = strxml.replace('xmlns:p="{0}" '.format(ns['xmlns:p']), '')
    strxml = strxml.replace('xmlns:xsi="{0}" '.format(ns['xmlns:xsi']), '')
    return strxml
%>
<?xml version='1.0' encoding='ASCII'?>
<root>
  <features>
    % for ev,el in tag:
    ${tostr_xml(el, {'xmlns':'http://namesp.ace/version/1'})}
    % endfor
  </features>
</root>

Solution

I solved the problem by turning:

featgen = etree.iterparse('somefile.xml', tag='{%s}sometag' % ns['xmlns'], events=('start',))

into:

featgen = etree.iterparse('somefile.xml', tag='{%s}sometag' % ns['xmlns'])

However, I cannot say for certain why this works. If anyone would like to explain, I will accept that answer instead.
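
A plausible explanation, going by how iterparse is documented rather than by a confirmed diagnosis of the segfault: without an events argument, iterparse reports only 'end' events, so each element is complete when it is yielded. With events=('start',), the element's text and children are not guaranteed to be there yet, and the template's tostr_xml was serializing and clearing elements the parser was still building. The sketch below shows the end-event pattern. It is only an illustration: it uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra dependencies, filters on the tag by hand because that iterparse has no tag argument, and the names iter_serialized and sample are made up for the example.

```python
import io
import xml.etree.ElementTree as ET

NS = 'http://namesp.ace/version/1'  # namespace URI from the question


def iter_serialized(source, tag):
    """Yield each completed element named `tag` as serialized bytes."""
    # 'end' fires once the closing tag has been read, so children are present.
    for _event, el in ET.iterparse(source, events=('end',)):
        if el.tag == tag:
            yield ET.tostring(el)
            el.clear()  # free the subtree; the parser is done with it


# Hypothetical sample input standing in for somefile.xml.
sample = io.BytesIO(
    b'<root xmlns="http://namesp.ace/version/1">'
    b'<sometag id="1"><child>a</child></sometag>'
    b'<sometag id="2"><child>b</child></sometag>'
    b'</root>'
)
chunks = list(iter_serialized(sample, '{%s}sometag' % NS))
```

Because iter_serialized is a generator, it can be handed to the template the same way featgen was, and only one element's subtree needs to be held at a time.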

Licensed under: CC-BY-SA with attribution