Question

I have a large XmlNode in VB.NET (tens or hundreds of thousands of nodes, and millions of attributes).

The general structure of the XML is:

<other xml nodes></other xml nodes>
<list>
    <item attr1="" attr2="" attr3="" attr4="" ... />
    <item attr1="" attr2="" attr3="" attr4="" ... />
    <item attr1="" attr2="" attr3="" attr4="" ... />
    <item attr1="" attr2="" attr3="" attr4="" ... />
    <item attr1="" attr2="" attr3="" attr4="" ... />
    <item attr1="" attr2="" attr3="" attr4="" ... />
    .
    .
    .
</list>

What I want to do is remove item nodes based on certain attributes (say I had 50,000 nodes, my criteria would probably delete 49,500 of them).

The problem is that my code takes a few seconds to remove such a large number of nodes, and I need it to go faster.

I've tried a few different ways of doing this; the fastest I've got to date is:

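' The attribute value is quoted; unquoted, XPath would compare @attr1 to a child element named "sample"
Dim xnlList As XmlNodeList = xnBigXMLNode.SelectNodes("//list/item[@attr1='sample']")
For Each xnNode As XmlNode In xnlList
    xnNode.ParentNode.RemoveChild(xnNode)
Next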

*Please forgive any mistakes in the above code; I'm not at my development machine at the moment.

As an added constraint, I need to keep the "other xml nodes" in the xmlNode.

I've considered deleting the whole list, and adding back the nodes that I needed, but that took even longer to execute.

Can anybody think of a way to make deleting thousands of nodes any faster?

Thanks in advance

Solution

I've figured out a solution that takes milliseconds to run (as opposed to the few seconds it took before).

It turns out that:

Dim xnlList As XmlNodeList = xnBigXMLNode.SelectNodes("//list/item[@attr1='sample']")
For Each xnNode As XmlNode In xnlList
    xnNode.ParentNode.RemoveChild(xnNode)
Next

takes a few seconds to run, whereas:

Dim xnlList As XmlNodeList = xnBigXMLNode.SelectNodes("//list/item")
For Each xnNode As XmlNode In xnlList
    xnNode.ParentNode.RemoveChild(xnNode)
Next

takes milliseconds.

So I now select every item node with the predicate-free query, test each one for the required attributes inside the loop, and remove only the nodes that match.
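For anyone who wants to try this, here is a minimal sketch of that loop. The criterion (attr1 equal to "sample") is taken from the question's example, and the names RemoveMatchingItems and unwantedValue are made up for illustration. Unlike the snippets above, it collects the matches in a list before removing them; that is only a precaution against modifying the tree while enumerating it, not something the original code did.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module RemoveItemsSketch
    ' Hypothetical helper: removes every <item> under <list> whose attr1
    ' equals unwantedValue and returns how many nodes were removed.
    Function RemoveMatchingItems(root As XmlNode, unwantedValue As String) As Integer
        ' Predicate-free XPath: the filtering happens in VB code below.
        Dim items As XmlNodeList = root.SelectNodes("//list/item")

        ' Collect matches first so removal does not disturb the enumeration.
        Dim toRemove As New List(Of XmlNode)
        For Each node As XmlNode In items
            Dim attr As XmlAttribute = node.Attributes("attr1")
            If attr IsNot Nothing AndAlso attr.Value = unwantedValue Then
                toRemove.Add(node)
            End If
        Next

        For Each node As XmlNode In toRemove
            node.ParentNode.RemoveChild(node)
        Next
        Return toRemove.Count
    End Function

    Sub Main()
        Dim doc As New XmlDocument()
        doc.LoadXml("<root><other/><list>" &
                    "<item attr1=""sample""/><item attr1=""keep""/><item attr1=""sample""/>" &
                    "</list></root>")
        Dim removed As Integer = RemoveMatchingItems(doc.DocumentElement, "sample")
        Console.WriteLine(removed) ' two of the three items match
        Console.WriteLine(doc.OuterXml)
    End Sub
End Module
```

The "other xml nodes" are never touched, because only nodes returned by //list/item are candidates for removal.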

Thanks for all the helpful suggestions in the other answers.

Other answers

Looks like a classic case for writing a SAX filter. The parser generates SAX events and passes them to your application, your application passes a subset of the events on to a serializer, which generates the new XML file. No need to build a tree in memory.

I don't know the details of how to do this in VB, being a Java man, but the technology certainly exists.
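For what it's worth, .NET does not ship a SAX parser, so the sketch below swaps in the closest built-in equivalent: a forward-only XmlReader feeding an XmlWriter. It is an illustration of the streaming idea rather than a SAX filter proper. The criterion (attr1 equal to "sample") and the names FilterItems and unwantedValue are assumptions for the example, and it drops matching item elements wherever they appear, not only under list.

```vbnet
Imports System
Imports System.IO
Imports System.Text
Imports System.Xml

Module StreamingFilterSketch
    ' Hypothetical helper: copies inputXml to a new string, skipping every
    ' <item> whose attr1 equals unwantedValue. No tree is built in memory.
    Function FilterItems(inputXml As String, unwantedValue As String) As String
        Dim output As New StringBuilder()
        Dim settings As New XmlWriterSettings() With {.OmitXmlDeclaration = True}

        Using reader As XmlReader = XmlReader.Create(New StringReader(inputXml))
            Using writer As XmlWriter = XmlWriter.Create(output, settings)
                reader.Read()
                While Not reader.EOF
                    Select Case reader.NodeType
                        Case XmlNodeType.Element
                            If reader.LocalName = "item" Then
                                If String.Equals(reader.GetAttribute("attr1"), unwantedValue) Then
                                    reader.Skip() ' drop the element; advances the reader
                                Else
                                    writer.WriteNode(reader, True) ' copy it; advances the reader
                                End If
                            Else
                                ' Container element: copy the start tag and attributes only.
                                Dim isEmpty As Boolean = reader.IsEmptyElement
                                writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI)
                                writer.WriteAttributes(reader, True)
                                If isEmpty Then writer.WriteEndElement()
                                reader.Read()
                            End If
                        Case XmlNodeType.EndElement
                            writer.WriteFullEndElement()
                            reader.Read()
                        Case Else
                            ' Text, whitespace, comments, etc.: copy as-is; advances the reader.
                            writer.WriteNode(reader, True)
                    End Select
                End While
            End Using
        End Using
        Return output.ToString()
    End Function

    Sub Main()
        Dim xml As String = "<root><other a=""1""/><list>" &
                            "<item attr1=""sample""/><item attr1=""keep""/>" &
                            "</list></root>"
        Console.WriteLine(FilterItems(xml, "sample"))
    End Sub
End Module
```

This only helps if the XML starts out as a file or stream; if it is already loaded as an XmlNode, as in the question, the in-memory loop is likely the simpler route.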

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow