Get XmlNodeList if a particular element value or its attribute value is present in a given list of strings

StackOverflow https://stackoverflow.com/questions/17573939

Question

I would like to get an XmlNodeList from a huge XML file.

Conditions: I have a list of unique ID values, say IDList.
Case I: Collect all nodes where the element called ID has a value from IDList.
Case II: Collect all nodes where the attribute idName of the element ID has a value from IDList.

In short, extract only the nodes that match the values given in IDList.

I did this with loops: I load the XML into an XmlDocument and iterate over all nodes, checking each ID value. What I am looking for is a more sophisticated and faster method, because looping over every node does not scale to a large XML file.

My try:

try
{
    using (XmlReader reader = XmlReader.Create(URL))
    {
        XmlDocument doc = new XmlDocument();
        doc.Load(reader);
        XmlNodeList nodeList = doc.GetElementsByTagName("idgroup");
        foreach (XmlNode xn in nodeList)
        {
            string id = xn.Attributes["id"].Value;
            string value = string.Empty;
            if (IDList.Contains(id))
            {
                value = xn.ChildNodes[1].ChildNodes[1].InnerText; // <value>
                if (!string.IsNullOrEmpty(value))
                {
                    listValueCollection.Add(value);
                }
            }
        }
    }
}
catch
{}

XML (XLIFF) structure:

<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.2">
    <file date="2013-07-17">
        <body>
            <id idName="test_001">
                <desc-group name="test_001">
                    <desc type="text"/>
                </desc-group>
                <result-unit idName="test_001_text">
                    <source>abcd</source>
                    <result>xyz</result>
                </result-unit>
            </id>
        </body>
    </file>
</xliff>

Collect all the nodes like above where idName matches.

Solution

EDIT

This test parses the example you gave. It goes straight to the result node, which keeps the query as efficient as possible.

[Test]
public void TestXPathExpression()
{
    var idList = new List<string> { "test_001" };
    var resultsList = new List<string>();

    // Replace with appropriate method to open your URL.
    using (var reader = new XmlTextReader(File.OpenRead("fixtures\\XLIFF_sample_01.xlf")))
    {
        var doc = new XmlDocument();
        doc.Load(reader);
        var root = doc.DocumentElement;

        // This is necessary, since your example is namespaced.
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("x", "urn:oasis:names:tc:xliff:document:1.2");

        // Go directly to the node that holds the result you want.
        foreach (var nodes in idList
            .Select(id => root.SelectNodes("//x:file/x:body/x:id[@idName='" + id + "']/x:result-unit/x:result", nsmgr))
            .Where(nodes => nodes != null && nodes.Count > 0))
                resultsList.AddRange(nodes.Cast<XmlNode>().Select(node => node.InnerText));

    }

    // Print the resulting list.
    resultsList.ForEach(Console.WriteLine);
}
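
If the ID list is long, the test above runs one XPath query per ID. A possible variant, sketched here under the assumption that the document still fits in memory, selects every id element once and checks each idName against a HashSet<string>. It also avoids concatenating ID values into the XPath string. The class name, the CollectResults helper, and the inline sample XML are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

public static class SinglePassExample
{
    // Hypothetical helper: returns the <result> text of every <id>
    // element whose idName attribute is contained in ids.
    public static List<string> CollectResults(XmlDocument doc, ICollection<string> ids)
    {
        var wanted = new HashSet<string>(ids);
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("x", "urn:oasis:names:tc:xliff:document:1.2");

        // One query for all <id> elements; membership is checked in memory.
        return doc.SelectNodes("//x:file/x:body/x:id", nsmgr)
            .Cast<XmlElement>()
            .Where(id => wanted.Contains(id.GetAttribute("idName")))
            .Select(id => id.SelectSingleNode("x:result-unit/x:result", nsmgr))
            .Where(result => result != null && !string.IsNullOrEmpty(result.InnerText))
            .Select(result => result.InnerText)
            .ToList();
    }

    public static void Main()
    {
        // Cut-down sample modeled on the XLIFF structure in the question.
        const string xml = @"<xliff xmlns=""urn:oasis:names:tc:xliff:document:1.2"" version=""1.2"">
  <file date=""2013-07-17""><body>
    <id idName=""test_001""><result-unit idName=""test_001_text""><source>abcd</source><result>xyz</result></result-unit></id>
    <id idName=""test_002""><result-unit idName=""test_002_text""><source>efgh</source><result>uvw</result></result-unit></id>
  </body></file>
</xliff>";
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        foreach (var value in CollectResults(doc, new[] { "test_001" }))
            Console.WriteLine(value); // prints "xyz"
    }
}
```

Whether this is faster than one query per ID depends on how many IDs there are relative to the number of id elements, so it is worth measuring on the real file.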

You can extract only the nodes you need by using an XPath query. A brief example of how you'd go about it:

using (XmlReader reader = XmlReader.Create(URL))
{
    XmlDocument doc = new XmlDocument();
    doc.Load(reader);
    foreach (var id in IDList)
    {
        var nodes = doc.SelectNodes("//xliff/file/body/id[@idName='" + id + "']");
        // XmlNodeList is a non-generic IEnumerable, so Cast<XmlNode>() (System.Linq) is needed before Where().
        foreach (var node in nodes.Cast<XmlNode>().Where(x => !string.IsNullOrEmpty(x.ChildNodes[1].ChildNodes[1].InnerText)))
            listValueCollection.Add(node.ChildNodes[1].ChildNodes[1].InnerText);
    }
}

The XPath expression is of course only an example. If you post a sample of your XML, I can give you something more accurate.
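
To illustrate why the expression has to be adapted to the actual file: the XLIFF sample in the question declares a default namespace, so an unprefixed path like the one in the brief example matches nothing, while the same path with a registered prefix does. A minimal sketch, assuming a cut-down inline copy of the sample; the class name and the prefix x are arbitrary choices.

```csharp
using System;
using System.Xml;

public static class NamespaceExample
{
    public static void Main()
    {
        const string xml = @"<xliff xmlns=""urn:oasis:names:tc:xliff:document:1.2"" version=""1.2"">
  <file><body><id idName=""test_001""><result-unit><result>xyz</result></result-unit></id></body></file>
</xliff>";
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        // In XPath 1.0 an unprefixed name means "no namespace", so this finds nothing.
        var plain = doc.SelectNodes("//xliff/file/body/id[@idName='test_001']");

        // Bind a prefix to the document's default namespace and use it in every step.
        var nsmgr = new XmlNamespaceManager(doc.NameTable);
        nsmgr.AddNamespace("x", "urn:oasis:names:tc:xliff:document:1.2");
        var prefixed = doc.SelectNodes("//x:xliff/x:file/x:body/x:id[@idName='test_001']", nsmgr);

        Console.WriteLine(plain.Count);    // 0
        Console.WriteLine(prefixed.Count); // 1
    }
}
```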

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow