Question

I have a 6 GB XML file that I read with XmlReader. The file is huge, but there's nothing I can do about that. I would normally use LINQ to XML, but loading a file this size into an XDocument throws an OutOfMemoryException.

I'm using XmlReader to loop through the whole file and extract what I need. I'm including a sample XML file.

Essentially, this is what I do:

  1. Find a CONTAINER element. If found, read its "id" attribute.
  2. If "id" begins with LOCAL_, then this is a container I want to read.
  3. Keep reading until I find a FAMILY element with value CELL_FDD.
  4. When found, loop on reader.Read() until I find an IMPORTANT_VALUE element.
  5. Once found, read the value of IMPORTANT_VALUE.
  6. I'm done with this container, so continue looping until I find the next CONTAINER (that's where the break comes in).

This is the simplified version of how I've been reading the file and finding the relevant values.

while (myReader.Read())
{
    if ((myReader.Name == "CONTAINER"))
    {
        if (myReader.HasAttributes) 
        {
            string Attribute = myReader.GetAttribute("id");
            if (Attribute.IndexOf("LOCAL_") >= 0)
            {
                while (myReader.Read())
                {
                    if (myReader.Name == "FAMILY")
                    {
                        myReader.Read();//read value
                        string Family = myReader.Value;
                        if (Family == "CELL_FDD")
                        {
                            while (myReader.Read())
                            {
                                if ((myReader.Name == "IMPORTANT_VALUE"))
                                {
                                    myReader.Read();
                                    string Counter = myReader.Value;
                                    Console.WriteLine(Attribute + " (found: " + Counter + ")");
                                    break;
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

And this is the XML:

<es:esFD xmlns:es="File.xsd">
    <vs:vsFD xmlns:vs="OTHER_FILE.xsd">
    <CONTAINER id="LOCAL_CONTAINER1">
        <ATTRIBUTES>
            <FAMILY>CELL_FDD</FAMILY>
            <CELL_FDD>
                <VAL1>1.1.2.3</VAL1>
                <VAL2>JSMITH</VAL2>
                <VAL3>320</VAL3>
                <IMPORTANT_VALUE>VERY</IMPORTANT_VALUE>
                <VAL4>320</VAL4>
            </CELL_FDD>
            <FAMILY>BLAH</FAMILY>
            <BLAH>
                <VAL1>1.4.43.3</VAL1>
                <VAL2>NA</VAL2>
                <VAL3>349</VAL3>
                <IMPORTANT_VALUE>NA</IMPORTANT_VALUE>
                <VAL4>43</VAL4>
                <VAL5>00</VAL5>
                <VAL6>12</VAL6>
            </BLAH>
        </ATTRIBUTES>
    </CONTAINER>    
    <CONTAINER id="FOREIGN_ELEMENT1">
        <ATTRIBUTES>
            <FAMILY>CELL_FDD</FAMILY>
            <CELL_FDD>
                <VAL1>1.1.2.3</VAL1>
                <VAL2>JSMITH</VAL2>
                <VAL3>320</VAL3>
                <IMPORTANT_VALUE>VERY</IMPORTANT_VALUE>
                <VAL4>320</VAL4>
            </CELL_FDD>
            <FAMILY>BLAH</FAMILY>
            <BLAH>
                <VAL1>1.4.43.3</VAL1>
                <VAL2>NA</VAL2>
                <VAL3>349</VAL3>
                <IMPORTANT_VALUE>NA</IMPORTANT_VALUE>
                <VAL4>43</VAL4>
                <VAL5>00</VAL5>
                <VAL6>12</VAL6>
            </BLAH>
        </ATTRIBUTES>
    </CONTAINER>    
    </vs:vsFD>
</es:esFD>

How can I break out of the innermost loop so that I get back to the outermost loop?

Solution 3

Following svick's comment, I ended up combining XmlReader with LINQ to XML. Once the reader reached the correct element and I had checked that its id attribute had the correct value, I loaded just that element with XElement.Load.
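
For anyone wanting to see roughly what that combination can look like, here is a minimal sketch. It is not the exact code from this solution: the class and method names (ContainerScanner, FindImportantValues) are made up for illustration, and it assumes the element is handed to XElement.Load through reader.ReadSubtree(), since XElement.Load expects the reader to end with the element. Element names and the LOCAL_ prefix come from the sample XML in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class ContainerScanner
{
    // Hypothetical helper: streams the document and yields (id, IMPORTANT_VALUE)
    // for each LOCAL_ container that has a FAMILY element with value CELL_FDD.
    public static IEnumerable<(string Id, string Value)> FindImportantValues(XmlReader reader)
    {
        while (reader.Read())
        {
            // Only CONTAINER start tags are interesting; end tags share the Name.
            if (reader.NodeType != XmlNodeType.Element || reader.Name != "CONTAINER")
                continue;

            string id = reader.GetAttribute("id");
            if (id == null || !id.StartsWith("LOCAL_", StringComparison.Ordinal))
                continue;

            // Load just this one container into memory. Once the subtree reader
            // is closed, the outer reader sits on </CONTAINER>, so the next
            // Read() carries on with whatever follows it.
            XElement container;
            using (XmlReader subtree = reader.ReadSubtree())
            {
                container = XElement.Load(subtree);
            }

            // Same idea as the nested loops: find the FAMILY marker, then the
            // first IMPORTANT_VALUE among the elements that follow it.
            XElement family = container.Descendants("FAMILY")
                                       .FirstOrDefault(f => f.Value == "CELL_FDD");
            if (family == null)
                continue;

            XElement important = family.ElementsAfterSelf()
                                       .Descendants("IMPORTANT_VALUE")
                                       .FirstOrDefault();
            if (important != null)
                yield return (id, important.Value);
        }
    }
}
```

With this shape there is only one reader loop, so the "break out of the inner loop" problem goes away: each container is queried in memory and then discarded. A caller would create the reader with XmlReader.Create on the file path and iterate the results with foreach.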

OTHER TIPS

Using separate methods should make it easier to control your loops:

while (myReader.Read())
{
    if ((myReader.Name == "CONTAINER"))
    {
        ProcessContainerElement(myReader);
    }
}

In the ProcessContainerElement method, you can return when you determine that you need to start looking for the next CONTAINER element.

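private void ProcessContainerElement(XmlReader myReader)
{
    // Ignore the closing </CONTAINER> tag, which has the same Name.
    if (myReader.NodeType != XmlNodeType.Element)
        return;

    // The reader is on the CONTAINER start tag, so its attributes are available.
    string Attribute = myReader.GetAttribute("id");

    while (myReader.Read())
    {
        // Stop at the end of this container so the caller can look for the next one.
        if (myReader.NodeType == XmlNodeType.EndElement && myReader.Name == "CONTAINER")
            return;

        if (myReader.NodeType == XmlNodeType.Element && myReader.Name == "IMPORTANT_VALUE")
        {
            myReader.Read();
            string Counter = myReader.Value;
            Console.WriteLine(Attribute + " (found: " + Counter + ")");
            return;
        }
    }
}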

You can stream through the file with XmlReader and load each CONTAINER node into its own XmlDocument.

Something like this, not tested:

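bool notFound = false;
notFound |= !reader.ReadToDescendant("root");
notFound |= !reader.ReadToDescendant("CONTAINER");

if (notFound)
    throw new Exception("Cannot find \"/root/CONTAINER\"");

do
{
    XmlDocument doc = new XmlDocument();
    // ReadOuterXml() would move the reader past </CONTAINER>, and the
    // ReadToNextSibling call below could then skip an adjacent CONTAINER.
    // A subtree reader leaves the outer reader on the end tag instead.
    using (XmlReader subtree = reader.ReadSubtree())
    {
        doc.Load(subtree);
    }
    XmlNode container = doc.DocumentElement;

    // do your work with container
}
while (reader.ReadToNextSibling("CONTAINER"));

reader.Close();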
Licensed under: CC-BY-SA with attribution