Question

I have an XML document that I'm trying to import using XmlTextReader in C#. My code works well except in one case: when the tags are not on the same line as the actual text content, as with product_name here:

    <product> 
        <sku>27939</sku> 
        <product_name>
            Sof-Therm Warm-Up Jacket
        </product_name> 
        <supplier_number>ALNN1064</supplier_number> 
    </product>

My code to read the XML document is as follows:

while (reader.Read())
{
    switch (reader.Name)
    {
        case "sku":
            newEle = new XMLElement();
            newEle.SKU = reader.ReadString();
            break;
        case "product_name":
            newEle.ProductName = reader.ReadString();
            break;
        case "supplier_number":
            newEle.SupplierNumber = reader.ReadString();
            products.Add(newEle);
            break;
    }
}

I have tried almost everything I found in the XmlTextReader documentation, including

reader.MoveToElement();
reader.MoveToContent();
reader.MoveToNextAttribute();

and a few others that made less sense, but none of them handles this case consistently. I could special-case this one element, but that would break the regular cases. So my question is: after I find the "product_name" tag, is there a way to move to the next line that contains text and extract it?

I should have mentioned that I output the result to an HTML table afterwards, and the element comes up blank, so I'm fairly certain it is not being read correctly.

Thanks in advance!

Solution

I think you will find LINQ to XML easier to use:

var xDoc = XDocument.Parse(xmlstring); //or XDocument.Load(filename);

int sku = (int)xDoc.Root.Element("sku");
string name = (string)xDoc.Root.Element("product_name");
string supplier = (string)xDoc.Root.Element("supplier_number");

You can also convert your XML to a dictionary:

var dict = xDoc.Root.Elements()
           .ToDictionary(e => e.Name.LocalName, e => (string)e);

Console.WriteLine(dict["sku"]);
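
Because the product name in the question is wrapped in line breaks and indentation, the string casts above return that white space as well. Here is a minimal sketch that trims each value and handles more than one product. It assumes every `<product>` element holds the three child elements from the question; the `Product` class and `ProductLoader.Load` are illustrative names standing in for the asker's `XMLElement` class, not something from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for the asker's XMLElement class.
public class Product
{
    public string SKU { get; set; }
    public string ProductName { get; set; }
    public string SupplierNumber { get; set; }
}

public static class ProductLoader
{
    // Finds every <product> element in the document, whatever its parent is.
    public static List<Product> Load(string xml)
    {
        return XDocument.Parse(xml)
            .Descendants("product")
            .Select(p => new Product
            {
                // The cast yields null for a missing element, hence the ?? "".
                SKU = ((string)p.Element("sku") ?? "").Trim(),
                ProductName = ((string)p.Element("product_name") ?? "").Trim(),
                SupplierNumber = ((string)p.Element("supplier_number") ?? "").Trim()
            })
            .ToList();
    }
}
```

Calling `ProductLoader.Load(xmlstring)` on the sample from the question should give one `Product` whose `ProductName` has no surrounding line breaks.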

OTHER TIPS

It looks like you may need to remove the carriage returns, line feeds, tabs, and spaces before and after the text in the XML element. In your example, you have

    <!-- 1. Original example -->
    <product_name>
        Sof-Therm Warm-Up Jacket
    </product_name>

    <!-- 2. It should probably look like this. If possible, correct the XML generator. -->
    <product_name>Sof-Therm Warm-Up Jacket</product_name>

    <!-- 3a. If white space is important, then preserve it -->
    <product_name xml:space='preserve'>
        Sof-Therm Warm-Up Jacket
    </product_name>

    <!-- 3b. If white space is important, use CDATA -->
    <product_name><![CDATA[
        Sof-Therm Warm-Up Jacket
    ]]></product_name>

The XmlTextReader has a WhitespaceHandling property, but when I tested it, it still included the returns and indentation:

reader.WhitespaceHandling = WhitespaceHandling.None;
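
That result is expected: WhitespaceHandling only controls whether nodes made up entirely of white space are reported. The line breaks around the product name are part of the same text node as the name, so they come back with it. A small sketch that shows this, where `WhitespaceDemo` and `ReadRawName` are illustrative names:

```csharp
using System.IO;
using System.Xml;

public static class WhitespaceDemo
{
    // Returns the untrimmed content of the first <product_name> element,
    // or null if the document has none.
    public static string ReadRawName(string xml)
    {
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Suppresses white-space-only nodes; text nodes are left untouched.
            reader.WhitespaceHandling = WhitespaceHandling.None;
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "product_name")
                {
                    return reader.ReadString();
                }
            }
        }
        return null;
    }
}
```

The returned string still starts and ends with the line breaks and indentation from the file, so it has to be trimmed after reading.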

An option is to use a method to remove the extra characters while you are parsing the document. This method removes the normal white space at the beginning and end of a string:

string TrimCrLf(string value)
{
    return Regex.Replace(value, @"^[\r\n\t ]+|[\r\n\t ]+$", "");
}

    // Then in your loop...
    case "product_name":
       // Trim the contents of the 'product_name' element to remove extra returns
       newEle.ProductName = TrimCrLf(reader.ReadString());
       break;

You can also use this method, TrimCrLf(), with LINQ to XML and the traditional XmlDocument. You can even make it an extension method:

public static class StringExtensions
{
    public static string TrimCrLf(this string value)
    {
        return Regex.Replace(value, @"^[\r\n\t ]+|[\r\n\t ]+$", "");
    }
}

// Use it like:
newEle.ProductName = reader.ReadString().TrimCrLf();
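
For plain leading and trailing white space, the built-in `string.Trim()` should give the same result here, since with no arguments it strips all white-space characters, including returns, line feeds, tabs, and spaces. Below is a sketch of the question's loop using it, with a `NodeType` check added so that only start tags are dispatched. The `ProductRow` class and `ReaderImport.Read` are illustrative names standing in for the asker's `XMLElement` and surrounding code.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical stand-in for the asker's XMLElement class.
public class ProductRow
{
    public string SKU { get; set; }
    public string ProductName { get; set; }
    public string SupplierNumber { get; set; }
}

public static class ReaderImport
{
    // Assumes <sku> comes first and <supplier_number> last in each product,
    // as in the question's sample.
    public static List<ProductRow> Read(TextReader input)
    {
        var products = new List<ProductRow>();
        ProductRow current = null;
        using (var reader = new XmlTextReader(input))
        {
            while (reader.Read())
            {
                // Skip end tags, white space, and anything else that is not a start tag.
                if (reader.NodeType != XmlNodeType.Element) continue;

                switch (reader.Name)
                {
                    case "sku":
                        current = new ProductRow();
                        current.SKU = reader.ReadString().Trim();
                        break;
                    case "product_name":
                        if (current != null)
                            current.ProductName = reader.ReadString().Trim();
                        break;
                    case "supplier_number":
                        if (current != null)
                        {
                            current.SupplierNumber = reader.ReadString().Trim();
                            products.Add(current);
                            current = null;
                        }
                        break;
                }
            }
        }
        return products;
    }
}
```

It can be called with `ReaderImport.Read(new StringReader(xmlstring))` or with a `StreamReader` over the file.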

Regular expression explanation:

  • ^ = Beginning of field
  • $ = End of field
  • []+ = Match 1 or more of any of the contained characters
  • \r = carriage return (0x0D / 13)
  • \n = line feed (0x0A / 10)
  • \t = tab (0x09 / 9)
  • ' '= space (0x20 / 32)

I have run into a similar problem before when dealing with text that originated on a Mac platform, due to reversed \r\n in newlines. I suggest you try Ryan's regex solution above, but with the following regex, which strips only carriage returns and line feeds and leaves tabs and spaces in place:

         "^[\r\n]+|[\r\n]+$"
Licensed under: CC-BY-SA with attribution