
I have an XML document that I'm trying to import using XmlTextReader in C#. My code works well except for one case: when the tag is not on the same line as the actual text/content, for example with product_name:

    <product_name>
        Sof-Therm Warm-Up Jacket
    </product_name>

My code for reading the XML document is as follows:

while (reader.Read())
{
    switch (reader.Name)
    {
        case "sku":
            // A new record starts with each sku element
            newEle = new XMLElement();
            newEle.SKU = reader.ReadString();
            break;
        case "product_name":
            newEle.ProductName = reader.ReadString();
            break;
        case "supplier_number":
            newEle.SupplierNumber = reader.ReadString();
            break;
    }
}

I have tried almost everything I found in the XmlTextReader documentation


and a couple of others that made less sense, but none of them seems to deal with this issue consistently. Obviously I could fix this one case, but then it would break the regular cases. So my question is: once I find the "product_name" tag, is there a way to move to the next line that contains text and extract it?

I should have mentioned: I am outputting the data to an HTML table afterwards, and this element comes up blank, so I'm fairly certain it is not being read correctly.

Thanks in advance!



I think you will find LINQ to XML easier to use:

var xDoc = XDocument.Parse(xmlstring); //or XDocument.Load(filename);

int sku = (int)xDoc.Root.Element("sku");
string name = (string)xDoc.Root.Element("product_name");
string supplier = (string)xDoc.Root.Element("supplier_number");

You can also convert your XML to a dictionary:

var dict = xDoc.Root.Elements()
           .ToDictionary(e => e.Name.LocalName, e => (string)e);
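
One caveat with the casts above: as far as I know, `(string)element` returns the element's text exactly as it is stored, so a value that sits on its own line still carries the surrounding line breaks and indentation. A minimal sketch of trimming it, assuming a hypothetical `<product>` root element and made-up sku and supplier values (the real document's root is not shown in the question):

```csharp
using System;
using System.Xml.Linq;

public static class LinqToXmlTrimDemo
{
    // Hypothetical sample: the root element name and the numeric values are assumptions.
    public const string Xml = @"<product>
  <sku>1001</sku>
  <product_name>
      Sof-Therm Warm-Up Jacket
  </product_name>
  <supplier_number>42</supplier_number>
</product>";

    public static string ReadProductName(string xml)
    {
        XDocument xDoc = XDocument.Parse(xml);

        // The cast yields the raw text, including line breaks and indentation.
        string raw = (string)xDoc.Root.Element("product_name");

        // string.Trim() with no arguments removes leading and trailing white space.
        return raw.Trim();
    }

    public static void Main()
    {
        // Brackets make any leftover white space visible.
        Console.WriteLine("[" + ReadProductName(Xml) + "]");
    }
}
```

The same `.Trim()` call could be added inside the `ToDictionary` value selector if every field may be wrapped this way.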



It looks like you may need to remove the carriage returns, line feeds, tabs, and spaces before and after the text in the XML element. In your example, you have:

    <!-- 1. Original example -->
    <product_name>
        Sof-Therm Warm-Up Jacket
    </product_name>

    <!-- 2. It should probably be. If possible correct the XML generator. -->
    <product_name>Sof-Therm Warm-Up Jacket</product_name>

    <!-- 3a. If white space is important, then preserve it -->
    <product_name xml:space='preserve'>
        Sof-Therm Warm-Up Jacket
    </product_name>

    <!-- 3b. If white space is important, use CDATA -->
    <product_name><![CDATA[
        Sof-Therm Warm-Up Jacket
    ]]></product_name>

The XmlTextReader has a WhitespaceHandling property, but when I tested it, the returned text still included the returns and indentation:

reader.WhitespaceHandling = WhitespaceHandling.None;
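
My understanding of why the setting does not help: WhitespaceHandling.None only suppresses nodes that consist entirely of white space, while the line breaks and indentation here are part of a text node that also contains the product name. A small sketch of that kind of test (the class and method names are made up for illustration):

```csharp
using System;
using System.IO;
using System.Xml;

public static class WhitespaceHandlingDemo
{
    public static string ReadRawProductName()
    {
        string xml = "<product_name>\r\n    Sof-Therm Warm-Up Jacket\r\n</product_name>";

        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            // Drops white-space-only nodes; text nodes are returned unchanged.
            reader.WhitespaceHandling = WhitespaceHandling.None;

            reader.MoveToContent();      // position on <product_name>
            return reader.ReadString();  // text still has the surrounding white space
        }
    }

    public static void Main()
    {
        string raw = ReadRawProductName();

        // Compare lengths to see whether leading/trailing white space survived.
        Console.WriteLine("raw length: " + raw.Length + ", trimmed length: " + raw.Trim().Length);
    }
}
```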

One option is to remove the extra characters yourself while you are parsing the document. This method removes the usual white space characters at the beginning and end of a string:

// Requires: using System.Text.RegularExpressions;
string TrimCrLf(string value)
{
    return Regex.Replace(value, @"^[\r\n\t ]+|[\r\n\t ]+$", "");
}

    // Then in your loop...
    case "product_name":
       // Trim the contents of the 'product_name' element to remove extra returns
       newEle.ProductName = TrimCrLf(reader.ReadString());
       break;

You can also use this method, TrimCrLf(), with LINQ to XML and the traditional XmlDocument. You can even make it an extension method:

public static class StringExtensions
{
    public static string TrimCrLf(this string value)
    {
        return Regex.Replace(value, @"^[\r\n\t ]+|[\r\n\t ]+$", "");
    }
}

// Use it like:
newEle.ProductName = reader.ReadString().TrimCrLf();
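
Putting the pieces together, the whole loop could look roughly like this. It is only a sketch: the `<product>` root element, the sample values, and the `ReaderLoopDemo` class are made up for illustration, and it returns a plain array instead of your XMLElement type, which I don't have.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

public static class StringExtensions
{
    public static string TrimCrLf(this string value)
    {
        return Regex.Replace(value, @"^[\r\n\t ]+|[\r\n\t ]+$", "");
    }
}

public static class ReaderLoopDemo
{
    // Returns { sku, product_name, supplier_number } for a single hypothetical record.
    public static string[] Parse(string xml)
    {
        string sku = null, name = null, supplier = null;

        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            while (reader.Read())
            {
                // Only react to start tags; skip white space and end tags.
                if (reader.NodeType != XmlNodeType.Element)
                {
                    continue;
                }

                switch (reader.Name)
                {
                    case "sku":
                        sku = reader.ReadString().TrimCrLf();
                        break;
                    case "product_name":
                        name = reader.ReadString().TrimCrLf();
                        break;
                    case "supplier_number":
                        supplier = reader.ReadString().TrimCrLf();
                        break;
                }
            }
        }

        return new[] { sku, name, supplier };
    }

    public static void Main()
    {
        // Assumed sample input with product_name split across lines.
        string xml = "<product>\r\n  <sku>1001</sku>\r\n  <product_name>\r\n      Sof-Therm Warm-Up Jacket\r\n  </product_name>\r\n  <supplier_number>42</supplier_number>\r\n</product>";
        Console.WriteLine(string.Join(" | ", Parse(xml)));
    }
}
```

Trimming every field, not just product_name, means the regular single-line cases keep working and the multi-line case is handled by the same code path.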

Regular expression explanation:

  • ^ = Beginning of the string
  • $ = End of the string
  • []+ = Match 1 or more of any of the contained characters
  • \r = carriage return (0x0D / 13)
  • \n = line feed (0x0A / 10)
  • \t = tab (0x09 / 9)
  • ' ' = space (0x20 / 32)
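
To check what the pattern does and does not touch, here is a quick sketch (the class and method names are hypothetical). It also compares the result with the built-in `string.Trim(params char[])` given the same four characters, which I would expect to behave the same way for this purpose:

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrimPatternDemo
{
    public static string TrimWithRegex(string value)
    {
        // Strips runs of CR, LF, tab, and space at the start and end only.
        return Regex.Replace(value, @"^[\r\n\t ]+|[\r\n\t ]+$", "");
    }

    public static string TrimWithChars(string value)
    {
        // Built-in alternative using the same four characters.
        return value.Trim('\r', '\n', '\t', ' ');
    }

    public static void Main()
    {
        string raw = "\r\n\t  Sof-Therm  Warm-Up\tJacket \r\n";

        // White space inside the text (the double space and the tab) is kept.
        Console.WriteLine("[" + TrimWithRegex(raw) + "]");
        Console.WriteLine(TrimWithRegex(raw) == TrimWithChars(raw));
    }
}
```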

I have run into a similar problem before when dealing with text that originated on a Mac platform, due to reversed \r\n in newlines. I suggest you try Ryan's regex solution, but with the following regex:

License: CC-BY-SA with attribution