Troubleshooting XmlReader problems when using ReadToDescendant and/or ReadElementContentAsObject

https://stackoverflow.com/questions/2249023

20-09-2019

Question

I'm working on a mysterious bug in the usually very good open source project Excel Data Reader. It is skipping values when reading my particular OpenXML .xlsx spreadsheet.

The problem occurs in the ReadSheetRow method (demo code below). The strange behavior appears when the source XML is saved by Excel and contains no whitespace. However, the same XML reformatted with whitespace (for example, in Visual Studio via Edit, Advanced, Format Document) works just fine!

Test data with whitespace:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
    <sheetData>
        <row r="5" spans="1:73" s="7" customFormat="1">
            <c r="B5" s="12">
                <v>39844</v>
            </c>
            <c r="C5" s="8"/>
            <c r="D5" s="8"/>
            <c r="E5" s="8"/>
            <c r="F5" s="8"/>
            <c r="G5" s="8"/>
            <c r="H5" s="12">
                <v>39872</v>
            </c>
            <c r="I5" s="8"/>
            <c r="J5" s="8"/>
            <c r="K5" s="8"/>
            <c r="L5" s="8"/>
            <c r="M5" s="8"/>
            <c r="N5" s="12">
                <v>39903</v>
            </c>
        </row>
    </sheetData>
</worksheet>

Test data without whitespace:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"><sheetData><row r="5" spans="1:73" s="7" customFormat="1"><c r="B5" s="12"><v>39844</v></c><c r="C5" s="8"/><c r="D5" s="8"/><c r="E5" s="8"/><c r="F5" s="8"/><c r="G5" s="8"/><c r="H5" s="12"><v>39872</v></c><c r="I5" s="8"/><c r="J5" s="8"/><c r="K5" s="8"/><c r="L5" s="8"/><c r="M5" s="8"/><c r="N5" s="12"><v>39903</v></c></row></sheetData></worksheet>

Sample code that illustrates the problem:

Note that A is output after _xmlReader.Read(), B after ReadToDescendant, and C after ReadElementContentAsObject.

while (reader.Read())
{
    if (reader.NodeType != XmlNodeType.Whitespace) outStream.WriteLine(String.Format("*A* NodeType: {0}, Name: '{1}', Empty: {2}, Value: '{3}'", reader.NodeType, reader.Name, reader.IsEmptyElement, reader.Value));

    if (reader.NodeType == XmlNodeType.Element && reader.Name == "c")
    {
        string a_s = reader.GetAttribute("s");
        string a_t = reader.GetAttribute("t");
        string a_r = reader.GetAttribute("r");

        bool matchingDescendantFound = reader.ReadToDescendant("v");
        if (reader.NodeType != XmlNodeType.Whitespace) outStream.WriteLine(String.Format("*B* NodeType: {0}, Name: '{1}', Empty: {2}, Value: '{3}'", reader.NodeType, reader.Name, reader.IsEmptyElement, reader.Value));
        object o = reader.ReadElementContentAsObject();
        if (reader.NodeType != XmlNodeType.Whitespace) outStream.WriteLine(String.Format("*C* NodeType: {0}, Name: '{1}', Empty: {2}, Value: '{3}'", reader.NodeType, reader.Name, reader.IsEmptyElement, reader.Value));
    }
}

Test results for the XML with whitespace:

*A* NodeType: XmlDeclaration, Name: 'xml', Empty: False, Value: 'version="1.0" encoding="UTF-8" standalone="yes"'
*A* NodeType: Element, Name: 'worksheet', Empty: False, Value: ''
*A* NodeType: Element, Name: 'sheetData', Empty: False, Value: ''
*A* NodeType: Element, Name: 'row', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: False, Value: ''
*B* NodeType: Element, Name: 'v', Empty: False, Value: ''
*A* NodeType: EndElement, Name: 'c', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: True, Value: ''
*B* NodeType: Element, Name: 'c', Empty: True, Value: ''
...

Test results for the XML without whitespace:

*A* NodeType: XmlDeclaration, Name: 'xml', Empty: False, Value: 'version="1.0" encoding="UTF-8" standalone="yes"'
*A* NodeType: Element, Name: 'worksheet', Empty: False, Value: ''
*A* NodeType: Element, Name: 'sheetData', Empty: False, Value: ''
*A* NodeType: Element, Name: 'row', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: False, Value: ''
*B* NodeType: Element, Name: 'v', Empty: False, Value: ''
*C* NodeType: EndElement, Name: 'c', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: True, Value: ''
*B* NodeType: Element, Name: 'c', Empty: True, Value: ''
...

The change in pattern points to a problem in ReadElementContentAsObject or, possibly, in the position that ReadToDescendant moves the XmlReader to.

Does anyone know what might be happening here?

Solution

It's pretty simple. As you can see from the output, the first time you reach the "B" line you are positioned on the first 'v' element. Then you call ReadElementContentAsObject, which returns the text content of v and "moves the reader past the end element tag" (of v). You are now pointing at a whitespace node if there is whitespace, or at an EndElement node (of c) if there isn't. Of course, the output line is not printed when the node is whitespace. In either case, you then do a Read() and move on to the next node. In the no-whitespace case, that Read() skips past the EndElement, so the loop never sees it.
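To make the difference concrete, here is a minimal sketch that checks where the reader lands right after ReadElementContentAsObject in each case. The class and method names (WhitespaceDemo, NodeAfterV) and the trimmed-down XML strings are made up for illustration; it assumes a reader created with default XmlReaderSettings, which reports whitespace nodes.

```csharp
using System;
using System.IO;
using System.Xml;

static class WhitespaceDemo
{
    // Hypothetical helper: returns the node type the reader is positioned on
    // right after ReadElementContentAsObject() has consumed the first <v>.
    public static XmlNodeType NodeAfterV(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadToFollowing("v");          // position on <v>
            reader.ReadElementContentAsObject();  // reads the text and moves past </v>
            return reader.NodeType;
        }
    }

    static void Main()
    {
        // No whitespace: the reader is already sitting on </c>.
        Console.WriteLine(NodeAfterV("<c><v>39844</v></c>"));       // EndElement
        // With whitespace: the reader sits on the whitespace before </c>.
        Console.WriteLine(NodeAfterV("<c>\n  <v>39844</v>\n</c>")); // Whitespace
    }
}
```

This matches the two outputs in the question: the *C* line shows an EndElement for the compact XML and is suppressed (the node is whitespace) for the formatted XML.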

The problem is much worse in other specific situations. When you call ReadElementContentAsObject on a c (call it c1), the reader moves on to the next c (c2). Then you do a Read, which moves to c3, and c2 is lost forever.
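A stripped-down version of the loop from the question shows the skipped cell. This is only a sketch: SkipDemo and CellsSeen are hypothetical names, and the three-cell row is a shortened stand-in for the real sheet data.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class SkipDemo
{
    // Hypothetical helper: runs the question's loop shape and returns the
    // "r" attribute of every <c> element seen at the top of the loop.
    public static List<string> CellsSeen(string xml)
    {
        var seen = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())                        // advance #1
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "c")
                {
                    seen.Add(reader.GetAttribute("r"));
                    reader.ReadToDescendant("v");        // false on an empty <c/>; reader stays put
                    reader.ReadElementContentAsObject(); // advance #2: consumes the empty <c/> itself
                }
            }
        }
        return seen;
    }

    static void Main()
    {
        string xml = "<row><c r=\"C5\"/><c r=\"D5\"/><c r=\"E5\"/></row>";
        Console.WriteLine(string.Join(", ", CellsSeen(xml))); // C5, E5
    }
}
```

D5 never reaches the top of the loop: ReadElementContentAsObject leaves the reader on it, and the next Read() immediately steps past it.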

I'm not going to try to fix the actual code. But it's clear what you need to watch out for: moving the stream forward in more than one place. This is a common source of looping errors in general.
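For what it's worth, one way to follow that advice is to let the while (reader.Read()) loop be the only thing that advances the reader and to track state between iterations. The sketch below is not the Excel Data Reader code and not a drop-in fix for ReadSheetRow; SingleAdvance and ReadCells are hypothetical names, and it assumes cell values are plain text nodes inside v.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class SingleAdvance
{
    // Hypothetical helper: returns (cell reference, value) pairs.
    // The value is null for cells that have no <v> child.
    public static List<KeyValuePair<string, string>> ReadCells(string xml)
    {
        var cells = new List<KeyValuePair<string, string>>();
        bool inValue = false; // true while we are between <v> and </v>

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read()) // the only place the reader moves forward
            {
                switch (reader.NodeType)
                {
                    case XmlNodeType.Element:
                        if (reader.Name == "c")
                            cells.Add(new KeyValuePair<string, string>(reader.GetAttribute("r"), null));
                        else if (reader.Name == "v")
                            inValue = !reader.IsEmptyElement;
                        break;

                    case XmlNodeType.Text:
                        if (inValue && cells.Count > 0)
                        {
                            // Attach the text to the most recent cell.
                            string cellRef = cells[cells.Count - 1].Key;
                            cells[cells.Count - 1] = new KeyValuePair<string, string>(cellRef, reader.Value);
                        }
                        break;

                    case XmlNodeType.EndElement:
                        if (reader.Name == "v") inValue = false;
                        break;
                }
            }
        }
        return cells;
    }

    static void Main()
    {
        string xml = "<row><c r=\"B5\"><v>39844</v></c><c r=\"C5\"/><c r=\"D5\"/></row>";
        foreach (var cell in ReadCells(xml))
            Console.WriteLine(cell.Key + " = " + (cell.Value ?? "(empty)"));
    }
}
```

Because nothing inside the loop body moves the reader, whitespace nodes are simply ignored by the switch and the result should be the same for the compact and the formatted XML.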

Licensed under: CC-BY-SA with attribution
