XmlReader problem with ReadToDescendant and/or ReadElementContentAsObject

https://stackoverflow.com/questions/2249023

20-09-2019

Problem

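I'm investigating a mysterious bug in the usually very good open source project Excel Data Reader. It is skipping values when reading from my particular OpenXML .XLSX spreadsheet.

The problem occurs in the ReadSheetRow method (demo code below). The source XML is saved by Excel and contains no whitespace, and that is when the odd behavior occurs. However, the same XML reformatted with whitespace (for example in Visual Studio via Edit, Advanced, Format Document) works perfectly well!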

Test data with whitespace:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
    <sheetData>
        <row r="5" spans="1:73" s="7" customFormat="1">
            <c r="B5" s="12">
                <v>39844</v>
            </c>
            <c r="C5" s="8"/>
            <c r="D5" s="8"/>
            <c r="E5" s="8"/>
            <c r="F5" s="8"/>
            <c r="G5" s="8"/>
            <c r="H5" s="12">
                <v>39872</v>
            </c>
            <c r="I5" s="8"/>
            <c r="J5" s="8"/>
            <c r="K5" s="8"/>
            <c r="L5" s="8"/>
            <c r="M5" s="8"/>
            <c r="N5" s="12">
                <v>39903</v>
            </c>
        </row>
    </sheetData>
</worksheet>

Test data without whitespace:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?><worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"><sheetData><row r="5" spans="1:73" s="7" customFormat="1"><c r="B5" s="12"><v>39844</v></c><c r="C5" s="8"/><c r="D5" s="8"/><c r="E5" s="8"/><c r="F5" s="8"/><c r="G5" s="8"/><c r="H5" s="12"><v>39872</v></c><c r="I5" s="8"/><c r="J5" s="8"/><c r="K5" s="8"/><c r="L5" s="8"/><c r="M5" s="8"/><c r="N5" s="12"><v>39903</v></c></row></sheetData></worksheet>

Sample code that demonstrates the problem:

Note that *A* is output after _xmlReader.Read() (reader.Read() in the demo code), *B* after ReadToDescendant, and *C* after ReadElementContentAsObject.

while (reader.Read())
{
    if (reader.NodeType != XmlNodeType.Whitespace) outStream.WriteLine(String.Format("*A* NodeType: {0}, Name: '{1}', Empty: {2}, Value: '{3}'", reader.NodeType, reader.Name, reader.IsEmptyElement, reader.Value));

    if (reader.NodeType == XmlNodeType.Element && reader.Name == "c")
    {
        string a_s = reader.GetAttribute("s");
        string a_t = reader.GetAttribute("t");
        string a_r = reader.GetAttribute("r");

        bool matchingDescendantFound = reader.ReadToDescendant("v");
        if (reader.NodeType != XmlNodeType.Whitespace) outStream.WriteLine(String.Format("*B* NodeType: {0}, Name: '{1}', Empty: {2}, Value: '{3}'", reader.NodeType, reader.Name, reader.IsEmptyElement, reader.Value));
        object o = reader.ReadElementContentAsObject();
        if (reader.NodeType != XmlNodeType.Whitespace) outStream.WriteLine(String.Format("*C* NodeType: {0}, Name: '{1}', Empty: {2}, Value: '{3}'", reader.NodeType, reader.Name, reader.IsEmptyElement, reader.Value));
    }
}

Test results for the XML with whitespace:

*A* NodeType: XmlDeclaration, Name: 'xml', Empty: False, Value: 'version="1.0" encoding="UTF-8" standalone="yes"'
*A* NodeType: Element, Name: 'worksheet', Empty: False, Value: ''
*A* NodeType: Element, Name: 'sheetData', Empty: False, Value: ''
*A* NodeType: Element, Name: 'row', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: False, Value: ''
*B* NodeType: Element, Name: 'v', Empty: False, Value: ''
*A* NodeType: EndElement, Name: 'c', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: True, Value: ''
*B* NodeType: Element, Name: 'c', Empty: True, Value: ''
...

Test results for the XML without whitespace:

*A* NodeType: XmlDeclaration, Name: 'xml', Empty: False, Value: 'version="1.0" encoding="UTF-8" standalone="yes"'
*A* NodeType: Element, Name: 'worksheet', Empty: False, Value: ''
*A* NodeType: Element, Name: 'sheetData', Empty: False, Value: ''
*A* NodeType: Element, Name: 'row', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: False, Value: ''
*B* NodeType: Element, Name: 'v', Empty: False, Value: ''
*C* NodeType: EndElement, Name: 'c', Empty: False, Value: ''
*A* NodeType: Element, Name: 'c', Empty: True, Value: ''
*B* NodeType: Element, Name: 'c', Empty: True, Value: ''
...

The change in pattern points to a problem in ReadElementContentAsObject, or possibly in the position to which ReadToDescendant moves the XmlReader.

Does anyone know what is happening here?

Solution

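It's fairly simple. As you can see from the output, the first time you reach the "B" line you are positioned on the first 'v' element. Then you call ReadElementContentAsObject. That returns the text content of v and "moves the reader past the end element tag" (of v). You are now pointing at a whitespace node if the XML contains whitespace, or at an EndElement (c) node if it does not. Of course, your output prints nothing when the node is whitespace. Either way, you then do a Read() and move on to the next node. In the non-whitespace case, that Read() skips the EndElement, so you never see it on an "A" line.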
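The positioning described above can be checked in isolation. The following is a minimal sketch, not code from Excel Data Reader: the class name PositionDemo and the helper NodeAfterV are hypothetical, and it assumes the default XmlReaderSettings (whitespace nodes are reported).

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class; not part of Excel Data Reader.
static class PositionDemo
{
    // Reports which node the reader sits on right after the content of <v> is read.
    public static string NodeAfterV(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadToFollowing("v");          // position the reader on <v>
            reader.ReadElementContentAsObject();  // returns the text and moves past </v>
            return reader.NodeType + ":" + reader.Name;
        }
    }

    static void Main()
    {
        // No whitespace: the reader lands directly on </c>.
        Console.WriteLine(NodeAfterV("<c><v>39844</v></c>"));        // EndElement:c
        // With whitespace: the reader lands on the whitespace before </c>.
        Console.WriteLine(NodeAfterV("<c>\n  <v>39844</v>\n</c>"));  // Whitespace:
    }
}
```

In the whitespace case, the following Read() only consumes a whitespace node, which is why the formatted file appears to work.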

In other situations the problem is much worse. When you call ReadElementContentAsObject on an empty c (call it c1), you move forward to the next c (c2). Then you do a Read and move to c3, losing c2 forever.
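The loss of every second empty cell can be reproduced with a reduced version of the question's loop. This is only an illustrative sketch: SkipDemo and VisitedCells are hypothetical names, and the input is a cut-down row without the spreadsheet namespace.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical demo class reproducing the double advance from the question.
static class SkipDemo
{
    // Returns the "r" attribute of every <c> element the loop actually sees.
    public static List<string> VisitedCells(string xml)
    {
        var visited = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            while (reader.Read())                          // advance #1
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "c")
                {
                    visited.Add(reader.GetAttribute("r"));
                    reader.ReadToDescendant("v");          // stays put on an empty <c/>
                    reader.ReadElementContentAsObject();   // advance #2: moves past the <c/>
                }
            }
        }
        return visited;
    }

    static void Main()
    {
        string xml = "<row><c r=\"C5\"/><c r=\"D5\"/><c r=\"E5\"/><c r=\"F5\"/></row>";
        Console.WriteLine(string.Join(", ", VisitedCells(xml)));  // C5, E5
    }
}
```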

I'm not going to try to fix the actual code. But it should be clear what you have to watch out for: advancing the stream in more than one place. This is a common source of looping errors in general.
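One possible way to keep the advance in a single place is sketched below. This is an assumption about how such a loop could be restructured, not the fix that was applied to Excel Data Reader; the names SingleAdvanceDemo and ReadCells are hypothetical. The idea is to call Read() only when nothing else has already moved the reader during the current iteration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical sketch; not the actual Excel Data Reader code.
static class SingleAdvanceDemo
{
    // Collects (cell reference, value) pairs; value is null for cells without <v>.
    public static List<KeyValuePair<string, object>> ReadCells(string xml)
    {
        var cells = new List<KeyValuePair<string, object>>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.Read();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "c")
                {
                    string cellRef = reader.GetAttribute("r");
                    if (reader.ReadToDescendant("v"))
                    {
                        // This call already moves the reader past </v>,
                        // so re-examine the current node instead of calling Read().
                        object value = reader.ReadElementContentAsObject();
                        cells.Add(new KeyValuePair<string, object>(cellRef, value));
                        continue;
                    }
                    cells.Add(new KeyValuePair<string, object>(cellRef, null));
                }
                reader.Read();  // the only unconditional advance
            }
        }
        return cells;
    }

    static void Main()
    {
        string xml = "<row><c r=\"B5\"><v>39844</v></c><c r=\"C5\"/><c r=\"D5\"/>"
                   + "<c r=\"H5\"><v>39872</v></c></row>";
        foreach (var cell in ReadCells(xml))
            Console.WriteLine(cell.Key + "=" + cell.Value);
    }
}
```

Because the loop inspects whatever node the reader is on after ReadElementContentAsObject, it behaves the same whether that node is whitespace, an end tag, or the next element.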

License: CC-BY-SA with attribution
