Encoding error when using the HTML Agility Pack

https://stackoverflow.com/questions/1082156

22-08-2019
Problem

I'm trying to parse an HTML document using some code I found on this very site, but I keep getting parse errors.

HtmlAgilityPack.HtmlDocument htmlDoc = new HtmlAgilityPack.HtmlDocument();

// There are various options, set as needed
htmlDoc.OptionFixNestedTags = true;

// Load takes a path to a file containing the html
htmlDoc.Load(@"C:\Documents and Settings\Mine\My Documents\Random.html");

// Use:  htmlDoc.LoadHtml(htmlString);  to load from a string

// ParseErrors holds any errors from the Load statement
// (an ArrayList in older releases; newer releases expose
// IEnumerable<HtmlParseError>, where Count() from System.Linq is needed)
if (htmlDoc.ParseErrors != null && htmlDoc.ParseErrors.Count > 0)
{
    // Handle any parse errors as required
    MessageBox.Show("Oh no");
}
else
{
    if (htmlDoc.DocumentNode != null)
    {
        // The XPath selects <head>, so the variable is named to match
        HtmlAgilityPack.HtmlNode headNode = htmlDoc.DocumentNode.SelectSingleNode("//head");

        if (headNode != null)
        {
            MessageBox.Show("Hello");
        }
    }
}

Any help would be appreciated :)

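Solution

In the wild, HTML is more likely than not to be malformed, non-compliant, and non-validating. Only XHTML or very simple HTML will get through without populating ParseErrors. I've found the HTML Agility Pack to be fairly robust: it will build a decent DOM tree from most HTML sources even when ParseErrors are generated. Drop the else and let the code in that block run regardless of whether any parse errors were reported.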
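A minimal sketch of "dropping the else", assuming a console program instead of the MessageBox calls from the question. The sample markup and the `LenientParse` / `LoadLeniently` names are made up for illustration; only `HtmlDocument`, `LoadHtml`, `ParseErrors`, `DocumentNode` and `SelectSingleNode` come from the library. Parse errors are counted and reported as a warning, and the tree is queried either way.

```csharp
using System;
using HtmlAgilityPack;

static class LenientParse
{
    // Hypothetical helper: loads the markup, counts parse errors,
    // and returns the document even if errors were reported.
    public static HtmlDocument LoadLeniently(string html, out int errorCount)
    {
        var doc = new HtmlDocument { OptionFixNestedTags = true };
        doc.LoadHtml(html);

        errorCount = 0;
        if (doc.ParseErrors != null)
        {
            // foreach works whether ParseErrors is an ArrayList (older
            // releases) or an IEnumerable<HtmlParseError> (newer ones).
            foreach (HtmlParseError error in doc.ParseErrors)
            {
                errorCount++;
            }
        }
        return doc;
    }

    static void Main()
    {
        // Deliberately sloppy sample input (an assumption for this sketch).
        const string html = "<html><head><title>Demo</title></head><body><p>unclosed <b>tags</body>";

        HtmlDocument doc = LoadLeniently(html, out int errorCount);
        if (errorCount > 0)
        {
            // Warn, but do not stop.
            Console.WriteLine("Parse errors reported: " + errorCount);
        }

        // No else: query the tree regardless of the error count.
        HtmlNode headNode = doc.DocumentNode.SelectSingleNode("//head");
        Console.WriteLine(headNode != null ? "Found <head>" : "No <head> in the tree");
    }
}
```

Whether a warning is enough or some errors should abort processing depends on how much of the tree the application actually needs.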

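If no DOM tree was built, you need to examine the ParseErrors that were generated. If only a partial tree was built, recurse through the nodes, printing them or showing them in message boxes, to see which part of the DOM tree was built. You may not need the whole tree.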
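Both checks could look roughly like the sketch below. The `TreeInspector` class, its method names, and the sample markup are hypothetical; the `HtmlParseError` properties (`Code`, `Line`, `LinePosition`, `Reason`) and the `HtmlNode` members (`Name`, `NodeType`, `ChildNodes`) are the library's own. Output goes to a `TextWriter` so it can be sent to the console, a log file, or a string.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

static class TreeInspector
{
    // Writes one line per parse error, with its position and reason.
    public static void DumpErrors(HtmlDocument doc, TextWriter output)
    {
        if (doc.ParseErrors == null) return;
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            output.WriteLine(error.Code + " at line " + error.Line +
                             ", column " + error.LinePosition + ": " + error.Reason);
        }
    }

    // Recursively writes element names, indented by depth, to show
    // which part of the DOM tree was actually built.
    public static void DumpTree(HtmlNode node, TextWriter output, int depth = 0)
    {
        if (node.NodeType == HtmlNodeType.Element || node.NodeType == HtmlNodeType.Document)
        {
            output.WriteLine(new string(' ', depth * 2) + node.Name);
        }
        foreach (HtmlNode child in node.ChildNodes)
        {
            DumpTree(child, output, depth + 1);
        }
    }

    static void Main()
    {
        // Sample input is an assumption made for this sketch.
        var doc = new HtmlDocument();
        doc.LoadHtml("<html><body><div><p>one<p>two</div></body>");

        DumpErrors(doc, Console.Out);
        DumpTree(doc.DocumentNode, Console.Out);
    }
}
```

If the elements you care about show up in the dump, the reported errors can usually be treated as noise for that document.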

License: CC-BY-SA with attribution

Not affiliated with StackOverflow