Problem

This might sound a bit complicated, but what I want to do is find all <a>s that contain <img>s, ordered so that the images that share a node with the greatest number of other images are chosen first.

For example, if my page looks like this:

http://img684.imageshack.us/img684/5678/imagechart.gif

If the blue squares are <div>s and the pink squares are <img>s, then the middle div contains the most images, so those images are chosen first. Since they aren't nested any deeper than that, they just appear in the order that they are on the page. Next the first div is chosen (it contains the second-most images), and so forth. Does that make sense?

We can think of it sort of recursively. First the body would be chosen, since that will always contain the most images. Then each of its direct children is examined to see which contains the most image descendants (not necessarily direct), then we go into that node and repeat...


Solution 3

Current solution:

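    // SelectNodes returns null (not an empty collection) when nothing matches.
    private static int Count(HtmlNodeCollection nc) {
        return nc == null ? 0 : nc.Count;
    }

    private static void BuildList(HtmlNode node, ref List<HtmlNode> list) {
        // Visit the children with the most image links in their subtree first.
        // orderby is a stable sort, so ties keep their document order.
        var sortedNodes = from n in node.ChildNodes
                          orderby Count(n.SelectNodes(".//a[@href and img]")) descending
                          select n;
        foreach (var n in sortedNodes) {
            // Note: this adds every <a>, including ones without an <img> child.
            if (n.Name == "a") list.Add(n);
            else if (n.HasChildNodes) BuildList(n, ref list);
        }
    }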

Example usage:

    private static void ProcessDocument(HtmlDocument doc, Uri baseUri) {
        var linkNodes = new List<HtmlNode>(100);
        BuildList(doc.DocumentNode, ref linkNodes);
        // ...

It's a bit inefficient, though, because the XPath count is re-run over every subtree at each level of the recursion, but oh well.
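One way to avoid the recounting is to count each subtree once and cache the result before sorting. The following is only a sketch under some assumptions: it uses `System.Xml.Linq` on well-formed, namespace-free markup instead of HtmlAgilityPack so that it runs without extra packages, and the names `ImageLinkOrdering`, `CountLinks`, `IsImageLink` and `OrderedImageLinks` are made up for illustration. Unlike the code above, it only collects `<a href>` elements that have a direct `<img>` child.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ImageLinkOrdering
{
    // True for <a href="..."> elements that have a direct <img> child,
    // mirroring the XPath a[@href and img].
    private static bool IsImageLink(XElement e)
    {
        return e.Name.LocalName == "a"
            && e.Attribute("href") != null
            && e.Elements().Any(c => c.Name.LocalName == "img");
    }

    // Single post-order pass: stores the number of image links in each subtree.
    private static int CountLinks(XElement element, Dictionary<XElement, int> counts)
    {
        int total = IsImageLink(element) ? 1 : 0;
        foreach (var child in element.Elements())
            total += CountLinks(child, counts);
        counts[element] = total;
        return total;
    }

    private static void BuildList(XElement element, Dictionary<XElement, int> counts, List<XElement> list)
    {
        // OrderByDescending is stable, so siblings with equal counts keep document order.
        foreach (var child in element.Elements().OrderByDescending(c => counts[c]))
        {
            if (IsImageLink(child)) list.Add(child);
            else if (counts[child] > 0) BuildList(child, counts, list); // skip empty subtrees
        }
    }

    public static List<XElement> OrderedImageLinks(XElement root)
    {
        var counts = new Dictionary<XElement, int>();
        CountLinks(root, counts);
        var list = new List<XElement>();
        BuildList(root, counts, list);
        return list;
    }
}
```

Each element is counted exactly once, and the sort at each level is a dictionary lookup rather than a fresh XPath query. The same idea should carry over to HtmlAgilityPack by keying the dictionary on `HtmlNode`.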

Other tips

You could try looking at the count of images for every node.

    // Returns the node with the most <img> descendants, or null if none has any.
    public static XmlNode FindNodeWithMostImages(XmlNodeList nodes) {

        var greatestImageCount = 0;
        XmlNode nodeWithMostImages = null;

        foreach (XmlNode node in nodes)
        {
            // Count <img> descendants at any depth, not only grandchildren.
            var currentNodeImageCount = node.SelectNodes(".//img").Count;

            if (currentNodeImageCount > greatestImageCount)
            {
                greatestImageCount = currentNodeImageCount;
                nodeWithMostImages = node;
            }
        }

        return nodeWithMostImages;
    }

XPath 1.0 does not provide a way to sort a node-set. You will need to combine XPath with something else.

Here is an example XSLT solution that finds all elements that contain descendant <img> elements and sorts them by the count of their descendant <img> elements in descending order.

    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
        <xsl:output method="text"/>

        <xsl:template match="/">
            <!--if only want <a>, then select //a[descendant::img] -->
            <xsl:for-each select="//*[descendant::img]">
                <!--data-type="number" is needed; the default sorts the counts as text-->
                <xsl:sort select="count(descendant::img)" data-type="number" order="descending" />

                <!--Example output to demonstrate what elements have been selected-->
                <xsl:value-of select="name()"/><xsl:text> has </xsl:text>
                <xsl:value-of select="count(.//img)" />
                <xsl:text> descendant images&#10;</xsl:text>

            </xsl:for-each>

        </xsl:template>

    </xsl:stylesheet>

It wasn't clear to me from your question and examples whether you want to find any element with descendant <img> elements or just <a> elements with descendant <img> elements.

If you only want <a> elements with descendant <img> elements, adjust the XPath in the for-each to select: //a[descendant::img]
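If XSLT is not an option, the sorting step can also be done in the host language: select with XPath, then sort the result there. A minimal C# sketch, assuming well-formed, namespace-free XML loaded into `System.Xml.Linq` (the class and method names are hypothetical):

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathSortSketch
{
    // Selects every element that has at least one <img> descendant,
    // then sorts by the number of <img> descendants, largest first.
    public static List<XElement> ElementsByImageCount(XElement root)
    {
        return root.XPathSelectElements("descendant-or-self::*[descendant::img]")
                   .OrderByDescending(e => e.Descendants("img").Count())
                   .ToList(); // stable sort: ties stay in document order
    }
}
```

To restrict the result to links, the XPath could be changed to `descendant-or-self::a[descendant::img]`, matching the adjustment described for the for-each above.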

License: CC-BY-SA with attribution
Not affiliated with StackOverflow