Question

I want to crawl Yahoo search and get the top 10 results for a keyword.

I am crawling a Yahoo search results page (the URL is in the code below).

The code I am using is:

import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static void main(String args[]) throws IOException
{
    try
    {
        Document doc = Jsoup.connect("https://in.search.yahoo.com/search;_ylt=AibrWnqoneznrEAiS9bG0aOuitIF?p=solar+systems&toggle=1&cop=mss&ei=UTF-8&fr=yfp-t-405").get();
        for(Element dc : doc.select("div#doc.uh3-p uh3lite"))
        {
            System.out.println("data");
            for(Element dd : doc.select("div#bd"))
            {
                for(Element results : doc.select("div#results"))
                {
                    for(Element wb : doc.select("div#web"))
                    {
                        Elements data = wb.select("span");
                        if(data.size() > 0)
                        {
                            System.out.println(data.get(0).text());
                        }
                    }
                }
            }
        }
    }
    catch(Exception ex)
    {
        System.out.println(ex);
    }
}

I am getting no results with it. Can anyone help me?


Solution

This selector is wrong.

doc.select("div#doc.uh3-p uh3lite")

To match an element that carries both classes, put a period (.) before each class name, with no space between them:

doc.select("div#doc.uh3-p.uh3lite")

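A space in a selector means something entirely different: it is the descendant combinator. "div#doc.uh3-p uh3lite" looks for an element whose tag name is uh3lite somewhere inside div#doc.uh3-p. No such element exists, so the outer loop never runs and nothing is printed.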
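To see the difference concretely, here is a small sketch that runs both selectors against a hard-coded HTML fragment rather than the live Yahoo page. The markup, the class name SelectorDemo and the helper countMatches are hypothetical; only Jsoup.parse and select are real Jsoup API.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class SelectorDemo {
    // Hypothetical markup: a single div that carries both classes.
    static final String SAMPLE_HTML =
            "<div id=\"doc\" class=\"uh3-p uh3lite\"><p>hello</p></div>";

    // Returns how many elements in the HTML match the CSS query.
    static int countMatches(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        // With a space: descendant combinator, looks for a <uh3lite> element
        // inside div#doc.uh3-p, and there is none.
        System.out.println(countMatches(SAMPLE_HTML, "div#doc.uh3-p uh3lite")); // 0

        // Without a space: both classes must be on the same div.
        System.out.println(countMatches(SAMPLE_HTML, "div#doc.uh3-p.uh3lite")); // 1
    }
}
```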

EDIT: Also, every nested for loop calls select on doc again, so each inner search runs over the whole document instead of inside the element found by the enclosing loop. I assume you meant to select from that element.

For example:

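    for(Element dc : doc.select("div#doc.uh3-p.uh3lite")) // selector fixed as described above
    {
        System.out.println("data");

        for(Element dd : dc.select("div#bd")) // note doc was changed to dc
        {
            for(Element results : dd.select("div#results")) // note doc was changed to dd
            {
                for(Element wb : results.select("div#web")) // note doc was changed to results
                {
                    Elements data = wb.select("span");
                    if(data.size() > 0)
                    {
                        System.out.println(data.get(0).text());
                    }
                }
            }
        }
    }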
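Here is a small sketch of the scoped version, again run against hypothetical inline markup rather than Yahoo's real page, whose structure may differ from the ids used in the question. It also shows that the nested loops can be collapsed into a single descendant selector. ScopedSelectDemo, SAMPLE_HTML and the two helper methods are illustrative names, not part of Jsoup.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class ScopedSelectDemo {
    // Hypothetical markup: two spans inside #bd > #results > #web, one outside it.
    static final String SAMPLE_HTML =
            "<div id=\"bd\"><div id=\"results\"><div id=\"web\">"
            + "<span>result one</span><span>result two</span>"
            + "</div></div></div>"
            + "<div class=\"footer\"><span>footer text</span></div>";

    // Each loop calls select() on the element found by the enclosing loop,
    // so every search is confined to that element's descendants.
    static List<String> nestedScoped(Document doc) {
        List<String> texts = new ArrayList<>();
        for (Element bd : doc.select("div#bd")) {
            for (Element results : bd.select("div#results")) {
                for (Element web : results.select("div#web")) {
                    for (Element span : web.select("span")) {
                        texts.add(span.text());
                    }
                }
            }
        }
        return texts;
    }

    // The same path written as one descendant selector.
    static List<String> singleSelector(Document doc) {
        List<String> texts = new ArrayList<>();
        for (Element span : doc.select("div#bd div#results div#web span")) {
            texts.add(span.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        System.out.println(nestedScoped(doc));   // [result one, result two]
        System.out.println(singleSelector(doc)); // [result one, result two]
    }
}
```

Both methods skip the footer span because it is not inside div#web.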

Finally, make sure your print statements are not commented out; otherwise you have no way of knowing whether any results were found.

Licensed under: CC-BY-SA with attribution