Question

This is the code I tried, but I always get an error during parsing:

Elements sezioni = tabella.getElementsByClass("grid grid-pad");
for (Element sezione : sezioni)
{
    Elements righe_sezione = sezione.getElementsByClass("col-1-1");
    for (Element riga : righe_sezione)
    {
        Element info = riga.getElementsByClass("text").first();

        String titolo = riga.getElementsByTag("h2").first().text();

        String date = info.getElementsByClass("date").first().text();

        titoli.add(titolo);
        data.add(date);
    }
}

I can't parse the title and the date of the articles. What's wrong? The page to parse is http://multiplayer.it/articoli/

Thanks.


Solution

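jsoup's getElementsByClass takes a single class name, so passing "grid grid-pad" (two classes separated by a space) is not a reliable way to find those divs. Either pass one class, such as grid-pad, or use a CSS selector such as doc.select("div.grid.grid-pad") to require both classes on the same element.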
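To see how the two lookups differ, here is a small sketch that parses a hypothetical HTML snippet (made up to reuse the class names from the question, not copied from multiplayer.it) and counts matches both ways. It assumes jsoup is on the classpath; ClassSelectorDemo and its helper methods are illustrative names.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

class ClassSelectorDemo {
    // Hypothetical markup: one div with both classes, one div with grid-pad only
    static final String HTML =
            "<div class='grid grid-pad'><div class='col-1-1'>a</div></div>"
            + "<div class='grid-pad'>b</div>";

    // getElementsByClass matches every element carrying this single class name
    static int countByClass(String html, String className) {
        Document doc = Jsoup.parse(html);
        return doc.getElementsByClass(className).size();
    }

    // select takes a CSS query; "div.grid.grid-pad" requires both classes at once
    static int countBySelector(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        return doc.select(cssQuery).size();
    }

    public static void main(String[] args) {
        System.out.println(countByClass(HTML, "grid-pad"));             // prints 2
        System.out.println(countBySelector(HTML, "div.grid.grid-pad")); // prints 1
    }
}
```

Chaining class selectors with dots (.grid.grid-pad) is standard CSS, so the same query can also be tried in a browser's developer tools to check what it matches before writing the Java.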

Beyond that, there seems to be a flaw in the structure your code assumes: the grid-pad and col-1-1 elements do not wrap the repeated divs that hold the text you are looking for.

However, changing your selectors to archive_box and text seems to do the trick, because each article's content sits inside a div with the CSS class archive_box.

So the code can drop one of the two loops and iterate directly over the archive_box divs, reading the text div inside each one, like this:

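import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class MultiplayerArticles {
    public static void main(String[] args) {
        try {
            Document doc = Jsoup.connect("http://multiplayer.it/articoli/").get();
            // Each article sits in its own archive_box div
            Elements sezioni = doc.getElementsByClass("archive_box");
            for (Element riga : sezioni) {
                Element info = riga.getElementsByClass("text").first();
                Element h2 = riga.getElementsByTag("h2").first();
                // first() returns null when nothing matches, so skip incomplete boxes
                if (info == null || h2 == null) {
                    continue;
                }
                String titolo = h2.text();
                System.out.println(titolo);

                Element dateEl = info.getElementsByClass("date").first();
                if (dateEl != null) {
                    System.out.println(dateEl.text());
                }
            }
        } catch (IOException e) {
            // Network or HTTP failure while fetching the page
            e.printStackTrace();
        }
    }
}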
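A common source of errors in code like this is calling .text() on the result of first(), which returns null when nothing matches. The minimal offline sketch below assumes jsoup is on the classpath and a hypothetical markup shape inferred from the selectors above (an archive_box containing an h2 and a text div with a date element). It collects title/date pairs and skips incomplete boxes. Parsing a string instead of connecting to the site makes it easy to test the selectors without network access; ArchiveParser and parse are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ArchiveParser {
    // Returns one "title | date" entry per archive_box; boxes missing either part are skipped
    static List<String> parse(String html) {
        Document doc = Jsoup.parse(html);
        List<String> entries = new ArrayList<>();
        for (Element box : doc.getElementsByClass("archive_box")) {
            // first() returns null on no match, so check before calling text()
            Element title = box.select("h2").first();
            Element date = box.select(".text .date").first();
            if (title == null || date == null) {
                continue;
            }
            entries.add(title.text() + " | " + date.text());
        }
        return entries;
    }

    public static void main(String[] args) {
        // Hypothetical markup modeled on the selectors used in the answer
        String html = "<div class='archive_box'><div class='text'>"
                + "<h2>Primo articolo</h2><span class='date'>oggi</span></div></div>"
                + "<div class='archive_box'><div class='text'><h2>Senza data</h2></div></div>";
        System.out.println(parse(html)); // prints [Primo articolo | oggi]
    }
}
```

Once the selectors behave as expected on a saved copy of the page, parse can be refactored to accept the Document returned by Jsoup.connect(...).get() directly.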

The output would be (circa un'ora fa is Italian for "about an hour ago"):

Carmageddon è gratuito per un giorno su App Store e Google Play
circa un'ora fa
Paradox Interactive annuncia Runemaster
circa un'ora fa
Annunciato Hearts of Iron IV
circa un'ora fa
...
Licensed under: CC-BY-SA with attribution