JSoup with Wunderground Pollen data

https://stackoverflow.com/questions/23663012

22-07-2023

Question

I am scraping pollen data from Wunderground, specifically the values reported for each day, because their API doesn't offer pollen data.

I've navigated the HTML using Chrome Dev Tools and found the specific line that I want. Using the documentation offered by JSoup, I tried putting in my own custom CSS Selectors, but I am quite lost.

Could anyone give me some insight into how to access that particular element?

Below is what I have so far.

doc = Jsoup.connect("http://www.wunderground.com/DisplayPollen.asp?Zipcode=19104").get();
Element title = doc.getElementById("td");
Element tagName = doc.tagName("id");
System.out.println(tagName);


Solution

You don't want to use doc.getElementById("td"), because td is a tag name, not the value of an id attribute. getElementById also doesn't accept CSS queries; it only matches an element's id.

What you want is to select the first <td> with the class levels. You can do that with

Element tag = doc.select("td.levels").first();

Also, to get only the text contained in this tag (and not its entire HTML), use the text() method:

System.out.println(tag.text());
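The difference between an id lookup and a class selector can be tried offline. The following is a minimal sketch, assuming the jsoup library is on the classpath; the class name SelectorSketch, the helper firstLevelText, and the sample markup (including its values) are hypothetical and only mirror the td.levels selector from the answer, not the actual Wunderground page.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class SelectorSketch {
    // Hypothetical markup: only the "levels" class name comes from the answer above.
    static final String SAMPLE_HTML =
        "<table><tr>"
        + "<td class=\"levels\"><p>9.10</p></td>"
        + "<td class=\"levels\"><p>7.40</p></td>"
        + "</tr></table>";

    // Returns the text of the first <td class="levels">, or null if there is none.
    static String firstLevelText(String html) {
        Document doc = Jsoup.parse(html);
        Element tag = doc.select("td.levels").first();
        return tag == null ? null : tag.text();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        // No element carries id="td", so the id lookup finds nothing.
        System.out.println(doc.getElementById("td"));
        // The class selector matches, and text() strips the inner <p> markup.
        System.out.println(firstLevelText(SAMPLE_HTML));
    }
}
```

Checking the result of first() for null before calling text() avoids a NullPointerException if the page layout changes and the selector no longer matches.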

Other tips

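// Fetch and parse the pollen page.
Document doc = Jsoup.connect("http://www.wunderground.com/DisplayPollen.asp?Zipcode=19104").get();

// Print the text of each td.even-four cell in the first pollen table (the day labels).
Elements days = doc.select("table.pollen-table").first().select("td.even-four");
for (Element day : days) {
    System.out.println(day.text());
}

// Print the text of each td.levels cell (the pollen levels).
Elements levels = doc.select("td.levels");
for (Element level : levels) {
    System.out.println(level.text());
}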
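Since the question asks for the value attributed to each day, the two loops above could be combined by index. This is only a sketch under assumptions: it presumes the day cells and level cells appear in the same order and number, which the source does not confirm. The class PollenPairingSketch, the method levelsByDay, and the sample markup and values are hypothetical; only the three selectors come from the code above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class PollenPairingSketch {
    // Hypothetical markup built around the selectors used above; the real page may differ.
    static final String SAMPLE_HTML =
        "<table class=\"pollen-table\">"
        + "<tr><td class=\"even-four\">Wednesday</td><td class=\"even-four\">Thursday</td></tr>"
        + "<tr><td class=\"levels\">9.10</td><td class=\"levels\">7.40</td></tr>"
        + "</table>";

    // Maps each day label to the level at the same position, in page order.
    static Map<String, String> levelsByDay(String html) {
        Document doc = Jsoup.parse(html);
        Map<String, String> result = new LinkedHashMap<>();

        // first() returns null when nothing matches, so guard before chaining.
        Element table = doc.select("table.pollen-table").first();
        if (table == null) {
            return result;
        }

        Elements days = table.select("td.even-four");
        Elements levels = doc.select("td.levels");

        // Assumption: the i-th day corresponds to the i-th level.
        int count = Math.min(days.size(), levels.size());
        for (int i = 0; i < count; i++) {
            result.put(days.get(i).text(), levels.get(i).text());
        }
        return result;
    }

    public static void main(String[] args) {
        levelsByDay(SAMPLE_HTML)
            .forEach((day, level) -> System.out.println(day + ": " + level));
    }
}
```

To run it against the live page, the parsed document would come from Jsoup.connect(...).get() as in the snippet above instead of Jsoup.parse on a string.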

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow