Question

I'm continuing work on a project I've been at for some time now, and I have been struggling to pull some data from a website. The website has an iframe that loads data from an unknown source. Inside the iframe, the data sits in a tag something like this:

<DIV id="number_forecast"><LABEL id="lblDay">9,000</LABEL></DIV>

There is a BUNCH of other crap above it but this div id / label is totally unique and is not used anywhere else in the code.

Solution

jsoup is probably what you want; it excels at extracting data from an HTML document.

There are many examples available showing how to use the API: http://jsoup.org/cookbook/extracting-data/selector-syntax

The process will be in two steps:

  • parse the outer page and find the URL of the iframe
  • load and parse the content of the iframe, then extract the information you need

The code would look like this:

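 import java.net.URL;
 import org.jsoup.Jsoup;
 import org.jsoup.nodes.Document;
 import org.jsoup.nodes.Element;

 // let's find the iframe
 // (inputstream holds the HTML of the outer page; url is its address,
 //  used as the base URI to resolve relative links)
 Document document = Jsoup.parse(inputstream, "iso-8859-1", url);
 Element iframe = document.select("iframe").first(); // null if the page has no iframe

 // now load the iframe (timeout of 15000 ms)
 URL iframeUrl = new URL(iframe.absUrl("src"));
 document = Jsoup.parse(iframeUrl, 15000);

 // extract the div, then read the text of the label inside it
 Element div = document.getElementById("number_forecast");
 String value = div.getElementById("lblDay").text();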
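
Once the div has been found, the label text ("9,000" in the question) is still a string. If you need it as a number, something like the following might do. This is only a sketch: it assumes the site formats numbers US-style, with a comma as the thousands separator, and ForecastParser and parseForecast are made-up names for illustration. With jsoup, the label text itself can be read with div.getElementById("lblDay").text().

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

class ForecastParser {

    // Hypothetical helper: converts label text such as "9,000" into an int.
    // Assumption: US-style grouping (comma as the thousands separator).
    static int parseForecast(String labelText) {
        NumberFormat format = NumberFormat.getIntegerInstance(Locale.US);
        try {
            return format.parse(labelText.trim()).intValue();
        } catch (ParseException e) {
            // Text that does not start with a number ends up here.
            throw new IllegalArgumentException("Not a number: " + labelText, e);
        }
    }

    public static void main(String[] args) {
        // "9,000" is the sample label value from the question.
        System.out.println(parseForecast("9,000")); // prints 9000
    }
}
```

If the site uses a different number format, pass the matching Locale instead of Locale.US.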

Other tips

In the page that contains the iframe, change the iframe's source to your own URL. That URL is handled by your own controller, which reads the original content, parses it, extracts everything you need, and writes it to the response. If the iframe content uses absolute references, this should work.

License: CC-BY-SA with attribution
Not affiliated with StackOverflow