Question

I am continuing work on a project that I've been at for some time now, and I have been struggling to pull some data from a website. The website has an iframe that pulls in some data from an unknown source. The data is in the iframe in a tag something like this:

<DIV id="number_forecast"><LABEL id="lblDay">9,000</LABEL></DIV>

There is a BUNCH of other crap above it but this div id / label is totally unique and is not used anywhere else in the code.


Solution

jsoup is probably what you want; it excels at extracting data from an HTML document.

There are many examples available showing how to use the API: http://jsoup.org/cookbook/extracting-data/selector-syntax

The process will be in two steps:

  • parse the outer page and find the URL of the iframe
  • fetch and parse the content of the iframe, then extract the information you need

The code would look like this:

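 import java.net.URL;
 import org.jsoup.Jsoup;
 import org.jsoup.nodes.Document;
 import org.jsoup.nodes.Element;

 // let's find the iframe; inputstream holds the outer page and url is its address
 // (the charset should match whatever encoding the page actually uses)
 Document document = Jsoup.parse(inputstream, "iso-8859-1", url);
 Element iframe = document.select("iframe").first(); // null if the page has no iframe

 // now load the iframe; absUrl resolves a relative src against the base url
 URL iframeUrl = new URL(iframe.absUrl("src"));
 document = Jsoup.parse(iframeUrl, 15000); // timeout in milliseconds

 // extract the div, then read the text of the label inside it
 Element div = document.getElementById("number_forecast");
 String forecast = div.getElementById("lblDay").text();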
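
To try the extraction step without touching the network, the markup from the question can be parsed straight from a string. This is only a sketch: the class and method names are made up for illustration, it assumes jsoup 1.11.1 or later on the classpath (for selectFirst), and it assumes the number always uses a comma as the thousands separator.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ForecastExtractor {

    // Returns the text of the element with id "lblDay" nested inside the element
    // with id "number_forecast", or null when the markup does not contain it.
    static String extractForecastText(String html) {
        Document document = Jsoup.parse(html);
        Element label = document.selectFirst("#number_forecast #lblDay");
        return label == null ? null : label.text();
    }

    // Assumption: "," is only ever a thousands separator, so it can simply be dropped.
    static long parseForecast(String text) {
        return Long.parseLong(text.replace(",", "").trim());
    }

    public static void main(String[] args) {
        // The fragment quoted in the question.
        String html = "<DIV id=\"number_forecast\"><LABEL id=\"lblDay\">9,000</LABEL></DIV>";
        String text = extractForecastText(html);
        System.out.println(text);
        System.out.println(parseForecast(text));
    }
}
```

The same extractForecastText call should work on the iframe document once it has been downloaded, since the selector only depends on the two ids.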

OTHER TIPS

In the page that contains the iframe, change the iframe's source to your own URL. That URL is handled by your own controller, which reads the original content, parses it, extracts what you need, and writes it to the response. If the references inside the iframe content are absolute, this should work.
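
A rough sketch of that idea, using the small HTTP server that ships with the JDK in place of a real web framework controller. Everything named here is an assumption made for illustration: the /forecast path, port 8080, the example.com address, and the IframeProxy class itself. The content supplier is where the fetch, parse and extract steps would go.

```java
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

public class IframeProxy {

    // Downloads a page as text. Assumes UTF-8; adjust if the remote page differs.
    static String fetch(String url) {
        try (InputStream in = new URL(url).openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Serves /forecast and answers each request with whatever the supplier
    // returns, so the iframe can point at this endpoint instead of the original source.
    static HttpServer start(int port, Supplier<String> content) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            server.createContext("/forecast", exchange -> {
                byte[] body = content.get().getBytes(StandardCharsets.UTF_8);
                exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical upstream address; a real supplier would also parse the
        // fetched page and return only the extracted value.
        start(8080, () -> fetch("http://example.com/iframe-content"));
    }
}
```

With this running, the page's iframe src would point at http://localhost:8080/forecast rather than at the original source.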

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow