Scraping data from webpage using Java?

Question 1

You need to use a library that has JavaScript support. I use HtmlUnit for this, which is a great library for replicating browser behavior!

Below is a modified version of my answer to a related question. It shows a simple example of how to access a page that relies on JavaScript.

First, check out their web page (http://htmlunit.sourceforge.net/) to get HtmlUnit up and running. Make sure you use the latest snapshot (2.12 at the time of writing).

Try these settings to make the client tolerate pretty much any obstacle (script errors, failing status codes, invalid SSL certificates):

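import com.gargoylesoftware.htmlunit.BrowserVersion;
import com.gargoylesoftware.htmlunit.WebClient;

// Emulate Firefox 17 (newer HtmlUnit releases use different BrowserVersion constants)
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_17);
webClient.getOptions().setRedirectEnabled(true);                    // follow HTTP redirects
webClient.getOptions().setCssEnabled(false);                        // skip CSS processing
webClient.getOptions().setThrowExceptionOnScriptError(false);       // don't abort on JavaScript errors
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false); // don't throw on 4xx/5xx responses
webClient.getOptions().setUseInsecureSSL(true);                     // accept invalid or self-signed certificates
webClient.getOptions().setJavaScriptEnabled(true);                  // run the page's scripts
webClient.getCookieManager().setCookiesEnabled(true);               // keep cookies, e.g. a login session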

Then, after fetching your page, wait for background JavaScript to finish before doing anything with the page:

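import com.gargoylesoftware.htmlunit.html.HtmlPage;

// Get the page (getPage throws IOException, so call it from a method that declares or handles it)
HtmlPage page1 = webClient.getPage("https://login-url/");

// Wait up to 10 seconds (10000 ms) for background JavaScript to finish
webClient.waitForBackgroundJavaScript(10000);

// Print the full page _after_ JavaScript has rendered it
System.out.println(page1.asXml());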

I hope this basic example will help you!

You can use HtmlUnit to do pretty much anything a browser can do, but programmatically.

Question 2

As far as scraping is concerned, you can scrape the whole page and look for the Twitter ID (or handle). When I checked the sample page, I could not find the handle itself, but the Twitter icon links to the user's account, so you can extract the handle from that link. If you are looking for a scraping library in Java, give jsoup a shot.
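One way to pull the handle out of the icon's link is sketched below. It swaps jsoup for a plain regular expression from java.util.regex so that it runs on the standard library alone (Java 9 or later, for Set.of). A regex is more fragile than a real HTML parser, so treat this as a quick sketch rather than a robust scraper. The class name, the sample markup, and the list of twitter.com paths that are not user accounts (share, intent, and so on) are illustrative assumptions, not anything taken from the sample page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwitterHandleExtractor {

    // Matches href="https://twitter.com/<handle>", also http://, www. and protocol-relative //twitter.com/.
    // Handles are 1-15 letters, digits or underscores; the lookahead rejects longer names.
    private static final Pattern TWITTER_LINK = Pattern.compile(
            "href\\s*=\\s*[\"'](?:https?:)?//(?:www\\.)?twitter\\.com/([A-Za-z0-9_]{1,15})(?![A-Za-z0-9_])",
            Pattern.CASE_INSENSITIVE);

    // Assumed, incomplete list of twitter.com paths that are not user accounts.
    private static final Set<String> NON_ACCOUNT_PATHS =
            Set.of("share", "intent", "home", "search", "hashtag");

    /** Returns the handles found in Twitter links on the page, in document order. */
    public static List<String> extractHandles(String html) {
        List<String> handles = new ArrayList<>();
        Matcher m = TWITTER_LINK.matcher(html);
        while (m.find()) {
            String handle = m.group(1);
            if (!NON_ACCOUNT_PATHS.contains(handle.toLowerCase(Locale.ROOT))) {
                handles.add(handle);
            }
        }
        return handles;
    }

    public static void main(String[] args) {
        // Hypothetical markup; in practice pass the page source, e.g. the string from page1.asXml().
        String html = "<a href='https://twitter.com/share?url=x'>Tweet</a>"
                + "<a class=\"social\" href=\"https://twitter.com/example_user\">"
                + "<img src=\"twitter-icon.png\"></a>";
        System.out.println(extractHandles(html)); // prints [example_user]
    }
}
```

The share link is skipped because its path is in the exclusion set, so only the account link from the icon is reported.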