Question

I understand that this code scrapes the title "Google Inc (GOOG)" from http://finance.yahoo.com/q?s=goog:

    String name = doc.select(".title h2").first().text();

I was wondering how to scrape the company name and the ticker symbol separately, i.e. "Google Inc" and "GOOG":

Solution

(1) If you have to scrape:

This is a short answer that omits exception handling; however, it is concise and works out of the box.

    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class YahooTitleScraper {
        public static void main(String[] args) throws IOException {
            // fetch the HTML and parse it into a Document
            String url = "http://finance.yahoo.com/q?s=goog";
            Document doc = Jsoup.connect(url).get();

            // locate the header, then the title div, then its h2 tag
            // (selectors match Yahoo's page markup at the time of writing)
            Element header = doc.select("div[id=yfi_rt_quote_summary]").get(0);
            Element title = header.select("div[class=title]").get(0);
            String h2 = title.select("h2").get(0).text();

            // split at the LAST open parenthesis, so a name that itself contains
            // "(...)" stays whole, then strip the closing parenthesis
            int open = h2.lastIndexOf('(');
            String name = h2.substring(0, open).trim();
            String shortname = h2.substring(open + 1).replace(")", "").trim();
            System.out.println(name);
            System.out.println(shortname);
        }
    }

Output looks like this:

    Google Inc.
    GOOG
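If a company name can itself contain parentheses, splitting on every "(" picks the wrong piece. A regular expression anchored on the last parenthesized group handles that case. This is a minimal sketch, assuming the ticker is always the final "(...)" in the h2 text; the class name `TitleSplitter` and method `split` are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleSplitter {
    // Greedy "(.*)" backtracks to the LAST "(" that is followed by a
    // parenthesis-free group and a closing ")" at the end of the string,
    // so "Foo (Holdings) Ltd (FOO)" yields name "Foo (Holdings) Ltd".
    private static final Pattern TITLE = Pattern.compile("^(.*)\\(([^()]*)\\)\\s*$");

    /** Returns {name, ticker}, or null if the text has no trailing "(TICKER)". */
    public static String[] split(String h2) {
        Matcher m = TITLE.matcher(h2.trim());
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1).trim(), m.group(2).trim() };
    }

    public static void main(String[] args) {
        String[] parts = split("Google Inc. (GOOG)");
        System.out.println(parts[0]); // Google Inc.
        System.out.println(parts[1]); // GOOG
    }
}
```

To use it with the scraper, call `TitleSplitter.split(h2)` on the text extracted from the h2 tag and check for null before reading the parts.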

(2) If you don't have to scrape:

Here is a really nice post showing how to download Yahoo data programmatically.

I am also an R user, and it is extremely easy to get Yahoo Finance data inside R. You can do the analysis there and save the results to a file or database if you want. :)

OTHER TIPS

You want to scrape the elements with the IDs "yfs_l84_goog", "yfs_c63_goog" and "yfs_p43_goog".

Those are the big black number (the price), the small red/green number next to it (the change), and the percentage change.
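A minimal Jsoup sketch of the ID-based approach: it looks each element up with `Document.getElementById`. To run offline it parses a small inline HTML stand-in whose values are placeholders, not real quotes; in practice you would pass the `Document` returned by `Jsoup.connect(url).get()`. The helper names `quoteId` and `textById`, and the assumption that each ID is a prefix plus the lower-cased ticker (e.g. `yfs_l84_goog`, with a lowercase letter l), are illustrative; Yahoo's markup may no longer contain these IDs.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class QuoteById {
    // Hypothetical helper: assumes IDs are "<prefix>_<lower-cased ticker>".
    public static String quoteId(String prefix, String ticker) {
        return prefix + "_" + ticker.toLowerCase();
    }

    // Returns the text of the element with the given id, or null if absent.
    public static String textById(Document doc, String id) {
        Element el = doc.getElementById(id);
        return el == null ? null : el.text();
    }

    public static void main(String[] args) {
        // Placeholder markup standing in for the real page
        String html = "<span id=\"yfs_l84_goog\">123.45</span>"
                + "<span id=\"yfs_c63_goog\">+1.23</span>"
                + "<span id=\"yfs_p43_goog\">(1.01%)</span>";
        Document doc = Jsoup.parse(html);

        // price, change and percentage change, in that order
        for (String prefix : new String[] { "yfs_l84", "yfs_c63", "yfs_p43" }) {
            String id = quoteId(prefix, "GOOG");
            System.out.println(id + " = " + textById(doc, id));
        }
    }
}
```

Returning null for a missing ID (rather than throwing) makes it easy to detect when the page layout has changed.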

See also the related question: "Screen scrape" with Jsoup with element who has ID

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow