Question

I noticed that Google removed the Finance API for Google App Engine. All I want is the list of stock tickers in a user's Google Finance portfolio. Is there any way to still pull this data from the end user's portfolio now that the API has been removed? I'm trying to retrieve it manually, given that I know the login and password (i.e., it's my own account).

Is there any way to retrieve it manually through curl, by logging in to the Google services? It seems like it should be possible to log in and go to my portfolio page, retrieving the source.

I have tried the following code:

#!/bin/bash

function ClientLogin() {
  read -p 'Email> ' email
  read -p 'Password> ' -s password
  local service=$1
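  # Quote the variables and URL-encode the credentials; '\n' must be quoted or tr receives a plain 'n'
  curl -s --data-urlencode "Email=$email" --data-urlencode "Passwd=$password" -d "service=$service" https://www.google.com/accounts/ClientLogin | tr ' ' '\n' | grep Auth= | sed -e 's/Auth=//'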
}

function GetFinance() {
  curl -L -s -H "Authorization: GoogleLogin auth=$(ClientLogin finance)" "http://www.google.com/finance/portfolio?action=view&pid=1" &> output.html
}

GetFinance

However, this code only retrieves a page that tells me to log in. The solution does not need to use curl, but it must be an automated retrieval using some scripting language.


Thanks to x4avier, I learned about CasperJS and was able to write a quick script that loads the Google services login page, enters the username and password, and then fetches the Google Finance portfolio. I'm sure this would work with any other Google service and page. I save the HTML of the portfolio to portfolio.html. Hopefully this helps someone else too.

var fs = require('fs');
var failed = [];
var links = [
    "https://www.google.com/finance/portfolio?action=view&pid=13"
];

var casper = require('casper').create({
    verbose: true,
    logLevel: 'debug',
    pageSettings: {
         loadImages:  false,         // The WebPage instance used by Casper will
         loadPlugins: false,         // use these settings
         userAgent: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.94 Safari/537'
    }
});

// print out all the messages in the headless browser context
casper.on('remote.message', function(msg) {
    this.echo('remote message caught: ' + msg);
});

// print out any JavaScript errors raised in the headless browser context
casper.on("page.error", function(msg, trace) {
    this.echo("Page Error: " + msg, "ERROR");
});

var url = 'https://accounts.google.com/ServiceLogin?service=finance';

casper.start(url, function() {
   // fill in and submit the Google login form
   console.log("page loaded");
   this.test.assertExists('form#gaia_loginform', 'form is found');
   this.fill('form#gaia_loginform', {
        Email: 'youraccount@gmail.com',
        Passwd:  'yourpass'
    }, true);
});

casper.each(links, function(casper, link) {
    this.then(function() {
        this.test.comment("Loading " + link);
        var start = new Date(); // declared with var to avoid an implicit global
        this.open(link);
    });
    this.then(function() {
        var message = this.requestUrl + " loaded";
        if (failed.indexOf(this.requestUrl) === -1) {
            this.test.pass(message);
            fs.write('portfolio.html',this.getPageContent(),'w');
        }
    });
});

casper.run();
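
Once portfolio.html has been saved, the ticker list still has to be pulled out of the markup. Here is a minimal Python 3 sketch of that step, using only the standard library. It assumes that each holding is rendered as a link of the form /finance?q=EXCHANGE:SYMBOL. That link shape is an assumption and not something confirmed above, and the names TickerLinkParser, extract_tickers, and SAMPLE_PORTFOLIO_HTML are made up for illustration. Check the saved file and adjust the matching rule if the real markup differs.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse, parse_qs


class TickerLinkParser(HTMLParser):
    """Collect symbols from <a href="/finance?q=EXCHANGE:SYMBOL"> links.

    The link shape is an assumption about the portfolio page markup.
    """

    def __init__(self):
        super().__init__()
        self.tickers = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        parts = urlparse(href)
        # Only consider links to the quote page itself, not /finance/portfolio etc.
        if not parts.path.endswith("/finance"):
            return
        for query_value in parse_qs(parts.query).get("q", []):
            # "NASDAQ:GOOG" -> "GOOG"; a bare "GOOG" is kept unchanged.
            symbol = query_value.split(":")[-1].strip().upper()
            if symbol and symbol not in self.tickers:
                self.tickers.append(symbol)


def extract_tickers(html):
    """Return the unique ticker symbols found in html, in page order."""
    parser = TickerLinkParser()
    parser.feed(html)
    return parser.tickers


# Hypothetical stand-in for the contents of portfolio.html.
SAMPLE_PORTFOLIO_HTML = """
<table>
  <tr><td><a href="/finance?q=NASDAQ:GOOG">Google Inc</a></td></tr>
  <tr><td><a href="/finance?q=NYSE%3AIBM">IBM</a></td></tr>
  <tr><td><a href="/finance?q=NASDAQ:GOOG">GOOG</a></td></tr>
  <tr><td><a href="/finance/portfolio?action=view">My portfolio</a></td></tr>
</table>
"""

tickers = extract_tickers(SAMPLE_PORTFOLIO_HTML)
```

To run it against the real file, read portfolio.html into a string and pass that string to extract_tickers instead of the sample.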

Solution

You should consider using a headless browser such as CasperJS.

With it you can log in to Google, go to Google Finance, and get the HTML of a whole page or of a particular CSS selector.

To log in, you will need the fill() function, which works like this:

casper.start('http://admin.domain.tld/login/', function() {
    this.fill('form[id="login-form"]', {
        'username': 'chuck',
        'password': 'n0rr1s'
    }, true);
});

casper.run();

Then you can extract the page, or specific content within it, with getHTML(), which works as below:

casper.then(function() {
    this.echo(this.getHTML('h1#foobar')); // => 'The text included in the <h1 id=foobar>'
});

CasperJS handles cookies and can navigate across more than one page, so it should fit your needs.

Hope it helps :)

OTHER TIPS

What information do you want to retrieve exactly?

It's pretty easy to do that using Python's urllib2 and BeautifulSoup: http://docs.python.org/2/library/urllib2.html http://www.crummy.com/software/BeautifulSoup/bs4/doc/

I've done it myself to post and retrieve messages on various forum websites. The one drawback is that you have to hardcode the ids of the elements you want to retrieve.

Here's a sample of what I did for the login part

#!/usr/bin/python

import urllib
import urllib2
import cookielib
import BeautifulSoup

url = "https://accounts.google.com/ServiceLogin?hl=en";
values = {'Email': 'me@mymail.fr', 'Passwd' : '', 'signIn' : 'Sign in', 'PersistentCookie' : 'yes'} # The form data 'name' : 'value'

cookie = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cookie))
data = urllib.urlencode(values)
response = opener.open(url, data)  # POST the form; the cookie jar keeps any session cookies
print response.read()              # print the body of the returned page

I filled in some of the required fields for the Google login. But when I checked the POST request, there were some other values that you might need to add to the values dict as well.

Here's the POST request I captured:

dsh:5606788993588
hl:en
checkedDomains:youtube
checkConnection:youtube:47:1,youtube:46:1
pstMsg:1
GALX:YU6dyLz2tHE
pstMsg:0
dnConn:
checkConnection:
checkedDomains:youtube
timeStmp:
secTok:
_utf8:☃
bgresponse:!A0LP9ks4H06eS0R0GKgonCCotgIAAAAiUgAAAAkqAOjHBiH2qA-EIczqcDooax5q8bxis...
Email:****@gmail.com
Passwd:mypassword
signIn:Sign in
PersistentCookie:yes
rmShown:1

I guess you will have to parse the login page using BeautifulSoup to get these values before you can actually send the form. I wonder whether the CasperJS example given above does that automatically; if it does, you'd be better off using it and then parsing the portfolio page with BeautifulSoup or whatever you prefer.
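
To illustrate the "parse the login page first" step, here is a rough Python 3 sketch that uses only the standard library (html.parser in place of BeautifulSoup). It collects the hidden inputs of the login form, such as GALX and dsh from the capture above, and merges them with the credentials into a POST body. The form id gaia_loginform is taken from the CasperJS script earlier on this page. The helper names and the sample markup are invented for illustration, and the real login page may require more than this.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of hidden <input> elements inside one form."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.in_form = True
        elif tag == "input" and self.in_form:
            is_hidden = (attrs.get("type") or "").lower() == "hidden"
            if is_hidden and attrs.get("name"):
                self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_login_body(login_html, email, password, form_id="gaia_loginform"):
    """Return URL-encoded POST data: hidden form fields plus credentials."""
    parser = HiddenInputParser(form_id)
    parser.feed(login_html)
    values = dict(parser.fields)
    values.update({"Email": email, "Passwd": password})
    return urlencode(values).encode("utf-8")


# Hypothetical, heavily trimmed stand-in for the real login page.
SAMPLE_LOGIN_HTML = """
<form id="gaia_loginform" method="post">
  <input type="hidden" name="GALX" value="YU6dyLz2tHE">
  <input type="hidden" name="dsh" value="5606788993588">
  <input type="text" name="Email" value="">
  <input type="password" name="Passwd">
</form>
"""

body = build_login_body(SAMPLE_LOGIN_HTML, "me@example.com", "secret")
```

The resulting bytes could then be passed as the data argument to a cookie-aware opener like the one built above. Whether the server accepts a login assembled this way is not something this sketch can guarantee, since values such as bgresponse appear to be generated by JavaScript on the page.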

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow