Question

I am new to Matlab and I am currently working with financial data exported from the Financial Times website. I would like to know how I can get, for example, the share price forecast information from this page:

http://markets.ft.com/research/Markets/Tearsheets/Forecasts?s=DIS:NYQ

High    +34.7 % 85.00
Med     +15.7 % 73.00
Low      -9.6 % 57.00

and save this information as variables.

Solution

Here's a simple solution using urlread and regexpi:

% Create URL string and read in HTML
ftbaseurl = 'http://markets.ft.com/research/Markets/Tearsheets/Forecasts?s=';
ticksym = 'DIS:NYQ';
s = urlread([ftbaseurl ticksym]);

% Create pattern string for regular expression matching
trspan = '<tr><td class="text"><span class="';
tdspan1 = '</span></td><td><span class="\w\w\w color ">'; % \w\w\w matches pos or neg
matchstr1 = '(?<percent>[+-]?\d+\.\d+)'; % percent: optional sign, (1+ digits).(1+ digits)
tdspan2 = ' %</span></td><td>';
matchstr2 = '(?<price>\d+\.\d\d)</td></tr>'; % price: match (1+ digits) . 2 digits
pat = [trspan 'high">High' tdspan1 matchstr1 tdspan2 matchstr2 '|' ...
       trspan 'med">Med' tdspan1 matchstr1 tdspan2 matchstr2 '|' ...
       trspan 'low">Low' tdspan1 matchstr1 tdspan2 matchstr2];

% Match patterns in HTML, case insensitive, put results in struct array
forecasts = regexpi(s,pat,'names');

The result is a 1-by-3 struct array where each element has two fields, 'percent' and 'price', each containing a string extracted by the regular expression parser. For example:

>> forecasts(3)

ans =

    percent: '-10.3'
      price: '57.00'

>> str2double(forecasts(3).percent)

ans =

  -10.3000

I'll leave it to you to convert the strings to numbers (note that financial software usually stores prices as integer cents, or whatever the smallest denomination is, rather than floating-point dollars, to avoid numerical issues) and to turn this into a general function. See the MATLAB documentation on regular expressions for more details.
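The conversion step above can be sketched in a few lines. This assumes the `forecasts` struct array produced by the `regexpi` call; the integer-cents variable name is my own choice, not part of the original solution:

```matlab
% Convert the matched strings to numbers (a sketch, not a general function).
percents = str2double({forecasts.percent});  % signed percentages as doubles
prices   = str2double({forecasts.price});    % prices as floating-point dollars

% Store prices as integer cents to sidestep floating-point rounding issues,
% as suggested above.
priceCents = int64(round(100 * prices));
```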

A word of caution: this approach is quite inefficient. You are downloading the entire page's HTML and parsing it just to extract a few small pieces of data, which is fine only if the data doesn't update often or you don't need it quickly. The scheme is also fragile: if the Financial Times updates its website, the code may break, and if you download their pages too frequently, they may block you.
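To make that fragility at least fail loudly, one option is to guard the download and sanity-check the page before parsing. A minimal sketch, assuming the `ftbaseurl` and `ticksym` variables from the solution (`webread` is the newer replacement for `urlread` in recent MATLAB releases):

```matlab
% Guard the download so a network error or a site redesign is reported
% instead of silently yielding an empty struct array.
try
    s = urlread([ftbaseurl ticksym]);  % or: s = webread([ftbaseurl ticksym]);
catch err
    error('Download failed for %s: %s', ticksym, err.message);
end

% Cheap sanity check that the expected markup is still present.
if isempty(regexpi(s, '<td class="text">', 'once'))
    warning('Page layout may have changed; the regex patterns may not match.');
end
```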

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow