I am learning to work with "large" data sets in MATLAB. I have a txt file containing every trade made for a stock called MTB, and my goal is to turn this tick data into daily data.
For example, over 15,000 transactions took place on the first day; my program turns that data into the open, high, low, close, total volume, and net transaction for each day.
My questions:
Can you help me make the code faster?
Do you have any practical "techniques" to verify the calculations, since they are made on such a large data set?
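(One cross-check I have been thinking about, as a sketch only: recompute a single, randomly chosen day with plain logical indexing and compare it against the looped results. The variable names are the ones from my script below.)

% Pick one day at random and recompute its aggregates directly
j = randi(numel(dailyDate));
mask = strcmp(tradeDate, dailyDate{j});   % all ticks belonging to that day

assert(openDay(j)   == open(find(mask, 1, 'first')));
assert(closeDay(j)  == close(find(mask, 1, 'last')));
assert(highDay(j)   == max(high(mask)));
assert(lowDay(j)    == min(low(mask)));
assert(volumeDay(j) == sum(upVol(mask)) + sum(downVol(mask)));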
It took my program 20.7757 seconds,
and I got the following warning, which I don't understand:
Warning: 'rows' flag is ignored for cell arrays.
In cell.unique at 32
In ex5 at 16
Warning: 'rows' flag is ignored for cell arrays.
In cell.unique at 32
In ex5 at 17
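(For context, the warning seems to come from the two `unique` calls at lines 16-17 of ex5: `tradeDate` is read with `%s`, so it is a cell array of strings, and `unique` for cell arrays ignores the `'rows'` flag because cell arrays are not compared row-wise. A minimal reproduction with a small made-up cell array of dates:)

% A cell array of date strings, like the tradeDate column read with %s
dates = {'20100104'; '20100104'; '20100105'};

% Triggers "Warning: 'rows' flag is ignored for cell arrays."
[u1, idx1] = unique(dates, 'rows', 'first');

% Dropping 'rows' gives the same result with no warning
[u2, idx2] = unique(dates, 'first');

Since the flag is simply ignored, the results are identical either way; the warning is harmless, and the `'rows'` argument can just be dropped.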
%DESCRIPTION: Turn tick data into daily data
%INPUT: stock tick data (tradeDate,tradeTime,open,high,low,
%close,upVol,downVol)
%OUTPUT: openDay,highDay,lowDay,closeDay,volumeDay,netTransaction
%net transaction traded = sum(price*upVol - price*downVol)
clear;
startTime=tic;
%load data from MTB_db2
[tradeDate, tradeTime,open,high,low,close,upVol,downVol]=textread('MTB_db2.txt','%s %u %f %f %f %f %f %f','delimiter',',');
%begIdx: index of the first trade of each day in the tick database;
%endIdx: index of the last trade of that day
[dailyDate begIdx]=unique(tradeDate,'rows','first');
[dailyDate2 endIdx]=unique(tradeDate,'rows','last');
%the number of daily elements, useful for the loop.
n=numel(dailyDate);
%initialize arrays
highDay=[];
lowDay=[];openDay=[];closeDay=[];
volumeDay=[];netTransaction=[];
priceChange(1)=NaN; mfChange(1)=NaN;
%loop: the bottleneck is here!!
for j=1:n
    openDay(j)=open(begIdx(j));
    closeDay(j)=close(endIdx(j));
    highDay(j)=max(high(begIdx(j):endIdx(j)));
    lowDay(j)=min(low(begIdx(j):endIdx(j)));
    volumeDay(j)=sum(upVol(begIdx(j):endIdx(j)))+sum(downVol(begIdx(j):endIdx(j)));
    cumSum=0;
    for i=begIdx(j):endIdx(j)
        cumSum=cumSum+close(i)*(upVol(i)-downVol(i));
    end
    netTransaction(j)=cumSum;
end
elapsedTimeNonVectorized=toc(startTime)
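(For comparison, here is a loop-free sketch of the same per-day aggregation, assuming the same variables as in the script. The third output of `unique` maps every tick to its day number, which `accumarray` can then group by:)

[dailyDate, begIdx, dayIdx] = unique(tradeDate, 'first');
[~, endIdx] = unique(tradeDate, 'last');

openDay  = open(begIdx);
closeDay = close(endIdx);
highDay  = accumarray(dayIdx, high, [], @max);
lowDay   = accumarray(dayIdx, low,  [], @min);
volumeDay      = accumarray(dayIdx, upVol + downVol);
netTransaction = accumarray(dayIdx, close .* (upVol - downVol));

Note that `accumarray` returns column vectors, whereas the loop builds row vectors. As an aside, `open` and `close` shadow MATLAB built-in functions; renaming them (e.g. to `openPx`/`closePx`) avoids surprises.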