Creating open-high-low-close (ohlc) bars from tick data in Matlab

https://stackoverflow.com/questions/17111785

31-05-2022
|

Question

I have a CSV file 'XPQ12.csv' of futures tick data in the following form:

20090312    30:14.0 717.25  1   E
20090312    30:15.0 718.47  1   E
20090312    30:17.0 717.25  1   E
20090312    30:32.0 718.42  1   E
20090312    30:49.0 715.32  1   E
20090312    30:58.0 717.57  1   E
20090312    31:06.0 716.65  3   E
20090312    31:12.0 718.35  2   E
20090312    31:45.0 721.14  1   E
20090312    31:52.0 719.24  1   E
20090312    32:11.0 717.02  6   E
20090312    32:29.0 717.14  1   E
20090312    32:35.0 717.34  1   E
20090312    32:55.0 717.26  1   E

(The first column is the yearmonthdate, the second column is the minute:second:tenthofsecond, the third column is the price, the fourth column is the number of contracts traded, and the fifth indicates if the trade was electronic or in a pit). In my actual data set, I may have thousands of price quotes within any given minute.

I read the file using the following code:

fid = fopen('C:\Program Files\MATLAB\R2013a\XPQ12.csv','r'); 
[c] = fscanf(fid, '%d,%d:%d.%d,%f,%d,%c')

Which outputs:

(the 69s are the matlab representation for E I believe)

Now I want to cut this up into one minute ohlc bars, so that for each minute, I record what the first, highest, lowest, and last price was within that minute. I'd really like to know the best way to go about this.

My original idea was to store the sequence of minutes in a vector d, and while working through the data, each time the number at the end of d changed I would record the corresponding price as an open, record the previous price as a close for the last bar, and find the largest and smallest prices within each open and close.

c(2) is the first minute, so I said:

d(1)=c(2);

and then noting that I'd always be counting by 7 before getting to the next minute, I said:

Nrows = numel(textread('XPQ12.csv','%1c%*[^\n]')); % counts rows in file
for i=1:Nrows
 if mod(i-2,7)== 0; 
     d(end+1)=c(i);
 end
end

which should fill up d with all the minutes:

in the case of the example data. I'm kind of lost what to do from here, or if what I'm doing is on the right track.

Solution

From where you are:

Minutes = c(2:7:end);    
MinuteValues=unique(Minutes);
Prices = c(5:7:end);
if (length(Prices)>length(Minutes))
    Prices=Prices(1:length(Minutes));
elseif (length(Prices)<length(Minutes))
    Minutes=Minutes(1:length(Prices));

OverflowValues=1+find(Minutes(2:end)==0 & Minutes(1:end-1)==59);
for v=length(OverflowValues):-1:1
    Minutes(OverflowValues(v):end)=Minutes(OverflowValues(v):end)+60;
end

Highs=zeros(1,length(MinuteValues));
Lows=zeros(1,length(MinuteValues));
First=zeros(1,length(MinuteValues));
Last=zeros(1,length(MinuteValues));
for v=1:length(MinuteValues)
    Highs(v) = max(Prices(Minutes==MinuteValues(v)));
    Lows(v) = min(Prices(Minutes==MinuteValues(v)));
    First(v) = Prices(find(Minutes==MinuteVales(v),1,'first'));
    Last(v) = Prices(find(Minutes==MinuteVales(v),1,'last'));
end

Using textread would make this easier for you, as mentioned.

(If you are lost at this stage, I wouldn't find accumarray as mentioned in the comments is the best place to start!)

By the way, this is assuming that minutes increases above 60 and you don't have hours in there somewhere. Otherwise this won't work at all.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow