I'm updating some inherited legacy code and have found an interesting "approach" to a probelm that I am at a loss to optimize.
There is a set of data that can be somewhat simulated using the following:
approximateEntries = 5e6;
maxEntryValue = 5e12;
maxExtraEvents = 10e3;
event1 = randperm( maxEntryValue, approximateEntries + randi( maxExtraEvents ) );
event1 = sort(event1);
event2 = randperm( maxEntryValue, approximateEntries + randi( maxExtraEvents ) );
event2 = sort(event2);
event3 = randperm( maxEntryValue, approximateEntries + randi( maxExtraEvents ) );
event3 = sort(event3);
event4 = randperm( maxEntryValue, approximateEntries + randi( maxExtraEvents ) );
event4 = sort(event4);
data = [[ones(length(event1),1); 2*ones(length(event2),1); 3*ones(length(event3),1); 4*ones(length(event4),1)], ...
[event1'; event2'; event3'; event4']];
data = sortrows(data,2);
clear approximateEntries;
clear maxEntryValue;
clear maxExtraEvents;
Which is to say that thanks to the way the code before is written, I have access to four arrays, each with about five million elements and one large array containing the four arrays concatenated, sorted and marked in another column with which array the element originally came from. If i can avoid using the single large array in the solution to this problem, it can be eliminated from the code as it is not required later.
We are looking to find different characteristics of near (to within +/- 1000) matches between values across the four "event" matrices. the values may not necessarily be at the same indices but each "event" matrix will be sorted in ascending order. We are looking for:
- Total number of times a value from each "event" matrix is unique, within +/- 1000, to that matrix.
- Total number of times a value from each "event" matrix occurs +/- 1000 in just two matrices, with a seperate total for each possible pair of matrices.
- Total number of times a value from each "event" matrix +/- 1000 occurs in just three matrices, with a sperate total for each triplet of matrices.
- Total number of times a value +/- 1000 occurs in all four matrices
The legacy code attempts to do this in the following incredible way (I know it is not totally correct, that is why it is being updated)
onlyEvent1 = 0;
onlyEvent2 = 0;
onlyEvent3 = 0;
onlyEvent4 = 0;
onlyEvent1Event2 = 0;
onlyEvent1Event3 = 0;
onlyEvent1Event4 = 0;
onlyEvent2Event3 = 0;
onlyEvent2Event4 = 0;
onlyEvent3Event4 = 0;
onlyEvent1Event2Event3 = 0;
onlyEvent1Event2Event4 = 0;
onlyEvent1Event3Event4 = 0;
onlyEvent2Event3Event4 = 0;
allEvents = 0;
maxIndex = length(data);
workingIndex = 1;
fullGate = 2000;
while workingIndex < maxIndex
subIndex = 0;
withinRange = true;
% Prepare a buffer
buffer = zeros(200,2);
while ((workingIndex + subIndex) < maxIndex) && (withinRange == true)
if subIndex > 0
timeDifference = data(workingIndex + subIndex,2) - buffer(subIndex,2);
if timeDifference <= fullGate
buffer(subIndex + 1,:) = data(workingIndex + subIndex,:);
subIndex = subIndex + 1;
else
withinRange = false;
end % if
else
buffer(subIndex + 1,:) = data(workingIndex + subIndex,:);
subIndex = subIndex + 1;
end % if
end % while
event1 = false;
event2 = false;
event3 = false;
event4 = false;
compareIndex = 1;
while (buffer(compareIndex) ~= 0) && (compareIndex < length(buffer))
if buffer(compareIndex,1) == 1
event1 = true;
else
if buffer(compareIndex,1) == 2
event2 = true;
else
if buffer(compareIndex,1) == 3
event3 = true;
else
% Should really only be four
event4 = true;
end % if is 3
end % if is 2
end % if is 1
compareIndex = compareIndex + 1;
end % while buffer
if (event1 == true) && (event2 == true) && (event3 == true) && (event4 == true)
allEvents = allEvents + 1;
else
if (event1 == true) && (event2 == true) && (event3 == true) && (event4 == false)
onlyEvent1Event2Event3 = onlyEvent1Event2Event3 + 1;
else
if (event1 == true) && (event2 == true) && (event3 == false) && (event4 == true)
onlyEvent1Event2Event4 = onlyEvent1Event2Event4 + 1;
else
if (event1 == true) && (event2 == false) && (event3 == true) && (event4 == true)
onlyEvent1Event3Event4 = onlyEvent1Event3Event4 + 1;
else
if (event1 == false) && (event2 == true) && (event3 == true) && (event4 == true)
onlyEvent2Event3Event4 = onlyEvent2Event3Event4 + 1;
else
if (event1 == true) && (event2 == true) && (event3 == false) && (event4 == false)
onlyEvent1Event2 = onlyEvent1Event2 + 1;
else
if (event1 == true) && (event2 == false) && (event3 == true) && (event4 == false)
onlyEvent1Event3 = onlyEvent1Event3 + 1;
else
if (event1 == true) && (event2 == false) && (event3 == false) && (event4 == true)
onlyEvent1Event4 = onlyEvent1Event4 + 1;
else
if (event1 == false) && (event2 == true) && (event3 == true) && (event4 == false)
onlyEvent2Event3 = onlyEvent2Event3 + 1;
else
if (event1 == false) && (event2 == true) && (event3 == false) && (event4 == true)
onlyEvent2Event4 = onlyEvent2Event4 + 1;
else
if (event1 == false) && (event2 == false) && (event3 == true) && (event4 == true)
onlyEvent3Event4 = onlyEvent3Event4 + 1;
else
if (event1 == true) && (event2 == false) && (event3 == false) && (event4 == false)
onlyEvent1 = onlyEvent1 + 1;
else
if (event1 == false) && (event2 == true) && (event3 == false) && (event4 == false)
onlyEvent2 = onlyEvent2 + 1;
else
if (event1 == false) && (event2 == false) && (event3 == true) && (event4 == false)
onlyEvent3 = onlyEvent3 + 1;
else
onlyEvent4 = onlyEvent4 + 1;
end % if 3
end % if 2
end % if 1
end % if 3&4
end % if 2&4
end % if 2&3
end % if 1&4
end % if 1&3
end % if 1&2
end % if 2,3&4
end % if 1,3&4
end % if 1,2&4
end % if 1,2&3
end % if all events
workingIndex = workingIndex + subIndex;
end % while
I came up with the following basis for an approach:
temp=bsxfun(@plus, event1', -1000:1000);
matchOneTwo=sum(ismember(event2,temp));
matchOneThree=sum(ismember(event3,temp));
matchOneFour=sum(ismember(event4,temp));
temp=bsxfun(@plus, event2', -1000:1000);
etc...
but this fails due to lack of memory when generating "temp". Can anyone help with another approach?
[EDIT] To provide a small scale example as requested by Dennis Jaheruddin. This was produced by hand and is not real data. The periodicity was just to help the reader see the events used in comaprisons.
event1=[125, 1500, 5000, 15000, 22349,25000, 35000, 45000, 55000, 60325, 65000, 75000, 85000, 91117, 95000];
event2=[1750, 7000, 17000, 21562, 27000, 37000, 47000, 57000, 60256, 67000, 77000, 87000, 97000];
event3=[1126, 9000, 19000, 29000, 30130, 39000, 49000, 59000, 69000, 79000, 89000, 91560, 99000, 120000];
event4=[1, 1975, 11000, 21000, 31000, 31159, 41000, 51000, 60112, 61000, 71000, 81000, 91000, 91001, 101000, 130000];
Then:
- Match 1, 2, 3, & 4 = 1
- Match 1 & 2 = 1
- Match 3 & 4 = 1
- Match 1, 2, & 4 = 1
- Match 1, 3, & 4 = 1
- All the rest have zero matches.