Question

I'm updating some inherited legacy code and have found an interesting "approach" to a probelm that I am at a loss to optimize.

There is a set of data that can be somewhat simulated using the following:

approximateEntries = 5e6;
maxEntryValue = 5e12;
maxExtraEvents = 10e3;

event1 = randperm( maxEntryValue, approximateEntries + randi( maxExtraEvents ) );
event1 = sort(event1);
event2 = randperm( maxEntryValue, approximateEntries + randi( maxExtraEvents ) );
event2 = sort(event2);
event3 = randperm( maxEntryValue, approximateEntries + randi( maxExtraEvents ) );
event3 = sort(event3);
event4 = randperm( maxEntryValue, approximateEntries + randi( maxExtraEvents ) );
event4 = sort(event4);

data = [[ones(length(event1),1); 2*ones(length(event2),1); 3*ones(length(event3),1); 4*ones(length(event4),1)], ...
       [event1'; event2'; event3'; event4']];
data = sortrows(data,2);

clear approximateEntries;
clear maxEntryValue;
clear maxExtraEvents;

Which is to say that thanks to the way the code before is written, I have access to four arrays, each with about five million elements and one large array containing the four arrays concatenated, sorted and marked in another column with which array the element originally came from. If i can avoid using the single large array in the solution to this problem, it can be eliminated from the code as it is not required later.

We are looking to find different characteristics of near (to within +/- 1000) matches between values across the four "event" matrices. the values may not necessarily be at the same indices but each "event" matrix will be sorted in ascending order. We are looking for:

  • Total number of times a value from each "event" matrix is unique, within +/- 1000, to that matrix.
  • Total number of times a value from each "event" matrix occurs +/- 1000 in just two matrices, with a seperate total for each possible pair of matrices.
  • Total number of times a value from each "event" matrix +/- 1000 occurs in just three matrices, with a sperate total for each triplet of matrices.
  • Total number of times a value +/- 1000 occurs in all four matrices

The legacy code attempts to do this in the following incredible way (I know it is not totally correct, that is why it is being updated)

onlyEvent1 = 0;
onlyEvent2 = 0;
onlyEvent3 = 0;
onlyEvent4 = 0;
onlyEvent1Event2 = 0;
onlyEvent1Event3 = 0;
onlyEvent1Event4 = 0;
onlyEvent2Event3 = 0;
onlyEvent2Event4 = 0;
onlyEvent3Event4 = 0;
onlyEvent1Event2Event3 = 0;
onlyEvent1Event2Event4 = 0;
onlyEvent1Event3Event4 = 0;
onlyEvent2Event3Event4 = 0;
allEvents = 0;

maxIndex = length(data);
workingIndex = 1;

fullGate = 2000;

while workingIndex < maxIndex 
    subIndex = 0;

    withinRange = true;

    % Prepare a buffer
    buffer = zeros(200,2);

    while ((workingIndex + subIndex) < maxIndex) && (withinRange == true)

        if subIndex > 0

            timeDifference = data(workingIndex + subIndex,2) - buffer(subIndex,2);

            if timeDifference <= fullGate
                buffer(subIndex + 1,:) = data(workingIndex + subIndex,:);
                subIndex = subIndex + 1; 
            else
                withinRange = false;
            end % if

        else

            buffer(subIndex + 1,:) = data(workingIndex + subIndex,:);
            subIndex = subIndex + 1; 

        end % if      

    end % while

        event1 = false;
        event2 = false;
        event3 = false;
        event4 = false;

        compareIndex = 1;

        while (buffer(compareIndex) ~= 0) && (compareIndex < length(buffer))
            if buffer(compareIndex,1) == 1
                event1 = true; 
            else
                if buffer(compareIndex,1) == 2
                    event2 = true; 
                else
                    if buffer(compareIndex,1) == 3
                        event3 = true; 
                    else
                        % Should really only be four
                       event4 = true; 
                    end % if is 3
                end % if is 2
            end % if is 1
            compareIndex = compareIndex + 1;
        end % while buffer

        if (event1 == true) && (event2 == true) && (event3 == true) && (event4 == true)
            allEvents = allEvents + 1;
        else
            if (event1 == true) && (event2 == true) && (event3 == true) && (event4 == false)
                onlyEvent1Event2Event3 = onlyEvent1Event2Event3 + 1;
            else
                if (event1 == true) && (event2 == true) && (event3 == false) && (event4 == true)
                    onlyEvent1Event2Event4 = onlyEvent1Event2Event4 + 1;
                else
                    if (event1 == true) && (event2 == false) && (event3 == true) && (event4 == true)
                        onlyEvent1Event3Event4 = onlyEvent1Event3Event4 + 1;
                    else
                        if (event1 == false) && (event2 == true) && (event3 == true) && (event4 == true)
                            onlyEvent2Event3Event4 = onlyEvent2Event3Event4 + 1;
                        else
                            if (event1 == true) && (event2 == true) && (event3 == false) && (event4 == false)
                                onlyEvent1Event2 = onlyEvent1Event2 + 1;
                            else
                                if (event1 == true) && (event2 == false) && (event3 == true) && (event4 == false)
                                    onlyEvent1Event3 = onlyEvent1Event3 + 1;
                                else
                                    if (event1 == true) && (event2 == false) && (event3 == false) && (event4 == true)
                                        onlyEvent1Event4 = onlyEvent1Event4 + 1;
                                    else
                                        if (event1 == false) && (event2 == true) && (event3 == true) && (event4 == false)
                                            onlyEvent2Event3 = onlyEvent2Event3 + 1;
                                        else
                                            if (event1 == false) && (event2 == true) && (event3 == false) && (event4 == true)
                                                onlyEvent2Event4 = onlyEvent2Event4 + 1;
                                            else
                                                if (event1 == false) && (event2 == false) && (event3 == true) && (event4 == true)
                                                    onlyEvent3Event4 = onlyEvent3Event4 + 1;
                                                else
                                                    if (event1 == true) && (event2 == false) && (event3 == false) && (event4 == false)
                                                        onlyEvent1 = onlyEvent1 + 1;
                                                    else
                                                        if (event1 == false) && (event2 == true) && (event3 == false) && (event4 == false)
                                                            onlyEvent2 = onlyEvent2 + 1;
                                                        else
                                                            if (event1 == false) && (event2 == false) && (event3 == true) && (event4 == false)
                                                                onlyEvent3 = onlyEvent3 + 1;
                                                            else
                                                                onlyEvent4 = onlyEvent4 + 1;
                                                            end % if 3
                                                        end % if 2
                                                    end % if 1
                                                end % if 3&4
                                            end % if 2&4
                                        end % if 2&3
                                    end % if 1&4
                                end % if 1&3
                            end % if 1&2
                        end % if 2,3&4
                    end % if 1,3&4
                end % if 1,2&4
            end % if 1,2&3
        end % if all events

        workingIndex = workingIndex + subIndex;

end % while

I came up with the following basis for an approach:

temp=bsxfun(@plus, event1', -1000:1000);
matchOneTwo=sum(ismember(event2,temp));
matchOneThree=sum(ismember(event3,temp));
matchOneFour=sum(ismember(event4,temp));
temp=bsxfun(@plus, event2', -1000:1000);

etc... but this fails due to lack of memory when generating "temp". Can anyone help with another approach?

[EDIT] To provide a small scale example as requested by Dennis Jaheruddin. This was produced by hand and is not real data. The periodicity was just to help the reader see the events used in comaprisons.

event1=[125, 1500, 5000, 15000, 22349,25000, 35000, 45000, 55000, 60325, 65000, 75000, 85000, 91117, 95000];
event2=[1750, 7000, 17000, 21562, 27000, 37000, 47000, 57000, 60256, 67000, 77000, 87000, 97000];
event3=[1126, 9000, 19000, 29000, 30130, 39000, 49000, 59000, 69000, 79000, 89000, 91560, 99000, 120000];
event4=[1, 1975, 11000, 21000, 31000, 31159, 41000, 51000, 60112, 61000, 71000, 81000, 91000, 91001, 101000, 130000];

Then:

  • Match 1, 2, 3, & 4 = 1
  • Match 1 & 2 = 1
  • Match 3 & 4 = 1
  • Match 1, 2, & 4 = 1
  • Match 1, 3, & 4 = 1
  • All the rest have zero matches.
Was it helpful?

Solution

Based on the question and comment I think the key is the ismemberf File Exchange submission

It allows you to check which values of one set occur in another set with a certain tolerance. For example:

% How many elements from event1 are close to an element of event2?
% 3 elements
sum(ismemberf(event1,event2,'tol',1000)) 
% At how many positions are both event1 and event2 are close to an element in event4?
% At exactly 1 position
sum(ismemberf(event1(1:13),event4,'tol',1000)&ismemberf(event2,event4,'tol',1000)) 

Probably this is not all that you need, but it should be easy to build up from here.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top