Question

Let's say that I have 10 datasets of 30 elements each. We can simulate them as:

A = rand(30, 10);

so each dataset is in one column. Now I want to find a set of n datasets that are correlated (or uncorrelated, whatever...).

For n=2 I can simply use R = corr(A) and find out that, e.g., columns 1 and 3 show the highest correlation with each other. But what if I want to find a set of three or four datasets that are correlated (or uncorrelated) with each other? Is there a function for that, or do I have to loop over it somehow?

Thanks!

Solution

You can treat this as a random search problem: repeatedly pick three (or four) datasets at random and keep the set with the largest cross-correlation score, which I define as the sum of the pairwise correlations within the set.

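R = corr(A);          % pairwise correlation matrix, as in the question
max_score = -Inf;     % best score so far (-Inf so that negative sums can still win)
max_set = [];
max_prev = -Inf;      % best score at the previous convergence check
counter = 0;
while true
    idx = randperm(10);
    idx = idx(1:3); % or 1:4 for the case of four (then sum all six pairs below)
    score = R(idx(1), idx(2)) + R(idx(2), idx(3)) + R(idx(1), idx(3));
    if score > max_score
        max_score = score;
        max_set = idx;
    end
    counter = counter + 1;
    if mod(counter, 1000) == 0 % check convergence every 1000 iterations
        if max_score - max_prev < 0.0001
            break;
        end
        max_prev = max_score; % update only here, so checks compare 1000 iterations apart
    end
end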

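Although this is not a deterministic process, it does not take long to converge. With 10 datasets there are only 120 possible triples, so a few thousand random draws will very likely hit the global optimum, though this is not guaranteed.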
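Because the search space is so small here, a deterministic alternative to the random search is to score every subset once. The sketch below is a different technique (exhaustive enumeration with nchoosek), not the method above; it assumes the same score definition (sum of pairwise correlations), and the names combos, scores, best_score and best_set are introduced only for illustration.

```matlab
A = rand(30, 10);                 % simulated data, as in the question
R = corr(A);                      % pairwise correlation matrix
n = 3;                            % subset size (3 or 4)

combos = nchoosek(1:size(R, 1), n);   % every subset of n columns, one per row
scores = zeros(size(combos, 1), 1);
for c = 1:size(combos, 1)
    sub = R(combos(c, :), combos(c, :));   % n-by-n block for this subset
    scores(c) = sum(sum(triu(sub, 1)));    % sum over the strict upper triangle
end

[best_score, best] = max(scores);     % use min(scores) for the least correlated set
best_set = combos(best, :);
```

This is only practical while the number of subsets stays small: 120 triples or 210 quadruples for 10 datasets, but the count grows quickly with more columns, at which point the random search becomes the more attractive option.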

Other tips

As a really simple starting point you could take the sum down the columns of R to find the column that is the most correlated with the others. Then, from that column choose two columns that are the most strongly correlated with it. Something like this:

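[~, ii] = max(sum(R));                 % column with the largest total correlation
[~, jj] = sort(R(:,ii),'descend');     % rank all columns by their correlation with it
three_cols = jj(1:3);                  % jj(1) is normally ii itself, since R(ii,ii) = 1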

Alternatively you could locate the highest correlation value in the matrix, and then search along its column and row to find the next highest value, etc.
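The second idea could be sketched roughly as follows. This is one possible reading of it, not a definitive implementation: the diagonal is masked out (an assumption, since every column correlates perfectly with itself), and the names Roff, cand, i1, i2 and i3 are hypothetical.

```matlab
A = rand(30, 10);                  % simulated data, as in the question
R = corr(A);                       % pairwise correlation matrix

n = size(R, 1);
Roff = R;
Roff(logical(eye(n))) = -Inf;      % mask the diagonal so it is never selected

% Step 1: locate the highest off-diagonal correlation.
[~, k] = max(Roff(:));
[i1, i2] = ind2sub(size(Roff), k);

% Step 2: search along that row and column for the next highest value.
% R is symmetric, so row i1 holds the same values as column i1.
cand = max(Roff(:, i1), Roff(:, i2));
cand([i1 i2]) = -Inf;              % exclude the pair that was already chosen
[~, i3] = max(cand);

three_cols = [i1 i2 i3];
```

Like the column-sum approach, this is a greedy heuristic: it is fast, but it can miss the triple with the best overall score.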

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow