Pregunta

Hei, I have a cell array, the second column is the times of 'XX->XX', for example:

'AA->AA'    [21]    [4.2084]
'AA->AC'    [15]    [3.0060]
'AA->AG'    [ 9]    [1.8036]
'AA->AT'    [12]    [2.4048]
'AC->CA'    [14]    [2.8056]
'AC->CC'    [16]    [3.2064]
'AC->CG'    [ 5]    [1.0020]
'AC->CT'    [ 3]    [0.6012]
'AG->GA'    [11]    [2.2044]
'AG->GC'    [ 5]    [1.0020]
'AG->GG'    [ 8]    [1.6032]
'AG->GT'    [13]    [2.6052]
'AT->TA'    [10]    [2.0040]
'AT->TC'    [ 8]    [1.6032]
'AT->TG'    [ 2]    [0.4008]
'AT->TT'    [11]    [2.2044]
'CA->AA'    [17]    [3.4068]
'CA->AC'    [ 7]    [1.4028]
'CA->AG'    [ 9]    [1.8036]
'CA->AT'    [11]    [2.2044]
'CC->CA'    [15]    [3.0060]
'CC->CC'    [ 5]    [1.0020]
'CC->CG'    [ 4]    [0.8016]
'CC->CT'    [17]    [3.4068]
'CG->GA'    [ 1]    [0.2004]
'CG->GC'    [ 2]    [0.4008]
'CG->GG'    [ 9]    [1.8036]
'CG->GT'    [ 3]    [0.6012]
'CT->TA'    [ 7]    [1.4028]
'CT->TC'    [ 9]    [1.8036]
'CT->TG'    [ 9]    [1.8036]
'CT->TT'    [ 2]    [0.4008]
'GA->AA'    [10]    [2.0040]
'GA->AC'    [ 4]    [0.8016]
'GA->AG'    [10]    [2.0040]
'GA->AT'    [ 2]    [0.4008]
'GC->CA'    [ 2]    [0.4008]
'GC->CC'    [ 7]    [1.4028]
'GC->CG'    [ 6]    [1.2024]
'GC->CT'    [ 3]    [0.6012]
'GG->GA'    [ 6]    [1.2024]
'GG->GC'    [ 6]    [1.2024]
'GG->GG'    [ 4]    [0.8016]
'GG->GT'    [ 8]    [1.6032]
'GT->TA'    [ 6]    [1.2024]
'GT->TC'    [11]    [2.2044]
'GT->TG'    [ 8]    [1.6032]
'GT->TT'    [ 5]    [1.0020]
'TA->AA'    [ 8]    [1.6032]
'TA->AC'    [13]    [2.6052]
'TA->AG'    [ 9]    [1.8036]
'TA->AT'    [ 6]    [1.2024]
'TC->CA'    [13]    [2.6052]
'TC->CC'    [13]    [2.6052]
'TC->CT'    [ 4]    [0.8016]
'TG->GA'    [ 8]    [1.6032]
'TG->GC'    [ 5]    [1.0020]
'TG->GG'    [ 3]    [0.6012]
'TG->GT'    [ 6]    [1.2024]
'TT->TA'    [13]    [2.6052]
'TT->TC'    [ 2]    [0.4008]
'TT->TG'    [ 3]    [0.6012]
'TT->TT'    [ 5]    [1.0020]

Now, I am trying to calculate the probability: P('AA->AA')=TIMES('AA->AA')/SUM('AA->AA','AA->AC','AA->AG','AA->AT'), in other words, P('AA->AA')=TIMES('AA->AA')/SUM('AA->Anyone'). The same with others. I would like to use a loop to do that, but there is a extremal case in

'TC->CA'    [13]    [2.6052]
'TC->CC'    [13]    [2.6052]
'TC->CT'    [ 4]    [0.8016]

well, it is obviously that the times of 'TC->CG' is 0, which required to be considered also even we already know the probability should be 0. Of course, this extremal case can be happened at any other one like, may be it lack 'TT->TT' sometimes, or 'TC->CT' sometimes. Anyone get an idea how to do that? Thanks.

¿Fue útil?

Solución

Try this -

%%// Get the cell data into data1
data1 = INPUT_DATA;

%%// Get the data from columns separately
col1 = data1(:,1);
tag_data = vertcat(col1{:});

col2 = data1(:,2);
times_data = vertcat(col2{:});

col3 = data1(:,3);
col3_data = vertcat(col3{:});

%%// Get full data for tag, times and column3
char_array = ['A' 'C' 'G' 'T'];
full_tag_data = char_array(combinator(4,3,'p','r'));
full_tag_data = [full_tag_data(:,1:2) repmat('->',[size(full_tag_data,1) 1]) full_tag_data(:,2:3)];

present_rows = ismember(full_tag_data,tag_data,'rows');
full_times_data = double(present_rows);
full_times_data(present_rows) = times_data;

full_col3_data = double(present_rows);
full_col3_data(present_rows) = col3_data;

%%// Get the sum values
full_col3_data_summed = sum(reshape(full_col3_data,4,[]),1);
full_col3_data_summed = reshape(repmat(full_col3_data_summed,[4 1]),[],1);

%%// Store the required values into a cell array out_cell1
out_cell1 = cell(size(present_rows,1),2);
out_cell1(:,1) = cellstr(full_tag_data);
out_cell1(:,2) = num2cell(full_times_data);
out_cell1(:,3) = num2cell(full_col3_data);

%%// The probabilities are added into the cell array as the fourth column
out_cell1(:,4) = num2cell(full_times_data./full_col3_data_summed);

Note: The above code uses a function combinator, which is available here.

Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top