Question

I have to write a function that replaces the characters of a string according to the following mapping:

    A=U
    T=A
    G=C
    C=G

Example:

     Input: 'ATAGTACCGGTTA'

Therefore, the output should be:

    'UAUCAUGGCCAAU'

I can replace only one character; I have no idea how to do several. I could replace several if the condition "G=C and C=G" were not there. I use:

    in='ATAGTACCGGTTA';
    check=in=='A';
    in(check)='U'
    % in is now 'UTUGTUCCGGTTU'

If I keep doing this, at some point every G will be replaced by C, and then all the Cs (including the ones that used to be Gs) will be replaced by G. How can I stop this? Any help will be appreciated.


Solution

The simplest way would be to use an intermediary letter. For instance:

    in='ATAGTACCGGTTA';
    in(in == 'A')='U';
    in(in == 'T')='A';
    in(in == 'C')='X';   % park the Cs under a temporary letter
    in(in == 'G')='C';
    in(in == 'X')='G';   % turn the parked Cs into Gs

This way you keep the 'C' and 'G' characters separate.

EDIT:

As others have mentioned, there are a few other things you could do to improve this approach (though personally I think Notlikethat's way is cleanest). For instance, if you use a second variable, you don't have to worry about keeping 'C' and 'G' separate:

    in='ATAGTACCGGTTA';
    out=in;
    out(in == 'A')='U';
    out(in == 'T')='A';
    out(in == 'C')='G';
    out(in == 'G')='C';
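Since the question asks for a function, the second-variable approach could be wrapped up roughly like this. It is only a sketch: the name `complementRNA` is hypothetical (it does not appear in the original post), the input is assumed to be a char array, and any character other than A, T, C or G is simply copied through.

```matlab
function out = complementRNA(in)
% complementRNA  Hypothetical wrapper; the name is an assumption, not from the post.
% Maps A->U, T->A, C->G, G->C. Any other character is copied unchanged.
out = in;                 % work on a copy so every mask tests the original input
out(in == 'A') = 'U';
out(in == 'T') = 'A';
out(in == 'C') = 'G';
out(in == 'G') = 'C';
end
```

Saved as `complementRNA.m`, calling `complementRNA('ATAGTACCGGTTA')` should return `'UAUCAUGGCCAAU'`, the output requested in the question.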

Alternatively, you could make your indices first, then index after:

    in='ATAGTACCGGTTA';
    inA=in=='A';
    inT=in=='T';
    inC=in=='C';
    inG=in=='G';
    in(inA)='U';
    in(inT)='A';
    in(inC)='G';
    in(inG)='C';

Finally, my personal favourite for sheer idiocy:

    out=char(in+floor((68-in).*(in<70)*7/4)*4-round(ceil((in-67)/4)*3.7));

(Seriously, that last one works)
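For anyone who wants to convince themselves, a quick check against the second-variable approach might look like the sketch below. It only exercises the four letters A, T, C and G; the formula is not meant for anything else, and the variable name `ref` is introduced here purely for illustration.

```matlab
in = 'ATAGTACCGGTTA';

% Reference result from the plain second-variable approach
ref = in;
ref(in == 'A') = 'U';
ref(in == 'T') = 'A';
ref(in == 'C') = 'G';
ref(in == 'G') = 'C';

% The arithmetic one-liner, copied unchanged
out = char(in+floor((68-in).*(in<70)*7/4)*4-round(ceil((in-67)/4)*3.7));

% Errors if the two results differ
assert(isequal(out, ref));
```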

Other tips

Just for fun, here's probably the absolute simplest way, via indexing:

    key = 'UGCA';
    [~, ~, idx] = unique(in);
    out = key(idx');   % transpose idx since unique() returns a column vector

I do love indexing :D

Edit: As rightly pointed out, this is very optimised for the question as stated. Since [a, ~, idx] = unique(in); returns a and idx such that a(idx) == in, and by default a is sorted, we can just assume that a == 'ACGT' and pre-construct key to be the appropriate translation of indices into a.

If some characters from the known alphabet never appear in the input string, or if other unknown characters appear, then the indices don't match and the assumption breaks. In that case, we have to calculate the appropriate key explicitly - filling in the step that was optimised out above:

    alph = 'ACGT';
    trans = 'UGCA';
    [key, ~, idx] = unique(in);
    [~, alphidx, keyidx] = intersect(alph, key);  % find which elements of alph
                                                  % appear at which points in key
    key(keyidx) = trans(alphidx);   % translate the elements of key that we can
    out = key(idx');
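To make the failure mode concrete, here is a small sketch using an input with no C or G, plus one with an unknown letter. The inputs and the names `shortcutKey`, `naive` and `out2` are made up for illustration; everything else follows the two snippets above.

```matlab
in = 'ATTA';                      % contains no C or G

% Shortcut version: unique(in) is 'AT', so idx only takes the values 1 and 2
[~, ~, idx] = unique(in);
shortcutKey = 'UGCA';
naive = shortcutKey(idx');        % picks from 'U' and 'G' only, so T is mistranslated

% General version: build the key from the characters actually present
alph  = 'ACGT';
trans = 'UGCA';
[key, ~, idx] = unique(in);
[~, alphidx, keyidx] = intersect(alph, key);
key(keyidx) = trans(alphidx);
out = key(idx');                  % T is now translated to A

% Same general steps with an unknown character: N is left untouched
in2 = 'ATNA';
[key2, ~, idx2] = unique(in2);
[~, alphidx2, keyidx2] = intersect(alph, key2);
key2(keyidx2) = trans(alphidx2);
out2 = key2(idx2');
```

With these inputs `naive` comes out as `'UGGU'` while `out` is `'UAAU'`, and `out2` is `'UANU'`.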

You can perform multiple character translation with bsxfun.

Inputs:

    in = 'ATAGTACCGGTTA';
    pat = ['A','T','G','C'];
    subst = ['U','A','C','G'];
    out0  ='UAUCAUGGCCAAU';

Translate all characters simultaneously:

    >> ii = (1:numel(pat))*bsxfun(@eq,in,pat.'); %' instead of repmat and .*
    >> out = subst(ii)
    out =
    UAUCAUGGCCAAU
    >> isequal(out,out0)
    ans =
         1

Say you only want to translate a subset of the characters and leave the rest of the sequence intact. That is easily solved with logical indexing and a few extra lines:

    % Leave the Gs and Cs in place
    pat = ['A','T'];
    subst = ['U','A'];

    ii = (1:numel(pat))*bsxfun(@eq,in,pat.'); %' same
    out = char(zeros(1,numel(in)));
    nz = ii>0;
    out(nz) = subst(ii(nz));
    out(~nz) = in(~nz)

    out =

    UAUGAUCCGGAAU

The original Gs and Cs are unchanged; A became U, and T became A (T is gone).
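On MATLAB R2016b or newer, implicit expansion should let the same comparison be written without bsxfun. The following is a sketch under that version assumption; the variable name `hits` is introduced here for illustration, and the explicit `double` conversion is just a cautious choice before the matrix product.

```matlab
in    = 'ATAGTACCGGTTA';
pat   = 'ATGC';
subst = 'UACG';

% pat.' is 4x1 and in is 1x13, so == expands to a 4x13 logical matrix
% (this relies on implicit expansion, i.e. R2016b or newer)
hits = (in == pat.');

% Each column has at most one true entry; the row-vector product turns
% that into the index of the matching pattern (0 where nothing matched)
ii = (1:numel(pat)) * double(hits);

out = in;                 % unmatched characters are kept as they are
nz  = ii > 0;
out(nz) = subst(ii(nz));
```

Starting from a copy of `in` also covers the partial-translation case, since positions with `ii == 0` are never overwritten.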

I would suggest using containers.Map:

    m=containers.Map({'A','T','G','C'},{'U','A','C','G'});
    mapfkt=@(input)(cell2mat(m.values(num2cell(input))));

Usage:

    mapfkt('ATAGTACCGGTTA')

Here is another method that should be fairly efficient, general, and in the spirit of your original attempt:

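    % Suppose this is your input
    myString = 'abcdeabcde';
    fromString = 'ace';
    toString = 'xyz';

    % Then it just takes this:
    % idx marks the characters found in fromString,
    % fromLocation gives their position in fromString (0 if not found)
    [idx, fromLocation] = ismember(myString,fromString)
    myString(idx)=toString(fromLocation(idx))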

If you know that all letters need to be replaced, the last line can be slightly simplified, as you won't need to use idx.
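Applied to the sequence from the question, where every letter has a replacement, the simplified form might look like this sketch. The variable names `pat`, `subst` and `loc` are chosen here for illustration, and the `assert` is an added guard rather than part of the original answer.

```matlab
in    = 'ATAGTACCGGTTA';
pat   = 'ATGC';
subst = 'UACG';

% loc(k) is the position of in(k) within pat, or 0 if it is not in pat
[~, loc] = ismember(in, pat);

% Direct indexing is only valid when nothing is 0, so guard against it
assert(all(loc > 0), 'Input contains characters outside the alphabet');

out = subst(loc);
```

For this input `out` should be `'UAUCAUGGCCAAU'`. If the guard fails, fall back to the version with `idx`.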

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow