Question

I am writing a program that reads in a string of DNA characters (whose length is always divisible by 3) and checks whether two such strings correspond to the same amino acids. For example, AAT and AAC both correspond to N, so my program should print "It's the same". It does this fine, but I don't know how to compare strings of 6, 9, 12 or any other multiple of 3 characters and check whether their translations are the same. For example:

AAAAACAAG
AAAAACAAA 

should print "It's the same", because both translate to KNK.

This is my code:

sequence = {}
d = 0
for line in open('codon_amino.txt'):
  pattern, character = line.split()
  sequence[pattern] = character
a = input('Enter original DNA: ')
b = input('Enter patient DNA: ')
for i in range(len(a)):
  if sequence[a] == sequence[b]:
    d = d + 0
  else:
    d = d + 1
if d == 0:
  print('It\'s the same')
else:
  print('Mutated!')

My codon_amino.txt file is structured like this:

AAA K
AAC N
AAG K
AAT N
ACA T
ACC T
ACG T
ACT T

How do I compare the DNA sequences in groups of three? I have it working for strings that are 3 letters long, but it raises an error for anything longer.

EDIT:

If I knew how to split a and b into lists of three-character chunks, that might help, something like:

a2 = a.split(SPLITINTOINTERVALSOFTHREE)

then I could easily iterate through them with a for loop, but how do I split them in the first place?

EDIT: THE SOLUTION:

sequence = {}
# Build the codon -> amino acid lookup table; 'with' closes the file afterwards
with open('codon_amino.txt') as f:
  for line in f:
    pattern, character = line.split()
    sequence[pattern] = character
a = input('Enter original DNA: ')
b = input('Enter patient DNA: ')
# Compare the amino acid of each three-letter codon; no outer loop or counter is needed
if all(sequence[a[i:i+3]] == sequence[b[i:i+3]] for i in range(0, len(a), 3)):
  print('The patient\'s amino acid sequence is not mutated.')
else:
  print('The patient\'s amino acid sequence is mutated.')

Solution

I think you can replace your second loop and comparisons with:

if all(sequence[a[i:i+3]] == sequence[b[i:i+3]] for i in range(0, len(a), 3)):
    print('It\'s the same')
else:
    print('Mutated!')

The built-in all() function iterates over the generator expression and returns False as soon as any comparison is False. The generator expression looks up the amino acid for each pair of corresponding three-character slices of the two strings and compares them.
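To see all() and the generator working without the input() and file-reading code, here is a small self-contained sketch. It assumes a hard-coded dict holding only the eight codons listed in the question instead of reading codon_amino.txt; same_amino_acids is a hypothetical helper name, and the length check is an addition that the answer above does not have.

```python
def same_amino_acids(a, b, table):
    """Return True if a and b encode the same amino acid sequence.

    Assumes both lengths are multiples of 3 and every codon is in `table`.
    Strings of different lengths are treated as different.
    """
    if len(a) != len(b):
        return False
    # all() stops at the first False, so later codons are never looked up
    return all(table[a[i:i+3]] == table[b[i:i+3]]
               for i in range(0, len(a), 3))


# Hypothetical excerpt of codon_amino.txt, written as a dict
CODONS = {'AAA': 'K', 'AAC': 'N', 'AAG': 'K', 'AAT': 'N',
          'ACA': 'T', 'ACC': 'T', 'ACG': 'T', 'ACT': 'T'}

print(same_amino_acids('AAAAACAAG', 'AAAAACAAA', CODONS))  # True
print(same_amino_acids('AAAAACAAG', 'AAAAACACA', CODONS))  # False
```

The second call differs in the last codon (ACA encodes T, not K), so all() returns False.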

OTHER TIPS

I think what you should do is:

  • write a function to split a string into chunks of 3 characters. (Some hints here)
  • write a function to convert a string into its corresponding amino acid sequence (using the previous function)
  • compare the sequences.
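The three steps above might be sketched like this. The helper names load_table, split_codons and translate are hypothetical, and a list of strings stands in for the lines of codon_amino.txt (only the eight codons from the question); in a real program you would pass an open file object to load_table instead.

```python
def load_table(lines):
    """Build a codon -> amino acid dict from 'CODON AMINO' lines."""
    table = {}
    for line in lines:
        if line.strip():  # skip blank lines, e.g. at the end of the file
            codon, amino = line.split()
            table[codon] = amino
    return table


def split_codons(dna):
    """Split a DNA string into consecutive 3-character codons."""
    return [dna[i:i+3] for i in range(0, len(dna), 3)]


def translate(dna, table):
    """Convert a DNA string into its amino acid sequence, e.g. 'KNK'."""
    return ''.join(table[codon] for codon in split_codons(dna))


# Stand-in for the lines of codon_amino.txt (only the codons from the question)
SAMPLE_LINES = ['AAA K', 'AAC N', 'AAG K', 'AAT N',
                'ACA T', 'ACC T', 'ACG T', 'ACT T', '']
CODONS = load_table(SAMPLE_LINES)

original = translate('AAAAACAAG', CODONS)
patient = translate('AAAAACAAA', CODONS)
print(original, patient)  # KNK KNK
print("It's the same" if original == patient else 'Mutated!')
```

Comparing the translated strings, rather than the codons themselves, is what makes AAAAACAAG and AAAAACAAA count as the same.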

If this is what you mean:

def DNA(string):
    # Split the string into consecutive 3-character chunks (codons)
    return [string[i:i+3] for i in range(0, len(string), 3)]

amino_1 = DNA("AAAAACAAG")
amino_2 = DNA("AAAAACAAA")

print(amino_1, amino_2)
print(amino_1 == amino_2)

Output:

['AAA', 'AAC', 'AAG'] ['AAA', 'AAC', 'AAA']
False
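Slicing with a step of 3 keeps a shorter final chunk when the length is not a multiple of 3. Another standard-library way to chunk a string, without index arithmetic, is to zip a single iterator with itself. This sketch (group_codons is a hypothetical name) shows it; note that it silently drops an incomplete tail instead.

```python
def group_codons(dna):
    """Group a DNA string into 3-character codons using one shared iterator.

    zip() pulls three characters at a time from the same iterator, left to
    right; a trailing chunk shorter than 3 characters is silently dropped.
    """
    it = iter(dna)
    return [''.join(group) for group in zip(it, it, it)]


print(group_codons('AAAAACAAG'))  # ['AAA', 'AAC', 'AAG']
print(group_codons('AAAAACAA'))   # ['AAA', 'AAC']
```

Whichever way you chunk, comparing codon lists is stricter than the question asks: AAG and AAA are different codons but both encode K, which is why the comparison above prints False. Translating each codon through the dictionary before comparing gives the amino acid comparison the question wants.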
Licensed under: CC-BY-SA with attribution