Question

I have pairs of coding DNA sequences on which I wish to perform pairwise codon alignments in Python. I have "half completed" the process.

So far..

  • I retrieve pairs of orthologous DNA sequences from GenBank using the Biopython package.
  • I translate the orthologous pairs into peptide sequences and then align them using the EMBOSS Needle program.

I wish to..

  • Transfer the gaps from the peptide sequences into the original DNA sequences.

Question

I would appreciate suggestions for programs/code (callable from Python) that can transfer gaps from aligned peptide sequence pairs onto the codons of the corresponding nucleotide sequence pairs, or that can carry out the pairwise codon alignment from scratch.


Solution

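All you need to do is split the nucleotide sequence into triplets. Each amino acid corresponds to one triplet and each gap becomes three gap characters. Because gaps consume no codons, keep a separate count of the codons used so far. In Python:

codon = 0  # index of the next unused codon
for aa in aminoacid:
    if aa != "-":
        print(nucleotide[3 * codon:3 * codon + 3])
        codon += 1
    else:
        print("---")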
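For a pairwise alignment the same idea is applied to each sequence in turn. Below is a minimal sketch; the function name thread_codons and the example sequences are made up for illustration, and it assumes each DNA sequence contains exactly one codon per non-gap residue.

```python
def thread_codons(aligned_peptide, dna, gap="-"):
    """Return dna with '---' inserted wherever aligned_peptide has a gap."""
    out = []
    pos = 0  # index of the next unused codon in dna
    for aa in aligned_peptide:
        if aa == gap:
            out.append(gap * 3)
        else:
            out.append(dna[3 * pos:3 * pos + 3])
            pos += 1  # only real residues consume a codon
    return "".join(out)


# Hypothetical pairwise peptide alignment (e.g. as produced by Needle)
pep_a, pep_b = "MK-L", "M-RL"
# Unaligned coding sequences for the two peptides (illustrative only)
dna_a, dna_b = "ATGAAACTG", "ATGCGTCTG"

aln_a = thread_codons(pep_a, dna_a)
aln_b = thread_codons(pep_b, dna_b)
print(aln_a)  # ATGAAA---CTG
print(aln_b)  # ATG---CGTCTG
```

Both outputs have the same length, so the codon columns line up just like the peptide columns do.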

OTHER TIPS

You can make a mapping from amino acids to codons, with an extra entry for your gap character:

codons = str.maketrans({'M' : 'ATG',
                        'R' : 'CGT',
                        ...,
                        '-' : '---'}) # Your missing character

peptide = 'M-R'
result = peptide.translate(codons)

Then apply translate to the full aligned peptide sequence in the same way.
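One caveat worth checking before relying on this: the genetic code is degenerate, so a fixed table picks one codon per amino acid and will not necessarily reproduce the codons in your original DNA. A small sketch of the problem, using a table that lists only the residues needed for the example (the sequences are made up):

```python
# One fixed codon per residue; only the residues used below are listed.
codons = str.maketrans({"M": "ATG", "R": "CGT", "-": "---"})

original_dna = "ATGCGA"   # CGA also encodes arginine (R)
aligned_peptide = "M-R"

back_translated = aligned_peptide.translate(codons)
print(back_translated)                                    # ATG---CGT
print(back_translated.replace("-", "") == original_dna)   # False
```

If you need the original codons preserved (for example for dN/dS analysis), threading the real DNA through the gapped peptide is probably the safer route.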

I understand you asked this question three years ago, but this post was the first result in my Google search for 'codon alignment python'. I therefore wanted to respond for anyone who stumbles upon it while still looking for a library to do this.

You can use the library PyCogent for this.

They explain it well on their website: http://pycogent.org/examples/align_codons_to_protein.html

In the end I wrote my own Python function and thought I might as well share it.

It takes an aligned peptide sequence with gaps and the corresponding un-aligned nucleotide sequence and gives an aligned nucleotide sequence:

Function

def gapsFromPeptide(peptide_seq, nucleotide_seq):
    """ Transfers gaps from aligned peptide seq into codon partitioned nucleotide seq (codon alignment)
          - peptide_seq is an aligned peptide sequence with gaps that need to be transferred to nucleotide seq
          - nucleotide_seq is an un-aligned DNA sequence whose codons translate to peptide seq"""
    def chunks(l, n):
        """ Yield successive n-sized chunks from l."""
        for i in range(0, len(l), n):
            yield l[i:i+n]
    codons = list(chunks(nucleotide_seq, 3))  # split nucleotides into codons (triplets)
    gappedCodons = []
    codonCount = 0  # index of the next unused codon
    for aa in peptide_seq:  # add a '---' gap to the nucleotide seq for each peptide gap
        if aa != '-':
            gappedCodons.append(codons[codonCount])
            codonCount += 1
        else:
            gappedCodons.append('---')
    return ''.join(gappedCodons)

Usage

>>> unaligned_dna_seq = 'ATGATGATG'
>>> aligned_peptide_seq = 'M-MM'
>>> aligned_dna_seq = gapsFromPeptide(aligned_peptide_seq, unaligned_dna_seq)
>>> print(aligned_dna_seq)
ATG---ATGATG
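Note that the function above does not check its inputs: surplus codons (such as a trailing stop codon that the translated peptide lacks) are silently dropped, and too few codons give an IndexError. A possible sanity check is sketched below; the name check_codon_alignment is hypothetical and the rules it enforces are my assumptions about what a consistent result looks like.

```python
def check_codon_alignment(aligned_peptide, unaligned_dna, aligned_dna, gap="-"):
    """Raise ValueError if aligned_dna is not a consistent codon alignment."""
    n_residues = sum(1 for aa in aligned_peptide if aa != gap)
    # One codon is expected per non-gap residue.
    if len(unaligned_dna) != 3 * n_residues:
        raise ValueError(
            "expected %d nucleotides for %d residues, got %d"
            % (3 * n_residues, n_residues, len(unaligned_dna))
        )
    # Every peptide column (residue or gap) should span three DNA columns.
    if len(aligned_dna) != 3 * len(aligned_peptide):
        raise ValueError("aligned DNA is not three times the peptide length")
    # Stripping the gaps should give back the input DNA unchanged.
    if aligned_dna.replace(gap, "") != unaligned_dna:
        raise ValueError("removing gaps does not recover the input DNA")
    return True


print(check_codon_alignment("M-MM", "ATGATGATG", "ATG---ATGATG"))  # True
```

With a stop codon left on the DNA (e.g. 'ATGATGATGTAA') the first check fails, which is a hint to trim the stop codon before threading.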
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow