Question

I have a 15-mer nucleotide motif that contains degenerate (IUPAC ambiguity) nucleotide codes. Example: ATNTTRTCNGGHGCN.

I would like to search a set of sequences for occurrences of this motif. However, my other sequences are exact sequences, i.e. they contain no ambiguity codes.

I have tried writing a for loop over the sequences to search for the motif, but I have not been able to get it to do non-exact searches. My code is modeled after the code in the Biopython Tutorial and Cookbook:

# m is a Bio.motifs motif and test_seq the sequence being scanned (both defined elsewhere)
for pos, seq in m.instances.search(test_seq):
    print(pos, seq)

I would like to search for all possible exact instances of the non-exact 15-mer. Is there a function available, or would I have to resort to defining my own function for that? (I'm okay doing the latter; I just wanted to triple-check with the world that I'm not duplicating someone else's efforts before I go ahead. I have already browsed through what I thought were the relevant parts of the docs.)
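If you do end up writing your own function, one brute-force sketch (not a Biopython API; the names IUPAC_DNA, expand_motif and find_exact_instances are hypothetical) is to expand the motif into every exact 15-mer it stands for, then check each window of the target sequence against that set. The IUPAC table is written out by hand here as an assumption; Biopython ships a similar table as Bio.Data.IUPACData.ambiguous_dna_values.

```python
from itertools import product

# Hand-written IUPAC nucleotide codes (assumed sufficient for this motif;
# Biopython has a similar table in Bio.Data.IUPACData.ambiguous_dna_values).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_motif(motif):
    """Return every exact DNA sequence that a degenerate motif stands for."""
    # One string of allowed bases per position, e.g. "R" -> "AG".
    choices = [IUPAC_DNA[base] for base in motif.upper()]
    # The Cartesian product of the per-position choices is every exact instance.
    return ["".join(bases) for bases in product(*choices)]


def find_exact_instances(seq, motif):
    """Return (position, instance) for every window of seq that is an exact instance."""
    instances = set(expand_motif(motif))
    k = len(motif)
    seq = seq.upper()
    # Slide a k-long window along seq and keep the windows found in the instance set.
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)
            if seq[i:i + k] in instances]


motif = "ATNTTRTCNGGHGCN"
print(len(expand_motif(motif)))  # 384 (= 4 * 2 * 4 * 3 * 4)
print(find_exact_instances("CCCCCCCATCTTGTCAGGCGCTCCCCCC", motif))
# [(7, 'ATCTTGTCAGGCGCT')]
```

The number of exact instances is the product of the per-position option counts, so this is fine for a motif like this one (384 variants) but grows as 4^k for runs of N; a regular-expression search avoids building that list.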


Solution

Use Biopython's nt_search function (in Bio.SeqUtils). It searches a DNA sequence for a subsequence, expanding each ambiguity code in the subsequence into the nucleotides allowed at that position. Example:

>>> from Bio import SeqUtils
>>> pat = "ATNTTRTCNGGHGCN"
>>> SeqUtils.nt_search("CCCCCCCATCTTGTCAGGCGCTCCCCCC", pat)
['AT[GATC]TT[AG]TC[GATC]GG[ACT]GC[GATC]', 7]

It returns a list whose first item is the regular expression built from the search pattern, followed by the 0-based start positions of all matches (here, a single match at position 7).
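A minimal usage sketch that unpacks that list, pulls out the exact sequence matched at each position, and also scans the reverse complement. The helper name search_both_strands is hypothetical, and searching both strands is an assumption about what you need: nt_search itself only scans the string it is given.

```python
from Bio.Seq import reverse_complement
from Bio.SeqUtils import nt_search


def search_both_strands(seq, motif):
    """Return (strand, position, matched_sequence) for hits on seq and its reverse complement.

    Positions on the "-" strand count from the start of the reverse-complemented sequence.
    """
    # nt_search builds an upper-case pattern, so normalise the target sequence.
    seq = seq.upper()
    hits = []
    for strand, target in (("+", seq), ("-", reverse_complement(seq))):
        # First element is the regex nt_search built; the rest are 0-based start positions.
        _pattern, *positions = nt_search(target, motif)
        for pos in positions:
            hits.append((strand, pos, target[pos:pos + len(motif)]))
    return hits


print(search_both_strands("CCCCCCCATCTTGTCAGGCGCTCCCCCC", "ATNTTRTCNGGHGCN"))
# [('+', 7, 'ATCTTGTCAGGCGCT')]
```

To map a "-" strand hit back to forward-strand coordinates, use len(seq) - pos - len(motif).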

Licensed under: CC-BY-SA with attribution