Question

I have a basic knowledge of Python (I have completed one class) and I'm unsure how to tackle this next script. I have two files. One is a Newick tree that looks like this, but much larger:

(((1:0.01671793,2:0.01627631):0.00455274,(3:0.02781576,4:0.05606947):0.02619237):0.08529440,5:0.16755623);

The second file is a tab-delimited text file that looks like this, but is much larger:

1 \t Human
2 \t Chimp
3 \t Mouse
4 \t Rat
5 \t Fish

I want to replace the sequence ID numbers (only those followed by colons) in the newick file with the species names in the text file to create

(((Human:0.01671793,Chimp:0.01627631):0.00455274,(Mouse:0.02781576,Rat:0.05606947):0.02619237):0.08529440,Fish:0.16755623);

My pseudocode (after opening both files) would look something like this:

for line in txtfile:
    if line[0] matches \(\d*\ in newick:
        replace that \d* with line[2]

Any suggestions would be greatly appreciated!


Solution

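This can be done by defining a callback function that is run on every match of a regexp such as (?<=[(,])\d+(?=:), which matches a run of digits that directly follows "(" or "," and is directly followed by a colon. A pattern like \(\d*: would only match IDs that directly follow an opening parenthesis, so it would miss 2, 4 and 5 in the example.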

Here is an (unrelated) example from https://docs.python.org/2/library/re.html#text-munging that illustrates how the callback function is used together with re.sub(), which performs the regexp substitution:

>>> import random, re
>>> def repl(m):
...   inner_word = list(m.group(2))
...   random.shuffle(inner_word)
...   return m.group(1) + "".join(inner_word) + m.group(3)
>>> text = "Professor Abdolmalek, please report your absences promptly."
>>> re.sub(r"(\w)(\w+)(\w)", repl, text)
'Poefsrosr Aealmlobdk, pslaee reorpt your abnseces plmrptoy.'
>>> re.sub(r"(\w)(\w+)(\w)", repl, text)
'Pofsroser Aodlambelk, plasee reoprt yuor asnebces potlmrpy.'
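Applied to the tree in the question, the same idea might look like the sketch below. The names rename_leaves and id_to_name are made up for illustration, and the pattern assumes that every sequence ID is a run of digits directly preceded by "(" or "," and directly followed by ":".

```python
import re

def rename_leaves(newick, id_to_name):
    """Replace numeric leaf IDs in a Newick string with names from id_to_name."""
    def repl(m):
        # m.group(1) is the ID; leave it unchanged if the table has no entry for it.
        return id_to_name.get(m.group(1), m.group(1))
    # The lookbehind and lookahead keep the "(" or "," and the ":" out of the match.
    return re.sub(r'(?<=[(,])(\d+)(?=:)', repl, newick)

tree = "(((1:0.01671793,2:0.01627631):0.00455274,(3:0.02781576,4:0.05606947):0.02619237):0.08529440,5:0.16755623);"
names = {'1': 'Human', '2': 'Chimp', '3': 'Mouse', '4': 'Rat', '5': 'Fish'}
print(rename_leaves(tree, names))
```

Because re.sub() walks the string once and hands each whole ID to the callback, an ID such as 11 is looked up as "11" and is never confused with 1.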

Other tips

You can also do it using findall:

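import re

s = "(((1:0.01671793,2:0.01627631):0.00455274,(3:0.02781576,4:0.05606947):0.02619237):0.08529440,5:0.16755623)"

rep = {'1': 'Human',
       '2': 'Chimp',
       '3': 'Mouse',
       '4': 'Rat',
       '5': 'Fish'}

# Collect each distinct ID that follows "(" or "," and precedes ":".
for i in set(re.findall(r'(?<=[(,])\d+(?=:)', s)):
    # Anchoring on the surrounding characters keeps "1:" from being
    # replaced inside a longer ID such as "11:", which str.replace would do.
    s = re.sub(r'(?<=[(,])%s(?=:)' % i, rep[i], s)

>>> print(s)
(((Human:0.01671793,Chimp:0.01627631):0.00455274,(Mouse:0.02781576,Rat:0.05606947):0.02619237):0.08529440,Fish:0.16755623)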
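The lookup table above is hard-coded. A minimal sketch of building it from the tab-delimited file instead, assuming one "ID<TAB>name" pair per line, could look like this. The function name load_names and the file name mentioned in the comment are placeholders, not part of the original post.

```python
import csv
import io

def load_names(lines):
    """Build an {id: name} dict from an iterable of 'ID<TAB>name' lines."""
    names = {}
    for row in csv.reader(lines, delimiter='\t'):
        if len(row) >= 2:  # skip blank or malformed lines
            # strip() tolerates stray spaces around the tab
            names[row[0].strip()] = row[1].strip()
    return names

# In-memory stand-in for the real file; with a file on disk you would pass
# the object returned by open('names.txt') instead (placeholder file name).
sample = io.StringIO("1\tHuman\n2\tChimp\n3\tMouse\n4\tRat\n5\tFish\n")
names = load_names(sample)
print(names['3'])
```

The resulting dict can be used in place of rep. The IDs are kept as strings so that they compare equal to the text matched by the regexp.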
Licensed under: CC-BY-SA with attribution