سؤال

I have a list of list of lists like this

matches = [[['rootrank', 'Root'], ['domain', 'Bacteria'], ['phylum', 'Firmicutes'], ['class', 'Clostridia'], ['order', 'Clostridiales'], ['family', 'Lachnospiraceae'], ['genus', 'Lachnospira']], 
           [['rootrank', 'Root'], ['domain', 'Bacteria'], ['phylum', '"Proteobacteria"'], ['class', 'Gammaproteobacteria'], ['order', '"Vibrionales"'], ['family', 'Vibrionaceae'], ['genus', 'Catenococcus']], 
           [['rootrank', 'Root'], ['domain', 'Archaea'], ['phylum', '"Euryarchaeota"'], ['class', '"Methanomicrobia"'], ['order', 'Methanomicrobiales'], ['family', 'Methanomicrobiaceae'], ['genus', 'Methanoplanus']]]

And I want to construct a phylogenetic tree from them. I wrote a node class like so (based partially on this code):

class Node(object):
    """Generic n-ary tree node object
    Children are additive; no provision for deleting them."""

    def __init__(self, parent, category=None, name=None):
        self.parent = parent
        self.category = category
        self.name = name
        self.childList = []

        if  parent is None:
            self.birthOrder  =  0
        else:
            self.birthOrder  =  len(parent.childList)
            parent.childList.append(self)

    def fullPath(self):
        """Returns a list of children from root to self"""
        result  =  []
        parent  =  self.parent
        kid     =  self

        while parent:
            result.insert(0, kid)
            parent, kid  =  parent.parent, parent

        return result

    def ID(self):
        return '{0}|{1}'.format(self.category, self.name)

And then I try to construct my tree like this:

node = None
for match in matches:
    for branch in match:
        category, name = branch
        node = Node(node, category, name)
        print [n.ID() for n in node.fullPath()]

This works for the first match, but when I start with the second match it is appended at the end of the tree instead of starting again at the top. How would I do that? I tried some variations on searching for the ID, but I can't get it to work.

هل كانت مفيدة؟

المحلول 2

The issue is that node is always the bottommost node in the tree, and you are always appending to that node. You need to store the root node. Since ['rootrank', 'Root'] appears at the beginning of each of the lists, I'd recommend pulling that out and using it as the root. So you can do something like:

rootnode = Node(None, 'rootrank', 'Root')
for match in matches:
    node = rootnode
    for branch in match:
        category, name = branch
        node = Node(node, category, name)
        print [n.ID() for n in node.fullPath()]

This will make the matches list more readable, and gives the expected output.

نصائح أخرى

I would highly recommend using a phylogenetics library like Dendropy.

The 'standard way of writing phylogenetic trees is with the Newick format (parenthetical statements like ((A,B),C)). If you use Dendropy, reading that tree would be as simple as

>>> import dendropy
>>> tree1 = dendropy.Tree.get_from_string("((A,B),(C,D))", schema="newick")

or to read from a stream

>>> tree1 = dendropy.Tree(stream=open("mle.tre"), schema="newick")

The creator of the library maintains a nice tutorial too.

Do yourself a favor and don't reinvent the wheel. Python-graph (a.k.a. pygraph) does all that you ask here and most of the things that you'll ask next.

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى StackOverflow
scroll top