Question

For an application I'm working on I need to create lists from nested tuples, representing the data contained in each branch.

For reference the tuples represent a Huffman tree, an example is:

tree = (1.0, (0.5, (0.25, (0.125, 'd'),(0.125, 'c')), (0.25, 'b')), (0.5,'a'))

This was created from a Huffman routine with the following probabilities:

a:0.5, b:0.25, c:0.125, d:0.125

I would like to out put a list which looks like

[['a'],['b','c','d']]

I've tried the following code:

def makeList(tree):
    if len(tree) == 2:
        return [tree[0]]
    else:
        rightlist = []
        leftlist = []
        right = list(tree[1])
        left = list(tree[2])
        for i in range(1, len(right)):
            rightlist.append(right[i])
        for i in range(1, len(left)):
            leftlist.append(left[i])
        return [rightlist, leftlist]

However this returns

[['a'],[(0.25, (0.125, 'd'),(0.125,'c')),(0.25,'b')]

Which isn't quite what I want.

How could I go about modifying my code above to produce the output I want?

EDIT

I have produced some code which given a balanced input:

('a',0.25), ('b', 0.25), ('c', 0.25), ('d',0.25)

produces the output I want:

[['a','b'], ['c','d']]

def makeList(tree):
if len(tree) == 2:
    print("I'm in here")
    return [tree[1]]
else:
    right = tree[1]
    left = tree[2]
    rightlist = []
    leftlist = []

    for i in range(0, len(right)):
        if type(right[i]) == tuple:
            print('right: ' + str(right[i]))
            rightlist.append(right[i][1])

    for i in range(0, len(left)):
        if type(left[i]) == tuple:
            print('left: ' + str(left[i]))
            leftlist.append(left[i][1])

    return [rightlist, leftlist]

However, it fails on the following inputs (output below):

exampleData = [(0.5, 'a'), (0.5,'b')]

[[],[[]]

exampleData = [(0.5, 'a'), (0.25,'b'), (0.25,'c')]

[[],['b'.'c']]

exampleData = [(0.5,'a'), (0.25,'b'), (0.125,'c'), (0.125,'d')]

[[]],['b',(0.125, 'd')]]

However, the gold-standard test that this needs to pass is creating these lists for random trees:

probs = np.random.dirichlet([1]*4).tolist()
indices = range(0,4)
exampleData = zip(probs, indices)
huffTree = makeHuffmanTree(exampleData)
groups = makeLists(groups)
Était-ce utile?

La solution 2

Given you have the tree already, with up to two branches:

import Queue

def leaves(tree):
    result = []
    queue = Queue.Queue()
    queue.put(tree)
    while not queue.empty():
        node = queue.get()
        if type(node[1]) == tuple:
            for subnode in node[1:]:
                queue.put(subnode)
        else:
            result.append(node[1])
    return result

def makeList(tree):
    if len(tree) == 2:
        return [tree[1]]

    left = tree[1]
    right = tree[2]
    return [leaves(left), leaves(right)]

This takes the two branches and grabs the leaves of each one, discarding the first half of each leaf. It does it using a breadth-first search to avoid the recursion problem.

I couldn't convert the exampleData lists into trees in order to test them, but it works with the first problem.

Autres conseils

I have a recursive solution.

def makeListAndFlatten(tree):
    treeList = makeList(tree)
    branchA = treeList[0]
    branchB = treeList[1]
    flatA = flatten(branchA)
    flatB = flatten(branchB)
    return [flatA, flatB]

def makeList(tree):
    if len(tree) == 2:
        return tree[1]
    else:
        for i in range(1,len(tree)):
                return [tree[len(tree)-1][1], makeList(tree[i])]

def flatten(nestedList):
        def aux(listOrItem):
            if isinstance(listOrItem, list):
                for elem in listOrItem:
                    for item in aux(elem):
                        yield item
            else:
                yield listOrItem
        return list(aux(nestedList))

If we run:

makeListAndFlatten(tree)

This gives the result:

[['a'], ['b', 'c', 'd']]

A list containing two lists with the leaves from the lower branches on both sides.

EDIT:

This code was based on the format given in the original question:

tree = (1.0, (0.5, (0.25, (0.125, 'd'),(0.125, 'c')), (0.25, 'b')), (0.5,'a'))

if the input format is changed then this will not work.

seems like as a general algorithm, you would need a function which (1) calculates the total weight of the tree below and then (2) implement tree rotation to rotate the tree until you get it balanced. i.e. this is in some ways just a variation on a standard tree balancing algorithm, except that for an AVL tree for instance, you are balancing the depth, and here you are balancing the weight in the data itself.

Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top