문제

I decided to implement a small Huffman encoding script. After testing it on a small probablity list, I get the correct results when printing the tree during construction, and wrong results if I don't. What could be the cause of the problem?

Here is my code:

from __future__ import division
import heapq


class LeafNode:
    def __init__(self,symbol,prob):
        self.symbol = symbol
        self.prob = prob
    def __repr__(self):
        return "(%s: %s)" % (self.symbol, self.prob)
class InternalNode:
    def __init__(self,prob,left,right):
        self.prob = prob
        self.left = left
        self.right= right
    def __repr__(self):
        return "(internal : %s)" % (self.prob)

def getDict(seq):
    d = dict()
    for symbol in seq:
        if symbol in d:
            d[symbol] += 1
        else:
            d[symbol] = 1
    return d

def returnProbList(seq):
    data = getDict(seq)
    sum_of_all = sum(data.values())
    l = sorted(data.items(), key=lambda x:x[1])
    return [LeafNode(x[0], x[1]/sum_of_all) for x in l]

def createTree(probs):
    heapq.heapify(probs)
    while len(probs) > 1:
        a = heapq.heappop(probs)
        b = heapq.heappop(probs)
        print a,b #removing this shows wrong results.
        f = InternalNode(a.prob+b.prob,a,b)
        heapq.heappush(probs,f)
    return probs[0]

def printSymbols(tree, seq = ''):    
    if isinstance(tree, InternalNode):
        printSymbols(tree.left, seq+'0')
        printSymbols(tree.right, seq+'1')
    else:
        print tree.symbol, seq

s = "This is some short text I have written. It seems that space is the most common symbol."

#l = returnProbList(s)
l = []
l.append(LeafNode('a4',0.05))
l.append(LeafNode('a3',0.2))
l.append(LeafNode('a2',0.35))
l.append(LeafNode('a1',0.4))



#print l
tree = createTree(l)
printSymbols(tree)

Debuging it using pdb has given me even different results.

#Without print
a4 00
a3 01
a2 10
a1 11
#With print
a1 0
a4 100
a3 101
a2 11
#With pdb
a1 0
a2 10
a3 110
a4 111
도움이 되었습니까?

해결책

This really has nothing to do with printing. Your problem is here:

heapq.heappush(probs,f)

f is an instance of your InternalNode class, but the class doesn't define any ordering. Therefore Python defaults to ordering InternalNode instances by memory address. This isn't at all what you want, and memory addresses vary depending on what else you do (print, run PDB, create or delete other objects ...).

The easiest fix is to add a __cmp__ method to your class:

    def __cmp__(a, b):
        return cmp(a.prob, b.prob)

Then the output will be consistent.

EDIT: hmm, you're also getting memory-address ordering for LeafNode instances, so add a __cmp__ to that too.

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top