Months ago when I was learning about Huffman coding, I used this code to try some things out. To build the frequency list I used Counter with my function genFreq_original()
since I wanted a generic input (images as numpy arrays or text). However, when creating the Huffman tree of an image, there was a problem. I could bypass it by creating the frequency list with a different function, genFreq_fix()
. As far as I can tell, the frequency lists of an image with either of these two functions is the same. a
is the one created with genFreq_original and it fails, while b
is the other one and works with everything.
The error I got while creating the tree with makeHuffTree_original()
is the following:
Traceback (most recent call last):
File "C:\Users\ip507\Desktop\huffman.py", line 67, in <module>
makeHuffTree_original(a)
File "C:\Users\ip507\Desktop\huffman.py", line 29, in makeHuffTree_original
childL, childR = heappop(tree), heappop(tree)
ValueError: setting an array element with a sequence.
I printed my variables in the function for each iteration to spot any inconsistency, but nothing. From my understanding of how a Huffman tree is created, I modified the function (makeHuffTree_test()
) to do without heaps; just straight sorting the list, removing the first two elements and putting them back into the list as a combined parent node. But that time I got a different error.
Traceback (most recent call last):
File "C:\Users\ip507\Desktop\huffman.py", line 68, in <module>
makeHuffTree_test(a)
File "C:\Users\ip507\Desktop\huffman.py", line 37, in makeHuffTree_test
tree = sorted(tree)
ValueError: setting an array element with a sequence.
Yet, both of these two functions give the same result when b
is the input frequency list. In my attempt to trace the error, I created the function sideBySize()
which is the same as makeHuffTree_original()
, but the trees of both frequencies are created at the same time. By comparing the result of both in each step, I found no difference until the sudden error. Other things I have tried include:
a == b
is always True
type(a) == type(b)
is always True, as well as all respective individual elements
- I created a random numpy array with the same size as the image and values between 0 and 255 in case numpy is to blame, but surprisingly, both
a
and b
work!
- Suggested: repr(a), repr(b). They are both equivalent to each other as well as to str(a). repr(a) just prints the contents of the variable, which is the frequency list.
I have revisited this problem quite a few times and it has always perplexed me. While there are many ways to code Huffman encoding and I have a workaround for my issue, the educational curiosity is still burning. Can someone explain this sorcery?
I'm using Python 2.7.5 on Windows 7 64bit. Here is my code. Minimum functional code posted after this.
import urllib2 as urllib
import io
import Image
import numpy as np
from collections import Counter
from heapq import heappop, heappush, heapify
def genFreq_original(data):
if isinstance(data, (np.ndarray, np.generic)):
freqList = Counter(data.flat)
else:
freqList = Counter(data)
tree = [(freqList[key], key) for key in freqList.keys()]
return tree
def genFreq_fix(data):
Nx, Ny = data.shape
freqList = np.zeros(shape=(256,), dtype=int)
for i in xrange(Nx):
for j in xrange(Ny):
freqList[data[i,j]] += 1
tree = [(freqList[i], i) for i in xrange(256) if freqList[i] != 0]
return tree
def makeHuffTree_original(tree):
heapify(tree)
while len(tree) > 1:
childL, childR = heappop(tree), heappop(tree)
parent = (childL[0] + childR[0], childL, childR)
heappush(tree, parent)
#print len(tree)
print 'Successful'
def makeHuffTree_test(tree):
while len(tree) > 1:
tree = sorted(tree)
childL, childR = tree.pop(0), tree.pop(0)
parent = (childL[0] + childR[0], childL, childR)
tree.append(parent)
#print len(tree)
print 'Successful'
def sideBySide(a, b):
heapify(a), heapify(b)
while len(a) > 1 and len(b) > 1:
d1, d2 = heappop(b), heappop(b)
c1, c2 = heappop(a), heappop(a)
p2 = (d1[0] + d2[0], d1, d2)
p1 = (c1[0] + c2[0], c1, c2)
heappush(b, p2)
heappush(a, p1)
#print len(a), all([a == b, p1 == p2, c1 == d1, c2 == d2])
print 'Successful'
url = urllib.urlopen('http://media-cdn.tripadvisor.com/media/photo-s/02/e7/dc/22/akumal-beach-resort.jpg')
fname = io.BytesIO(url.read())
dat = np.asarray(Image.open(fname).convert('L'), dtype=int) # image array
#dat = (np.random.random(dat.shape)*256).astype(int) # random numpy array
a = genFreq_original(dat) # This list fails
b = genFreq_fix(dat) # This list works
#print 'a == b: ', a == b
#print makeHuffTree_test(b) == makeHuffTree_original(b) # same result with both functions
makeHuffTree_original(a) # heappop error
#makeHuffTree_test(a) # sorted(list) error
#sideBySide(a, b) # step-by-step comparison of a and b
Minimal functional code
The question boils down to this: given two variables, a
and b
, which seem identical, why does makeHuffTree_original()
behave differently to both of them?
import urllib2 as urllib
import io
import Image
import numpy as np
from collections import Counter
from heapq import heappop, heappush, heapify
def genFreq_original(data):
if isinstance(data, (np.ndarray, np.generic)):
freqList = Counter(data.flat)
else:
freqList = Counter(data)
tree = [(freqList[key], key) for key in freqList.keys()]
return tree
def genFreq_fix(data):
Nx, Ny = data.shape
freqList = np.zeros(shape=(256,), dtype=int)
for i in xrange(Nx):
for j in xrange(Ny):
freqList[data[i,j]] += 1
tree = [(freqList[i], i) for i in xrange(256) if freqList[i] != 0]
return tree
def makeHuffTree_original(tree):
heapify(tree)
while len(tree) > 1:
childL, childR = heappop(tree), heappop(tree)
parent = (childL[0] + childR[0], childL, childR)
heappush(tree, parent)
#print len(tree)
print 'Successful'
url = urllib.urlopen('http://media-cdn.tripadvisor.com/media/photo-s/02/e7/dc/22/akumal-beach-resort.jpg')
fname = io.BytesIO(url.read())
dat = np.asarray(Image.open(fname).convert('L'), dtype=int) # image array
a = genFreq_original(dat) # This list fails
b = genFreq_fix(dat) # This list works
print 'a == b:', a == b
makeHuffTree_original(b) # Works
makeHuffTree_original(a) # heappop error