It's apparently been a known issue for almost 3 years. The ZeroDivisionError is caused by the following lines in __init__:
if bins == None:
    bins = freqdist.B()
self._freqdist = freqdist
self._T = self._freqdist.B()
self._Z = bins - self._freqdist.B()
Whenever the bins argument is not specified, it defaults to None, so self._Z is really just freqdist.B() - freqdist.B(), which is 0, and
self._P0 = self._T / float(self._Z * (self._N + self._T))
reduces to
self._P0 = freqdist.B() / 0.0
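The arithmetic can be reproduced without NLTK. This is a minimal sketch with made-up numbers: B stands in for freqdist.B(), and N is assumed to play the role of self._N (taken here to be the total sample count); neither value comes from the original question.

```python
# Stand-in numbers, not taken from any real FreqDist.
B = 5    # plays the role of freqdist.B()
N = 20   # assumed role of self._N (total number of samples)

bins = None
if bins is None:
    bins = B          # same defaulting as in the quoted __init__

T = B
Z = bins - B          # always 0 when bins is left unspecified

error = None
try:
    P0 = T / float(Z * (N + T))   # denominator is float(0) == 0.0
except ZeroDivisionError as exc:
    error = exc
    print("ZeroDivisionError: %s" % exc)
```

Passing bins equal to freqdist.B() explicitly hits the same case, since Z is again 0.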
Additionally, if you specify bins as any value greater than freqdist.B(), then when this line of your code executes,
print lm.entropy(fake_test)
you will receive a NotImplementedError because, within the WittenBellProbDist class,
def discount(self):
    raise NotImplementedError()
The discount method is apparently also used in prob and logprob of the NgramModel class, so you won't be able to call them either.
One way to fix these problems, without changing NLTK, would be to inherit from WittenBellProbDist and override the relevant methods.
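The shape of such a subclass might look like the sketch below. To keep it runnable on its own, it subclasses a hypothetical stand-in (StandInWittenBell) that mirrors only the lines quoted above, not NLTK's actual class; with NLTK you would inherit from WittenBellProbDist instead. Two choices here are assumptions of mine rather than anything NLTK prescribes: reserving one extra bin so that Z is never 0, and having discount return T / (N + T), the total mass the quoted P0 formula assigns to unseen bins. The sketch also assumes a non-empty count table.

```python
class StandInWittenBell(object):
    """Hypothetical stand-in mirroring the quoted lines; NOT NLTK's class."""

    def __init__(self, counts, bins=None):
        b = len(counts)                   # plays the role of freqdist.B()
        if bins is None:
            bins = b
        self._T = b
        self._Z = bins - b
        self._N = sum(counts.values())    # assumed meaning of self._N
        self._P0 = self._T / float(self._Z * (self._N + self._T))

    def discount(self):
        raise NotImplementedError()


class PatchedWittenBell(StandInWittenBell):
    """Subclass that overrides the two problem spots."""

    def __init__(self, counts, bins=None):
        b = len(counts)
        if bins is None or bins <= b:
            # Assumption: reserve one bin for unseen events so that Z > 0.
            bins = b + 1
        StandInWittenBell.__init__(self, counts, bins)

    def discount(self):
        # Assumption: report the mass reserved for unseen events, T / (N + T).
        return self._T / float(self._N + self._T)


dist = PatchedWittenBell({"a": 3, "b": 1})
print(dist.discount())
```

Whether T / (N + T) is the right quantity for NgramModel's backoff is something to check against how that class uses discount before relying on it.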