Question

I am trying to implement Theil's index (http://en.wikipedia.org/wiki/Theil_index) in Python to measure inequality of revenue in a list.

The formula is basically Shannon's entropy, so it deals with log. My problem is that I have a few revenues at 0 in my list, and log(0) is undefined, which makes my formula unhappy. I believe adding a tiny float to 0 wouldn't work either, as log(tinyFloat) is a huge negative number (it tends to -inf), and that would mess my index up.

[EDIT] Here's a snippet (taken from another, much cleaner and freely available, implementation):

    from math import exp, log

    def error_if_not_in_range01(value):
        # Rejects 0, which is why zero revenues fail here.
        if (value <= 0) or (value > 1):
            raise Exception(str(value) + ' is not in (0,1]!')

    def error_if_not_1(value):
        # Not shown in the original snippet; assumed to check that the shares sum to 1.
        if abs(value - 1.0) > 1e-9:
            raise Exception(str(value) + ' is not 1!')

    def H(x):
        n = len(x)
        entropy = 0.0
        sum = 0.0
        for x_i in x: # work on all x[i]
            print(x_i)
            error_if_not_in_range01(x_i)
            sum += x_i
            group_negentropy = x_i*log(x_i)
            entropy += group_negentropy
        error_if_not_1(sum)
        return -entropy

    def T(x):
        print(x)
        n = len(x)
        maximum_entropy = log(n)
        actual_entropy = H(x)
        redundancy = maximum_entropy - actual_entropy
        inequality = 1 - exp(-redundancy)
        return redundancy,inequality

Is there any way out of this problem?

Solution

If I understand you correctly, the formula you are trying to implement is the following:

    T = (1/N) * sum_i [ (Xi / mean(X)) * ln(Xi / mean(X)) ]

In this case, your problem is calculating the natural logarithm of Xi / mean(X), when Xi = 0.

However, since that has to be multiplied by Xi / mean(X) first, if Xi == 0 the value of ln(Xi / mean(X)) doesn't matter because it will be multiplied by zero. You can treat the value of the formula for that entry as zero, and skip calculating the logarithm entirely.
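As a minimal sketch of that idea, assuming the list holds raw, non-negative revenues rather than shares, zero entries can simply contribute nothing to the sum. The function name `theil_index` is hypothetical and not taken from the quoted implementation:

```python
from math import log

def theil_index(revenues):
    """Theil T index of a list of non-negative revenues (hypothetical helper)."""
    n = len(revenues)
    if n == 0:
        raise ValueError("need at least one revenue")
    mean = sum(revenues) / n
    if mean == 0:
        raise ValueError("Theil index is undefined when all revenues are 0")
    total = 0.0
    for x in revenues:
        if x < 0:
            raise ValueError("revenues must be non-negative")
        if x == 0:
            continue  # the term for a zero revenue is taken as 0, so skip the log
        ratio = x / mean
        total += ratio * log(ratio)
    return total / n
```

With equal revenues such as `[1, 1, 1, 1]` this returns 0, and with `[0, 0, 0, 4]` it returns ln(4), the largest value possible for four entries.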

In the case that you are implementing Shannon's formula directly, the same holds:

    H = - sum_i [ Pi * ln(Pi) ]

In both the first and second form, calculating the log is not necessary if Pi == 0, because whatever value it is, it will have been multiplied by zero.
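Strictly speaking, 0 times ln(0) is an indeterminate product rather than an ordinary multiplication by zero, so the convention deserves a short justification. The limit below (by L'Hôpital's rule) shows that the term tends to zero as Pi approaches zero from above, which is why treating it as 0 is the standard choice:

```latex
\lim_{p \to 0^{+}} p \ln p
  = \lim_{p \to 0^{+}} \frac{\ln p}{1/p}
  = \lim_{p \to 0^{+}} \frac{1/p}{-1/p^{2}}
  = \lim_{p \to 0^{+}} (-p)
  = 0
```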

UPDATE:

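Given the code you quoted, you can replace x_i*log(x_i) with a function as follows. Note that the quoted error_if_not_in_range01 rejects 0 (it tests value <= 0), so that check has to be relaxed to value < 0 as well; otherwise a zero entry still raises before the logarithm is ever reached:

    def Group_negentropy(x_i):
        if x_i == 0:
            return 0
        else:
            return x_i*log(x_i)

    def H(x):
        n = len(x)
        entropy = 0.0
        sum = 0.0
        for x_i in x: # work on all x[i]
            print(x_i)
            error_if_not_in_range01(x_i)  # must accept 0, i.e. test value < 0
            sum += x_i
            group_negentropy = Group_negentropy(x_i)
            entropy += group_negentropy
        error_if_not_1(sum)
        return -entropy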
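Putting the pieces together, a self-contained Python 3 sketch might look like the following. It assumes the input is a list of raw revenues that are first normalised into shares summing to 1; the names `shares`, `entropy` and `theil_from_shares` are illustrative and not part of the quoted implementation.

```python
from math import exp, log

def shares(revenues):
    """Convert raw non-negative revenues into shares that sum to 1."""
    if any(r < 0 for r in revenues):
        raise ValueError("revenues must be non-negative")
    total = sum(revenues)
    if total <= 0:
        raise ValueError("need at least one positive revenue")
    return [r / total for r in revenues]

def entropy(p):
    """Shannon entropy in nats; zero shares contribute nothing."""
    return -sum(p_i * log(p_i) for p_i in p if p_i > 0)

def theil_from_shares(revenues):
    """Return (redundancy, inequality), mirroring the quoted T(x)."""
    p = shares(revenues)
    redundancy = log(len(p)) - entropy(p)
    inequality = 1 - exp(-redundancy)
    return redundancy, inequality
```

For `[0, 0, 0, 4]` the redundancy is ln(4) and the inequality is 0.75 (up to floating-point rounding); for equal revenues both values are 0.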
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow