Question

Good day. I have been searching through related posts without finding an ideal solution. Let me describe my problem:

I am analyzing texts from a corpus, extracting features from those texts, and storing the features in an array. Some of these features are ratios, for example the ratio of the masculine pronoun "he" to the feminine pronoun "she". For some of these ratios the denominator will be zero, which raises ZeroDivisionError.

Since I calculate about 100 of these ratios, wrapping a try/except block around every ratio calculation sounds too cumbersome.

I found out that I can do

#16,RATIO_masculine_femenine
feature_map.append(numOfHe / numOfShe if numOfShe else 0)

But it is still a bit too laborious. I was wondering if there is a way to state at the beginning of the script that any ZeroDivisionError should be replaced by NaN, 0, or whatever other value suits.

Thanks


Solution

The pythonic answer is to wrap it in a function, e.g.:

def ratio(a, b):
    if b == 0:
        return 0
    else:
        return a / b

feature_map.append(ratio(numOfHe, numOfShe))

The exact form of the function depends on the rest of your code, but if you're writing a line like that hundreds of times, then you should probably be wrapping it in a function, or at least using a loop. Also, variable names like numOfHe and numOfShe hint that you might be better served by a dict.
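
As a rough illustration of combining the function with a dict, here is a minimal sketch. The counts dictionary, its keys and values, and the default parameter are assumptions made for this example and do not come from the original code; default lets the caller pick 0, NaN, or another fallback.

```python
import math

def ratio(a, b, default=0.0):
    """Return a / b, or `default` when b is zero."""
    return a / b if b else default

# Hypothetical word counts; in practice these would come from the corpus.
counts = {"he": 12, "she": 0, "him": 3, "her": 4}

feature_map = []
# Zero denominator: falls back to the default of 0.0.
feature_map.append(ratio(counts["he"], counts["she"]))
# Ordinary case: 3 / 4.
feature_map.append(ratio(counts["him"], counts["her"]))
# Zero denominator again, this time asking for NaN instead.
feature_map.append(ratio(counts["he"], counts["she"], default=math.nan))
```

Keeping the fallback as a parameter means the choice between 0 and NaN is made in one place per call rather than being baked into the function.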

UPDATE

I see from your code link that each calc is actually totally different, so you probably can't easily loop it. Since the calcs are still relatively simple, you could try a trick with eval like this:

calcs = [
    ...
    (12, 'h + ha + hw + hy'),
    (13, '(h + ha) / (hw + hy)'),
    ...
]

for index, calc in calcs:
    try:
        v = eval(calc, locals())
    except ZeroDivisionError:
        v = 0
    feature_map.append(v)

You could also add other info to calcs and use a namedtuple instead. You could also use classes to evaluate the calcs dynamically as you need them, if that helps.
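
A possible shape for the namedtuple variant is sketched below. The Calc type, the feature names, the evaluate helper, and the sample counts are all hypothetical; only the two indices and expressions are taken from the snippet above. The expressions are evaluated against an explicit dict of variables rather than locals(), which is a choice made for this sketch.

```python
from collections import namedtuple

# Hypothetical record type: feature index, a readable name, and the expression.
Calc = namedtuple("Calc", ["index", "name", "expr"])

calcs = [
    Calc(12, "SUM_h_forms", "h + ha + hw + hy"),
    Calc(13, "RATIO_h_forms", "(h + ha) / (hw + hy)"),
]

def evaluate(calcs, variables, default=0):
    """Evaluate each expression against `variables`; use `default` on division by zero."""
    results = {}
    for calc in calcs:
        try:
            # Builtins are emptied so the expression only sees names in `variables`.
            value = eval(calc.expr, {"__builtins__": {}}, variables)
        except ZeroDivisionError:
            value = default
        results[calc.name] = value
    return results

# Made-up counts; hw + hy is zero, so the ratio falls back to the default.
features = evaluate(calcs, {"h": 3, "ha": 2, "hw": 0, "hy": 0})
```

Even with builtins emptied, eval should only be given expressions you wrote yourself, never text from an untrusted source.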

OTHER TIPS

If you wrap your int object in a custom subclass, you can address it once (this uses Python 2's __div__; in Python 3 the / operator calls __truediv__ instead):

class SafeInt(int):
    def __div__(self, y):
        try:
            return SafeInt(super(SafeInt, self).__div__(y))
        except ZeroDivisionError:
            return SafeInt(0)

Overriding all ints created by calling int() (literals such as 5 are unaffected):

original_int = int
int = SafeInt
int(5) / 0
# O: 0

Overriding some ints:

SafeInt(5) / 0
# O: 0

You have to be careful, though, to keep the object a SafeInt. You'll notice everything I return inside __div__ is wrapped in SafeInt(). Because int objects are immutable, you have to explicitly return a new SafeInt object every time, which means you probably need a decorator that wraps each method of SafeInt to ensure that. I leave that as an exercise to the reader!

Otherwise you'll end up with this:

>>> SafeInt(5) / 0
0   # this is a SafeInt object
>>> _ / 0
0   # this is a SafeInt object; no error
>>> SafeInt(5) + 0
5   # this is a basic int object
>>> _ / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero
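
One way the decorator exercise might be approached is sketched here, written for Python 3 (where // calls __floordiv__ and / calls __truediv__) rather than the Python 2 used above. The name keep_safe is hypothetical, and the sketch only covers +, - and *; note also that in Python 3 true division of ints returns a float, which is not protected by any of this.

```python
def keep_safe(cls):
    """Class decorator (hypothetical name): make +, - and * return instances of cls."""

    def rewrap(method_name):
        original = getattr(int, method_name)

        def method(self, other):
            result = original(self, other)
            # NotImplemented must pass through so Python can try the other operand.
            if result is NotImplemented:
                return result
            return type(self)(result)

        method.__name__ = method_name
        return method

    for name in ("__add__", "__radd__", "__sub__", "__rsub__", "__mul__", "__rmul__"):
        setattr(cls, name, rewrap(name))
    return cls


@keep_safe
class SafeInt(int):
    def __floordiv__(self, other):
        try:
            result = int.__floordiv__(self, other)
        except ZeroDivisionError:
            return type(self)(0)
        return result if result is NotImplemented else type(self)(result)

    def __truediv__(self, other):
        try:
            # True division of ints yields a float, so this result is not a SafeInt.
            return int.__truediv__(self, other)
        except ZeroDivisionError:
            return type(self)(0)


total = SafeInt(5) + 0      # stays a SafeInt thanks to the decorator
safe_result = total // 0    # handled inside __floordiv__, so no exception escapes
```

With this in place the addition in the session above would no longer drop back to a plain int, although any operation not listed in keep_safe still would.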

One final note: you can pass SafeInt as the default factory to defaultdict, so that missing keys default to SafeInt(0)!


Edit: Knowing you wanted it to happen to all ints, I hoped something like this might work, but it's disallowed (for good reason):

>>> def wrapdiv(olddiv):
...     def newdiv(self, y):
...         try:
...             return olddiv(self, y)
...         except ZeroDivisionError:
...             return 0
...     return newdiv
...
>>> int.__div__ = wrapdiv(int.__div__)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't set attributes of built-in/extension type 'int'
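
Because int itself cannot be patched, a possible alternative is to handle the error in a single place by deferring each calculation until it runs inside one try/except. This is only a sketch: the safe helper, the counts dictionary, and the choice of NaN as the default are assumptions made for illustration.

```python
import math

def safe(calc, values, default=math.nan):
    """Call calc(values), returning `default` if it divides by zero."""
    try:
        return calc(values)
    except ZeroDivisionError:
        return default

# Hypothetical counts standing in for the real corpus statistics.
counts = {"he": 7, "she": 0}

# Each feature is a lambda, so the division only happens inside safe().
calcs = [
    lambda c: c["he"] + c["she"],
    lambda c: c["he"] / c["she"],
]

feature_map = []
for calc in calcs:
    feature_map.append(safe(calc, counts))
```

Each calculation can still be completely different from the others, yet the ZeroDivisionError handling is written only once.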
Licensed under: CC-BY-SA with attribution