Domanda

Very often when coding in Python (or any language where type is not specified) I run into the following problem:

I want to write a function that can accept either a thing, or a container of such things in general. For example consider this subset function:

def subset(df, months=all_months):
    return df[pd.DatetimeIndex(df['Timestamp']).month.isin(months)]

I would now like to be able to call this both with a list of months, or a single month. The second case however is only possible by something ugly like:

subset(df, months=["June"]) # Why not just subset(df, "June")?

I can think of 2 solutions, neither of which feels elegant to me:

  • I can use isinstance and treat scalars different that containers. This is not the best because I type check and also there are cases where the code in the function is complex and needs major revision when treating scalar instead of list

  • I could cast the argument to list: months = [months]. Again I would be using the isinstance though.

Is there are better looking solution than the ones I thought of? Even better, is there a well known solution that I am not aware of?

EDIT

Both answers provided acceptable solutions. I decided to stick with the one proposed by @Doc Brown because I can easily isolate the type checking into a decorator and want to avoid introducing unwrapping into the API for workplace specific reasons.

È stato utile?

Soluzione

Use an argument list:

def subset(df, *months):
    for month in months: ...

subset(df, 1)
subset(df, 1, 2)
subset(df, *[1, 2])

It's easiest for users of your api to understand how things work when functions work in exactly one way. Behaving differently depending on what you're passed can be very confusing.

Altri suggerimenti

Using type checks (for example, by utilizing isinstance) can be a sensible solution (even the implementation of the isin function you used for demonstration purposes makes use of it). In a language with dynamic typing like Python, there is no function overloading like in statically typed languages. So often the cleanest and simplest approach to distinguish between different kinds of parameters is indeed to use isinstance, especially when you want to solve a problem in a generic manner.

However, one has to be careful not to detect strings here as sequences (which are sequences of characters). You only want "a sequence of strings" or a single "string" as valid. Given you want to support built-in collections (or strings) as input, you can do the requires type check with a helper function like this one:

def makeSequence(obj):
    if type(obj) in [list, tuple, xrange]:
        return obj
    else:
        return [obj]

So with this implementation

def subset(df, months=all_months):
    months=makeSequence(months)
    # ... maybe some complex code dealing with a collection of months ...
    return df[pd.DatetimeIndex(df['Timestamp']).month.isin(months)]

the code does not need "major revision when treating scalar instead of list", it is a simple, straightforward change to the function which just adds one line at the beginning, and leaves the already existing code untouched.

Note further that the "bad reputation" of functions like isinstance or the kind of type checking above is probably caused by cases where type checking is abused in a manner which violates the Open-Closed principle. When, for example, isinstance is called for a list of different types, maybe from a class hierarchy, with special case handling for each type, new requirements often lead to a necessity of not forgetting to extend that list. In such a situation the usage of isinstance becomes error-prone and should be avoided. However, the use case above is not exactly the same: for this use case, makeSequence most probably needs only to support the built-in collections of Python, so nothing which will change very frequently.

Once you have a basic layer of such functions which works on scalars and lists as well, other functions based on that layer do not need to implement such special treatment any more. So maybe you can try to implement such a layer?

You could simply rely on English grammar:

def subset(df, *, months=None, month=None):
    if months is None:
        if month is None:
            raise ValueError('subset must be called with either month= or months= parameter')
        else:
            months = [month]

    return df[pd.DatetimeIndex(df['Timestamp']).month.isin(months)]
Autorizzato sotto: CC-BY-SA insieme a attribuzione
scroll top