Question

Question: What are the pros and cons of writing an __init__ that takes a collection directly as an argument, rather than unpacking its contents?

Context: I'm writing a class to process data from several fields in a database table. I iterate through some large (~100 million rows) query result, passing one row at a time to a class that performs the processing. Each row is retrieved from the database as a tuple (or optionally, as a dictionary).

Discussion: Assume I'm interested in exactly three fields, but what gets passed into my class depends on the query, and the query is written by the user. The most basic approach might be one of the following:

class Direct:
    def __init__(self, names):
        self.names = names

class Simple:
    def __init__(self, names):
        self.name1 = names[0]
        self.name2 = names[1]
        self.name3 = names[2]

class Unpack:
    def __init__(self, names):
        self.name1, self.name2, self.name3 = names

Here are some examples of rows that might be passed to a new instance:

good = ('Simon', 'Marie', 'Kent')                 # Exactly what we want
bad1 = ('Simon', 'Marie', 'Kent', '10 Main St')   # Extra field(s) behind
bad2 = ('15', 'Simon', 'Marie', 'Kent')           # Extra field(s) in front
bad3 = ('Simon', 'Marie')                         # Forgot a field

When faced with the above, Direct always runs (at least to this point) but is very likely to be buggy (garbage in, garbage out). It takes one argument and assigns it exactly as given, so this could be a tuple or list of any size, None, a function reference, etc. This is the most quick-and-dirty way I can think of to initialize the object, but I feel like the class should complain immediately when I give it data it's clearly not designed to handle.

Simple handles bad1 correctly, silently stores the wrong values when given bad2, and raises an IndexError when given bad3. It's convenient to be able to effectively truncate the inputs from bad1, but not worth the bugs that would come from bad2. This one feels naive and inconsistent.

Unpack seems like the safest approach, because it throws an error in all three "bad" cases. The last thing we want to do is silently fill our database with bad information, right? It takes the tuple directly, but allows me to identify its contents as distinct attributes instead of forcing me to keep referring to indices, and complains if the tuple is the wrong size.
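The behavior described for Simple and Unpack can be checked with a short sketch. The classes and rows repeat the definitions above; the helper `outcome` and the `rows` dictionary are hypothetical names introduced only for this illustration.

```python
class Simple:
    def __init__(self, names):
        self.name1 = names[0]
        self.name2 = names[1]
        self.name3 = names[2]


class Unpack:
    def __init__(self, names):
        self.name1, self.name2, self.name3 = names


def outcome(cls, row):
    """Hypothetical helper: return the stored names, or the name of the exception raised."""
    try:
        obj = cls(row)
    except (IndexError, ValueError) as exc:
        return type(exc).__name__
    return (obj.name1, obj.name2, obj.name3)


rows = {
    "good": ('Simon', 'Marie', 'Kent'),
    "bad1": ('Simon', 'Marie', 'Kent', '10 Main St'),
    "bad2": ('15', 'Simon', 'Marie', 'Kent'),
    "bad3": ('Simon', 'Marie'),
}

# Print what each class does with each row.
for label, row in rows.items():
    print(label, outcome(Simple, row), outcome(Unpack, row))
```

Simple accepts bad1 and bad2 without complaint (storing shifted values for bad2), while Unpack rejects every row that does not have exactly three items.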

On the other hand, why pass a collection at all? Since I know I always want three fields, I can define __init__ to explicitly accept three arguments, and unpack the collection using the *-operator as I pass it to the new object:

class Explicit:
    def __init__(self, name1, name2, name3):
        self.name1 = name1
        self.name2 = name2
        self.name3 = name3

names = ('Guy', 'Rose', 'Deb')
e = Explicit(*names)

The only differences I see are that the __init__ definition is a bit more verbose and we raise TypeError instead of ValueError when the tuple is the wrong size. Philosophically, it seems to make sense that if we are taking some group of data (a row of a query) and examining its parts (three fields), we should pass a group of data (the tuple) but store its parts (the three attributes). So Unpack would be better.
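The difference in exception type can be seen in a minimal sketch. The helper `error_name` is a hypothetical name used only here; Unpack and Explicit are the classes defined above.

```python
class Unpack:
    def __init__(self, names):
        self.name1, self.name2, self.name3 = names


class Explicit:
    def __init__(self, name1, name2, name3):
        self.name1 = name1
        self.name2 = name2
        self.name3 = name3


def error_name(make):
    """Hypothetical helper: call make() and return the exception class name, or None."""
    try:
        make()
    except Exception as exc:
        return type(exc).__name__
    return None


bad1 = ('Simon', 'Marie', 'Kent', '10 Main St')  # one field too many
bad3 = ('Simon', 'Marie')                        # one field too few

# Unpack fails inside __init__, during tuple unpacking.
unpack_errors = [error_name(lambda: Unpack(bad1)), error_name(lambda: Unpack(bad3))]

# Explicit fails at the call itself, before __init__ runs.
explicit_errors = [error_name(lambda: Explicit(*bad1)), error_name(lambda: Explicit(*bad3))]

print(unpack_errors, explicit_errors)
```

With Explicit, the size check happens at the call site, so the traceback points at the line that built the object rather than at a line inside `__init__`.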

If I wanted to accept an indeterminate number of fields, rather than always three, I still have the choice to pass the tuple directly or use arbitrary argument lists (*args, **kwargs) and *-operator unpacking. So I'm left wondering, is this a completely neutral style decision?
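For the indeterminate case, the two options might look like the sketch below. The class names `DirectAny` and `VarArgs` are hypothetical and not part of the code above.

```python
class DirectAny:
    """Hypothetical: takes the collection as a single argument."""
    def __init__(self, names):
        self.names = tuple(names)


class VarArgs:
    """Hypothetical: takes any number of positional arguments."""
    def __init__(self, *names):
        self.names = names  # *names is already collected into a tuple


row = ('Simon', 'Marie', 'Kent', '10 Main St')

d = DirectAny(row)   # pass the tuple as-is
v = VarArgs(*row)    # unpack at the call site with the *-operator

print(d.names == v.names)
```

Both end up storing the same tuple; the difference is only in where the unpacking is written.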

Solution

This question is probably best answered by trying out the different approaches and seeing what makes the most sense to you and is the most easily understood by others reading your code.

Now that I have the benefit of more experience, I'd ask myself, how do I plan to access these values?

When I access any one of the values in this collection, am I likely to be using most or all of the values in that same subroutine or section of code? If so, the "Direct" approach is a good choice; it's the most compact and it lets me think about the collection as a collection until the point that I absolutely need to pay attention to what's inside.

On the other hand, if I'm using some values here and some values there, I don't want to have to constantly remember which index to access, or add verbosity in the form of dictionary keys, when I could just refer directly to the values using separately named attributes. I would probably avoid the "Direct" approach in this case, so that I only have to think about the fact that there's a collection when the class is first initialized.

Each of the remaining approaches involves splitting the collection up into different attributes, and I think the clear winner here is the "Explicit" approach. The "Simple" and "Unpack" approaches share a hidden dependency on the order of the collection, without offering any real advantage.
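One way the "Explicit" approach can avoid depending on order is keyword unpacking, for rows retrieved as dictionaries. This is only a sketch: it assumes the dictionary keys match the parameter names name1, name2 and name3, which the question does not state.

```python
class Explicit:
    def __init__(self, name1, name2, name3):
        self.name1 = name1
        self.name2 = name2
        self.name3 = name3


# Assumed row shape: keys happen to match the __init__ parameter names.
row = {'name3': 'Deb', 'name1': 'Guy', 'name2': 'Rose'}

# ** unpacks the dictionary into keyword arguments, so key order is irrelevant.
e = Explicit(**row)
print(e.name1, e.name2, e.name3)

# An unexpected key is rejected at the call, just like an extra positional field.
try:
    Explicit(**row, address='10 Main St')
    extra_key_error = None
except TypeError as exc:
    extra_key_error = type(exc).__name__
```

The same class still accepts `Explicit(*names)` for tuple rows, so the caller decides whether fields are matched by position or by name.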

License: CC-BY-SA with attribution
Not affiliated with StackOverflow