Question

If I defined a function like this:

def ccid_year(seq):
  year, prefix, index, suffix = seq
  return year

Is Python allowed to optimize it to be effectively:

def ccid_year(seq):
  return seq[0]

I'd prefer to write the first function because it documents the format of the data being passed in but would hope that Python would generate code that is effectively as efficient as the second definition.

Was it helpful?

Solution

The two functions are not equivalent:

def ccid_year_1(seq):
  year, prefix, index, suffix = seq
  return year

def ccid_year_2(seq):
  return seq[0]

arg = {1:'a', 2:'b', 0:'c', 3:'d'}
print ccid_year_1(arg)
print ccid_year_2(arg)

The first call prints 0 and the second prints c.

OTHER TIPS

I'll answer the question at face value later, but first: When in doubt, benchmark it! But first, recall that most time is spent in a small portion of the code (i.e., most code is irrelevant to performance!) and, in CPython, function call overhead usually dominates small inefficiencies. Not to mention that large-scale algorithmic inefficiencies (a.k.a. freaking stupid code) dwarfs micro-optimization concerns.

So either don't worry about this at all, or if you have reason to worry about it, first benchmark alternatives and second don't put it in a function. Note that "reasons to worry about it" must be weighted against the time spent worrying, and the maintenance burden (if there is one) of the manual optimization.

CPython, the reference implementation you most like use, is very conservative about optimizing at this level. While there is a peephole optimizer operating on bytecode, it is limited in scale. More generally, you can't expect much optimization crossing a single statement. The problem with statically optimizing Python code is that there's a billion ways even the most innocently-looking program frament can call into arbitrary code, which might do anything at all, so you can't omit these calls. While we're at it, your proposed optimization is invalid (in the sense that the program doesn't have the same behavior) if seq is of the wrong type (not a sequence, or a very weird sequence) or length (not exactly three items long)! Any program claiming to implement Python must maintain such differences, so it won't do the transformation you suggest literally. I assume this was just an off-hand illustration, but it does indicate you seriously underestimate how complex Python is (to implement, and doubly so to optimize). I and others have written about this at length before, so I'll stop now before this post becomes even larger.

PyPy on the other hand will, if this function is indeed called from a hot loop, probably optimize this and a million other things you didn't even think of, while compiling it down to a machine code loop that iterates faster than any Python loop could ever iterate on CPython. It will still contain a few checks to break out of the loop and take the proper action (e.g. raise an exception) if necessary, but they'll also be highly efficient if not triggered.

I do not know much about IronPython and Jython and other implementations, but if their lack of consistent several-times-faster-than-CPython benchmark results is any indicator, they do not perform significant optimizations. While the VMs IronPython and Jython include JIT compilers (not - but not quite - entirely unlike PyPy's), these JIT compilers are built for very different languages, and I'd be very surprised if they could look through the mess of code IronPython/Jython must execute to achieve Python semantics and perform such optimizations on it.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top