質問

I prefer using nested functions instead of methods or global functions in Python whenever possible. So I decided to test their performance because it seams that when you define a function in another function there will be an overhead for the definition of the inner function in each call of the outer function.

At best I was hoping for the global function to be just slightly faster, but surprisingly the nested function was faster. Does anyone have any idea why?

This is my code:

from time import clock

def a(n):
    return n + 1

def b1(loopcount):
    return sum([a(n) for n in range(loopcount)])

def b2(loopcount):
    def a(n):
        return n + 1
    return sum([a(n) for n in range(loopcount)])

powers = [5, 6, 7]
b1times = []
b2times = []
print "   ", "".join(["{:^10d}".format(n) for n in powers])    
for i in range(5):
    for power in powers:
        t = clock()
        b1(10**power)
        b1times.append(clock() - t)
    for power in powers:
        t = clock()
        b2(10**power)
        b2times.append(clock() - t)
    print "b1:", "".join(["{:^10.5f}".format(n) for n in b1times])
    print "b2:", "".join(["{:^10.5f}".format(n) for n in b2times])
    print ""
    b1times = []
    b2times = [] 

And this is the result on my computer:

        5         6         7
b1:  0.08200   0.82773   8.47946
b2:  0.06914   0.79637   8.18571

b1:  0.07332   0.82139   8.68262
b2:  0.06547   0.82088   8.19606

b1:  0.07963   0.82625   9.65037
b2:  0.06617   0.82027   8.21412

b1:  0.07630   0.82112   8.49082
b2:  0.06541   0.80686   8.20532

b1:  0.12328   0.87034   8.42964
b2:  0.07059   0.79717   8.24620

UPDATE: using @Janne Karila's comment

Now that I'm calling b1 and b2 more, b1 becomes faster. So as @Kos and @Pavel Anossov said in their answers a few factors affect the speed here and you can't make a general statement.
Thanks everyone!

from time import *

def a1(n):
    return n + 1

def b1(n):
    return a1(n)

def b2(n):
    def a2():
        return n + 1
    return a2()

powers = [4, 5, 6]
b1times = []
b2times = []
print "   ", "".join(["{:^10d}".format(n) for n in powers])    
for i in range(5):
    for power in powers:
        t = clock()
        sum([b1(n) for n in range(10**power)])
        b1times.append(clock() - t)
    for power in powers:
        t = clock()
        sum([b2(n) for n in range(10**power)])
        b2times.append(clock() - t)
    print "b1:", "".join(["{:^10.5f}".format(n) for n in b1times])
    print "b2:", "".join(["{:^10.5f}".format(n) for n in b2times])
    print ""
    b1times = []
    b2times = [] 
役に立ちましたか?

解決

There are several parts that have an effect here:

1) The time to define the function (create the function object),
2) The time to look up the function object by name,
3) The time to actually call the function.

Your global function example is faster with 1) (no need to redefine a in each call to b1). However, it is slower with in 2) because global variable lookup is slower than local lookup.

Why can't we have both then?

I've extended your benchmark with a solution that uses the global function, but avoids the global lookup using a local variable. It seems to be the fastest of the three on my machine:

        5         6         7
b1:  0.04147   0.44421   4.46508
b2:  0.03399   0.43321   4.41121
b3:  0.03258   0.41821   4.25542

b1:  0.03240   0.42998   4.39774
b2:  0.03320   0.43465   4.42229
b3:  0.03155   0.42109   4.23669

b1:  0.03273   0.43321   4.37266
b2:  0.03326   0.43551   4.42208
b3:  0.03137   0.42356   4.25341

b1:  0.03253   0.43104   4.40466
b2:  0.03401   0.43719   4.42996
b3:  0.03155   0.41681   4.24132

b1:  0.03244   0.42965   4.37192
b2:  0.03310   0.43629   4.42727
b3:  0.03117   0.41701   4.23932

他のヒント

LOAD_GLOBAL is much slower than LOAD_FAST.

b1

  5           0 LOAD_GLOBAL              0 (sum)
              3 BUILD_LIST               0
              6 LOAD_GLOBAL              1 (range)
              9 LOAD_FAST                0 (loopcount)
             12 CALL_FUNCTION            1
             15 GET_ITER
        >>   16 FOR_ITER                18 (to 37)
             19 STORE_FAST               1 (n)
             22 LOAD_GLOBAL              2 (a)
             25 LOAD_FAST                1 (n)
             28 CALL_FUNCTION            1
             31 LIST_APPEND              2
             34 JUMP_ABSOLUTE           16
        >>   37 CALL_FUNCTION            1
             40 RETURN_VALUE

b2

  8           0 LOAD_CONST               1 (<code object a at 01E57A40, ...>)
              3 MAKE_FUNCTION            0
              6 STORE_FAST               1 (a)

 10           9 LOAD_GLOBAL              0 (sum)
             12 BUILD_LIST               0
             15 LOAD_GLOBAL              1 (range)
             18 LOAD_FAST                0 (loopcount)
             21 CALL_FUNCTION            1
             24 GET_ITER
        >>   25 FOR_ITER                18 (to 46)
             28 STORE_FAST               2 (n)
             31 LOAD_FAST                1 (a)
             34 LOAD_FAST                2 (n)
             37 CALL_FUNCTION            1
             40 LIST_APPEND              2
             43 JUMP_ABSOLUTE           25
        >>   46 CALL_FUNCTION            1
             49 RETURN_VALUE

Whether the difference makes up for creating a function every time depends on the function (and how many times you use LOAD_GLOBAL).

 

Creating a local reference to a global function to use in an inner loop is a relatively common optimisation:

def a(n):
    return n + 1

def b1(loopcount):
    local_a = a
    return sum([local_a(n) for n in range(loopcount)])

This should still be faster than looking up a global and more maintainable than a nested function.

In general, no, as each time the outer function is run, the inner function must be redefined.

This said, performance really doesn't matter unless you can show it does (for example, a particular loop is too slow and causing performance issues) - so I'd recommend using what is most readable - premature optimization is a bad thing.

That said, I would argue global functions are probably a better solution as they are easier to reuse across your code and I would say more readable - after all, flat is better than nested.

I think it is caused by the order of searching object definition in python.

Whenever the interpreter meets an object name, it will first search the local object definition dict which records the mapping between object name and the object itself. If not found in local dict, then global, then builtin.

And, all things in python are objects, including function.

ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top