Question

Consider:

>>> import timeit
>>> timeit.timeit('from win32com.client import Dispatch', number=100000)
0.18883283882571789
>>> timeit.timeit('import win32com.client', number=100000)
0.1275979248277963

Importing only the Dispatch function takes noticeably longer than importing the entire module, which seems counterintuitive. Could someone explain why importing a single function carries so much overhead? Thanks!

Solution

That's because:

from win32com.client import Dispatch

is roughly equivalent to:

import win32com.client               # import the whole module first
Dispatch = win32com.client.Dispatch  # bind the requested attribute to a name in the current namespace
del win32com                         # drop the reference to the module object

That said, from win32com.client import Dispatch has its own advantages. If you call win32com.client.Dispatch many times in your code, it is better to bind it to a name so that the number of lookups is reduced. Otherwise each call to win32com.client.Dispatch() first looks up win32com, then client inside win32com, and finally Dispatch inside win32com.client.
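
To illustrate the lookup cost described above, here is a minimal sketch that times a dotted call against a directly bound name. It uses os.path.splitext as a stand-in for win32com.client.Dispatch (an assumption, so that the example runs on any platform), and the helper names call_dotted and call_direct are hypothetical. Absolute timings will vary with the machine and the Python version.

```python
import os.path
import timeit
from os.path import splitext

def call_dotted():
    # Looks up os, then path, then splitext on every call.
    return os.path.splitext("report.txt")

def call_direct():
    # Looks up only the already-bound name splitext.
    return splitext("report.txt")

# Time both variants; the numbers are machine dependent.
dotted_time = timeit.timeit(call_dotted, number=100000)
direct_time = timeit.timeit(call_direct, number=100000)
print("dotted: %.4f s" % dotted_time)
print("direct: %.4f s" % direct_time)
```

The difference per call is small, so it only matters in hot loops.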


Byte-code comparison:

The byte code shows that from os.path import splitext requires more steps than the plain import.

>>> def func1():
...     from os.path import splitext
...
>>> def func2():
...     import os.path
...
>>> import dis
>>> dis.dis(func1)
  2           0 LOAD_CONST               1 (-1)
              3 LOAD_CONST               2 (('splitext',))
              6 IMPORT_NAME              0 (os.path)
              9 IMPORT_FROM              1 (splitext)
             12 STORE_FAST               0 (splitext)
             15 POP_TOP             
             16 LOAD_CONST               0 (None)
             19 RETURN_VALUE        
>>> dis.dis(func2)
  2           0 LOAD_CONST               1 (-1)
              3 LOAD_CONST               0 (None)
              6 IMPORT_NAME              0 (os.path)
              9 STORE_FAST               0 (os)
             12 LOAD_CONST               0 (None)
             15 RETURN_VALUE     
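
The listing above comes from Python 2, and opcode names and offsets differ between interpreter versions. As a hedged sketch, the same comparison can be made on Python 3.4 or later with dis.get_instructions, which returns the instructions as objects instead of printing them. The variable names ops1 and ops2 are introduced here for illustration only.

```python
import dis

def func1():
    from os.path import splitext

def func2():
    import os.path

# Collect the opcode names of each function body.
ops1 = [ins.opname for ins in dis.get_instructions(func1)]
ops2 = [ins.opname for ins in dis.get_instructions(func2)]

print(ops1)
print(ops2)
# The from-import form carries an extra IMPORT_FROM step.
print("IMPORT_FROM" in ops1, "IMPORT_FROM" in ops2)
```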

Module caching:

Note that after from os.path import splitext you can still access the os module through sys.modules, because Python caches imported modules.

From docs:

Note: For efficiency reasons, each module is only imported once per interpreter session. Therefore, if you change your modules, you must restart the interpreter – or, if it’s just one module you want to test interactively, use reload(), e.g. reload(modulename).

Demo:

import sys
from os.path import splitext
try:
    print os
except NameError:
    print "os not found"
try:
    print os.path
except NameError:
    print "os.path is not found"

print sys.modules['os']

output:

os not found
os.path is not found
<module 'os' from '/usr/lib/python2.7/os.pyc'>

Timing comparisons:

$ python -m timeit -n 1 'from os.path import splitext'
1 loops, best of 3: 5.01 usec per loop
$ python -m timeit -n 1 'import os.path'
1 loops, best of 3: 4.05 usec per loop
$ python -m timeit -n 1 'from os import path'
1 loops, best of 3: 5.01 usec per loop
$ python -m timeit -n 1 'import os'
1 loops, best of 3: 2.86 usec per loop

OTHER TIPS

The entire module still has to be imported to get the name you want from it. You'll also find that the OS caches the module file, so subsequent access to the .pyc file will be quicker.

The main issue here is that your code isn't timing what you think it is timing. timeit.timeit() runs the import statement in a loop, 100000 times, but at most the first iteration actually performs the import. All other iterations simply look up the module in sys.modules, look up the name Dispatch in the module's globals, and add this name to the importing module's globals. So it's essentially only dictionary operations, and small variations in the byte code become visible because their relative influence is large compared to the very cheap dictionary operations.
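
As a rough sketch of what a repeated from-import boils down to once the module is cached, the following mimics the dictionary operations described above. The helper cached_from_import and the namespace dict are hypothetical names introduced for illustration; the real import machinery does additional bookkeeping that is omitted here.

```python
import sys
import os.path  # make sure the module is already in the cache

def cached_from_import(module_name, attr_name):
    # Step 1: find the already-imported module in the cache.
    module = sys.modules[module_name]
    # Step 2: look up the requested name in that module.
    return getattr(module, attr_name)

# Step 3: bind the name in the importing namespace (a plain dict here).
namespace = {}
namespace["splitext"] = cached_from_import("os.path", "splitext")
print(namespace["splitext"] is os.path.splitext)
```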

If, on the other hand, you measure the time it takes to actually import the module, you can't see any difference between the two approaches. In both cases the time is completely dominated by the actual import, and the differences from fiddling around with the name dictionary become negligible. We can force reimports by deleting the module from sys.modules in each iteration:

In [1]: import sys

In [2]: %timeit from os import path; del sys.modules["os"]
1000 loops, best of 3: 248 us per loop

In [3]: %timeit import os.path; del sys.modules["os"]
1000 loops, best of 3: 248 us per loop

In [4]: %timeit from os import path
1000000 loops, best of 3: 706 ns per loop

In [5]: %timeit import os.path
1000000 loops, best of 3: 444 ns per loop
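
For readers without IPython, the same experiment can be sketched with the timeit module alone. This version swaps in the small standard-library module colorsys instead of os (an assumption made here so that deleting the cache entry does not disturb the running interpreter), and the variable names fresh_time and cached_time are hypothetical. The numbers depend on the machine and the Python version.

```python
import timeit

# Forced reimport: remove the cache entry after every import.
fresh_time = timeit.timeit(
    "from colorsys import rgb_to_hsv; del sys.modules['colorsys']",
    setup="import sys",
    number=200,
)

# Cached import: the module stays in sys.modules between iterations.
cached_time = timeit.timeit(
    "from colorsys import rgb_to_hsv",
    setup="import colorsys",
    number=200,
)

print("fresh : %.6f s for 200 imports" % fresh_time)
print("cached: %.6f s for 200 imports" % cached_time)
```

The forced reimport has to locate and execute the module each time, which is why it dwarfs the name-binding differences between the two import forms.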
Licensed under: CC-BY-SA with attribution