timeit module vs time module vs time in Linux - why are the results so different?

https://stackoverflow.com/questions/21793612

12-10-2022

Question

Hi everyone! I am trying to test how fast Python creates a large number of objects (each with its own attribute dictionary), but I get some weird results. I used three different methods to measure the creation time. The first method uses the time module, which I know is not very accurate. The test file is "node_time.py":

from __future__ import print_function
from time import time

class node(object):
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.right = None
        self.left = None
        self.parent = None
        self.depth = 0
        return

begin = time()
content = [node(i,i) for i in range(1000000)]
print(time()-begin)

The second method uses the timeit module, which should be a much better choice. The test file is "node_timeit.py":

from __future__ import print_function
from timeit import repeat

class node(object):
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.right = None
        self.left = None
        self.parent = None
        self.depth = 0
        return

cmd = "content = [node(i,i) for i in range(1000000)]"
prepare = "from __main__ import node"
cost = min(repeat(cmd, prepare, repeat=1, number=1))
print(cost)
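With repeat=1, the min() in node_timeit.py has only one measurement to choose from. Here is a minimal Python 3 sketch that actually keeps the best of several runs. It assumes Python 3.5+ for the globals= argument of timeit.repeat, and the Node class and best_of helper are hypothetical stand-ins for the question's code:

```python
import timeit


class Node:
    """Same attributes as the question's `node` class (hypothetical Python 3 port)."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.right = None
        self.left = None
        self.parent = None
        self.depth = 0


def best_of(n, repeats=3):
    """Return the fastest of `repeats` single runs that each build n Node objects."""
    times = timeit.repeat(
        "content = [Node(i, i) for i in range(n)]",
        repeat=repeats,
        number=1,                        # one full list per run, as in the question
        globals={"Node": Node, "n": n},  # replaces "from __main__ import node"
    )
    return min(times)  # the fastest run is the one least disturbed by other activity


print(best_of(100_000))
```

Like the question's version, this still runs with timeit's default of the cyclic garbage collector switched off.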

The third method uses the Linux "time" command. The test file is "node_sys.py":

from __future__ import print_function

class node(object):
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.right = None
        self.left = None
        self.parent = None
        self.depth = 0
        return

content = [node(i,i) for i in range(1000000)]

The results differ considerably:

-bash-4.2$ python2 node_time.py
5.93654894829
-bash-4.2$ python2 node_timeit.py
2.6723048687
-bash-4.2$ time python2 node_sys.py
real    0m8.587s
user    0m7.344s
sys     0m0.716s

The time-module result (which measures wall-clock time) should be larger than the true value. But with the Linux "time" command, the sum of user and sys CPU time is as much as 8.060 s. Which result is correct, and why are they so different? Thanks for any comments!
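The numbers compared here measure different things. "real" (like time.time()) is elapsed wall-clock time, while "user" + "sys" is CPU time charged to the process. The following minimal Python 3 sketch records both for the same piece of work. It uses time.perf_counter() as the wall clock and time.process_time(), which sums the user and system CPU time of the current process. The measure helper and Node class are hypothetical:

```python
import time


class Node:
    """Same attributes as the question's `node` class (hypothetical Python 3 port)."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.right = None
        self.left = None
        self.parent = None
        self.depth = 0


def measure(n):
    """Return (wall_seconds, cpu_seconds) spent building n Node objects."""
    wall_start = time.perf_counter()  # wall clock, comparable to "real"
    cpu_start = time.process_time()   # user + system CPU time, comparable to "user" + "sys"
    content = [Node(i, i) for i in range(n)]
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    del content  # freed after both readings, so deallocation is not counted
    return wall, cpu


wall, cpu = measure(100_000)
print(f"wall {wall:.3f}s, cpu {cpu:.3f}s")
```

For single-threaded, CPU-bound code the two values are usually close. A large gap would suggest the process spent time waiting rather than computing.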

Solution

The difference between the time and timeit timings is explained by this note in the timeit documentation:

By default, timeit() temporarily turns off garbage collection during the timing.

When you allocate a lot of memory, normally the cyclic garbage collector will kick in to see if it can reclaim some of that. To get more consistent timings, timeit disables this behavior for the duration of the timing.
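You can see this behaviour directly by having the timed statement record gc.isenabled(). This small Python 3 sketch uses a hypothetical helper, gc_state_during_timeit. It assumes the collector is enabled when the helper is called, which is the interpreter default:

```python
import gc
import timeit


def gc_state_during_timeit(setup="pass"):
    """Return gc.isenabled() as observed from inside a timeit run."""
    seen = []
    timeit.timeit(
        "seen.append(gc.isenabled())",
        setup=setup,  # setup runs after timeit has already disabled the GC
        number=1,
        globals={"seen": seen, "gc": gc},
    )
    return seen[0]


print(gc_state_during_timeit())               # False: timeit switched the GC off
print(gc_state_during_timeit("gc.enable()"))  # True: the setup turned it back on
print(gc.isenabled())                         # True: timeit restored the earlier state
```

The second call uses the same trick as the timings that pass "import gc; gc.enable()" as the setup string.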

Compare the timings with time, with and without garbage collection:

>>> import gc, time, timeit
>>> def t1():
...   s = time.time()
...   content = [node(i, i) for i in range(1000000)]
...   print time.time() - s
...
>>> t1()
3.27300000191
>>> gc.disable()
>>> t1()
1.92200016975

to the timings with timeit, with and without garbage collection:

>>> gc.enable()
>>> timeit.timeit('content = [node(i, i) for i in range(1000000)]', 'from __main__ import node; import gc; gc.enable()', number=1)
3.2806941528164373
>>> timeit.timeit('content = [node(i, i) for i in range(1000000)]', 'from __main__ import node', number=1)
1.8655694847876134

As you can see, both methods give essentially the same timings under the same GC settings.
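To get timeit-like numbers from a plain time-based measurement, you can pause the collector around the timed region. This minimal sketch mirrors what timeit does, including restoring the previous GC state afterwards. It assumes Python 3.7+ for contextlib.nullcontext, and gc_paused, timed_build and Node are hypothetical names:

```python
import contextlib
import gc
import time


class Node:
    """Same attributes as the question's `node` class (hypothetical Python 3 port)."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.right = None
        self.left = None
        self.parent = None
        self.depth = 0


@contextlib.contextmanager
def gc_paused():
    """Disable the cyclic GC for the block, then restore its previous state."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


def timed_build(n, pause_gc):
    """Wall-clock seconds to build n Node objects, optionally with the GC paused."""
    ctx = gc_paused() if pause_gc else contextlib.nullcontext()
    start = time.perf_counter()
    with ctx:
        content = [Node(i, i) for i in range(n)]
    elapsed = time.perf_counter() - start
    del content  # freed outside the measured region
    return elapsed


print(timed_build(100_000, pause_gc=False), timed_build(100_000, pause_gc=True))
```

With a large n, the two calls should show the same kind of gap as t1() before and after gc.disable(). The absolute numbers will depend on the machine and the Python version.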

As for the command-line time command, it measures the entire runtime of the program, including interpreter startup and shutdown and other work the other timings don't include. I suspect one of the big contributors to the difference is the time taken to free all the node objects you allocated:

>>> def t2():
...   s = time.time()
...   [node(i, i) for i in range(1000000)]
...   # List and contents are deallocated
...   print time.time() - s
...
>>> t2()
3.96099996567
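You can also measure the deallocation cost on its own. This minimal sketch assumes CPython, where reference counting frees the list and every Node as soon as the last reference is dropped. The build_and_free helper and Node class are hypothetical names:

```python
import time


class Node:
    """Same attributes as the question's `node` class (hypothetical Python 3 port)."""

    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.right = None
        self.left = None
        self.parent = None
        self.depth = 0


def build_and_free(n):
    """Return (build_seconds, free_seconds) for a list of n Node objects."""
    start = time.perf_counter()
    content = [Node(i, i) for i in range(n)]
    built = time.perf_counter()
    del content  # last reference gone: CPython frees the list and each Node here
    freed = time.perf_counter()
    return built - start, freed - built


build_s, free_s = build_and_free(100_000)
print(f"build {build_s:.3f}s, free {free_s:.3f}s")
```

In node_sys.py this freeing would happen at interpreter exit. The shell's time output would count it, but neither in-script measurement would.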

Licensed under: CC-BY-SA with attribution
