Question

Inspired by this nice answer, here's a benchmark:

import timeit

def test1():
    a = [1,2,3]
    a.insert(0,1)

def test2():
    a = [1,2,3]
    a[0:0]=[1]

print (timeit.timeit('test1()','from __main__ import test1'))
print (timeit.timeit('test2()','from __main__ import test2'))

For me, test2 is slightly faster (~10%). Why is that the case? I would expect it to be slower, since:

  1. slice assignment must be able to accept iterables of any length and therefore must be more general.
  2. in slice assignment, we need to create a new list on the right hand side just to get it to work.

Can anybody help me understand this?

(Using Python 2.7 on OS X 10.5.8.)
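
To make the comparison concrete, here is a small illustrative snippet (separate from the benchmark) showing that both forms put an element at the front, and that slice assignment is indeed the more general operation, since it splices in any iterable:

a = [1, 2, 3]
a.insert(0, 0)       # a is now [0, 1, 2, 3]

b = [1, 2, 3]
b[0:0] = [0]         # b is now [0, 1, 2, 3], same result as insert
b[1:1] = (10, 20)    # b is now [0, 10, 20, 1, 2, 3]; any iterable works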

Solution

Your first test case has to look up and call the method insert on the list a, whereas all the operations in test2 are handled directly in bytecode. Note the LOAD_ATTR and CALL_FUNCTION in the disassembly of test1 below. Calling functions is moderately expensive in Python: certainly expensive enough to account for a few percent difference in run time.

>>> import dis
>>> dis.dis(test1)
  2           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               2 (2)
              6 LOAD_CONST               3 (3)
              9 BUILD_LIST               3
             12 STORE_FAST               0 (a)

  3          15 LOAD_FAST                0 (a)
             18 LOAD_ATTR                0 (insert)
             21 LOAD_CONST               4 (0)
             24 LOAD_CONST               1 (1)
             27 CALL_FUNCTION            2
             30 POP_TOP             
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE        
>>> dis.dis(test2)
  2           0 LOAD_CONST               1 (1)
              3 LOAD_CONST               2 (2)
              6 LOAD_CONST               3 (3)
              9 BUILD_LIST               3
             12 STORE_FAST               0 (a)

  3          15 LOAD_CONST               1 (1)
             18 BUILD_LIST               1
             21 LOAD_FAST                0 (a)
             24 LOAD_CONST               4 (0)
             27 LOAD_CONST               4 (0)
             30 STORE_SLICE+3       
             31 LOAD_CONST               0 (None)
             34 RETURN_VALUE        
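
As a rough sanity check (not part of the original measurements), you can time a do-nothing call against a plain bytecode-level operation to get a feel for how much the call machinery alone costs:

import timeit

# Calling a function that does nothing still pays the full CALL_FUNCTION cost.
print (timeit.timeit('f()', 'f = lambda: None'))

# A simple operation handled entirely by a single bytecode, for comparison.
print (timeit.timeit('x + 1', 'x = 1'))

The exact numbers vary by machine, but the call typically costs noticeably more than the inline operation, and that is the kind of overhead test1 pays on top of the work insert itself does.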

Bad explanation

I posted this first, but on reflection I don't think it is correct. The mechanism I describe here should only make a noticeable difference when there is a lot of data to be moved, which isn't the case in the benchmark above. And even with a lot of data, the difference is only a couple of percent:

import timeit

def test1():
    a = range(10000000)
    a.insert(1,1)

def test2():
    a = range(10000000)
    a[1:1]=[1]

>>> timeit.timeit(test1, number=10)
6.008707046508789
>>> timeit.timeit(test2, number=10)
5.861173868179321

The method list.insert is implemented by the function ins1 in listobject.c. You'll see that it copies the item references for the tail of the list one by one:

for (i = n; --i >= where; )
    items[i+1] = items[i];

On the other hand, slice assignment is implemented by the function list_ass_slice, which calls memmove to shift the tail of the list:

memmove(&item[ihigh+d], &item[ihigh],
        (k - ihigh)*sizeof(PyObject *));

So I think the answer to your question is that the C library function memmove is better optimized than the simple loop. See here for the glibc implementation of memmove: I believe that, when called from list_ass_slice, it eventually ends up calling _wordcopy_bwd_aligned, which you can see is heavily hand-optimized.
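
In pure-Python terms, both code paths boil down to the same thing: grow the list by one slot and shift the tail up one position; the only difference is whether that shift is the explicit loop in ins1 or a single memmove in list_ass_slice. A rough sketch (not the actual C code, of course):

def insert_like(lst, where, item):
    # Sketch of what ins1/list_ass_slice do for a single-item insertion.
    lst.append(None)                            # make room (the resize in the C code)
    for i in range(len(lst) - 2, where - 1, -1):
        lst[i + 1] = lst[i]                     # the tail shift (loop vs. memmove)
    lst[where] = item

a = [1, 2, 3]
insert_like(a, 0, 0)
print (a)   # [0, 1, 2, 3]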

Licensed under: CC-BY-SA with attribution