Python is a language. CPython is a bytecode compiler and an interpreter for Python.
It will take some code:

    for i in xrange(value):
        z = i**2
        if(i==1000000):
            print i
        if z < i:
            print "yes"

and give you "bytecode":
- load the iterator into the for loop and loop its contents into i
- load i, load 2, run binary power, store z
- load i, load 1000000, compare
- load i, print
- load z, load i, compare
- load 'yes', print
- finish
In full:

      1           0 SETUP_LOOP              70 (to 73)
                  3 LOAD_NAME                0 (xrange)
                  6 LOAD_NAME                1 (value)
                  9 CALL_FUNCTION            1
                 12 GET_ITER
            >>   13 FOR_ITER                56 (to 72)
                 16 STORE_NAME               2 (i)
      2          19 LOAD_NAME                2 (i)
                 22 LOAD_CONST               0 (2)
                 25 BINARY_POWER
                 26 STORE_NAME               3 (z)
      3          29 LOAD_NAME                2 (i)
                 32 LOAD_CONST               1 (1000000)
                 35 COMPARE_OP               2 (==)
                 38 POP_JUMP_IF_FALSE       49
      4          41 LOAD_NAME                2 (i)
                 44 PRINT_ITEM
                 45 PRINT_NEWLINE
                 46 JUMP_FORWARD             0 (to 49)
      5     >>   49 LOAD_NAME                3 (z)
                 52 LOAD_NAME                2 (i)
                 55 COMPARE_OP               0 (<)
                 58 POP_JUMP_IF_FALSE       13
      6          61 LOAD_CONST               2 ('yes')
                 64 PRINT_ITEM
                 65 PRINT_NEWLINE
                 66 JUMP_ABSOLUTE           13
                 69 JUMP_ABSOLUTE           13
            >>   72 POP_BLOCK
            >>   73 LOAD_CONST               3 (None)
                 76 RETURN_VALUE

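If you want to reproduce a listing like this yourself, the standard library's dis module will do it. Here's a minimal sketch, assuming Python 3 (so range and print() stand in for xrange and the print statement, and the opcode names and offsets you get will differ from the Python 2 listing above). The helper name opcode_names is made up for illustration.

```python
import dis

# Python 3 spelling of the loop from the question (an adaptation, not the
# original Python 2 source).
SOURCE = '''
for i in range(value):
    z = i ** 2
    if i == 1000000:
        print(i)
    if z < i:
        print("yes")
'''


def opcode_names(source):
    """Compile module-level source and return its opcode names in order."""
    code = compile(source, "<example>", "exec")
    return [instr.opname for instr in dis.get_instructions(code)]


if __name__ == "__main__":
    # Prints the human-readable disassembly, one instruction per line.
    dis.dis(compile(SOURCE, "<example>", "exec"))
```

Compiling does not run the loop, so value doesn't need to exist for this to work.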
It's worth noting that in Python, an integer is an instance of the class int or long. This means that there is not only the number but, at the very least, a pointer and another piece of information saying what class it is. That adds a lot of overhead.
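You can get a feel for that overhead from Python itself. A small sketch, assuming Python 3 (where int and long have been merged into int). The exact byte count depends on the build, so none is assumed here, and describe_int is just an illustrative name.

```python
import sys


def describe_int(n):
    """Return the class of n and the size in bytes of the whole object."""
    # sys.getsizeof reports the full object (reference count, type pointer
    # and the digits themselves), not just the numeric payload.
    return type(n), sys.getsizeof(n)


if __name__ == "__main__":
    cls, size = describe_int(1000000)
    print(cls.__name__, size)
```

A C long long holding the same number is 8 bytes and carries no type information at all.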
But it's also worth noting how xrange works. xrange creates a class instance (LOAD_NAME (xrange), CALL_FUNCTION) that can be iterated over by the for. The for asks that instance for an iterator once (GET_ITER, i.e. __iter__) and will then (basically) delegate to a function call on the iterator's next method every time round (FOR_ITER). There is a function call every loop.
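To make the per-iteration call visible, here is roughly what the for loop does, written out by hand. It's a sketch in Python 3 spelling (range and the built-in next() instead of xrange and the .next() method), and both function names are invented for the example.

```python
def squares_with_for(n):
    """The ordinary loop: the iterator calls are hidden inside `for`."""
    out = []
    for i in range(n):
        out.append(i ** 2)
    return out


def squares_by_hand(n):
    """The same loop with the iterator protocol spelled out."""
    out = []
    it = iter(range(n))      # done once, like GET_ITER
    while True:
        try:
            i = next(it)     # done on every pass, like FOR_ITER
        except StopIteration:
            break            # the iterator is exhausted, leave the loop
        out.append(i ** 2)
    return out
```

Both functions return the same list; the second just shows where the call per iteration comes from.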
Further, every time you want to get or set the variable z or i, it has to look in the locals dictionary (those are the LOAD_NAME and STORE_NAME instructions). This is really slow compared with reading a plain C variable.
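You can see that dictionary directly by running module-level code against a dict of your own. A minimal sketch, assuming Python 3 and a cut-down version of the loop; run_in_namespace is a hypothetical helper, not anything from the question.

```python
# A trimmed-down, Python 3 version of the loop (no printing).
SOURCE = '''
for i in range(value):
    z = i ** 2
'''


def run_in_namespace(value):
    """Run the loop as module-level code and return the dict it used."""
    namespace = {"value": value}
    # At module level, every read or write of i and z goes through this dict.
    exec(compile(SOURCE, "<example>", "exec"), namespace)
    return namespace


if __name__ == "__main__":
    ns = run_in_namespace(4)
    print(ns["i"], ns["z"])
```

After the call, i and z are simply keys in the returned dict, which is where the interpreter was storing them all along.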
Running pure Python code in Cython:
When you run it in Cython (the third example in your question), it compiles to C. But all this C does is tell the CPython virtual machine what to do.
CPython alone: a guy reading from a book, and meticulously carrying out its instructions.
CPython with Cython: a guy shouting instructions to the guy who meticulously carries them out.
It might be a tiny bit faster, but the slow part is still that CPython is slowly doing the work.
Using cythonized code:
What happens when you cdef long long, then?
- Cython knows that xrange is acting on a long long:
  - It knows the loop is valid (so it doesn't have to check that you gave it a list or somesuch)
  - It knows the loop won't overflow (because it's undefined if it does!)
  - It can therefore turn it into a C loop (for (long long index=0; index<copy_of_value; index++) { i = index; ... })
- This avoids the int and long classes, which have a lot of indirection overhead and type checking
- This avoids dictionary lookups. Things are always where you put them on the stack
- For example, i ** 2 is much simpler, as the routine can be inlined (it's always a number, dude), work directly on the integer and ignore overflow
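To see what "ignore overflow" buys and costs, compare Python's ever-growing integers with a value squeezed into 64 bits. This is only an illustration in Python 3 using ctypes. The wraparound shown is what ctypes.c_longlong does, not a promise about Cython's generated C, where signed overflow is undefined. Both function names are made up for the example.

```python
import ctypes


def python_square(i):
    """Square with Python ints: the result grows to as many bits as needed."""
    return i ** 2


def wrapped_square(i):
    """Square squeezed into a signed 64-bit integer."""
    # ctypes does no overflow checking; the value is reduced to 64 bits.
    return ctypes.c_longlong(i * i).value


if __name__ == "__main__":
    n = 3037000500  # smallest n whose square no longer fits in 63 bits
    print(python_square(n), wrapped_square(n))
```

Python has to check for this case on every multiplication and switch to a bigger representation; the fixed-width version skips the check and simply gets a different answer once the value no longer fits.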
So the result ends up being run mostly by C, and only goes to CPython for some cleanup stuff and the print calls.
Make sense?