Question

I wrote a simple function that receives something that implements .index() and a list of characters to check for.

My assumption was that since strings and tuples are both immutable they would have similar performance (or at the least, tuple would outperform list). Instead, tuple seems equivalent to list. Why is that?

from string import ascii_lowercase
from random import randint
from timeit import Timer


def check_index(st, guesses):
    for i in guesses:
        st.index(i)

if __name__ == "__main__":
    num_checks = 10
    lst = [n for n in ascii_lowercase]
    st = ascii_lowercase
    tup = tuple(lst)
    guesses = [ascii_lowercase[randint(0, len(ascii_lowercase)-1)]
               for n in xrange(num_checks)]

    def run_string():
        check_index(st, guesses)

    def run_list():
        check_index(lst, guesses)

    def run_tuple():
        check_index(tup, guesses)

    t2 = Timer(run_list)
    print "List", t2.timeit()
    t3 = Timer(run_tuple)
    print "Tuple", t3.timeit()
    t = Timer(run_string)
    print "String", t.timeit()

Sample runs (Python 2.7.6)

List 5.26431703568
Tuple 5.28769207001
String 3.16058015823

List 5.30263400078
Tuple 5.17412590981
String 3.17718791962

List 5.21962976456
Tuple 5.35261583328
String 3.22652792931
Was it helpful?

Solution

tl;dr: String index calls have a lot of optimizations under the hood, and don't have to do "rich comparisons". Both lists and tuples do have to do them; mutability doesn't matter.


If you open up stringobject.c, you can see that index calls string_find_internal, which calls stringlib_find_slice, which calls stringlib_find, which calls fastsearch.

fastsearch links to this page in its comments, check it out if you're interested. It has some interesting bits on how Boyer-Moore substring searching is implemented in python. But that's not important in your 1-character substring search, which fastsearch explicitly special-cases:

    } else if (mode == FAST_SEARCH) {
        for (i = 0; i < n; i++)
            if (s[i] == p[0])
                return i;
    }

Very tight loop, and tight loops are pretty dang fast in C.

Okay, that covers strings. What about tuples and lists?

It's far simpler this time; this is the loop that both types have to do:

for (i = start; i < stop && i < Py_SIZE(self); i++) {
    int cmp = PyObject_RichCompareBool(self->ob_item[i], v, Py_EQ);
    if (cmp > 0)
        return PyInt_FromSsize_t(i);

Instead of just checking equality with ==, this loop has to do PyObject_RichCompareBool every iteration through the loop. I won't get into the code in too many specifics (check out object.c if you're curious), but essentially this has to do some type checking, then see if the types implement rich comparisons, then call that comparison function, check if it returned the NotImplemented singleton, and finally return the comparison.

So, that loop is the bulk of the work that's done in an index call, and both lists and tuples have to do it. Strings get to cheat.

OTHER TIPS

index compares every element with the given object. Since tuples stores like lists python objects, the comparision takes the same time. Strings are faster, because comparing characters is faster than comparing objects.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top