Question

I learnt that in some immutable classes, __new__ may return an existing instance - this is what the int, str and tuple types sometimes do for small values.

But why do the following two snippets behave differently?

With a space at the end:

>>> a = 'string '
>>> b = 'string '
>>> a is b
False

Without a space:

>>> c = 'string'
>>> d = 'string'
>>> c is d
True

Why does the space make a difference?


Solution

This is a quirk of how the CPython implementation chooses to cache string literals. String literals with the same contents may refer to the same string object, but they don't have to. 'string' happens to be automatically interned, while 'string ' isn't, because 'string' contains only characters allowed in a Python identifier. I have no idea why that's the criterion they chose, but it is. The behavior may differ between Python versions and implementations.

From the CPython 2.7 source code, stringobject.h, line 28:

Interning strings (ob_sstate) tries to ensure that only one string object with a given value exists, so equality tests can be one pointer comparison. This is generally restricted to strings that "look like" Python identifiers, although the intern() builtin can be used to force interning of any string.
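The quoted comment's point, that interning turns an equality test into a single pointer comparison, can be modeled in pure Python with a dictionary acting as the intern table. This is only a sketch of the idea: my_intern and _table are hypothetical names, not CPython's internal table.

```python
_table = {}  # hypothetical global intern table: value -> canonical object


def my_intern(s):
    """Return the canonical object for s, registering s if it is new."""
    # setdefault stores s the first time a value is seen, and otherwise
    # returns the object that was stored earlier for an equal value
    return _table.setdefault(s, s)


# Build equal strings at runtime so the compiler cannot merge them as constants
a = "".join(["string", " "])
b = "".join(["string", " "])

a = my_intern(a)
b = my_intern(b)
# Both names now refer to the same object, so an identity check is enough
print(a is b)  # True
```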

You can see the code that does this in Objects/codeobject.c:

/* Intern selected string constants */
for (i = PyTuple_Size(consts); --i >= 0; ) {
    PyObject *v = PyTuple_GetItem(consts, i);
    if (!PyString_Check(v))
        continue;
    if (!all_name_chars((unsigned char *)PyString_AS_STRING(v)))
        continue;
    PyString_InternInPlace(&PyTuple_GET_ITEM(consts, i));
}
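A rough Python transliteration of that loop may make the selection criterion easier to see. NAME_CHARS below is an assumption modeled on the character set the C helper all_name_chars checks in CPython 2.7 (ASCII letters, digits and underscore), and select_for_interning is a hypothetical name; the sketch only reports which constants would be interned rather than interning them.

```python
import string

# Assumed equivalent of CPython 2.7's NAME_CHARS: ASCII letters, digits, underscore
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def all_name_chars(s):
    """True if every character of s is an identifier character (mirrors the C helper)."""
    return all(ch in NAME_CHARS for ch in s)


def select_for_interning(consts):
    """Return the indices of string constants the C loop would intern."""
    selected = []
    # Walk the constants from the end, as the C loop does
    for i in range(len(consts) - 1, -1, -1):
        v = consts[i]
        if not isinstance(v, str):
            continue  # non-string constants are skipped
        if not all_name_chars(v):
            continue  # e.g. 'string ' fails because of the space
        selected.append(i)
    return selected


print(select_for_interning(("string", "string ", 42, "abc_123")))  # [3, 0]
```

Note that this test is not the same as str.isidentifier(): a constant such as '123' passes the character check even though it is not a valid identifier.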

Also, note that interning is a separate process from the merging of string literals by the Python bytecode compiler. If you let the compiler compile the a and b assignments together, e.g. by placing them in a module or inside an if True: block, you will find that a and b refer to the same string object.
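One way to observe the compiler-side merging without writing a module file is to compile both assignments as a single unit with the built-in compile() and run the result with exec(). This is a CPython-specific observation, not a language guarantee.

```python
# Both assignments are compiled into one code object, so the compiler
# stores the constant 'string ' once and both names load that same object.
source = "a = 'string '\nb = 'string '\n"
ns = {}
exec(compile(source, "<together>", "exec"), ns)
print(ns["a"] is ns["b"])  # True on CPython

# Compiling each assignment separately, roughly as the interactive prompt
# does, gives two code objects with their own constants, so this may be False.
ns2 = {}
exec(compile("a = 'string '", "<first>", "exec"), ns2)
exec(compile("b = 'string '", "<second>", "exec"), ns2)
print(ns2["a"] is ns2["b"])
```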

OTHER TIPS

This behavior is not consistent and, as others have mentioned, depends on the Python implementation and version being executed. For a deeper discussion, see this question.

If you want to make sure that the same object is used, you can force interning with the appropriately named intern builtin:

intern(...)
    intern(string) -> string

    ``Intern'' the given string.  This enters the string in the (global)
    table of interned strings whose purpose is to speed up dictionary lookups.
    Return the string itself or the previously interned string object with the
    same value.

>>> a = 'string '
>>> b = 'string '
>>> id(a) == id(b)
False
>>> a = intern('string ')
>>> b = intern('string ')
>>> id(a) == id(b)
True

Note that in Python 3, intern is no longer a builtin; you have to import it explicitly with from sys import intern.
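For Python 3, here is a sketch of the same experiment using sys.intern. The strings are built at runtime with str.join so that neither compiler constant merging nor the automatic interning of identifier-like literals can make them identical by accident.

```python
from sys import intern

# Equal values, built at runtime as two distinct objects
a = "".join(["string", " "])
b = "".join(["string", " "])
print(a == b, a is b)  # True False on CPython

# intern() returns the single canonical object for this value
a = intern(a)
b = intern(b)
print(a is b)  # True
```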

Licensed under: CC-BY-SA with attribution