Question

As you may have noticed before, CPython sometimes stores a single copy of identical immutable objects.

e.g.

>>> a = "hello"
>>> b = "hello"
>>> a is b
True

>>> a, b = 7734, 7734
>>> a is b
True

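It appears that the lookup into what I assume is a heap of shared objects is performed after the literals have been parsed into values. Here 07734 is an octal literal equal to 4060, while 017066 is octal for 7734: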

>>> a, b = 7734, 07734
>>> a is b
False

>>> a, b = 7734, 017066
>>> a is b
True

Is there any way to introspect the interpreter and print out this supposed heap of immutable objects?

Solution

No. Interned objects are maintained in a range of locations, and no single method exists to list them all.

  • Strings can be interned, as you discovered, and you can intern strings yourself by using the intern() function (sys.intern() in Python 3).
  • Small integers between -5 and 256 are interned.
  • Tuples are reused; the empty tuple (()) is a singleton, and 2000 each of tuple sizes 1 through to 20 are kept cached for recycling. (Just the tuple objects, not the contents).
  • None is a singleton, as are Ellipsis, NotImplemented, True and False.
  • As of Python 3.3, instance __dict__ dictionaries can share keys to save on memory.
  • The compiler can mark immutable (and in certain circumstances, mutable) source code literals as constants, store them as such with the bytecode and re-use them each time the bytecode is run. This applies to strings, numbers, tuples, lists (if used in an in test, where the list is stored as a tuple) and, as of Python 3.2, sets (again, when used with in, where the set is stored as a frozenset).

There may be more I haven't discovered yet.

These optimizations all help to avoid too much heap churn. Apart from None, Ellipsis, NotImplemented, True and False being singletons, they are all CPython-specific optimizations; they are not part of the Python language definition itself.
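There is no single registry to dump, but a few of the caches listed above can be probed one at a time. The following is a minimal sketch, assuming Python 3 on CPython; the function name sample and the "<demo>" filename are made up for illustration, and the results noted in the comments are implementation details rather than language guarantees.

```python
# Build the integers at run time so that the compiler's constant
# handling is not what is being observed.
small_a, small_b = int("256"), int("256")
large_a, large_b = int("7734"), int("7734")
print(small_a is small_b)   # CPython: True, 256 falls in the small-int cache
print(large_a is large_b)   # CPython: normally False, two separate objects

# The empty tuple is shared no matter how it is produced.
empty_a, empty_b = tuple(), tuple([])
print(empty_a is empty_b)   # CPython: True

# Constants stored alongside the bytecode can be inspected per code object.
def sample():
    x = 7734
    y = "hello"
    return x, y

print(sample.__code__.co_consts)   # includes 7734 and 'hello'

# A list literal on the right-hand side of `in` is typically stored
# as a tuple constant by the CPython compiler.
membership = compile("x in ['a', 'b', 'c']", "<demo>", "eval")
print(membership.co_consts)
```

Inspecting co_consts only shows the constants of one code object at a time, so it is a partial view and not a listing of everything the interpreter has cached.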

OTHER TIPS

It's a little more complicated than you make it out to be. For instance, in your examples with large integers, the same object is not reused when the uses aren't part of the same expression.

>>> a = 7734
>>> b = 7734
>>> a is b
False
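The same difference can be reproduced outside the interactive prompt with compile(), which makes the compilation boundaries explicit. This is a sketch assuming Python 3 on CPython; the namespace names and the "<demo>" filename are arbitrary, and the result for the separately compiled statements is an implementation detail that may vary between builds.

```python
# One compilation unit: both assignments live in the same code object,
# and CPython stores each distinct constant only once in co_consts.
together = compile("a = 7734\nb = 7734\nsame = a is b", "<demo>", "exec")
ns_together = {}
exec(together, ns_together)
print(together.co_consts.count(7734))   # 1
print(ns_together["same"])              # True

# Two compilation units, which is what happens when the two
# assignments are typed as separate lines at the prompt.
ns_apart = {}
exec(compile("a = 7734", "<demo>", "exec"), ns_apart)
exec(compile("b = 7734", "<demo>", "exec"), ns_apart)
print(ns_apart["a"] == ns_apart["b"])   # True, the values are equal
print(ns_apart["a"] is ns_apart["b"])   # CPython: normally False
```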

On the other hand, as your first example shows, this does work with strings...but not all strings.

>>> a = "this string includes spaces"
>>> b = "this string includes spaces"
>>> a is b
False

The following objects are actually interned by default: small integers, the empty tuple, and strings that look like Python identifiers. What you're seeing with large integers and other immutable objects is an optimization that applies because they're used in the same expression: the expression is compiled as one unit, and the compiler stores each distinct constant in that unit only once.
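To see that two equal strings can be distinct objects, and that explicit interning merges them, the strings can be built at run time so that compile-time constant sharing does not interfere. A minimal sketch, assuming Python 3 (where sys.intern() replaces the Python 2 intern() builtin); the variable names are illustrative only.

```python
import sys

parts = ["this string ", "includes spaces"]

# Each join() call builds a fresh string object at run time.
a = "".join(parts)
b = "".join(parts)
print(a == b)    # True, same contents
print(a is b)    # CPython: normally False, two separate objects

# sys.intern() returns the one canonical object for a given string value.
a_interned = sys.intern(a)
b_interned = sys.intern(b)
print(a_interned is b_interned)   # True
```

This is why identity checks with is are unreliable for strings and numbers; == is the right comparison unless object identity itself is the question.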

Licensed under: CC-BY-SA with attribution