Is there a Python module for handling Python object addresses?

https://stackoverflow.com/questions/3684151

02-10-2019
|

Question

(When I say "object address", I mean the string that you type in Python to access an object. For example 'life.State.step'. Most of the time, all the objects before the last dot will be packages/modules, but in some cases they can be classes or other objects.)

In my Python project I often have the need to play around with object addresses. Some tasks that I have to do:

Given an object, get its address.
Given an address, get the object, importing any needed modules on the way.
Shorten an object's address by getting rid of redundant intermediate modules. (For example, 'life.life.State.step' may be the official address of an object, but if 'life.State.step' points at the same object, I'd want to use it instead because it's shorter.)
Shorten an object's address by "rooting" a specified module. (For example, 'garlicsim_lib.simpacks.prisoner.prisoner.State.step' may be the official address of an object, but I assume that the user knows where the prisoner package is, so I'd want to use 'prisoner.prisoner.State.step' as the address.)

Is there a module/framework that handles things like that? I wrote a few utility modules to do these things, but if someone has already written a more mature module that does this, I'd prefer to use that.

One note: Please, don't try to show me a quick implementation of these things. It's more complicated than it seems, there are plenty of gotchas, and any quick-n-dirty code will probably fail for many important cases. These kind of tasks call for battle-tested code.

UPDATE: When I say "object", I mostly mean classes, modules, functions, methods, stuff like these. Sorry for not making this clear before.

Solution 2

I released the address_tools module which does exactly what I asked for.

Here is the code. Here are the tests.

It's part of GarlicSim, so you can use it by installing garlicsim and doing from garlicsim.general_misc import address_tools. Its main functions are describe and resolve, which are parallel to repr and eval. The docstrings explain everything about how these functions work.

There is even a Python 3 version on the Python 3 fork of GarlicSim. Install it if you want to use address_tools on Python 3 code.

OTHER TIPS

Short answer: No. What you want is impossible.

The long answer is that what you think of as the "address" of an object is anything but. life.State.step is merely one of the ways to get a reference to the object at that particular time. The same "address" at a later point can give you a different object, or it could be an error. What's more, this "address" of yours depends on the context. In life.State.step, the end object depends not just on what life.State and life.State.step are, but what object the name life refers to in that namespace.

Specific answers to your requests:

The end object has no way whatsoever of finding out how you referred to it, and neither has any code that you give the object to. The "address" is not a name, it's not tied to the object, it's merely an arbitrary Python expression that results in an object reference (as all expressions do.) You can only make this work, barely, with specific objects that aren't expected to move around, such as classes and modules. Even so, those objects can move around, and frequently do move around, so what you attempt is likely to break.
As mentioned, the "address" depends on many things, but this part is fairly easy: __import__() and getattr() can give you these things. They will, however, be extremely fragile, especially when there's more involved than just attribute access. It can only remotely work with things that are in modules.
"Shortening" the name requires examining every possible name, meaning all modules and all local names, and all attributes of them, recrusively. It would be a very slow and time-consuming process, and extremely fragile in the face of anything with a __getattr__ or __getattribute__ method, or with properties that do more than return a value.
is the same thing as 3.

For points 3 and 4, I guess that you are looking for facilities like

from life import life  # life represents life.life
from garlicsim_lib.simpacks import prisoner

However, this is not recommended, as it makes it harder for you or people who read your code to quickly know what prisoner represents (where module does it come from? you have to look at the beginning of the code to get this information).

For point 1, you can do:

from uncertainties import UFloat

print UFloat.__module__  # prints 'uncertainties'

import sys
module_of_UFloat = sys.modules[UFloat.__module__]

For point 2, given the string 'garlicsim_lib.simpacks.prisoner', you can get the object it refers to with:

obj = eval('garlicsim_lib.simpacks.prisoner')

This supposes that you have imported the module with

import garlicsim_lib  # or garlicsim_lib.simpacks

If you even want this to be automatic, you can do something along the lines of

import imp

module_name = address_string.split('.', 1)[0]
mod_info = imp.find_module(module_name)
try:
    imp.load_module(module_name, *mod_info)
finally:
    # Proper closing of the module file:
    if mod_info[0] is not None:
        mod_info[0].close()

This works only in the simplest cases (garlicsim_lib.simpacks need to be available in garlicsim_lib, for instance).

Coding things this way is, however, highly unusual.

Twisted has #2 as twisted/python/reflect.py . You need something like it for making a string-based configuration system, like with Django's urls.py configuration.

Take a look at the code and the version control log to see what they had to do to make it work - and fail! - the right way.

The other things you are looking for place enough restrictions on the Python environment that there is no such thing as a general purpose solution.

Here's something which somewhat implements your #1

>>> import pickle
>>> def identify(f):
...   name = f.__name__
...   module_name = pickle.whichmodule(f, name)
...   return module_name + "." + name
... 
>>> identify(math.cos)
'math.cos'
>>> from xml.sax.saxutils import handler
>>> identify(handler)
'__main__.xml.sax.handler'
>>>

Your #3 is underdefined. If I do

__builtin__.step = path.to.your.stap

then should the search code find it as "step"?

The simplest implementation I can think of simply searches all modules and looks for top-level elements which are exactly what you want

>>> import sys
>>> def _find_paths(x):
...   for module_name, module in sys.modules.items():
...     if module is None:
...         continue
...     for (member_name, obj) in module.__dict__.items():
...       if obj is x:
...         yield module_name + "." + member_name
... 
>>> def find_shortest_name_to_object(x):
...   return min( (len(name), name) for name in _find_paths(x) )[1]
... 
>>> find_shortest_name_to_object(handler)
'__builtin__._'
>>> 5
5
>>> find_shortest_name_to_object(handler)
'xml.sax.handler'
>>>

Here you can see that 'handler' was actually in _ from the previous expression return, making it the shortest name.

If you want something else, like recursively searching all members of all modules, then just code it up. But as the "_" example shows, there will be surprises. Plus, this isn't stable, since importing another module might make another object path available and shorter.

That's why people say over and over again that what you want isn't actually useful for anything, and that's why there's no modules for it.

And as for your #4, how in the world will any general package cater to those naming needs?

In any case, you wrote

Please, don't try to show me a quick implementation of these things. It's more complicated than it seems, there are plenty of gotchas, and any quick-n-dirty code will probably fail for many important cases. These kind of tasks call for battle-tested code.

so don't think of my examples as solutions but as examples of why what you're asking for makes little sense. It's such a fragile solution space adn the few who venture there (mostly for curiosity) have such different concerns that a one-off custom solution is the best thing. A module for most of these makes no sense, and if it did make sense the explanation of what the module does would probably be longer than the code.

And hence the answer to your question is "no, there are no such modules."

What makes your question even more confusing is that the C implementation of Python already defines an "object address". The docs for id() say:

CPython implementation detail: This is the address of the object.

What you're looking for is the name, or the path to the object. Not the "Python object address."

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow