Working with “external” object identifiers

https://softwareengineering.stackexchange.com/questions/388290

21-02-2021

Question

Note: I originally wrote this question in a way that made it seem I was mainly concerned with memory usage and ways to optimise it. My intention, both originally and now on revisiting the question, is to ask for solutions to the maintainability problem: the object's name/ID is stored both as a key in the associative container and as a field in the object itself, and one has to make sure the two stay in sync.

I have often encountered a situation in which users of a program (not necessarily human, just an external agent) need to be able to reference objects of a class with a certain "external" identifier or name, in order to create, access, modify or delete them. A couple of examples:

a command-line application where end-users refer to specific instances by name, e.g. a calculator that deals with geometrical objects
a GUI with a dynamically created drop-down list where each item corresponds to an instance

This identifier could be an integer, for example, or a string. We'll suppose it's a string for the sake of the argument.

Where should these IDs be stored?

Let me clarify. In order for the program to "link" a certain identifier to the corresponding actual object, it needs to have some sort of associative container with IDs as keys and their corresponding objects as values.

Now, suppose some method of this class needs access to the identifier; e.g. a prettyPrint method that also prints the object's name. In this case, should the identifier be a field of the class? This seems like the obvious solution. But the problem is that now the identifiers are duplicated: as keys in the associative container, and as fields in each instance. That would imply duplicate "management tasks": when adding a new object to the associative container, one would need to specify the identifier as both key and field. It would also entail slightly more memory usage, but this is the less significant of the two problems.
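To make the duplication concrete, here is a minimal Python sketch. The `Shape` class, its `radius` field, and the `shapes` dictionary are hypothetical names chosen for illustration. It shows the identifier being written twice and the key going stale when only the field is changed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Shape:
    # hypothetical example class; `name` is the external identifier
    name: str
    radius: float

    def pretty_print(self) -> str:
        return f"{self.name}: circle with radius {self.radius}"


shapes: Dict[str, Shape] = {}

# The identifier is written twice: once as the key, once as the field.
shapes["c1"] = Shape(name="c1", radius=2.0)

# Renaming only the field leaves the old key behind.
shapes["c1"].name = "big"
stale = shapes["c1"].pretty_print()   # the old key "c1" still finds the object
found_by_new_name = "big" in shapes   # the new name is not a key
```

After the rename, the object reports the name "big" but can only be looked up as "c1", which is exactly the kind of drift the question is about.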

Is there a general approach to this problem? (Or am I fussing over nothing?)

One approach would be to have the associative container keep references to the ID field as keys. But I don't know how to do that, at least in a language like C++.

Solution 3

A different but perhaps easier approach to dealing with the duplication (from the perspective of maintainability) would be to make a custom class for the associative container that just makes sure to keep everything in sync.

Python example:

from typing import Dict, Protocol


class Named(Protocol):
    name: str


class NameToObjectMapping:
    def __init__(self):
        self._dict: Dict[str, Named] = {}

    def __getitem__(self, name: str) -> Named:
        return self._dict[name]

    def add(self, obj: Named):
        # the key is always taken from the object itself
        self._dict[obj.name] = obj

    def rename(self, obj: Named, new_name: str):
        # drop the old key, update the field, then re-register under the new name
        del self._dict[obj.name]
        obj.name = new_name
        self.add(obj)

The downside to this is that this would force the user to make any renamings through this mapping, and not directly on the objects themselves.
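One way to enforce that restriction, rather than rely on convention, is to make the name read-only on the object, so that a direct rename fails loudly. The following is only a sketch under that assumption. `Shape` and `Registry` are hypothetical names, and the duplicate-name checks are an addition not present in the example above.

```python
from typing import Dict


class Shape:
    """Hypothetical named object whose name is read-only from the outside."""

    def __init__(self, name: str) -> None:
        self._name = name

    @property
    def name(self) -> str:
        return self._name


class Registry:
    """Hypothetical mapping that performs every rename itself."""

    def __init__(self) -> None:
        self._items: Dict[str, Shape] = {}

    def __getitem__(self, name: str) -> Shape:
        return self._items[name]

    def __contains__(self, name: str) -> bool:
        return name in self._items

    def add(self, obj: Shape) -> None:
        if obj.name in self._items:
            raise KeyError(f"name already in use: {obj.name!r}")
        self._items[obj.name] = obj

    def rename(self, obj: Shape, new_name: str) -> None:
        if new_name in self._items:
            raise KeyError(f"name already in use: {new_name!r}")
        del self._items[obj.name]
        obj._name = new_name  # only the registry writes the private field
        self._items[new_name] = obj


registry = Registry()
circle = Shape("circle1")
registry.add(circle)
registry.rename(circle, "unit_circle")

try:
    circle.name = "oops"  # no setter is defined, so this raises AttributeError
    direct_rename_allowed = True
except AttributeError:
    direct_rename_allowed = False
```

The trade-off is tighter coupling: the registry reaches into a private field of the object, which is acceptable only if the two classes are designed together.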

By the way, I do consider this solution to be distinct enough from the other one to warrant a different answer: this is more about making the problem easier to deal with than actually solving it. Although perhaps it is the most straightforward approach.

OTHER TIPS

Is there a general way to overcome this problem? (Or am I fussing over nothing?)

Yes, you are too concerned about this.

So you are duplicating data. As long as you have enough RAM available, this isn't a big deal. Keep a copy of the IDs as keys within the collection, and keep them as fields on the objects themselves.

Optimizations like this should only be entertained when you have measured a performance issue related to CPU usage, or the capacity of available disk or RAM storage — and the key here is a measured performance problem.

This sounds like premature optimization. Go for the solution that is easiest to implement and maintain.
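In Python specifically, the memory cost can be even smaller than it looks. If the key is taken from the object's own field, the dictionary and the object refer to the same string object, so only an extra reference is stored. The sketch below checks this. `Shape` and `registry` are hypothetical names, and the observation is about Python's reference semantics, not about languages such as C++ where a string key would typically be copied.

```python
from typing import Dict


class Shape:
    def __init__(self, name: str) -> None:
        self.name = name


registry: Dict[str, Shape] = {}

# Build the name at run time so the check does not depend on literal sharing.
shape = Shape("".join(["unit", "_circle"]))
registry[shape.name] = shape

key = next(iter(registry))
same_object = key is shape.name  # identity, not just equality
```

Here `same_object` is true: the key and the field are one string, referenced twice.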

As explained in Greg's answer, optimization-wise, there is nothing to worry about.

Now, regarding the maintainability problem (i.e. having to keep two copies of the same data in sync), a solution would be to use references to the object's ID/name field as keys to the associative container, if the language allows it.

Note that since strings are usually immutable objects, one may need to wrap them in a mutable object and store a reference to that instead. The Python example shown in the question is wrong; this would be a possible correct version:

# class of objects that need ID or name
class MyClass:
    # mutable class to wrap the immutable string value
    class Name:
        def __init__(self, value: str):
            self.value = value
        
        # enable direct string comparisons
        def __eq__(self, other): return self.value == other
        def __lt__(self, other): return self.value < other

    def __init__(self, name: str):
        self._name = MyClass.Name(name)
    
    @property
    def name(self) -> str:
        return self._name.value

    @name.setter
    def name(self, value: str):
        self._name.value = value


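# associative container from names to objects: a list of (key, object) pairs,
# searched linearly, because the mutable Name wrapper cannot be a dict key
my_instances = []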

# add an object
obj = MyClass("Bob")
my_instances.append((obj._name, obj))  # use wrapper object as key

# retrieve an object
print(next(o for n, o in my_instances if n == "Bob"))

# change an object's name
obj.name = "Tony"
print(next(o for n, o in my_instances if n == "Tony"))  # works fine
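The example above uses a list of pairs instead of a dictionary, and the short sketch below shows why. A class that defines `__eq__` without also defining `__hash__` is unhashable in Python, so instances of such a wrapper are rejected as dictionary keys. The `Name` class here is a stripped-down stand-in for the wrapper above.

```python
class Name:
    """Stripped-down stand-in for the mutable name wrapper."""

    def __init__(self, value: str) -> None:
        self.value = value

    def __eq__(self, other: object) -> bool:
        return self.value == other


try:
    lookup = {Name("Bob"): "some object"}
    usable_as_key = True
except TypeError:
    # defining __eq__ without __hash__ makes instances unhashable
    usable_as_key = False
```

The price of the list-based version is a linear scan on every lookup, which is fine for a handful of objects but does not scale the way a dictionary does.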

Licensed under: CC-BY-SA with attribution

Not affiliated with softwareengineering.stackexchange