Best way to store fixed-keys key:value datasets that are accessed by keys in python?

https://stackoverflow.com/questions/14301511

15-01-2022

Problem

What I want is to be able to handle sets of data that have a fixed set of keys. All keys are strings. The data will never be edited. I know this can be done with normal dicts like so:

data_a = {'key1': 'data1a', 'key2': 'data2a', 'key3': 'data3a'}
data_b = {'key1': 'data1b', 'key2': 'data2b', 'key3': 'data3b'}
data_c = {'key1': 'data1c', 'key2': 'data2c', 'key3': 'data3c'}

The values must be retrievable by key, like so:

data_a['key1'] # Returns 'data1a'

However, this seems to waste memory (dictionaries apparently keep about a third of their hash table empty, and every dict stores the keys again), and it is tedious to write, since I have to type the same keys over and over in my code. I also risk accidentally changing something in the datasets.
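If you stay with dicts, two of these complaints (retyping the keys and accidental edits) can be eased with the standard library. A minimal sketch, assuming a hypothetical helper named make_record: it zips one shared key tuple with each record's values and wraps the result in types.MappingProxyType, a read-only view. It does not save memory; each record is still a full dict plus a small proxy object.

```python
from types import MappingProxyType

# Shared key tuple, typed once (names follow the question's example).
KEYS = ('key1', 'key2', 'key3')


def make_record(*values):
    """Hypothetical helper: pair the shared KEYS with values and return a read-only view."""
    if len(values) != len(KEYS):
        raise ValueError(f"expected {len(KEYS)} values, got {len(values)}")
    # MappingProxyType is a read-only view; item assignment on it raises TypeError.
    return MappingProxyType(dict(zip(KEYS, values)))


data_a = make_record('data1a', 'data2a', 'data3a')
print(data_a['key1'])  # data1a

try:
    data_a['key1'] = 'changed'
except TypeError as exc:
    print('read-only:', exc)
```

Because the underlying dict is only reachable through the proxy here, the record is effectively frozen.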

My current solution is to have a set of keys stored in a tuple first, then store the data as tuples too. It looks like this:

keys = ('key1', 'key2', 'key3')
data_a = ('data1a', 'data2a', 'data3a')
data_b = ('data1b', 'data2b', 'data3b')
data_c = ('data1c', 'data2c', 'data3c')

To retrieve data, I would do this:

data_a[keys.index('key1')] # Returns 'data1a'
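keys.index() scans the key tuple from the start on every lookup. A variation on the same idea (a sketch; the name KEY_INDEX is made up for illustration) builds a single key-to-position dict once and shares it across all records, so each record stays a plain tuple while each lookup becomes a dict access:

```python
keys = ('key1', 'key2', 'key3')
data_a = ('data1a', 'data2a', 'data3a')
data_b = ('data1b', 'data2b', 'data3b')

# Built once and shared by every record: key name -> position in the tuple.
KEY_INDEX = {key: i for i, key in enumerate(keys)}

print(data_a[KEY_INDEX['key1']])  # data1a
print(data_b[KEY_INDEX['key3']])  # data3b
```

An unknown key now raises KeyError from the dict instead of ValueError from tuple.index().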

Then I learned about namedtuples, which seem to do what I need:

import collections
Data = collections.namedtuple('Data', ('key1', 'key2', 'key3'))
data_a = Data('data1a', 'data2a', 'data3a')
data_b = Data('data1b', 'data2b', 'data3b')
data_c = Data('data1c', 'data2c', 'data3c')

However, it appears I can't simply look up a value by key with square brackets. Instead, I have to use getattr, which doesn't seem very intuitive:

getattr(data_a,'key1') # Returns 'data1a'
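namedtuple does not accept string subscripts out of the box, but a subclass can add them. The following is a sketch of one common pattern, not something namedtuple provides itself: string keys are checked against the documented _fields attribute and fetched with getattr, while integer indexes and slices fall through to normal tuple behavior. Setting __slots__ = () in the subclass keeps instances from gaining a per-instance __dict__.

```python
from collections import namedtuple

_DataBase = namedtuple('Data', ('key1', 'key2', 'key3'))


class Data(_DataBase):
    # Empty __slots__: no per-instance __dict__, so instances stay tuple-sized.
    __slots__ = ()

    def __getitem__(self, key):
        # String keys are looked up by field name; ints and slices behave like a tuple.
        if isinstance(key, str):
            if key not in self._fields:
                raise KeyError(key)
            return getattr(self, key)
        return super().__getitem__(key)


data_a = Data('data1a', 'data2a', 'data3a')
print(data_a['key1'])  # data1a
print(data_a[0])       # data1a
```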

My criteria are memory efficiency first, then performance. Of these 3 methods, which would be the best way to do things? Or am I missing something, and is there a more pythonic idiom to get what I want?
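With memory first and speed second as the criteria, measuring is more reliable than guessing. A rough sketch using only the standard library (sys.getsizeof and timeit); the field names are the question's, the variable names are illustrative. The printed numbers vary by Python version and platform, so none are shown here.

```python
import sys
import timeit
from collections import namedtuple

keys = ('key1', 'key2', 'key3')
values = ('data1a', 'data2a', 'data3a')
Data = namedtuple('Data', keys)

candidates = {
    'dict': dict(zip(keys, values)),
    'tuple': values,
    'namedtuple': Data(*values),
}

# sys.getsizeof is shallow: it reports the container only, not the strings it points to.
for label, obj in candidates.items():
    print(f"{label:>10}: {sys.getsizeof(obj)} bytes")

# Rough timings for looking up one field (absolute numbers depend heavily on the machine).
lookups = {
    'dict': "obj['key1']",
    'tuple': "obj[keys.index('key1')]",
    'namedtuple': "obj.key1",
}
for label, stmt in lookups.items():
    seconds = timeit.timeit(stmt, globals={'obj': candidates[label], 'keys': keys}, number=100_000)
    print(f"{label:>10}: {seconds:.4f} s per 100k lookups")
```

The per-record container size is what gets multiplied by the number of records; the key tuple (or namedtuple class) is stored once, while each dict carries its own hash table.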

EDIT: I've also recently learned about __slots__, which apparently handles key:value pairs more efficiently while consuming about the same(?) amount of memory. Would an implementation along those lines be a suitable alternative to namedtuples?

Solution

Yes, __slots__ should do.

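class Data:
    # Only these attribute names are allowed; instances get no per-instance __dict__.
    __slots__ = ["key1", "key2"]

    def __init__(self, k1, k2):
        self.key1, self.key2 = k1, k2

    def __getitem__(self, key):
        # Accept only declared slot names, so e.g. "__init__" raises KeyError
        # instead of returning a method.
        if key not in self.__slots__:
            raise KeyError(f"{key!r} not found")
        return getattr(self, key)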

Let's try that out:

>>> Data(1, 2)["key1"]
1

The key not in self.__slots__ condition is a sanity check: without it, getattr would happily return other attributes too, such as the bound __init__ method for Data(1, 2)["__init__"].
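To see what the check guards against, here is a self-contained copy of essentially the class from this answer, followed by a few probes. It also shows a side effect of __slots__: there is no per-instance __dict__, so unknown attributes cannot be attached.

```python
class Data:
    __slots__ = ["key1", "key2"]

    def __init__(self, k1, k2):
        self.key1, self.key2 = k1, k2

    def __getitem__(self, key):
        # Sanity check: only declared slot names count as keys.
        if key not in self.__slots__:
            raise KeyError(f"{key!r} not found")
        return getattr(self, key)


d = Data(1, 2)

# Plain getattr accepts any attribute name, including methods such as __init__.
print(getattr(d, '__init__'))

# Subscripting goes through the check, so the same name is rejected.
try:
    d['__init__']
except KeyError as exc:
    print('KeyError:', exc)

# No per-instance __dict__, so new attributes are refused.
try:
    d.extra = 3
except AttributeError as exc:
    print('AttributeError:', exc)
```

Unlike a namedtuple, this class still allows d.key1 = ... to rebind an existing slot, so on its own it does not protect against accidental edits.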

Other tips

namedtuple seems like the right choice. If your "keys" are fixed, you don't need getattr; you can use the normal attribute-access syntax:

In [1]: %paste
import collections
Data = collections.namedtuple('Data', ('key1', 'key2', 'key3'))
data_a = Data('data1a', 'data2a', 'data3a')
data_b = Data('data1b', 'data2b', 'data3b')
data_c = Data('data1c', 'data2c', 'data3c')

## -- End pasted text --

In [2]: data_a.key1
Out[2]: 'data1a'
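Since one goal in the question is not changing the data by accident, it is worth noting that namedtuple instances are immutable. A short sketch (field values taken from the question): assigning to a field, or attaching a new attribute, raises AttributeError, and the documented _replace() method returns a modified copy instead.

```python
import collections

Data = collections.namedtuple('Data', ('key1', 'key2', 'key3'))
data_a = Data('data1a', 'data2a', 'data3a')

# Fields are read-only: assigning to one raises AttributeError.
try:
    data_a.key1 = 'changed'
except AttributeError as exc:
    print('read-only field:', exc)

# namedtuple classes define __slots__ = (), so new attributes can't be attached either.
try:
    data_a.extra = 'x'
except AttributeError as exc:
    print('no new attributes:', exc)

# "Changing" a record means building a new one; the original is untouched.
updated = data_a._replace(key1='changed')
print(data_a.key1, updated.key1)  # data1a changed
```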

This usage is also demonstrated in the docs:

>>> # Basic example
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(11, y=22)     # instantiate with positional or keyword arguments
>>> p[0] + p[1]             # indexable like the plain tuple (11, 22)
33
>>> x, y = p                # unpack like a regular tuple
>>> x, y
(11, 22)
>>> p.x + p.y               # fields also accessible by name
33
>>> p                       # readable __repr__ with a name=value style
Point(x=11, y=22)

You don't usually need getattr when the attribute name is a constant; it's only needed when the name is known only at runtime:

In [3]: attr = input('Attribute: ')
Attribute: key3

In [4]: getattr(data_b, attr)
Out[4]: 'data3b'
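When the name comes from user input, as in the input() example, getattr will also return anything else the object has, for example the tuple method count. A sketch of a hypothetical lookup() helper that accepts only declared field names, using namedtuple's documented _fields attribute; _asdict() is shown as an alternative when a real dict is convenient:

```python
import collections

Data = collections.namedtuple('Data', ('key1', 'key2', 'key3'))
data_b = Data('data1b', 'data2b', 'data3b')


def lookup(record, name):
    """Hypothetical helper: fetch a field by a runtime name, accepting only declared fields."""
    # _fields lists the field names; checking it keeps getattr from
    # returning methods such as 'count' or '_asdict'.
    if name not in record._fields:
        raise KeyError(name)
    return getattr(record, name)


print(lookup(data_b, 'key3'))    # data3b
print(data_b._asdict()['key3'])  # data3b (a regular dict built on demand)
```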

License: CC-BY-SA with attribution

Not affiliated with StackOverflow