Question

What I want is to be able to handle sets of data that have a fixed set of keys. All keys are strings. The data will never be edited. I know this can be done with normal dicts like so:

data_a = {'key1': 'data1a', 'key2': 'data2a', 'key3': 'data3a'}
data_b = {'key1': 'data1b', 'key2': 'data2b', 'key3': 'data3b'}
data_c = {'key1': 'data1c', 'key2': 'data2c', 'key3': 'data3c'}

Values must be retrievable by key, like this:

data_a['key1'] # Returns 'data1a'

However, this looks like a waste of memory (dictionaries apparently keep about a third of their hash table empty, and each dictionary keeps its own table of the same keys). It is also tedious to write, since I have to type the same keys over and over. I also risk accidentally changing something in the datasets.
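To put numbers on the memory concern, here is a rough measurement sketch. It assumes CPython, where sys.getsizeof reports only the container itself (not the key and value strings it points to). The exact byte counts vary between versions, so none are quoted here.

```python
import sys
import collections

# Sample record with three fixed keys, mirroring the question's data_a.
as_dict = {'key1': 'data1a', 'key2': 'data2a', 'key3': 'data3a'}
as_tuple = ('data1a', 'data2a', 'data3a')
Data = collections.namedtuple('Data', ('key1', 'key2', 'key3'))
as_namedtuple = Data('data1a', 'data2a', 'data3a')

# getsizeof counts only the container's own footprint (slots/hash table),
# not the string objects it references.
for label, obj in [('dict', as_dict), ('tuple', as_tuple), ('namedtuple', as_namedtuple)]:
    print(label, sys.getsizeof(obj))
```

On CPython the dict comes out noticeably larger than the tuple, while the namedtuple instance is no larger than the plain tuple.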

My current solution is to store the keys once in a tuple, then store each dataset as a tuple in the same order. It looks like this:

keys = ('key1', 'key2', 'key3')
data_a = ('data1a', 'data2a', 'data3a')
data_b = ('data1b', 'data2b', 'data3b')
data_c = ('data1c', 'data2c', 'data3c')

To retrieve data, I would do this:

data_a[keys.index('key1')] # Returns 'data1a'
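Note that keys.index('key1') scans the keys tuple from the start on every lookup. Here is a minimal sketch of one way around that while keeping the same tuples: build a single key-to-position dict once and share it across all records. INDEX and lookup are hypothetical names introduced here.

```python
keys = ('key1', 'key2', 'key3')
data_a = ('data1a', 'data2a', 'data3a')

# Build the key -> position mapping once; this one dict serves every record.
INDEX = {key: pos for pos, key in enumerate(keys)}

def lookup(record, key):
    """Return the field of `record` named `key`; raises KeyError for unknown keys."""
    return record[INDEX[key]]

print(lookup(data_a, 'key1'))  # data1a
```

Only one dict exists no matter how many records there are, so each record still costs no more than a plain tuple.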

Then I learned about namedtuples, which seem to do what I need:

import collections
Data = collections.namedtuple('Data', ('key1', 'key2', 'key3'))
data_a = Data('data1a', 'data2a', 'data3a')
data_b = Data('data1b', 'data2b', 'data3b')
data_c = Data('data1c', 'data2c', 'data3c')

However, it appears I can't retrieve a value by putting its key in square brackets (data_a['key1'] raises a TypeError, because tuples only accept integer indices and slices). Instead, I have to use getattr, which doesn't seem very intuitive:

getattr(data_a,'key1') # Returns 'data1a'
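One possible middle ground is a thin namedtuple subclass that accepts field names in [] while leaving integer indexing alone. This sketch assumes square-bracket access by key is the main requirement. The subclass and its __getitem__ override are illustrative and not part of the collections API.

```python
import collections

_DataBase = collections.namedtuple('Data', ('key1', 'key2', 'key3'))

class Data(_DataBase):
    # Empty __slots__: no per-instance __dict__, so instances stay tuple-sized.
    __slots__ = ()

    def __getitem__(self, key):
        # Strings are looked up by field name; ints and slices keep tuple behaviour.
        if isinstance(key, str):
            if key not in self._fields:
                raise KeyError(key)
            return getattr(self, key)
        return super().__getitem__(key)

data_a = Data('data1a', 'data2a', 'data3a')
print(data_a['key1'])  # data1a
print(data_a[0])       # data1a
```

The _fields check keeps names such as 'count' or '_asdict' from returning methods instead of data.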

My priorities are memory efficiency first, then performance. Of these three methods, which is the best? Or am I missing something, and is there a more Pythonic idiom for what I want?

EDIT: I've also recently learned about __slots__, which apparently runs more efficiently for key-value access while using roughly the same(?) amount of memory. Would an implementation along those lines be a suitable alternative to namedtuples?
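For reference, here is a small sketch of what __slots__ does and does not give you, using a hypothetical Record class. Instances have no per-instance __dict__ and reject undeclared attribute names. However, declared attributes can still be reassigned, so __slots__ alone does not make the data read-only.

```python
class Record:
    # Only these attribute names can ever be set; no per-instance __dict__ exists.
    __slots__ = ('key1', 'key2', 'key3')

    def __init__(self, key1, key2, key3):
        self.key1, self.key2, self.key3 = key1, key2, key3

r = Record('data1a', 'data2a', 'data3a')
print(hasattr(r, '__dict__'))  # False

try:
    r.key4 = 'oops'  # not declared in __slots__, so this is rejected
except AttributeError as exc:
    print('rejected:', exc)

r.key1 = 'changed'  # declared slots remain writable
```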


Solution

Yes, __slots__ should do.

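class Data:
    # Only these attribute names may be set; instances get no per-instance __dict__.
    __slots__ = ["key1", "key2"]

    def __init__(self, k1, k2):
        self.key1, self.key2 = k1, k2

    def __getitem__(self, key):
        # Allow data["key1"]-style lookup, restricted to the declared slot names.
        if key not in self.__slots__:
            raise KeyError("%r not found" % key)
        return getattr(self, key)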

Let's try that out:

>>> Data(1, 2)["key1"]
1

The key not in self.__slots__ check is a safeguard. Without it, getattr would happily return other attributes, such as the __init__ method for data["__init__"], instead of raising KeyError.
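The question also asks to avoid accidental changes, so here is a hedged variant of the class above that refuses assignment after construction. It only guards against accidents: object.__setattr__ can still write to the slots directly. Treat it as a sketch rather than a complete immutable type.

```python
class Data:
    __slots__ = ("key1", "key2")

    def __init__(self, k1, k2):
        # Bypass our own __setattr__ to fill the slots exactly once.
        object.__setattr__(self, "key1", k1)
        object.__setattr__(self, "key2", k2)

    def __setattr__(self, name, value):
        # Any later assignment is refused, so the record is effectively read-only.
        raise AttributeError("Data instances are read-only")

    def __delattr__(self, name):
        raise AttributeError("Data instances are read-only")

    def __getitem__(self, key):
        if key not in self.__slots__:
            raise KeyError("%r not found" % key)
        return getattr(self, key)

d = Data(1, 2)
print(d["key1"])  # 1
```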

Other tips

namedtuple seems like the right thing to use. If your "keys" are fixed, you don't need getattr; you can use the normal attribute-access syntax:

In [1]: %paste
import collections
Data = collections.namedtuple('Data', ('key1', 'key2', 'key3'))
data_a = Data('data1a', 'data2a', 'data3a')
data_b = Data('data1b', 'data2b', 'data3b')
data_c = Data('data1c', 'data2c', 'data3c')

## -- End pasted text --

In [2]: data_a.key1
Out[2]: 'data1a'

This usage is also demonstrated in the docs:

>>> # Basic example
>>> from collections import namedtuple
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(11, y=22)     # instantiate with positional or keyword arguments
>>> p[0] + p[1]             # indexable like the plain tuple (11, 22)
33
>>> x, y = p                # unpack like a regular tuple
>>> x, y
(11, 22)
>>> p.x + p.y               # fields also accessible by name
33
>>> p                       # readable __repr__ with a name=value style
Point(x=11, y=22)
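Because a namedtuple is a tuple, its fields cannot be reassigned. That covers the concern about accidentally changing the datasets. Here is a short sketch using the question's Data type. _replace is the documented way to derive a modified copy.

```python
import collections

Data = collections.namedtuple('Data', ('key1', 'key2', 'key3'))
data_a = Data('data1a', 'data2a', 'data3a')

try:
    data_a.key1 = 'changed'  # fields are read-only descriptors
except AttributeError:
    print('namedtuple fields cannot be reassigned')

# _replace builds a new instance; data_a itself is untouched.
data_a2 = data_a._replace(key1='changed')
print(data_a.key1, data_a2.key1)  # data1a changed
```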

You don't usually use getattr when the attribute name (the second argument) is a constant. It's only needed when the name is determined at run time:

In [3]: attr = input('Attribute: ')
Attribute: key3

In [4]: getattr(data_b, attr)
Out[4]: 'data3b'
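When the field name comes from user input, it may be worth restricting it to the declared fields. Otherwise getattr would also return methods such as count or _asdict. Here is a minimal sketch. get_field is a hypothetical helper, while _fields and _asdict are documented namedtuple members.

```python
import collections

Data = collections.namedtuple('Data', ('key1', 'key2', 'key3'))
data_b = Data('data1b', 'data2b', 'data3b')

def get_field(record, name):
    """Look up a field whose name is only known at run time (e.g. from input())."""
    # Checking _fields stops names like 'count' or '_asdict' from returning methods.
    if name not in record._fields:
        raise KeyError(name)
    return getattr(record, name)

print(get_field(data_b, 'key3'))  # data3b
print(data_b._asdict()['key3'])   # data3b, via a temporary dict
```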
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow