Filtering a List of Objects by Multiple Object Attributes

https://stackoverflow.com/questions/22947265

30-06-2023

Question

I am building a Google App Engine application (Python 2.7) using the NDB API. I am new to Python development and suspect this has been answered before, but my searches have not turned up anything resembling this problem or its solution, so I am asking here.

I have a Document model class that I need to query to get the most "current" Documents. Specifically, I want a list of document objects (entities) with distinct document names, where each entry is the one with the latest expiration date (a datetime.date object) for its name.

For example, a query of documents in descending order by expiration date such as:

documents = Document.query().order(-Document.expiration).fetch()

returns:

[{"name": "DocumentC", "expiration": datetime.date(2015, 3, 1)},
 {"name": "DocumentA", "expiration": datetime.date(2014, 4, 1)},
 {"name": "DocumentB", "expiration": datetime.date(2014, 2, 15)},
 {"name": "DocumentA", "expiration": datetime.date(2014, 1, 1)}]

Based on these query results, I want to remove the second (older) occurrence of "DocumentA" and get something like this:

[{"name": "DocumentC", "expiration": datetime.date(2015, 3, 1)},
 {"name": "DocumentA", "expiration": datetime.date(2014, 4, 1)},
 {"name": "DocumentB", "expiration": datetime.date(2014, 2, 15)}]

My solution is:

def current_docs(docs):
    output = []
    for d in docs:
        if not any(o['name'] == d['name'] for o in output):
            output.append(d)
    return output

cd = current_docs(documents)
# returns:
# [{'expiration': datetime.date(2015, 3, 1), 'name': 'DocumentC'},
# {'expiration': datetime.date(2014, 4, 1), 'name': 'DocumentA'},
# {'expiration': datetime.date(2014, 2, 15), 'name': 'DocumentB'}]

This seems to give me the result I expect, but:

1. Is there a better way to filter the original query to get the results I want from the start?
2. If not, is there a better, more efficient approach than my solution?

Solution

My approach to your second question:

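def current_docs(docs):
    # docs must already be sorted by expiration, newest first
    tmp = {}     # names that have already been added to output
    output = []
    for d in docs:
        if d['name'] in tmp:
            continue  # an entry with this name (and a later date) was kept
        tmp[d['name']] = 1
        output.append(d)
    return output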

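This keeps a dictionary of the names already added and appends only documents whose name has not been seen yet, so each check is a constant-time lookup instead of a scan of the output list. I don't know anything about Google App Engine, though :)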
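The function above relies on the input arriving sorted by expiration, newest first, as in the question's query. If that ordering can't be guaranteed, one option is to track the latest document per name instead. A minimal sketch, assuming the same dict-shaped documents as in the question (latest_by_name is a made-up helper name, not part of NDB):

```python
import datetime


def latest_by_name(docs):
    """Return one doc per name (the one with the latest expiration), newest first."""
    latest = {}  # name -> doc with the latest expiration seen so far
    for d in docs:
        best = latest.get(d['name'])
        if best is None or d['expiration'] > best['expiration']:
            latest[d['name']] = d
    # Sort the survivors so the result is ordered like the original query.
    return sorted(latest.values(), key=lambda d: d['expiration'], reverse=True)


# Sample data from the question, deliberately out of order.
documents = [
    {"name": "DocumentA", "expiration": datetime.date(2014, 1, 1)},
    {"name": "DocumentB", "expiration": datetime.date(2014, 2, 15)},
    {"name": "DocumentC", "expiration": datetime.date(2015, 3, 1)},
    {"name": "DocumentA", "expiration": datetime.date(2014, 4, 1)},
]

result = latest_by_name(documents)
print([d['name'] for d in result])
# ['DocumentC', 'DocumentA', 'DocumentB']
```

This still makes a single pass over the list, plus a sort over the (usually much smaller) set of distinct names.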

Other tips

Provided your data meets the restrictions noted in the documentation, you should be able to use a projection query with group_by=["name"] to accomplish this (distinct=True is documented as shorthand for grouping by the projected properties).

Docs: https://developers.google.com/appengine/docs/python/ndb/queries#projection

Alternatively, I'd recommend saving the data into a precomputed table that contains only the unique document names and the latest data/status for each. You incur additional cost at write time, but you get fast reads and don't have to rely on the unfiltered data set fitting into instance memory, which is required if you filter at run time.
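To illustrate the write-time idea without depending on App Engine, here is a rough in-memory sketch. A plain dict stands in for the precomputed table, and save_document is a hypothetical write hook; in a real application this would be a second entity kind updated alongside each Document write, which is not shown here.

```python
import datetime

# Stand-in for the precomputed table: exactly one entry per document name.
latest_docs = {}


def save_document(doc):
    """Hypothetical write hook: refresh the summary entry if doc is newer."""
    current = latest_docs.get(doc['name'])
    if current is None or doc['expiration'] > current['expiration']:
        latest_docs[doc['name']] = doc


# Each write pays a small extra cost to keep the summary up to date.
for doc in [
    {"name": "DocumentA", "expiration": datetime.date(2014, 1, 1)},
    {"name": "DocumentC", "expiration": datetime.date(2015, 3, 1)},
    {"name": "DocumentA", "expiration": datetime.date(2014, 4, 1)},
    {"name": "DocumentB", "expiration": datetime.date(2014, 2, 15)},
]:
    save_document(doc)

# Reads only touch the summary, never the full history.
print(sorted(latest_docs))
# ['DocumentA', 'DocumentB', 'DocumentC']
```

The trade-off is the one described above: every write does a lookup and possibly an update, but reading the current documents no longer requires loading and filtering every stored version.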

Licensed under: CC-BY-SA with attribution

Not affiliated with Stack Overflow