I'm looking to optimize the read operations in my GAE Python app so that I don't go over my free quota. I store data periodically, and a lot of the incoming data may be duplicated, so I have to check each item before I store it. This results in a lot of read ops and some write ops. Here is how I'm doing it now:

#data is a JSON data list with hundreds of items 
for item in data:
  record = InfoDB.get_by_id(item['id'])
  if record:
     continue 
  else:
     entity = InfoDB(id=item['id'], data=item['data']).put()

Here is one way I thought of to lower the read ops, though I'm not 100% sure it works. I suspect it may still perform a read op every time the loop iterates.

#data is a JSON data list with hundreds of items
db = InfoDB.query().fetch()
for item in data:
  flag = False  # reset for every item, otherwise one match skips all later items
  for record in db:
    if record.key.id() == item['id']:  # the id lives on the entity's key
      flag = True
      break

  if flag:
    continue
  else:
    entity = InfoDB(id=item['id'], data=item['data']).put()

Does the above method actually save me read operations, given that it essentially grabs the entire datastore and then loops over the whole set on every iteration? I realize this is slower, but I don't see how else I could accomplish this efficiently.

Any other ideas?

EDIT:

Just to clarify, this is using NDB, not the older DB.


Solution 2

Your proposed method will result in many more read operations, not fewer, because you now read every entity in the datastore, whether you need it or not.

If you can overwrite the existing entities, you can skip the check entirely and just write:

for item in data:
    InfoDB(id=item['id'], data=item['data']).put()

If you cannot overwrite the existing entities, you should use a keys-only query:

for key in query.iter(keys_only=True):

Keys-only queries are now free, as opposed to fetching complete entities.
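
As a rough sketch of how the keys-only result could be used: collect the existing ids into a set once, then filter the incoming list in memory. The helper name select_new_items is hypothetical, and the NDB calls appear only in comments because they need the App Engine runtime and are untested here.

```python
def select_new_items(data, existing_ids):
    """Return the items whose id is not already stored.

    data: list of dicts with an 'id' field, as in the question.
    existing_ids: iterable of ids already present in the datastore.
    """
    seen = set(existing_ids)
    fresh = []
    for item in data:
        if item['id'] in seen:
            continue  # already stored, or a duplicate earlier in this batch
        seen.add(item['id'])
        fresh.append(item)
    return fresh


# With NDB, existing_ids could come from a keys-only query (assumed usage):
#   existing_ids = [key.id() for key in InfoDB.query().iter(keys_only=True)]
#   for item in select_new_items(data, existing_ids):
#       InfoDB(id=item['id'], data=item['data']).put()
```

The set lookup replaces the nested loop from the question, so each incoming item is checked in constant time instead of being compared against every stored record.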

Other tips

If you know all the keys (and from your sample, you do know all the ids), do entities = db.get([list of keys]) or entities = ndb.get_multi([list of keys]).

This is far more efficient.

Then store the new entities with db.put(entities) or ndb.put_multi(entities).
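
A minimal sketch of the batch approach, assuming the NDB behaviour that get_multi returns results in the same order as the keys, with None where no entity exists. The helper select_missing is a hypothetical name; the NDB calls are shown in comments only, since they require the App Engine runtime.

```python
def select_missing(items, fetched):
    """Return the items whose batch lookup came back empty.

    items and fetched are parallel lists: fetched[i] is the result of
    looking up items[i], and is None when no entity was found.
    """
    if len(items) != len(fetched):
        raise ValueError("items and fetched must have the same length")
    return [item for item, entity in zip(items, fetched) if entity is None]


# Assumed usage with NDB:
#   keys = [ndb.Key(InfoDB, item['id']) for item in data]
#   fetched = ndb.get_multi(keys)          # one batch call instead of one get per item
#   missing = select_missing(data, fetched)
#   ndb.put_multi([InfoDB(id=i['id'], data=i['data']) for i in missing])
```

The batch calls mainly cut the number of round trips to the datastore; if the incoming list itself contains repeated ids, the repeats would simply be written to the same key.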

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow