I finally figured it out. Since all the data is in the NDB datastore, I wasn't really able to test everything locally, so I found logging.info("Memory Usage: %s", runtime.memory_usage().current()) extremely helpful (import it with from google.appengine.api import runtime).
The problem is the "In-Context Cache": query results are written back to the in-context cache. More information. See an example to disable the In-Context Cache for an Entity Kind.
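A minimal sketch of the per-kind switch, assuming the exported entities belong to a hypothetical kind called ExportRow with a hypothetical name property. It relies on NDB's _use_cache policy attribute and only runs on the App Engine Python runtime, where google.appengine.ext.ndb is available:

```python
from google.appengine.ext import ndb


class ExportRow(ndb.Model):  # hypothetical kind name, for illustration only
    # Policy attribute read by NDB: entities of this kind are not
    # written to (or read from) the in-context cache.
    _use_cache = False

    name = ndb.StringProperty()  # hypothetical property
```

Other kinds keep the default caching behaviour, so this only affects the kind being exported.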
My calculation was slightly wrong, though. A generated CSV file is about 300 MB. It is generated and saved to Google Cloud Storage within 5 minutes.
Peak memory consumption was about 480 MB.
In comparison, with a gc.collect() call added in the while True: loop (link), as suggested by @brian in the comment above, peak memory consumption was about 260 MB. But the export took much longer, about 20 minutes.
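The shape of such a loop can be sketched without App Engine. This is not the original export code: export_rows, make_fake_pager and the collect_every parameter are hypothetical names, and the fake pager only imitates the (results, cursor, more) tuple that NDB's fetch_page returns.

```python
import gc


def export_rows(fetch_page, write_row, collect_every=1):
    """Page through results, write each row, and run gc.collect()
    after every `collect_every` pages. Returns the number of pages read."""
    cursor = None
    pages = 0
    while True:
        rows, cursor, more = fetch_page(cursor)
        for row in rows:
            write_row(row)
        pages += 1
        if pages % collect_every == 0:
            gc.collect()  # force a full collection between pages
        if not more:
            break
    return pages


def make_fake_pager(data, page_size):
    """Stand-in for a datastore query: pages through a plain list."""
    def fetch_page(cursor):
        start = cursor or 0
        end = start + page_size
        return data[start:end], end, end < len(data)
    return fetch_page


written = []
pages = export_rows(make_fake_pager(list(range(10)), 4), written.append)
```

Collecting less often (a larger collect_every) might recover some of the lost time at the cost of a higher memory peak, though I have not measured that.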