I am using the following script to create some RSS snapshots (just saying).
The script runs on a backend, and I am seeing very heavy, ever-increasing memory consumption.
import logging
import time

import webapp2
from google.appengine.ext import ndb

# User and StatsSnapShot are my ndb models (definitions omitted)

class StartHandler(webapp2.RequestHandler):
    @ndb.toplevel
    def get(self):
        user_keys = User.query().fetch(1000, keys_only=True)
        if not user_keys:
            return
        logging.info("Starting Process of Users")
        successful_count = 0
        start_time = time.time()
        for user_key in user_keys:
            try:
                this_start_time = time.time()
                statssnapshot = StatsSnapShot(parent=user_key,
                                              property=get_rss(user_key.id())
                                              )
                # makes a urlfetch
                statssnapshot.put_async()
                successful_count += 1
            except:
                pass
        logging.info("".join(("Processed: [",
                              str(successful_count),
                              "] users after [",
                              str(int(time.time() - start_time)),
                              "] secs")))
        return
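As I understand it, because of @ndb.toplevel the handler does not return until every pending put_async() has completed, so conceptually the loop above behaves roughly like this (a simplified sketch, not my actual code):

futures = []
for user_key in user_keys:
    entity = StatsSnapShot(parent=user_key,
                           property=get_rss(user_key.id()))
    futures.append(entity.put_async())
# @ndb.toplevel only lets the request finish once all the async puts
# are done, so every future (and its entity) is still referenced here
ndb.Future.wait_all(futures)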
EDIT
Here is also the RSS function, let's say:
from google.appengine.api import urlfetch
import re

def get_rss(self, url):
    try:
        result = urlfetch.fetch(url)
        if not result.status_code == 200:
            logging.warning("Invalid URLfetch")
            return
    except urlfetch.Error, e:
        logging.warning("".join(("Fetch Failed to get ", url, " with ", str(e))))
        return
    content = result.content  # Around 500 - 200KB
    reobj = re.compile(r'(?<=")[0-9]{21}(?=")')
    user_ids = reobj.findall(content)
    user_ids = set(user_ids)  # set to fail if something is not unique
    return user_ids
The script runs OK, but as the number of users grows, the script consumes more and more memory.
Coming from C, I don't know how to manage memory and variables in Python that efficiently.
For example, I know that if a variable in Python is no longer referenced, the garbage collector frees the memory used by it. But then what is actually happening in my case, and where am I going wrong?
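To make the question concrete, this is roughly the per-user behaviour I would expect from the loop (a simplified illustration of the same idea, not my actual code; the del is only there to mark where I expect the memory to become reclaimable):

for user_key in user_keys:
    rss_ids = get_rss(user_key.id())  # builds a ~200KB string internally
    StatsSnapShot(parent=user_key, property=rss_ids).put_async()
    # I would expect rss_ids (and the fetched content) to be
    # collectable here, before the next user is processed
    del rss_ids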
How can I optimize this script so that memory usage does not keep increasing, and only the memory needed to process each user is consumed?