Simple Python script using too much CPU

https://stackoverflow.com/questions/21590887

07-10-2022

Question

I was recently told off by my VPS provider because my Python script was using too much CPU (apparently the script was utilising an entire core for a few hours).

My script uses the twython library to stream tweets:

def on_success(self, data):

    if 'text' in data:
        self.counter += 1
        self.tweetDatabase.save(Tweet(data))

        #we only want to commit when we have a batch
        if self.counter >= 1000:
            print("{0}: committing {1} tweets".format(datetime.now(), self.counter))
            self.counter = 0
            self.tweetDatabase.commit()

Tweet is a class whose job is to throw away the tweet metadata I do not need:

class Tweet():

    def __init__(self, json):

        self.user = {"id" : json.get('user').get('id_str'), "name" : json.get('user').get('name')}
        self.timeStamp = datetime.datetime.strptime(json.get('created_at'), '%a %b %d %H:%M:%S %z %Y')
        self.coordinates  = json.get('coordinates')
        self.tweet = {
                        "id" : json.get('id_str'),
                        "text" : json.get('text').split('#')[0],
                        "entities" : json.get('entities'),
                        "place" :  json.get('place')
                     }

        self.favourite = json.get('favorite_count')
        self.reTweet = json.get('retweet_count')

It also has a __str__ method that returns a very compact string representation of the object.

tweetDatabase.commit() just saves the tweets to a file, while tweetDatabase.save() just appends the tweet to a list:

def save(self, tweet):
    self.tweets.append(str(tweet))

def commit(self):
    with open(self.path, mode='a', encoding='utf-8') as f:
        # Trailing newline keeps the next batch from running into this batch's last line
        f.write('\n'.join(self.tweets) + '\n')

    self.tweets = []

What's the best way to keep the CPU usage low? If I sleep, I will lose tweets, because that is time the program spends not listening to Twitter's API. Despite this, I tried sleeping for a second after the program writes to the file; however, this did nothing to bring the CPU usage down. For the record, saving to the file every 1000 tweets happens just over once a minute.

Many thanks

Solution

Try checking whether you need to commit first in on_success(), and only then check whether the tweet has data you want to save. You might also want to consider race conditions on self.counter: if on_success() can be called from more than one thread, the update to self.counter should probably be wrapped in a mutex or something similar.
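A minimal sketch of the reordering and locking described above, not the asker's actual code. The class name BatchingWriter, the batch_size parameter, and the sink callable are hypothetical stand-ins for the streamer and tweetDatabase. Only the 'text' in data check and the idea of committing in batches come from the question. The sketch uses the length of the pending list in place of a separate counter, and it assumes on_success() may be called from several threads, which the question does not confirm.

```python
import threading


class BatchingWriter:
    """Hypothetical stand-in for the streamer plus tweetDatabase in the question."""

    def __init__(self, sink, batch_size=1000):
        self.sink = sink              # callable that receives a list of saved items
        self.batch_size = batch_size
        self.pending = []
        self.lock = threading.Lock()  # guards self.pending

    def on_success(self, data):
        batch = None
        with self.lock:
            # 1. Check first whether a full batch is waiting to be committed.
            if len(self.pending) >= self.batch_size:
                batch, self.pending = self.pending, []
            # 2. Then check whether this message carries data worth keeping.
            if 'text' in data:
                self.pending.append(data['text'])
        # Hand the batch over outside the lock so other threads are not blocked on I/O.
        if batch is not None:
            self.sink(batch)
```

Swapping the full list out under the lock and writing it afterwards keeps the critical section short. If only one thread ever calls on_success(), the lock is unnecessary and can be dropped.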

OTHER TIPS

You can try profiling your program with

import cProfile
command = """<whatever line that starts your program>"""
cProfile.runctx(command, globals(), locals(), filename="OpenGLContext.profile")

and then viewing the resulting OpenGLContext.profile file with RunSnakeRun (http://www.vrplumber.com/programming/runsnakerun/).

In RunSnakeRun's view, the bigger a block is, the more CPU time that function takes. This will help you locate exactly which part of your program is using a lot of CPU.
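If a graphical viewer is not convenient on the server, the same kind of profile data can also be read as plain text with the standard-library pstats module. The following is a minimal sketch under that assumption. parse_many is a hypothetical CPU-bound stand-in for the real streaming code, not something from the question.

```python
import cProfile
import io
import pstats


def parse_many(n):
    """Hypothetical CPU-bound stand-in for the real workload."""
    return sum(len(str(i).split('#')[0]) for i in range(n))


# Profile only the call of interest.
profiler = cProfile.Profile()
profiler.enable()
result = parse_many(50000)
profiler.disable()

# Collect the report in a string: sorted by cumulative time, top 10 entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats('cumulative').print_stats(10)
print(report.getvalue())
```

The functions at the top of the cumulative listing are the ones to look at first, in the same way as the largest blocks in the graphical view.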

Licensed under: CC-BY-SA with attribution
