Fixed strptime exception with thread lock, but slows down the program
30-09-2019
Question
I have the following code, which runs inside a thread (the full code is here - https://github.com/eWizardII/homobabel/blob/master/lovebird.py):
for null in range(0,1):
    while True:
        try:
            with open('C:/Twitter/tweets/user_0_' + str(self.id) + '.json', mode='w') as f:
                f.write('[')
                threadLock.acquire()
                for i, seed in enumerate(Cursor(api.user_timeline,screen_name=self.ip).items(200)):
                    if i>0:
                        f.write(", ")
                    f.write("%s" % (json.dumps(dict(sc=seed.author.statuses_count))))
                    j = j + 1
                threadLock.release()
                f.write("]")
        except tweepy.TweepError, e:
            with open('C:/Twitter/tweets/user_0_' + str(self.id) + '.json', mode='a') as f:
                f.write("]")
            print "ERROR on " + str(self.ip) + " Reason: ", e
            with open('C:/Twitter/errors_0.txt', mode='a') as a_file:
                new_ii = "ERROR on " + str(self.ip) + " Reason: " + str(e) + "\n"
                a_file.write(new_ii)
        break
Without the thread lock, I get the following error:
Exception in thread Thread-117:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 530, in __bootstrap_inner
    self.run()
  File "C:/Twitter/homobabel/lovebird.py", line 62, in run
    for i, seed in enumerate(Cursor(api.user_timeline,screen_name=self.ip).items(200)):
  File "build\bdist.win-amd64\egg\tweepy\cursor.py", line 110, in next
    self.current_page = self.page_iterator.next()
  File "build\bdist.win-amd64\egg\tweepy\cursor.py", line 85, in next
    items = self.method(page=self.current_page, *self.args, **self.kargs)
  File "build\bdist.win-amd64\egg\tweepy\binder.py", line 196, in _call
    return method.execute()
  File "build\bdist.win-amd64\egg\tweepy\binder.py", line 182, in execute
    result = self.api.parser.parse(self, resp.read())
  File "build\bdist.win-amd64\egg\tweepy\parsers.py", line 75, in parse
    result = model.parse_list(method.api, json)
  File "build\bdist.win-amd64\egg\tweepy\models.py", line 38, in parse_list
    results.append(cls.parse(api, obj))
  File "build\bdist.win-amd64\egg\tweepy\models.py", line 49, in parse
    user = User.parse(api, v)
  File "build\bdist.win-amd64\egg\tweepy\models.py", line 86, in parse
    setattr(user, k, parse_datetime(v))
  File "build\bdist.win-amd64\egg\tweepy\utils.py", line 17, in parse_datetime
    date = datetime(*(time.strptime(string, '%a %b %d %H:%M:%S +0000 %Y')[0:6]))
  File "C:\Python27\lib\_strptime.py", line 454, in _strptime_time
    return _strptime(data_string, format)[0]
  File "C:\Python27\lib\_strptime.py", line 300, in _strptime
    _TimeRE_cache = TimeRE()
  File "C:\Python27\lib\_strptime.py", line 188, in __init__
    self.locale_time = LocaleTime()
  File "C:\Python27\lib\_strptime.py", line 77, in __init__
    raise ValueError("locale changed during initialization")
ValueError: locale changed during initialization
The problem is that with the thread lock on, the threads effectively run one after another, and each loop takes far too long for threading to offer any advantage. So if there isn't a way to get rid of the thread lock, is there a way to make the for loop inside the try statement run faster?
Solution
According to a previous answer on Stack Overflow, time.strptime is not thread-safe. Unfortunately, the error referenced in that question is different from the one you're experiencing.
Their solution was to call time.strptime once before starting any threads; subsequent calls to time.strptime from the various threads then work.
After reviewing the _strptime and locale standard library modules, I think the same solution may work in your situation. I can't be certain it will work since I can't test your code locally, but I thought I'd provide you with a potential solution.
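A minimal sketch of that warm-up idea, assuming the same date format that appears in the traceback. The names TWITTER_FORMAT, worker, and the sample timestamp are made up for illustration; parse_datetime mirrors the expression in tweepy's utils.py but is redefined here so the example runs on its own. It only shows the ordering (one call in the main thread, then the threads) and is not a guarantee that the error goes away.

```python
import threading
import time
from datetime import datetime

# Same format string that appears in the traceback (tweepy/utils.py).
TWITTER_FORMAT = '%a %b %d %H:%M:%S +0000 %Y'
SAMPLE = 'Mon Sep 30 12:34:56 +0000 2019'  # illustrative value only


def parse_datetime(string):
    # Stand-in for tweepy's helper, redefined so the sketch is self-contained.
    return datetime(*(time.strptime(string, TWITTER_FORMAT)[0:6]))


# Warm-up: call strptime once in the main thread, before any thread exists,
# so its one-time setup does not happen concurrently.
parse_datetime(SAMPLE)

results = []
results_lock = threading.Lock()  # protects the shared list only


def worker(stamp):
    parsed = parse_datetime(stamp)  # no lock around the parsing itself
    with results_lock:
        results.append(parsed)


threads = [threading.Thread(target=worker, args=(SAMPLE,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In the original program the warm-up call would go before the threads are created, and the threadLock.acquire() / threadLock.release() pair around the Cursor loop would be removed.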
Let me know if this works.
Edit:
I've done a bit more research: the Python standard library calls setlocale, which is declared in the locale.h C header file. According to the setlocale documentation, it is not thread-safe, and calls to setlocale should occur before initializing threads, as I mentioned previously.
Unfortunately, setlocale is called each time you call time.strptime. So, I suggest the following:
1. Test out the solution laid out earlier: try calling time.strptime before initializing the threads, and remove the locks.
2. If #1 doesn't work, you'll probably need to roll your own time.strptime function that is thread-safe, as mentioned in the Python documentation for the locale module.
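If it comes to the second option, one possible shape for a hand-rolled parser is sketched below. It assumes every timestamp has exactly the layout used in the traceback ('%a %b %d %H:%M:%S +0000 %Y'), with English month abbreviations and a +0000 offset. The names parse_twitter_datetime and _MONTHS are hypothetical; nothing here is part of tweepy or the standard library. Because it never calls strptime, it does not go through the locale machinery at all.

```python
from datetime import datetime

# English month abbreviations, assumed to match what the API returns.
_MONTHS = {
    'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6,
    'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12,
}


def parse_twitter_datetime(string):
    """Parse e.g. 'Mon Sep 30 12:34:56 +0000 2019' without using strptime."""
    # Six whitespace-separated fields; the weekday and the offset are ignored
    # (the offset is assumed to always be +0000).
    _weekday, month, day, clock, _offset, year = string.split()
    hour, minute, second = (int(part) for part in clock.split(':'))
    return datetime(int(year), _MONTHS[month], int(day), hour, minute, second)
```

The function uses only local variables and a read-only dictionary, so several threads can call it at once without a lock. Getting tweepy to use it would mean replacing its parse_datetime helper, which is left out here.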
Other tips
The problem you are running into comes from the functions and modules you use not being thread-safe.
As you can see here, tweepy is neither re-entrant nor thread-safe. As you can see here, Python's LocaleTime is not either.
For a multi-threaded application like yours, wrap the tweepy API in your own class that is synchronized (RLock'ed). Do not derive from the tweepy class; use a has-a relationship instead, holding the tweepy instance in a private attribute.
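A rough sketch of such a wrapper follows. SynchronizedApi and FakeApi are hypothetical names: FakeApi is only a stand-in so the example runs without tweepy or network access, and its user_timeline method imitates nothing beyond the method name used in the question. The wrapper holds the wrapped object in a private attribute and runs every method call under one RLock.

```python
import threading


class SynchronizedApi(object):
    """Has-a wrapper: every method call on the wrapped object takes an RLock."""

    def __init__(self, api):
        self._api = api                  # private reference, no inheritance
        self._lock = threading.RLock()   # re-entrant, so nested calls are fine

    def __getattr__(self, name):
        # Only reached for names not found on the wrapper itself.
        attr = getattr(self._api, name)
        if not callable(attr):
            return attr

        def locked_call(*args, **kwargs):
            with self._lock:
                return attr(*args, **kwargs)
        return locked_call


class FakeApi(object):
    """Stand-in for a real API object; not tweepy."""

    def __init__(self):
        self.calls = 0

    def user_timeline(self, screen_name):
        current = self.calls             # non-atomic read-modify-write
        self.calls = current + 1
        return screen_name


fake = FakeApi()
api = SynchronizedApi(fake)


def worker():
    for _ in range(100):
        api.user_timeline(screen_name='example')


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The lock is held only for the duration of each wrapped call, so file writes and other work in each thread can still overlap. Calls into the API itself remain serialized, which is the trade-off of this approach.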