I had exactly the same problem. If you search for this error in the boto GitHub issues, you will see we are not alone.
There's also a known, accepted issue: https://github.com/boto/boto/issues/2207
Reaching performance limits of AWS S3
The truth is that we have become so used to boto and the AWS S3 service that we have forgotten these are really distributed systems, which might break in some cases.
I was archiving (download, tar, upload) a huge number of files (about 3 years' worth of around 15 feeds, each having about 1440 versions a day) and used Celery to do this faster. And I have to say that I was sometimes getting these errors more often, probably because I was reaching the performance limits of AWS S3. The errors often appeared in chunks (in my case I was uploading at about 60 Mbps for a couple of hours).
Training S3 performance
When I was measuring performance, it got "trained". After some hours, the responsiveness of the S3 bucket jumped up; AWS had probably detected the higher load and spun up some more instances to serve it.
Try the latest stable version of boto
Another thing is that boto retries internally in many cases, so many failures are hidden from our calls. Sometimes things got a bit better after upgrading to the latest stable version.
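If you want to lean on that built-in retry behaviour more aggressively, classic boto reads a retry count from its config file. A minimal sketch (assuming you use the standard `~/.boto` config; the value 10 is just an illustration):

```ini
# ~/.boto -- raise boto's internal retry count for transient errors
[Boto]
num_retries = 10
```

The same setting can also be applied at runtime with `boto.config.set('Boto', 'num_retries', '10')` before opening a connection.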
My conclusions are:
- try upgrading to the latest stable boto
- when the error rate grows, lower the pressure
- accept the fact that AWS S3 is a distributed service with occasional, rare performance problems
In your code, I would definitely recommend adding some sleep between retries (at least 5 s, but 30 s would seem fine to me); otherwise you are just pushing harder and harder against a system which might be in a shaky state at the moment.
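To make the "add some sleep" advice concrete, here is a minimal sketch of retrying an upload with exponential backoff plus jitter. The `upload` callable is a hypothetical stand-in for whatever your code does (e.g. a boto `set_contents_from_filename` call); the delays and attempt count are illustrative, not prescriptive:

```python
import random
import time

def upload_with_backoff(upload, max_attempts=5, base_delay=5.0, max_delay=60.0):
    """Call `upload()` (a zero-arg callable), retrying on failure.

    Instead of hammering a possibly overloaded S3 endpoint, wait
    base_delay, 2*base_delay, 4*base_delay, ... (capped at max_delay)
    between attempts, plus a little random jitter so parallel workers
    do not all retry at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return upload()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay + random.uniform(0, 1))
```

Usage would look something like `upload_with_backoff(lambda: bucket.new_key(name).set_contents_from_filename(path))`, where `bucket`, `name`, and `path` are your own objects.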