I am using Sidekiq and Redis To Go on a production site hosted on Heroku. I am spinning up multiple Sidekiq workers to do a job for me. Of the 600 workers, about 180 were left when they got "stuck". Each one attempts its job, and I get one of two errors back:
WARN: {"retry"=>true, "queue"=>"default", "class"=>"F9LoadRecordWorker", "args"=>[25126], "jid"=>"0426e1db817e27986da6b636", "enqueued_at"=>1395332988.09929, "error_message"=>"Connection reset by peer - SSL_connect", "error_class"=>"Errno::ECONNRESET", "failed_at"=>1395337905.5061884, "retry_count"=>0}
or
WARN: {"retry"=>true, "queue"=>"default", "class"=>"F9LoadRecordWorker", "args"=>[25131], "jid"=>"79601ea488efc10f1fbcc433", "enqueued_at"=>1395332988.1172419, "error_message"=>"Connection refused - connect(2)", "error_class"=>"Errno::ECONNREFUSED", "failed_at"=>1395338127.4794347, "retry_count"=>1, "retried_at"=>1395338202.905867}
So the actual errors are either "Connection reset by peer - SSL_connect" or "Connection refused - connect(2)".
What is causing this? Why would roughly 400 workers succeed and then the last 200 or so get stuck in this loop of retrying and failing with the same errors?