Question

This is the first time I've actually run into timing issues with a task I have to tackle. I need to run a calculation (against a webservice) for approximately 7M records. That would take more than 180 hours, so I was thinking about running multiple instances of the webservice on EC2 and running rake tasks in parallel.

Since I have never done this before, I was wondering what needs to be considered. More precisely:

  • What's the maximum number of rake tasks I can run (is there any limit at all besides my own machine's power)?
  • What's the maximum number of concurrent connections to a Postgres 9.3 DB?
  • Is there anything to consider when running multiple active_record.save actions at the same time?

I am looking forward to hearing your thoughts. Best, Phil


Solution

rake instances

  • Every time you run rake, you start a new instance of your Ruby server, with all the associated memory and load-dependency overhead. Look in your Rakefile for the inits.
    • your number of instances is limited by memory and CPU
    • you must profile each instance's memory and CPU to know how many can run
    • you could write a program to monitor and calculate what's possible, but heuristics will work better for one-offs and first experiments.
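One practical way to run several rakes side by side is to hand each one a disjoint slice of the ID range up front, so no two workers touch the same records. A minimal sketch (the `records:process` task name is hypothetical, not from the question):

```ruby
# Split an ID range into roughly equal, disjoint slices -- one per rake worker.
def id_slices(first_id, last_id, workers)
  total = last_id - first_id + 1
  per   = (total.to_f / workers).ceil
  (0...workers).map do |i|
    lo = first_id + i * per
    hi = [lo + per - 1, last_id].min
    lo..hi
  end.reject { |r| r.first > r.last }   # drop empty slices if workers > records
end

slices = id_slices(1, 7_000_000, 8)
# Each slice can then be handed to its own rake invocation, e.g.:
#   slices.each { |r| spawn("rake records:process[#{r.first},#{r.last}]") }
```

Disjoint slices also sidestep most of the write-lock contention discussed below, since workers never race on the same rows.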

datastore

  • heuristically explore your database capacity, too.
    • watch for write-locks that create blocking
    • watch for slow reads due to missing indices
    • look at your postgres configs to see concurrency limits, cache size, etc.
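As a concrete concurrency check: Postgres 9.3 ships with a default `max_connections` of 100 (verify yours with `SHOW max_connections;` in psql), and every rake instance holds its own connection pool. A back-of-envelope sanity check, assuming a per-worker pool size you'd read from your `database.yml`:

```ruby
# Total connections demanded by N rake workers must stay below Postgres's
# max_connections, with room left for other clients.
def connections_needed(workers, pool_per_worker)
  workers * pool_per_worker
end

max_connections = 100   # assumed 9.3 default; read yours from postgresql.conf
raise "too many rake workers for the DB" if connections_needed(16, 5) > max_connections
```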

.save

  • each rake task is its own Ruby server, so multiple concurrent active_record.save actions bring:
    • blocking/waiting due to write-locking
    • one instance working from 'old' data that was read before another instance's .save
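The stale-read hazard can be demonstrated without a database at all: two workers doing a read-modify-write on shared state lose updates unless the read and write happen under one lock, which is exactly the situation of two rakes calling .save on a row read before the other's update. A self-contained sketch, using a Mutex as the stand-in for a row-level lock:

```ruby
# Four workers increment a shared counter 1000 times each. Without the
# lock, an increment computed from 'old' data can overwrite a newer
# value -- the same lost-update problem as concurrent .save calls.
counter = { value: 0 }
lock = Mutex.new

threads = 4.times.map do
  Thread.new do
    1000.times do
      lock.synchronize do        # analogous to locking the row before save
        counter[:value] += 1
      end
    end
  end
end
threads.each(&:join)
counter[:value]   # => 4000 with the lock in place
```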

operational complexity

  • the number of records (7MM) is just a multiplier for the operations performed on each record. The per-record operational complexity is the real source of limitation, since, theoretically, running 7MM workers would solve the problem in the minimum timescale
  • if 180hr is accurate (dubious), then (180 * 60 * 60 * 1000) / 7000000 ≈ 92.57 ms per record
  • Look for any shared-resource that is an IO blocker.
  • look for any common calculation that you can do in advance and cache. A lookup beats a calc.
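The budget arithmetic above, plus the "lookup beats a calc" idea as a memoizing hash, can be sketched as follows (`expensive_rate_for` is a hypothetical stand-in for whatever calculation your records share):

```ruby
# Back-of-envelope: total serial wall time spread over 7MM records.
budget_ms = (180 * 60 * 60 * 1000) / 7_000_000.0
budget_ms.round(2)   # => 92.57 ms per record, per the figure above

# "A lookup beats a calc": compute each shared value once, then serve
# every later request from the hash.
def expensive_rate_for(key)   # hypothetical slow calculation / webservice hit
  sleep 0.01                  # simulate the cost being avoided
  key.hash % 100
end

RATE_CACHE = Hash.new { |h, key| h[key] = expensive_rate_for(key) }

RATE_CACHE["DE"]   # first access: pays the full cost
RATE_CACHE["DE"]   # every later access: a hash lookup
```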

errata

  • leave headroom for base OS processes. These vary by environment; you mention AWS, but it's best to learn how to monitor any system for activity:
    1. run top in a separate screen / terminal as the rakes are running.
    2. Prefer to run 2 tops in different screens. sort 1 by memory, sort the other by CPU
    3. have a way to monitor the rakes
    4. watch for events that bubble up the top processes.
    5. if you do this long / well enough, you've profiled your headroom
  • run more rakes to fill your headroom
  • don't overrun your memory or you'll get swapping
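For step 3 above ("have a way to monitor the rakes"), a crude poller over `ps` is enough to start with. A sketch, assuming a Unix-like host as on EC2:

```ruby
# List running rake processes with their CPU and memory percentages,
# by filtering the output of ps.
def rake_status
  `ps -eo pid,pcpu,pmem,comm`.lines
    .select { |line| line.include?("rake") }
    .map    { |line| line.split(" ", 4) }   # => [pid, %cpu, %mem, command]
end

# rake_status.each { |pid, cpu, mem, _| puts "#{pid} cpu=#{cpu}% mem=#{mem}%" }
```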

You may want to consider beanstalk instead, but my guess is you'll find that more complicated than learning all these good foundations, first.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow