rake
instances
- Every time you run rake, you start a new instance of your Ruby server, with all of its associated memory use and loaded dependencies. Look in your Rakefile to see what gets initialized.
- your number of instances is limited by the memory and CPU each one uses
- you must profile the memory and CPU of each instance to know how many can be run
- you could write a program to monitor and calculate what's possible, but heuristics will work better for one-offs and first experiments.
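As a rough sketch of that heuristic: once you've measured what one rake uses, the instance count is just the scarcer of the two resources, minus some headroom. The method name, the 20% headroom default, and every number below are placeholders I made up; plug in what you actually measure.

```ruby
# Estimate how many concurrent rake processes fit on one machine.
# All inputs are hypothetical placeholders; replace them with measured values.
def max_rake_instances(total_mem_mb:, mem_per_rake_mb:, cores:, cpu_per_rake:, headroom: 0.2)
  # Reserve a fraction of each resource for the OS and the database.
  usable_mem   = total_mem_mb * (1.0 - headroom)
  usable_cores = cores * (1.0 - headroom)

  by_memory = (usable_mem / mem_per_rake_mb).floor
  by_cpu    = (usable_cores / cpu_per_rake).floor

  # The scarcer resource sets the limit.
  [by_memory, by_cpu].min
end

# Example: 8 GB box, 4 cores, each rake using ~400 MB and ~half a core.
puts max_rake_instances(total_mem_mb: 8192, mem_per_rake_mb: 400, cores: 4, cpu_per_rake: 0.5)
# => 6 (CPU-bound here: memory alone would allow 16)
```

Treat the result as a starting point for experiments, not a guarantee; database contention can cap you well below it.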
datastore
- heuristically explore your database capacity, too.
- watch for write-locks that create blocking
- watch for slow reads due to missing indices
- look at your postgres configs to see concurrency limits, cache size, etc.
.save
- each rake task is its own ruby server, so multiple concurrent active_record.save calls can cause:
- blocking/waiting due to write-locking
- one instance getting 'old' data that was read prior to another's update
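To make the 'old data' problem concrete, here is a plain-Ruby simulation of two processes reading the same row before either writes. FakeRow and StaleWrite are made-up names and this is not ActiveRecord; it only imitates a version-checked save so you can see the failure mode and one way to handle it.

```ruby
# In-memory stand-in for one database row. This is NOT ActiveRecord;
# it only imitates a save that checks a version number.
class StaleWrite < StandardError; end

class FakeRow
  attr_reader :value, :version

  def initialize(value)
    @value = value
    @version = 0
  end

  # Accept the write only if the writer read the current version.
  def save(new_value, read_version)
    raise StaleWrite, "row changed since it was read" unless read_version == @version
    @value = new_value
    @version += 1
  end
end

row = FakeRow.new(10)

# Two "rake processes" read the same row before either one writes.
a_value, a_version = row.value, row.version
b_value, b_version = row.value, row.version

row.save(a_value + 1, a_version)      # first writer succeeds

begin
  row.save(b_value + 5, b_version)    # second writer is holding old data
rescue StaleWrite
  # Re-read and retry with fresh data instead of overwriting A's update.
  row.save(row.value + 5, row.version)
end

puts row.value # => 16
```

Without the version check, the second save would have written 15 and silently dropped the first update. In a real app, ActiveRecord's optimistic locking (a lock_version column, which raises ActiveRecord::StaleObjectError on a stale save) gives you the same kind of check.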
operational complexity
- the number of records (7MM) is just a multiplier for all of the operations that occur on each record. The operational complexity is the source of the limitation, since in theory running 7MM workers would solve the problem in the minimum timescale.
- if 180hr is accurate (dubious), then (180 * 60 * 60 * 1000) / 7000000 == 92.57 ms per record processed.
- look for any shared resource that is an IO blocker.
- look for any common calculation that you can do in advance and cache. A lookup beats a calc.
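The arithmetic above, as a few lines of Ruby you can rerun with your own numbers. The 180 hours and 7MM records come from the question; both method names are mine, and hours_with_workers assumes the work splits evenly with zero contention, which it won't in practice.

```ruby
# Average time budget per record, in milliseconds.
def ms_per_record(hours, records)
  total_ms = hours * 60 * 60 * 1000
  total_ms.to_f / records
end

# Idealized wall-clock hours if the work is split evenly across
# `workers` processes with no blocking on shared resources.
def hours_with_workers(per_record_ms, records, workers)
  (per_record_ms * records / workers) / (1000.0 * 60 * 60)
end

per_record = ms_per_record(180, 7_000_000)
puts per_record.round(2)                                    # => 92.57
puts hours_with_workers(per_record, 7_000_000, 6).round(1)  # => 30.0
```

The gap between that idealized figure and what you actually observe is a rough measure of how much time goes to blocking on shared resources.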
errata
- leave headroom for base OS processes. These will vary by environment. You mention AWS, but it's best to learn conceptually how to monitor any system for activity
- run top in a separate screen / terminal while the rakes are running.
- prefer to run 2 tops in different screens: sort one by memory, sort the other by CPU
- have a way to monitor the rakes
- watch for events that bubble up the top processes.
- if you do this long / well enough, you've profiled your headroom
- run more rakes to fill your headroom
- don't overrun your memory or you'll get swapping
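If you'd rather log snapshots than stare at two tops, something like the sketch below gets you the same two views (sorted by memory, sorted by CPU). It assumes a ps that accepts -eo pid,pcpu,pmem,comm, which the usual Linux and macOS versions do; parse_ps and top_by are names I made up.

```ruby
# Parse the text produced by `ps -eo pid,pcpu,pmem,comm` into hashes.
def parse_ps(output)
  rows = output.lines.drop(1).reject { |line| line.strip.empty? } # skip header and blanks
  rows.map do |line|
    pid, cpu, mem, command = line.strip.split(/\s+/, 4)
    { pid: pid.to_i, cpu: cpu.to_f, mem: mem.to_f, command: command }
  end
end

# The `n` processes with the highest value for `key` (:cpu or :mem),
# highest first.
def top_by(processes, key, n = 5)
  processes.max_by(n) { |info| info[key] }
end

# Only take a live snapshot when this file is run directly.
if __FILE__ == $PROGRAM_NAME
  processes = parse_ps(`ps -eo pid,pcpu,pmem,comm`)
  puts "By memory:", top_by(processes, :mem).map(&:to_s)
  puts "By CPU:",    top_by(processes, :cpu).map(&:to_s)
end
```

Run it on an interval while the rakes are going and keep the output; the peaks tell you how much headroom is left before you start swapping.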
You may want to consider beanstalk instead, but my guess is you'll find that more complicated than learning all these good foundations first.