Question

I have two ec2 instances, webserver and production. I need production to copy small files (2-5kb) to and from webserver, while webserver can have no access to production. I'm using ubuntu 12.04. At the moment I'm getting a 4-5 second delay using rsync over ssh. Is there a faster way to do this? I can live with a 1 second delay.

I'm considering starting instances in a vpc, and going through a networking procedure, but I'm not sure if that will be fast enough - there are also very few tutorials on this. Alternatively, I've started reading around nfs, but again I'm not sure if it will do the trick.


Solution

There are several possible reasons for your initial rsync delay:

  • rsync does a survey on both sides to determine what is different before any data updates are done. This can take a while if you have some large data chunks or lots of directory entries. This is especially an issue if you have --checksum enabled, which does a full content checksum to check for differences.
  • rsync is generally used with SSH, which can have delays due to DNS lag and timeouts, so you might check that both hosts have forward (A) and reverse (PTR) DNS records and that DNS is functioning on both ends, or that the hosts are each known to each other through /etc/hosts or the like.
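As a rough way to check the name-resolution point above, the sketch below looks a host up in both directions. It assumes getent is available (it consults /etc/hosts as well as DNS, according to nsswitch.conf); check_dns is a hypothetical helper name, not a standard tool.

```shell
# check_dns NAME: print the forward lookup for NAME and the reverse lookup
# for the address it resolves to. Hypothetical helper for illustration.
check_dns() {
    name=$1
    # First address returned for the name (from /etc/hosts or DNS).
    addr=$(getent hosts "$name" | awk '{print $1; exit}')
    if [ -z "$addr" ]; then
        echo "forward lookup failed for $name"
        return 1
    fi
    echo "forward: $name -> $addr"
    # Name returned when looking the address up again.
    rev=$(getent hosts "$addr" | awk '{print $2; exit}')
    echo "reverse: $addr -> ${rev:-<none>}"
}
```

Running it on each host against the other host's name (and prefixing the call with time) should show whether a lookup is slow or missing.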

Test SSH connectivity for lag first, assuming you're using SSH as the transport mechanism for rsync (the default) with SSH keys in the ~/.ssh/authorized_keys file on the target side. If so, check that file as well to see whether the entry being used runs a wrapper script (for example through a command= option) with its own lag issues. This can be a surprise if someone else wrote it and you're the one troubleshooting it.
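One way to measure the SSH part on its own is to time a command that does nothing on the remote side. This is only a sketch: time_ms is a hypothetical helper, it relies on GNU date for nanosecond timestamps, and "webserver" in the commented usage line stands in for whatever host name or alias you actually use.

```shell
# time_ms CMD...: run a command, discard its output, and print how long it
# took in milliseconds plus its exit status. Assumes GNU date (%N).
time_ms() {
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s%N)
    echo "$(( (end - start) / 1000000 )) ms (exit $status)"
}

# Usage ("webserver" is a placeholder host):
#   time_ms ssh -o BatchMode=yes webserver true
```

If that alone takes several seconds, the delay is in connection setup rather than in rsync, and running ssh -v against the same host should show which step stalls.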

A separate problem is whether you should consider writing some code to make the delay irrelevant. Even a solid second spent doing the actual update can compromise things, and rsynced directories can easily grow in dynamic content and thus require more update time later on. In prior companies, we've occasionally had to maintain different hierarchies of code (two, say), run rsync on the one not in use, then switch over. This may not apply to your situation, of course. Similar issues can show up in git deployment updates and so on when a scripting language runs off source files that are still open, as bash tends to.
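The two-hierarchy approach can be sketched roughly as follows. The directory names (release_a, release_b, current) are made up for the example, a plain file write stands in for the rsync run, and mv -T is a GNU coreutils option; treat it as an illustration of the idea rather than a deployment script.

```shell
# Two copies of the tree plus a "current" symlink that readers go through.
base=$(mktemp -d)
mkdir "$base/release_a" "$base/release_b"
echo "old" > "$base/release_a/version.txt"
ln -s release_a "$base/current"

# Update the tree that is NOT in use. In real use this would be the rsync
# run, which can take as long as it needs without affecting readers.
echo "new" > "$base/release_b/version.txt"

# Switch over: build the new symlink under a temporary name, then rename it
# over the old one so readers see either the old tree or the new one.
ln -s release_b "$base/current.tmp"
mv -T "$base/current.tmp" "$base/current"

cat "$base/current/version.txt"
```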

Timing (with time ...) for a test here on a tiny directory on a local network shows:

sent 160 bytes  received 13 bytes  115.33 bytes/sec
total size is 3455  speedup is 19.97

real    0m0.499s
user    0m0.008s
sys     0m0.000s

strace can let you see where the time goes:

strace -tt -f -o /tmp/log  rsync -avz  ....

On mine, it mostly looks like small amounts of delay awaiting feedback from the target hosts, roughly as I'd have expected.
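To pick the slow spots out of a long strace log, something like the sketch below can list the largest pauses between consecutive lines. It assumes the "PID HH:MM:SS.microseconds syscall(...)" line format that -tt -f -o produces, ignores runs that cross midnight, and is only approximate when several processes are interleaved; largest_gaps is a hypothetical helper name.

```shell
# largest_gaps LOG [N]: print the N (default 5) largest pauses, in seconds,
# between consecutive lines of a log written by strace -tt -f -o LOG ...
# Each pause is shown next to the line that preceded it.
largest_gaps() {
    awk '
        {
            split($2, t, ":")                      # t[1]=HH t[2]=MM t[3]=SS.us
            now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
            if (NR > 1) printf "%.6f %s\n", now - prev, prevline
            prev = now
            prevline = $0
        }
    ' "$1" | sort -rn | head -n "${2:-5}"
}

# Usage:
#   largest_gaps /tmp/log 10
```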

Licensed under: CC-BY-SA with attribution