Question

Is it possible to specify a time range so that rsync only operates on recently changed files?

I'm writing a script to back up recently added files over SSH, and rsync seems like an efficient solution. My problem is that my source directories contain a huge backlog of older files that I have no interest in backing up.

The only solution I've come across so far is running find with -ctime to generate a file list for --files-from. This works, but I have to deal with some old installations whose versions of rsync don't support --files-from. I'm considering generating --include-from patterns in the same way, but would love to find something more elegant.
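A minimal sketch of the find-based approach described above, assuming a POSIX shell. The function names recent_files and recent_include_patterns and the default one-day window are illustrative, not part of rsync. The second function produces patterns for rsync versions that lack --files-from; it assumes file names contain no newlines and no rsync wildcard characters (*, ?, [).

```shell
#!/bin/sh
# Hypothetical helpers: build rsync file lists from find's -ctime test.

# Print paths, relative to directory $1, of regular files whose status
# changed within the last $2 days (default: 1), one per line.
recent_files() {
    (cd "$1" && find . -type f -ctime -"${2:-1}" | sed 's|^\./||')
}

# The same list as include patterns for --include-from.
# The leading "/" anchors each pattern to the root of the transfer.
recent_include_patterns() {
    recent_files "$1" "${2:-1}" | sed 's|^|/|'
}
```

With a newer rsync, the list can go straight to --files-from, which implies --relative, so the directory layout is kept (paths and host are placeholders):

recent_files /srv/data > /tmp/recent.txt
rsync -av --files-from=/tmp/recent.txt /srv/data/ backup_host:/backup/data/

With an older rsync, the include patterns must come before a rule that lets rsync descend into directories and a final catch-all exclude. Without --prune-empty-dirs, which such old versions also lack, the whole directory skeleton is still created on the destination:

recent_include_patterns /srv/data > /tmp/include.txt
rsync -av --include-from=/tmp/include.txt --include='*/' --exclude='*' /srv/data/ backup_host:/backup/data/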


Solution

It looks like you can specify shell commands in the arguments to rsync (see "Remote rsync executes arbitrary shell commands"), because the remote path is passed through the shell on the remote host. That let me successfully limit the files rsync looks at by using:

rsync -av remote_host:'$(find logs -type f -ctime -1)' local_dir

This finds every file under logs whose status changed within the last day (-ctime -1) and then rsyncs those files into local_dir.
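The same trick can be wrapped in a small function. This is a sketch, assuming a POSIX shell locally and a remote shell that performs command substitution; the function name pull_recent and its argument order are hypothetical. It differs from the command above in one respect: -R (--relative) keeps the logs/... directory structure under the destination, whereas without it each file is copied into the top level of local_dir. File names containing whitespace will be split by the remote shell. rsync 3.2.4 and later escape shell-active characters such as $ and ( in remote arguments by default, so on those versions the trick likely needs --old-args, an option older versions reject.

```shell
#!/bin/sh
# Hypothetical wrapper around the command-substitution trick.
# Usage: pull_recent HOST REMOTE_DIR LOCAL_DIR [DAYS]
pull_recent() {
    host=$1 dir=$2 dest=$3 days=${4:-1}
    # \$( is escaped so the *remote* shell runs find; $dir and $days
    # are filled in locally. -R recreates REMOTE_DIR/... under LOCAL_DIR.
    # On rsync 3.2.4+ you may need to add --old-args for $(...) to expand.
    rsync -avR "$host:\$(find '$dir' -type f -ctime -$days)" "$dest"
}
```

For example, pull_recent remote_host logs local_dir reproduces the command above with -R added.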

I'm not sure whether this behavior is by design, so I'm still digging into the documentation.

OTHER TIPS

Why not just take the hit of backing up the whole directory once, and then take advantage of the incremental backups provided by rsync, rdiff, and their cousins? You won't waste disk space at the backup destination, because the old files will stay unchanged and are only stored once.

Backing up the whole thing is simpler and carries substantially less risk of error. Selectively backing up some files and not others is a recipe for missing something you need without realizing it, and then getting burned when you can't restore a critical file.

Otherwise, you should reorganize your source directories so that there is less 'decision making' in your backup script.
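A sketch of the whole-directory approach, assuming rsync is available on both ends. By default rsync decides whether to transfer a file by comparing its size and modification time, so after the first full copy later runs send only new or changed files and the old backlog is transferred once. The function name backup_all is hypothetical, and --delete is deliberately not used, so files removed from the source stay in the backup.

```shell
#!/bin/sh
# Hypothetical helper: copy everything under SRC into DEST with rsync.
# The trailing slash on "$1/" copies the directory's contents rather than
# the directory itself. DEST may be local or remote (host:/path).
backup_all() {
    rsync -a "$1/" "$2/"
}
```

Running it from cron, for example backup_all /srv/data backup_host:/backup/data, keeps the copy current without any time-based selection.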

How about creating a temporary directory, symlinking or hardlinking the files in, then rsyncing that?
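A sketch of the staging-directory idea, assuming a POSIX shell. It uses symbolic links together with rsync's -L (--copy-links), which sends the files the links point to rather than the links themselves. Hard links would also work within one filesystem, but creating a hard link updates the file's ctime, so with a -ctime selection the staged files could be picked up again by the next run. The function name stage_recent is hypothetical, and file names containing newlines are not handled.

```shell
#!/bin/sh
# Hypothetical helper: fill the existing directory STAGE with symlinks to
# files under SRC whose status changed within the last DAYS days
# (default: 1), mirroring SRC's directory layout.
stage_recent() {
    # Absolute paths, so the links resolve from anywhere and the
    # cd below does not break the STAGE path.
    src=$(cd "$1" && pwd) || return 1
    stage=$(cd "$2" && pwd) || return 1
    days=${3:-1}
    (
        cd "$src" || exit 1
        find . -type f -ctime -"$days" | sed 's|^\./||' |
        while IFS= read -r f; do
            mkdir -p "$stage/$(dirname "$f")"
            ln -s "$src/$f" "$stage/$f"
        done
    )
}
```

Usage, with placeholder paths and host; removing the staging directory afterwards deletes only the links, not the original files:

stage=$(mktemp -d)
stage_recent /srv/data "$stage"
rsync -avL "$stage/" backup_host:/backup/data/
rm -rf "$stage"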

May I suggest you drop rsync and look at rdiff-backup?

Licensed under: CC-BY-SA with attribution