To prevent this from being closed, I have narrowed my question to just the bash script.
EDITED QUESTION
I run a small network and made a mistake in a backup routine. I have rsync
running daily, and the way it is set up, a folder renamed on the source can end up duplicated on the backup device.
rsync -varz --no-perms --exclude-from=/path/to/exclude_file --log-file=/path/to/rsync_logs
Recently a user made quite a few changes, and it resulted in a lot of duplication.
What kind of bash script strategies can I use to attack this? I've tried listing the trees recursively, outputting to files, and using diff
to compare them. That has led me to see the impact of the duplication problem. If I could use some kind of automated process to remove the duplicates, that would save me loads of time.
I started by trying something like this:
find /mnt/data/ -maxdepth 2 -mindepth 1 -type d -printf '%f\n' > data.txt
and comparing to:
find /mnt/backup/ -maxdepth 2 -mindepth 1 -type d -printf '%f\n' > backup.txt
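Building on those two listings, one way to automate the comparison (a sketch, not part of the original post; the demo data below just stands in for the real output of the two find commands) is to sort both files and use comm to print the directory names that exist only on the backup:

```shell
# Hypothetical demo data standing in for data.txt and backup.txt;
# in practice these come from the two find commands above.
printf '%s\n' 2008-07-01 2008-08-01 > data.txt
printf '%s\n' 2008-07-01 2008-08-01 7-1-08 8-1-08 > backup.txt

# comm requires sorted input; -13 suppresses lines unique to data.txt
# and lines common to both, leaving only names that exist on the
# backup but not on the source -- the cleanup candidates.
sort -o data.txt data.txt
sort -o backup.txt backup.txt
comm -13 data.txt backup.txt > backup_only.txt
cat backup_only.txt
```

The resulting backup_only.txt is a starting list for review; since renamed folders won't match by name at all, it flags both true orphans and rename-duplicates, so it still needs a content check before anything is deleted.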
An example of my problem is this:
drwxr-xr-x 0 bob staff 0 Jun 25 2009 7-1-08
drwxr-xr-x 0 bob staff 0 Jun 25 2009 2008-07-01
This is an example from the backup drive; the two directories have identical contents. The backup contains both, while the source has only this one:
drwxr-xr-x 0 bob staff 0 Jun 25 2009 2008-07-01
This kind of issue is all throughout the backup drives.
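To confirm that a pair like 7-1-08 and 2008-07-01 really are duplicates before removing one, a small helper can checksum every file in each directory and compare the results. This is a sketch of one possible approach, assuming GNU md5sum is available; the function name and the throwaway demo directories are my own, not from the post:

```shell
#!/bin/sh
# same_contents DIR1 DIR2 -- succeed if both directories hold files with
# identical relative paths and identical contents.  Checksums are taken
# relative to each directory so the differing parent names don't matter.
same_contents() {
    a=$(cd "$1" && find . -type f -exec md5sum {} + | sort) || return 1
    b=$(cd "$2" && find . -type f -exec md5sum {} + | sort) || return 1
    [ "$a" = "$b" ]
}

# Demo with throwaway directories standing in for the real backup paths.
tmp=$(mktemp -d)
mkdir "$tmp/7-1-08" "$tmp/2008-07-01"
echo "report" > "$tmp/7-1-08/file.txt"
echo "report" > "$tmp/2008-07-01/file.txt"
same_contents "$tmp/7-1-08" "$tmp/2008-07-01" && echo "duplicates"
rm -rf "$tmp"
```

Only once a pair passes this check is it safe to script the removal of the wrongly named copy.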
EDIT
I created two lists, diffed them, and then went through manually and reconciled the changes. It wasn't as bad as I originally thought, once I got into it. I gave +1s to both answers here (@Mark Pettit and @ebarrere), because I did end up using pieces from each answer. I ran several find commands in the course of this experiment, and I also altered my rsync script to be more specific. Thank you guys.
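For anyone hitting the same problem: one change that prevents this class of duplication at the source (an assumption on my part about what "more specific" could mean, not necessarily the exact change made here) is rsync's --delete option, which removes entries on the destination that no longer exist on the source, so a renamed folder doesn't leave its old copy behind. It is destructive, so rehearse with --dry-run first:

```shell
# Hypothetical mirror variant of the original command.  --delete removes
# backup entries whose source counterpart is gone; --dry-run shows what
# would happen without touching anything.  Drop --dry-run only after
# reviewing the output.
rsync -varz --no-perms --delete --dry-run \
    --exclude-from=/path/to/exclude_file \
    --log-file=/path/to/rsync_logs \
    /mnt/data/ /mnt/backup/
```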