Question

I'm trying to compare the md5sums of files on a remote server with the md5sums of my local files; for those that match, the hash and the filename should be removed from the local server.

The part that gets the md5sums from both sides is done; I have something like this:

remote_list="<hash values> <filename>.gz"
local_list="<hash values> <filename>.gz"

But now I need to compare what is in the two lists. I was thinking of using two nested for loops, but I wonder whether that is a good (and efficient) approach.

So far I did this:

#!/bin/bash
datacenter="amazon"
hostname=$(hostname)
path="backup/server245"

s3=$(s3cmd ls --list-md5 s3://company-backup/company/"$datacenter"/"$hostname"/"$path"/)
s3_list=$(echo "$s3" | tr -s ' ' | cut -d ' ' -f 4,5 | sed 's= .*/= =')
echo "$s3_list"

locally=$(md5sum /"$path"/*.gz)
echo "$locally"

locally_list=$(echo "$locally" | sed 's= .*/= =')
echo "$locally_list"

Which gives me this output:

d41d8cd98f00b204e9800998ecf8427e #md5 from remote folder
41eae9b40d23de2f02bf07635870f6d0 app.20121117040001.gz #remote file
541b1bf78682f48867cc99dbb53c4c3a app.20121118040001.gz #remote file
31d90af7969f5003b27f68e27e7f2cb1 app.gz #remote file
31d90af7969f5003b27f68e27e7f2cb1  /backup/server245/app.gz #local file

So, following that idea, app.gz exists in both places, so I can delete it from my local machine. Any ideas or suggestions?


Solution

If you consider two entries a match only when both the md5sum and the filename are the same, then it's simple:

sort remote_list local_list | uniq -d > duplicate_list

(Here remote_list and local_list are files containing the two lists. Since yours are held in shell variables, you can feed them in with process substitution: sort <(echo "$s3_list") <(echo "$locally_list") | uniq -d.)

(Important note: this assumes that there are no repeats in either of the file lists. There certainly shouldn't be if you've done the md5sums correctly.)
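To go from duplicate_list to actually removing the local copies, here is a minimal sketch. It uses the "<md5> <filename>" format and the /backup/server245 path shown in the question; the sample list contents are made-up stand-ins for whatever your script produces, and the rm is left commented out so you can dry-run it first:

```shell
#!/bin/bash
# Hypothetical path taken from the question.
path="backup/server245"

# Example lists in "<md5> <filename>" form, as produced by the question's script.
s3_list="541b1bf78682f48867cc99dbb53c4c3a app.20121118040001.gz
31d90af7969f5003b27f68e27e7f2cb1 app.gz"
locally_list="31d90af7969f5003b27f68e27e7f2cb1 app.gz"

# Lines present in both lists: the md5 AND the filename must match.
duplicates=$(sort <(echo "$s3_list") <(echo "$locally_list") | uniq -d)

# Strip the hash, keep only the filename, and remove each local copy.
echo "$duplicates" | cut -d ' ' -f 2 | while read -r f; do
    echo "would remove /$path/$f"   # replace echo with: rm -- "/$path/$f"
done
```

Filenames with spaces would break the space-delimited cut; that limitation is inherited from the md5sum/s3cmd output format the question already uses.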

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow