Question

I have the following command so far, but I'm a little stuck on the next bit.

comm -23 <( find /dir1/report_dir2/dir3/2013* -name *\*MyFile* -exec basename {} \; | sort | uniq ) <( find /dir0/dir1/dir2/loadedreports/archive* -name *\*MyFile* -exec basename {} \; | sort | uniq ) > /home/Ben10/list.txt

Directory 1

The files in /dir1/report_dir2/dir3/2013* are CSV files that may or may not have a .gz extension. Unzipping them is out of the question, as they're up to a gigabyte each and I have thousands of them.

For example, they'll look like MyFile20130618073529.csv or MyFile20130618073529.csv.gz

Directory 2

The files in /dir0/dir1/dir2/loadedreports/archive* have been loaded into a BI system and all end in .csv.

However, their names are also preceded by the date on which they were loaded,

e.g. 2013-11-06_MyFile20130618073529.csv

I'm loading these CSV files into a BI database. To find the ones I haven't loaded yet, I need to list the files that are in /dir1/report_dir2/dir3/2013* but not in /dir0/dir1/dir2/loadedreports/archive*

Is there any way to disregard the .gz suffix and the 2013-11-06_ prefix? Note that the prefix can be any date preceding MyFile.

Thanks a million, any input greatly appreciated.


Solution

Try the following:

comm -23 <( find /dir1/report_dir2/dir3/2013* -name '*MyFile*' | perl -pe 's/.*(MyFile[^.]*\.csv)(\.gz)?$/$1/' | sort -u ) <( find /dir0/dir1/dir2/loadedreports/archive* -name '*MyFile*' | perl -pe 's/.*(MyFile[^.]*\.csv)$/$1/' | sort -u ) > /home/Ben10/list.txt

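The idea here is to replace basename with a Perl search-and-replace applied to the full file names that find returns. The substitution keeps only the MyFile...csv part of each name, discarding the directory path, the .gz suffix and the <date>_ prefix, so both lists can be compared directly.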


To make the one-liner above more readable, I would split it as follows:

find /dir1/report_dir2/dir3/2013* -name '*MyFile*' | perl -pe 's/.*(MyFile[^.]*\.csv)(\.gz)?$/$1/' | sort -u > /home/Ben10/di1_list.txt

find /dir0/dir1/dir2/loadedreports/archive* -name '*MyFile*' | perl -pe 's/.*(MyFile[^.]*\.csv)$/$1/' | sort -u > /home/Ben10/di2_list.txt

comm -23 /home/Ben10/di1_list.txt /home/Ben10/di2_list.txt > /home/Ben10/list.txt

rm /home/Ben10/di1_list.txt /home/Ben10/di2_list.txt
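
If the split version gets run regularly, it may be worth wrapping it in a small function that uses mktemp instead of fixed file names in the home directory. The following is a rough sketch rather than a drop-in replacement: the function name missing_reports is made up, it takes one source directory and one archive directory instead of the 2013* and archive* globs, it adds -type f so that only regular files are listed, and it uses sed -E in place of perl (same pattern, just a different tool).

```shell
# Hypothetical wrapper: print report names found under source dir $1
# that have no counterpart under archive dir $2.
# The body runs in a subshell "( ... )" so its variables stay local.
missing_reports() (
  src_list=$(mktemp) || exit 1
  arc_list=$(mktemp) || { rm -f "$src_list"; exit 1; }

  # Source side: drop the leading path and an optional .gz suffix.
  find "$1" -type f -name '*MyFile*' \
    | sed -E 's/.*(MyFile[^.]*\.csv)(\.gz)?$/\1/' | sort -u > "$src_list"

  # Archive side: drop the leading path and the <date>_ prefix.
  find "$2" -type f -name '*MyFile*' \
    | sed -E 's/.*(MyFile[^.]*\.csv)$/\1/' | sort -u > "$arc_list"

  # -23 keeps only lines unique to the first list, i.e. not yet loaded.
  comm -23 "$src_list" "$arc_list"

  rm -f "$src_list" "$arc_list"
)
```

It could then be called as, for example, missing_reports /dir1/report_dir2/dir3 /dir0/dir1/dir2/loadedreports > /home/Ben10/list.txt, bearing in mind that this searches everything under dir3 and loadedreports rather than just the 2013* and archive* subdirectories.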
Licensed under: CC-BY-SA with attribution