perl File::Find - delete files with certain conditions, then delete parent folder if empty

StackOverflow https://stackoverflow.com/questions/22361717

  •  13-06-2023
  •  | 
  •  

Question

I am attempting to use File::Find to 1) go thru a given folder and subfolders, deleting any files that are older than 30 days, and b) if the parent folder is empty after all the deletions, also delete it.

Here is my code:

use strict;
use warnings;
no warnings 'uninitialized';
use File::Find;
use File::Basename;
use File::Spec::Functions;

# excluding some home brew imports


# go into given folder, delete anything older than 30 days, and if folder is then empty,     delete it

my $testdir = 'C:/jason/temp/test';
$testdir =~ s#\\#/#g;

open(LOG, ">c:/jason/temp/delete.log");

finddepth({ wanted => \&myWanted, postprocess => \&cleanupDir }, $testdir);

sub myWanted {

   if ($_ !~ m/\.pdf$/i &&
       int(-M $_) > 30
      ) 
   {
      my $age = int(-M $_);
      my $path = $File::Find::name;
      print LOG "age : $age days - $path\n";
      unlink($path);

   }
}


sub cleanupDir {
   my $path = $File::Find::dir;
   if ( &folderIsEmpty($path) ) {
      print LOG "deleting : $path\n";
      unlink($path);
   } else {
      print LOG "$path not empty\n";
      my @files = glob("$path/*");
      foreach my $file(@files){
         print LOG "\t$file\n";
      }
   }

}

I had thought that finddepth() would go to the bottom of the tree and work its way up, but that didn't happen. The script, run on an unzip of some ebook contents, did not delete directories that had subfolders, even though all the files were deleted.

age : 54 days - C:/jason/temp/test/mimetype
age : 54 days - C:/jason/temp/test/META-INF/container.xml
age : 54 days - C:/jason/temp/test/META-INF/ncx.xml.kindle
deleting : C:/jason/temp/test/META-INF
age : 54 days - C:/jason/temp/test/OEBPS/content.opf
age : 54 days - C:/jason/temp/test/OEBPS/cover.html
age : 54 days - C:/jason/temp/test/OEBPS/ncx.xml
age : 54 days - C:/jason/temp/test/OEBPS/pagemap.xml
age : 54 days - C:/jason/temp/test/OEBPS/t01_00_text.html
age : 54 days - C:/jason/temp/test/OEBPS/t02_00_text.html
age : 54 days - C:/jason/temp/test/OEBPS/t03_00_text.html
age : 54 days - C:/jason/temp/test/OEBPS/t04_00_text.html
age : 54 days - C:/jason/temp/test/OEBPS/t05_00_text.html
age : 54 days - C:/jason/temp/test/OEBPS/t06_00_text.html
age : 54 days - C:/jason/temp/test/OEBPS/t07_00_text.html
age : 54 days - C:/jason/temp/test/OEBPS/t08_00_text.html
age : 54 days - C:/jason/temp/test/OEBPS/t08_01_text.html
age : 54 days - C:/jason/temp/test/OEBPS/media/cover.jpg
age : 54 days - C:/jason/temp/test/OEBPS/media/flamlogo.gif
age : 54 days - C:/jason/temp/test/OEBPS/media/logolnmb.jpg
age : 54 days - C:/jason/temp/test/OEBPS/media/stylesheet.css
deleting : C:/jason/temp/test/OEBPS/media
C:/jason/temp/test/OEBPS not empty
    C:/jason/temp/test/OEBPS/media
C:/jason/temp/test not empty
    C:/jason/temp/test/META-INF
    C:/jason/temp/test/OEBPS

looks like the C:/jason/temp/test/OEBPS/media/ was deleted, but that deletion was not registered by the time the preprocess func was called. Any ideas as to how to get this to work? thanks!

thanks, bp

Was it helpful?

Solution

As Miller has commented, you can't unlink a directory. Also, File::Find does a chdir into a node's containing directory before it calls wanted. That means that, in the postprocess subroutine, you are trying to remove your currently working directory. Windows won't like that.

I would write it like this. I have tested it, but you should obviously be very careful with anything that deletes the contents of your disk storage.

use strict;
use warnings;
use autodie;

use File::Find;
use File::Spec::Functions;

my $testdir = 'C:\jason\temp\test';

open my $log, '>', 'C:\jason\temp\delete.log';

finddepth(\&wanted, $testdir);

sub wanted {

  my $full_name = canonpath $File::Find::name;

  if (-f) {
    my $age  = int(-M);
    unless ( /\.pdf\z/ or $age <= 30) {
      print $log "Age: $age days - $full_name\n";
      unlink;
    }
  }
  elsif (-d) {
    my @contents = do {
      opendir my ($dh), $_;
      grep { not /\A\.\.?\z/ } readdir $dh;
    };
    rmdir unless @contents;
  }
}

OTHER TIPS

I suspect you aren't actually deleting the directory. From the documentation for unlink:

Note: unlink will not attempt to delete directories unless you are superuser and the -U flag is supplied to Perl. Even if these conditions are met, be warned that unlinking a directory can inflict damage on your filesystem. Finally, using unlink on directories is not supported on many operating systems. Use rmdir instead.

I never liked File::Find because it just is a mess. It swallows up your entire program because it wants everything to be in your wanted subroutine. Plus, I don't like the fact that half of my code is scattered all over the place. However, what other tools come standard with every installation of Perl. I have to make do.

I prefer to toss all of my files into an array. It keeps the code clean. My find just finds. I do the rest of my processing elsewhere. I also embed my wanted subroutine embedded in my find command. It keeps everything in one place.

Also, you can't use unlink to remove a directory. Use remove_tree from File::Path. That's a standard module. You can also use readdir to see how many subdirectories a directory has. That's a good way to check to see if it's empty:

use strict;
use warnings;
use feature qw(say);

use File::Find;
use File::Path qw(make_path remove_tree);

my $testdir     = 'C:/jason/temp/test';
my $mdate_limit = 30;

my @files;              # We'll store the files here
my %dirs;               # And we'll track the directories that my be empty

#
# First find the files
#
find ( sub {
    return unless -f;                  # We want just files.
    return if -M < $mdate_limit;       # Skip if we've modified since $mdate_limit days
    push @files, $File::Find::name;    # We're interested in this file,
    $dirs{$File::Find::dir} = 1;       # and the directory that file is in
}, $testdir );

#
# Delete the files that you've found
#

unlink @files;

#
# Go through the directories and see which are empty
#

for my $dir ( sort keys %dirs ) {
    opendir my $dir_fh, $dir or next;  # We'll skip bad reads
    my @dir_files = readdir $dir_fh;
    close $dir_fh;
    if ( @dir_files <= 2 ) {   # Directory is empty if there's only "." and ".." in it
        remove_tree( $dir )
          or warn qq(Can't remove directory "$dir"\n);
    }
}

Notice that I've embedded my wanted routine:

find ( sub {
    return unless -d;                  # We want just files.
    return if -M < $mdate_limit;       # File hast been modified in the $mdate_limit days
    push @files, $Find::File::name;    # We're interested in this file
    $dirs{$Find::File::dir} = 1;       # The directory that file is in
}, $testdir );

The alternative is this:

file (\&wanted, $testdir);

sub wanted {
    return unless -d;                  # Okay...
    return if -M < $mdate_limit;       # Um... Where's $mdate_limit defined?
    push @files, $Find::File::name;    # And @files?
    $dirs{$Find::File::dir} = 1;       # And %dirs?
}

The problem is that my wanted subroutine contains three global variables. And, it's possible for my find command to get separated from my wanted subroutine. In 3 months time, you'll have to search all over your code to find that wanted routine.

And, when you do see that wanted subroutine, there are those three mysterious global variables. Where are they defined? Is that a bug?

By combining the subroutine with my find, I guarantee that the subroutine the find command needs won't drift away from my find. Plus, it hides the globalness of those three variables embedded in my subroutine.

There is nothing preventing me from deleting the files inside the find command. It's usually not a good idea to change the directory structure while searching it, but this should be fine.

However, I like my find command to just find the files I'm interested in. I don't want 1/2 of my program stuffed in there. It becomes a maintenance nightmare. I'll put up with a bit of inefficiency. It might take a full second or two to load my @files array with a million of files, but I'll spend a lot longer than that as soon as I have to debug my program.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top