In Perl, why is my directory handle being closed when I try to recursively traverse directories?

StackOverflow https://stackoverflow.com/questions/23565584

  •  18-07-2023

Question

Here is my code. I am searching for duplicate directories, and I need a depth-first search. I use recursion: when DH sees a folder, it descends into that folder. But after finishing that folder, DH ends up closed and the program doesn't look at the remaining entries of the top folder.

#!/usr/bin/perl
use Digest::MD5;

dtraverse(@ARGV);

sub dtraverse {
    my $fullpathname;
    my @subdirlist;
    my @filelist2;
    my $newpath;
    my $name;
    my $d;

    print "entered    nnnn\n";
    $fullpathname = $_[0];
    opendir(DH, $fullpathname) or die("Cannot open directory\n");
    @subdirlist = ();
    @filelist2 = ();
    while ($name = readdir(DH)) {
        next if (($name eq ".") or ($name eq ".."));
        $newpath = $fullpathname . "/" . $name;
        print "asdasd == $name\n";
        if (-d $newpath) {
            push(@subdirlist, $newpath);
            $name2 = $name;
            dtraverse($newpath);

            # AFTER THAT POINT DH IS CLOSED AND THE REMAINING FILES ARE NOT LOOKED AT

            push @filelist2, $hashes{$newpath};
        }
        else {
            open(my $fh, '<', $newpath) or die "Can't open '$newpath': $!";
            binmode($fh);
            $mumu = Digest::MD5->new->addfile($fh)->hexdigest, " $newpath\n";
            push(@filelist2, $mumu);
            $data{$newpath} = $mumu;
        }
    }
    $total = "";
    foreach $mumus (sort @filelist2) {
        $total  = "$total" . "$mumus";
        $total2 = Digest::MD5->new->add("$total")->hexdigest;
        $hashes{$fullpathname} = $total2;
    }
    closedir(DH);

    print "hash of $fullpathname= $total2 \n";
    #print "DIR:$fullpathname  FILES:@filelist\n";
}

Solution

You are using a bareword directory handle DH. These are package variables. Each time you do opendir DH, the previously opened handle is closed:

Using bareword symbols to refer to file handles is particularly evil because they are global, and you have no idea if that symbol already points to some other file handle.

So, use a lexical directory handle, opendir my $dh, just like the file handle you are using.
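
For illustration, here is a minimal sketch of the traversal with a lexical handle, keeping the question's dtraverse name and recursion but trimming everything else; the exact shape is my assumption, not code from the answer:

sub dtraverse {
    my ($fullpathname) = @_;

    # A lexical handle is scoped to this call, so the recursive call
    # below opens its own handle instead of clobbering this one.
    opendir(my $dh, $fullpathname)
        or die "Cannot open directory '$fullpathname': $!\n";

    while (my $name = readdir($dh)) {
        next if $name eq '.' or $name eq '..';
        my $newpath = "$fullpathname/$name";
        if (-d $newpath) {
            dtraverse($newpath);   # no longer closes our handle
        }
        else {
            # hash or collect the file here
        }
    }
    closedir($dh);
}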

Of course, I would have probably gone with File::Find. Also, take a look at Yanick's entry in DFW.pm Dedup Hackathon.
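
A hedged sketch of that File::Find route (the digest-per-file structure is my own assumption, not the answerer's code): finddepth does a depth-first, post-order walk, visiting a directory's contents before the directory itself.

#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
use Digest::MD5;

my %md5_for;    # file path => hex digest

finddepth(
    {
        no_chdir => 1,    # $File::Find::name holds the full path
        wanted   => sub {
            return unless -f $File::Find::name;
            open my $fh, '<', $File::Find::name or return;
            binmode $fh;
            $md5_for{$File::Find::name} =
                Digest::MD5->new->addfile($fh)->hexdigest;
        },
    },
    shift(@ARGV) // '.',
);

print "$md5_for{$_}  $_\n" for sort keys %md5_for;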

The following likely buggy script using Path::Class and Digest::xxHash took about 10 seconds to check the 5876 files in my download folder:

#!/usr/bin/env perl

use strict;
use warnings;

use constant xxHASH_SEED => 0xDEADBEEF;

use feature 'say';
use Digest::xxHash qw(xxhash_hex);
use Path::Class;
use YAML::XS;

run(@ARGV) unless caller;

sub run {
    my $top = shift;
    die "Need top directory\n" unless defined $top;

    # dies if it cannot resolve
    $top = dir($top)->absolute->resolve;

    my $counter;
    my %dupes;

    $top->recurse(
        callback => sub {
            my $entry = shift;
            if (-d $entry and !(-x _)) {
                return $entry->PRUNE
            }
            return unless -r $entry;
            return unless -f _;

            $counter += 1;

            my $hash = xxhash_hex scalar($entry->slurp), xxHASH_SEED;
            # Don't stringify if you want to do
            # anything other than display file names
            push @{ $dupes{$hash} }, "$entry";
        },
        depthfirst => 1,
    );

    say "Hashed $counter files";
    my @dupes = grep @$_ > 1, values %dupes;

    if (@dupes) {
        print "Possible duplicates:\n", Dump \@dupes;
    }
}
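
Because identical digests only mark possible duplicates, a follow-up sketch (assuming the %dupes hash built inside run() above is still in scope) could confirm each candidate group byte-for-byte with File::Compare before acting on it:

use File::Compare qw(compare);

# Compare the first file in each group against the rest; compare()
# returns 0 when the contents are identical.
for my $group (grep { @$_ > 1 } values %dupes) {
    my ($first, @rest) = @$group;
    for my $candidate (@rest) {
        if (compare($first, $candidate) == 0) {
            print "Confirmed duplicate: $candidate == $first\n";
        }
    }
}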
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow