Develop a program to analyze the directory structure of a Linux disk and identify any files larger than 500 kbytes

StackOverflow https://stackoverflow.com/questions/23354967

Question

I'm trying to develop a program that will analyze the directory structure of a Linux disk and identify any files larger than 500 kbytes.

#!/usr/bin/perl

use File::Find::Rule;
use warnings;

my $filelist;

sub buildFile {
    open ($filelist, ">", "filelist.txt") || die $!;

# File find rule and # Provide specific list of directories to scan
    my $SubDirs= File::Find::Rule->directory->in('etc', 'dev', 'bin'); 


    # interpret Size Method and stored the list on @files 
    my @files = File::Find::Rule->size('500')->in($SubDirs);

print $filelist map { "$_\n" } @files;
return \$filelist;

}


Solution 2

I think it will be of more use to you to learn the workings of File::Find, around which File::Find::Rule is a wrapper. I still always have to read the documentation for the latter, despite having used the module dozens of times, and I think a plain File::Find solution is often easier to read.

A few comments on your own code

  • You must always use strict as well as use warnings at the top of every Perl program, especially if you are asking for help with it

  • Variables should be declared as close as possible to their first point of use. It is also better to use the low-precedence operators and and or rather than && and || for flow control, so this is more appropriate

    open my $filelist, '>', 'filelist.txt' or die $!;
    
  • Local variable names are conventionally written with only lower-case alphanumerics and underscores, so your subroutine would ideally be spelled build_file and $SubDirs would be $sub_dirs or $subdirs. Capital letters are reserved for global identifiers, such as package names

  • Using map like that to print an array of strings with a newline at the end of each one is wasteful of memory. It will generate a complete new list of the strings with newline appended and pass that entire list to print at once. Writing

    print $filelist "$_\n" for @files
    

    is IMO much clearer, and only one line at a time is prepared and passed to print

  • I can't imagine why you would want to return the value of the file handle $filelist, except perhaps to write more to the file after the subroutine returns. In any case you certainly don't want a reference to the file handle, and just return $filelist is correct
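The point about or versus || matters most once the parentheses around the arguments to open are dropped. A minimal sketch of the difference, assuming a hypothetical path ($missing) that cannot be opened and using eval only to report whether die fired:

```perl
use strict;
use warnings;

# Hypothetical path that cannot be opened; it exists only to force a failure.
my $missing = '/nonexistent-dir/filelist.txt';

# With ||, this parses as open(my $fh, '>', ($missing || die ...)).
# $missing is a true value, so die never runs and the failure goes unnoticed.
my $ok_high = eval { open my $fh, '>', $missing || die "high: $!\n"; 1 };

# With or, this parses as (open my $fh, '>', $missing) or die ...,
# so the failed open triggers die.
my $ok_low = eval { open my $fh, '>', $missing or die "low: $!\n"; 1 };

print "|| version died: ", ($ok_high ? "no" : "yes"), "\n";
print "or version died: ", ($ok_low  ? "no" : "yes"), "\n";
```

The original code in the question is safe only because it wraps the arguments to open in parentheses before the ||.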

I would write something like this

use strict;
use warnings;

use File::Find;

sub build_file {
  my @dirs = @_;

  open my $list_fh, '>', 'filelist.txt' or die $!;

  find(sub {
    return unless -f;
    print $list_fh $File::Find::name, "\n" if -s _ > 500 * 1024;
  }, @dirs);

  return $list_fh;
}

my $fh = build_file('etc', 'dev', 'bin');

print $fh "More stuff after the list of files\n";
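To try the same File::Find approach without scanning real system directories, here is a sketch that collects the names in an array and runs against a scratch tree. The helper name find_large_files, the file names small and big, and the use of File::Temp are assumptions made for illustration; they are not part of the answer above.

```perl
use strict;
use warnings;

use File::Find;
use File::Temp qw/ tempdir /;

# Hypothetical helper: return the paths of plain files larger than
# $min_bytes found under the given directories.
sub find_large_files {
  my ($min_bytes, @dirs) = @_;
  my @found;

  find(sub {
    return unless -f;    # plain files only; this also fills the stat cache
    push @found, $File::Find::name if (-s _) > $min_bytes;
  }, @dirs);

  return @found;
}

# Scratch tree: one empty file and one file of 501 KiB.
my $root = tempdir(CLEANUP => 1);

open my $small_fh, '>', "$root/small" or die $!;
close $small_fh or die $!;

open my $big_fh, '>', "$root/big" or die $!;
print $big_fh '1' x (501 * 1024);
close $big_fh or die $!;

my @large = find_large_files(500 * 1024, $root);
print "$_\n" for @large;
```

Returning a list instead of a file handle leaves it to the caller to decide where the names go, at the cost of holding them all in memory.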

OTHER TIPS

Here is how you'd write that :)

use File::Find::Rule qw/ find rule /;
my @files = find( size => '>500Ki' , in => [ 'etc', 'dev', 'bin' ] );

or the iterator version (if the file list is potentially HUGE)

my $rule = rule( size => '>500Ki' )->start( 'etc', 'dev', 'bin' );
while ( defined ( my $file = $rule->match ) ) {
    print $filelist "$file\n";
}

I use find() for returning a list of files and rule() for other stuff ... but they're one and the same
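The size strings are parsed by Number::Compare, which also explains why size('500') in the question does not do what was intended: a bare number with no operator is an equality test in bytes. A short sketch, assuming Number::Compare is installed (it is a dependency of File::Find::Rule); the variable names are only illustrative:

```perl
use strict;
use warnings;

use Number::Compare;

my $exactly_500 = Number::Compare->new('500');     # no operator means ==
my $over_500ki  = Number::Compare->new('>500Ki');  # Ki multiplies by 1024

# Try a 500-byte size, exactly 500 KiB, and 501 KiB against both tests.
for my $bytes (500, 500 * 1024, 501 * 1024) {
  printf "%7d bytes: '500' %s, '>500Ki' %s\n",
    $bytes,
    $exactly_500->test($bytes) ? 'matches' : 'does not match',
    $over_500ki->test($bytes)  ? 'matches' : 'does not match';
}
```

Only the 500-byte size satisfies '500', and only the 501 KiB size satisfies '>500Ki'; a file of exactly 500 KiB is not strictly larger, so it matches neither.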

update: typo fix (Ki not Kib as per Number::Compare) and a test program

#!/usr/bin/perl --
use strict; use warnings;
use Data::Dump qw/ dd /;
use Path::Tiny qw/ path tempdir cwd /;
use File::Find::Rule qw/ find rule /;
Main( @ARGV );
exit( 0 );
sub Main {
    my $temp = tempdir( CLEANUP => 1 );
    my $cwd = cwd();
    chdir $temp;
    makeThem( $temp );
    findThem( );
    chdir $cwd;
#~     $temp->remove_tree;
}
sub makeThem {
    my( $temp ) = @_;
    for my $bed ( qw/ bin etc dev / ){
        path( $temp, $bed )->mkpath;
        path( $temp, $bed, 'one' )->touch;
        path( $temp, $bed, 'two' )->touch;
        path( $temp, $bed, 'tri' )->spew(1x(1024*501));
    }
}
sub findThem {    
#~     my @files = find( size => '>500Kib' , in => [ 'etc', 'dev', 'bin' ] );
    my @files = find( size => '>500Ki' , in => [ 'etc', 'dev', 'bin' ] );
    dd( \@files );
    my $rule = rule( size => '>500Ki' )->start( 'etc', 'dev', 'bin' );
    while ( defined ( my $file = $rule->match ) ) {
        dd( $file );
    }
    dd( find( file => in => [ 'etc', 'dev', 'bin' ] ) );    
}
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow