Question

I need some Perl help putting these two processes / pieces of code together. I was able to get them working individually for testing, but I need help combining them, especially the loop constructs. I'm not sure whether I should go with foreach or something else; the code is below.

Also, best-practice tips would be great too, since I'm learning this language. Thanks for your help.

Here is the process flow I'm looking for:

  • Read a directory
  • Look for a particular file
  • Use the file name to extract some key information for naming the newly processed file
  • Process the input file
  • Create the newly processed file for each input file read (if I read in 10, I create 10 new files)

    Part 1:

    my $target_dir = "/backups/test/";
    opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
    while (defined(my $file = readdir($dh))) {
        next if ($file =~ /^\.+$/);
        #Get filename attributes
        if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+.csv$/) {
          print "$1\n";
          print "$2\n";
          print "$3\n";
        }
        print "$file\n";
    }
    

    Part 2:

    use strict;
    use Digest::MD5 qw(md5_hex);
    #Create new file
    open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";
    my $data = '';
    my $line1 = <>;
    chomp $line1;
    my @heading = split /,/, $line1;
    my ($sep1, $sep2, $eorec) = ( "^A", "^E", "^D");
    while (<>)
    {
        my $digest = md5_hex($data);
        chomp;
        my (@values) = split /,/;
        my $extra = "__mykey__$sep1$digest$sep2" ;
        $extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values));
        $data .= "$extra$eorec"; 
        print NEWFILE "$data";
    }
    #print $data;
    close (NEWFILE);
    


Solution

I've bashed your two code fragments together (making the second a sub that the first calls for each matching file) and, if I understood your description of the objective correctly, this should do what you want. Comments on style and syntax are inline:

#!/usr/bin/env perl

# - Never forget these!
use strict;
use warnings;

use Digest::MD5 qw(md5_hex);

my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
    # Parens on postfix "if" are optional; I prefer to omit them
    next if $file =~ /^\.+$/;
    # Note the escaped dot before "csv"; the unescaped "." in the original
    # regex would match any character there
    if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+\.csv$/) {
        # readdir returns bare names, so prepend the directory before opening
        process_file("$target_dir$file", $1, $2, $3);
    }
    print "$file\n";
}

sub process_file {
    my ($orig_name, $foo_x, $name_x, $p_x) = @_;

    my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";

    # - From your description of the task, it sounds like we actually want to
    #   read from the found file, not from <>, so opening it here to read
    # - Better to use lexical ("my") filehandle and three-arg form of open
    # - "or" has lower operator precedence than "||", so less chance of
    #   things being grouped in the wrong order (though either works here)
    # - Including $! in the error will tell why the file open failed
    open my $in_fh, '<', $orig_name or die "cannot read $orig_name: $!";
    open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";

    my $data  = '';
    my $line1 = <$in_fh>;
    chomp $line1;
    my @heading = split /,/, $line1;
    my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
    while (<$in_fh>) {
        chomp;
        my $digest   = md5_hex($data);
        my (@values) = split /,/;
        my $extra    = "__mykey__$sep1$digest$sep2";
        # $#values is the last valid index; the original's scalar(@values)
        # would run one element past the end of the array
        $extra .= "$heading[$_]$sep1$values[$_]$sep2"
          for (0 .. $#values);
        # - Useless use of double quotes removed on next two lines
        $data .= $extra . $eorec;
        #print $out_fh $data;
    }
    # - Moved print to output file to here (where it will print the complete
    #   output all at once) rather than within the loop (where it will print
    #   all previous lines each time a new line is read in) to prevent
    #   duplicate output records.  This could also be achieved by printing
    #   $extra inside the loop.  Printing $data at the end will be slightly
    #   faster, but requires more memory; printing $extra within the loop and
    #   getting rid of $data entirely would require less memory, so that may
    #   be the better option if you find yourself needing to read huge input
    #   files.
    print $out_fh $data;

    # - $in_fh and $out_fh will be closed automatically when they go out of
    #   scope at the end of the block/sub, so there's no real point in
    #   explicitly closing them unless you're going to check whether the close
    #   succeeded or failed (which can happen in odd cases usually involving
    #   full or failing disks when writing; I'm not aware of any way that
    #   closing a file open for reading can fail, so that's just being left
    #   implicit)
    close $out_fh or die "Failed to close file: $!";
}

Disclaimer: perl -c reports that this code is syntactically valid, but it is otherwise untested.
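To illustrate the lower-memory variant mentioned in the inline comments (printing each record as it is built instead of accumulating $data), here is a sketch; the sub name encode_stream and the use of an incremental Digest::MD5 object are my own additions, but the record format produced is the same:

```perl
use strict;
use warnings;
use Digest::MD5;

# Stream each encoded record to $out_fh as it is built, keeping a running
# MD5 of everything written so far instead of accumulating it in a string.
sub encode_stream {
    my ($in_fh, $out_fh) = @_;
    my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
    my $line1 = <$in_fh>;
    chomp $line1;
    my @heading = split /,/, $line1;
    my $md5 = Digest::MD5->new;    # incremental digest of prior records
    while (my $line = <$in_fh>) {
        chomp $line;
        # clone first, because hexdigest resets the object it's called on
        my $digest = $md5->clone->hexdigest;
        my @values = split /,/, $line;
        my $extra  = "__mykey__$sep1$digest$sep2";
        $extra .= "$heading[$_]$sep1$values[$_]$sep2" for 0 .. $#values;
        $md5->add("$extra$eorec");     # same digest input as appending to $data
        print $out_fh "$extra$eorec";  # write one record at a time
    }
}
```

This trades the single large print for one print per record, so memory use stays flat no matter how big the input file is.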

Other tips

You are using an old style of Perl programming. I recommend using functions and CPAN modules (http://search.cpan.org). Perl pseudocode:

use Modern::Perl;
# use... 

sub get_input_files {
  # return an array of files (@)
}

sub extract_file_info {
   # takes the file name and returs an array of values (filename attrs)
}

sub process_file {
   # reads the input file, takes the previous attribs and build the output file
}


my @ifiles = get_input_files;
foreach my $ifile(@ifiles) {
   my @attrs = extract_file_info($ifile);
   process_file($ifile, @attrs);
}
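Filling in that skeleton, a minimal runnable sketch reusing the filename regex from the question (the directory path is the question's, and process_file's body is left as a stub):

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Return the names of files in $dir that match the expected pattern.
sub get_input_files {
    my ($dir) = @_;
    opendir my $dh, $dir or die "can't opendir $dir: $!";
    # grep filters readdir's list down to matching names in one step
    my @files = grep { /^foo\d{3}\.name\.\w{3}-foo_p\d{1,4}\.\d+\.csv$/ }
                readdir $dh;
    closedir $dh;
    return @files;
}

# Take a file name and return its attributes (the three captured fields),
# or an empty list if the name doesn't match.
sub extract_file_info {
    my ($file) = @_;
    return $file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+\.csv$/
        ? ($1, $2, $3)
        : ();
}

# Read the input file and build the output file from the attributes
# (body omitted; see the solution's process_file for one implementation).
sub process_file {
    my ($ifile, @attrs) = @_;
    # ...
}

my $target_dir = "/backups/test/";
if (-d $target_dir) {
    for my $ifile (get_input_files($target_dir)) {
        my @attrs = extract_file_info($ifile);
        # readdir returns bare names, so prepend the directory
        process_file("$target_dir$ifile", @attrs) if @attrs;
    }
}
```

Keeping each sub small like this also makes the pieces easy to test in isolation.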

Hope it helps

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow