I need some Perl help getting these two processes/pieces of code to work together. I was able to get each working individually for testing, but I need help bringing them together, particularly with the loop constructs. I'm not sure whether I should be using foreach... anyway, the code is below.

Also, since I'm learning this language, any best-practice pointers would be welcome too. Thanks for the help.

Here is the process flow I'm looking for:

  • Read a directory
  • Look for specific files
  • Use the file name to pull out some key information for naming the new processed file
  • Process the input file
  • Create a new processed file for each input file read (if I read in 10, I create 10 new files)
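The first two bullets above boil down to a filter over directory entries; a minimal sketch of that filter as a helper sub (illustrative only — the real script would feed it `readdir` results):

```perl
use strict;
use warnings;

# Return only the entries matching the foo*.csv naming scheme from the
# question; reading the directory itself stays with opendir/readdir.
sub matching_files {
    my @entries = @_;
    return grep { /^foo\d{3}\.name\.\w{3}-foo_p\d{1,4}\.\d+\.csv$/ } @entries;
}

my @found = matching_files('foo001.name.abc-foo_p12.99.csv', 'notes.txt');
print "$_\n" for @found;    # prints only the matching name
```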

    Part 1:

    my $target_dir = "/backups/test/";
    opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
    while (defined(my $file = readdir($dh))) {
        next if ($file =~ /^\.+$/);
        #Get filename attributes
        if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+\.csv$/) {
          print "$1\n";
          print "$2\n";
          print "$3\n";
        }
        print "$file\n";
    }
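A small best-practice note on the regex above: on Perl 5.10+, named captures make the three fields self-documenting instead of relying on $1/$2/$3 ordering. The names seq/env/part below are my guesses at what the fields mean:

```perl
use strict;
use warnings;

my $file = 'foo123.name.abc-foo_p45.678.csv';
if ($file =~ /^foo(?<seq>\d{3})\.name\.(?<env>\w{3})-foo_p(?<part>\d{1,4})\.\d+\.csv$/) {
    # Named captures land in the %+ hash
    print "seq=$+{seq} env=$+{env} part=$+{part}\n";   # prints "seq=123 env=abc part=45"
}
```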
    

    Part 2:

    use strict;
    use Digest::MD5 qw(md5_hex);
    #Create new file
    open (NEWFILE, ">/backups/processed/foo$1.name.$2-foo_p$3.out") || die "cannot create file";
    my $data = '';
    my $line1 = <>;
    chomp $line1;
    my @heading = split /,/, $line1;
    my ($sep1, $sep2, $eorec) = ( "^A", "^E", "^D");
    while (<>)
    {
        my $digest = md5_hex($data);
        chomp;
        my (@values) = split /,/;
        my $extra = "__mykey__$sep1$digest$sep2" ;
        $extra .= "$heading[$_]$sep1$values[$_]$sep2" for (0..scalar(@values));
        $data .= "$extra$eorec"; 
        print NEWFILE "$data";
    }
    #print $data;
    close (NEWFILE);
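The record-building loop in Part 2 can be isolated into a testable helper, which also makes the output format easy to see on a toy row. Note the index range `0 .. $#$values` below: the original `0 .. scalar(@values)` runs one step past the last value. The separators are the literal two-character sequences `^A`/`^E`/`^D` from the question, not control characters:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Build one "__mykey__" record from a heading list and a value list,
# mirroring the loop body in Part 2 (sketch, not the exact original).
sub build_record {
    my ($digest, $headings, $values) = @_;
    my ($sep1, $sep2, $eorec) = ('^A', '^E', '^D');
    my $rec = "__mykey__$sep1$digest$sep2";
    $rec .= "$headings->[$_]$sep1$values->[$_]$sep2" for 0 .. $#$values;
    return $rec . $eorec;
}

my $rec = build_record(md5_hex(''), ['id', 'name'], ['7', 'bob']);
print "$rec\n";
```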
    

Any help?

Solution

I've bashed your two code fragments together (making the second a sub that the first calls for each matching file) and, if I understood your description of the objective correctly, this should do what you want. Comments on style and syntax are inline:

#!/usr/bin/env perl

# - Never forget these!
use strict;
use warnings;

use Digest::MD5 qw(md5_hex);

my $target_dir = "/backups/test/";
opendir my $dh, $target_dir or die "can't opendir $target_dir: $!";
while (defined(my $file = readdir($dh))) {
    # Parens on postfix "if" are optional; I prefer to omit them
    next if $file =~ /^\.+$/;
    if ($file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+\.csv$/) {
        process_file($file, $1, $2, $3);
    }
    print "$file\n";
}

sub process_file {
    my ($orig_name, $foo_x, $name_x, $p_x) = @_;

    my $new_name = "/backups/processed/foo$foo_x.name.$name_x-foo_p$p_x.out";

    # - From your description of the task, it sounds like we actually want to
    #   read from the found file, not from <>, so opening it here to read
    # - Better to use lexical ("my") filehandle and three-arg form of open
    # - "or" has lower operator precedence than "||", so less chance of
    #   things being grouped in the wrong order (though either works here)
    # - Including $! in the error will tell why the file open failed
    open my $in_fh, '<', $orig_name or die "cannot read $orig_name: $!";
    open(my $out_fh, '>', $new_name) or die "cannot create $new_name: $!";

    my $data  = '';
    my $line1 = <$in_fh>;
    chomp $line1;
    my @heading = split /,/, $line1;
    my ($sep1, $sep2, $eorec) = ("^A", "^E", "^D");
    while (<$in_fh>) {
        chomp;
        my $digest   = md5_hex($data);
        my (@values) = split /,/;
        my $extra    = "__mykey__$sep1$digest$sep2";
        # - Loop over valid indices only; "0 .. scalar(@values)" would run
        #   one past the end of @values and warn about an undefined value
        $extra .= "$heading[$_]$sep1$values[$_]$sep2"
          for (0 .. $#values);
        # - Useless use of double quotes removed on next two lines
        $data .= $extra . $eorec;
        #print $out_fh $data;
    }
    # - Moved print to output file to here (where it will print the complete
    #   output all at once) rather than within the loop (where it will print
    #   all previous lines each time a new line is read in) to prevent
    #   duplicate output records.  This could also be achieved by printing
    #   $extra inside the loop.  Printing $data at the end will be slightly
    #   faster, but requires more memory; printing $extra within the loop and
    #   getting rid of $data entirely would require less memory, so that may
    #   be the better option if you find yourself needing to read huge input
    #   files.
    print $out_fh $data;

    # - $in_fh and $out_fh will be closed automatically when it goes out of
    #   scope at the end of the block/sub, so there's no real point to
    #   explicitly closing it unless you're going to check whether the close
    #   succeeded or failed (which can happen in odd cases usually involving
    #   full or failing disks when writing; I'm not aware of any way that
    #   closing a file open for reading can fail, so that's just being left
    #   implicit)
    close $out_fh or die "Failed to close file: $!";
}

Disclaimer: perl -c reports that this code is syntactically valid, but it is otherwise untested.
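To see why printing `$data` inside the loop (as in the original Part 2) duplicates records, trace the accumulation on two toy rows — each iteration re-emits everything accumulated so far:

```perl
use strict;
use warnings;

my @rows = ('a', 'b');
my $data = '';
my @inside_loop;
for my $row (@rows) {
    $data .= "$row;";
    push @inside_loop, $data;   # what 'print NEWFILE "$data"' emitted each pass
}
# Inside the loop the output would have been "a;" then "a;b;" -- row "a" twice.
print "loop prints: @inside_loop\n";
print "final print: $data\n";
```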

Other tips

You are using an old style of Perl programming. I recommend using functions and CPAN modules (http://search.cpan.org). Perl pseudocode:

use Modern::Perl;
# use... 

sub get_input_files {
  # return an array of files (@)
}

sub extract_file_info {
   # takes the file name and returns an array of values (filename attrs)
}

sub process_file {
   # reads the input file, takes the previous attributes, and builds the output file
}


my @ifiles = get_input_files;
foreach my $ifile (@ifiles) {
   my @attrs = extract_file_info($ifile);
   process_file($ifile, @attrs);
}
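One of the pseudocode subs above, fleshed out as a sketch — `extract_file_info` returns the captured attributes for a matching name and an empty list otherwise (the pattern is the one from the question; the sub body is illustrative):

```perl
use strict;
use warnings;

# Given a file name, return its captured attributes, or an empty
# list if the name does not match the expected scheme.
sub extract_file_info {
    my ($file) = @_;
    return $file =~ /^foo(\d{3})\.name\.(\w{3})-foo_p(\d{1,4})\.\d+\.csv$/
        ? ($1, $2, $3)
        : ();
}

my @attrs = extract_file_info('foo123.name.abc-foo_p45.678.csv');
print "@attrs\n";   # prints "123 abc 45"
```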

Hope it helps

Licensed under: CC-BY-SA with attribution