Question

I would like to read 100 KB from <>, run some tests on it, and then put the 100 KB back so that <> reads it again later.

In pseudocode:

$data100kb = read(<>,100000);
testing($data100kb);
unget(<>,$data100kb);
while(<>) {
  do stuff;
}

I do not know in advance if <> will supply me an actual file, a pipe or a concatenation of actual files. So it should work with:

cat bigfile_a bigfile_b | perl my_program

Assume the big files are 1000 times the size of RAM, so copying the input is prohibitively expensive.

It is acceptable if I can only read from STDIN.

Background

The first 100 KB tells me how to parse the full input, but the parser needs that data as well.

Solution

This seems to work for STDIN. It would be great if it could be done faster.

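my @line_cache;   # lines pushed back by unget(), consumed by get_line()

read(STDIN, my $first, 100000);
unget($first);

compute($first);

# defined() keeps a final "0" without a trailing newline from ending the loop
while(defined($_ = get_line())) {
    # Similar to while(<>)
}
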
sub get_line {
    if(@line_cache) {
        my $line = shift @line_cache;
        if(@line_cache) {
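            # Not the last cached line, so it is complete
            return $line;
        } else {
            # Last cached line: the 100 KB read may have cut it mid-line
            if(substr($line, -length($/)) eq $/) {
                # Line is complete
                return $line;
            } else {
                # Append the rest of the line; STDIN may already be at EOF
                my $rest = <STDIN>;
                return defined $rest ? $line . $rest : $line;
            }
        }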
        }
    } else {
        return scalar(<STDIN>);
    }
}

sub unget {
    for(@_) {
        # Split into lines, keeping each line's terminator
        my $sep = quotemeta $/;
        push @line_cache, split /(?<=$sep)/;
    }
}

Other answers

For posterity... I wrote FileHandle::Unget to address this problem.

I don't know whether this satisfies your need. If you insist on using <>, then I guess you have to use tie.

# Slurp all of STDIN into a string (this holds the whole input in memory),
# then open an in-memory filehandle $fh on it
my $fakefile = join '', <STDIN>;
open my $fh, '<', \$fakefile;

#read 100kb
read $fh, my $data100kb, 100_000;

#do something with the data
#$data100kb =~ y/a/b/;
#print $data100kb;

#reset $fh
seek $fh, 0, 0;

while(<$fh>){
    print;# do some stuff
}
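
The tie route mentioned above could look roughly like the sketch below. It is only an illustration under some assumptions: UngetHandle is a hypothetical helper class written for this example (not a CPAN module), $/ is assumed to be "\n", only scalar-context reads (one line at a time) are handled, and an in-memory handle stands in for STDIN so the demo is self-contained. Only the pushed-back chunk is held in memory; the rest of the input is still streamed from the wrapped handle.

```perl
use strict;
use warnings;

# Hypothetical helper class (not a CPAN module): a tied handle that
# serves pushed-back data before reading from the wrapped handle.
package UngetHandle;

sub TIEHANDLE {
    my ($class, $fh) = @_;
    return bless { fh => $fh, buf => '' }, $class;
}

# Push data back; it is returned again before anything new is read.
sub unget {
    my ($self, $data) = @_;
    $self->{buf} = $data . $self->{buf};
    return;
}

# Called for <HANDLE>; this sketch only handles scalar context.
sub READLINE {
    my ($self) = @_;
    my $fh = $self->{fh};
    return scalar <$fh> if $self->{buf} eq '';

    my $pos = index($self->{buf}, "\n");   # assumes $/ is "\n"
    if ($pos >= 0) {
        # A complete line is available in the buffer: cut it off and return it.
        return substr($self->{buf}, 0, $pos + 1, '');
    }
    # The buffer ends mid-line: finish the line from the wrapped handle.
    my $line = $self->{buf};
    $self->{buf} = '';
    my $rest = <$fh>;
    return defined $rest ? $line . $rest : $line;
}

package main;

# Demo on an in-memory handle standing in for STDIN.
my $input = "alpha\nbeta\ngamma\n";
open my $src, '<', \$input or die "open: $!";

read($src, my $head, 8);            # reads "alpha\nbe", cut mid-line on purpose

tie *IN, 'UngetHandle', $src;
(tied *IN)->unget($head);

my @lines;
while (defined(my $line = <IN>)) {
    push @lines, $line;             # lines arrive whole, in original order
}
```

For real input, \*STDIN would be passed to tie in place of $src, and 100_000 in place of 8.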
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow