Question

I would like to read 100 KB from <>, run some tests on it, and then put the 100 KB back so that it is read again by <> later.

In pseudocode:

$data100kb = read(<>,100000);
testing($data100kb);
unget(<>,$data100kb);
while(<>) {
  do stuff;
}

I do not know in advance whether <> will give me a regular file, a pipe, or several concatenated files, so it should work with:

cat bigfile_a bigfile_b | perl my_program

Assume the big files are 1000 times the size of RAM, so copying the input is prohibitively expensive.

It is acceptable if I can only read from STDIN.
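Part of what makes this hard is that a pipe cannot be rewound, so reading ahead and then calling `seek` back to the start only works for seekable handles such as regular files or in-memory strings. A minimal check, assuming a Unix-like system where `$^X` (the running perl) can be spawned as a child process; `can_rewind` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if seeking back to the start succeeds, else 0.
sub can_rewind {
    my ($fh) = @_;
    return seek($fh, 0, 0) ? 1 : 0;
}

# In-memory handle: seekable.
my $text = "line 1\nline 2\n";
open my $mem, '<', \$text or die "in-memory open failed: $!";
my $skip = <$mem>;    # consume one line first
my $mem_ok = can_rewind($mem);

# Pipe from a child perl process (like `cat ... |`): lseek fails on it.
open my $pipe, '-|', $^X, '-e', 'print "line 1\nline 2\n"'
    or die "cannot start child: $!";
$skip = <$pipe>;
my $pipe_ok = can_rewind($pipe);
close $pipe;

print "in-memory: $mem_ok, pipe: $pipe_ok\n";
```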

Background

The first 100 KB tells me how to parse the full input, but the parser needs those bytes as well.


Solution

This seems to work for STDIN. It would be great if it could be done faster.

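use strict;
use warnings;

# Lines pushed back by unget(); declared before any code that uses it.
my @line_cache;

read(STDIN, my $first, 100000) // die "read failed: $!";
unget($first);

compute($first);

while (defined($_ = get_line())) {
    # Similar to while(<>)
}

# Placeholder for your own analysis of the first 100 KB.
sub compute { }

# Return the next line: first from @line_cache, then from STDIN.
sub get_line {
    if (@line_cache) {
        my $line = shift @line_cache;
        if (@line_cache) {
            # not the last cached line, so it is complete
            return $line;
        }
        # last cached line: the 100 KB read may have cut it off mid-line
        if (substr($line, -length($/)) eq $/) {
            # line is complete
            return $line;
        }
        # complete it from STDIN (readline returns undef at EOF)
        my $rest = <STDIN>;
        return defined $rest ? $line . $rest : $line;
    }
    return scalar(<STDIN>);
}

# Push data back, split into lines that keep their separators.
sub unget {
    for (@_) {
        push @line_cache, split m:(?<=\Q$/\E):;
    }
}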
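On the speed question, one alternative to splitting the 100 KB into a per-line cache is to finish the partial last line with a single extra readline, open an in-memory filehandle on the resulting string, and then loop over that handle followed by the original handle. A sketch, assuming the default "\n" line separator and that only the head chunk (not the whole input) has to fit in memory; `split_head` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: read up to $size bytes from $fh, then extend the chunk
# to the end of its last line so it holds only complete lines. Returns the
# chunk and the handles to read in order: the chunk first, then the rest of $fh.
sub split_head {
    my ($fh, $size) = @_;
    my $head = '';
    defined read($fh, $head, $size) or die "read failed: $!";
    if (length $head && substr($head, -1) ne "\n") {
        my $rest = <$fh>;    # one readline finishes the partial line
        $head .= $rest if defined $rest;
    }
    open my $head_fh, '<', \$head or die "in-memory open failed: $!";
    return ($head, $head_fh, $fh);
}

# Demo on an in-memory handle standing in for STDIN; with real input you
# would call split_head(\*STDIN, 100_000) and pass $head to compute().
my $input = "alpha\nbravo\ncharlie\n";
open my $in, '<', \$input or die "in-memory open failed: $!";
my ($head, @handles) = split_head($in, 8);    # 8 bytes stop inside "bravo"

my @lines;
for my $fh (@handles) {
    while (defined(my $line = <$fh>)) {
        push @lines, $line;    # do stuff with each line here
    }
}
print "head: $head";
print "lines read: ", scalar(@lines), "\n";
```

Every line is then read with an ordinary `<$fh>`, so the per-line work stays in Perl's buffered readline instead of `shift` and `substr` calls; whether that is measurably faster on your data would need benchmarking.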

OTHER TIPS

For posterity... I wrote FileHandle::Unget to address this problem.

I don't know whether this satisfies your needs: the code below reads all of STDIN into memory, which may be too much for your input size. If you insist on using <>, then I guess you have to use tie.

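use strict;
use warnings;

# Slurp all of STDIN into a string (the whole input must fit in memory)
# and open an in-memory filehandle $fh on it.
my $fakefile = join '', <STDIN>;
open my $fh, '<', \$fakefile or die "Cannot open in-memory filehandle: $!";

# Read the first 100 KB.
read $fh, my $data100kb, 100_000;

# Do something with the data.
#$data100kb =~ y/a/b/;
#print $data100kb;

# Rewind $fh to the start.
seek $fh, 0, 0;

while (<$fh>) {
    print;    # do some stuff
}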
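The tie route mentioned above might look roughly like this: a tied handle whose `READLINE` first serves lines from a pushed-back string and then falls through to the real handle, so memory use is bounded by the pushed-back chunk. This is a sketch: the package name `PushbackHandle` is hypothetical, it implements only `READLINE`, `EOF` and `CLOSE`, and it assumes `$/` is a non-empty string (not slurp or record mode). Plain `<>` reads through the magic `ARGV` handle, so the sketch ties a separate glob; for real input you would dup STDIN first (for example `open my $real, '<&', \*STDIN`), tie `*STDIN` over it, and loop with `<STDIN>`.

```perl
use strict;
use warnings;

package PushbackHandle;    # hypothetical name

# tie *FH, 'PushbackHandle', $real_fh, $pushed_back_string;
sub TIEHANDLE {
    my ($class, $fh, $pushback) = @_;
    return bless { fh => $fh, buf => defined $pushback ? $pushback : '' }, $class;
}

sub READLINE {
    my ($self) = @_;
    if (wantarray) {    # list context: return all remaining lines
        my @lines;
        while (defined(my $line = $self->READLINE)) {
            push @lines, $line;
        }
        return @lines;
    }
    my $fh = $self->{fh};
    if (length $self->{buf}) {
        my $pos = index($self->{buf}, $/);
        if ($pos >= 0) {
            # Remove and return one complete line from the front of the buffer.
            return substr($self->{buf}, 0, $pos + length($/), '');
        }
        # Only a partial line is left: complete it from the real handle.
        my $partial = $self->{buf};
        $self->{buf} = '';
        my $rest = readline($fh);
        return defined $rest ? $partial . $rest : $partial;
    }
    return readline($fh);
}

sub EOF {
    my ($self) = @_;
    return !length($self->{buf}) && eof($self->{fh});
}

sub CLOSE {
    my ($self) = @_;
    return close($self->{fh});
}

package main;

# Demo on an in-memory handle standing in for STDIN.
my $input = "alpha\nbravo\ncharlie\n";
open my $real, '<', \$input or die "in-memory open failed: $!";
read($real, my $chunk, 8);    # "alpha\nbr": ends mid-line

tie *FH, 'PushbackHandle', $real, $chunk;    # "unget" the chunk
my @lines;
while (my $line = <FH>) {
    push @lines, $line;    # do stuff
}
untie *FH;
```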
Licensed under: CC-BY-SA with attribution