“Out of memory” with simple Win32::Unicode::File readline loop and Strawberry Perl

https://stackoverflow.com/questions/9070906

20-04-2021

Question

The issue can be reproduced by running the following code under Strawberry Perl 5.12.3.0 on Windows XP.

    #!/usr/bin/perl -w

    use strict;
    use warnings;
    use Win32::Unicode::File;
    use Encode;

    my $fname = shift @ARGV;

    my $fh = Win32::Unicode::File->new;
    if ($fh->open('<', $fname)){
      while (my $line = $fh->readline()){}
      close $fh;
    }else{
      print "Couldn't open file: $!\n";
    }

All this code does is call readline in a loop, yet it keeps consuming memory until Strawberry Perl dies with an "Out of memory" error. The file I am reading is very large, but since the code reads it line by line, the size shouldn't matter. Am I missing something, or is there a leak somewhere in Strawberry Perl? I tested the exact same code in ActivePerl, where it works fine, i.e., memory usage does not grow.

Update: Replacing Win32::Unicode::File with the built-in open and the <$fh> readline operator seems to work, at least on my distribution. See the following code.

    use strict;
    use warnings;

    my $fname = shift @ARGV;

    if (open(my $fh, '<', $fname)){
      while (my $line = <$fh>){}
      close $fh;
    }else{
      print "Couldn't open file: $!\n";
    }

So that would suggest the problem lies with the Win32::Unicode module, right?

Solution 2

A little unorthodox, I guess, but I'm going to answer my own question. I replaced the Win32::Unicode::File package with Path::Class::Unicode for reading the Unicode file. This works fine (i.e., memory usage does not grow), so the problem seems to lie in Win32::Unicode::File and is most likely a bug. I have contacted the author of the package, and he's looking into it. Please let me know if you want me to supply the code; it's pretty straightforward.

OTHER TIPS

Maybe $/ (or $INPUT_RECORD_SEPARATOR) is not a newline? Or maybe $[ (the index of the first array element and of the first character in a (sub)string) is not 0?

Those two vars are used by the module during read or readline.
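To see why the value of $/ matters here, the following is a minimal sketch using only core Perl and an in-memory filehandle; it does not use Win32::Unicode::File. The helper name count_records is made up for this illustration. With $/ set to a newline, a readline loop sees one record per line; with $/ undefined, a single readline returns the whole input at once, which on a very large file would mean holding all of it in memory.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): count how many records a
# plain readline loop sees for the given data and record separator.
sub count_records {
    my ($data, $separator) = @_;
    local $/ = $separator;    # undef means "read everything as one record"
    open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
    my $count = 0;
    while (defined(my $record = <$fh>)) {
        $count++;
    }
    close $fh;
    return $count;
}

my $data = "one\ntwo\nthree\n";
printf "newline separator: %d record(s)\n", count_records($data, "\n");
printf "undef separator:   %d record(s)\n", count_records($data, undef);
```

A quick check before the read loop, such as printing whether $/ is defined and equal to "\n", would rule this cause in or out.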

BTW: It's so damn slow because it uses three function calls to read each line one character at a time, calls Encode::decode on every character it reads, and then appends that character to the line buffer that readline returns to your code. Yuck!
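For contrast with decoding one character at a time, here is a sketch of letting a PerlIO :encoding layer decode the stream in bulk while the loop still reads line by line. It uses core Perl only and an in-memory filehandle; the helper name count_characters and the choice of UTF-8 are assumptions for illustration. It only addresses decoding file contents and does not solve opening files with Unicode names on Windows, which is what Win32::Unicode::File is for.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: stream through UTF-8 encoded bytes line by line and
# count decoded characters, letting the :encoding layer do the decoding.
sub count_characters {
    my ($bytes) = @_;
    open(my $fh, '<:encoding(UTF-8)', \$bytes)
        or die "Cannot open in-memory file: $!";
    my $total = 0;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        $total += length $line;    # length counts characters, not bytes
    }
    close $fh;
    return $total;
}

# Two short lines containing non-ASCII characters, encoded to bytes.
my $bytes = encode('UTF-8', "gr\x{fc}n\nbl\x{e5}\n");
printf "%d characters decoded from %d bytes\n",
    count_characters($bytes), length $bytes;
```

Only the current line is held in memory at any time, so memory use should stay flat regardless of file size.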

Licensed under: CC-BY-SA with attribution