Is returning a whole array from a Perl subroutine inefficient?

https://stackoverflow.com/questions/546175

23-08-2019
|

Question

I often have a subroutine in Perl that fills an array with some information. Since I'm also used to hacking in C++, I find myself often do it like this in Perl, using references:

my @array;
getInfo(\@array);

sub getInfo {
   my ($arrayRef) = @_;
   push @$arrayRef, "obama";
   # ...
}

instead of the more straightforward version:

my @array = getInfo();

sub getInfo {
   my @array;
   push @array, "obama";
   # ...
   return @array;
}

The reason, of course, is that I don't want the array to be created locally in the subroutine and then copied on return.

Is that right? Or does Perl optimize that away anyway?

Solution

What about returning an array reference in the first place?

sub getInfo {
  my $array_ref = [];
  push @$array_ref, 'foo';
  # ...
  return $array_ref;
}

my $a_ref = getInfo();
# or if you want the array expanded
my @array = @{getInfo()};

Edit according to dehmann's comment:

It's also possible to use a normal array in the function and return a reference to it.

sub getInfo {
  my @array;
  push @array, 'foo';
  # ...
  return \@array;
}

OTHER TIPS

Passing references is more efficient, but the difference is not as big as in C++. The argument values themselves (that means: the values in the array) are always passed by reference anyway (returned values are copied though).

Question is: does it matter? Most of the time, it doesn't. If you're returning 5 elements, don't bother about it. If you're returning/passing 100'000 elements, use references. Only optimize it if it's a bottleneck.

If I look at your example and think about what you want to do I'm used to write it in this manner:

sub getInfo {
  my @array;
  push @array, 'obama';
  # ...
  return \@array;
}

It seems to me as straightforward version when I need return large amount of data. There is not need to allocate array outside sub as you written in your first code snippet because my do it for you. Anyway you should not do premature optimization as Leon Timmermans suggest.

To answer the final rumination, no, Perl does not optimize this away. It can't, really, because returning an array and returning a scalar are fundamentally different.

If you're dealing with large amounts of data or if performance is a major concern, then your C habits will serve you well - pass and return references to data structures rather than the structures themselves so that they won't need to be copied. But, as Leon Timmermans pointed out, the vast majority of the time, you're dealing with smaller amounts of data and performance isn't that big a deal, so do it in whatever way seems most readable.

This is the way I would normally return an array.

sub getInfo {
  my @array;
  push @array, 'foo';
  # ...
  return @array if wantarray;
  return \@array;
}

This way it will work the way you want, in scalar, or list contexts.

my $array = getInfo;
my @array = getInfo;

$array->[0] == $array[0];

# same length
@$array == @array;

I wouldn't try to optimize it unless you know it is a slow part of your code. Even then I would use benchmarks to see which subroutine is actually faster.

There's two considerations. The obvious one is how big is your array going to get? If it's less than a few dozen elements, then size is not a factor (unless you're micro-optimizing for some rapidly called function, but you'd have to do some memory profiling to prove that first).

That's the easy part. The oft overlooked second consideration is the interface. How is the returned array going to be used? This is important because whole array dereferencing is kinda awful in Perl. For example:

for my $info (@{ getInfo($some, $args) }) {
    ...
}

That's ugly. This is much better.

for my $info ( getInfo($some, $args) ) {
    ...
}

It also lends itself to mapping and grepping.

my @info = grep { ... } getInfo($some, $args);

But returning an array ref can be handy if you're going to pick out individual elements:

my $address = getInfo($some, $args)->[2];

That's simpler than:

my $address = (getInfo($some, $args))[2];

Or:

my @info = getInfo($some, $args);
my $address = $info[2];

But at that point, you should question whether @info is truly a list or a hash.

my $address = getInfo($some, $args)->{address};

What you should not do is have getInfo() return an array ref in scalar context and an array in list context. This muddles the traditional use of scalar context as array length which will surprise the user.

Finally, I will plug my own module, Method::Signatures, because it offers a compromise for passing in array references without having to use the array ref syntax.

use Method::Signatures;

method foo(\@args) {
    print "@args";      # @args is not a copy
    push @args, 42;   # this alters the caller array
}

my @nums = (1,2,3);
Class->foo(\@nums);   # prints 1 2 3
print "@nums";        # prints 1 2 3 42

This is done through the magic of Data::Alias.

3 other potentially LARGE performance improvements if you are reading an entire, largish file and slicing it into an array:

Turn off BUFFERING with sysread() instead of read() (manual warns about mixing)
Pre-extend the array by valuing the last element - saves memory allocations
Use Unpack() to swiftly split data like uint16_t graphics channel data

Passing an array ref to the function allows the main program to deal with a simple array while the write-once-and-forget worker function uses the more complicated "$@" and arrow ->[$II] access forms. Being quite C'ish, it is likely to be fast!

I know nothing about Perl so this is a language-neutral answer.

It is, in a sense, inefficient to copy an array from a subroutine into the calling program. The inefficiency arises in the extra memory used and the time taken to copy the data from one place to another. On the other hand, for all but the largest arrays, you might not give a damn, and might prefer to copy arrays out for elegance, cussedness or any other reason.

The efficient solution is for the subroutine to pass the calling program the address of the array. As I say, I haven't a clue about Perl's default behaviour in this respect. But some languages provide the programmer the option to choose which approach.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow