You can use either a reference or pass the entire hash or array. Your choice. There are two issues that might make you choose one over the other:
- Passing other parameters
- Memory Management
Perl doesn't really have subroutine parameters. Instead, you're simply passing in an array of parameters. What if you're subroutine is seeing which array has more elements. I couldn't do this:
foo(@first, @second);
because all I'll be passing in is one big array that combines all the members of both. This is true with hashes too. Imagine a program that takes two hashes and finds the ones with common keys:
@common_keys = common(%hash1, %hash1);
Again, I'm combining all the keys and their values in both hashes into one big array.
The only way around this issue is to pass a reference:
foo(\@first, \@second);
@common_keys = common(\%hash1, \%hash2);
In this case, I'm passing the memory location where these two hashes are stored in memory. My subroutine can use those hash references. However, you do have to take some care which I'll explain with the second explanation.
The second reason to pass a reference is memory management. If my array or hash is a few dozen entries, it really doesn't matter all that much. However, imagine I have 10,000,000 entries in my hash or array. Copying all those members could take quite a bit of time. Passing by reference saves me memory, but with a terrible cost. Most of the time, I'm using subroutines as a way of not affecting my main program. This is why subroutines are suppose to use their own variables and why you're taught in most programming courses about variable scope.
However, when I pass a reference, I'm breaking that scope. Here's a simple program that doesn't pass a reference.
#! /usr/bin/env perl
use strict;
use warnings;
my @array = qw(this that the other);
foo (@array);
print join ( ":", @array ) . "\n";
sub foo {
my @foo_array = @_;
$foo_array[1] = "FOO";
}
Note that the subroutine foo
1 is changing the second element of the passed in array. However, even though I pass in @array
into foo
, the subroutine doesn't change the value of @array
. That's because the subroutine is working on a copy (created by my @foo_array = @_;
). Once the subroutine exists, the copy disappears.
When I execute this program, I get:
this:that:the:other
Now, here's the same program, except I'm passing in a reference, and in the interest of memory management, I use that reference:
#! /usr/bin/env perl
use strict;
use warnings;
my @array = qw(this that the other);
foo (\@array);
print join ( ":", @array ) . "\n";
sub foo {
my $foo_array_ref = shift;
$foo_array_ref->[1] = "FOO";
}
When I execute this program, I get:
this:FOO:the:other
That's because I don't pass in the array, but a reference to that array. It's the same memory location that holds @array
. Thus, changing the reference in my subroutine causes it to be changed in my main program. Most of the time, you do not want to do this.
You can get around this by passing in a reference, then copying that reference to an array. For example, if I had done this:
sub foo {
my @foo_array = @{ shift() };
I would be making a copy of my reference to another array. It protects my variables, but it does mean I'm copying my array over to another object which takes time and memory. Back in the 1980s when I first was programming, this was a big issue. However, in this age of gigabyte memory and quadcore processors, the main issue isn't memory management, but maintainability. Even if your array or hash contained 10 million entries, you'll probably not notice any time or memory issues.
This also works the other way around too. I could return from my subroutine a reference to a hash or the entire hash. Many people like returning a reference, but this could be problematic.
In object oriented Perl programming, I use references to keep track of my objects. Normally, I'll have a reference to a hash I can use to store other values, arrays, and hashes.
In a recent program, I was counting IDs and how many times they are referenced in a log file. This was stored in an object (which is just a reference to a hash). I had a method that would return the entire hash of IDs and their counts. I could have done this:
return $self->{COUNT_HASH};
But, what happened, if the user started modifying that reference I passed? They would be actually manipulating my object without using my methods to add and subtract from the IDs. Not something that I want them to do. Instead, I create a new hash, and then return a reference to that hash:
my %hash_counts = % { $self-{COUNT_HASH} };
return \%hash_count;
This copied my reference to an array, and then I passed the reference to the array. This protects my data from outside manipulation. I could still return a reference, but the user would no longer have access to my object without going through my methods.
By the way, I like using wantarray
which gives the caller a choice on how they want their data:
my %hash_counts = %{ $self->{COUNT_HASH} };
return want array ? %hash_counts : \%hash_counts;
This allows me to return a reference or a hash depending how the user called my object:
my %hash_counts = $object->totals(); # Returns a hash
my $hash_counts_ref = $object->totals(); # Returns a reference to a hash
1 A footnote: The @_
array is pointing to the same memory location as the parameters of your calling subroutine. Thus, if I pass in foo(@array)
and then did $_[1] = "foo";
, I would be changing the second element of @array
.