I want to calculate the variance of each row of numbers in Perl. I've written this subroutine:
################################################################
# variance
#
# A subroutine to compute the sample variance of an array
# (division by n-1 is used)
#
sub var {
    my ($data) = @_;
    if (@$data == 1) {
        return 0;
    }
    my $mean = mean($data);   # mean() is a small helper, sketched below
    my $sqtotal = 0;
    foreach (@$data) {
        $sqtotal += ($_ - $mean) ** 2;
    }
    my $var = $sqtotal / (scalar @$data - 1);
    return $var;
}
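(mean() isn't shown above; it's nothing special, essentially just the arithmetic mean, something like this:)

sub mean {
    my ($data) = @_;
    my $total = 0;
    $total += $_ foreach @$data;
    return $total / scalar @$data;
}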
If I give it this array of 58 identical elements:
[0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98,0.98]
The calculation gave me 1.25421964097639e-30.
I also tried to use the Statistics::Descriptive module (http://metacpan.org/pod/Statistics::Descriptive) and it gave me 2.11916254524942e-15.
And I also tried this site (http://www.alcula.com/calculators/statistics/variance/) and its result is 2.2438191655582E-15.
Why are the results not the same?
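My guess is floating-point rounding: mathematically the variance of 58 identical values is exactly 0, so all three answers are just rounding residue, and each implementation presumably sums things in a different order. For example, the textbook single-pass formula below is algebraically identical to my two-pass sub but subtracts two nearly equal quantities, so I'd expect it to leave yet another tiny residue (just a sketch for comparison; I don't know that the module actually does this):

# Single-pass variance: (sum(x^2) - sum(x)^2/n) / (n-1).
# Same math as the two-pass version above, but the rounding
# errors accumulate differently, so the tiny leftover differs.
sub var_single_pass {
    my ($data) = @_;
    my $n = scalar @$data;
    return 0 if $n == 1;
    my ($sum, $sumsq) = (0, 0);
    foreach (@$data) {
        $sum   += $_;
        $sumsq += $_ ** 2;
    }
    return ($sumsq - $sum ** 2 / $n) / ($n - 1);
}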
I could have just used the module, but for some reason it was extremely memory intensive on my file, which consists of a million lines of 58 numbers each. I'm not sure why it used up so much memory.
Can someone tell me why my calculation gives a different number from the module, and also how to make the module work with less memory? Is the memory use just an inherent drawback of that module? Several posts seem to suggest that.
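For context, this is roughly how I imagined using the module, with a fresh object per row so nothing should accumulate across the million lines (a sketch; data.txt and the whitespace-separated layout are stand-ins for my actual file):

use strict;
use warnings;
use Statistics::Descriptive;

# data.txt stands in for my real input: ~1 million lines,
# 58 whitespace-separated numbers per line.
open my $fh, '<', 'data.txt' or die "data.txt: $!";
while (my $line = <$fh>) {
    my @row = split ' ', $line;   # split ' ' also drops the trailing newline
    # A fresh object per row, so only one row is ever stored.
    my $stat = Statistics::Descriptive::Full->new();
    $stat->add_data(@row);
    print $stat->variance(), "\n";
}
close $fh;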
Thanks!