If you really are working with a "large" data set and the rows are already grouped by id, as in your example, then I suggest processing each group as you go instead of building a huge hash:
use strict;
use warnings;
# Skip Header row
<DATA>;
my @group;
my $lastid = '';
while (<DATA>) {
    chomp;    # Strip the newline so it doesn't end up in the last feature
    my ($id, $data) = split /,\s*/, $_, 2;
    if ($id ne $lastid) {
        # New identifier: process the group we just finished collecting
        processData($lastid, @group);
        @group = ();
    }
    push @group, $data;
    $lastid = $id;
}
processData($lastid, @group);    # Don't forget the final group
sub processData {
    my $id = shift;
    return if !@_;    # Skips the empty group seen before the first id
    print "$id " . scalar(@_) . "\n";
    # Rest of code here
}
__DATA__
identifier,feature 1, feature 2, feature 3, ...
29239999, 2,5,3,...
29239999, 2,4,3,...
29239999, 2,6,7,...
17221882, 2,6,7,...
17221882, 1,1,7,...
Outputs:
29239999 3
17221882 2
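Note that this streaming approach only works because the rows arrive already grouped by identifier. If your real file is not grouped, you can sort it on the first field before feeding it to the script. A minimal sketch, assuming a Unix-like environment and hypothetical file names (`data.csv`, `sorted.csv`):

```shell
# Tiny stand-in for the real input file (hypothetical contents)
printf 'identifier,f1,f2\n29239999,2,5\n17221882,2,6\n29239999,2,4\n' > data.csv

# Keep the header line in place, sort the remaining rows by the
# first comma-separated field, and write the result to sorted.csv
{ head -n 1 data.csv; tail -n +2 data.csv | sort -t, -k1,1; } > sorted.csv

cat sorted.csv
```

You would then read `sorted.csv` (e.g. via `while (<>)` with the file name on the command line) instead of the `__DATA__` section, and the grouping assumption holds again.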