How do I remove duplicate items from an array in Perl?

https://stackoverflow.com/questions/7651

08-06-2019

Question

I have an array in Perl:

my @my_array = ("one","two","three","two","three");

How do I remove the duplicates from the array?

Solution

You can do something like this, as demonstrated in perlfaq4:

sub uniq {
    my %seen;
    grep !$seen{$_}++, @_;
}

my @array = qw(one two three two three);
my @filtered = uniq(@array);

print "@filtered\n";

Output:

one two three

If you want to use a module, try the uniq function from List::MoreUtils.
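For readers who would rather not install anything, here is a minimal sketch using the uniq function from the core List::Util module instead. It assumes List::Util 1.45 or newer, which is an assumption about your installation and not something the answer above states.

```perl
use strict;
use warnings;
use List::Util qw(uniq);   # assumption: List::Util 1.45 or newer

my @array    = qw(one two three two three);
my @filtered = uniq(@array);   # list context: distinct elements, first-seen order
my $distinct = uniq(@array);   # scalar context: number of distinct elements

print "@filtered ($distinct distinct)\n";
```

If the installed List::Util is older than that, the import fails at compile time, and the hand-written sub above is the safer choice.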

Other answers

The Perl documentation comes with a nice collection of FAQs. Your question is frequently asked:

% perldoc -q duplicate

The answer, copied and pasted from the output of the command above, appears below:

Found in /usr/local/lib/perl5/5.10.0/pods/perlfaq4.pod
 How can I remove duplicate elements from a list or array?
   (contributed by brian d foy)

   Use a hash. When you think the words "unique" or "duplicated", think
   "hash keys".

   If you don't care about the order of the elements, you could just
   create the hash then extract the keys. It's not important how you
   create that hash: just that you use "keys" to get the unique elements.

       my %hash   = map { $_, 1 } @array;
       # or a hash slice: @hash{ @array } = ();
       # or a foreach: $hash{$_} = 1 foreach ( @array );

       my @unique = keys %hash;

   If you want to use a module, try the "uniq" function from
   "List::MoreUtils". In list context it returns the unique elements,
   preserving their order in the list. In scalar context, it returns the
   number of unique elements.

       use List::MoreUtils qw(uniq);

       my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
       my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7

   You can also go through each element and skip the ones you've seen
   before. Use a hash to keep track. The first time the loop sees an
   element, that element has no key in %Seen. The "next" statement creates
   the key and immediately uses its value, which is "undef", so the loop
   continues to the "push" and increments the value for that key. The next
   time the loop sees that same element, its key exists in the hash and
   the value for that key is true (since it's not 0 or "undef"), so the
   next skips that iteration and the loop goes to the next element.

       my @unique = ();
       my %seen   = ();

       foreach my $elem ( @array )
       {
         next if $seen{ $elem }++;
         push @unique, $elem;
       }

   You can write this more briefly using a grep, which does the same
   thing.

       my %seen = ();
       my @unique = grep { ! $seen{ $_ }++ } @array;

Install List::MoreUtils from CPAN.

Then in your code:

use strict;
use warnings;
use List::MoreUtils qw(uniq);

my @dup_list = qw(1 1 1 2 3 4 4);

my @uniq_list = uniq(@dup_list);

My usual way of doing this is:

my %unique = ();
foreach my $item (@myarray)
{
    $unique{$item} ++;
}
my @myuniquearray = keys %unique;

If you use a hash and add the items to it, you also get the bonus of knowing how many times each item appears in the list.
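As a minimal sketch of that side benefit, the code below keeps the counts and prints each distinct item with the number of times it occurred. The sample data and the names @items, %count and @distinct are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data for illustration.
my @items = qw(one two three two three three);

# Count how often each item occurs; the hash keys are the distinct items.
my %count;
$count{$_}++ for @items;

# Sort the keys so the report order is stable (hash order is not).
my @distinct = sort keys %count;
print "$_: $count{$_}\n" for @distinct;
```

Sorting is only there to make the report repeatable; drop it if the order does not matter.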

This can be done with a simple Perl one-liner.

my @in=qw(1 3 4  6 2 4  3 2 6  3 2 3 4 4 3 2 5 5 32 3); #Sample data 
my @out=keys %{{ map{$_=>1}@in}}; # Perform PFM
print join ' ', sort{$a<=>$b} @out;# Print data back out sorted and in order.

The PFM section does this:

The data in @in is fed into map. map builds an anonymous hash. The keys are extracted from the hash and fed into @out.
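The same three steps can be written out one at a time, which may be easier to follow than the nested one-liner. This is only a sketch: the intermediate names @pairs and %lookup are hypothetical and do not appear in the answer.

```perl
use strict;
use warnings;

my @in = qw(1 3 4 6 2 4 3 2 6 3 2 3 4 4 3 2 5 5 32 3);

# Step 1: map turns every element into a (key, value) pair.
my @pairs = map { $_ => 1 } @in;

# Step 2: storing the pairs in a hash collapses repeated keys.
my %lookup = @pairs;

# Step 3: the keys are the distinct values, in no particular order.
my @out = keys %lookup;

# Sort numerically before printing so the result is readable.
print join(' ', sort { $a <=> $b } @out), "\n";
```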

The variable @array is the list with duplicate elements:

%seen=();
@unique = grep { ! $seen{$_} ++ } @array;

That last one was pretty good. I'd just tweak it a bit:

my @arr;
my @uniqarr;

foreach my $var ( @arr ){
  if ( ! grep { $_ eq $var } @uniqarr ){   # exact string match; /$var/ would also match substrings
     push( @uniqarr, $var );
  }
}

I think this is probably the most readable way to do it.

Method 1: Use a hash

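Logic: A hash can only have unique keys, so iterate over the array, assign an arbitrary value to each element, and keep the element as a key of that hash. Return the keys of the hash, and that is your unique array.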

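# Dereference the anonymous hash explicitly; calling keys directly on a hash reference is not supported by current Perl.
my @unique = keys %{ +{ map { $_ => 1 } @array } };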

Method 2: Extension of method 1 for reusability

If we are going to use this functionality several times in our code, it is better to write a subroutine.

sub get_unique {
    my %seen;
    grep !$seen{$_}++, @_;
}
my @unique = get_unique(@array);

Method 3: Use the module `List::MoreUtils`

use List::MoreUtils qw(uniq);
my @unique = uniq(@array);

The previous answers pretty much summarize the possible ways of accomplishing this task.

However, I suggest a modification for those who don't care about counting the duplicates, but do care about order.

my @record = qw( yeah I mean uh right right uh yeah so well right I maybe );
my %record;
print grep !$record{$_} && ++$record{$_}, @record;

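Note that the previously suggested grep !$seen{$_}++ ... increments $seen{$_} for every element, whether or not it is already in %seen (the negation applies to the value from before the increment). The version above, however, short-circuits when $record{$_} is true, leaving what has been heard once "off the %record".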
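To make the difference concrete, this sketch runs both variants on the same data and compares what is left in each hash. The names %seen, %once, @with_counts and @flags_only are chosen here for illustration.

```perl
use strict;
use warnings;

my @record = qw( yeah I mean uh right right uh yeah so well right I maybe );

# Variant 1: the post-increment runs for every element, so %seen holds counts.
my %seen;
my @with_counts = grep { !$seen{$_}++ } @record;

# Variant 2: && short-circuits once the value is true, so it never exceeds 1.
my %once;
my @flags_only = grep { !$once{$_} && ++$once{$_} } @record;

print "@with_counts\n";
print "@flags_only\n";
print "right: counted $seen{right} times, flagged $once{right}\n";
```

Both variants return the same list in first-seen order; they differ only in what remains in the hash afterwards.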

You could also go with this ridiculousness, which takes advantage of autovivification and the existence of hash keys:

...
grep !(exists $record{$_} || undef $record{$_}), @record;

That, however, might lead to some confusion.

And if you care about neither order nor duplicate count, you could do another hack using hash slices and the trick I just mentioned:

...
@record{@record} = ();   # assigning to the hash slice creates every key with an undef value
keys %record; # your record, now probably scrambled but at least deduped
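A complete, runnable sketch of the hash-slice approach might look like the following. It reuses the sample data from earlier in this answer; the @deduped name and the final sort are additions for illustration, the sort being there only to make the printed order repeatable.

```perl
use strict;
use warnings;

my @record = qw( yeah I mean uh right right uh yeah so well right I maybe );

# Hash slice: every element of @record becomes a key of %record.
my %record;
@record{@record} = ();

# keys returns the distinct elements in no guaranteed order, so sort them.
my @deduped = sort keys %record;
print "@deduped\n";
```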

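Try this. It seems the uniq function needs a sorted list to work properly. (The hash-based uniq defined below does not actually depend on input order; sorting only changes the order of the result.)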

use strict;

# Helper function to remove duplicates in a list.
sub uniq {
  my %seen;
  grep !$seen{$_}++, @_;
}

my @teststrings = ("one", "two", "three", "one");

my @filtered = uniq @teststrings;
print "uniq: @filtered\n";
my @sorted = sort @teststrings;
print "sort: @sorted\n";
my @sortedfiltered = uniq sort @teststrings;
print "uniq sort : @sortedfiltered\n";

Using the concept of unique hash keys:

my @array  = ("a","b","c","b","a","d","c","a","d");
my %hash   = map { $_ => 1 } @array;
my @unique = keys %hash;
print "@unique","\n";

Output (hash key order is not guaranteed, so it may differ between runs): a c b d

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow
