Question

I'm looking for a regular expression that matches a repetitive pattern.

For example

The great eagle flied high flied high.  

Repeated: flied high

The call was done at night was done at night.  

Repeated: was done at night

Is there a way to achieve this? I just want the regular expression so that I can use grep -P to filter some files.

Found 5 files under folders: home folder, home folder, home folder, home folder, home folder  

Repeated: home folder

The query returned this preferences for this user: color black, fried chicken, color black, fried chicken, white shirt, brown color

Repeated: color black,

In essence, what I want to do is find "repetitive sentences" to match against.


Solution

Yes, just use \1 in the regex to refer back to what the first capture group matched. I intentionally limited this regex to phrases of 2-4 words to limit how hard it has to work:

#!/usr/bin/perl

use strict;
use warnings;

while (<DATA>) {
    if (my @phrases = /\b(\S+(?:\s+\S+){1,3})\s+\1/g) {
        print "$_\n" for @phrases;
    }
}

__DATA__
The great eagle flied high flied high.
The call was done at night was done at night.

Outputs

flied high
was done at night
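
As a rough sketch of how the same idea behaves on the comma-separated examples from the question, the pattern can be wrapped in a small helper. The name `repeated_phrases` and the `$max_words` parameter are made up for illustration, and the lazy quantifier (`{1,3}?`) is a change from the regex above: it is assumed here that the shortest repeated unit is wanted, so "home folder," is reported rather than "home folder, home folder,".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every phrase of 2..$max_words words that is
# immediately followed by a copy of itself. $max_words should be at least 2.
sub repeated_phrases {
    my ($text, $max_words) = @_;
    $max_words //= 4;
    my $extra = $max_words - 1;    # words allowed after the first one

    # Lazy {1,N}? tries the shortest phrase first; \1 must follow right away.
    my $re = qr/\b(\S+(?:\s+\S+){1,${extra}}?)\s+\1/;
    return $text =~ /$re/g;
}

my @samples = (
    'Found 5 files under folders: home folder, home folder, home folder, home folder, home folder',
    'The query returned this preferences for this user: color black, fried chicken, color black, fried chicken, white shirt, brown color',
);

for my $line (@samples) {
    print "$_\n" for repeated_phrases($line);
}
# Prints:
# home folder,
# home folder,
# color black, fried chicken,
```

Because the words are matched with \S+, trailing punctuation such as the comma stays part of the captured phrase. Assuming a GNU grep built with PCRE support, the same pattern should be usable directly for filtering, for example `grep -P '\b(\S+(?:\s+\S+){1,3}?)\s+\1' file.txt`, where `file.txt` is a placeholder.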

다른 팁

You haven't defined your question very well. As it stands you could write

my $s = 'The great eagle flied high flied high.';
print qq{"$1"\n} if $s =~ /(.+)\1/;

output

" flied high"

but then, if you apply the same pattern to your second string

my $s = 'The call was done at night was done at night.';
print qq{"$1"\n} if $s =~ /(.+)\1/;

output

"l"

Here the leftmost match is the doubled "l" in "call", not the repeated phrase. So the solution depends on the dataset that you have. If you can define your problem more tightly then we can help you better.
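
One possible tightening, sketched here under the assumption that a "phrase" means whole words separated by whitespace, is to anchor the capture on word boundaries so that a doubled letter inside a word no longer counts. The helper `first_repeat` and the variable names `$loose` and `$tight` are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Loose: any doubled substring. Tight: one or more whole words,
# followed by whitespace and an exact copy ending on a word boundary.
my $loose = qr/(.+)\1/;
my $tight = qr/\b(\w+(?:\s+\w+)*)\s+\1\b/;

# Hypothetical helper: return the first captured repeat, or undef if none.
sub first_repeat {
    my ($s, $re) = @_;
    my ($hit) = $s =~ $re;
    return $hit;
}

for my $s ('The great eagle flied high flied high.',
           'The call was done at night was done at night.') {
    printf qq{loose: "%s"  tight: "%s"\n},
        first_repeat($s, $loose), first_repeat($s, $tight);
}
# Prints:
# loose: " flied high"  tight: "flied high"
# loose: "l"  tight: "was done at night"
```

Note that the tight pattern also accepts a single repeated word (such as "the the"), and that \w+ stops at punctuation, so phrases separated by commas would need a different definition.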

License: CC-BY-SA with attribution
Not affiliated with StackOverflow