Question

I'm looking for a regular expression that matches a repetitive pattern.

For example

The great eagle flied high flied high.  

Repeated: flied high

The call was done at night was done at night.  

Repeated: was done at night

Is there a way to achieve this? I just want the regular expression so that I can use grep -P to filter some files.

Found 5 files under folders: home folder, home folder, home folder, home folder, home folder  

Repeated: home folder

The query returned this preferences for this user: color black, fried chicken, color black, fried chicken, white shirt, brown color

Repeated: color black,

In essence, what I want to do is find "repetitive sentences" to match against.


Solution 2

Yes, just use \1 in the regex to refer back to the text matched by the first capture group. I intentionally limited this regex to 2-4 word phrases to limit how hard it has to work:

#!/usr/bin/perl

use strict;
use warnings;

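while (<DATA>) {
    # Capture a phrase of 2-4 whitespace-separated words, then require
    # whitespace and the same text again (\1). In list context, /g returns
    # every captured phrase on the line.
    if (my @phrases = /\b(\S+(?:\s+\S+){1,3})\s+\1/g) {
        print "$_\n" for @phrases;
    }
}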

__DATA__
The great eagle flied high flied high.
The call was done at night was done at night.

Outputs

flied high
was done at night
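
Since the question asks for something usable with grep -P, the same pattern can be handed to grep unchanged. This is only a minimal sketch, assuming GNU grep built with PCRE support (the -P option); the third input line is a made-up control sentence that should not match.

```shell
# Same pattern as in the Perl script above: a 2-4 word phrase,
# whitespace, then the same phrase again via the \1 back-reference.
pattern='\b(\S+(?:\s+\S+){1,3})\s+\1'

# grep -P prints every input line that contains such a repeated phrase.
printf '%s\n' \
  'The great eagle flied high flied high.' \
  'The call was done at night was done at night.' \
  'Nothing repeats on this line.' | grep -P "$pattern"
```

This prints the first two sentences in full and drops the third. When run against files instead of a pipe, adding -l lists only the names of the files that contain a match.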

OTHER TIPS

You haven't defined your question very well. As it stands you could write

my $s = 'The great eagle flied high flied high.';
print qq{"$1"\n} if $s =~ /(.+)\1/;

output

" flied high"

but then, if you apply your second string

my $s = 'The call was done at night was done at night.';
print qq{"$1"\n} if $s =~ /(.+)\1/;

output

"l"

because the leftmost repetition in that string is the doubled "l" in "call". So the solution depends on the dataset that you have. If you can define your problem more tightly then we can help you better.
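
As one possible tighter definition (an assumption, not something the question states), the repeated text could be required to consist of whole words, with the two copies separated only by whitespace. A rough sketch with grep -oP, again assuming GNU grep with PCRE support:

```shell
# Assumed definition: a phrase that starts and ends on a word boundary
# (at least two characters long), followed by whitespace and an exact copy.
# The lookahead checks for the copy without including it in the match.
pattern='\b(\w.*\w)\b(?=\s+\1\b)'

# -o prints only the matched text, i.e. the first copy of the phrase.
printf '%s\n' \
  'The great eagle flied high flied high.' \
  'The call was done at night was done at night.' | grep -oP "$pattern"
```

With this definition the doubled "l" in "call" no longer counts, and the two lines yield "flied high" and "was done at night". It would still miss repeats separated by commas, such as the "home folder" example, so the pattern has to follow whatever definition fits the real data.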

Licensed under: CC-BY-SA with attribution