Question

This question comes in an attempt to understand one of the answer in : How to check that a string is a palindrome using regular expressions?

Answer given by Markus Jarderot is :

/^((.)(?1)\2|.?)$/

Can someone please explain, whats exactly happening here....i need to do similar in Perl, but not able to understand this solution!!!

PS : I am not very good in perl so please go easy ....and also "this can't be considered a regular expression if you want to be strict" - i read this line, so i am aware that this not regex strictly

Was it helpful?

Solution

  • ^ - matches beginning of string
  • ( - starts capture group #1
  • (.) - matches any single character except a newline, save it in capture group #2
  • (?1) - recurse = replace this group with the entire regexp capture group #1
  • \2 - matches the same thing as capture group #2. This requires the first and last characters of the string to match each other
  • | - creates an alternative
  • .? - optionally matches any one character that isn't a newline - This handles the end of the recursion, by matching an empty string (when the whole string is an even length) or a single character (when it's an odd length)
  • ) - ends capture group #1
  • $ - matches end of string or before a newline at the end of the string.

The recursion (?1) is the key. A palindrome is an empty string, a 1-character string, or a string whose first and last characters are the same and the substring between them is also a palindrome.

OTHER TIPS

It might be easier to understand with this analogous function, that does the same thing for arrays:

sub palindrome {
  if (scalar(@_) >= 2) {
    my $first_dot = shift;
    my $slash_two = pop;
    return $first_dot eq $slash_two && palindrome(@_);
  } else {
    # zero or one items
    return 1;
  }
}

print "yes!\n" if palindrome(qw(one two three two one));
print "really?\n" if palindrome(qw(one two three two two one));

The (?1) notation is a recursive reference to the start of the first parenthesis in the regex, the \2 is a backreference in the current recursion to the (.). Those two are anchored at the start and end of 'whatever is matching at the current recursion depth', so everything else is matched at the next depth down.


ikegami suspects this is faster:

sub palindrome {
   my $next = 0;
   my %symbols;
   my $s = join '', map chr( $symbols{$_} ||= $next++ ), @_;
   return $s =~ /^((.)(?1)\2|.?)\z/s;
}

I made this regEx few days ago. If you use it like this it will give you an array of all palindromes in a certain text. The example is for #JavaScript but you can use the regEx itself in any language to do the job. Works perfect for words to 21 chars or numbers to 21 digits. You can make it more accurate if you need to.


const palindromeFinder = /\b(\w?)(\w?)(\w?)(\w?)(\w?)(\w?)(\w?)(\w?)(\w?)(\w)\S?\10\9\8\7\6\5\4\3\2\1\b/g;

console.log(inputString.match(palindromeFinder));
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top