Regex for matching duplicate consecutive punctuation characters with the exception of 3 periods

StackOverflow https://stackoverflow.com/questions/18821518

  •  28-06-2022

Question

I have the regex

(\p{P})\1 

which successfully matches duplicate consecutive punctuation characters like

;;
,,
\\

but I need to exclude the three-period (ellipsis) punctuation:

...

Solution

Be careful: some approaches fail to match strings of the form .## (i.e. a '.' immediately before repeated punctuation). I am assuming that such strings should match.

This solution satisfies the following requirements:

  1. Repeated punctuation is matched.
  2. Ellipsis (...) is not matched.
  3. Two dots (..) and four or more dots are matched.
  4. Repeated punctuation is matched when preceded or followed by dots, e.g. .##

This is the regex:

(?>(\p{P})\1+)(?<!([^.]|^)\.{3})

Explanation:

  • (?>...) is an atomic group: once the group has matched, the engine throws away all backtracking positions inside it. So if '...' is rejected by the lookbehind, the engine does not step back and try to match '..' from the same starting position.
  • (\p{P})\1+ matches two or more of the same punctuation character in a row. This is your original pattern with + added so that the whole run is consumed.
  • (?<!([^.]|^)\.{3}) looks backwards from the end of the repeated-character match and fails if it finds exactly three dots preceded by a non-dot character or by the beginning of the string. This rejects three dots while still allowing two dots, or four or more dots, to match.
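To see what the atomic group contributes, here is a small sketch that runs the pattern with and without (?>...) on a few inputs. It assumes a .NET version that supports top-level statements (C# 9 or later); the variable names and the Show helper are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// The same pattern with and without the atomic group (?>...).
string atomic    = @"(?>(\p{P})\1+)(?<!([^.]|^)\.{3})";
string nonAtomic = @"(\p{P})\1+(?<!([^.]|^)\.{3})";

foreach (string input in new[] { "...", "a...", "....", ";;" })
{
    Match a = Regex.Match(input, atomic);
    Match n = Regex.Match(input, nonAtomic);
    Console.WriteLine($"{input,-5} atomic: {Show(a),-8} non-atomic: {Show(n)}");
}

// Hypothetical helper: render a match as its quoted text, or "none".
static string Show(Match m) => m.Success ? $"'{m.Value}'" : "none";
```

For the input ... the non-atomic variant should report '..', because the engine backtracks to a two-dot run that the lookbehind no longer rejects, while the atomic variant reports no match.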

The following test cases pass and illustrate use:

// Requires: using System.Text.RegularExpressions;
// Assert comes from your unit-test framework.
string pattern = @"(?>(\p{P})\1+)(?<!([^.]|^)\.{3})";

//Your examples:
Assert.IsTrue( Regex.IsMatch( @";;", pattern ) );
Assert.IsTrue( Regex.IsMatch( @",,", pattern ) );
Assert.IsTrue( Regex.IsMatch( @"\\", pattern ) );
//two and four dots should match
Assert.IsTrue( Regex.IsMatch( @"..", pattern ) );
Assert.IsTrue( Regex.IsMatch( @"....", pattern ) );

//Some success variations
Assert.IsTrue( Regex.IsMatch( @".;;", pattern ) );
Assert.IsTrue( Regex.IsMatch( @";;.", pattern ) );
Assert.IsTrue( Regex.IsMatch( @";;///", pattern ) );
Assert.IsTrue( Regex.IsMatch( @";;;...//", pattern ) ); //With Regex.Matches, the matches contain ;;; and // but not ...
Assert.IsTrue( Regex.IsMatch( @"...;;;//", pattern ) ); //With Regex.Matches, the matches contain ;;; and // but not ...

//Three dots should not match
Assert.IsFalse( Regex.IsMatch( @"...", pattern ) );
Assert.IsFalse( Regex.IsMatch( @"a...", pattern ) );
Assert.IsFalse( Regex.IsMatch( @";...;", pattern ) );                        

//Other tests
Assert.IsFalse( Regex.IsMatch( @".", pattern ) );
Assert.IsFalse( Regex.IsMatch( @";,;,;,;,", pattern ) );  //single punctuation does not match                        
Assert.IsTrue( Regex.IsMatch( @".;;.", pattern ) );
Assert.IsTrue( Regex.IsMatch( @"......", pattern ) );                                       
Assert.IsTrue( Regex.IsMatch( @"a....a", pattern ) );
Assert.IsFalse( Regex.IsMatch( @"abcde", pattern ) );     
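The comments above mention Regex.Matches; as a rough sketch, this is how you could list the individual matches rather than only testing with IsMatch. It assumes top-level statements (C# 9 or later), and the input strings are simply taken from the test cases.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string pattern = @"(?>(\p{P})\1+)(?<!([^.]|^)\.{3})";

foreach (string input in new[] { ";;;...//", "...;;;//", "a....a" })
{
    // Collect the text of every match found in the input.
    var found = Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value);
    Console.WriteLine($"{input} -> {string.Join(" ", found)}");
}
```

For the first two inputs this should list ;;; and // only, with the three dots skipped; for a....a it should list the four-dot run.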

OTHER TIPS

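To avoid matching the ellipsis (...), here are two alternative patterns:

(?<![.])(?![.]{3})(\p{P})\1
(?<!\.)(?!\.{3}(?!\.))(\p{P})\1+

The second pattern matches any repeated punctuation (including .... or ......) unless the run of dots is exactly three long (...). The first pattern also rejects longer runs of dots such as ...., and neither pattern matches a doubled character that immediately follows a dot (e.g. .;;), because the lookbehind rejects any start position preceded by a dot. For example, with the second pattern: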

; -- No Match
;; -- Match
,, -- Match
,,,, -- Match
\\ -- Match
... -- No Match
.... -- Match
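If you want to check how the patterns on this page differ, a quick comparison along these lines may help. It is only a sketch: the labels "accepted", "tip 1" and "tip 2" are my own names for the three patterns quoted above, and it assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Text.RegularExpressions;

// Labels are illustrative; the patterns are copied from the answers above.
var patterns = new (string Label, string Pattern)[]
{
    ("accepted", @"(?>(\p{P})\1+)(?<!([^.]|^)\.{3})"),
    ("tip 1",    @"(?<![.])(?![.]{3})(\p{P})\1"),
    ("tip 2",    @"(?<!\.)(?!\.{3}(?!\.))(\p{P})\1+"),
};

foreach (string input in new[] { "..", "...", "....", ".;;" })
{
    Console.Write($"{input,-5}");
    foreach (var (label, pattern) in patterns)
        Console.Write($" {label}: {Regex.IsMatch(input, pattern),-6}");
    Console.WriteLine();
}
```

The interesting rows are .;; (a dot before repeated punctuation), which only the atomic-group pattern should match, and ...., which the first tip should reject.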
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow