Question

I'd like to use this regular expression for validating IPv6 but I want to understand everything it does https://stackoverflow.com/a/1934546/3112803

^(?>(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9](?>:|$)){8,})((?1)(?>:(?1)){0,6})?::(?2)?)|(?>(?>(?1)(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?3)?::(?>((?1)(?>:(?1)){0,4}):)?)?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?4)){3}))$/iD

but I don't know what this flag at the end does: /iD. I know the /i flag means ignore case, but I can't find what D does anywhere. That answer has been upvoted a lot, so I'm assuming it's valid, but this post says there is no D flag: https://stackoverflow.com/a/4415233/3112803

I'm trying to use this in PL/SQL and it's not validating any valid string correctly:

if ( REGEXP_LIKE(v,'/^(?>(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9](?>:|$)){8,})((?1)(?>:(?1)){0,6})?::(?2)?)|(?>(?>(?1)(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?3)?::(?>((?1)(?>:(?1)){0,4}):)?)?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?4)){3}))$/iD') ) then

Solution

It's a flag in the PCRE flavour of Regex. See the note on the PHP.net manual page:

http://php.net/manual/en/reference.pcre.pattern.modifiers.php (under the code examples)

D (PCRE_DOLLAR_ENDONLY) - If this modifier is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this modifier, a dollar also matches immediately before the final character if it is a newline (but not before any other newlines). This modifier is ignored if m modifier is set. There is no equivalent to this modifier in Perl.
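
To see the difference the quoted text describes, here is a minimal sketch in Python rather than PHP. Python's re module has no D modifier. As an assumption for illustration, \Z (which matches only at the very end of the string) plays the role of $ with D set, while Python's plain $ behaves like PCRE's default $.

```python
import re

subject = "2001:db8::1\n"  # an address followed by a stray newline

# Plain $ also matches just before a trailing newline, so this succeeds.
loose = re.match(r"^[0-9a-f:]+$", subject)

# \Z matches only at the very end of the string, like $ under PCRE's D modifier.
strict = re.match(r"^[0-9a-f:]+\Z", subject)

print(loose is not None)   # True
print(strict is not None)  # False
```

For a validator this matters because, without D, an input that ends in a newline would still be accepted.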

OTHER TIPS

The D flag is only valid in PCRE. The following is quoted from PHP's documentation:

D (PCRE_DOLLAR_ENDONLY)

If this modifier is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this modifier, a dollar also matches immediately before the final character if it is a newline (but not before any other newlines). This modifier is ignored if m modifier is set. There is no equivalent to this modifier in Perl.

Summary

This regex in PCRE flavor matches the following formats:

  • IPv6: 2001:0db8:85a3:0000:0000:8a2e:0370:7334
  • IPv6 with leading 0's omitted: 2001:db8:85a3:0:0:8a2e:370:7334
  • IPv6 with the longest run of consecutive all-zero groups replaced by :: (leftmost run on a tie): 2001:db8:85a3::8a2e:370:7334, 2001:db8::1:0:0:1
  • IPv6 dotted-quad notation: ::ffff:192.0.2.128
  • IPv4: 192.0.2.128

Note that plain IPv4 is allowed, presumably because the author of the regex chose to support it. It can easily be disallowed by removing the ? marked "Optional, so the regex can also match IPv4" in the commented pattern below.

The regex matches all valid IPv6 addresses according to section 2.2 of RFC 4291 (apart from the bug described below). However, it is not suitable for checking whether an IPv6 address is in its canonical form as recommended by RFC 5952.
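
The gap between "valid" and "canonical" can be illustrated without the regex. This is a minimal sketch using Python's standard ipaddress module, which the original answer does not use. Every spelling from the list above parses, but only one spelling of each address equals the compressed form the module prints. For these examples that form agrees with the shortening rules of RFC 5952.

```python
import ipaddress

spellings = [
    "2001:0db8:85a3:0000:0000:8a2e:0370:7334",
    "2001:db8:85a3:0:0:8a2e:370:7334",
    "2001:db8:85a3::8a2e:370:7334",
]

# All three spellings are valid and denote the same address.
parsed = [ipaddress.IPv6Address(s) for s in spellings]
same_address = parsed[0] == parsed[1] == parsed[2]
print(same_address)  # True

# Only the last spelling equals the compressed form.
canonical = parsed[0].compressed
print(canonical)                            # 2001:db8:85a3::8a2e:370:7334
print([s == canonical for s in spellings])  # [False, False, True]

# With two zero runs of equal length, the leftmost one is replaced by ::.
tie = ipaddress.IPv6Address("2001:db8:0:0:1:0:0:1").compressed
print(tie)  # 2001:db8::1:0:0:1

# The dotted-quad form is accepted too.
mapped = ipaddress.IPv6Address("::ffff:192.0.2.128")
print(mapped.ipv4_mapped)  # 192.0.2.128
```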

Pattern explanation

I use the term hexa-group to refer to a 16-bit group in an IPv6 address, written in colon-separated hexadecimal notation, and deci-group to refer to an 8-bit group in an IPv4 address, written in dotted-decimal notation.

^
(?>
  (?>
                                    # Below matches expanded IPv6
    ([a-f0-9]{1,4})                 # (Hexa-group) One to 4 hexadecimal digits
    (?>:(?1)){7}                    # Match 7 (: hexa-group)

    |                               # OR

                                    # Below matches shorthand notation :: IPv6
    (?!(?:.*[a-f0-9](?>:|$)){8,})   # Can't find 8 or more hexa-groups ahead
    ((?1)(?>:(?1)){0,6})?           # Match 0 to 7 hexa-groups, delimited by :
    ::                              # ::
    (?2)?                           # Match 0 to 7 hexa-groups, delimited by :
  )
  |
                                 # Below matches IPv4 or IPv6 dotted-quad notation
  (?>                            
                                 # Below matches first 96 bits of IPv6
    (?>                          
                                 # Below matches expanded notation
      (?1)(?>:(?1)){5}:          # Match one hexa-group then 5 times (: hexa-group)

      |                          # OR

                                 # Below matches shorthand notation
      (?!(?:.*[a-f0-9]:){6,})    # Can't find 6 or more hexa-groups ahead
      (?3)?                      # Match 0 to 5 hexa-groups, delimited by :
      ::                         # ::
      (?>((?1)(?>:(?1)){0,4}):)? # Match 0 to 5 hexa-groups, each followed by :
    )?                           # Optional, so the regex can also match IPv4

                                 # Below matches IPv4 in dotted-decimal notation
    (25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9]) # (Deci-group) One IPv4 deci-group
    (?>\.(?4)){3}                               # Match 3 (. deci-group)
  )
)
$

In the shorthand part of the IPv6 dotted-quad branch, (?3) is a subroutine call to capturing group 3, ((?1)(?>:(?1)){0,4}), which is defined later in the pattern, so each side of :: can match at most 5 hexa-groups on its own. The two sides together cannot exceed that limit either: due to the earlier look-ahead (?!(?:.*[a-f0-9]:){6,}), it is not possible to find more than 5 hexa-groups when you are matching shorthand notation for IPv6 dotted-quad notation.
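
Python's built-in re module has no subroutine calls such as (?1) or (?4), so as a rough sketch the same reuse can be imitated by pasting named fragments into larger patterns. The names HEXA, DECI, FULL_IPV6 and IPV4 below are made up for illustration. Only the two simplest branches (expanded IPv6 and plain IPv4) are rebuilt, with (?: standing in for the atomic (?> and \Z standing in for $ with the D modifier.

```python
import re

HEXA = r"[a-f0-9]{1,4}"                                  # group 1 in the original
DECI = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"  # group 4 in the original

# Expanded IPv6: one hexa-group, then 7 times (: hexa-group).
FULL_IPV6 = re.compile(rf"^{HEXA}(?::{HEXA}){{7}}\Z", re.IGNORECASE)

# IPv4: one deci-group, then 3 times (. deci-group).
IPV4 = re.compile(rf"^{DECI}(?:\.{DECI}){{3}}\Z")

print(bool(FULL_IPV6.match("2001:0db8:85a3:0000:0000:8a2e:0370:7334")))  # True
print(bool(FULL_IPV6.match("2001:db8:85a3::8a2e:370:7334")))             # False
print(bool(IPV4.match("192.0.2.128")))                                   # True
print(bool(IPV4.match("192.0.2.256")))                                   # False
print(bool(IPV4.match("192.0.2.01")))                                    # False
```

The last two lines show what the deci-group alternation buys: values above 255 and leading zeros are both rejected.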

Bug

By the way, there is a bug in the original regex. It fails to match ::129.144.52.38 because the first non-backtracking group (?>pattern) disallows backtracking, while the part of the pattern that matches IPv6 shorthand does not check that there is no IPv6 dotted-quad notation ahead. The shorthand branch consumes ::129 (129 is a valid hexa-group), $ then fails at the first dot, and the atomic group prevents the engine from trying the dotted-quad branch. To put it simply: :: can be a shorthand IPv6 address and can also be the prefix of an IPv6 dotted-quad notation, and without backtracking the engine fails to match ::129.144.52.38.

DEMO (Note: g and m flags are for testing purposes)

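One quick fix is to change the first > to :, which turns the outermost atomic group (?> into a plain non-capturing group (?: and lets the engine backtrack into the second alternative. All IPv6 addresses should then be matched as intended.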

DEMO (Note: g and m flags are for testing purposes)
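
The effect can be reproduced on a toy pattern. The sketch below is a simplified model, not the original regex: it keeps only a shorthand alternative and a dotted-quad alternative (SHORTHAND and DOTTED are made-up names), and it emulates the atomic group with the common (?=(...))\1 idiom so that it also runs on Python versions whose re module lacks (?>...).

```python
import re

SHORTHAND = r"::[0-9a-f]{0,4}"       # toy stand-in for the IPv6 shorthand branch
DOTTED = r"::[0-9]+(?:\.[0-9]+){3}"  # toy stand-in for the dotted-quad branch

# Atomic version: once an alternative has matched, it is never revisited.
atomic = re.compile(rf"^(?=({SHORTHAND}|{DOTTED}))\1$")

# Fixed version: a plain non-capturing group can backtrack into DOTTED.
fixed = re.compile(rf"^(?:{SHORTHAND}|{DOTTED})$")

addr = "::129.144.52.38"
print(atomic.match(addr) is not None)   # False
print(fixed.match(addr) is not None)    # True
print(atomic.match("::1") is not None)  # True
```

In the atomic version the first alternative takes ::129, the end anchor fails at the dot, and the second alternative is never tried. The fixed version retries with DOTTED and succeeds.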

Licensed under: CC-BY-SA with attribution