Question

I can't seem to find out why this regular expression is not working in PL/SQL.

if ( REGEXP_LIKE(v,'/^(?>(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9](?>:|$)){8,})((?1)(?>:(?1)){0,6})?::(?2)?)|(?>(?>(?1)(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?3)?::(?>((?1)(?>:(?1)){0,4}):)?)?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?4)){3}))$/iD') ) then

It's for validating IPv4 and IPv6 addresses; it came from here: https://stackoverflow.com/a/1934546/3112803

Not sure if this has anything to do with it, but I also asked this question about the D flag on the end: "What Does This Regular Expression (RegEx) Flag Mean /iD"

For some reason this regular expression works for most of my tests on this site: http://regex101.com/ but in PL/SQL every input is reported as invalid.

By "most" I mean that there are some cases where it fails, but I've been searching for days and this is the best one I could find that is under 512 characters (512 is the limit when using REGEXP_LIKE in PL/SQL).

I'd appreciate any help. Thanks!

These are the test cases I'm using...

{1: Initial address, regex should say valid/match}
select isValid('2001:0db8:0000:0000:0000:ff00:0042:8329','ipv6') from dual;

{2: After removing all leading zeroes, regex should say valid/match}
select isValid('2001:db8:0:0:0:ff00:42:8329','ipv6') from dual;

{3: After omitting consecutive sections of zeroes, regex should say valid/match}
select isValid('2001:db8::ff00:42:8329','ipv6') from dual;

{4: The loopback address, regex should say valid/match}
select isValid('0000:0000:0000:0000:0000:0000:0000:0001','ipv6') from dual;

{5: The loopback address can be abbreviated to ::1 by using both rules, regex should say valid/match}
select isValid('::1','ipv6') from dual;

{6: This should be valid/match}
select isValid('ABCD:ABCD:ABCD:ABCD:ABCD:ABCD:192.168.158.190','ipv6') from dual;

{7: This should be valid/match}
select isValid('::','ipv6') from dual;

{8: Format that lets IPv6 applications communicate directly with IPv4 applications, regex should say valid/match}
select isValid('0:0:0:0:0:ffff:192.1.56.10','ipv6') from dual;

{9: should NOT be valid/match}
select isValid('::ffff:192.1.56.10/96','ipv6') from dual;

{old formats used for tunneling, these should NOT be valid/matches}
{10}
select isValid('0:0:0:0:0:0:192.1.56.10','ipv6') from dual;
{11}
select isValid('::192.1.56.10/96','ipv6') from dual;

{These 4 should be valid/match}
{12}
select isValid('::FFFF:129.144.52.38','ipv6') from dual;
{13}
select isValid('::129.144.52.38','ipv6') from dual;
{14}
select isValid('::FFFF:d','ipv6') from dual;
{15}
select isValid('1080:0:0:0:8:800:200C:417A','ipv6') from dual;

{These 4 should NOT be valid/match}
{16}
select isValid('::FFFF:d.d.d','ipv6') from dual;
{17}
select isValid('::FFFF:d.d','ipv6') from dual;
{18}
select isValid('::d.d.d','ipv6') from dual;
{19}
select isValid('::d.d','ipv6') from dual;

I was told test #6 was wrong: ABCD:ABCD:ABCD:ABCD:ABCD:ABCD:192.168.158.190 is supposedly not a valid IPv6 address. Is that correct?

Test cases 8-11 came from here: http://publib.boulder.ibm.com/infocenter/iseries/v5r3/index.jsp?topic=%2Frzai2%2Frzai2ipv6addrformat.htm but I was told 10 and 11 are no longer used.

Solution

Instead of doing everything in a single regex, it is better to break the regex into smaller ones and test them:

if (
    /* IPv6 expanded */
    REGEXP_LIKE(v, '\A[a-f0-9]{1,4}(:[a-f0-9]{1,4}){7}\z', 'i')
    /* IPv6 shorthand */
    OR (NOT REGEXP_LIKE(v, '\A(.*?[a-f0-9](:|\z)){8}', 'i')
        AND REGEXP_LIKE(v, '\A([a-f0-9]{1,4}(:[a-f0-9]{1,4}){0,6})?::([a-f0-9]{1,4}(:[a-f0-9]{1,4}){0,6})?\z', 'i'))
    /* IPv6 dotted-quad notation, expanded */
    OR REGEXP_LIKE(v, '\A[a-f0-9]{1,4}(:[a-f0-9]{1,4}){5}:(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}\z', 'i')
    /* IPv6 dotted-quad notation, shorthand */
    OR (NOT REGEXP_LIKE(v, '\A(.*?[a-f0-9]:){6}', 'i')
        AND REGEXP_LIKE(v, '\A([a-f0-9]{1,4}(:[a-f0-9]{1,4}){0,4})?::([a-f0-9]{1,4}:){0,5}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(\.(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])){3}\z', 'i'))
   ) then

This only tests for IPv6. IPv4 is not allowed.

Since the PL/SQL regex flavor doesn't have subroutine calls (?n), there is no choice but to expand everything out. And the lack of negative look-ahead (?!pattern) forces us to simulate it with two separate tests: a NOT REGEXP_LIKE for the look-ahead condition, combined with a REGEXP_LIKE for the actual match.

\A and \z are used to match the beginning and the end of the string, since neither is affected by flags, and \z behaves the same as $ under D mode in PCRE.
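
To make the two-step replacement for the negative look-ahead easier to see in isolation, here is a minimal sketch of just the "IPv6 shorthand" branch as a local function. The name is_ipv6_shorthand and the sample input are only illustrative assumptions; the two patterns are the ones from the condition above.

```sql
DECLARE
  -- Hypothetical helper name; it is not part of the original answer.
  FUNCTION is_ipv6_shorthand(v IN VARCHAR2) RETURN BOOLEAN IS
  BEGIN
    -- Step 1 stands in for the PCRE look-ahead (?!(?:.*[a-f0-9](?>:|$)){8,}):
    -- reject the input if it already contains 8 or more hex groups.
    IF REGEXP_LIKE(v, '\A(.*?[a-f0-9](:|\z)){8}', 'i') THEN
      RETURN FALSE;
    END IF;
    -- Step 2 is the positive test for the "::" shorthand shape.
    RETURN REGEXP_LIKE(
      v,
      '\A([a-f0-9]{1,4}(:[a-f0-9]{1,4}){0,6})?::([a-f0-9]{1,4}(:[a-f0-9]{1,4}){0,6})?\z',
      'i');
  END is_ipv6_shorthand;
BEGIN
  -- Server output must be enabled (e.g. SET SERVEROUTPUT ON) to see the line.
  IF is_ipv6_shorthand('2001:db8::ff00:42:8329') THEN
    DBMS_OUTPUT.PUT_LINE('match');
  ELSE
    DBMS_OUTPUT.PUT_LINE('no match');
  END IF;
END;
/
```

Splitting the check this way keeps each pattern short and makes it possible to test the "too many groups" rule on its own.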

OTHER TIPS

You have to get rid of the / at the start and the /iD at the end. These are part of the Perl syntax that marks the text as a regex and carries its flags; they are not part of the pattern itself.

The i switch at the end means ignore case, and it can be given as an extra argument to REGEXP_LIKE, so:

if ( REGEXP_LIKE(v,'^(?>(?>([a-f0-9]{1,4})(?>:(?1)){7}|(?!(?:.*[a-f0-9](?>:|$)){8,})((?1)(?>:(?1)){0,6})?::(?2)?)|(?>(?>(?1)(?>:(?1)){5}:|(?!(?:.*[a-f0-9]:){6,})(?3)?::(?>((?1)(?>:(?1)){0,4}):)?)?(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])(?>\.(?4)){3}))$','i') ) then

There are more issues, since Perl regular expressions are not 100% equivalent to Oracle regular expressions, and I see constructs used here that are not available in Oracle, such as ?>. Maybe you can split the regexp into an IPv4 part and an IPv6 part to avoid hitting the length limit in Oracle, and just do REGEXP_LIKE(ip,'ipv4pattern') or REGEXP_LIKE(ip,'ipv6pattern').

Adjusting the IPv4 part of the original regex to something that works in Oracle, and adding a separate IPv6 pattern, gives me:

REGEXP_LIKE(ip,'^((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\.){3}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])$','i')
REGEXP_LIKE(ip,'^(([0-9A-F]{1,4}:([0-9A-F]{1,4}:([0-9A-F]{1,4}:([0-9A-F]{1,4}:([0-9A-F]{1,4}:[0-9A-F]{0,4}|:[0-9A-F]{1,4})?|(:[0-9A-F]{1,4}){0,2})|(:[0-9A-F]{1,4}){0,3})|(:[0-9A-F]{1,4}){0,4})|:(:[0-9A-F]{1,4}){0,5})((:[0-9A-F]{1,4}){2}|:(25[0-5]|(2[0-4]|1\d|[1-9])?\d)(\.(25[0-5]|(2[0-4]|1\d|[1-9])?\d)){3})|(([0-9A-F]{1,4}:){1,6}|:):[0-9A-F]{0,4}|([0-9A-F]{1,4}:){7}:)\z', 'i')

Modified from the XML regex at http://home.deds.nl/~aeron/regex/
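
As a rough sketch of how the "split IPv4 from IPv6" idea could be wired into something callable like the isValid(value, type) test cases in the question, the function below builds each pattern from shared fragments by string concatenation, which stands in for the PCRE subroutine calls that Oracle lacks. The name is_valid_ip, its parameters, and the 1/0 return convention are assumptions made for illustration; the IPv6 branches reuse the four checks from the accepted solution, and the IPv4 branch uses the same octet alternation.

```sql
-- Hypothetical wrapper; the asker's real isValid function is not shown in the thread.
CREATE OR REPLACE FUNCTION is_valid_ip(ip IN VARCHAR2, kind IN VARCHAR2)
  RETURN NUMBER  -- assumed convention: 1 = valid, 0 = invalid, so it can be used in SELECT
IS
  -- Shared fragments, concatenated instead of using (?1)-style subroutine calls.
  c_hex   CONSTANT VARCHAR2(20)  := '[a-f0-9]{1,4}';
  c_octet CONSTANT VARCHAR2(60)  := '(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])';
  c_quad  CONSTANT VARCHAR2(200) := c_octet || '(\.' || c_octet || '){3}';
  ok      BOOLEAN;
BEGIN
  IF LOWER(kind) = 'ipv4' THEN
    ok := REGEXP_LIKE(ip, '\A' || c_quad || '\z');
  ELSIF LOWER(kind) = 'ipv6' THEN
    ok :=
      -- IPv6 expanded
      REGEXP_LIKE(ip, '\A' || c_hex || '(:' || c_hex || '){7}\z', 'i')
      -- IPv6 shorthand (first test replaces the negative look-ahead)
      OR (NOT REGEXP_LIKE(ip, '\A(.*?[a-f0-9](:|\z)){8}', 'i')
          AND REGEXP_LIKE(ip, '\A(' || c_hex || '(:' || c_hex || '){0,6})?::('
                              || c_hex || '(:' || c_hex || '){0,6})?\z', 'i'))
      -- IPv6 dotted-quad notation, expanded
      OR REGEXP_LIKE(ip, '\A' || c_hex || '(:' || c_hex || '){5}:' || c_quad || '\z', 'i')
      -- IPv6 dotted-quad notation, shorthand
      OR (NOT REGEXP_LIKE(ip, '\A(.*?[a-f0-9]:){6}', 'i')
          AND REGEXP_LIKE(ip, '\A(' || c_hex || '(:' || c_hex || '){0,4})?::('
                              || c_hex || ':){0,5}' || c_quad || '\z', 'i'));
  ELSE
    ok := FALSE;  -- unknown kind
  END IF;
  -- A NULL result (e.g. NULL input) is reported as 0.
  RETURN CASE WHEN ok THEN 1 ELSE 0 END;
END is_valid_ip;
/
```

It could then be exercised the same way as the test cases in the question, for example: select is_valid_ip('::1','ipv6') from dual;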

Licensed under: CC-BY-SA with attribution