سؤال

I posted the same on stackoverflow (please, let me know if I have to delete one).

I'm working on a DB2 database and as far as I can see regexp is not supported (without additional libraries).

So I cannot implement something similar to what is explained in this article "Bringing the Power of Regular Expression Matching to SQL"

Do you know if I can "emulate", with a SQL statement, a regular expression like this?

^a[aofdmep]{1}[a-z]{1}[a-z0-9]{4}[sidbfkfpo]{1}

Edit

In the above hypothesis I found that it's acceptable for my case a WHERE Like predicate:

WHERE USER_NAME NOT LIKE 'a_______'

But it's a unsafe and doesn't cover other cases where I don't have a fixed char that I can match.

هل كانت مفيدة؟

المحلول

First off, the {1}s are redundant, so it's really just:

^[aofdmep][a-z][a-z0-9]{4}a[sidbfkfpo]

This is actually a pretty simple patterh ... it's just looking at what the first 8 characters of the string are, and it's always a fixed length string, so you could possibly build a table of all of the permutations, and then do:

WHERE SUBSTR(string_to_match, 1, 8) IN (
  SELECT valid_prefixes FROM table_of_things_to_match
)

Unfortunately, it's 7*26*36^4*9 ... 2.75 billion possible combinations, but, still, fixed strings, so you could do:

WHERE SUBSTR(string_to_match, 1, 1) IN ('a','o','f','d','m','e','p')
  AND SUBSTR(string_to_match, 2, 1) IN ('a','b','c','d' ... 'z')
  AND SUBSTR(string_to_match, 3, 1) IN ('a','b','c','d' ... 'z','0','1' ... '9')
  AND SUBSTR(string_to_match, 4, 1) IN ('a','b','c','d' ... 'z','0','1' ... '9')
  AND SUBSTR(string_to_match, 5, 1) IN ('a','b','c','d' ... 'z','0','1' ... '9')
  AND SUBSTR(string_to_match, 6, 1) IN ('a','b','c','d' ... 'z','0','1' ... '9')
  AND SUBSTR(string_to_match, 7, 1) = 'a'
  AND SUBSTR(string_to_match, 8, 1) IN ('s','i','d','b','f','k','p','o')

(fill in the ... bits, of course)

oops, there's two fs in that last character class, so there's only 2.45 billion permutations.

I'm not going to pretend that it's going to be fast ... it likely won't be, but it'll get you the pattern you're looking for. If you tend to do this a lot, I'd likely build a table of characters so you had an easy way to select out alpha / numeric or alpha-numeric.

نصائح أخرى

Since this old question has been dug up anyway, I'll mention that you can use the built-in XQuery support in DB2 for regular expression matching, something along the lines of

select whatever
from users
where
   xmlcast(
     xmlquery('fn:matches($USER_NAME,"^a[aofdmep][a-z][a-z0-9]{4}[sidbfkfpo]")') 
     as integer) = 1

XMLQUERY above calls the XQuery matches function against the column USER_NAME. The result is an XML boolean, so the XMLCAST is used to convert it to an SQL data type.

SELECT * FROM
   (SELECT 'afr923zs' MyString FROM SYSIBM.SYSDUMMY1)
WHERE substr(MyString,1,1) = 'a' 
AND   substr(MyString,2,1) IN ('a','o','f','d','m','e','p')
AND   substr(MyString,3,1) BETWEEN 'a' AND 'z'
AND  (substr(MyString,4,1) BETWEEN 'a' AND 'z' 
   OR substr(MyString,4,1) BETWEEN '0' AND '9')
AND  (substr(MyString,5,1) BETWEEN 'a' AND 'z' 
   OR substr(MyString,5,1) BETWEEN '0' AND '9')
AND  (substr(MyString,6,1) BETWEEN 'a' AND 'z' 
   OR substr(MyString,6,1) BETWEEN '0' AND '9')
AND  (substr(MyString,7,1) BETWEEN 'a' AND 'z' 
   OR substr(MyString,7,1) BETWEEN '0' AND '9')
AND   substr(MyString,8,1) IN ('s','i','d','b','f','k','p','o');

Building on Leigh Riffel and Joe's answers, you might consider using LIKE when you have a long list of individual characters, or when you have multiple character ranges.

SELECT *
  FROM (SELECT 'afr923zs' MyString FROM SYSIBM.SYSDUMMY1) T
 WHERE substr(MyString,1,1) = 'a'
   AND 'aofdmep' like '%'||substr(MyString,2,1)||'%'
   AND substr(MyString,3,1) BETWEEN 'a' AND 'z'
   AND 'abcdefghijklmnopqrstuvwxyz0123456789' like '%'||substr(MyString,4,1)||'%'
   AND 'abcdefghijklmnopqrstuvwxyz0123456789' like '%'||substr(MyString,5,1)||'%'
   AND 'abcdefghijklmnopqrstuvwxyz0123456789' like '%'||substr(MyString,6,1)||'%'
   AND 'abcdefghijklmnopqrstuvwxyz0123456789' like '%'||substr(MyString,7,1)||'%'
   AND 'sidbfkpo' like '%'||substr(MyString,8,1)||'%'
;

Since you use the same list of characters multiple times, you might also consider using a CROSS JOINed constant column.

SELECT *
  FROM (SELECT 'afr923zs' MyString FROM SYSIBM.SYSDUMMY1) T
  CROSS JOIN (SELECT 'abcdefghijklmnopqrstuvwxyz0123456789' alphanum FROM SYSIBM.SYSDUMMY1) T2
 WHERE substr(MyString,1,1) = 'a'
   AND 'aofdmep' like '%'||substr(MyString,2,1)||'%'
   AND substr(MyString,3,1) BETWEEN 'a' AND 'z'
   AND alphanum like '%'||substr(MyString,4,1)||'%'
   AND alphanum like '%'||substr(MyString,5,1)||'%'
   AND alphanum like '%'||substr(MyString,6,1)||'%'
   AND alphanum like '%'||substr(MyString,7,1)||'%'
   AND 'sidbfkpo' like '%'||substr(MyString,8,1)||'%'
;

Not needed for your example, but the CROSS JOINed "table" could define multiple named character class columns.

REGEXP_LIKE is now available in DB2 for iSeries - see: http://www.itjungle.com/fhg/fhg051915-story01.html

In DB2 for z/OS the SQL would include PASSING as follows

select whatever
from users
where
   xmlcast(
     xmlquery('fn:matches($USER_NAME,"^a[aofdmep][a-z][a-z0-9]{4}[sidbfkfpo]")'
      PASSING USER as "USER_NAME") 
     as integer) = 1
مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى dba.stackexchange
scroll top