سؤال

I'm trying to create a query that selects strings that contain the number 21, without hitting strings such as 210 or 121. My efforts below have shown my progress, in that I've managed to apply a constraint for the beginning of the word, but I'm stumped in terms of providing delimiter at the end. I would like to match 21 and 21X where X is a nondigit character. Normally I'd use [] match both situations where there is a nondigit character or the end of the string, but I'm stymied on this.

I can filter based on the start of the word and character classes work:

# SELECT "buildings"."street_name" FROM "buildings"  WHERE (street_name ~ '\m21[s0]');
 street_name
-------------
 21st
 210th

And here I can successfully use the nondigit (\D) constraint:

 # SELECT "buildings"."street_name" FROM "buildings"  WHERE (street_name ~ '\m21\D');
 street_name
-------------
21st

But when I try to use the nondigit constraint to define a class i.e. \m21[\D] (in an effort to match the string '21') it will fail.

How can I define a regex that matches my target (in the case 21) with a non-digit suffix or string terminator ?

هل كانت مفيدة؟

المحلول

You can invert a character class with ^ as first character:

street_name ~ '\m21[^\d]'

Which is exactly the same as:

street_name ~ '\m21\D'

Or the long form:

street_name ~ '\m21[^[:digit:]]

Neither matches at the end of the string. I suggest two branches:

street_name ~ '\m21(\D|$)'

... which implements your requirements literally:

with a non-digit suffix or string terminator

Proper test tool

To check your requirements:

SELECT *
     , street_name ~ '\m21(\D|$)' AS erwin
     , street_name ~ '\m21\D*\M'  AS ruy  -- just without parentheses
FROM  (
   VALUES
   (1, '21'    , 'true'::bool)
 , (2, '21X'   , 't')
 , (3, '21 foo', 't')
 , (4, '21X11' , 't') -- !
 , (5, '121'   , 'f')
 , (6, '210'   , 'f')
 , (7, 'X21'   , 'f')
   ) t(id, street_name, should_match);

What about '21X11': digits *after a non-digit suffix?
My regexp accepts it, @Ruy's regexp rejects it.
As I read your question it should pass?


The (already inverted) class shorthand \D is not allowed inside another character classes. The manual:

Within bracket expressions, \d, \s, and \w lose their outer brackets, and \D, \S, and \W are illegal.

Bold emphasis mine.

You did not rule out 'X21' explicitly, but I assume you want to start with a word boundary like in your example: \m.

نصائح أخرى

SELECT "buildings"."street_name" 
FROM "buildings"  
WHERE (street_name ~ '\m21\D*\M');

Explanation:

  • \m: beginning of word
  • 21: the string to match
  • \D*: zero or more non-digit characters
  • \M: end of word
  • m21\D*\M: a word that starts with 21 followed by any number of non-digits up to the end of the word

If you wanted to limit the query to only ordinal street names (e.g. "21st") and avoid matching things like "21A", you could instead use \m14(st|nd|rd|th)?\M for the pattern.

مرخصة بموجب: CC-BY-SA مع الإسناد
لا تنتمي إلى dba.stackexchange
scroll top