Question

I have this regular expression:

\ba\.?b\.?c\.?\b( something)?

that matches

  • abc
  • a.b.c.
  • a.b.c. something
  • ...

I use it twice, in order of priority: first I add ^ at the beginning and $ at the end, because I'd like to find strings that are exactly one of the cases above. If nothing is found, I remove the anchors and accept strings like

  • foo abc foo
  • blah a.b.c. something blah

The problem is in the first case with a.b.c., where the \b interferes with the $. If I use

^\ba\.?b\.?c\.?\b( something)?$

the plain a.b.c. is not matched: the optional part in parentheses matches nothing, and the \b next to the $ behaves in a way I cannot understand. On the other hand, a.b.c (without the last dot) does match.

If I replace the second \b with \W everything works, but I'm not sure whether I would then match other unwanted strings. Any ideas on how I can solve this with a single regular expression?

I'm using Python, if that is relevant.
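
A minimal Python reproduction of the behaviour described above. It is only a sketch: the use of re.search and the variable names are assumptions, not taken from the original code.

```python
import re

# Anchored version of the pattern from the question.
anchored = re.compile(r"^\ba\.?b\.?c\.?\b( something)?$")

results = {}
for text in ["abc", "a.b.c", "a.b.c."]:
    # search() returns a match object or None; bool() turns that into True/False.
    results[text] = bool(anchored.search(text))
    print(repr(text), results[text])

# 'abc' True
# 'a.b.c' True
# 'a.b.c.' False   <- the case the question is about
```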


Solution

The problem comes from the meaning of \b: it matches only at a position that has a word character on one side and a non-word character (or the start or end of the string) on the other. The part \.\b$ can therefore never match anything, because the position between a dot and the end of the string is not a word boundary.
You should try:

^\ba\.?b\.?c\.?(?:\b|$)

instead.
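
A short Python check of this explanation, as a sketch that uses only the standard re module and strings from the question (the variable names are assumptions):

```python
import re

# There is no word boundary between the final dot and the end of the
# string, so \.\b$ finds nothing.
dot_boundary = re.search(r"\.\b$", "a.b.c.")
print(dot_boundary)  # None

# With (?:\b|$) the end of the string is accepted as an alternative.
fixed = re.compile(r"^\ba\.?b\.?c\.?(?:\b|$)")
matched = {}
for text in ["abc", "a.b.c", "a.b.c."]:
    matched[text] = fixed.search(text).group(0)
    print(repr(text), "->", repr(matched[text]))
```

In each of the three cases the whole input string is matched, including the trailing dot of a.b.c.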

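With the "something" part, (?:\b|$) is not enough: in a.b.c. something the position between the dot and the space is neither a word boundary nor the end of the string. A negative lookahead (?!\w), which only requires that no word character follows, covers every case:

^\ba\.?b\.?c\.?(?!\w)( something)?$

(there may still be room for improvement here)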
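
The two-pass lookup from the question can be sketched in Python with one compiled pattern. This is a minimal sketch, not the asker's actual code: the names PATTERN and find_abc are hypothetical, and it assumes a negative lookahead (?!\w) in place of the trailing \b, so that a following space, a following dot, or the end of the string all count as a valid end.

```python
import re

# Hypothetical name. (?!\w) only requires that no word character follows.
PATTERN = re.compile(r"\ba\.?b\.?c\.?(?!\w)( something)?")

def find_abc(text):
    """Return (is_exact, matched_text), or None when nothing matches."""
    # First pass: the whole string must be one of the accepted forms.
    exact = PATTERN.fullmatch(text)
    if exact:
        return True, exact.group(0)
    # Second pass: accept the pattern anywhere inside the string.
    loose = PATTERN.search(text)
    if loose:
        return False, loose.group(0)
    return None

print(find_abc("a.b.c."))                      # (True, 'a.b.c.')
print(find_abc("a.b.c. something"))            # (True, 'a.b.c. something')
print(find_abc("blah a.b.c. something blah"))  # (False, 'a.b.c. something')
print(find_abc("abcd"))                        # None
```

Using fullmatch for the first pass avoids writing a second, anchored copy of the pattern; search then plays the role of the unanchored fallback.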

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow