Question

I have this regular expression:

\ba\.?b\.?c\.?\b( something)?

that matches

  • abc
  • a.b.c.
  • a.b.c. something
  • ...

I use it twice, in order of importance: first I add ^ at the beginning and $ at the end, because I'd like to find a string that is exactly one of the cases above. If nothing is found, I remove those constraints and accept strings like

  • foo abc foo
  • blah a.b.c. something blah

The problem is in the first case with a.b.c., where the \b interferes with the $. So if I use

^\ba\.?b\.?c\.?\b( something)?$

the simple a.b.c. is not matched: the part in the round brackets is optional and matches nothing, and the \b next to the $ behaves in a way I cannot understand. On the other hand, a.b.c (without the last dot) does match.
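A minimal Python reproduction of the behavior described above; the variable names `strict` and `results` are illustrative only:

```python
import re

# Anchored version of the pattern from the question.
strict = re.compile(r"^\ba\.?b\.?c\.?\b( something)?$")

# Map each sample string to whether the anchored pattern matches it.
results = {text: bool(strict.search(text)) for text in ["abc", "a.b.c", "a.b.c."]}
print(results)
```

Of the three strings, only a.b.c. (with the trailing dot) is rejected.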

If I replace the second \b with \W everything works, but I'm not sure whether I would then match other unwanted strings. Any ideas on how I can resolve this with only one regular expression?

I'm using Python, in case that is relevant.


Solution

The problem comes from the meaning of \b (see source): it matches only at a position that has a word character (\w) on one side and a non-word character, or the start or end of the string, on the other. The sub-pattern \.\b$ can therefore never match, because the position between a dot and the end of the string is not a word boundary.
You should try:

^\ba\.?b\.?c\.?(?:\b|$)

instead.
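To see where \b actually succeeds, the following sketch lists the boundary positions of both strings and then applies the (?:\b|$) pattern. The helper name `boundary_positions` is an assumption made for illustration, not part of the `re` module:

```python
import re

def boundary_positions(text):
    # Indices at which the zero-width assertion \b succeeds.
    return [m.start() for m in re.finditer(r"\b", text)]

with_dot = "a.b.c."
without_dot = "a.b.c"

# The end of the string is a boundary only if the last character is a word character.
end_is_boundary_without_dot = len(without_dot) in boundary_positions(without_dot)
end_is_boundary_with_dot = len(with_dot) in boundary_positions(with_dot)
print(end_is_boundary_without_dot, end_is_boundary_with_dot)

# The alternation accepts either a word boundary or the end of the string.
fixed = re.compile(r"^\ba\.?b\.?c\.?(?:\b|$)")
matched = [fixed.search(t).group(0) for t in (without_dot, with_dot)]
print(matched)
```

The end of a.b.c is a word boundary while the end of a.b.c. is not, which is why the alternation with $ is needed for the second string.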

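With the "something" part, (?:\b|$) alone is not enough: in a.b.c. something the position between the final dot and the following space is neither a word boundary nor the end of the string. A negative lookahead (?!\w), which only requires that no word character follows, covers that case as well:

^\ba\.?b\.?c\.?(?!\w)( something)?$

(there may still be room for improvement here, but it matches abc, a.b.c, a.b.c. and a.b.c. something)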
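A quick way to check a candidate pattern against the strings from the question is a small test loop. The sketch below compares the (?:\b|$) form with a (?!\w) lookahead variant; the helper `check`, the names `CORE` and `TAIL`, and the extra negative sample abcd are assumptions made for illustration:

```python
import re

CORE = r"a\.?b\.?c\.?"      # the abbreviation with optional dots
TAIL = r"( something)?"     # the optional suffix from the question

# Word boundary or end of string after the optional final dot.
boundary_or_end = re.compile(r"^\b" + CORE + r"(?:\b|$)" + TAIL + r"$")
# Variant: only require that no word character follows.
no_word_after = re.compile(r"^\b" + CORE + r"(?!\w)" + TAIL + r"$")

def check(pattern, texts):
    # Map each text to True if the pattern matches it, else False.
    return {t: bool(pattern.search(t)) for t in texts}

samples = ["abc", "a.b.c", "a.b.c.", "a.b.c. something", "abcd"]
boundary_result = check(boundary_or_end, samples)
lookahead_result = check(no_word_after, samples)

for name, result in (("(?:\\b|$)", boundary_result), ("(?!\\w)", lookahead_result)):
    print(name, result)
```

In this check the two patterns agree on every sample except a.b.c. something, which only the lookahead variant accepts; both reject abcd.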

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow