Using end of word mark with unicode in regular expressions in Python
27-09-2019
Question
The following matches in IDLE, but does not match when run in a method in a module file:
import re
re.search('\\bשלום\\b','שלום עולם',re.UNICODE)
while the following matches in both cases:
import re
re.search('שלום','שלום עולם',re.UNICODE)
(Note that Stack Overflow displays the first and second arguments in the line above in swapped order, because Hebrew is a right-to-left language.)
How can I make the first snippet match when it runs from a .py file?
Update: To be more precise, the first snippet matches in IDLE but does not match when run in the Eclipse console with PyDev.
Solution
It seems to work for me when I use Unicode strings:
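# -*- coding: utf-8 -*-
import re
# u'' literals make both the pattern and the subject Unicode strings,
# so \b uses Unicode word characters (re.U is an alias for re.UNICODE)
match = re.search(u'\\bשלום\\b', u'שלום עולם', re.U)
print(match is not None)  # True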
See it in action: http://codepad.org/xWz5cZj5
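A minimal Python 3 sketch of what may be going on, assuming the failing environment ends up matching UTF-8 encoded bytes rather than Unicode text (in a UTF-8 Python 2 source file, a plain '...' literal is a byte string). It runs the same pattern once as text and once as UTF-8 bytes. The variable names are illustrative only, and the bytes case passes no flags because Python 3 rejects re.UNICODE for bytes patterns.

```python
import re

text = 'שלום עולם'
pattern = r'\bשלום\b'

# Text pattern and text subject: \b uses Unicode word characters,
# so Hebrew letters count as word characters and both boundaries are found.
text_match = re.search(pattern, text)

# Same pattern and subject as UTF-8 bytes (assumed to resemble the failing case).
# For bytes patterns, \w (and therefore \b) only covers ASCII [a-zA-Z0-9_],
# so every byte of the Hebrew text is a non-word character and \b never matches.
bytes_match = re.search(pattern.encode('utf-8'), text.encode('utf-8'))

print(text_match is not None)   # True
print(bytes_match is not None)  # False
```

Running it prints True and then False, which is consistent with the symptom in the question. If the input arrives as bytes, decoding it (for example with `.decode('utf-8')`) before calling `re.search` with a text pattern is one way to keep `\b` Unicode-aware.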
Licensed under: CC-BY-SA with attribution