Using end of word mark with unicode in regular expressions in Python

Question

The following matches in Idle, but does not match when run in a method in a module file:

import re
re.search('\\bשלום\\b','שלום עולם',re.UNICODE)

while the following matches in both cases:

import re
re.search('שלום','שלום עולם',re.UNICODE)

(Notice that stackoverflow erroneously switches the first and second items in the line above as this is a right to left language)

How can I make the first code match inside a py file?

Update: What I should have written for the first segment is that it matches in Idle, but does not match when run in eclipse console with PyDev.

Solution

Seems to work for me when I'm using unicode strings:

# -*- coding: utf-8 -*-

import re
match = re.search(u'\\bשלום\\b', u'שלום עולם', re.U)

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow