Make the string that builds your expression a unicode string, so that the \uXXXX sequences are interpreted as Unicode characters instead of a plain backslash followed by u, 2, 0, and so on. Try the following:
import re

# Every literal needs its own u prefix, otherwise its \uXXXX escapes stay as plain text.
regex = re.compile(u"\\s*([\u00a4\u00b7]|[\u2010-\u2017]|"
                   u"[\u2020-\u206f]|[\u2300-\u23f3]|[\u25a0-\u25ff]|"
                   u"[\u2600-\u26ff]|[\u2700-\u27bf]|[\u2b00-\u2bff])\\s*", re.UNICODE)
You're most probably using Python 2. In Python 3 all string literals are unicode by default, so the u prefix is not needed there.
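A quick way to see the difference and check the pattern, as a rough sketch only: the helper name find_symbols and the sample string are made up for illustration, and the separate character classes are merged into a single class, which should match the same characters. The u prefixes are kept so the literals read like the Python 2 version; Python 3.3 and later accept them too.

```python
import re

# An escaped literal holds one character; a raw literal (used here to mimic
# an unprefixed Python 2 string) holds six: backslash, u, 2, 0, 2, 0.
assert len(u"\u2020") == 1
assert len(r"\u2020") == 6

# Same ranges as in the answer, merged into one character class.
SYMBOLS = re.compile(
    u"\\s*([\u00a4\u00b7\u2010-\u2017\u2020-\u206f\u2300-\u23f3"
    u"\u25a0-\u25ff\u2600-\u26ff\u2700-\u27bf\u2b00-\u2bff])\\s*",
    re.UNICODE,
)


def find_symbols(text):
    """Return each matched symbol in text, without the surrounding whitespace."""
    # findall returns the contents of the single capturing group per match.
    return SYMBOLS.findall(text)


# Hypothetical sample: a bullet (U+2022) and a black star (U+2605).
sample = u"a \u2022 b \u2605 c"
print(find_symbols(sample))
```

If the literals were not unicode strings, the first two checks are what would go wrong: the pattern would contain the six separate characters rather than the single symbol.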