You are trying to match against encoded input; raw_input() in Python 2 always returns a byte string. This means that the terminal, console or IDE you are using determines what encoding is used for the input.
Matching non-ASCII characters with a regular expression on byte strings requires you to match the encoded bytes exactly, so any change in the terminal environment or in your source code editor's encoding settings will usually make the match fail.
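To see why byte-level matching is fragile, here is a minimal sketch. The sample word and the two encodings (UTF-8 and Latin-1) are assumptions chosen for illustration; your terminal may use something else entirely. The same character produces different bytes under each codec, so a byte pattern written for one never matches input encoded with the other.

```python
import re

# "apple" with a-umlaut, written with an escape so the source file's
# own encoding does not influence the result.
word = u'\xe4pple'

utf8_bytes = word.encode('utf-8')      # the umlaut becomes two bytes: C3 A4
latin1_bytes = word.encode('latin-1')  # the umlaut becomes one byte: E4

# A byte pattern that silently assumes the input is UTF-8.
pattern = re.compile(u'\xe4'.encode('utf-8'))

matches_utf8 = pattern.search(utf8_bytes) is not None
matches_latin1 = pattern.search(latin1_bytes) is not None

print(matches_utf8)    # True: pattern and input share an encoding
print(matches_latin1)  # False: same character, different bytes
```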
You want to explicitly decode the raw_input() result here, and use Unicode matching:
import sys
import re

def ordnaText(text):
    text = text.lower()
    # Remove everything that is not a Unicode word character
    text = re.sub(u'\W', '', text, flags=re.UNICODE)
    if text.isalnum():
        return text

userinput = raw_input('....')
# Decode the bytes from the terminal into a unicode object
userinput = userinput.decode(sys.stdin.encoding)
something = ordnaText(userinput)
sys.stdin.encoding tells you what Python thinks the input codec is. Using flags=re.UNICODE specifically switches on Unicode support in the regular expression engine, and u'\W' gives the engine a Unicode string literal; the latter is optional, but it is better to be explicit.
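One caveat: in Python 2, sys.stdin.encoding can be None when input is piped rather than typed at a terminal, in which case decode() would fail. A minimal sketch of a more defensive version follows. The helper names decode_input and strip_non_word, and the UTF-8 fallback, are my own assumptions and not part of any library; a fixed byte string stands in for raw_input() so the result is deterministic.

```python
import re
import sys

def decode_input(raw, encoding=None, fallback='utf-8'):
    """Decode a byte string to Unicode text (hypothetical helper).

    Uses the explicit encoding if given, else the codec Python detected
    for stdin, else the assumed fallback.
    """
    encoding = encoding or getattr(sys.stdin, 'encoding', None) or fallback
    return raw.decode(encoding)

def strip_non_word(text):
    """Lowercase Unicode text and drop every non-word character."""
    return re.sub(u'\\W', u'', text.lower(), flags=re.UNICODE)

# Stand-in for raw_input(): bytes as a UTF-8 terminal would deliver them.
raw = u'Hall\xe5, V\xe4rlden!'.encode('utf-8')

cleaned = strip_non_word(decode_input(raw, encoding='utf-8'))
print(cleaned == u'hall\xe5v\xe4rlden')  # True: the accented letters survive
```

Passing the encoding explicitly keeps the example reproducible; in real use you would leave it out and let the helper consult sys.stdin.encoding first.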
If you want to learn more about Unicode, encoded byte strings and how they relate to Python, I recommend you read: