Question

To remove all punctuation from a string x, I want to use re.findall(), but I have been struggling to work out what pattern to pass to it. I know that I can get all the punctuation characters by writing:

import string
y = string.punctuation

But if I write:

re.findall(y, x)

it fails with:

 raise error("multiple repeat")
 sre_constants.error: multiple repeat

Can someone explain what exactly should be passed to re.findall()?


Solution

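You may not even need a regular expression for this. In Python 2, str.translate can delete the characters directly:

import string
data = 'Hell,o World.'  # example input
# Python 2 only: None means "no translation table"; the second argument lists the characters to delete
print data.translate(None, string.punctuation)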
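
The two-argument form of translate above exists only in Python 2. As a minimal sketch of the same idea for Python 3, assuming the input is an ordinary str, a deletion table can be built with str.maketrans (the names _DELETE_PUNCT and remove_punctuation are made up for this example):

```python
import string

# Translation table that maps every ASCII punctuation character to None,
# which tells str.translate to delete it.
_DELETE_PUNCT = str.maketrans('', '', string.punctuation)


def remove_punctuation(text):
    """Return text with all ASCII punctuation characters removed."""
    return text.translate(_DELETE_PUNCT)


print(remove_punctuation('Hell,o World.'))  # Hello World
```

Note that string.punctuation only covers ASCII punctuation, so characters such as curly quotes are left in place.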

Other tips

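Several characters in string.punctuation have a special meaning in regular expressions, so the string cannot be used as a pattern as-is (for example, * immediately followed by + is parsed as two quantifiers in a row, which is the likely source of the "multiple repeat" error). They should be escaped with re.escape: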

>>> import re
>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> re.escape(string.punctuation)
'\\!\\"\\#\\$\\%\\&\\\'\\(\\)\\*\\+\\,\\-\\.\\/\\:\\;\\<\\=\\>\\?\\@\\[\\\\\\]\\^\\_\\`\\{\\|\\}\\~'
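
The effect of escaping can be checked directly. The sketch below uses a hypothetical helper, compiles, that only reports whether a string is accepted as a pattern. The exact error message for the raw string, and the exact escaped string that re.escape returns, can differ between Python versions, so neither is relied on here.

```python
import re
import string


def compiles(pattern):
    """Return True if pattern is a valid regular expression, else False."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# The raw punctuation string is rejected as a pattern...
print(compiles(string.punctuation))             # False
# ...while the escaped version is accepted.
print(compiles(re.escape(string.punctuation)))  # True
```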

And if you want to match any one of them, wrap the escaped string in a character class ([...]):

>>> '[{}]'.format(re.escape(string.punctuation))
'[\\!\\"\\#\\$\\%\\&\\\'\\(\\)\\*\\+\\,\\-\\.\\/\\:\\;\\<\\=\\>\\?\\@\\[\\\\\\]\\^\\_\\`\\{\\|\\}\\~]'

>>> import re
>>> pattern = '[{}]'.format(re.escape(string.punctuation))
>>> re.sub(pattern, '', 'Hell,o World.')
'Hello World'
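
Since the question asks specifically about re.findall(), here is a rough sketch of how the same character class could be used with it. The function names are invented for illustration. findall returns the matches themselves, so to remove punctuation with it one option is to match the runs of characters that are not punctuation and join them back together.

```python
import re
import string

ESCAPED = re.escape(string.punctuation)

# Matches any single ASCII punctuation character.
PUNCT_RE = re.compile('[{}]'.format(ESCAPED))
# Matches a run of one or more characters that are NOT punctuation.
NON_PUNCT_RE = re.compile('[^{}]+'.format(ESCAPED))


def find_punctuation(text):
    """List every punctuation character in text, in order of appearance."""
    return PUNCT_RE.findall(text)


def strip_with_findall(text):
    """Remove punctuation by joining the non-punctuation runs."""
    return ''.join(NON_PUNCT_RE.findall(text))


print(find_punctuation('Hell,o World.'))    # [',', '.']
print(strip_with_findall('Hell,o World.'))  # Hello World
```

For plain removal, re.sub with the first pattern is the more direct route; the findall variant is shown only because the question asked for it.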
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow