Question

I want to remove all punctuation from a string, x, using re.findall(), but I can't work out what pattern to pass to it. I know I can get all the punctuation characters with:

import string
y = string.punctuation

But if I write:

re.findall(y,x) 

it raises:

 raise error("multiple repeat")
 sre_constants.error: multiple repeat

Can someone explain what exactly should be passed to re.findall()?

Solution

You may not need a regular expression at all. str.translate can delete the characters directly (this is the Python 2 form of the call, where data is the input string):

import string
print data.translate(None, string.punctuation)
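
On Python 3, str.translate takes a single translation table, so the two-argument call above does not work there. A minimal sketch of an equivalent, assuming a sample string named data (hypothetical; the original answer does not define it):

```python
import string

# Hypothetical sample input; the original answer does not define `data`.
data = 'Hell,o World.'

# str.maketrans('', '', chars) builds a table that deletes every character in chars.
table = str.maketrans('', '', string.punctuation)
cleaned = data.translate(table)
print(cleaned)  # Hello World
```

Building the table once and reusing it is the usual choice when many strings need cleaning.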

Other tips

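Several characters in string.punctuation have a special meaning in regular expressions, so the string cannot be used as a pattern as-is. In the question's case, the adjacent * and + are read as two consecutive quantifiers, which triggers the "multiple repeat" error. Escape the characters first: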

>>> import re
>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> re.escape(string.punctuation)
'\\!\\"\\#\\$\\%\\&\\\'\\(\\)\\*\\+\\,\\-\\.\\/\\:\\;\\<\\=\\>\\?\\@\\[\\\\\\]\\^\\_\\`\\{\\|\\}\\~'
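
The difference between the raw and escaped strings can be checked directly. The sketch below is only an illustration: the variable names are made up here, and the exact error message depends on the Python version, so it is printed and not assumed.

```python
import re
import string

# Compiling the raw punctuation string fails because characters such as
# ( ) * + ? [ are parsed as regex syntax, not as literals.
try:
    re.compile(string.punctuation)
    raw_compiles = True
except re.error as exc:
    raw_compiles = False
    print('raw pattern rejected:', exc)

# The escaped form compiles and matches the punctuation string literally.
escaped = re.compile(re.escape(string.punctuation))
print(escaped.fullmatch(string.punctuation) is not None)
```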

To match any one of them, wrap the escaped string in a character class ([...]):

>>> '[{}]'.format(re.escape(string.punctuation))
'[\\!\\"\\#\\$\\%\\&\\\'\\(\\)\\*\\+\\,\\-\\.\\/\\:\\;\\<\\=\\>\\?\\@\\[\\\\\\]\\^\\_\\`\\{\\|\\}\\~]'

Use that pattern with re.sub to remove the punctuation:

>>> pattern = '[{}]'.format(re.escape(string.punctuation))
>>> re.sub(pattern, '', 'Hell,o World.')
'Hello World'
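
Since the question asks specifically about re.findall, here is a hedged sketch of how the same character class could be used with it. The variable names (x, found, kept) are chosen for illustration only; re.sub, as shown above, remains the more direct tool for removal.

```python
import re
import string

# Character class matching any single ASCII punctuation character.
pattern = '[{}]'.format(re.escape(string.punctuation))

x = 'Hell,o World.'  # sample input reused from the re.sub example

# findall returns each punctuation character that occurs in x.
found = re.findall(pattern, x)
print(found)  # [',', '.']

# Negating the class (note the leading ^) collects everything else,
# so joining the runs rebuilds the string without punctuation.
kept = re.findall('[^{}]+'.format(re.escape(string.punctuation)), x)
print(''.join(kept))  # Hello World
```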
Licensed under: CC-BY-SA with attribution