Question

To remove all punctuation from a string x, I want to use re.findall(), but I've been struggling to figure out what to pass to it. I know that I can get all the punctuation characters by writing:

import string
y = string.punctuation

But if I write:

re.findall(y,x) 

it raises:

 raise error("multiple repeat")
 sre_constants.error: multiple repeat

Can someone explain what exactly I should pass to the re.findall function?


Solution

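You may not even need a regex for this. You can simply use str.translate. In Python 3, build a deletion table with str.maketrans, like this:

import string
print(x.translate(str.maketrans('', '', string.punctuation)))

(In Python 2, the equivalent call was x.translate(None, string.punctuation).)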
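A minimal Python 3 sketch of the translate approach wrapped in a reusable function. The names remove_punctuation and _PUNCT_TABLE are illustrative, not from the answer; building the table once at module level avoids recreating it on every call.

```python
import string

# Translation table mapping every character in string.punctuation to None,
# i.e. deleting it. Built once and reused (hypothetical name).
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def remove_punctuation(text: str) -> str:
    """Return text with all ASCII punctuation characters removed."""
    return text.translate(_PUNCT_TABLE)


print(remove_punctuation('Hell,o World.'))  # Hello World
```

Note that string.punctuation only covers ASCII punctuation, so characters such as « or … are left in place.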

Other answers

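Several characters in string.punctuation have special meanings in regular expressions, for example ( ) * + ? [ ] and |. Unescaped, the sequence ()*+ applies two repetition operators in a row, which is what raised "multiple repeat" here. These characters must be escaped, and re.escape() does that for you: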

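>>> import re
>>> import string
>>> string.punctuation
'!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
>>> re.escape(string.punctuation)
'\\!\\"\\#\\$\\%\\&\\\'\\(\\)\\*\\+\\,\\-\\.\\/\\:\\;\\<\\=\\>\\?\\@\\[\\\\\\]\\^\\_\\`\\{\\|\\}\\~'

(That re.escape() output comes from a Python version older than 3.7. Since 3.7, re.escape() only escapes characters that can have special meaning in a regex, so the string is shorter, but it is used the same way.)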
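A small Python 3 sketch of why the escaping matters: compiling the raw string fails, while the escaped version compiles but, on its own, only matches the complete punctuation sequence literally, not individual characters. The exact error message depends on the Python version, so the sketch only relies on the exception type.

```python
import re
import string

# Unescaped, characters such as ( ) * + [ are regex syntax, so compiling fails.
try:
    re.compile(string.punctuation)
except re.error as exc:
    print('re.error:', exc)

# Escaped, every character is literal: the pattern matches the whole
# 32-character sequence and nothing shorter.
literal = re.compile(re.escape(string.punctuation))
print(bool(literal.fullmatch(string.punctuation)))  # True
print(literal.search('Hell,o World.'))  # None
```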

To match any single punctuation character instead, wrap the escaped string in a character class ([...]):

>>> '[{}]'.format(re.escape(string.punctuation))
'[\\!\\"\\#\\$\\%\\&\\\'\\(\\)\\*\\+\\,\\-\\.\\/\\:\\;\\<\\=\\>\\?\\@\\[\\\\\\]\\^\\_\\`\\{\\|\\}\\~]'
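Since the question asked about re.findall: with that character class, findall returns each punctuation character it finds, one list item per match. A minimal sketch with an illustrative input string:

```python
import re
import string

# Character class that matches any single ASCII punctuation character.
pattern = '[{}]'.format(re.escape(string.punctuation))

print(re.findall(pattern, 'Hell,o World.'))  # [',', '.']
```

findall only reports what matched; it does not remove anything from the string.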

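Passing the character-class pattern to re.sub() deletes every punctuation character:

>>> import re
>>> import string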
>>> pattern = '[{}]'.format(re.escape(string.punctuation))
>>> re.sub(pattern, '', 'Hell,o World.')
'Hello World'
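If you specifically want re.findall rather than re.sub, one option (a sketch, not taken from the original answers) is to negate the class with [^...] so that findall collects runs of non-punctuation text, then join them back together. The names keep and pieces are illustrative.

```python
import re
import string

# [^...] matches any character that is NOT ASCII punctuation; the + groups
# consecutive non-punctuation characters into a single match.
keep = re.compile('[^{}]+'.format(re.escape(string.punctuation)))

x = 'Hell,o World.'
pieces = keep.findall(x)   # ['Hell', 'o World']
print(''.join(pieces))     # Hello World
```

For plain deletion, re.sub or str.translate is simpler; this mainly shows what a findall-based version would look like.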
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow