Question

I tried multiple solutions here, and although they strip some punctuation, they don't seem to work on runs of several punctuation characters, such as "[ or ',. This code:

import re
import string

regex = re.compile('[%s]' % re.escape(string.punctuation))
for i in words:
    while regex.match(i):
        regex.sub('', i)

I got this from "Best way to strip punctuation from a string in Python". It was good, but I still run into problems with consecutive punctuation. I added the while loop hoping to iterate over each word until all of its punctuation was removed, but that doesn't work either: it just gets stuck on the first item, "[, and never exits.

Am I missing some obvious piece that I'm just oblivious to?

I worked around the problem by adding a redundant pass and looping over my lists twice, but with fairly large data sets this takes an extremely long time (well into minutes).

I use Python 2.7


Solution

Your code doesn't work because regex.match only tries to match at the beginning of the string, so it returns None for a word whose punctuation appears only further in.
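A small sketch of the difference, using the same compiled pattern and a few made-up words (the words are illustrative only):

```python
import re
import string

regex = re.compile('[%s]' % re.escape(string.punctuation))

# match() only tries at position 0, so punctuation in the middle is missed.
print(regex.match('a.bc'))            # None
# search() scans the whole string and finds the first '.'.
print(regex.search('a.bc').group())   # .
# A word that *starts* with punctuation does match with match().
print(regex.match('"[word').group())  # "
```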

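Also, you did not do anything with the return value of regex.sub(). Strings are immutable, so sub doesn't work in place; it returns a new string, and you need to assign that result to something. Because i never changes, a word that starts with punctuation, such as "[, keeps matching and the while loop never exits.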
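A minimal sketch showing that the result of sub has to be rebound; the word '"[abc' is a hypothetical example:

```python
import re
import string

regex = re.compile('[%s]' % re.escape(string.punctuation))

word = '"[abc'
regex.sub('', word)          # returns a new string, which is thrown away
print(word)                  # "[abc  (unchanged)
word = regex.sub('', word)   # rebind the name to the cleaned string
print(word)                  # abc
```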

regex.search returns a match if the pattern is found anywhere in the string, so with it the loop works as expected:

import re
import string

words = ['a.bc,,', 'cdd,gf.f.d,fe']

regex = re.compile('[%s]' % re.escape(string.punctuation))
for i in words:
    while regex.search(i):
        i = regex.sub('', i)
    print i

Edit: As pointed out below by @senderle, the while loop isn't necessary and can be left out completely, because a single call to sub already replaces every occurrence of the pattern.
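Without the while loop, the whole list can be cleaned in one pass. This sketch assumes words is a plain list of strings (the third word is made up to show a run like "[ or ',); sub's count argument defaults to 0, meaning "replace all matches":

```python
import re
import string

# Compile once, outside the loop.
regex = re.compile('[%s]' % re.escape(string.punctuation))

words = ['a.bc,,', 'cdd,gf.f.d,fe', '"[quoted\',']

# One sub() call per word removes every punctuation character,
# including consecutive ones, so no while loop is needed.
cleaned = [regex.sub('', w) for w in words]
print(cleaned)  # ['abc', 'cddgffdfe', 'quoted']
```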

OTHER TIPS

This removes everything that is not an ASCII letter, digit, or space:

re.sub("[^a-zA-Z0-9 ]","",my_text)


>>> re.sub("[^a-zA-Z0-9 ]","","A [Black. Cat' On a Hot , tin roof!")
'A Black Cat On a Hot  tin roof'
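Two things worth noting about this whitelist approach, shown in Python 3 syntax: the removed " , " leaves a double space behind, and because the character class is ASCII-only, accented letters are stripped too. The \w-based variant at the end is one possible Unicode-aware alternative (it also keeps underscores and any whitespace):

```python
import re

text = "A [Black. Cat' On a Hot , tin roof!"
cleaned = re.sub("[^a-zA-Z0-9 ]", "", text)
print(repr(cleaned))  # 'A Black Cat On a Hot  tin roof'  (two spaces before "tin")

# Caveat: a-zA-Z is ASCII-only, so accented letters are removed as well.
print(re.sub("[^a-zA-Z0-9 ]", "", "café crème"))  # caf crme

# In Python 3, \w matches Unicode letters, digits and underscore; \s any whitespace.
print(re.sub(r"[^\w\s]", "", "café crème!"))      # café crème
```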

Here is a simple way:

>>> print str.translate("My&& Dog's {{{%!@#%!@#$L&&&ove Sal*mon", None,'~`!@#$%^&*()_+=-[]\|}{;:/><,.?\"\'')
My Dogs Love Salmon

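Used this way, str.translate deletes every character listed in its last argument (deletechars), which eliminates the punctuation. This form works on Python 2 byte strings. I usually use it for eliminating numbers from DNA sequence reads.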
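In Python 3, str.translate takes a single translation table, so the deletechars form above no longer works. A sketch of the Python 3 equivalent, using string.punctuation instead of a hand-typed list:

```python
import string

text = "My&& Dog's {{{%!@#%!@#$L&&&ove Sal*mon"

# Build a table that maps every punctuation character to None (delete),
# then apply it with str.translate.
table = str.maketrans('', '', string.punctuation)
print(text.translate(table))  # My Dogs Love Salmon
```

For removing digits instead, string.digits can be passed as the third argument to str.maketrans.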

Licensed under: CC-BY-SA with attribution