Question

What is a short, simple way to clean up a user-entered string? Here is the code I currently rely on to strip unwanted characters. A shorter, smarter version would be welcome.

invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
for c in invalid: 
    if len(line)>0: line=line.replace(c,'')

PS: How would I put this for loop (with its nested if) on a single line?

Solution

The fastest way to do this is str.translate (the two-argument form shown here is Python 2 only):

>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> s = '@#$%^&*fdsfs#$%^&*FGHGJ'
>>> s.translate(None, ''.join(invalid))
'fdsfsFGHGJ'

Timing comparisons:

>>> s = '@#$%^&*fdsfs#$%^&*FGHGJ'*100

>>> %timeit re.sub('[#@$%^&*()-+!]', '', s)
1000 loops, best of 3: 766 µs per loop

>>> %timeit re.sub('[#@$%^&*()-+!]+', '', s)
1000 loops, best of 3: 215 µs per loop

>>> %timeit "".join(c for c in s if c not in invalid)
100 loops, best of 3: 1.29 ms per loop

>>> %timeit re.sub(invalid_re, '', s)
1000 loops, best of 3: 718 µs per loop

>>> %timeit s.translate(None, ''.join(invalid))         #Winner
10000 loops, best of 3: 17 µs per loop

On Python 3, str.translate takes a single table that maps code points to replacements, so map each unwanted character to None:

>>> trans_tab = {ord(x):None for x in invalid}
>>> s.translate(trans_tab)
'fdsfsFGHGJ'
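
As an alternative sketch for Python 3, the table can also be built with str.maketrans, whose third argument lists the characters to delete. The variable names here are only illustrative.

```python
invalid = "#@$%^&*()-+! "

# The third argument of str.maketrans lists the characters to delete.
table = str.maketrans("", "", invalid)

s = "@#$%^&*fdsfs#$%^&*FGHGJ"
cleaned = s.translate(table)
print(cleaned)  # fdsfsFGHGJ
```

The table can be built once and reused for every string that needs cleaning.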

Other tips

import re
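line = re.sub(r'[#@$%^&*()+! -]', '', line)

re is the regular expression module. Square brackets mean "match any one of the characters inside the brackets", so the call says "find every character in line that appears inside the brackets and replace it with nothing ('')". The hyphen goes last inside the brackets so that it is read as a literal character rather than as part of a range such as )-+, and a space is included to match the list in the question.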
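
One pitfall with bracket expressions is worth a small illustration: a hyphen between two characters defines a range. The sketch below (the sample text is made up) compares a class where the hyphen sits in the middle with one where it comes last.

```python
import re

text = "a-b*c"

# ')-+' in the middle of the brackets is read as the range ')' to '+',
# which covers ')', '*' and '+', so the hyphen itself is not matched.
range_result = re.sub(r"[()-+]", "", text)

# With the hyphen last, it is a literal character.
literal_result = re.sub(r"[()+-]", "", text)

print(range_result)    # a-bc
print(literal_result)  # ab*c
```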

You can do it like this:

from string import punctuation # !"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~

line = "".join(c for c in line if c not in punctuation)

For example:

'hello, I @m pleased to meet you! How *about (you) try something > new?'

becomes

'hello I m pleased to meet you How about you try something  new'

This is one case in which a regex actually is useful.

>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> import re
>>> invalid_re = '|'.join(map(re.escape, invalid))
>>> re.sub(invalid_re, '', 'foo * bar')
'foobar'

Here's a snippet that I use in my own code. It uses a regex to specify which characters are allowed, finds the runs of allowed characters, and joins them back together.

import re

def clean(string_to_clean, valid='ACDEFGHIKLMNPQRSTVWY'):
    """Remove unwanted characters from a string.

    Args:
        string_to_clean: (str) The string from which to remove
            unwanted characters.
        valid: (str) The characters that are valid and should be
            kept in the returned string. Matching is
            case-insensitive. Default: 'ACDEFGHIKLMNPQRSTVWY'.

    Returns:
        (str) The input with all invalid characters removed.
    """
    valid_string = r'([{}]+)'.format(valid)
    valid_regex = re.compile(valid_string, re.IGNORECASE)

    # Find every run of valid characters and join the runs
    # back into a single string.
    return ''.join(valid_regex.findall(string_to_clean))
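
A variation on the same whitelist idea, as a minimal sketch: keep_only is a hypothetical helper, not the function above. It removes everything outside the allowed set with a negated character class, and passes the allowed characters through re.escape so that characters such as ']' or '-' in the whitelist cannot break the pattern.

```python
import re

def keep_only(text, valid="ACDEFGHIKLMNPQRSTVWY"):
    # Hypothetical helper: re.escape() makes characters such as ']' or '-'
    # in `valid` safe to place inside the square brackets.
    pattern = re.compile("[^{}]".format(re.escape(valid)), re.IGNORECASE)
    # Delete every character that is not in the allowed set.
    return pattern.sub("", text)

cleaned = keep_only("ac-d e#f")
with_hyphen = keep_only("a-b c", valid="ab-")
print(cleaned)      # acdef
print(with_hyphen)  # a-b
```

This sketch assumes valid is non-empty; an empty whitelist would produce an invalid pattern.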

Use a simple generator expression inside str.join:

>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> x = 'foo * bar'
>>> "".join(i for i in x if i not in invalid)
'foobar'
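
If the same filter runs over many or long strings, one possible refinement is to store the unwanted characters in a set, so each membership test does not scan a list. The sketch below assumes a hypothetical helper named strip_invalid.

```python
invalid = set("#@$%^&*()-+! ")

def strip_invalid(text, invalid=invalid):
    # Hypothetical helper: keep only the characters that are not in the set.
    return "".join(ch for ch in text if ch not in invalid)

result = strip_invalid("foo * bar")
print(result)  # foobar
```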

Use a generator expression with string.punctuation, plus a space:

>>> import string
>>> x = 'foo * bar'
>>> "".join(i for i in x if i not in string.punctuation)
'foo  bar'
>>> "".join(i for i in x if i not in string.punctuation+" ")
'foobar'

Use str.translate (this two-argument form is Python 2 only):

>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> x = 'foo * bar'
>>> x.translate(None,"".join(invalid))
'foobar'

Use re.sub:

>>> import re
>>> invalid = ['#','@','$','$','%','^','&','*','(',')','-','+','!',' ']
>>> x = 'foo * bar'
>>> y = "[" + re.escape("".join(invalid)) + "]"
>>> re.sub(y,'',x)
'foobar'
>>> re.sub(y+'+','',x)
'foobar'

This works:

invalid = '#@$%^_ '
line = "#master_Of^Puppets#@$%Yeah"
line = "".join([l for l in line if l not in invalid])
# line is now 'masterOfPuppetsYeah'
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow