Question

I have this example string: 'happy t00 go 129.129', and I want to keep only the letters and spaces. The most efficient approach I have come up with so far is:

import re
print(re.sub(r"\d", "", 'happy t00 go 129.129'.replace('.', '')))

but it only works for my example string. How can I remove all characters other than letters and spaces?

Solution

whitelist = set('abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ')
myStr = "happy t00 go 129.129$%^&*("
answer = ''.join(filter(whitelist.__contains__, myStr))

Output:

>>> answer
'happy t go '
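
The whitelist approach above can be wrapped in a small reusable function. This is only a sketch: the name `keep_allowed` and its `allowed` parameter are made up for illustration and are not part of the original answer.

```python
from string import ascii_letters, digits

def keep_allowed(text, allowed=frozenset(ascii_letters + " ")):
    """Return `text` with every character not in `allowed` removed."""
    # filter() calls allowed.__contains__(ch) for each character, which is
    # the same test as `ch in allowed`; set membership is O(1) on average.
    return "".join(filter(allowed.__contains__, text))

print(repr(keep_allowed("happy t00 go 129.129$%^&*(")))
# 'happy t go '

# Passing a different set changes what survives, e.g. keeping digits too
# (hypothetical usage, not part of the original answer):
letters_digits_space = frozenset(ascii_letters + digits + " ")
print(repr(keep_allowed("happy t00 go 129.129$%^&*(", letters_digits_space)))
# 'happy t00 go 129129'
```

Making the allowed set a parameter keeps the filtering logic in one place while letting the caller decide which characters count as acceptable.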

OTHER TIPS

Use a negated character class (the complement of the allowed set) with re.sub:

import re
re.sub(r'[^a-zA-Z ]+', '', 'happy t00 go 129.129')
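
As a rough sketch of how the regex version might be packaged, the pattern can be compiled once and reused. The names `NOT_LETTER_OR_SPACE` and `letters_and_spaces` are hypothetical. Note that `a-zA-Z` covers ASCII letters only, so accented letters are removed as well.

```python
import re

# Negated character class: matches any run of characters that are
# NOT an ASCII letter or a space.
NOT_LETTER_OR_SPACE = re.compile(r"[^a-zA-Z ]+")

def letters_and_spaces(text):
    """Delete every run of characters other than ASCII letters and spaces."""
    return NOT_LETTER_OR_SPACE.sub("", text)

print(repr(letters_and_spaces("happy t00 go 129.129")))
# 'happy t go '

# Non-ASCII letters fall outside a-zA-Z, so they are dropped too.
print(repr(letters_and_spaces("caf\u00e9 au lait")))
# 'caf au lait'
```

The trailing space in the first result is kept because spaces are allowed; call `.strip()` on the result if that is not wanted.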

A slight variation on inspectorG4dget's method: import ascii_letters from string and use a generator expression:

from string import ascii_letters

allowed = set(ascii_letters + ' ')
myStr = 'happy t00 go 129.129'
answer = ''.join(l for l in myStr if l in allowed)
answer
# >>> 'happy t go '

Performance comparison:

(I made myStr longer and pre-compiled the regex to make the comparison more interesting. %timeit is an IPython magic, so run this in IPython or Jupyter.)

import re
from string import ascii_letters
myStr = 'happy t00 go 129.129'*20
allowed = set(ascii_letters + ' ')

# Generator
%timeit answer = ''.join(l for l in myStr if l in allowed)

# filter/__contains__
%timeit answer = ''.join(filter(allowed.__contains__, myStr))

# Regex
pat = re.compile(r'[^a-zA-Z ]+')
%timeit answer = re.sub(pat, '', myStr)

Results, in the same order as above:

Generator:           53 µs ± 6.43 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
filter/__contains__: 43.3 µs ± 7.48 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Regex:               26 µs ± 509 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
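
For anyone who wants to repeat the comparison outside IPython, here is a sketch using the standard `timeit` module instead of `%timeit`. The variable names and the loop count of 2000 are arbitrary choices, and the absolute numbers will vary by machine and Python version, so no timings are quoted here.

```python
import re
import timeit
from string import ascii_letters

my_str = "happy t00 go 129.129" * 20
allowed = set(ascii_letters + " ")
pat = re.compile(r"[^a-zA-Z ]+")

# One zero-argument callable per approach being compared.
candidates = {
    "generator": lambda: "".join(c for c in my_str if c in allowed),
    "filter": lambda: "".join(filter(allowed.__contains__, my_str)),
    "regex": lambda: pat.sub("", my_str),
}

# Sanity check: all three approaches should produce the same string.
results = {name: fn() for name, fn in candidates.items()}
assert len(set(results.values())) == 1

loops = 2000  # arbitrary; raise it for more stable numbers
for name, fn in candidates.items():
    total_seconds = timeit.timeit(fn, number=loops)
    print(f"{name:>9}: {total_seconds / loops * 1e6:.1f} µs per call")
```

Checking that the outputs agree before timing guards against comparing approaches that do not actually do the same work.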

Licensed under: CC-BY-SA with attribution