Question

I use a piece of code to read a website, scrape some information, pass it to Google, and print some directions.

I'm having an issue with some of the information: the site I use sometimes adds a # followed by three random digits, then a / and another three digits, e.g. #037/100.

How can I use Python to ignore this "#037/100" string?

I currently use

for i, part in enumerate(list(addr_p)):
        if '#' in part:
                del addr_p[i]
                break

to remove the part containing the # if one is found, but I'm not sure how to handle the random numbers.

Any ideas?

Solution

If you find yourself wanting to remove "a # followed by three digits, a forward slash, and three more digits" from a string s, you could do

import re
s = "this is a string #123/234 with other stuff"
# Raw string so the backslashes reach the regex engine unchanged
t = re.sub(r'#\d{3}\/\d{3}', '', s)
print(t)

Result:

'this is a string  with other stuff'

Explanation:

#    - literal character '#'
\d{3} - exactly three digits
\/    - forward slash (the escape is optional in Python, where '/' has no special meaning in a regex)
\d{3} - exactly three digits

And the whole thing that matches the above (if it's present) is replaced with '' - i.e. "removed".
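
Applied to the list from the question, the idea might be sketched as follows. This assumes addr_p is a list of address strings; the question does not show its contents, so the sample values are made up, and the helper name strip_tag is hypothetical. The sketch also collapses the double space that the removal leaves behind.

```python
import re

# '#', three digits, '/', three digits (e.g. "#037/100")
TAG = re.compile(r'#\d{3}/\d{3}')

def strip_tag(text):
    """Remove every '#ddd/ddd' tag and tidy leftover whitespace."""
    cleaned = TAG.sub('', text)
    # Collapse runs of whitespace left behind by the removal
    return ' '.join(cleaned.split())

# Hypothetical sample data; the real addr_p comes from the scraped page
addr_p = ['12 High Street #037/100', 'Springfield', '#123/234']

# Clean each part, then drop parts that became empty
addr_p = [p for p in (strip_tag(part) for part in addr_p) if p]
print(addr_p)
```

Building a new list this way avoids deleting from addr_p while looping over it, and it handles more than one tagged part.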

OTHER TIPS

import re

# re.sub returns a new string, so assign the result back
addr_p[i] = re.sub(r'#[0-9]+/[0-9]+$', '', addr_p[i])

I'm no wizard with regular expressions, but I'd imagine you could do something like this. Note that the trailing $ means the tag is only removed when it sits at the end of the string. You could even handle '@' in the regexp as well.

If the format is always the same, then you could check if the line starts with a #, then set the string to itself without the first 8 characters.

if part[0:1] == '#': part = part[8:]

If the first character is a #, this sets the string to itself from index 8 (the ninth character) to the end, which drops the eight-character tag.
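
A slightly more defensive sketch of the slicing idea is shown here: it only cuts when the part really begins with the full tag, so a string such as "#1 Main Road" is left alone. The function name drop_leading_tag and the sample strings are hypothetical.

```python
import re

# re.match only matches at the start of the string
LEADING_TAG = re.compile(r'#\d{3}/\d{3}')

def drop_leading_tag(part):
    """Slice off a leading '#ddd/ddd' tag; leave other strings untouched."""
    if LEADING_TAG.match(part):
        # The tag is always 8 characters: '#' + 3 digits + '/' + 3 digits
        return part[8:].lstrip()
    return part

print(drop_leading_tag('#037/100 12 High Street'))  # 12 High Street
print(drop_leading_tag('#1 Main Road'))             # #1 Main Road
```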

I'd double your problems and match against a regular expression for this.

import re

regex = re.compile(r'([\w\s]+)#\d+\/\d+([\w\s]+)')
m = regex.match('This is a string with a #123/987 in it')
if m: 
    s = m.group(1) + m.group(2)
    print(s)

A more concise way:

import re
s = "this is a string #123/234 with other stuff"
t = re.sub(r'#\S+', '', s)
print(t)
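
Keep in mind that '#\S+' removes everything from a '#' up to the next whitespace, not just the three-digit form. The sketch below compares it with the stricter pattern on made-up inputs, to help decide which one suits the scraped data.

```python
import re

broad = re.compile(r'#\S+')           # '#' plus everything up to the next whitespace
strict = re.compile(r'#\d{3}/\d{3}')  # only '#ddd/ddd'

# Hypothetical sample strings
samples = ['Flat 2 #037/100 High Street', 'Unit #4B Market Road']

for s in samples:
    # repr() makes leftover double spaces visible
    print(repr(broad.sub('', s)), repr(strict.sub('', s)))
```

Both patterns strip the tag from the first string, but only the broad one also removes "#4B" from the second.
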
Licensed under: CC-BY-SA with attribution