import string

def cleanup(s, remove=('\n', '\t')):
    newString = ''
    for c in s:
        # Remove the special characters defined above.
        if c in remove:
            continue
        # Then remove anything that is not printable (for instance \xe2).
        if c not in string.printable:
            continue
        # Finally, collapse consecutive duplicates of certain characters.
        if newString and c == newString[-1] and c in ('\n', ' ', ',', '.'):
            continue
        newString += c
    return newString
You could add something like this to clean up your string.
The net result is:
>>> s = 'UAE \xe2\x80\xaa>\xe2\x80\xaa\n \n Dubai \xe2\x80\xaa>\xe2\x80\xaa\n \n Business Bay \xe2\x80\xaa>\xe2\x80\xaa\n \n Executive Towers \n \n\n\n \n\n\n\t \n\t \n\t \n\t\n\n\n \n ;\n \n \n \n 1.4 km from Burj Khalifa Tower'
>>> cleanup(s)
'UAE > Dubai > Business Bay > Executive Towers ; 1.4 km from Burj Khalifa Tower'
Here's a good SO reference to the string library.
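As a quick illustration of why the string.printable check drops bytes like \xe2, here is a minimal sketch (assuming Python 3; the variable names are only for illustration). string.printable contains ASCII digits, letters, punctuation and whitespace only, so any non-ASCII character fails the membership test.

```python
import string

# string.printable is ASCII only: digits, letters, punctuation, whitespace.
printable_count = len(string.printable)
print(printable_count)                 # 100

# Ordinary characters pass the check.
keeps_letter = 'a' in string.printable
print(keeps_letter)                    # True

# Stray bytes such as \xe2 fail it, so cleanup() skips them.
keeps_stray = '\xe2' in string.printable
print(keeps_stray)                     # False

# Caveat: legitimate accented letters fail it as well.
keeps_accent = '\u00e9' in string.printable
print(keeps_accent)                    # False
```

Keep in mind that this also strips legitimate non-ASCII text (accented letters, for example), which may or may not be what you want.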
Going back to the question, I see that you don't want the first two blocks (the parts separated by >) to be present. In that case, simply do:
area_result = cleanup(area_result).split('>')[3].replace(';', '>')
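To make the one-liner easier to follow, here is a step-by-step sketch of what it does. It assumes a shortened stand-in for the scraped text with the same shape as the sample above, and it repeats a compact version of cleanup() so that it runs on its own; the intermediate names (cleaned, parts, final) are only for illustration.

```python
import string

def cleanup(s, remove=('\n', '\t')):
    # Compact version of the cleanup() function, repeated for self-containment.
    out = ''
    for c in s:
        if c in remove or c not in string.printable:
            continue
        if out and c == out[-1] and c in ('\n', ' ', ',', '.'):
            continue
        out += c
    return out

# Assumption: a shortened stand-in for the scraped value.
area_result = ('UAE \xe2\x80\xaa>\xe2\x80\xaa\n \n Dubai \xe2\x80\xaa>\xe2\x80\xaa\n \n '
               'Business Bay \xe2\x80\xaa>\xe2\x80\xaa\n \n Executive Towers \n\t \n '
               ';\n \n 1.4 km from Burj Khalifa Tower')

cleaned = cleanup(area_result)          # junk removed, repeated spaces collapsed
parts = cleaned.split('>')              # one list item per block
final = parts[3].replace(';', '>')      # take the fourth block, turn ';' into '>'

print(parts)
print(repr(final))
```

Note that index 3 keeps only the last block (with a leading space, which .strip() would remove). If you want to keep everything after the first two blocks instead, something like '>'.join(parts[2:]) would be the variant to try.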