Question

I have some scraped data that varies slightly in format. In order to standardise it, I need to remove anything within parentheses, including the parentheses themselves, if they exist. I have attempted to use strip in various ways, but to no avail.

Some example data:

Text (te)
Text Text (tes)
Text-Text (te)
Text Text
Text-Text (tes)

And how I need it to appear after standardisation:

Text
Text Text
Text-Text
Text Text
Text-Text

Can anyone offer me a solution for this? Thanks SMNALLY

Solution 2

Assuming the parentheses do not nest, and that there is at most one pair per string, try this:

import re
myString = re.sub(r'\(.*\)', '', myString)
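To see why the one-pair assumption matters, here is a small sketch. The input strings are made up for illustration and are not from the question's data. The greedy .* runs from the first opening parenthesis to the last closing one, so any text between two pairs is removed as well. A negated character class is one possible way to handle each pair separately. Note that both variants leave the whitespace around the parentheses in place.

```python
import re

# Hypothetical inputs, chosen only to illustrate the behaviour
single = "Text (te)"
double = "Text (te) more (tes)"

# Greedy pattern from the answer: fine for a single pair
greedy_single = re.sub(r'\(.*\)', '', single)

# With two pairs, the greedy match spans from the first '(' to the last ')'
greedy_double = re.sub(r'\(.*\)', '', double)

# [^)]* cannot cross a ')', so each pair is matched and removed on its own
per_pair = re.sub(r'\([^)]*\)', '', double)

print(repr(greedy_single))  # trailing space is left behind
print(repr(greedy_double))  # " more " has been swallowed
print(repr(per_pair))       # " more " survives, spaces included
```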

A more specific pattern might be:

myString = re.sub(r'\s*\(\w+\)\s*$', '', myString)

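The above pattern also deletes the whitespace that surrounds the parenthetical expression, and it only matches at the end of the string. Note that \w+ matches only letters, digits and underscores, so parentheses containing spaces or punctuation are left untouched.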
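As a quick check, the more specific pattern can be run against the example data from the question. This is only a sketch; the names samples, pattern and cleaned are introduced here for illustration.

```python
import re

# The example data from the question
samples = [
    "Text (te)",
    "Text Text (tes)",
    "Text-Text (te)",
    "Text Text",
    "Text-Text (tes)",
]

# Compile once, since the same pattern is applied to every string
pattern = re.compile(r'\s*\(\w+\)\s*$')

# Strings without a trailing parenthetical are returned unchanged
cleaned = [pattern.sub('', s) for s in samples]

for before, after in zip(samples, cleaned):
    print(repr(before), '->', repr(after))
```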

OTHER TIPS

from re import sub
x = sub(r"(?s)\(.*\)", "", x)

This will remove everything between the parentheses (including newlines) as well as the parentheses themselves.
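The effect of the (?s) flag can be seen with a parenthetical that spans a line break. The input below is a made-up example. Passing flags=re.DOTALL to re.sub should be an equivalent way to write the same thing.

```python
import re

# Hypothetical input where the parenthetical contains a newline
text = "Text (te\nst) end"

# Without the flag, '.' does not match '\n', so nothing is removed
without_flag = re.sub(r"\(.*\)", "", text)

# With (?s), '.' also matches '\n' and the whole parenthetical goes
with_flag = re.sub(r"(?s)\(.*\)", "", text)

# Same behaviour, using the flags argument instead of the inline flag
with_dotall = re.sub(r"\(.*\)", "", text, flags=re.DOTALL)

print(repr(without_flag))
print(repr(with_flag))
```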

Licensed under: CC-BY-SA with attribution