Question

I have some scraped data that varies slightly in format. To standardise it, I need to remove anything within parentheses, including the parentheses themselves, if they exist. I have attempted to use strip in various ways, but to no avail.

Some example data:

Text (te)
Text Text (tes)
Text-Text (te)
Text Text
Text-Text (tes)

And how I need it to appear after standardisation:

Text
Text Text
Text-Text
Text Text
Text-Text

Can anyone offer me a solution for this? Thanks SMNALLY


Solution

Assuming the parentheses do not nest, and that there is at most one pair per string, try this:

import re
myString = re.sub(r'\(.*\)', '', myString)

A more specific pattern might be:

myString = re.sub(r'\s*\(\w+\)\s*$', '', myString)

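The above pattern also deletes the whitespace that surrounds the parenthetical expression, and only matches at the end of the string. Note that \w+ matches only letters, digits and underscores, so a parenthetical containing spaces or punctuation is left untouched.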
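As a quick check, here is a minimal sketch that runs both patterns against the sample data from the question. The helper name `strip_parenthetical` and the variable names are made up for illustration; only `re.sub` and the two patterns come from the answer above.

```python
import re

# Hypothetical helper; wraps the more specific pattern from the answer.
def strip_parenthetical(text):
    # Remove a trailing "(word)" group and the whitespace around it.
    return re.sub(r'\s*\(\w+\)\s*$', '', text)

samples = ["Text (te)", "Text Text (tes)", "Text-Text (te)",
           "Text Text", "Text-Text (tes)"]

cleaned = [strip_parenthetical(s) for s in samples]
print(cleaned)
# ['Text', 'Text Text', 'Text-Text', 'Text Text', 'Text-Text']

# The simpler pattern removes the parentheses but keeps the space before them.
simple_result = re.sub(r'\(.*\)', '', "Text (te)")
print(repr(simple_result))
# 'Text '
```

If the simpler pattern is preferred, calling `.strip()` on the result would remove the leftover space.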

Other tips

from re import sub
x = sub(r"(?s)\(.*\)", "", x)

This will remove everything between the parentheses (including newlines) as well as the parentheses themselves. Because .* is greedy, the match runs from the first opening parenthesis to the last closing one.
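The effect of the `(?s)` flag and of greedy matching can be seen in a small sketch. The input strings are invented for illustration, and the lazy `.*?` variant is an alternative not mentioned in the answers, shown here only as one possible way to handle several pairs per string.

```python
import re

multiline = "Text (first\nsecond) more"

# Without (?s), "." does not match a newline, so nothing is removed.
no_flag = re.sub(r"\(.*\)", "", multiline)

# With (?s), "." also matches "\n" and the whole group is removed.
with_flag = re.sub(r"(?s)\(.*\)", "", multiline)
print(repr(with_flag))
# 'Text  more'

several = "a (b) c (d) e"

# Greedy: one match from the first "(" to the last ")".
greedy = re.sub(r"\(.*\)", "", several)
print(repr(greedy))
# 'a  e'

# Lazy (assumed alternative): each pair is matched separately.
lazy = re.sub(r"\(.*?\)", "", several)
print(repr(lazy))
# 'a  c  e'
```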

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow