The re.sub function accepts a function as the replacement argument. It is called with each match object and returns the text that should be substituted for that match. So you could do this:
import re

def make_regex(countries):
    # Escape each name so that characters such as '-' are matched literally.
    escaped = (re.escape(country) for country in countries)
    states = '|'.join(escaped)
    return re.compile(r'\s+(of(\sthe)?\s)?(?P<state>{})'.format(states))

def remove_name(match):
    name = match.group()
    # Leave "of <country>" and "of the <country>" untouched.
    if name.lstrip().startswith('of'):
        return name
    else:
        # Otherwise drop the country name together with the whitespace before it.
        return name.replace(match.group('state'), '').strip()

regex = make_regex(['China', 'Italy', 'America'])
regex.sub(remove_name, 'Embassy of China, International Italy').strip()
# result: 'Embassy of China, International'
The result might contain some spurious whitespace, which is why the call above ends with strip(). You can fix this by modifying the regex to:
\s*(of(\sthe)?\s)?(?P<state>({}))
so that it catches the spaces before "of" or before the country name and avoids the bad spacing in the output.
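To make the effect of the leading-whitespace quantifier easier to see, here is a minimal sketch that builds both patterns and runs them on the same inputs. The helper build_regex and the names strict and loose are hypothetical, introduced only for this comparison. One difference that can be observed is that the \s* form also matches a country name that has no whitespace in front of it, for example at the very start of the string.

```python
import re

def remove_name(match):
    name = match.group()
    if name.lstrip().startswith('of'):
        return name
    return name.replace(match.group('state'), '').strip()

def build_regex(countries, leading_ws):
    # Hypothetical helper: same pattern as in the answer, with the
    # leading-whitespace part passed in so both forms can be compared.
    states = '|'.join(re.escape(c) for c in countries)
    return re.compile(leading_ws + r'(of(\sthe)?\s)?(?P<state>({}))'.format(states))

names = ['China', 'Italy', 'America']
strict = build_regex(names, r'\s+')  # whitespace required before the match
loose = build_regex(names, r'\s*')   # whitespace optional before the match

sample = 'Embassy of China, International Italy'
print(strict.sub(remove_name, sample))  # Embassy of China, International
print(loose.sub(remove_name, sample))   # Embassy of China, International

leading = 'Italy, International China'
print(strict.sub(remove_name, leading))  # Italy, International
print(loose.sub(remove_name, leading))   # , International
```

Which behaviour is preferable depends on the input. If a bare country name at the start of the text should be kept, the \s+ form is the safer choice.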
Note that this solution can handle a whole text, not just text of the form "Something of Country" and "Something Country". For example:
In [38]: regex = make_regex(['China'])
...: text = '''This is more complex than just "Embassy of China" and "International China"'''
In [39]: regex.sub(remove_name, text)
Out[39]: 'This is more complex than just "Embassy of China" and "International"'
Another usage example:
In [33]: countries = [
...: 'China', 'India', 'Denmark', 'New York', 'Guatemala', 'Sudan',
...: 'France', 'Italy', 'Australia', 'New Zealand', 'Brazil',
...: 'Canada', 'Japan', 'Vietnam', 'Middle-Earth', 'Russia',
...: 'Spain', 'Portugal', 'Argentina', 'San Marino'
...: ]
In [34]: template = 'Embassy of {0}, International {0}, Language of {0} is {0}, Government of {0}, {0} capital, Something {0} and something of the {0}.'
In [35]: text = 100 * '\n'.join(template.format(c) for c in countries)
In [36]: regex = make_regex(countries)
...: result = regex.sub(remove_name, text)
In [37]: result[:150]
Out[37]: 'Embassy of China, International, Language of China is, Government of China, capital, Something and something of the China.\nEmbassy of India, Internati'
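As a possible variation on the replacement callback, the decision could be based on whether the optional "of" group took part in the match, rather than on inspecting the matched text. The sketch below assumes the \s+ pattern from the start of this answer. The name remove_name_by_group is hypothetical. With that pattern, a match without the "of" prefix consists only of whitespace plus the country name, so returning an empty string should give the same result as remove_name.

```python
import re

def make_regex(countries):
    states = '|'.join(re.escape(c) for c in countries)
    return re.compile(r'\s+(of(\sthe)?\s)?(?P<state>{})'.format(states))

def remove_name_by_group(match):
    # Hypothetical variant: group(1) is None when the optional
    # "of " / "of the " prefix did not participate in the match.
    if match.group(1) is not None:
        return match.group()  # keep "of <country>" as it is
    return ''                 # drop the whitespace and the country name

regex = make_regex(['China', 'Italy', 'America'])
result = regex.sub(remove_name_by_group, 'Embassy of China, International Italy')
print(result)  # Embassy of China, International
```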