Using regex to parse HTML is a very bad practice (see @Lutz Horn's link in the comment).
Use an HTML parser instead. For example, here's how you can empty the first td
tag and remove its attributes using BeautifulSoup:
Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
from bs4 import BeautifulSoup
data = """
<table>
<tr>
<td WIDTH="49%">
<p><a href="...1.htm"> cell to remove</a></p>
</td>
<td WIDTH="51%">
some text
</td>
</tr>
</table>"""
soup = BeautifulSoup(data, 'html.parser')
cell = soup.table.tr.td
cell.string = ''
cell.attrs = {}
print(soup.prettify(formatter='html'))
prints:
<table>
<tr>
<td>
</td>
<td width="51%">
some text
</td>
</tr>
</table>
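If the table might be missing, or you want the result back as a string rather than printed, something along these lines may be a bit more robust. This is only a sketch: the helper name blank_first_cell is made up for illustration, and it assumes bs4 is installed and that "the first td in the document" is the cell you want. It uses find() so a missing cell doesn't raise, and clear() to drop everything nested inside the cell.

```python
from bs4 import BeautifulSoup


def blank_first_cell(html):
    """Empty the first <td> in `html` and strip its attributes.

    Hypothetical helper for illustration; returns the document as a string.
    """
    soup = BeautifulSoup(html, 'html.parser')
    cell = soup.find('td')   # None if the document has no <td>
    if cell is not None:
        cell.clear()         # remove all child nodes (text and nested tags)
        cell.attrs = {}      # drop WIDTH and any other attributes
    return str(soup)


data = (
    '<table><tr>'
    '<td WIDTH="49%"><p><a href="...1.htm"> cell to remove</a></p></td>'
    '<td WIDTH="51%">some text</td>'
    '</tr></table>'
)

result = blank_first_cell(data)
print(result)
```

The second cell is left alone, and input without any td comes back unchanged instead of failing with an AttributeError.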
Hope that helps.