Remove the text from the first cell of an HTML table using Python

Question 1

Using regex to parse HTML is a very bad practice (see @Lutz Horn's link in the comment).

Use an HTML parser instead. For example, here's how you can empty the first td cell and strip its attributes using BeautifulSoup:

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree.

from bs4 import BeautifulSoup


data = """
<table>
    <tr>
        <td WIDTH="49%">
            <p><a href="...1.htm"> cell to remove</a></p>
        </td>
        <td WIDTH="51%">
            some text
        </td>
    </tr>
</table>"""

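soup = BeautifulSoup(data, 'html.parser')
cell = soup.table.tr.td  # first <td> in the first row of the table
cell.string = ''         # replace everything inside the cell with an empty string
cell.attrs = {}          # drop the cell's attributes (e.g. WIDTH="49%")

print(soup.prettify(formatter='html'))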

prints:

<table>
 <tr>
  <td>
  </td>
  <td width="51%">
   some text
  </td>
 </tr>
</table>


Hope that helps.

Question 2

Using regex to parse HTML is a very bad practice. If you are actually trying to modify HTML, use an HTML parser.
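To make the risk concrete, here is a small sketch using hypothetical markup that is not from the question: a comment containing `</td>` inside the first cell is enough to make a non-greedy regex stop too early, leaving stray text and an unmatched closing tag behind. An HTML parser would treat the comment as a comment and not be fooled.

```python
import re

# Hypothetical markup: the first cell contains a comment that mentions </td>
html = '<tr><td>keep out <!-- old </td> marker --> text</td><td>second</td></tr>'

# Naive approach: blank the first cell by matching up to the first "/td>"
result = re.sub(r'<td.*?/td>', '<td> </td>', html, count=1, flags=re.DOTALL)

# The match ends inside the comment, so " marker --> text</td>" is left over
print(result)
```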

If the question is academic, or you are only trying to make the limited transformation you describe in the question, here is a regex program that will do it:

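#!/usr/bin/env python3
import re

# Read the whole file so the pattern can span multiple lines
with open('rec1.txt') as f:
    ret = f.read()

# Replace only the first <td>...</td> cell (count=1); DOTALL lets . match newlines
ret = re.sub(r'<td.*?/td>', '<td> </td>', ret, count=1, flags=re.DOTALL)

with open('rec2.txt', 'w') as final:
    final.write(ret)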
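If you want to try the substitution on a string before touching any files, here is a sketch that wraps it in a hypothetical helper, `blank_first_cell` (not part of the original program), and applies it to the sample table from the first answer.

```python
import re

def blank_first_cell(html: str) -> str:
    """Replace the first <td>...</td> cell with an empty one (regex approach)."""
    # count=1 limits the substitution to the first match;
    # re.DOTALL lets . match the newlines inside the cell.
    return re.sub(r'<td.*?/td>', '<td> </td>', html, count=1, flags=re.DOTALL)

data = """
<table>
    <tr>
        <td WIDTH="49%">
            <p><a href="...1.htm"> cell to remove</a></p>
        </td>
        <td WIDTH="51%">
            some text
        </td>
    </tr>
</table>"""

result = blank_first_cell(data)
print(result)  # the first cell becomes <td> </td>; the second is untouched
```

Note that `data` itself is unchanged afterwards: `re.sub()` builds and returns a new string.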

Notes:

The expression [/td] is a character class: it matches exactly one character, either /, t, or d. Note instead how I used .* to match an arbitrary string followed by /td.
re.sub() accepts an optional flags argument. re.DOTALL allows . to match newlines, so a cell that spans several lines can still be matched.
The ? makes .* non-greedy: it matches as few characters as possible and stops at the first /td>, so it only consumes one cell.
re.sub() returns the resulting string; it does not modify the original string in place (Python strings are immutable).
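The notes above can be checked in the interpreter. This sketch uses made-up sample strings to compare the character class [/td], a greedy .*, and the non-greedy .*?, and shows what happens without re.DOTALL.

```python
import re

row = '<td>a</td>\n<td>b</td>'

# [/td] is a character class: each match is ONE character (/, t or d)
chars = re.findall(r'[/td]', '</td>')  # ['/', 't', 'd']

# Greedy .* runs to the LAST "/td>", so both cells are swallowed
greedy = re.sub(r'<td.*/td>', '<td> </td>', row, count=1, flags=re.DOTALL)

# Non-greedy .*? stops at the FIRST "/td>", so only one cell is replaced
lazy = re.sub(r'<td.*?/td>', '<td> </td>', row, count=1, flags=re.DOTALL)

# Without DOTALL, . cannot match a newline, so a multi-line cell is not matched
multiline_cell = '<td>\nx\n</td>'
no_dotall = re.sub(r'<td.*?/td>', '<td> </td>', multiline_cell, count=1)

print(chars)
print(repr(greedy))     # '<td> </td>'
print(repr(lazy))       # '<td> </td>\n<td>b</td>'
print(repr(no_dotall))  # unchanged input
```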