This appears to be a bug (or a feature) in the way BeautifulSoup4 handles unknown HTML entity references. As Ignacio says in the comment above, it would probably be better to pre-process the input and replace the bare '&' symbols with the HTML entity '&amp;'.
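For the pre-processing route, here is a minimal sketch. It assumes that any '&' not followed by something shaped like an entity reference (a name, a decimal number, or a hex number, ending in ';') is a bare ampersand that should become '&amp;'. The function name escape_bare_ampersands and the regex are my own and not part of BeautifulSoup; the check is purely syntactic, so it does not verify that a named entity actually exists.

```python
import re

# '&' that does NOT start something shaped like an entity reference:
# &name;  &#123;  &#x1F;
# Assumption: anything else is a literal ampersand that should be escaped.
_BARE_AMP = re.compile(
    r'&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)'
)


def escape_bare_ampersands(markup):
    """Return markup with bare '&' characters replaced by '&amp;'."""
    return _BARE_AMP.sub('&amp;', markup)


if __name__ == '__main__':
    print(escape_bare_ampersands('AT&T announces new plans'))
    print(escape_bare_ampersands('AT&TOP announces new plans'))
```

You would run the markup through this function before handing it to bs4.BeautifulSoup. Existing references such as '&amp;' or '&#38;' are left untouched, so the function is safe to apply to input that is already partly escaped.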
But if you don't want to do that for some reason, the only fix I could find was to "monkey-patch" the code. This script worked for me (Python 2.7.3 on Mac OS X):
import bs4

def my_handle_entityref(self, name):
    character = bs4.dammit.EntitySubstitution.HTML_ENTITY_TO_CHARACTER.get(name)
    if character is not None:
        data = character
    else:
        # the original code mishandles unknown entities (the following commented-out line)
        # data = "&%s;" % name
        data = "&%s" % name
    self.handle_data(data)

bs4.builder._htmlparser.BeautifulSoupHTMLParser.handle_entityref = my_handle_entityref

soup = bs4.BeautifulSoup('AT&T announces new plans')
print soup.text
soup = bs4.BeautifulSoup('AT&TOP announces new plans')
print soup.text

It produces the output:
AT&T announces new plans
AT&TOP announces new plans
You can see the method with the problem here:
http://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/head:/bs4/builder/_htmlparser.py#L81
And the line with the problem here:
http://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/view/head:/bs4/builder/_htmlparser.py#L86