When you call soup.tag.extract(), BeautifulSoup removes the first instance of tag from the soup and returns it. Observe the following:
from bs4 import BeautifulSoup
soup = BeautifulSoup('''
<frame src='foo'>Spam</frame>
<frame src='bar'>Eggs</frame>
''')
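print(soup)           # both frames are still in the soup
soup.frame.extract()  # removes the first <frame> from the soup and returns it
print(soup)           # only the second frame is left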
This gives the following output:
<frame src="foo">Spam</frame>
<frame src="bar">Eggs</frame>
<frame src="bar">Eggs</frame>
I'm guessing this isn't the behavior you want: the first try block removes the frame from the soup, so it is no longer available to the second try block. You probably want to keep the soup intact, in which case you shouldn't use .extract(). Replace your calls to soup.frame.extract() with plain references to frame (the variable in your for loop).
That is, change these lines:
t_iFrames_src.append(force_text(soup.frame.extract().get("src"), encoding='utf-8', strings_only=False, errors='strict'))
t_full_frame.append(force_text(soup.frame.extract(), encoding='utf-8', strings_only=False, errors='strict'))
to these lines:
t_iFrames_src.append(force_text(frame.get("src"), encoding='utf-8', strings_only=False, errors='strict'))
                                ^^^^^
t_full_frame.append(force_text(frame, encoding='utf-8', strings_only=False, errors='strict'))
                               ^^^^^
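For reference, here is a minimal, self-contained sketch of the same idea. The markup and the names sources and frames_as_text are made up for illustration, I'm assuming your loop iterates over soup.find_all("frame"), and str(frame) stands in for your force_text call so the snippet runs without Django.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; "html.parser" ships with Python.
soup = BeautifulSoup("<frame src='foo'><frame src='bar'>", "html.parser")

sources = []         # stands in for t_iFrames_src
frames_as_text = []  # stands in for t_full_frame

for frame in soup.find_all("frame"):
    # Read from the loop variable; nothing is removed from the soup.
    sources.append(frame.get("src"))
    # str() is used here in place of force_text.
    frames_as_text.append(str(frame))

# Both frames are still in the soup after the loop.
remaining = len(soup.find_all("frame"))
print(sources)
print(remaining)
```

Because nothing is extracted, every iteration sees its own frame and the soup is unchanged afterwards, so a second pass over it would find the same tags.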