Question

The following is an example of the HTML code I want to parse:

<html>
<body>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> Example BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
</body>
</html>

I am using Beautiful Soup to parse the HTML by selecting the style8 class as follows (where html holds the body of my HTTP response):

html = result.read()
soup = BeautifulSoup(html)

content = soup.select('.style8')

In this example, content is a list of 4 Tags. I want to check the text of each item in the list to see whether it contains Example, and if so append Example to a variable. If the loop gets through the entire list without finding Example, it should append Not Present to the variable instead.

I have got the following so far:

foo = []

for i, tag in enumerate(content):
    if content[i].text == 'Example':
        foo.append('Example')
        break
    else:
        continue

This appends Example to foo if it occurs, but it does not append Not Present when Example does not occur anywhere in the list.

Any method of doing this is appreciated, as is a better way of searching all the results to check whether a string is present.

Solution 2

If you just want to check whether it was found or not, you could use a simple boolean flag as follows:

foo = []
found = False
for tag in content:
    # Use `in`, not `==`: the cell text is ' Example BLAB BLAB BLAB ', not just 'Example'.
    if 'Example' in tag.text:
        found = True
        foo.append('Example')
        break
if not found:
    # Reached only when no tag contained 'Example'.
    foo.append('Not Present')

If I understand what you want, this is a simple approach, though alecxe's solution looks amazing.
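
Python's for loop also accepts an else clause that runs only when the loop finishes without hitting break, which can replace the flag. A minimal sketch, assuming the tag texts have already been pulled into a list of strings (for example with [tag.text for tag in content]); the function name label_for and the sample strings are made up for illustration:

```python
def label_for(texts, needle='Example', missing='Not Present'):
    """Return [needle] if any text contains needle, else [missing]."""
    foo = []
    for text in texts:
        if needle in text:
            foo.append(needle)
            break
    else:
        # Runs only if the loop ended without `break`, i.e. no match.
        foo.append(missing)
    return foo


# Hypothetical sample strings shaped like the question's cells.
cells = [' Example BLAB BLAB BLAB ', ' BLAB BLAB BLAB ']
print(label_for(cells))       # ['Example']
print(label_for(cells[1:]))   # ['Not Present']
```

Because the else belongs to the for and not to the if, an empty list also ends up with the 'Not Present' entry.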

OTHER TIPS

You can use find_all() to find all td elements with class='style8' and use a list comprehension to construct the foo list, with one entry per element:

from bs4 import BeautifulSoup


html = """<html>
<body>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> Example BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
<td style="PADDING-LEFT:  5px"bgcolor="ffffff" class="style8"> BLAB BLAB BLAB </td>
</body>
</html>"""

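# Name the parser explicitly; bs4 warns when it has to guess one.
soup = BeautifulSoup(html, 'html.parser')

# One entry per matching td: 'Example' if its text contains the word.
foo = ["Example" if "Example" in node.text else "Not Present"
       for node in soup.find_all('td', {'class': 'style8'})]
print(foo)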

prints:

['Example', 'Not Present', 'Not Present', 'Not Present']
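
The list above has one entry per td. If a single result for the whole page is wanted, as the question describes, any() can do the search in one expression. A rough sketch, assuming the texts are already extracted as plain strings (e.g. node.text for each node returned by find_all()); presence is a hypothetical helper name, not part of Beautiful Soup:

```python
def presence(texts, needle='Example'):
    """One-element list: needle if some text contains it, else 'Not Present'."""
    # any() stops at the first match, like the break in an explicit loop.
    found = any(needle in text for text in texts)
    return [needle if found else 'Not Present']


texts = [' Example BLAB BLAB BLAB ', ' BLAB BLAB BLAB ']  # sample strings
print(presence(texts))        # ['Example']
print(presence(texts[1:]))    # ['Not Present']
```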
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow