Question

As it says all. Is there anyway to search entire DOM for a specific text, for instance CAPTCHA word?

Was it helpful?

Solution

You can use find and specify the text argument:

With text you can search for strings instead of tags. As with name and the keyword arguments, you can pass in a string, a regular expression, a list, a function, or the value True.

>>> from bs4 import BeautifulSoup
>>> data = """
... <div>test1</div>
... <div class="myclass1">test2</div>
... <div class="myclass2">CAPTCHA</div>
... <div class="myclass3">test3</div>"""
>>> soup = BeautifulSoup(data)
>>> soup.find(text='CAPTCHA').parent
<div class="myclass2">CAPTCHA</div>

If CAPTCHA is just a part of a text, you can pass a lambda function into text and check if CAPTCHA is inside the tag text:

>>> data = """
... <div>test1</div>
... <div class="myclass1">test2</div>
... <div class="myclass2">Here CAPTCHA is a part of a sentence</div>
... <div class="myclass3">test3</div>"""
>>> soup = BeautifulSoup(data)
>>> soup.find(text=lambda x: 'CAPTCHA' in x).parent
<div class="myclass2">Here CAPTCHA is a part of a sentence</div>

Or, the same can be achieved if you pass a regular expression into text:

>>> import re
>>> soup.find(text=re.compile('CAPTCHA')).parent
<div class="myclass2">Here CAPTCHA is a part of a sentence</div>
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top