Question

I am trying to scrape the following excerpt of an XML document. The second form-item is the one I want:

<div class="form-item">
<a href="http://www.avaopera.org" target="_blank" rel="" class="">http://www.avaopera.org</a>
</div>
<div class="form-item">
<script type="text/javascript">
document.write('*[block of text]*')
</script>
<a href="mailto:ademarco@avaopera.org">ademarco@avaopera.org</a>
</div>

I used the following XPath query with the contains() function because there are multiple form-item elements: //div[@class='form-item' and contains(.,'@')]/a/text()

This query does not work. I tried removing /a/text(), which returns the text inside the <script> but not the text of the <a> tag.

What am I doing wrong?


Solution

You're targeting the text within the <div> instead of the text within the <a>, if I understand your goal correctly.

Try using //div[@class='form-item' and contains(a/text(),'@')]/a/text() instead. The predicate then tests the text of the child <a> element rather than the string value of the whole <div>, which also includes the <script> content.
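
For anyone who wants to check the selection logic without a full XPath engine, here is a minimal sketch using Python's standard-library ElementTree. It is an illustration under a few assumptions, not the asker's actual setup: the excerpt is wrapped in a hypothetical <root> element so it parses as one document, and find_emails is a made-up helper name. ElementTree supports only a subset of XPath and has no contains(), so the '@' test is applied in Python to each <a> child. That matches the query above as long as each form-item <div> holds a single <a>.

```python
import xml.etree.ElementTree as ET

# The excerpt from the question, wrapped in a hypothetical <root> element
# so that it parses as a single well-formed XML document.
SNIPPET = """
<root>
  <div class="form-item">
    <a href="http://www.avaopera.org" target="_blank" rel="" class="">http://www.avaopera.org</a>
  </div>
  <div class="form-item">
    <script type="text/javascript">
      document.write('*[block of text]*')
    </script>
    <a href="mailto:ademarco@avaopera.org">ademarco@avaopera.org</a>
  </div>
</root>
"""


def find_emails(xml_text):
    """Return the text of each <a> directly inside a form-item <div> whose text contains '@'."""
    root = ET.fromstring(xml_text.strip())
    # ElementTree understands this path and the attribute predicate...
    links = root.findall(".//div[@class='form-item']/a")
    # ...but not contains(), so the '@' check on the <a> text is done here.
    return [a.text for a in links if a.text and "@" in a.text]


emails = find_emails(SNIPPET)
print(emails)
```

The point of the sketch is that the test looks at the <a> text only, so the <script> content inside the same <div> plays no part in the match.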

Licensed under: CC-BY-SA with attribution