Question
The findAll() method in BeautifulSoup does not return all elements in the XML. If you look at the code below and open the URL, you can see that there are 10 PubmedArticle nodes in the XML. However, findAll() only finds 6 of them: the output contains only 6 asterisks instead of 10. What am I doing wrong?
import urllib2
from bs4 import BeautifulSoup
URL = 'http://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&rettype=abstract&id=23858559,23858558,23858557,23858521,23858508,23858506,23858494,23858473,23858461,23858404'
data = urllib2.urlopen(URL).read()
soup = BeautifulSoup(data)
for x in soup.findAll('pubmedarticle'):
    print '*'
Solution 2
I solved this by adding the 'xml' argument. Make sure you have lxml installed.
soup = BeautifulSoup(xmlData, 'xml')
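For illustration, here is a minimal Python 3 sketch of the same fix on a small inline document. SAMPLE_XML and count_articles are made-up names, and the sample is only a stand-in for the real efetch response. One thing to watch for: with the 'xml' parser, tag names keep their original case, so the search has to use 'PubmedArticle' rather than the lowercase form.

```python
from bs4 import BeautifulSoup  # the 'xml' feature needs lxml installed

# Hypothetical stand-in for the efetch response; the real one is much larger.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID>1</PMID></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID>2</PMID></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID>3</PMID></MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""


def count_articles(xml_text):
    """Parse with the XML parser and count PubmedArticle elements."""
    soup = BeautifulSoup(xml_text, 'xml')
    # The XML parser is case-sensitive, so use the casing from the document.
    return len(soup.find_all('PubmedArticle'))


if __name__ == '__main__':
    print(count_articles(SAMPLE_XML))
```

With this parser, searching for the lowercase name 'pubmedarticle' finds nothing, because the names are no longer lowercased as they are by an HTML parser.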
OTHER TIPS
Edit: I've discovered that findAll searches relative to the current node, so you can choose the root node to search from via soup.
The elements in the provided XML are named "PubmedArticle", so try the following:
for x in soup.pubmedarticleset.findAll('pubmedarticle'):
    print '*'
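The scoped search above can be tried on a tiny document. This is only a sketch in Python 3, assuming the markup is parsed with Python's built-in 'html.parser', which lowercases tag names (which is why the lowercase names match). SAMPLE and articles_under_root are hypothetical names, not part of the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the efetch response.
SAMPLE = (
    "<PubmedArticleSet>"
    "<PubmedArticle><PMID>1</PMID></PubmedArticle>"
    "<PubmedArticle><PMID>2</PMID></PubmedArticle>"
    "</PubmedArticleSet>"
)


def articles_under_root(markup):
    """Return the pubmedarticle tags found below the pubmedarticleset node."""
    soup = BeautifulSoup(markup, 'html.parser')
    root = soup.pubmedarticleset  # first matching tag, or None if absent
    if root is None:
        return []
    # find_all searches only the descendants of the node it is called on.
    return root.find_all('pubmedarticle')


if __name__ == '__main__':
    for article in articles_under_root(SAMPLE):
        print(article.pmid.get_text())
```

Checking for None before searching avoids an AttributeError when the expected root element is missing from the response.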