Question

I have some code that accesses articles on PubMed and parses some information from each XML file. The program works fine on my computer, but it takes a long time to finish. So when I run it on a Unix machine dedicated to this kind of work, every request I make is blocked. There is a limit on how many requests can be made per minute before the remote machine considers the traffic malicious, but that is not the problem, as all of my requests are blocked. I checked, and this happens only with requests to the PubMed site.

Thanks in advance

EDIT: I use jsoup for the connection. Running wget through ProcessBuilder from my program works without getting blocked, but then efficiency becomes an issue, because the output from wget can only be read line by line with while (br.readLine() != null), and this adds a lot of run time.
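For reference, the wget-via-ProcessBuilder approach described above can be sketched as follows. This is a minimal, generic sketch (the class name and the runAndCapture helper are invented for illustration, and the wget call is shown only in a comment); it demonstrates reading a subprocess's output line by line with a BufferedReader, which is the loop the edit refers to:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

public class ExternalFetch {
    // Hypothetical helper: runs an external command and collects its
    // standard output line by line, as described in the question.
    static String runAndCapture(String... command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr so the pipe never fills up and blocks
        Process p = pb.start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = br.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The question's use case would look something like:
        // String xml = runAndCapture("wget", "-qO-", "http://www.ncbi.nlm.nih.gov/...");
        // A harmless stand-in command for demonstration:
        System.out.print(runAndCapture("echo", "hello"));
    }
}
```

Note that redirecting stderr into stdout (or draining it on a separate thread) matters here: if either stream's buffer fills, the child process stalls, which can look like extra run time.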


Solution

Instead of scraping the PubMed web pages, you can try connecting through another API dedicated to retrieving PubMed data, such as the Europe PMC RESTful Web Service: http://europepmc.org/RestfulWebService. It lets you get all the data you need in XML format, and I believe there is no limit on the number of queries.

For instance, if you want to get all the information about the article with pubmed_id=9481671, you request this URL: http://www.ebi.ac.uk/europepmc/webservices/rest/search/resulttype=core&query=ext_id:9481671.
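A minimal sketch of fetching that response from plain Java, without jsoup, might look like this. The class name and the searchUrl helper are invented for illustration; the URL pattern is the one shown above, and the code simply streams the raw XML response to standard output:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class EuropePmcFetch {
    // Builds the Europe PMC REST search URL for a given PubMed id,
    // following the path-style pattern shown in the answer.
    static String searchUrl(String pubmedId) {
        return "http://www.ebi.ac.uk/europepmc/webservices/rest/search/"
             + "resulttype=core&query=ext_id:" + pubmedId;
    }

    public static void main(String[] args) throws Exception {
        URL url = new URL(searchUrl("9481671"));
        try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()))) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line); // raw XML, ready for your existing parser
            }
        }
    }
}
```

From there you can hand the XML to whatever parser you already use, instead of scraping the PubMed HTML pages.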

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow