Question

I'm trying to build an array of IDs from the XML output generated by PubMed's Eutils.

The full code is on GitHub; the relevant subroutine is below.

What's the best way to go about this?

use LWP::Simple;   # provides get()

getUID($query);

sub getUID {
  my ($query) = @_;   # search term passed in by the caller

  # First, build the Eutils query
  my $utils = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils'; # Base URL for searches
  my $db = 'pubmed'; # Default to PubMed database; this may be changed.
  my $retmax = 10; # Get 10 results from Eutils

  my $esearch = $utils . '/esearch.fcgi?db=' . $db . '&retmax=' . $retmax . '&term=';

  my $esearch_result = get( $esearch . $query ); # Downloads the XML

  # Second, extract the UIDs
  $esearch_result =~ m(<Id>*</Id>);      

  print $esearch_result; # This should return a list of IDs (numbers), but doesn't.

}

Here is what the PubMed XML result looks like:

<?xml version="1.0" ?>
<!DOCTYPE eSearchResult PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd">
<eSearchResult><Count>2768671</Count><RetMax>10</RetMax><RetStart>0</RetStart><IdList>
<Id>23682407</Id>
<Id>23682406</Id>
<Id>23682388</Id>
<Id>23682359</Id>
<Id>23682336</Id>
<Id>23682331</Id>
<Id>23682325</Id>
<Id>23682320</Id>
<Id>23682315</Id>
<Id>23682311</Id>
</IdList><TranslationSet><Translation>     <From>cancer</From>     <To>"neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All Fields]</To>    </Translation></TranslationSet><TranslationStack>   <TermSet>    <Term>"neoplasms"[MeSH Terms]</Term>    <Field>MeSH Terms</Field>    <Count>2430901</Count>    <Explode>Y</Explode>   </TermSet>   <TermSet>    <Term>"neoplasms"[All Fields]</Term>    <Field>All Fields</Field>    <Count>1920766</Count>    <Explode>Y</Explode>   </TermSet>   <OP>OR</OP>   <TermSet>    <Term>"cancer"[All Fields]</Term>    <Field>All Fields</Field>    <Count>1192293</Count>    <Explode>Y</Explode>   </TermSet>   <OP>OR</OP>   <OP>GROUP</OP>  </TranslationStack><QueryTranslation>"neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All Fields]</QueryTranslation></eSearchResult>

Solution

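In your pattern, <Id>* means "<Id" followed by zero or more ">" characters, so it never matches the sample; and a match in void context doesn't change $esearch_result anyway, which is why print shows the whole XML. If you want the match to return the matched text, add capturing parentheses around the part you want. To get every match rather than just the first, add the g modifier and assign the result to an array, which evaluates the match in list context: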

 my @matches = $esearch_result =~ m(<Id>(.*)</Id>)g;
 print "$_\n" for @matches;
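A minimal sketch of the whole subroutine with this fix folded in, assuming you want the IDs returned rather than printed. It swaps in HTTP::Tiny (a core module since Perl 5.14) for LWP::Simple, uses the HTTPS form of the E-utilities base URL, and lets www_form_urlencode escape the search term. The names extract_ids and get_uids are hypothetical. The capture is (\d+), assuming PubMed UIDs are always plain digits; unlike a greedy (.*), it cannot run past a closing tag if the whole response ever arrives on one line. Only the offline part at the bottom runs, against a trimmed copy of the sample response.

```perl
use strict;
use warnings;
use HTTP::Tiny;   # core since Perl 5.14; swapped in here for LWP::Simple

# Pull every <Id> value out of an ESearch XML response.
# Assumption: UIDs are digits only, so \d+ cannot swallow tags the way a
# greedy .* could when several <Id> elements share one line.
sub extract_ids {
    my ($xml) = @_;
    return $xml =~ m{<Id>(\d+)</Id>}g;   # list context + /g: one element per match
}

# Hypothetical rewrite of the question's getUID: builds the ESearch URL,
# downloads the XML and returns the IDs instead of printing the raw XML.
sub get_uids {
    my ($query, %opt) = @_;
    my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';
    my $http = HTTP::Tiny->new;
    my $params = $http->www_form_urlencode({
        db     => $opt{db}     // 'pubmed',   # default database, as in the question
        retmax => $opt{retmax} // 10,         # default result count, as in the question
        term   => $query,                     # escaped for use in a URL
    });
    my $res = $http->get("$base/esearch.fcgi?$params");
    die "ESearch request failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return extract_ids($res->{content});
}

# Offline check against a trimmed copy of the sample response.
my $sample = <<'XML';
<eSearchResult><Count>2768671</Count><RetMax>10</RetMax><RetStart>0</RetStart><IdList>
<Id>23682407</Id>
<Id>23682406</Id>
</IdList></eSearchResult>
XML

my @ids = extract_ids($sample);
print "$_\n" for @ids;
```

With network access (and IO::Socket::SSL installed, which HTTP::Tiny needs for HTTPS), my @ids = get_uids('cancer'); should return up to ten IDs. Keeping the extraction in its own sub means it can be checked without hitting NCBI at all.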

Other tips

You may have a reason for calling Eutils manually like this, but you should at least know that there are easier ways. For these tasks I use the Bio::DB::EUtilities module in BioPerl, because it makes this sort of thing much easier and saves time (the EUtilities Cookbook has a section showing what information is available from PubMed). There is also the recently updated Bio::Biblio module, which has a number of methods for accessing PubMed records.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow