Access abstract from PubMed using Bio::DB::EUtilities

Question

If you are looking for an object method like $factory->get_abstract, no such method exists. However, esummary will tell you whether an entry has an abstract. For example:

#!/usr/bin/env perl

use 5.010;
use strict;
use warnings;
use Bio::DB::EUtilities;

my @ids = (23298400);
my $factory = Bio::DB::EUtilities->new(-eutil   => 'esummary',
                                       -email   => 'mymail@foo.bar',
                                       -db      => 'pubmed',
                                       -retmode => 'xml',
                                       -id      => \@ids);

while (my $doc = $factory->next_DocSum) {
    while (my $item = $doc->next_Item('flattened')) {
        if ($item->get_name eq 'HasAbstract') {
            printf("%-20s: %s\n",$item->get_name,$item->get_content) if $item->get_content;
        }
    }
}

This just prints the line HasAbstract : 1 (with the name padded to 20 characters). If you want the abstract itself, there are a couple of options. One is to use efetch to return the XML. Instead of writing it to a file, you can keep the content in memory with my $xml = $factory->get_Response->content and then look for the "Abstract" nodes in it.

#!/usr/bin/env perl

use 5.010;
use utf8;
use strict;
use warnings;
use Bio::DB::EUtilities;
use XML::LibXML;

my @ids = (23298400);
my $factory = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -email   => 'mymail@foo.bar',
                                       -db      => 'pubmed',
                                       -retmode => 'xml',
                                       -id      => \@ids);

my $xml = $factory->get_Response->content;

my $xml_parser = XML::LibXML->new();
my $dom = $xml_parser->parse_string($xml);
my $root = $dom->documentElement();

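# Send UTF-8 to STDOUT; set this once, before anything is printed.
binmode STDOUT, ':encoding(UTF-8)';

# Select the <Abstract> elements directly and print the text of each child
# element (the abstract sections).
for my $node ($root->findnodes('//Abstract')) {
    for my $child ($node->findnodes('*')) {
        say $child->textContent();
    }
}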

This code prints the abstract (this is the same answer I provided on Biostars; I have included it here for completeness). Another option would be to just use curl in a Bash script, or use LWP::UserAgent in a Perl script, to form the query yourself. If you take a look at the guidelines for EFetch, you can see that it is possible to set retmode to "text" and rettype to "abstract". Also, under the "Examples" section there are a few examples of how to form a query with PMIDs to get only the text of the abstract.
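As a rough sketch of the LWP::UserAgent route, the script below builds the EFetch URL by hand and prints whatever plain text comes back. It assumes the URI and LWP::Protocol::https modules are installed. The helper names efetch_abstract_url and fetch_abstract_text are made up for illustration, and the email parameter simply mirrors the -email option used above.

```perl
#!/usr/bin/env perl

use strict;
use warnings;
use LWP::UserAgent;
use URI;

# Base URL of the EFetch utility, as given in the NCBI E-utilities guidelines.
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Hypothetical helper: build an EFetch URL that asks for plain-text abstracts.
sub efetch_abstract_url {
    my @ids = @_;
    my $uri = URI->new($EFETCH);
    $uri->query_form(
        db      => 'pubmed',
        id      => join(',', @ids),   # several PMIDs go in one comma-separated list
        retmode => 'text',
        rettype => 'abstract',
        email   => 'mymail@foo.bar',  # placeholder address; use your own
    );
    return $uri->as_string;
}

# Hypothetical helper: fetch the text and die on any HTTP error.
sub fetch_abstract_text {
    my @ids = @_;
    my $ua  = LWP::UserAgent->new(timeout => 30);
    my $res = $ua->get(efetch_abstract_url(@ids));
    die 'EFetch request failed: ', $res->status_line, "\n"
        unless $res->is_success;
    return $res->decoded_content;
}

# Only contact NCBI when PMIDs are given on the command line.
if (@ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    print fetch_abstract_text(@ARGV);
}
else {
    warn "usage: $0 PMID [PMID ...]\n";
}
```

Saved as, say, fetch_abstract.pl, it could be run as perl fetch_abstract.pl 23298400. There is no XML to parse with this approach, but you also have no structured access to the rest of the record.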

The BioPerl methods give you access to much more information, but you may have to do a little parsing (or reading up on the APIs) on your own. Fetching just the abstract text is simpler if that is all you are interested in, but it is more limited: you get only the abstract, not the other information associated with the publication.