Question

I have the following xml code

<?xml version="1.0"?>
<!DOCTYPE pathway SYSTEM "http://www.kegg.jp/kegg/xml/KGML_v0.7.1_.dtd">
<!-- Creation date: Aug 26, 2013 10:02:03 +0900 (GMT+09:00) -->
<pathway name="path:ko01200" >
    <reaction id="14" name="rn:R01845" type="irreversible">
        <substrate id="108" name="cpd:C00447"/>
        <product id="109" name="cpd:C05382"/>
     </reaction>
    <reaction id="15" name="rn:R01641" type="reversible">
        <substrate id="109" name="cpd:C05382"/>
        <substrate id="104" name="cpd:C00118"/>
        <product id="110" name="cpd:C00117"/>
        <product id="112" name="cpd:C00231"/>
     </reaction>
</pathway>

I am trying to print the substrate id and product id with following code which I am stuck for the one that have more than one ID. Tried to use dumper to see the data structure but I don't know how to proceed. I have already used XML simple for the rest of my parsing script (this part is a small part of my whole script ) and I can not change that now

use strict;
use warnings;
use XML::Simple;
use Data::Dumper;
my $xml=new XML::Simple;
my $data=$xml->XMLin("test.xml",KeyAttr => ['id']);
print Dumper($data);
    foreach my $reaction ( sort  keys %{$data->{reaction}} ) {
        print $data->{reaction}->{$reaction}->{substrate}->{id}."\n"; 
        print $data->{reaction}->{$reaction}->{product}->{id}."\n";  

}

Here is the output

$VAR1 = {
      'name' => 'path:ko01200',
      'reaction' => {
                    '15' => {
                            'substrate' => {
                                           '104' => {
                                                    'name' => 'cpd:C00118'
                                                  },
                                           '109' => {
                                                    'name' => 'cpd:C05382'
                                                  }
                                         },
                            'name' => 'rn:R01641',
                            'type' => 'reversible',
                            'product' => {
                                         '112' => {
                                                  'name' => 'cpd:C00231'
                                                },
                                         '110' => {
                                                  'name' => 'cpd:C00117'
                                                }
                                       }
                          },
                    '14' => {
                            'substrate' => {
                                           'name' => 'cpd:C00447',
                                           'id' => '108'
                                         },
                            'name' => 'rn:R01845',
                            'type' => 'irreversible',
                            'product' => {
                                         'name' => 'cpd:C05382',
                                         'id' => '109'
                                       }
                          }
                  }
    };
 108
109
Use of uninitialized value in concatenation (.) or string at  line 12.
Use of uninitialized value in concatenation (.) or string at line 13.
Était-ce utile?

La solution

First of all, don't use XML::Simple. it is hard to predict what exact data structure it will produce from a bit of XML, and it's own documentation mentions it is deprecated.

Anyway, your problem is that you want to access an id field in the product and substrate subhashes – but they don't exist in one of the reaction subhashes

'15' => {
    'substrate' => {
         '104' => {
             'name' => 'cpd:C00118'
         },
         '109' => {
             'name' => 'cpd:C05382'
         }
     },
     'name' => 'rn:R01641',
     'type' => 'reversible',
     'product' => {
         '112' => {
             'name' => 'cpd:C00231'
         },
         '110' => {
             'name' => 'cpd:C00117'
         }
     }
 },

Instead, the keys are numbers, and each value is a hash containing a name. The other reaction has a totally different structure, so special-case code would have been written for both. This is why XML::Simple shouldn't be used – the output is just to unpredictable.

Enter XML::LibXML. It is not extraordinary, but it implememts standard APIs like the DOM and XPath to traverse your XML document.

use XML::LibXML;
use feature 'say'; # assuming perl 5.010

my $doc = XML::LibXML->load_xml(file => "test.xml") or die;

for my $reaction_item ($doc->findnodes('//reaction/product | //reaction/substrate')) {
  say $reaction_item->getAttribute('id');
}

Output:

108
109
109
104
110
112
Licencié sous: CC-BY-SA avec attribution
Non affilié à StackOverflow
scroll top