Correct syntax for parsing an SGML file and converting it to XML using Perl?
Question
I'm a Perl newbie attempting to read an SGML file, parse it, and then convert it to XML so I can get the key/value pairs of all the elements. I found the SGML::DTDParse and XML::Simple modules, which I think are what I want for the task. My problem is that I can't find any documentation or code examples for DTDParse.
My code is below:
# use modules
use SGML::DTDParse;
use XML::Simple;
use Data::Dumper;
use warnings;
use strict;
my $xml;
my $data;
my $convert;
$/ = undef;
open FILE, "C:/..." or die $!;
my $file = <FILE>;
# Convert the DTD file to XML
dtdParse $file;
# Create the XML object
$xml = new XML::Simple;
# Read the XML file
$data = $xml->XMLin($file);
# print the output
print Dumper($data);
I get the following error on the dtdParse $file line: Can't call method "dtdParse" without a package or object reference at "my script name"
Any ideas as to the proper syntax here, and is this a valid approach for the task?
I reworked the code again and was able to do the DTD parsing with this:
$dtd = SGML::DTDParse::DTD->new();
$dtd->parse($file);
print $dtd;
I don't believe the parsed file can be considered XML though, so maybe the correct way to get all the elements from the parsed file is a for loop.
Solution
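There is no function dtdParse. Because no such function is defined, Perl reads dtdParse $file as an indirect-object method call ($file->dtdParse), which is why it complains about a missing package or object reference.
dtdparse is a program that comes with the SGML::DTDParse module.
You can use it to dump XML from a DTD file. A quick example of how you could use dtdparse: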
use strict;
use warnings;
use SGML::DTDParse;
use XML::Simple;
use Data::Dumper;
# Run the dtdparse script on the DTD file and capture the XML it prints
my $dtd_xml = qx{dtdparse test.dtd};
die "dtdparse failed (status $?)\n" if $? != 0;
# Create the XML object
my $xml = XML::Simple->new;
# Parse the XML string into a Perl data structure
my $result = $xml->XMLin($dtd_xml);
# Print the output
$Data::Dumper::Indent = 1;
print Dumper($result);
where test.dtd looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT DatabaseInventory (DatabaseName+)>
<!ELEMENT DatabaseName ( GlobalDatabaseName
, OracleSID
, DatabaseDomain
, Administrator+
, DatabaseAttributes
, Comments)
>
<!ELEMENT GlobalDatabaseName (#PCDATA)>
<!ELEMENT OracleSID (#PCDATA)>
<!ELEMENT DatabaseDomain (#PCDATA)>
<!ELEMENT Administrator (#PCDATA)>
<!ELEMENT DatabaseAttributes EMPTY>
<!ELEMENT Comments (#PCDATA)>
<!ATTLIST Administrator EmailAlias CDATA #REQUIRED>
<!ATTLIST Administrator Extension CDATA #IMPLIED>
<!ATTLIST DatabaseAttributes Type (Production|Development|Testing) #REQUIRED>
<!ATTLIST DatabaseAttributes Version (7|8|8i|9i) "9i">
<!ENTITY AUTHOR "Jeffrey Hunter">
<!ENTITY WEB "www.iDevelopment.info">
<!ENTITY EMAIL "jhunter@iDevelopment.info">
Which will output something like this:
$VAR1 = {
'namecase-entity' => '0',
'created-by' => 'DTDParse V2.00',
'public-id' => '',
'version' => '1.0',
'attlist' => {
'DatabaseAttributes' => {
'attribute' => {
'Type' => {
'value' => 'Production Development Testing',
'type' => '#REQUIRED',
'default' => '',
'enumeration' => 'yes'
},
'Version' => {
'value' => '7 8 8i 9i',
'type' => '',
'default' => '9i',
'enumeration' => 'yes'
}
},
'attdecl' => ' Type (Production|Development|Testing) #REQUIRED'
},
'Administrator' => {
'attribute' => {
'EmailAlias' => {
'value' => 'CDATA',
'type' => '#REQUIRED',
'default' => ''
},
'Extension' => {
'value' => 'CDATA',
'type' => '#IMPLIED',
'default' => ''
}
},
'attdecl' => ' EmailAlias CDATA #REQUIRED'
}
},
'element' => {
'OracleSID' => {
'content-type' => 'mixed',
'content-model-expanded' => {
'sequence-group' => {
'pcdata' => {}
}
},
'content-model' => {
'sequence-group' => {
'pcdata' => {}
}
}
},
'Comments' => {
'content-type' => 'mixed',
'content-model-expanded' => {
'sequence-group' => {
'pcdata' => {}
}
},
'content-model' => {
'sequence-group' => {
'pcdata' => {}
}
}
},
'DatabaseAttributes' => {
'content-type' => 'element',
'content-model-expanded' => {
'empty' => {}
},
'content-model' => {
'empty' => {}
}
},
'GlobalDatabaseName' => {
'content-type' => 'mixed',
'content-model-expanded' => {
'sequence-group' => {
'pcdata' => {}
}
},
'content-model' => {
'sequence-group' => {
'pcdata' => {}
}
}
},
'Administrator' => {
'content-type' => 'mixed',
'content-model-expanded' => {
'sequence-group' => {
'pcdata' => {}
}
},
'content-model' => {
'sequence-group' => {
'pcdata' => {}
}
}
},
'DatabaseInventory' => {
'content-type' => 'element',
'content-model-expanded' => {
'sequence-group' => {
'element-name' => {
'occurrence' => '+',
'name' => 'DatabaseName'
}
}
},
'content-model' => {
'sequence-group' => {
'element-name' => {
'occurrence' => '+',
'name' => 'DatabaseName'
}
}
}
},
'DatabaseDomain' => {
'content-type' => 'mixed',
'content-model-expanded' => {
'sequence-group' => {
'pcdata' => {}
}
},
'content-model' => {
'sequence-group' => {
'pcdata' => {}
}
}
},
'DatabaseName' => {
'content-type' => 'element',
'content-model-expanded' => {
'sequence-group' => {
'element-name' => {
'Comments' => {},
'OracleSID' => {},
'DatabaseAttributes' => {},
'DatabaseDomain' => {},
'GlobalDatabaseName' => {},
'Administrator' => {
'occurrence' => '+'
}
}
}
},
'content-model' => {
'sequence-group' => {
'element-name' => {
'Comments' => {},
'OracleSID' => {},
'DatabaseAttributes' => {},
'DatabaseDomain' => {},
'GlobalDatabaseName' => {},
'Administrator' => {
'occurrence' => '+'
}
}
}
}
}
},
'entity' => {
'WEB' => {
'text-expanded' => 'www.iDevelopment.info',
'text' => 'www.iDevelopment.info',
'type' => 'gen'
},
'AUTHOR' => {
'text-expanded' => 'Jeffrey Hunter',
'text' => 'Jeffrey Hunter',
'type' => 'gen'
},
'EMAIL' => {
'text-expanded' => 'jhunter@iDevelopment.info',
'text' => 'jhunter@iDevelopment.info',
'type' => 'gen'
}
},
'system-id' => 'test.dtd',
'unexpanded' => '1',
'created-on' => 'Tue Feb 28 00:44:52 2012',
'declaration' => '',
'xml' => '0',
'title' => '?untitled?',
'namecase-general' => '1'
};
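Once the DTD is in a hash like this, the "for loop" idea from the question becomes a plain walk over the element and attlist keys. The following is only a sketch: summarize_dtd is a hypothetical helper name, and the small $result hash is a hand-trimmed stand-in for the structure above rather than real XMLin output. Note that XML::Simple changes the shape of the data when a DTD has only one element or one attribute (the single 'element-name' entry under DatabaseInventory above is an example), so passing ForceArray => 1 to XMLin may be needed for a consistent layout.

```perl
use strict;
use warnings;

# Hypothetical, trimmed-down copy of the structure shown above;
# in a real script this would be the hash returned by XMLin.
my $result = {
    element => {
        DatabaseInventory  => { 'content-type' => 'element' },
        DatabaseAttributes => { 'content-type' => 'element' },
        Administrator      => { 'content-type' => 'mixed'   },
    },
    attlist => {
        Administrator => {
            attribute => {
                EmailAlias => { value => 'CDATA', type => '#REQUIRED', default => '' },
                Extension  => { value => 'CDATA', type => '#IMPLIED',  default => '' },
            },
        },
    },
};

# Return one line per element: name, content type, and attribute names.
sub summarize_dtd {
    my ($dtd) = @_;
    my $elements = $dtd->{element} || {};
    my $attlists = $dtd->{attlist} || {};
    my @lines;
    for my $name (sort keys %$elements) {
        my $type  = $elements->{$name}{'content-type'} // 'unknown';
        # Look the attribute list up step by step so that elements
        # without attributes do not get autovivified entries.
        my $attrs = ($attlists->{$name} || {})->{attribute} || {};
        my $list  = join(', ', sort keys %$attrs) || '(none)';
        push @lines, "$name [$type] attributes: $list";
    }
    return @lines;
}

print "$_\n" for summarize_dtd($result);
```

Sorting the keys keeps the report stable between runs, since Perl does not guarantee hash ordering.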
Other tips
dtdparse isn't a Perl function; it's a script for processing an SGML DTD from the command line. The script has its own documentation.
Since you want to do the parsing in your own Perl script, you can use the source of dtdparse as an example if you like.
For SGML, use James Clark's SP, which includes an SGML to XML converter called SX. This is a professional system, and it does have documentation. If you need Perl in there, use system or open to call SP/SX as an external program.
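As a rough illustration of the open approach, the sketch below reads a command's output through a pipe. capture_output is a hypothetical helper name, and the commented-out invocation assumes the converter is on the PATH as sx (OpenSP installs it as osx) and that input.sgml is a placeholder file name; check your installation for the actual command name and options.

```perl
use strict;
use warnings;

# Run an external command and return everything it writes to STDOUT.
# The list form of open avoids passing the arguments through a shell.
sub capture_output {
    my (@cmd) = @_;
    open(my $pipe, '-|', @cmd) or die "Cannot start $cmd[0]: $!";
    local $/;                       # slurp mode: read all output at once
    my $out = <$pipe>;
    close($pipe) or die "$cmd[0] failed (exit status " . ($? >> 8) . ")\n";
    return defined $out ? $out : '';
}

# Assumed invocation, not verified against any particular SP release:
# my $xml_text = capture_output('sx', 'input.sgml');
# The resulting string could then be handed to an XML parser.
```

The helper dies on a non-zero exit status; if the converter reports warnings that way for your documents, you may prefer to inspect $? yourself instead.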