Question

I'm a Perl newbie attempting to read an SGML file, parse it, and then convert it to XML so I can get the key/value pairs of all the elements. I found the SGML::DTDParse and XML::Simple modules, which I think are what I want for the task. My problem is that I can't find any documentation or code examples for DTDParse.

My code is below:

# use modules
use SGML::DTDParse;
use XML::Simple;
use Data::Dumper;

use warnings;
use strict;

my $xml;
my $data;
my $convert;

# Slurp the whole file into one string
$/ = undef;
open my $fh, '<', "C:/..." or die $!;
my $file = <$fh>;

# Convert the DTD file to XML
dtdParse $file;

# Create the XML object
$xml = new XML::Simple;

# Read the XML file
$data = $xml->XMLin($file);

# print the output
print Dumper($data);

I get an error with the dtdParse $file line as follows: Can't call method "dtdParse" without a package or object reference at "my script name"

Any ideas as to the proper syntax here and is this a valid approach for the task?

I reworked the code again and was able to parse the DTD with this:

my $dtd = SGML::DTDParse::DTD->new();
$dtd->parse($file);
print $dtd;

I don't believe the parsed file can be considered XML, though, so maybe the correct way to get all the elements from the parsed file is a for loop.


Solution

There is no function dtdParse.
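To see why the message talks about a method rather than a missing function, here is a minimal sketch using core Perl only. With no sub called dtdParse in scope, Perl falls back to its indirect-object syntax and treats dtdParse $file as the method call $file->dtdParse. The file name is just a placeholder, and the exact wording of the error differs between Perl versions.

```perl
use strict;
use warnings;

# Placeholder value; any plain string behaves the same way here.
my $file = 'test.dtd';

# No sub named dtdParse exists, so Perl compiles "dtdParse $file" as the
# indirect-object form of the method call $file->dtdParse. That call
# dies at run time because $file is neither an object nor a loaded class.
my $ok    = eval { dtdParse $file; 1 };
my $error = $@;

print "call failed: $error" unless $ok;
```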

dtdparse is a command-line program that comes with the SGML::DTDParse distribution.

You can use it to dump a DTD as XML. A quick example of how you could use dtdparse:

use strict;
use warnings;

use SGML::DTDParse;
use XML::Simple;
use Data::Dumper;

# Run the dtdparse program and capture the XML it prints
my $result = qx{dtdparse test.dtd};
die "dtdparse failed (exit status $?)\n" if $? != 0;

# Create the XML object
my $xml = XML::Simple->new;

# Parse the XML text into a Perl data structure
$result = $xml->XMLin($result);

# print the output
$Data::Dumper::Indent = 1;
print Dumper($result);

where test.dtd looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT DatabaseInventory (DatabaseName+)>
<!ELEMENT DatabaseName (   GlobalDatabaseName
                         , OracleSID
                         , DatabaseDomain
                         , Administrator+
                         , DatabaseAttributes
                         , Comments)
>
<!ELEMENT GlobalDatabaseName (#PCDATA)>
<!ELEMENT OracleSID          (#PCDATA)>
<!ELEMENT DatabaseDomain     (#PCDATA)>
<!ELEMENT Administrator      (#PCDATA)>
<!ELEMENT DatabaseAttributes EMPTY>
<!ELEMENT Comments           (#PCDATA)>

<!ATTLIST Administrator       EmailAlias CDATA #REQUIRED>
<!ATTLIST Administrator       Extension  CDATA #IMPLIED>
<!ATTLIST DatabaseAttributes  Type       (Production|Development|Testing) #REQUIRED>
<!ATTLIST DatabaseAttributes  Version    (7|8|8i|9i) "9i">

<!ENTITY AUTHOR "Jeffrey Hunter">
<!ENTITY WEB    "www.iDevelopment.info">
<!ENTITY EMAIL  "jhunter@iDevelopment.info">

Which will output something like this:

$VAR1 = {
  'namecase-entity' => '0',
  'created-by' => 'DTDParse V2.00',
  'public-id' => '',
  'version' => '1.0',
  'attlist' => {
    'DatabaseAttributes' => {
      'attribute' => {
        'Type' => {
          'value' => 'Production Development Testing',
          'type' => '#REQUIRED',
          'default' => '',
          'enumeration' => 'yes'
        },
        'Version' => {
          'value' => '7 8 8i 9i',
          'type' => '',
          'default' => '9i',
          'enumeration' => 'yes'
        }
      },
      'attdecl' => '  Type       (Production|Development|Testing) #REQUIRED'
    },
    'Administrator' => {
      'attribute' => {
        'EmailAlias' => {
          'value' => 'CDATA',
          'type' => '#REQUIRED',
          'default' => ''
        },
        'Extension' => {
          'value' => 'CDATA',
          'type' => '#IMPLIED',
          'default' => ''
        }
      },
      'attdecl' => '       EmailAlias CDATA #REQUIRED'
    }
  },
  'element' => {
    'OracleSID' => {
      'content-type' => 'mixed',
      'content-model-expanded' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      },
      'content-model' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      }
    },
    'Comments' => {
      'content-type' => 'mixed',
      'content-model-expanded' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      },
      'content-model' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      }
    },
    'DatabaseAttributes' => {
      'content-type' => 'element',
      'content-model-expanded' => {
        'empty' => {}
      },
      'content-model' => {
        'empty' => {}
      }
    },
    'GlobalDatabaseName' => {
      'content-type' => 'mixed',
      'content-model-expanded' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      },
      'content-model' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      }
    },
    'Administrator' => {
      'content-type' => 'mixed',
      'content-model-expanded' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      },
      'content-model' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      }
    },
    'DatabaseInventory' => {
      'content-type' => 'element',
      'content-model-expanded' => {
        'sequence-group' => {
          'element-name' => {
            'occurrence' => '+',
            'name' => 'DatabaseName'
          }
        }
      },
      'content-model' => {
        'sequence-group' => {
          'element-name' => {
            'occurrence' => '+',
            'name' => 'DatabaseName'
          }
        }
      }
    },
    'DatabaseDomain' => {
      'content-type' => 'mixed',
      'content-model-expanded' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      },
      'content-model' => {
        'sequence-group' => {
          'pcdata' => {}
        }
      }
    },
    'DatabaseName' => {
      'content-type' => 'element',
      'content-model-expanded' => {
        'sequence-group' => {
          'element-name' => {
            'Comments' => {},
            'OracleSID' => {},
            'DatabaseAttributes' => {},
            'DatabaseDomain' => {},
            'GlobalDatabaseName' => {},
            'Administrator' => {
              'occurrence' => '+'
            }
          }
        }
      },
      'content-model' => {
        'sequence-group' => {
          'element-name' => {
            'Comments' => {},
            'OracleSID' => {},
            'DatabaseAttributes' => {},
            'DatabaseDomain' => {},
            'GlobalDatabaseName' => {},
            'Administrator' => {
              'occurrence' => '+'
            }
          }
        }
      }
    }
  },
  'entity' => {
    'WEB' => {
      'text-expanded' => 'www.iDevelopment.info',
      'text' => 'www.iDevelopment.info',
      'type' => 'gen'
    },
    'AUTHOR' => {
      'text-expanded' => 'Jeffrey Hunter',
      'text' => 'Jeffrey Hunter',
      'type' => 'gen'
    },
    'EMAIL' => {
      'text-expanded' => 'jhunter@iDevelopment.info',
      'text' => 'jhunter@iDevelopment.info',
      'type' => 'gen'
    }
  },
  'system-id' => 'test.dtd',
  'unexpanded' => '1',
  'created-on' => 'Tue Feb 28 00:44:52 2012',
  'declaration' => '',
  'xml' => '0',
  'title' => '?untitled?',
  'namecase-general' => '1'
};
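Once you have that hash, getting at the elements and their attributes is ordinary hash traversal. A rough sketch follows, assuming the structure has the shape shown in the dump above. The $result hash here is a hand-copied fragment of that dump so the snippet runs without dtdparse or XML::Simple, and summarize_dtd is a made-up helper name.

```perl
use strict;
use warnings;

# Hand-copied fragment of the dump above, standing in for the hash
# returned by XMLin().
my $result = {
    element => {
        Administrator      => { 'content-type' => 'mixed' },
        Comments           => { 'content-type' => 'mixed' },
        DatabaseAttributes => { 'content-type' => 'element' },
    },
    attlist => {
        Administrator => {
            attribute => {
                EmailAlias => { value => 'CDATA', type => '#REQUIRED', default => '' },
                Extension  => { value => 'CDATA', type => '#IMPLIED',  default => '' },
            },
        },
        DatabaseAttributes => {
            attribute => {
                Type    => { value => 'Production Development Testing',
                             type  => '#REQUIRED', default => '', enumeration => 'yes' },
                Version => { value => '7 8 8i 9i',
                             type  => '', default => '9i', enumeration => 'yes' },
            },
        },
    },
};

# summarize_dtd() is a hypothetical helper, not part of any module.
# It returns one line per element, followed by one line per attribute.
sub summarize_dtd {
    my ($dtd) = @_;
    my @lines;
    for my $name (sort keys %{ $dtd->{element} }) {
        my $content_type = $dtd->{element}{$name}{'content-type'};
        push @lines, "$name ($content_type)";

        # Not every element has an ATTLIST declaration.
        next unless exists $dtd->{attlist}{$name};
        my $attrs = $dtd->{attlist}{$name}{attribute};
        for my $attr (sort keys %$attrs) {
            my $spec = $attrs->{$attr};
            push @lines, "  attribute $attr: value=$spec->{value} type=$spec->{type}";
        }
    }
    return @lines;
}

my @summary = summarize_dtd($result);
print "$_\n" for @summary;
```

One thing to watch for: XML::Simple only keys children by their name attribute when there are several of them, which is why DatabaseInventory in the dump has a plain 'name' => 'DatabaseName' entry instead. If your DTD has elements with a single attribute, passing something like ForceArray => ['attribute', 'element'] to XMLin should make the shape consistent, but check that against your own output.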

OTHER TIPS

dtdparse isn't a Perl function; it's a script for processing an SGML DTD from the command line. The documentation for the script is here.

Since you want to do the parsing in your own Perl script, you can use the source of dtdparse as an example if you like.

For SGML, use James Clark's SP, which includes an SGML to XML converter called SX. This is a professional system, and it does have documentation. If you need Perl in there, use system or open to call SP/SX as an external program.
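If you go the external-program route, a pipe open is one way to capture the converter's output from Perl. This is only a sketch: run_converter is a made-up helper, the demo call uses the Perl interpreter itself as a stand-in so it runs without SP installed, and the sx command line in the comment (including the document.sgml file name) is an assumption about your setup. The list form of pipe open needs a reasonably recent Perl on Windows.

```perl
use strict;
use warnings;

# run_converter() is a hypothetical helper: it runs an external command
# (given as a list, so no shell quoting is involved) and returns
# everything the command wrote to standard output.
sub run_converter {
    my (@command) = @_;

    open(my $pipe, '-|', @command)
        or die "Cannot start $command[0]: $!";

    local $/;                 # slurp mode: read all output at once
    my $output = <$pipe>;

    # close() returns false if the command exited with a non-zero status.
    close($pipe)
        or die "$command[0] failed (exit status " . ($? >> 8) . ")";

    return $output;
}

# Stand-in command so the sketch runs anywhere Perl does.
my $xml = run_converter($^X, '-e', 'print "<doc/>"');
print "$xml\n";

# With SP installed, the real call might look like this (assumed names):
# my $xml = run_converter('sx', 'document.sgml');
```

Converters of this kind may exit with a non-zero status even when they produce usable output, so you may prefer to warn rather than die on a failed close.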

Licensed under: CC-BY-SA with attribution