Question

I wrote the following SGML DTD:

<!DOCTYPE tvguide[
    <!ELEMENT tvguide--(date,channel+)>
    <!ELEMENT date--(#PCDATA)>
    <!ELEMENT channel--(channel_name,format?,program*)>
    <!ELEMENT channel--(#PCDATA)>
    <!ATTLIST channel teletext (yes|no) "no">
    <!ELEMENT format--(#PCDATA)>
    <!ELEMENT program--(name,start_time,(end_time|duration))>
    <!ATTLIST program
        min_age CDATA #REQUIRED
        lang CDATA #IMPLIED es>
    <!ELEMENT name--(#PCDATA)>
    <!ELEMENT start_time--(#PCDATA)>
    <!ELEMENT end_time--(#PCDATA)>
    <!ELEMENT duration--(#PCDATA)>]>

Which tool should I use to check whether it contains syntax errors (and where), and whether it is a valid SGML DTD?

Which tool should I use to validate files against this DTD? I would prefer a program for Windows, but a Linux binary or a library written in PHP, C, C++, Java or JavaScript would also be fine.


Solution

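Take a look at SP from James Clark; its nsgmls parser validates SGML and reports each error with its line and column. (OpenSP is the maintained continuation of SP, and its parser is called onsgmls.) I use OmniMark to validate SGML, but I don't think you can find copies anymore.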

You should get errors about tag minimization: the two minimization indicators need spaces before, between, and after them, so "--" must be written as " - - ". You should also get an error about the element channel being declared twice, and an error about the "es" in the lang attribute declaration for program (#IMPLIED cannot be followed by a default value).

Here's a valid version for reference:

 <!DOCTYPE tvguide [
 <!ELEMENT tvguide - - (date,channel+)>
 <!ELEMENT date - - (#PCDATA)>
 <!ELEMENT channel - - (channel_name,format?,program*)>
 <!ATTLIST channel teletext (yes|no) "no">
 <!ELEMENT format - - (#PCDATA)>
 <!ELEMENT program - - (name,start_time,(end_time|duration))>
 <!ATTLIST program
      min_age CDATA #REQUIRED
      lang CDATA "es">
 <!ELEMENT name - - (#PCDATA)>
 <!ELEMENT start_time - - (#PCDATA)>
 <!ELEMENT end_time - - (#PCDATA)>
 <!ELEMENT duration - - (#PCDATA)>
 ]>
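
If you want to script the check, here is a minimal sketch in Python. It assumes OpenSP is installed and its onsgmls executable is on the PATH (with the original SP the executable is nsgmls). The helper names build_command and validate are my own, not part of any library. The -s option suppresses the parser's normal output so that only diagnostics are printed.

```python
import shutil
import subprocess


def build_command(path, executable="onsgmls"):
    """Build the parser command line; -s suppresses normal output."""
    return [executable, "-s", path]


def validate(path, executable="onsgmls"):
    """Run the parser on `path` and return (clean, messages).

    `clean` is True only when the parser exits with status 0 and prints
    no diagnostics at all, so warnings also count as "not clean" here.
    Assumption: the executable is installed and on the PATH.
    """
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"{executable} not found on PATH")
    result = subprocess.run(
        build_command(path, executable),
        capture_output=True,
        text=True,
    )
    messages = result.stderr.splitlines()
    return result.returncode == 0 and not messages, messages


if __name__ == "__main__":
    import sys

    clean, messages = validate(sys.argv[1])
    for line in messages:
        print(line)
    print("no errors reported" if clean else "problems reported")
```

Save the DOCTYPE declaration followed by a document instance in one file and pass that file name to the script. Which SGML declaration the parser applies affects what is accepted, so the result depends on your setup.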

OTHER TIPS

Note that "valid SGML DTD" is technically a bit ambiguous: it depends on which "SGML declaration" is in effect, because that is where things like the maximum length of a name, or even which characters can occur in a name, are specified. Together these make up the "concrete syntax" used in a particular DTD (and hence in a particular SGML document).

The "default" syntax for SGML is called the "reference concrete syntax" and is defined in ISO 8879:1986. Because in this syntax the maximum length of names (the so-called "NAMELEN quantity") is set to 8 (eight), and LOW LINE (_) can't be used in names (it is not a so-called name character there), your DTD would not be valid with respect to the reference concrete syntax.
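
To make the two restrictions concrete, here is a small sketch in Python that checks a name against just those two rules: at most 8 characters, and only letters, digits, "." and "-", starting with a letter. The function name name_problems is hypothetical, and this is only an illustration of the rules described above, not a substitute for a real SGML parser.

```python
import re

NAMELEN = 8  # NAMELEN quantity in the reference concrete syntax

# Name characters in the reference concrete syntax:
# letters, digits, "." and "-"; a name must start with a letter.
NAME_CHAR = re.compile(r"[A-Za-z0-9.-]")
NAME_START = re.compile(r"[A-Za-z]")


def name_problems(name):
    """Return a list of reasons why `name` is not a valid name here."""
    problems = []
    if len(name) > NAMELEN:
        problems.append(f"longer than NAMELEN ({len(name)} > {NAMELEN})")
    bad = sorted({ch for ch in name if not NAME_CHAR.fullmatch(ch)})
    if bad:
        problems.append("non-name characters: " + " ".join(bad))
    if name and not NAME_START.fullmatch(name[0]):
        problems.append("does not start with a letter")
    return problems


if __name__ == "__main__":
    for name in ["tvguide", "channel_name", "start_time", "min_age", "duration"]:
        print(name, name_problems(name) or "ok")
```

On the names from your DTD, channel_name and start_time fail both checks, min_age fails only the character check, and tvguide and duration pass.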

HTML, for example, uses its own SGML declaration: it hugely increases the NAMELEN quantity and adds _ to the name characters, among other changes. With respect to that concrete syntax, your DTD would indeed be syntactically valid.


However, there is no element declaration for channel_name, even though the DTD requires at least one such element in every tvguide (namely, inside the required channel element). [Leaving out the declaration for an element type that never occurs in a document is not, by itself, an error or a problem.]

So the DTD is not (yet) "valid" in the sense that you can't write any document element (i.e., a tvguide element, or simply a "document") that is valid according to it.
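
A gap like this can be spotted mechanically. The following Python sketch lists element names that appear in a content model but have no ELEMENT declaration of their own. It is deliberately simplified: it assumes each declaration names a single element, has a parenthesised content model, and contains no comments or parameter entities. The function name undeclared_elements is my own invention.

```python
import re

# <!ELEMENT name [minimization] content-model>
ELEMENT_DECL = re.compile(
    r"<!ELEMENT\s+([\w.-]+)\s+(?:[-Oo]\s+[-Oo]\s+)?(.*?)>", re.S
)
# A token in a content model: either #PCDATA-style or an element name.
MODEL_TOKEN = re.compile(r"#?[A-Za-z][\w.-]*")


def undeclared_elements(dtd_text):
    """Return names used in content models but never declared."""
    declared = set()
    referenced = set()
    for name, model in ELEMENT_DECL.findall(dtd_text):
        declared.add(name.lower())
        # Skip declared content such as EMPTY, CDATA or ANY.
        if not model.lstrip().startswith("("):
            continue
        for token in MODEL_TOKEN.findall(model):
            if not token.startswith("#"):  # ignore #PCDATA
                referenced.add(token.lower())
    return sorted(referenced - declared)


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], encoding="utf-8") as handle:
        print(undeclared_elements(handle.read()))
```

Run on the corrected DTD from the other answer, it reports channel_name as the only name that is referenced but not declared.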


Adding a straightforward declaration for channel_name like

<!ELEMENT channel_name - - (#PCDATA)>

remedies that: now, for example, the document element

<tvguide>
  <date>2016</date>
  <channel><channel_name>XTV</channel_name></channel>
</tvguide>

is valid according to your DTD. (I tried it out using the SP parser, mentioned in the other answer.)


Simplifying the names in your DTD would make the whole thing a valid "Basic SGML Document", and even a valid "Minimal SGML Document". These terms (again from ISO 8879) come closest to the notion of "valid SGML" when no specific context or SGML declaration is given: they basically mean "portable and acceptable to any SGML system". Here is my proposed version:

<!DOCTYPE tvguide [
 <!ELEMENT tvguide  - - (date,channel+)>
 <!ELEMENT date     - - (#PCDATA)>

 <!ELEMENT channel  - - (ch-name,format?,program*)>
 <!ATTLIST channel
           teletext (yes|no) no>

 <!ELEMENT format   - - (#PCDATA)>
 <!ELEMENT ch-name  - - (#PCDATA)>

 <!ELEMENT program  - - (name,start-tm,(end-tm|duration))>
 <!ATTLIST program
           min-age  CDATA    #REQUIRED
           lang     CDATA    "es">

 <!ELEMENT name     - - (#PCDATA)>
 <!ELEMENT start-tm - - (#PCDATA)>
 <!ELEMENT end-tm   - - (#PCDATA)>
 <!ELEMENT duration - - (#PCDATA)>
]>
<tvguide>
 <date>2016-12-11</date>
 <channel><ch-name>X-TV 4</ch-name></channel>
</tvguide>
Licensed under: CC-BY-SA with attribution