Question

I know the purpose of DOCTYPE (and what each url/identifier on the line is) as far as web standards and page validation goes, but I am unsure about what it actually "is" in the context of an XML document.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <head>
    <title>My Page</title>
  </head>
  <body>
    <p>Hello</p>
  </body>
</html>

Is it part of the actual XML document structure, or is it some kind of comment-like "hint" that is noted then stripped?

What is the significance of the "!" before the name? Does this denote a special type of "element"? What are they called?

The example I posted is XHTML for the web, but is DOCTYPE also used in general purpose XML documents?


Solution

DOCTYPE was inherited from SGML, where it was supposed to point to the DTD file that explains how to parse the document. XML's self-explanatory syntax and namespaces have made it largely irrelevant. The only real use left for DOCTYPE/DTD in XML is to define the allowed named entities (e.g. &nbsp;).
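
As a rough illustration of the named-entity point, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The two sample documents are made up for this example. With an internal DTD subset that declares nbsp, the reference expands; without any declaration, the parser rejects the document.

```python
import xml.etree.ElementTree as ET

# Illustrative document: the internal DTD subset declares the nbsp entity.
WITH_DTD = """<!DOCTYPE p [
  <!ENTITY nbsp "&#160;">
]>
<p>Hello&nbsp;world</p>"""

# Same content, but no DOCTYPE, so nbsp is never declared.
WITHOUT_DTD = "<p>Hello&nbsp;world</p>"

# The declared entity is expanded to U+00A0 (no-break space).
text_with_dtd = ET.fromstring(WITH_DTD).text

# An undeclared named entity makes the document unparseable.
try:
    ET.fromstring(WITHOUT_DTD)
    undeclared_entity_rejected = False
except ET.ParseError:
    undeclared_entity_rejected = True
```

The five entities built into XML (&lt; &gt; &amp; &apos; &quot;) need no declaration; anything else, including &nbsp;, does.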

The XML spec even allows "non-validating" parsers that ignore the external DTD file completely (web browsers use such parsers, unless you've fallen into the text/html trap, in which case an XML parser is not used at all).
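
A small sketch of what "non-validating" means in practice, assuming Python's standard ElementTree (which sits on the non-validating expat parser). The document below is invented for the example: its DTD says a note must contain exactly one to element, the body breaks that rule, and the parser accepts it anyway because it only checks well-formedness.

```python
import xml.etree.ElementTree as ET

# Illustrative document: well-formed, but invalid against its own DTD,
# which requires <note> to contain a single <to> element.
INVALID_BUT_WELL_FORMED = """<!DOCTYPE note [
  <!ELEMENT note (to)>
  <!ELEMENT to (#PCDATA)>
]>
<note><from>Alice</from></note>"""

# A non-validating parser reads the declarations but does not enforce them.
root = ET.fromstring(INVALID_BUT_WELL_FORMED)
child_tags = [child.tag for child in root]
```

A validating parser would be expected to report the from element as an error here.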

DTD is quite poor for validation: it is hard to specify rules that span more than one level of nesting, and there is no way to specify attribute types beyond a few predefined ones. XML Schema and RelaxNG can be far more precise.

DTD doesn't fully support namespaces either, which leads to ridiculous workarounds like the XHTMLplusMathMLplusSVG DOCTYPE.

In web browsers, certain DOCTYPEs have the desirable side effect of triggering standards-compliant rendering mode. This is more of a hack than an intended use of DOCTYPEs.

  • If you're using real XHTML (application/xhtml+xml, the one that doesn't open in IE at all), then don't use a DOCTYPE at all (that's the recommendation from XHTML 5). XML mode will trigger standards-compliant rendering regardless of DOCTYPE.

  • If you're using text/html mode, then use <!DOCTYPE html>. That's the HTML 5 DOCTYPE, and it's the shortest one that triggers the best possible rendering in all browsers. Browsers don't use DOCTYPE for any other purpose, so you're not missing out on anything.

  • If you're processing XHTML files with XML parsers (outside browsers), then please don't forget to set up a DTD catalog properly, otherwise your parser may end up DoS-ing w3.org by trying to fetch the DTD every time. If you can't use a DTD catalog, then disable "externals" in the parser, or omit the DOCTYPE and don't use named entities (i.e. use &#160; rather than &nbsp;).
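
The last bullet can be sketched with Python's standard xml.sax module. This is only one way to disable "externals", and the TextCollector class and sample document are hypothetical names made up for the example. The feature_external_ges flag is turned off explicitly so the parser does not try to fetch the DTD from w3.org, and the document uses the numeric reference &#160; so no DTD is needed to resolve it.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges

# Illustrative XHTML document: keeps its DOCTYPE, but uses a numeric
# character reference instead of the named entity nbsp.
XHTML = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Hello&#160;world</p></body></html>"""


class TextCollector(ContentHandler):
    """Hypothetical helper that gathers all character data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


parser = xml.sax.make_parser()
# Do not load external entities, which includes the external DTD.
parser.setFeature(feature_external_ges, False)

handler = TextCollector()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(XHTML.encode("utf-8")))

text = "".join(handler.chunks)
```

Recent Python versions already default this feature to off; setting it explicitly documents the intent and covers older interpreters.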

OTHER TIPS

DOCTYPE is part of the XML specification (see the relevant subsection here) and can include either a link to a DTD, "internal" DTD declarations, or both. Many "modern" uses of XML don't use a DOCTYPE at all, though - as porneL mentions, both XML Schema and RelaxNG are more powerful ways to specify a document's syntax. See this Tim Bray blog post for a bit more background.
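
To see where the declaration ends up after parsing, here is a minimal sketch using Python's standard xml.dom.minidom, applied to a condensed copy of the XHTML example from the question. In the DOM the DOCTYPE is kept as a document type node that sits before the root element. It is a declaration, not an element and not a comment, and it carries the name, public identifier, and system identifier from the line.

```python
from xml.dom import Node, minidom

# Condensed copy of the XHTML document from the question.
XHTML = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>My Page</title></head><body><p>Hello</p></body></html>"""

doc = minidom.parseString(XHTML)
doctype = doc.doctype

# The declaration is the first child of the document, ahead of <html>.
first_child_is_doctype = doc.firstChild.nodeType == Node.DOCUMENT_TYPE_NODE

# The three parts of the DOCTYPE line are exposed as plain strings.
name = doctype.name            # root element name
public_id = doctype.publicId   # formal public identifier
system_id = doctype.systemId   # URL of the external DTD
```

minidom does not download the DTD here; it only records the identifiers.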

Licensed under: CC-BY-SA with attribution